oe2fps command-line options
The following comes from oe2fps --help:
usage: oe2fps [-h] [--path] [--circular] [--tree] [--numbits INT]
[--minbonds INT] [--maxbonds INT] [--minradius INT]
[--maxradius INT] [--atype ATYPE] [--btype BTYPE] [--maccs166]
[--substruct] [--rdmaccs] [--rdmaccs/1] [--aromaticity NAME]
[--id-tag NAME] [--type TYPE_STRING] [--using FILENAME]
[--in FORMAT] [-o FILENAME] [--out FORMAT]
[--errors {strict,report,ignore}] [--progress] [--help-formats]
[-R NAME=VALUE] [--delimiter {tab,whitespace,to-eol,space}]
[--has-header] [--version] [--license-check]
[filenames [filenames ...]]
Generate FPS or FPB fingerprints from a structure file using OEChem
positional arguments:
filenames input structure files (default is stdin)
optional arguments:
-h, --help show this help message and exit
--aromaticity NAME use the named aromaticity model (same as '-R
aromaticity=NAME')
--id-tag NAME tag name containing the record id (SD files only)
--type TYPE_STRING Specify a chemfp type string
--using FILENAME Get the fingerprint type from the metadata of a
fingerprint file
--in FORMAT input structure format (default guesses from filename)
-o FILENAME, --output FILENAME
save the fingerprints to FILENAME (default=stdout)
--out FORMAT output fingerprint format (default guesses from output
filename, or is 'fps')
--errors {strict,report,ignore}
how should structure parse errors be handled?
(default=ignore)
--progress, --no-progress
Show a progress bar (default: show unless the output
is a terminal)
--help-formats list the available formats and reader arguments
-R NAME=VALUE specify a reader argument
--delimiter {tab,whitespace,to-eol,space}
delimiter style for SMILES and InChI files. Alias for
'-R delimiter=VALUE'.
--has-header Skip the first line of a SMILES or InChI file. Alias
for '-R has_header=1'.
--version show program's version number and exit
--license-check Check the license and report results to stdout.
path, circular, and tree fingerprints:
--path generate path fingerprints (default)
--circular generate circular fingerprints
--tree generate tree fingerprints
--numbits INT number of bits in the fingerprint (default=4096)
--minbonds INT minimum number of bonds in the path or tree
fingerprint (default=0)
--maxbonds INT maximum number of bonds in the path or tree
fingerprint (path default=5, tree default=4)
--minradius INT minimum radius for the circular fingerprint
(default=0)
--maxradius INT maximum radius for the circular fingerprint
(default=5)
--atype ATYPE atom type flags, described below (default=Default)
--btype BTYPE bond type flags, described below (default=Default)
166 bit MACCS substructure keys:
--maccs166 generate MACCS fingerprints
881 bit ChemFP substructure keys:
--substruct generate ChemFP substructure fingerprints
ChemFP version of the 166 bit RDKit/MACCS keys:
--rdmaccs, --rdmaccs/2
generate 166 bit RDKit/MACCS fingerprints (version 2)
--rdmaccs/1 use the version 1 definition for --rdmaccs
ATYPE is one or more of the following, separated by the '|' character
Arom AtmNum Chiral EqArom EqHBAcc EqHBDon EqHalo FCharge HCount HvyDeg
Hyb InRing
The following shorthand terms and expansions are also available:
DefaultPathAtom = AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb|EqHalo
DefaultCircularAtom = AtmNum|Arom|Chiral|FCharge|HCount|EqHalo
DefaultTreeAtom = AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb
and 'Default' selects the correct value for the specified fingerprint.
Examples:
--atype Default
--atype "Arom|AtmNum|FCharge|HCount"
--atype Arom,AtmNum,FCharge,HCount
BTYPE is one or more of the following, separated by the '|' character
Chiral InRing Order
The following shorthand terms and expansions are also available:
DefaultPathBond = Order|Chiral
DefaultCircularBond = Order
DefaultTreeBond = Order
and 'Default' selects the correct value for the specified fingerprint.
Examples:
--btype Default
--btype "Order|InRing"
To simplify command-line use, a comma may be used instead of a '|' to
separate different fields. Example:
--atype AtmNum,HvyDeg
By default, chemfp will use the filename extension to determine the
structure file format type and possible compression. Most of the file
readers support configuration parameters. Use the '-R' option to
specify those parameters.
Use '--help-formats' to list available formats and reader parameters.
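The ATYPE rules in the help text (flag names joined by '|' or ',', plus shorthand terms that expand to flag lists) can be illustrated with a short Python sketch. This is not chemfp's own parser: the function name `expand_atype` is hypothetical, and treating repeated flags as harmless duplicates is an assumption. The flag names and expansions are copied from the help text. The context-dependent 'Default' term is deliberately left out.

```python
import re

# Atom-type flag names, copied from the help text.
ATYPE_FLAGS = {
    "Arom", "AtmNum", "Chiral", "EqArom", "EqHBAcc", "EqHBDon",
    "EqHalo", "FCharge", "HCount", "HvyDeg", "Hyb", "InRing",
}

# Shorthand terms and their expansions, copied from the help text.
SHORTHAND = {
    "DefaultPathAtom": "AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb|EqHalo",
    "DefaultCircularAtom": "AtmNum|Arom|Chiral|FCharge|HCount|EqHalo",
    "DefaultTreeAtom": "AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb",
}


def expand_atype(value):
    """Split an ATYPE string on '|' or ',' and expand shorthand terms.

    Hypothetical helper: returns the flag names in first-seen order and
    raises ValueError for a name that is not in ATYPE_FLAGS.
    """
    names = []
    for term in re.split(r"[|,]", value):
        term = term.strip()
        if not term:
            continue
        for name in SHORTHAND.get(term, term).split("|"):
            if name not in ATYPE_FLAGS:
                raise ValueError(f"unknown atom type flag: {name!r}")
            if name not in names:  # assumption: duplicates are ignored
                names.append(name)
    return names


print(expand_atype("Arom|AtmNum|FCharge|HCount"))
print(expand_atype("Arom,AtmNum,FCharge,HCount"))
```

Both calls yield the same list, which is the point of the comma alternative: the comma form needs no shell quoting, while an unquoted '|' would be read by the shell as a pipe.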
Supported oe2fps formats
The following comes from oe2fps --help-formats:
These are the structure file formats that chemfp can read when using
the OEChem toolkit.
By default, chemfp uses the filename extension to determine the format
type. If the filename ends with ".gz" then it is interpreted as a gzip
compressed file, and the second-to-last extension is used to determine
the format type. Unknown or unsupported extensions are interpreted as
a SMILES file.
(The OEChem structure file readers do not support Zstandard
compression.)
You may instead specify the file format by name (see below), which is
especially important when reading from stdin, which has no associated
filename extension.
The supported filename extensions are:
File Type        Extension(s)
==============   ==========================
SMILES           can, ism, isosmi, smi, usm
SDF              mdl, rxn, sd, sdf
InChI            inchi
Tripos Mol2      mol2, mol2h
PDB              ent, pdb
XYZ              xyz
SKC              skc
Macromodel       mmd, mmod
ChemDraw CDX     cdx
OE binary        oeb
OEB compressed   oez
CIF              cif
mmCIF            mmcif
FASTA            fasta
CSV              csv
Append a '.gz' to the filename to indicate that the contents are
gzip-compressed.
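The extension rules described above (strip a trailing ".gz", look up the remaining extension, and fall back to SMILES for anything unknown) can be sketched in Python as follows. This is an illustration of the documented behavior, not chemfp's implementation: `guess_file_type` is a hypothetical name, the table holds only a subset of the extensions listed above, and matching extensions case-sensitively is an assumption.

```python
# A subset of the extension table above (extension -> file type).
EXTENSION_TO_TYPE = {
    "can": "SMILES", "ism": "SMILES", "isosmi": "SMILES",
    "smi": "SMILES", "usm": "SMILES",
    "mdl": "SDF", "rxn": "SDF", "sd": "SDF", "sdf": "SDF",
    "inchi": "InChI",
    "mol2": "Tripos Mol2", "mol2h": "Tripos Mol2",
    "ent": "PDB", "pdb": "PDB",
    "oeb": "OE binary",
}


def guess_file_type(filename):
    """Return (file_type, is_gzipped) for a filename.

    Hypothetical helper following the rules in the help text: a ".gz"
    suffix marks gzip compression, the extension before it selects the
    file type, and unknown extensions are treated as SMILES.
    """
    is_gzipped = filename.endswith(".gz")
    if is_gzipped:
        filename = filename[:-3]  # drop ".gz", keep the inner extension
    extension = filename.rpartition(".")[2] if "." in filename else ""
    return EXTENSION_TO_TYPE.get(extension, "SMILES"), is_gzipped


for name in ("ligands.sdf.gz", "screen.smi", "notes.txt"):
    print(name, guess_file_type(name))
```

Because an unknown extension silently falls back to SMILES, passing the format explicitly with '--in' is the safer choice when a filename is ambiguous or the input comes from stdin.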
The format can also be specified by name using the '--in' option:
File Type        Format name
==============   =============
SMILES           smi, can, usm
SDF              sdf
InChI            inchi
Tripos Mol2      mol2, mol2h
PDB              pdb
XYZ              xyz
SKC              skc
Macromodel       mmod
ChemDraw CDX     cdx
OE binary        oeb
OEB compressed   oez
CIF              cif
mmCIF            mmcif
FASTA            fasta
CSV              csv
Append a '.gz' to the format name to indicate that the contents are
gzip-compressed.
The input format parsers can be configured with the "-R" option. For
example, the following reader arguments tell the SMILES readers that
the fields are whitespace delimited and the first line is a header.
-R delimiter=whitespace -R has_header=true
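When oe2fps is driven from a script, the repeated '-R NAME=VALUE' pairs are easy to assemble programmatically. The sketch below builds an argument list using only options that appear in the help text ('--in', '-R', '-o'). The helper name `build_oe2fps_command` and the filenames are hypothetical, and nothing is executed here.

```python
import shlex


def build_oe2fps_command(filename, reader_args, output=None, in_format=None):
    """Return an argv list for oe2fps.

    Hypothetical helper: each reader argument becomes one '-R NAME=VALUE'
    pair, in the order given by the reader_args dictionary.
    """
    argv = ["oe2fps"]
    if in_format is not None:
        argv += ["--in", in_format]
    for name, value in reader_args.items():
        argv += ["-R", f"{name}={value}"]
    if output is not None:
        argv += ["-o", output]
    argv.append(filename)
    return argv


argv = build_oe2fps_command(
    "compounds.smi",  # hypothetical input file
    {"delimiter": "whitespace", "has_header": "true"},
    output="compounds.fps",  # hypothetical output file
)
print(shlex.join(argv))  # shell-quoted form, for logging or copy-paste
```

The list can be handed to `subprocess.run(argv, check=True)` on a machine where chemfp and a licensed OEChem are installed. Passing a list rather than a single string avoids shell quoting problems with values such as 'Canon|Strict'.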
All formats handle the following two reader arguments:
aromaticity - one of 'openeye', 'daylight', 'tripos', 'mdl', or 'mmff'
(this can also be set via the older '--aromaticity' command-line option)
flavor - a '|' or ',' separated list of flavor names, or a numeric value.
A leading '-' means to remove the given flavor. Examples include:
o Canon,Strict -- the bitwise merger of the format's Canon and Strict values
o Default,-Kekule -- the format's Default flavor but without the Kekule bits
(every flavor has a Default)
o 42 -- the specific OEChem flavor value 42
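The flavor syntax above amounts to bitmask arithmetic: names are OR-ed together, a leading '-' clears bits, and a plain number is used as-is. The following Python sketch shows that idea under loudly stated assumptions. The bit values and the Default mask are made up for illustration (the real OEChem constants differ and vary by format), `resolve_flavor` is a hypothetical name, and left-to-right evaluation is assumed rather than confirmed.

```python
import re

# HYPOTHETICAL bit values, for illustration only. They are not the
# OEChem constants, which are defined separately for each format.
EXAMPLE_FLAGS = {"Canon": 0x1, "Strict": 0x2, "Kekule": 0x4}
EXAMPLE_DEFAULT = 0x1 | 0x4  # hypothetical Default = Canon + Kekule


def resolve_flavor(expr, flags=EXAMPLE_FLAGS, default=EXAMPLE_DEFAULT):
    """Turn a flavor expression into an integer bitmask.

    Terms are separated by '|' or ',' and applied left to right
    (an assumption); '-Name' removes that flavor's bits.
    """
    if expr.isdigit():
        return int(expr)  # a numeric flavor value is used directly
    value = 0
    for term in re.split(r"[|,]", expr):
        term = term.strip()
        if not term:
            continue
        remove = term.startswith("-")
        name = term[1:] if remove else term
        bits = default if name == "Default" else flags[name]
        value = (value & ~bits) if remove else (value | bits)
    return value


for expr in ("Canon,Strict", "Default,-Kekule", "42"):
    print(expr, "->", resolve_flavor(expr))
```

With these made-up values, "Default,-Kekule" first sets the Default bits and then clears the Kekule bit, leaving only Canon set.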
The SMILES and InChI formats also handle reader arguments for the
delimiter style and the presence of an initial header line using the
following:
delimiter - one of 'to-eol' (Daylight/OEChem style), 'tab',
'whitespace', 'space', or 'native' (for the native toolkit style)
has_header - '1' if the first line contains a header, else '0'.
The supported formats, default reader arguments, and input flavors are:
Format: can
aromaticity: openeye
delimiter: to-eol
flavor: Default
default flags: <none>
available flags: Canon, Strict
has_header: 0
Format: cdx
aromaticity: openeye
flavor: Default
default flags: SuperAtom
available flags: SuperAtom
Format: cif
aromaticity: openeye
flavor: Default
default flags: BondHydToClosest, BondOrder, FormalCrg, ImplicitH,
NormalizeHydPos, OccFilterOneHalf, RemovePBCImages,
RemoveQuestionMarkInLabel, Rings
available flags: BondHydToClosest, BondOrder, FormalCrg, ImplicitH,
NormalizeHydPos, OccFilterOneHalf, RemovePBCImages,
RemoveQuestionMarkInLabel, Rings
Format: csv
aromaticity: openeye
flavor: Default
default flags: Header
available flags: Header
Format: fasta
aromaticity: openeye
flavor: Default
default flags: <none>
available flags: CustomResidues, EmbeddedSMILES
Format: inchi
aromaticity: <N/A>
delimiter: to-eol
flavor: Default
no flavor flags available
has_header: 0
Format: mmcif
aromaticity: openeye
flavor: Default
default flags: <none>
available flags: NoAltLoc
Format: mmod
aromaticity: openeye
flavor: Default
default flags: <none>
available flags: FormalCrg
Format: mol2
aromaticity: openeye
flavor: Default
default flags: <none>
available flags: Forcefield, M2H
Format: mol2h
aromaticity: openeye
flavor: Default
default flags: M2H
available flags: M2H
Format: oeb
aromaticity: <N/A>
flavor: Default
no flavor flags available
Format: oez
aromaticity: <N/A>
flavor: Default
no flavor flags available
Format: pdb
aromaticity: openeye
flavor: Default
default flags: BondOrder, Connect, END, ENDM, FormalCrg, ImplicitH,
Rings, SecStruct
available flags: ALL, ALTLOC, BondOrder, CHARGE, Connect, DATA, END,
ENDM, FORMALCHARGE, FormalCrg, ImplicitH, RADIUS, Rings,
SecStruct, TER
Format: sdf
aromaticity: openeye
flavor: Default
default flags: <none>
available flags: FixBondMarks, SuppressEmptyMolSkip,
SuppressImp2ExpENHSTE
Format: sdf3k
aromaticity: openeye
flavor: Default
default flags: <none>
available flags: FixBondMarks, SuppressEmptyMolSkip,
SuppressImp2ExpENHSTE
Format: skc
aromaticity: openeye
flavor: Default
no flavor flags available
Format: smi
aromaticity: openeye
delimiter: to-eol
flavor: Default
default flags: <none>
available flags: Canon, Strict
has_header: 0
Format: usm
aromaticity: openeye
delimiter: to-eol
flavor: Default
default flags: <none>
available flags: Canon, Strict
has_header: 0
Format: xyz
aromaticity: openeye
flavor: Default
default flags: BondOrder, Connect, FormalCrg, ImplicitH, Rings
available flags: BondOrder, Connect, FormalCrg, ImplicitH, Rings
See https://docs.eyesopen.com/toolkits/cpp/oechemtk/molreadwrite.html#flavored-input-and-output
for documentation about the flavors for each format.