.. _oe2fps:

oe2fps command-line options
====================================

The following comes from ``oe2fps --help``:

.. code-block:: none

  usage: oe2fps [-h] [--path] [--circular] [--tree] [--numbits INT]
                [--minbonds INT] [--maxbonds INT] [--minradius INT]
                [--maxradius INT] [--atype ATYPE] [--btype BTYPE] [--maccs166]
                [--substruct] [--rdmaccs] [--rdmaccs/1] [--aromaticity NAME]
                [--id-tag NAME] [--type TYPE_STRING] [--using FILENAME]
                [--in FORMAT] [-o FILENAME] [--out FORMAT]
                [--errors {strict,report,ignore}] [--progress] [--help-formats]
                [-R NAME=VALUE] [--delimiter {tab,whitespace,to-eol,space}]
                [--has-header] [--version] [--license-check]
                [filenames [filenames ...]]

  Generate FPS or FPB fingerprints from a structure file using OEChem

  positional arguments:
    filenames             input structure files (default is stdin)

  optional arguments:
    -h, --help            show this help message and exit
    --aromaticity NAME    use the named aromaticity model (same as '-R
                          aromaticity=NAME')
    --id-tag NAME         tag name containing the record id (SD files only)
    --type TYPE_STRING    Specify a chemfp type string
    --using FILENAME      Get the fingerprint type from the metadata of a
                          fingerprint file
    --in FORMAT           input structure format (default guesses from
                          filename)
    -o FILENAME, --output FILENAME
                          save the fingerprints to FILENAME (default=stdout)
    --out FORMAT          output structure format (default guesses from output
                          filename, or is 'fps')
    --errors {strict,report,ignore}
                          how should structure parse errors be handled?
                          (default=ignore)
    --progress, --no-progress
                          Show a progress bar (default: show unless the output
                          is a terminal)
    --help-formats        list the available formats and reader arguments
    -R NAME=VALUE         specify a reader argument
    --delimiter {tab,whitespace,to-eol,space}
                          delimiter style for SMILES and InChI files. Alias for
                          '-R delimiter=VALUE'.
    --has-header          Skip the first line of a SMILES or InChI file Alias
                          for '-R has_header=1'
    --version             show program's version number and exit
    --license-check       Check the license and report results to stdout.

  path, circular, and tree fingerprints:
    --path                generate path fingerprints (default)
    --circular            generate circular fingerprints
    --tree                generate tree fingerprints
    --numbits INT         number of bits in the fingerprint (default=4096)
    --minbonds INT        minimum number of bonds in the path or tree
                          fingerprint (default=0)
    --maxbonds INT        maximum number of bonds in the path or tree
                          fingerprint (path default=5, tree default=4)
    --minradius INT       minimum radius for the circular fingerprint
                          (default=0)
    --maxradius INT       maximum radius for the circular fingerprint
                          (default=5)
    --atype ATYPE         atom type flags, described below (default=Default)
    --btype BTYPE         bond type flags, described below (default=Default)

  166 bit MACCS substructure keys:
    --maccs166            generate MACCS fingerprints

  881 bit ChemFP substructure keys:
    --substruct           generate ChemFP substructure fingerprints

  ChemFP version of the 166 bit RDKit/MACCS keys:
    --rdmaccs, --rdmaccs/2
                          generate 166 bit RDKit/MACCS fingerprints (version 2)
    --rdmaccs/1           use the version 1 definition for --rdmaccs

  ATYPE is one or more of the following, separated by the '|' character
    Arom AtmNum Chiral EqArom EqHBAcc EqHBDon EqHalo FCharge HCount HvyDeg
    Hyb InRing
  The following shorthand terms and expansions are also available:
   DefaultPathAtom = AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb|EqHalo
   DefaultCircularAtom = AtmNum|Arom|Chiral|FCharge|HCount|EqHalo
   DefaultTreeAtom = AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb
  and 'Default' selects the correct value for the specified fingerprint.
  Examples:
    --atype Default
    --atype "Arom|AtmNum|FCharge|HCount"
    --atype Arom,AtmNum,FCharge,HCount

  BTYPE is one or more of the following, separated by the '|' character
    Chiral InRing Order
  The following shorthand terms and expansions are also available:
   DefaultPathBond = Order|Chiral
   DefaultCircularBond = Order
   DefaultTreeBond = Order
  and 'Default' selects the correct value for the specified fingerprint.
  Examples:
    --btype Default
    --btype Order|InRing

  To simplify command-line use, a comma may be used instead of a '|' to
  separate different fields. Example:
    --atype AtmNum,HvyDegree

  By default, chemfp will use the filename extension to determine the
  structure file format type and possible compression. Most of the file
  readers support configuration parameters. Use the '-R' option to specify
  those parameters. Use '--help-formats' to list available formats and
  reader parameters.

Supported oe2fps formats
----------------------------------------------------

The following comes from ``oe2fps --help-formats``:

.. code-block:: none

  These are the structure file formats that chemfp can read when using the
  OEChem toolkit.

  By default, chemfp uses the filename extension to determine the format
  type. If the filename ends with ".gz" then it is intepreted as a gzip
  compressed file, and the second-to-last extension is used to determine
  the format type. Unknown or unsupported extensions are interpreted as a
  SMILES file. (The OEChem structure file readers do not support Zstandard
  compression.)

  You may instead specify the file format by name (see below), which is
  especially important when reading from stdin, which has no associated
  filename extension.

  The supported filename extensions are:

    File Type       Extension(s)
    ==========      =============
    SMILES          can, ism, isosmi, smi, usm
    SDF             mdl, rxn, sd, sdf
    InChI           inchi
    Tripos Mol2     mol2, mol2h
    PDB             ent, pdb
    XYZ             xyz
    SKC             skc
    Macromodel      mmd, mmod
    ChemDraw CDX    cdx
    OE binary       oeb
    OEB compressed  oez
    CIF             cif
    mmCIF           mmcif
    FASTA           fasta
    CSV             csv

  Append a '.gz' to the filename to indicate that the contents are
  gzip-compressed.

  The format can also be specified by name using the '--in' option:

    File Type       Format name
    ==========      =============
    SMILES          smi, can, usm
    SDF             sdf
    InChI           inchi
    Tripos Mol2     mol2, mol2h
    PDB             pdb
    XYZ             xyz
    SKC             skc
    Macromodel      mmod
    ChemDraw CDX    cdx
    OE binary       oeb
    OEB compressed  oez
    CIF             cif
    mmCIF           mmcif
    FASTA           fasta
    CSV             csv

  Append a '.gz' to the format name to indicate that the contents are
  gzip-compressed.

  The input format parsers can be configured with the "-R" option. For
  example, the following reader arguments tell the SMILES readers that the
  fields are whitespace delimited and the first line is a header.

    -R delimiter=whitespace -R has_header=true

  All formats handle the following two reader arguments:
    aromaticity - one of 'openeye', 'daylight', 'tripos', 'mdl', or 'mmff'
        (this can also be set via the older '--aromaticity' command-line
        option)
    flavor - a '|' or ',' separated list of flavor names, or a numeric value.
        A leading '-' means to remove the given flavor. Examples include:
          o Canon,Strict -- the bitwise merger of the format's Canon and
              Strict values
          o Default,-Kekule -- the format's Default flavor but without the
              Kekule bits (every flavor has a Default)
          o 42 -- the specific OEChem flavor value 42

  The SMILES and InChI formats also handle reader arguments for the
  delimiter style and the presence of an initial header line using the
  following:
    delimiter - one of 'to-eol' (Daylight/OEChem style), 'tab',
        'whitespace', 'space', or 'native' (for the native toolkit style)
    has_header - '1' if the first line contains a header, else '0'.

  The supported format, default reader arguments, and input flavors are:

  Format: can
    aromaticity: openeye
    delimiter: to-eol
    flavor: Default
      default flags:
      available flags: Canon, Strict
    has_header: 0

  Format: cdx
    aromaticity: openeye
    flavor: Default
      default flags: SuperAtom
      available flags: SuperAtom

  Format: cif
    aromaticity: openeye
    flavor: Default
      default flags: BondHydToClosest, BondOrder, FormalCrg, ImplicitH,
          NormalizeHydPos, OccFilterOneHalf, RemovePBCImages,
          RemoveQuestionMarkInLabel, Rings
      available flags: BondHydToClosest, BondOrder, FormalCrg, ImplicitH,
          NormalizeHydPos, OccFilterOneHalf, RemovePBCImages,
          RemoveQuestionMarkInLabel, Rings

  Format: csv
    aromaticity: openeye
    flavor: Default
      default flags: Header
      available flags: Header

  Format: fasta
    aromaticity: openeye
    flavor: Default
      default flags:
      available flags: CustomResidues, EmbeddedSMILES

  Format: inchi
    aromaticity:
    delimiter: to-eol
    flavor: Default
      no flavor flags available
    has_header: 0

  Format: mmcif
    aromaticity: openeye
    flavor: Default
      default flags:
      available flags: NoAltLoc

  Format: mmod
    aromaticity: openeye
    flavor: Default
      default flags:
      available flags: FormalCrg

  Format: mol2
    aromaticity: openeye
    flavor: Default
      default flags:
      available flags: Forcefield, M2H

  Format: mol2h
    aromaticity: openeye
    flavor: Default
      default flags: M2H
      available flags: M2H

  Format: oeb
    aromaticity:
    flavor: Default
      no flavor flags available

  Format: oez
    aromaticity:
    flavor: Default
      no flavor flags available

  Format: pdb
    aromaticity: openeye
    flavor: Default
      default flags: BondOrder, Connect, END, ENDM, FormalCrg, ImplicitH,
          Rings, SecStruct
      available flags: ALL, ALTLOC, BondOrder, CHARGE, Connect, DATA, END,
          ENDM, FORMALCHARGE, FormalCrg, ImplicitH, RADIUS, Rings, SecStruct,
          TER

  Format: sdf
    aromaticity: openeye
    flavor: Default
      default flags:
      available flags: FixBondMarks, SuppressEmptyMolSkip,
          SuppressImp2ExpENHSTE

  Format: sdf3k
    aromaticity: openeye
    flavor: Default
      default flags:
      available flags: FixBondMarks, SuppressEmptyMolSkip,
          SuppressImp2ExpENHSTE

  Format: skc
    aromaticity: openeye
    flavor: Default
      no flavor flags available

  Format: smi
    aromaticity: openeye
    delimiter: to-eol
    flavor: Default
      default flags:
      available flags: Canon, Strict
    has_header: 0

  Format: usm
    aromaticity: openeye
    delimiter: to-eol
    flavor: Default
      default flags:
      available flags: Canon, Strict
    has_header: 0

  Format: xyz
    aromaticity: openeye
    flavor: Default
      default flags: BondOrder, Connect, FormalCrg, ImplicitH, Rings
      available flags: BondOrder, Connect, FormalCrg, ImplicitH, Rings

  See
  https://docs.eyesopen.com/toolkits/cpp/oechemtk/molreadwrite.html#flavored-input-and-output
  for documentation about the flavors for each format.
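
The ``oe2fps --help`` output above describes how ``--atype`` and ``--btype`` values are read. The value is split on ``|`` or ``,``. Each shorthand term is then replaced by its expansion, and ``Default`` picks the shorthand that matches the fingerprint type. The following Python sketch illustrates those rules under some assumptions. It is not chemfp's implementation: ``expand_types`` and its arguments are hypothetical, and the sketch assumes a shorthand term may be mixed with individual flag names.

.. code-block:: python

  # Hypothetical sketch (not chemfp's implementation) of the --atype/--btype
  # expansion rules described in the ``oe2fps --help`` output.
  ATOM_FLAGS = {"Arom", "AtmNum", "Chiral", "EqArom", "EqHBAcc", "EqHBDon",
                "EqHalo", "FCharge", "HCount", "HvyDeg", "Hyb", "InRing"}
  BOND_FLAGS = {"Chiral", "InRing", "Order"}

  # Shorthand terms and their expansions, copied from the help text.
  ATOM_SHORTHAND = {
      "DefaultPathAtom": "AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb|EqHalo",
      "DefaultCircularAtom": "AtmNum|Arom|Chiral|FCharge|HCount|EqHalo",
      "DefaultTreeAtom": "AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb",
  }
  BOND_SHORTHAND = {
      "DefaultPathBond": "Order|Chiral",
      "DefaultCircularBond": "Order",
      "DefaultTreeBond": "Order",
  }


  def expand_types(value, fptype, kind):
      """Expand a type string into a set of flag names.

      fptype is "path", "circular", or "tree"; kind is "atom" or "bond".
      """
      flags = ATOM_FLAGS if kind == "atom" else BOND_FLAGS
      shorthand = ATOM_SHORTHAND if kind == "atom" else BOND_SHORTHAND
      # 'Default' selects the shorthand for this fingerprint type,
      # e.g. ("circular", "atom") -> "DefaultCircularAtom".
      default_name = "Default" + fptype.capitalize() + kind.capitalize()
      result = set()
      # A comma may be used instead of '|' to separate fields.
      for term in value.replace(",", "|").split("|"):
          term = term.strip()
          if term == "Default":
              term = default_name
          if term in shorthand:
              result.update(shorthand[term].split("|"))
          elif term in flags:
              result.add(term)
          else:
              raise ValueError(f"unknown {kind} type: {term!r}")
      return result

For example, ``expand_types("Default", "circular", "bond")`` returns ``{"Order"}``. The sketch accepts only the names listed in the help text, so it rejects ``HvyDegree`` from the comma example, because the ATYPE list spells that flag ``HvyDeg``. The help text does not say whether oe2fps itself accepts the longer spelling.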
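
The ``--help-formats`` output explains how the input format is chosen when ``--in`` is not given. A trailing ``.gz`` marks gzip compression, and the extension before it names the format. Unknown extensions fall back to SMILES. Below is a minimal Python sketch of that rule, using the extension table above. ``guess_format`` is a hypothetical helper and is not part of chemfp. The case-sensitive extension match is also an assumption.

.. code-block:: python

  import os

  # Hypothetical helper, not part of chemfp: guess the structure format from
  # a filename using the rules in ``oe2fps --help-formats``.
  # Extension -> file type, copied from the "supported filename extensions" table.
  EXTENSION_TO_TYPE = {
      "can": "SMILES", "ism": "SMILES", "isosmi": "SMILES",
      "smi": "SMILES", "usm": "SMILES",
      "mdl": "SDF", "rxn": "SDF", "sd": "SDF", "sdf": "SDF",
      "inchi": "InChI",
      "mol2": "Tripos Mol2", "mol2h": "Tripos Mol2",
      "ent": "PDB", "pdb": "PDB",
      "xyz": "XYZ",
      "skc": "SKC",
      "mmd": "Macromodel", "mmod": "Macromodel",
      "cdx": "ChemDraw CDX",
      "oeb": "OE binary",
      "oez": "OEB compressed",
      "cif": "CIF",
      "mmcif": "mmCIF",
      "fasta": "FASTA",
      "csv": "CSV",
  }


  def guess_format(filename):
      """Return (file_type, is_gzip) for a filename."""
      root, ext = os.path.splitext(filename)
      is_gzip = ext == ".gz"
      if is_gzip:
          # Use the second-to-last extension: "mols.sdf.gz" -> ".sdf"
          root, ext = os.path.splitext(root)
      # Unknown, unsupported, or missing extensions are treated as SMILES.
      file_type = EXTENSION_TO_TYPE.get(ext[1:], "SMILES")
      return file_type, is_gzip

For example, ``guess_format("mols.sdf.gz")`` returns ``("SDF", True)``. When reading from stdin there is no filename to inspect, so the help text recommends naming the format with ``--in`` instead, for example ``--in sdf.gz``.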
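
The ``flavor`` reader argument, set with ``-R flavor=VALUE``, combines flavor names with bitwise operations. Names are OR-ed together, and a leading ``-`` removes a flavor's bits. A bare number is used as the flavor value directly. The Python sketch below shows one way such a value could be resolved. It rests on several assumptions. The bit values in ``EXAMPLE_FLAVORS`` are invented for illustration and are not OEChem's real constants. ``parse_flavor`` is a hypothetical helper. The sketch also applies terms from left to right, which is assumed rather than documented.

.. code-block:: python

  # Hypothetical sketch of resolving a '-R flavor=...' reader argument.
  # EXAMPLE_FLAVORS holds made-up bit values, NOT OEChem's real constants.
  EXAMPLE_FLAVORS = {"Kekule": 0b0001, "Default": 0b0011,
                     "Canon": 0b0100, "Strict": 0b1000}


  def parse_flavor(value, flavors):
      """Resolve e.g. 'Canon,Strict', 'Default,-Kekule', or '42' to an int."""
      value = value.strip()
      if value.isdigit():
          # A numeric value is used as the flavor directly.
          return int(value)
      result = 0
      # Terms may be separated by '|' or ','; applied left to right (assumed).
      for term in value.replace(",", "|").split("|"):
          term = term.strip()
          if term.startswith("-"):
              # A leading '-' removes that flavor's bits.
              result &= ~flavors[term[1:]]
          else:
              # Otherwise merge the flavor's bits in with bitwise OR.
              result |= flavors[term]
      return result

With these made-up values, ``Canon,Strict`` resolves to ``12`` and ``Default,-Kekule`` resolves to ``2``. The sketch takes the flavor table as a parameter because the help text describes the names as per-format values ("the format's Canon and Strict values").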