Class CircularFingerprinter
- java.lang.Object
-
- org.openscience.cdk.fingerprint.AbstractFingerprinter
-
- org.openscience.cdk.fingerprint.CircularFingerprinter
-
- All Implemented Interfaces:
IFingerprinter
public class CircularFingerprinter extends AbstractFingerprinter implements IFingerprinter
Circular fingerprints: generates fingerprints that are functionally equivalent to ECFP-2/4/6 and FCFP-2/4/6 fingerprints, which are partially described by Rogers et al. [Rogers and Hahn. J. Chem. Inf. Mod. 2010. 50].
While the literature describes the method in detail, it discloses neither the hashing technique for converting lists of integers into 32-bit codes nor the scheme used to classify atom types for the FCFP class of descriptors. For this reason, the fingerprints created are not binary compatible with the reference implementation. They do, however, achieve effectively equal performance for modelling purposes.
The resulting fingerprint bits are presented as a list of unique bits, each with a 32-bit hashcode; typically there are no more than a hundred or so unique bit hashcodes per molecule. These identifiers can be folded into a smaller array of bits, such that they can be represented as a single long binary number, which is often more convenient.
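As a rough illustration of the folding step, the sketch below maps a list of 32-bit hash codes onto a fixed-length bit set by reducing each code modulo the length. The class name FoldSketch, the method fold, and the modulo mapping are assumptions made for illustration only; the folding scheme used internally by this class is not spelled out here and may differ.

```java
import java.util.BitSet;

public class FoldSketch {
    // Hypothetical helper (not part of the CDK API): folds 32-bit hash
    // codes into a bit set that holds `size` bits.
    public static BitSet fold(int[] hashCodes, int size) {
        BitSet bits = new BitSet(size);
        for (int code : hashCodes) {
            // Treat the code as unsigned so negative ints map to a valid index.
            long unsigned = code & 0xFFFFFFFFL;
            // Assumed mapping: index = code mod size.
            bits.set((int) (unsigned % size));
        }
        return bits;
    }

    public static void main(String[] args) {
        // 1 and 1025 collide at index 1 when folded to 1024 bits.
        int[] codes = {1, 1025, -1, 7};
        BitSet folded = fold(codes, 1024);
        System.out.println("bits set: " + folded.cardinality());
    }
}
```

Folding loses information: distinct hash codes can land on the same bit, so a shorter folded length trades resolution for compactness.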
The integer hashing is done with the CRC32 algorithm, via the Java CRC32 class, which uses the same formula and parameters as PNG files, as described in:
http://www.w3.org/TR/PNG/#D-CRCAppendix
Implicit vs. explicit hydrogens are handled, i.e. it doesn't matter whether the incoming molecule is hydrogen suppressed or not.
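To make the CRC32 hashing mentioned above more concrete, here is a minimal sketch that reduces a list of integers to a single 32-bit code with java.util.zip.CRC32. The class name Crc32Sketch, the method hash, and the choice to feed each integer as four big-endian bytes are assumptions for illustration; the byte order and sequence composition used by this class are not stated here.

```java
import java.util.zip.CRC32;

public class Crc32Sketch {
    // Hypothetical helper (not part of the CDK API): hashes a list of
    // ints into one 32-bit code using the standard Java CRC32 class.
    public static int hash(int[] values) {
        CRC32 crc = new CRC32();
        for (int v : values) {
            // Assumption: each int is fed as 4 bytes, most significant first.
            // CRC32.update(int) consumes only the low 8 bits of its argument.
            crc.update((v >>> 24) & 0xFF);
            crc.update((v >>> 16) & 0xFF);
            crc.update((v >>> 8) & 0xFF);
            crc.update(v & 0xFF);
        }
        // getValue() returns the checksum in the low 32 bits of a long.
        return (int) crc.getValue();
    }

    public static void main(String[] args) {
        int code = hash(new int[]{6, 3, 1});
        System.out.println(Integer.toHexString(code));
    }
}
```

The same input list always yields the same code, which is what allows independent implementations to agree as long as they feed the bytes identically.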
Implementation note: many of the algorithms involved in the generation of fingerprints (e.g. aromaticity, atom typing) have been coded up explicitly for use by this class, rather than making use of comparable functionality elsewhere in the CDK. This is to ensure that the CDK implementation of the algorithm is strictly equal to other implementations: dependencies on CDK functionality that could be modified or improved in the future would break binary compatibility with formerly identical implementations on other platforms.
For the FCFP class of fingerprints, atom typing is done using a scheme similar to that described by Green et al. [Green1994].
The fingerprints and their uses have been described in Clark et al. [Clark, A. et al. J. Cheminformatics. 2014. 6].
Important! This fingerprint cannot be used for substructure screening.
- Author:
- am.clark
- Source code:
- main
- Belongs to CDK module:
- standard
- Keywords:
- fingerprint, similarity
- Created on:
- 2014-01-01
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class
static class    CircularFingerprinter.FP
-
Field Summary
Fields
Modifier and Type    Field
static int    CLASS_ECFP0
static int    CLASS_ECFP2
static int    CLASS_ECFP4
static int    CLASS_ECFP6
static int    CLASS_FCFP0
static int    CLASS_FCFP2
static int    CLASS_FCFP4
static int    CLASS_FCFP6
-
Constructor Summary
Constructors
Constructor    Description
CircularFingerprinter()
    Default constructor: uses the ECFP6 type.
CircularFingerprinter(int classType)
    Specific constructor: initializes with descriptor class type, one of ECFP_{p} or FCFP_{p}, where ECFP is for the extended-connectivity fingerprints, FCFP is for the functional class version, and {p} is the path diameter, and may be 0, 2, 4 or 6.
CircularFingerprinter(int classType, int len)
    Specific constructor: initializes with descriptor class type, one of ECFP_{p} or FCFP_{p}, where ECFP is for the extended-connectivity fingerprints, FCFP is for the functional class version, and {p} is the path diameter, and may be 0, 2, 4 or 6.
-
Method Summary
Modifier and Type    Method    Description
void calculate(IAtomContainer mol)
    Calculates the fingerprints for the given IAtomContainer, and stores them for subsequent retrieval.
IBitFingerprint getBitFingerprint(IAtomContainer mol)
    Calculates the circular fingerprint for the given IAtomContainer, and folds the result into a single bitset (see getSize()).
ICountFingerprint getCountFingerprint(IAtomContainer mol)
    Calculates the circular fingerprint for the given IAtomContainer, and returns a datastructure that enumerates all of the fingerprints, and their counts (i.e. does not fold them into a bitmask).
CircularFingerprinter.FP getFP(int N)
    Returns the requested fingerprint.
int getFPCount()
    Returns the number of fingerprints generated.
protected List<Map.Entry<String,String>> getParameters()
Map<String,Integer> getRawFingerprint(IAtomContainer mol)
    Invalid: it is not appropriate to convert the integer hash codes into strings.
int getSize()
    Returns the extent of the folded fingerprints.
void setPerceiveStereo(boolean val)
    Sets whether stereochemistry should be re-perceived from 2D/3D coordinates.
-
Methods inherited from class org.openscience.cdk.fingerprint.AbstractFingerprinter
getFingerprint, getVersionDescription
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.openscience.cdk.fingerprint.IFingerprinter
getFingerprint, getVersionDescription
-
-
-
-
Field Detail
-
CLASS_ECFP0
public static final int CLASS_ECFP0
- See Also:
- Constant Field Values
-
CLASS_ECFP2
public static final int CLASS_ECFP2
- See Also:
- Constant Field Values
-
CLASS_ECFP4
public static final int CLASS_ECFP4
- See Also:
- Constant Field Values
-
CLASS_ECFP6
public static final int CLASS_ECFP6
- See Also:
- Constant Field Values
-
CLASS_FCFP0
public static final int CLASS_FCFP0
- See Also:
- Constant Field Values
-
CLASS_FCFP2
public static final int CLASS_FCFP2
- See Also:
- Constant Field Values
-
CLASS_FCFP4
public static final int CLASS_FCFP4
- See Also:
- Constant Field Values
-
CLASS_FCFP6
public static final int CLASS_FCFP6
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
CircularFingerprinter
public CircularFingerprinter()
Default constructor: uses the ECFP6 type.
-
CircularFingerprinter
public CircularFingerprinter(int classType)
Specific constructor: initializes with descriptor class type, one of ECFP_{p} or FCFP_{p}, where ECFP is for the extended-connectivity fingerprints, FCFP is for the functional class version, and {p} is the path diameter, and may be 0, 2, 4 or 6.
- Parameters:
classType - one of CLASS_ECFP{n} or CLASS_FCFP{n}
-
CircularFingerprinter
public CircularFingerprinter(int classType, int len)
Specific constructor: initializes with descriptor class type, one of ECFP_{p} or FCFP_{p}, where ECFP is for the extended-connectivity fingerprints, FCFP is for the functional class version, and {p} is the path diameter, and may be 0, 2, 4 or 6.
- Parameters:
classType - one of CLASS_ECFP{n} or CLASS_FCFP{n}
len - size of folded (binary) fingerprint
-
-
Method Detail
-
setPerceiveStereo
public void setPerceiveStereo(boolean val)
Sets whether stereochemistry should be re-perceived from 2D/3D coordinates. By default, stereochemistry encoded as IStereoElements is used.
- Parameters:
val - whether stereochemistry is perceived from 2D/3D coordinates
-
getParameters
protected List<Map.Entry<String,String>> getParameters()
- Overrides:
getParameters in class AbstractFingerprinter
-
calculate
public void calculate(IAtomContainer mol) throws CDKException
Calculates the fingerprints for the given IAtomContainer, and stores them for subsequent retrieval.
- Parameters:
mol - chemical structure; all nodes should be known legitimate elements
- Throws:
CDKException
-
getFPCount
public int getFPCount()
Returns the number of fingerprints generated.
- Returns:
- total number of unique fingerprint hashes generated
-
getFP
public CircularFingerprinter.FP getFP(int N)
Returns the requested fingerprint.
- Parameters:
N - index of fingerprint (0-based)
- Returns:
- instance of a fingerprint hash
-
getBitFingerprint
public IBitFingerprint getBitFingerprint(IAtomContainer mol) throws CDKException
Calculates the circular fingerprint for the given IAtomContainer, and folds the result into a single bitset (see getSize()).
- Specified by:
getBitFingerprint in interface IFingerprinter
- Parameters:
mol - IAtomContainer for which the fingerprint should be calculated.
- Returns:
- the fingerprint
- Throws:
CDKException
-
getCountFingerprint
public ICountFingerprint getCountFingerprint(IAtomContainer mol) throws CDKException
Calculates the circular fingerprint for the given IAtomContainer, and returns a data structure that enumerates all of the fingerprints and their counts (i.e. does not fold them into a bitmask).
- Specified by:
getCountFingerprint in interface IFingerprinter
- Parameters:
mol - IAtomContainer for which the fingerprint should be calculated.
- Returns:
- the count fingerprint
- Throws:
CDKException
-
getRawFingerprint
public Map<String,Integer> getRawFingerprint(IAtomContainer mol) throws CDKException
Invalid: it is not appropriate to convert the integer hash codes into strings.
- Specified by:
getRawFingerprint in interface IFingerprinter
- Throws:
CDKException
-
getSize
public int getSize()
Returns the extent of the folded fingerprints.
- Specified by:
getSize in interface IFingerprinter
- Returns:
- the size of the fingerprint
-
-