Institute of Bioinformatics - Finding Analogs - Similarity Search with Rchemcpp

Finding structural analogs in ChEMBL, Drugbank and the Connectivity Map

We have developed an efficient method for finding structural analogs of a query compound in several databases. Structural analogs are compounds whose chemical structure is similar to that of a given query compound. They are vital for drug design, where they help improve the final product in terms of efficacy, toxicity, side effects, and bacterial resistance. Similarity is measured by molecule kernels, i.e., similarity functions based on the substructures that two molecules share.
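To illustrate the idea of a substructure-based molecule kernel, here is a minimal Python sketch. It is not the Rchemcpp implementation. It counts labeled linear paths in a hydrogen-suppressed molecular graph (a simple path spectrum), compares two molecules with the MinMax similarity (sum of minima over sum of maxima of the path counts, one of the normalizations described by Ralaivola et al., 2005), and ranks a small database against a query. The toy graph representation and the names `path_features`, `minmax_similarity` and `find_analogs` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy representation of a hydrogen-suppressed molecule:
# a list of atom labels plus a list of undirected bonds (atom index pairs).
Molecule = Tuple[List[str], List[Tuple[int, int]]]


def path_features(mol: Molecule, max_len: int = 3) -> Counter:
    """Count labeled simple paths with up to max_len bonds (a path spectrum)."""
    atoms, bonds = mol
    adj: Dict[int, List[int]] = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)

    counts: Counter = Counter()

    def extend(path: List[int]) -> None:
        # Each simple path is reached from both of its ends; count it once
        # (single atoms always, longer paths only when start < end).
        if len(path) == 1 or path[0] < path[-1]:
            labels = tuple(atoms[i] for i in path)
            # A path and its reverse describe the same substructure.
            counts[min(labels, labels[::-1])] += 1
        if len(path) - 1 < max_len:
            for nxt in adj[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return counts


def minmax_similarity(f1: Counter, f2: Counter) -> float:
    """MinMax similarity of two count vectors: sum of minima / sum of maxima."""
    keys = f1.keys() | f2.keys()
    num = sum(min(f1[k], f2[k]) for k in keys)
    den = sum(max(f1[k], f2[k]) for k in keys)
    return num / den if den else 0.0


def find_analogs(query: Molecule, database: Dict[str, Molecule],
                 top_k: int = 3, max_len: int = 3) -> List[Tuple[str, float]]:
    """Rank database compounds by kernel similarity to the query."""
    q = path_features(query, max_len)
    scored = [(name, minmax_similarity(q, path_features(mol, max_len)))
              for name, mol in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy heavy-atom graphs (hydrogens omitted).
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
methanol = (["C", "O"], [(0, 1)])
propane = (["C", "C", "C"], [(0, 1), (1, 2)])
propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
database = {"methanol": methanol, "propane": propane, "1-propanol": propanol}

print(find_analogs(ethanol, database, top_k=2))
# [('1-propanol', 0.6), ('methanol', 0.5)]
```

For brevity the sketch recomputes the database features on every query; a search over a database the size of ChEMBL would likely compute them once in advance.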


Find Analogs in ChEMBL (ChEMBL 18 SDF, 2014-03)

Find Analogs in DrugBank (DrugBank 4.0 SDF, 2014-02)

Find Analogs in Cmap


Here is an example SDF file for submission: CHEMBL553.sdf. Please submit SDF files that contain only one compound.
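Because the service accepts only single-compound SDF files, a user might check a file before uploading it. In the SDF format each record (a molfile plus optional data fields) ends with a line containing `$$$$`, so counting records tells you how many compounds a file holds. The following is a hypothetical client-side sketch, not part of the service; `count_sdf_records` and `check_single_compound` are illustrative names, and the example records are abbreviated placeholders rather than complete molfiles.

```python
from typing import List


def count_sdf_records(text: str) -> int:
    """Count records in SDF text; each record is terminated by a '$$$$' line."""
    records = 0
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if any(ln.strip() for ln in current):
                records += 1
            current = []
        else:
            current.append(line)
    # Tolerate a final record that lacks the closing '$$$$' line.
    if any(ln.strip() for ln in current):
        records += 1
    return records


def check_single_compound(text: str) -> None:
    """Raise ValueError unless the SDF text contains exactly one compound."""
    n = count_sdf_records(text)
    if n != 1:
        raise ValueError(f"expected exactly one compound, found {n}")


# Abbreviated placeholder records; a real record holds a full molfile block.
single = "mol_A\n  (molfile lines omitted)\nM  END\n$$$$\n"
double = single + single.replace("mol_A", "mol_B")

check_single_compound(single)     # passes silently
print(count_sdf_records(double))  # 2
```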


Authors: Martin Wischenbart, Günter Klambauer


Rchemcpp

G. Klambauer, M. Wischenbart, M. Mahr, T. Unterthiner, A. Mayr, and S. Hochreiter (2015) Rchemcpp: a web service for structural analoging in ChEMBL, Drugbank and the Connectivity Map. Bioinformatics. DOI: 10.1093/bioinformatics/btv373
See also: Rchemcpp at the Institute of Bioinformatics, the Rchemcpp package at Bioconductor.org, and the Rchemcpp publication in Bioinformatics.

Molecule Kernels

L. Ralaivola, S. J. Swamidass, H. Saigo, and P. Baldi (2005) Graph kernels for chemical informatics. Neural Networks 18:1093-1110. DOI: 10.1016/j.neunet.2005.07.009
H. Kashima, K. Tsuda, and A. Inokuchi (2003) Marginalized kernels between labeled graphs. Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), 321–328.
P. Mahé and J.-P. Vert (2009) Graph kernels based on tree patterns for molecules. Machine Learning 75(1):3–35. DOI: 10.1007/s10994-008-5086-2

ChEMBL

A. Gaulton, L. Bellis, J. Chambers, M. Davies, A. Hersey, Y. Light, S. McGlinchey, R. Akhtar, A.P. Bento, B. Al-Lazikani, D. Michalovich, and J.P. Overington (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40(Database issue):D1100-D1107. DOI: 10.1093/nar/gkr777, PMID: 21948594

DrugBank

V. Law, C. Knox, Y. Djoumbou, T. Jewison, A.C. Guo, Y. Liu, A. Maciejewski, D. Arndt, M. Wilson, V. Neveu, A. Tang, G. Gabriel, C. Ly, S. Adamjee, Z.T. Dame, B. Han, Y. Zhou, and D.S. Wishart (2014) DrugBank 4.0: Shedding New Light on Drug Metabolism. Nucleic Acids Res. 42(Database issue):D1091-7. PMID: 24203711
C. Knox, V. Law, T. Jewison, P. Liu, S. Ly, A. Frolkis, A. Pon, K. Banco, C. Mak, V. Neveu, Y. Djoumbou, R. Eisner, A.C. Guo, and D.S. Wishart (2011) DrugBank 3.0: A Comprehensive Resource for 'omics' Research on Drugs. Nucleic Acids Res. 39(Database issue):D1035-41. PMID: 21059682
D.S. Wishart, C. Knox, A.C. Guo, D. Cheng, S. Shrivastava, D. Tzur, B. Gautam, and M. Hassanali (2008) DrugBank: A Knowledgebase for Drugs, Drug Actions and Drug Targets. Nucleic Acids Res. 36(Database issue):D901-6. PMID: 18048412
D.S. Wishart, C. Knox, A.C. Guo, S. Shrivastava, M. Hassanali, P. Stothard, Z. Chang, and J. Woolsey (2006) DrugBank: A Comprehensive Resource for in Silico Drug Discovery and Exploration. Nucleic Acids Res. 34(Database issue):D668-72. PMID: 16381955

Cmap

J. Lamb, E.D. Crawford, D. Peck, J.W. Modell, I.C. Blat, M.J. Wrobel, J. Lerner, J.-P. Brunet, A. Subramanian, K.N. Ross, M. Reich, H. Hieronymus, G. Wei, S.A. Armstrong, S.J. Haggarty, P.A. Clemons, R. Wei, S.A. Carr, E.S. Lander, and T.R. Golub (2006) The Connectivity Map: Using Gene-Expression Signatures to Connect Small Molecules, Genes, and Disease. Science 313:1929-1935. DOI: 10.1126/science.1132939
J. Lamb (2007) The Connectivity Map: A New Tool for Biomedical Research. Nature Reviews Cancer 7:54-60.