Molecule mining

Molecule mining

This page describes mining for molecules. Since molecules may be represented by molecular graphs this is strongly related to graph mining and structured data mining. The main problem is how to represent molecules while discriminating the data instances. One way to do this is chemical similarity metrics, which has a long tradition in the field of cheminformatics.

Typical approaches to calculate chemical similarities use chemical fingerprints, but this loses the underlying information about the molecule topology. Mining the molecular graphs directly avoids this problem. So does the inverse QSAR problem which is preferable for vectorial mappings.

Contents

Coding(Moleculei,Moleculej\neqi)

Kernel methods

  • Marginalized graph kernel[1]
  • Optimal assignment kernel[2][3][4]
  • Pharmacophore kernel[5]

Maximum Common Graph methods

  • MCS-HSCS[6] (Highest Scoring Common Substructure (HSCS) ranking strategy for single MCS)
  • Small Molecule Subgraph Detector (SMSD)[7]- is a Java based software library for calculating Maximum Common Subgraph (MCS) between small molecules. This will help us to find similarity/distance between two molecules. MCS is also used for screening drug like compounds by hitting molecules, which share common subgraph (substructure). [8]

Coding(Moleculei)

Molecular query methods

Methods based on special architectures of neural networks

See also

References

  1. ^ H. Kashima, K. Tsuda, A. Inokuchi, Marginalized Kernels Between Labeled Graphs, The 20th International Conference on Machine Learning (ICML2003), 2003. PDF
  2. ^ H. Fröhlich, J. K. Wegner, A. Zell, Optimal Assignment Kernels For Attributed Molecular Graphs, The 22nd International Conference on Machine Learning (ICML 2005), Omnipress, Madison, WI, USA, 2005, 225-232. PDF
  3. ^ H. Fröhlich, J. K. Wegner, A. Zell, Kernel Functions for Attributed Molecular Graphs - A New Similarity Based Approach To ADME Prediction in Classification and Regression, QSAR Comb. Sci., 2006, 25, 317-326. DOI 10.1002/qsar.200510135
  4. ^ H. Fröhlich, J. K. Wegner, A. Zell, Assignment Kernels For Chemical Compounds, International Joint Conference on Neural Networks 2005 (IJCNN'05), 2005, 913-918. CiteSeer
  5. ^ P. Mahe, L. Ralaivola, V. Stoven, J. Vert, The pharmacophore kernel for virtual screening with support vector machines, J Chem Inf Model, 2006, 46, 2003-2014. DOI 10.1021/ci060138m
  6. ^ J. K. Wegner, H. Fröhlich, H. Mielenz, A. Zell, Data and Graph Mining in Chemical Space for ADME and Activity Data Sets, QSAR Comb. Sci., 2006, 25, 205-220. DOI 10.1002/qsar.200510009
  7. ^ S. A. Rahman, M. Bashton, G. L. Holliday, R. Schrader and J. M. Thornton, Small Molecule Subgraph Detector (SMSD) toolkit, Journal of Cheminformatics 2009, 1:12. DOI:10.1186/1758-2946-1-12
  8. ^ http://www.ebi.ac.uk/thornton-srv/software/SMSD/
  9. ^ R. D. King, A. Srinivasan, L. Dehaspe, Wamr: a data mining tool for chemical data, J. Comput.-Aid. Mol. Des., 2001, 15, 173-181. DOI 10.1023/A:1008171016861
  10. ^ L. Dehaspe, H. Toivonen, King, Finding frequent substructures in chemical compounds, 4th International Conference on Knowledge Discovery and Data Mining, AAAI Press., 1998, 30-36.
  11. ^ A. Inokuchi, T. Washio, T. Okada, H. Motoda, Applying the Apriori-based Graph Mining Method to Mutagenesis Data Analysis, Journal of Computer Aided Chemistry, 2001, 2, 87-92.
  12. ^ A. Inokuchi, T. Washio, K. Nishimura, H. Motoda, A Fast Algorithm for Mining Frequent Connected Subgraphs, IBM Research, Tokyo Research Laboratory, 2002.
  13. ^ A. Clare, R. D. King, Data mining the yeast genome in a lazy functional language, Practical Aspects of Declarative Languages (PADL2003), 2003.
  14. ^ M. Kuramochi, G. Karypis, An Efficient Algorithm for Discovering Frequent Subgraphs, IEEE Transactions on Knowledge and Data Engineering, 2004, 16(9), 1038-1051.
  15. ^ M. Deshpande, M. Kuramochi, N. Wale, G. Karypis, Frequent Substructure-Based Approaches for Classifying Chemical Compounds, IEEE Transactions on Knowledge and Data Engineering, 2005, 17(8), 1036-1050.
  16. ^ C. Helma, T. Cramer, S. Kramer, L. de Raedt, Data Mining and Machine Learning Techniques for the Identification of Mutagenicity Inducing Substructures and Structure Activity Relationships of Noncongeneric Compounds, J. Chem. Inf. Comput. Sci., 2004, 44, 1402-1411. DOI 10.1021/ci034254q
  17. ^ T. Meinl, C. Borgelt, M. R. Berthold, Discriminative Closed Fragment Mining and Perfect Extensions in MoFa, Proceedings of the Second Starting AI Researchers Symposium (STAIRS 2004), 2004.
  18. ^ T. Meinl, C. Borgelt, M. R. Berthold, M. Philippsen, Mining Fragments with Fuzzy Chains in Molecular Databases, Second International Workshop on Mining Graphs, Trees and Sequences (MGTS2004), 2004.
  19. ^ T. Meinl, M. R. Berthold, Hybrid Fragment Mining with MoFa and FSG, Proceedings of the 2004 IEEE Conference on Systems, Man & Cybernetics (SMC2004), 2004.
  20. ^ S. Nijssen, J. N. Kok. Frequent Graph Mining and its Application to Molecular Databases, Proceedings of the 2004 IEEE Conference on Systems, Man & Cybernetics (SMC2004), 2004.
  21. ^ C. Helma, Predictive Toxicology, CRC Press, 2005.
  22. ^ M. Wörlein, Extension and parallelization of a graph-mining-algorithm, Friedrich-Alexander-Universität, 2006. PDF
  23. ^ K. Jahn, S. Kramer, Optimizing gSpan for Molecular Datasets, Proceedings of the Third International Workshop on Mining Graphs, Trees and Sequences (MGTS-2005), 2005.
  24. ^ X. Yan, J. Han, gSpan: Graph-Based Substructure Pattern Mining, Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), IEEE Computer Society, 2002, 721-724.
  25. ^ A. Karwath, L. D. Raedt, SMIREP: predicting chemical activity from SMILES, J Chem Inf Model, 2006, 46, 2432-2444. DOI 10.1021/ci060159g
  26. ^ H. Ando, L. Dehaspe, W. Luyten, E. Craenenbroeck, H. Vandecasteele, L. Meervelt, Discovering H-Bonding Rules in Crystals with Inductive Logic Programming, Mol Pharm, 2006, 3, 665-674 . DOI 10.1021/mp060034z
  27. ^ P. Mazzatorta, L. Tran, B. Schilter, M. Grigorov, Integration of Structure-Activity Relationship and Artificial Intelligence Systems To Improve in Silico Prediction of Ames Test Mutagenicity, J. Chem. Inf. Model., 2006, ASAP alert. DOI 10.1021/ci600411v
  28. ^ N. Wale, G. Karypis. Comparison of Descriptor Spaces for Chemical Compound Retrieval and Classification, ICDM, ''2006, 678-689.
  29. ^ A. Gago Alonso, J.E. Medina Pagola, J.A. Carrasco-Ochoa and J.F. Martínez-Trinidad Mining Connected Subgraph Mining Reducing the Number of Candidates, In Proc. of ECML--PKDD, pp. 365–376, 2008.
  30. ^ Baskin, I. I.; V. A. Palyulin and N. S. Zefirov (1993). "[A methodology for searching direct correlations between structures and properties of organic compounds by using computational neural networks]". Doklady Akademii Nauk SSSR 333 (2): 176–179. 
  31. ^ I. I. Baskin, V. A. Palyulin, N. S. Zefirov (1997). "A Neural Device for Searching Direct Correlations between Structures and Properties of Organic Compounds". J. Chem. Inf. Comput. Sci. 37 (4): 715–721. doi:10.1021/ci940128y. 
  32. ^ D. B. Kireev (1995). "ChemNet: A Novel Neural Network Based Method for Graph/Property Mapping". J. Chem. Inf. Comput. Sci. 35 (2): 175–180. doi:10.1021/ci00024a001. 
  33. ^ A. M. Bianucci; Micheli, Alessio; Sperduti, Alessandro; Starita, Antonina (2000). "Application of Cascade Correlation Networks for Structures to Chemistry". Applied Intelligence 12 (1-2): 117–146. doi:10.1023/A:1008368105614. 
  34. ^ A. Micheli, A. Sperduti, A. Starita, A. M. Bianucci (2001). "Analysis of the Internal Representations Developed by Neural Networks for Structures Applied to Quantitative Structure-Activity Relationship Studies of Benzodiazepines". J. Chem. Inf. Comput. Sci. 41 (1): 202–218. doi:10.1021/ci9903399. PMID 11206375. 
  35. ^ O. Ivanciuc (2001). "Molecular Structure Encoding into Artificial Neural Networks Topology". Roumanian Chemical Quarterly Reviews 8: 197–220. 
  36. ^ A. Goulon, T. Picot, A. Duprat, G. Dreyfus (2007). "Predicting activities without computing descriptors: Graph machines for QSAR". SAR and QSAR in Environmental Research 18 (1-2): 141–153. doi:10.1080/10629360601054313. PMID 17365965. 

Further reading

  • Schölkopf, B., K. Tsuda and J. P. Vert: Kernel Methods in Computational Biology, MIT Press, Cambridge, MA, 2004.
  • R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, John Wiley & Sons, 2001. ISBN 0-471-05669-3
  • Gusfield, D., Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997. ISBN 0-521-58519-8
  • R. Todeschini, V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, 2000. ISBN 3527299130

See also

External links


Wikimedia Foundation. 2010.

Игры ⚽ Нужен реферат?

Look at other dictionaries:

  • Structure mining — or structured data mining is the process of finding and extracting useful information from semi structured data sets. Graph mining is a special case of structured data mining[citation needed]. Contents 1 Description 2 See also …   Wikipedia

  • Quantitative structure-activity relationship — (QSAR) is the process by which chemical structure is quantitatively correlated with a well defined process, such as biological activity or chemical reactivity.For example, biological activity can be expressed quantitatively as in the… …   Wikipedia

  • Partition coefficient — In chemistry and the pharmaceutical sciences, a partition (P) or distribution coefficient (D) is the ratio of concentrations of a compound in the two phases of a mixture of two immiscible solvents at equilibrium.[1] The terms gas/liquid partition …   Wikipedia

  • Cheminformatics — (also known as chemoinformatics and chemical informatics) is the use of computer and informational techniques, applied to a range of problems in the field of chemistry. These in silico techniques are used in pharmaceutical companies in the… …   Wikipedia

  • Knowledge discovery — is a concept of the field of computer science that describes the process of automatically searching large volumes of data for patterns that can be considered knowledge about the data. It is often described as deriving knowledge from the input… …   Wikipedia

  • Pharmacophore — A pharmacophore was first defined by Paul Ehrlich in 1909 as a molecular framework that carries ( phoros ) the essential features responsible for a drug’s ( =pharmacon s ) biological activity (Ehrlich. Dtsch. Chem. Ges. 1909, 42: p.17). In 1977,… …   Wikipedia

  • JOELib — Infobox Software name = JOELib caption = developer = JOELib development team latest release version = 2007 03 03 latest release date = March 03, 2007 operating system = Cross platform genre = Cheminformatics/Molecular modelling license = GNU… …   Wikipedia

  • Chemical graph theory — is the topology branch of mathematical chemistry which applies graph theory to mathematical modelling of chemical phenomena.[1] The pioneers of the chemical graph theory are Alexandru Balaban, Ivan Gutman, Haruo Hosoya, Milan Randić and Nenad… …   Wikipedia

  • Maximum common subgraph isomorphism problem — In complexity theory, maximum common subgraph isomorphism (MCS) is an optimization problem that is known to be NP hard. The formal description of the problem is as follows: Maximum common subgraph isomorphism(G1, G2) Input: Two graphs G1 and G2.… …   Wikipedia

  • Biomolecular Object Network Databank — The Biomolecular Object Network Databank (BOND) is a bioinformatics databank containing information on small molecule and protein sequences, structures and interactions. The databank integrates a number of existing databses to provide a… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”