DIMENSIONALITY OF BIG DATA SETS EXPLORED BY CLUJ DESCRIPTORS

Authors

  • Claudiu N. LUNGU Department of Chemistry, Faculty of Chemistry and Chemical Engineering, Babeș-Bolyai University; Iuliu Hațieganu University of Medicine and Pharmacy Cluj-Napoca, Romania. Email: lunguclaudiu5555@gmail.com. https://orcid.org/0000-0002-5416-3142
  • Sara ERSALI Department of Chemistry, Faculty of Chemistry and Chemical Engineering, Babeș-Bolyai University, Cluj-Napoca, Romania. Corresponding author: diudea@chem.ubbcluj.ro. https://orcid.org/0000-0001-9456-6678
  • Beata SZEFLER Department of Physical Chemistry, Faculty of Pharmacy, Collegium Medicum, Nicolaus Copernicus University, Bydgoszcz, Poland. Email: beatas@cm.umk.pl. https://orcid.org/0000-0001-8433-3520
  • Atena PÎRVAN-MOLDOVAN Department of Chemistry, Faculty of Chemistry and Chemical Engineering, Babeș-Bolyai University, Cluj-Napoca, Romania. Corresponding author: diudea@chem.ubbcluj.ro. https://orcid.org/0000-0002-3092-3135
  • Subhash BASAK Duluth Natural Resources Research Institute, Department of Chemistry and Biochemistry, University of Minnesota, USA. Corresponding author: diudea@chem.ubbcluj.ro. https://orcid.org/0000-0002-2086-5867
  • Mircea V. DIUDEA Department of Chemistry, Faculty of Chemistry and Chemical Engineering, Babeș-Bolyai University, Cluj-Napoca, Romania. Email: diudea@chem.ubbcluj.ro. https://orcid.org/0000-0003-2556-6329

DOI:

https://doi.org/10.24193/subbchem.2017.3.16

Keywords:

topological descriptor, QSAR, data dimensionality, mutagenicity, principal component analysis (PCA), Ames test

Abstract

The dimensionality of a relatively large data set (95 compounds) assessed for toxicity (mutagenicity) was explored in order to build QSAR models. Distinct molecular descriptors were used. The dimensionality of the data was evaluated using PCA, correlation plots and clustering, and this analysis allowed the models to be optimized. Docking studies and PCA were used to expand the data dimensionality. The squared Pearson correlation coefficient (r²) values obtained for both perceptive and predictive models were satisfactory.
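
As a reading aid for the r² figure of merit mentioned in the abstract, the following is a minimal sketch of how a squared Pearson correlation coefficient between observed and model-predicted values can be computed. The function names `pearson_r` and `r_squared`, and the numeric values, are illustrative assumptions only; they are not taken from the article and do not reproduce the authors' data or software.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of centered cross-products and centered squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    return pearson_r(observed, predicted) ** 2


# Made-up numbers for illustration only (NOT data from the article).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.1]
print(round(r_squared(observed, predicted), 3))
```

In QSAR practice this quantity is typically reported once for the fitted (training) model and once for an external or cross-validated prediction set, so the same helper would be called on each pair of observed and predicted vectors.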

Published

2017-09-29

How to Cite

LUNGU, C. N., ERSALI, S., SZEFLER, B., PÎRVAN-MOLDOVAN, A., BASAK, S., & DIUDEA, M. V. (2017). DIMENSIONALITY OF BIG DATA SETS EXPLORED BY CLUJ DESCRIPTORS. Studia Universitatis Babeș-Bolyai Chemia, 62(3), 197–204. https://doi.org/10.24193/subbchem.2017.3.16

Section

Articles
