TY - JOUR
T1 - Analysis of the co-evolutions of correlations as a tool for QSAR-modeling of carcinogenicity
T2 - An unexpected good prediction based on a model that seems untrustworthy
AU - Toropova, Alla P.
AU - Toropov, Andrey A.
AU - Gonella Diaza, Rodolfo
AU - Benfenati, Emilio
AU - Gini, Giuseppina
PY - 2011
Y1 - 2011
N2 - To validate QSAR models, an external test set is increasingly used. However, the definition of the compounds for the test set is still debated. We studied co-evolutions of correlations between optimal descriptors and carcinogenicity (pTD50) for the sub-training, calibration, and test sets. Weak correlations for the sub-training set are sometimes accompanied by quite good correlations for the external test set. This can be explained in terms of probability theory and can help define a suitable test set. The simplified molecular input line entry system (SMILES) was used to represent the molecular structure. Correlation weights for calculating the optimal descriptors are related to fragments of the SMILES. The statistical quality of the model is: n=170, r²=0.6638, q²=0.6554, s=0.828, F=331 (sub-training set); n=170, r²=0.6609, r²(pred)=0.6520, s=0.825, F=331 (calibration set); and n=61, r²=0.7796, r²(pred)=0.7658, Rm²=0.7448, s=0.563, F=221 (test set). The calculations were done with CORAL software (http://www.insilico.eu/coral/).
AB - To validate QSAR models, an external test set is increasingly used. However, the definition of the compounds for the test set is still debated. We studied co-evolutions of correlations between optimal descriptors and carcinogenicity (pTD50) for the sub-training, calibration, and test sets. Weak correlations for the sub-training set are sometimes accompanied by quite good correlations for the external test set. This can be explained in terms of probability theory and can help define a suitable test set. The simplified molecular input line entry system (SMILES) was used to represent the molecular structure. Correlation weights for calculating the optimal descriptors are related to fragments of the SMILES. The statistical quality of the model is: n=170, r²=0.6638, q²=0.6554, s=0.828, F=331 (sub-training set); n=170, r²=0.6609, r²(pred)=0.6520, s=0.825, F=331 (calibration set); and n=61, r²=0.7796, r²(pred)=0.7658, Rm²=0.7448, s=0.563, F=221 (test set). The calculations were done with CORAL software (http://www.insilico.eu/coral/).
KW - Balance of correlations
KW - Carcinogenicity
KW - Co-evolutions of correlations
KW - Optimal descriptor
KW - QSAR
UR - http://www.scopus.com/inward/record.url?scp=79952655871&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79952655871&partnerID=8YFLogxK
U2 - 10.2478/s11532-010-0135-7
DO - 10.2478/s11532-010-0135-7
M3 - Article
AN - SCOPUS:79952655871
SN - 1895-1066
VL - 9
SP - 165
EP - 174
JO - Central European Journal of Chemistry
JF - Central European Journal of Chemistry
IS - 1
ER -