The r_{m}^{2} metrics in validation of QSAR models
QSAR and its importance
Quantitative structure-activity relationship (QSAR) modeling is a computational tool dealing with the correlation between the biological activity/toxicity/property of a molecule and its structural features [1, 2]. In a QSAR study, the variations of biological activity/toxicity/property within compounds of a congeneric series are correlated with changes in measured or computed features of the molecules, referred to as descriptors. These descriptors measure properties of the molecules, which broadly include their hydrophobic, steric and electronic features in addition to various structural patterns. QSAR models developed from a series of molecules with a definite response help in screening large databases of new molecules bearing the specific response [3]. This reduces the large expenditure of money and time required for preliminary experimental studies. The QSAR technique thus provides an alternative pathway for the design and development of new molecules with an improved/desired response pattern. The pharmacophoric features and descriptors obtained from the developed QSAR models may also be utilized for virtual screening [4] of large libraries of diverse compounds for a definite response parameter. In addition, identifying the prime features that impart improved activity to the molecules under study facilitates the in silico design of new molecules with enhanced potency. Thus, a focused library [4] may be developed by compiling the newly designed molecules with a specific response.
Validation of a QSAR model
Validation is a crucial aspect of any QSAR modeling. It is the process by which the reliability and relevance of a procedure are established for a specific purpose [5]. Formal validation is often one of the most overlooked steps in model development. For many in the QSAR community, the validation of a model is little more than an assessment of statistical fit and, occasionally, predictivity using cross-validation techniques. However, it is now being accepted that validation is a more holistic process that includes assessment of issues such as data quality, applicability of the model and mechanistic interpretability in addition to statistical assessment [6]. Any QSAR model needs to be properly validated before it is used for interpreting and predicting biological responses of non-investigated compounds. There are a number of ways to express the performance of a model. The conventional approach adopted in QSAR analysis, based on multiple linear regression, is to consider R^{2}, adjusted R^{2} or R_{a}^{2} (the explained variance), and s (the standard error of estimate) [7]. However, acceptable values of these statistical parameters are not always sufficient to judge model predictivity, and alternative methods are employed to assess the predictive ability of the developed QSAR models. To determine the predictive quality of the models, they need to be further validated using various validation techniques. Both internal and external validations are performed to assess the reliability and the predictive potential of the developed models. The conventional validation strategies include the calculation of the cross-validated squared correlation coefficient (Q^{2}) for internal validation [8] and the predictive squared correlation coefficient (R^{2}_{pred}) for external validation [9], both with a threshold value of 0.5.
The internal validation [8] procedure involves the leave-one-out (LOO) or leave-many-out (LMO) cross-validation technique followed by the calculation of the cross-validated squared correlation coefficient, LOO-Q^{2} or LMO-Q^{2}. These techniques involve removal of one compound or a group of compounds from the training set followed by development of the QSAR model based on the reduced dataset. The model thus built with the remaining molecules is used to predict the response of the deleted compound or compounds. This cycle is repeated until all the molecules of the dataset have been deleted once. The cross-validated squared correlation coefficient (LOO-Q^{2}) is calculated according to the following formula (Eq. 1).
Q^{2} = 1 - \frac{\sum (Y_{obs(train)} - Y_{pred(train)})^{2}}{\sum (Y_{obs(train)} - \bar{Y}_{training})^{2}} (1)
In the above equation, Y_{obs(train)} is the observed response (training set), Y_{pred(train)} is the predicted response of the training set molecules based on the LOO/LMO technique, and \bar{Y}_{training} is the mean response of the training set compounds. A problem with LOO cross-validation is that a small change in the data can cause a huge variation in the type of QSAR model selected. Thus, a QSAR or QSPR (quantitative structure-property relationship) model is chiefly valued in terms of its predictivity, that is, its ability to predict the response parameter for compounds not used in developing the correlation, i.e. molecules not included in the training set. Such a procedure for checking model predictivity based on molecules not included in the training set is referred to as external validation. The QSAR model developed from the training set is used to predict the response of the test set molecules, followed by the estimation of the external predictive parameter (R^{2}_{pred}) (Eq. 2) [9], which reflects the degree of correlation between the observed and predicted activity data for the test set molecules and thereby indicates the predictive ability of the model.
R^{2}_{pred} = 1 - \frac{\sum (Y_{obs(test)} - Y_{pred(test)})^{2}}{\sum (Y_{obs(test)} - \bar{Y}_{training})^{2}} (2)
In Eq. (2), Y_{obs(test)} and Y_{pred(test)} are the observed and predicted response data respectively for the test set compounds.
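The two parameters can be sketched in a few lines of Python. This is a minimal illustration only, assuming a one-descriptor least-squares model rather than the multiple linear regression models usually used in QSAR; the function names and the data values are hypothetical and are not taken from any study.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x for a single descriptor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def q2_loo(x, y):
    """Leave-one-out Q^2 (Eq. 1): refit without compound i, then predict it."""
    y_mean = mean(y)  # mean response of the training set
    press = 0.0
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - press / ss_total


def r2_pred(x_train, y_train, x_test, y_test):
    """External R^2_pred (Eq. 2); the denominator uses the TRAINING set mean."""
    a, b = fit_line(x_train, y_train)
    y_train_mean = mean(y_train)
    press = sum((yo - (a + b * xi)) ** 2 for xi, yo in zip(x_test, y_test))
    ss_total = sum((yo - y_train_mean) ** 2 for yo in y_test)
    return 1.0 - press / ss_total


# Hypothetical descriptor values and responses, for illustration only.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
x_test = [1.5, 3.5, 5.5]
y_test = [1.4, 3.6, 5.4]

print("Q2 (LOO):", round(q2_loo(x_train, y_train), 3))
print("R2 pred :", round(r2_pred(x_train, y_train, x_test, y_test), 3))
```

Both functions take plain Python lists. For a model with several descriptors, only `fit_line` and the prediction step would need to change; the sums in the numerator and denominator stay the same.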
From the above equations, it can be noted that the values of Q^{2} and R^{2}_{pred} depend on the mean response value of the training set compounds and its distance from each of the response values of the corresponding training and test set compounds, respectively. As the denominator term in both equations, the sum of squared deviations from the training set mean, \sum (Y_{obs} - \bar{Y}_{training})^{2}, increases, the values of the internal and external predictive parameters increase, apparently suggesting improved predictive ability of the developed QSAR model. Thus, a dataset comprising molecules with a wide response range may show acceptable values for the two parameters even though large differences exist between the predicted and corresponding observed response values for the training and test set molecules. To better indicate both the internal and external predictive capacities of a QSAR model and to ascertain the proximity of the predicted and observed response data, the r_{m}^{2} metrics (average r_{m}^{2} and delta r_{m}^{2}) developed by Roy et al. [10, 11] are calculated.
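The dependence on the response range can be seen in a small numerical sketch. The data below are invented for illustration: two hypothetical test sets carry exactly the same prediction errors (0.8 units each) and share an assumed training set mean of 5.0, but one spans a narrow response range and the other a wide one.

```python
def r2_pred_from_values(y_obs, y_pred, y_train_mean):
    """R^2_pred computed directly from observed and predicted responses."""
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_total = sum((o - y_train_mean) ** 2 for o in y_obs)
    return 1.0 - press / ss_total


Y_TRAIN_MEAN = 5.0            # assumed training set mean (hypothetical)
errors = [0.8, -0.8, 0.8]     # identical prediction errors in both cases

narrow_obs = [4.0, 5.0, 6.0]  # narrow response range
wide_obs = [1.0, 5.0, 9.0]    # wide response range

narrow_pred = [o + e for o, e in zip(narrow_obs, errors)]
wide_pred = [o + e for o, e in zip(wide_obs, errors)]

r2_narrow = r2_pred_from_values(narrow_obs, narrow_pred, Y_TRAIN_MEAN)
r2_wide = r2_pred_from_values(wide_obs, wide_pred, Y_TRAIN_MEAN)

print("narrow range:", round(r2_narrow, 2))
print("wide range  :", round(r2_wide, 2))
```

With these numbers the narrow-range set gives a value of about 0.04 and the wide-range set about 0.94, although the individual predictions are equally far from the observed values in both cases.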
Average r_{m}^{2} = (r_{m}^{2} + r'^{2}_{m})/2 (3)
Delta r_{m}^{2} = |r_{m}^{2} - r'^{2}_{m}| (4)
Here, r_{m}^{2} = r^{2} \times (1 - \sqrt{r^{2} - r_{0}^{2}}) and r'^{2}_{m} = r^{2} \times (1 - \sqrt{r^{2} - r'^{2}_{0}}). The squared correlation coefficients between the observed and predicted values of the test set compounds (LOO-predicted values for training set compounds) with intercept (r^{2}) and without intercept (r_{0}^{2}) are calculated for the determination of r_{m}^{2}. Interchanging the axes gives the value of r'^{2}_{0}, and the r'^{2}_{m} metric is calculated from it. In the presence of an intercept in the corresponding least-squares regression lines, the correlation between the observed (y) and predicted (x) values is the same as that between the predicted (y) and observed (x) values. However, this is not true when the intercept is set to zero. Thus, the value of r'^{2}_{m} will differ from that of r_{m}^{2}, and the difference (delta r_{m}^{2}) between these two metrics may also be used as a measure of the goodness of predictions. Moreover, as either r_{m}^{2} or r'^{2}_{m} may heavily penalize the quality of the model in terms of predictions, an average (average r_{m}^{2}) of the two is calculated. The calculation of the r_{m}^{2} metrics for the training set [average r_{m}^{2} (LOO) and delta r_{m}^{2} (LOO)] determines the reliability of the developed model, while that for the test set data [average r_{m}^{2} (test) and delta r_{m}^{2} (test)] estimates the closeness between the predicted and the corresponding observed response data. The overall performance of the QSAR models may also be checked using overall validation parameters such as average r_{m}^{2} (overall) and delta r_{m}^{2} (overall). In addition to the traditional parameters involved in judging the predictive potential of a QSAR model, the r_{m}^{2} metrics have been extensively used by Roy and coworkers [12-16] as well as other groups of researchers [17-21] to assess the prediction power of QSAR models.
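A minimal Python sketch of these calculations is given below. The text does not spell out how the zero-intercept coefficient r_{0}^{2} is computed, so the sketch assumes one common convention: fit y = k*x through the origin and take 1 minus the ratio of the residual sum of squares to the total sum of squares about the mean of y. Other conventions exist and can give different values. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation (regression with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def r0_squared(x, y):
    """Assumed convention: fit y = k*x through the origin, then 1 - SSres/SStot."""
    k = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    my = mean(y)
    ss_res = sum((b - k * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


def rm2_metrics(y_obs, y_pred):
    """Return r_m^2, its reversed-axes counterpart, their average and difference."""
    r2 = r_squared(y_pred, y_obs)
    r0_2 = r0_squared(y_pred, y_obs)      # observed on y axis, predicted on x axis
    r0_2_rev = r0_squared(y_obs, y_pred)  # axes interchanged
    # max(0.0, ...) only guards against tiny negative values from rounding.
    rm2 = r2 * (1.0 - sqrt(max(0.0, r2 - r0_2)))
    rm2_rev = r2 * (1.0 - sqrt(max(0.0, r2 - r0_2_rev)))
    return {
        "rm2": rm2,
        "rm2_reversed": rm2_rev,
        "average": (rm2 + rm2_rev) / 2.0,  # Eq. 3
        "delta": abs(rm2 - rm2_rev),       # Eq. 4
    }


# Hypothetical data: every prediction is offset by one unit from the observation.
metrics = rm2_metrics([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0, 6.0])
print({name: round(value, 3) for name, value in metrics.items()})
```

In this example r^{2} equals 1 because the predictions are perfectly correlated with the observations, yet the constant offset pulls both zero-intercept coefficients below 1, so both r_{m}^{2} values fall below 1 and differ from each other. This is the kind of systematic deviation that the correlation coefficient alone does not reveal.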
The r_{m}^{2} metrics have been implemented in the CORAL freeware available at http://www.insilico.eu/coral. QSAR models with acceptable values for all the traditional parameters can finally be assessed based on the r_{m}^{2} metrics. Models with an average r_{m}^{2} value above the threshold of 0.5 and a delta r_{m}^{2} value of less than 0.2 are considered predictive and reliable. A web service for computation of r_{m}^{2} is now available at http://203.200.173.43:8080/rmsquare/.
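The acceptance criteria quoted in this article can be collected into one check, sketched below. The function name and argument names are hypothetical, and the sketch is not the logic of CORAL or of the web service; it only encodes the thresholds stated above (0.5 for Q^{2}, R^{2}_{pred} and average r_{m}^{2}, and 0.2 for delta r_{m}^{2}).

```python
def passes_validation(q2, r2_pred, average_rm2, delta_rm2):
    """Apply the thresholds quoted in the text to one set of model statistics."""
    traditional_ok = q2 > 0.5 and r2_pred > 0.5          # Q^2 and R^2_pred
    rm2_ok = average_rm2 > 0.5 and delta_rm2 < 0.2       # r_m^2 metrics
    return traditional_ok and rm2_ok


# Hypothetical statistics for two models, for illustration only.
print(passes_validation(q2=0.70, r2_pred=0.65, average_rm2=0.60, delta_rm2=0.10))
print(passes_validation(q2=0.70, r2_pred=0.65, average_rm2=0.60, delta_rm2=0.25))
```

The second call differs from the first only in delta r_{m}^{2}, which shows how a model that clears the traditional thresholds can still be rejected on the r_{m}^{2} criteria.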
References
Updated on September 12, 2012
