Introduction to Non-linear Curve Fitting
There are three general techniques that are used to improve the fit of a curve to a set of observations. Which technique is used will depend on the degree and type of deviation from a "good" fit and the particular problem being examined.
Transformations. The first approach generally attempted is a relatively simple transformation of one or both of the variables, such as taking the logarithm. A straight line fit to the transformed values corresponds to a simple curve in the original units. With luck, that curve will match the pattern of the data values well. This approach is simple to do, it lets you use the linear curve-fitting programs, and it allows you to compare the results of the new functions directly with the results of the linear fit. It is adequate in many situations.
Adding terms to the equation. It is possible to make a function that describes a curve by adding one or more "non-linear" terms to the equation. Generally, these new terms are produced by squaring and cubing the values of the X-variable. A multiple regression is then run using two or three terms. This sort of procedure produces what are known as "quadratic" and "cubic" equations.
Fitting a special function. In some disciplines, there are well-established functions that express non-linear relationships. Often, these have a theoretical basis, such as the operation of some law of physics or chemistry. Some of these curves can be fit through the use of an appropriate transformation. When this is not possible, a special non-linear curve-fitting procedure needs to be used. In SAS, PROC NLIN can be used for these complex fittings.
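As a rough illustration of the third technique, here is a minimal sketch of a non-linear fit of an exponential curve. It assumes a data set named RAW with variables XVALUE and YVALUE, like the one used in the steps that follow. The parameter names A and B, the output names NLINFIT, PRED and RESIDUAL, and the starting values on the PARMS statement are placeholders chosen for this sketch, not values from the text. Non-linear fitting needs rough starting guesses, and one common source for them is the back-transformed coefficients from a log-transformed GLM fit.

```sas
/* Sketch only: fit Y = A*EXP(B*X) directly with a non-linear procedure. */
/* The starting values below are placeholders; replace them with rough   */
/* guesses that suit your own data.                                      */
PROC NLIN DATA=RAW;
   PARMS A=1 B=0.1;                        /* hypothetical starting values */
   MODEL YVALUE = A * EXP(B * XVALUE);     /* curve in its standard form   */
   OUTPUT OUT=NLINFIT P=PRED R=RESIDUAL;   /* keep predictions, residuals  */
   TITLE1 'EXPONENTIAL MODEL FIT DIRECTLY';
RUN;
```

Because the model is written in its standard form, the printed estimates of A and B can be used directly, with no back-transformation.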
Steps in the Fitting of Non-linear Data
1. Create the necessary transformations and new variables in the DATA step. Remember that you can't take the log of zero or of a negative number. Here is a set of transformations shown in the context of a SAS DATA step:
    DATA RAW;
       INPUT XVALUE YVALUE;
       LOGY  = LOG(YVALUE);       /* natural log of Y */
       LOGX  = LOG(XVALUE);       /* natural log of X */
       XSQ   = XVALUE * XVALUE;   /* X squared        */
       XCUBE = XSQ * XVALUE;      /* X cubed          */
2. Use these new values in a series of PROC GLM steps. The following GLM procedures use the data matrix created in the previous DATA step:
    PROC GLM DATA=RAW;
       MODEL YVALUE=XVALUE;
       TITLE1 'LINEAR MODEL';
    PROC GLM DATA=RAW;
       MODEL LOGY=XVALUE;
       TITLE1 'EXPONENTIAL MODEL';
    PROC GLM DATA=RAW;
       MODEL YVALUE=LOGX;
       TITLE1 'LOGARITHMIC MODEL';
    PROC GLM DATA=RAW;
       MODEL LOGY=LOGX;
       TITLE1 'ALLOMETRIC/POWER/LEARNING MODEL';
    PROC GLM DATA=RAW;
       MODEL YVALUE=XVALUE XSQ;
       TITLE1 'QUADRATIC MODEL';
    PROC GLM DATA=RAW;
       MODEL YVALUE=XVALUE XSQ XCUBE;
       TITLE1 'CUBIC MODEL';
    RUN;
3. Examine the GLM results for improvement in the fit. Did the transformation increase the r² over the linear case? Remember Occam's Razor: if two equations fit about equally well, prefer the simpler one.
You might want to select one or two of the better equations and generate an OUTPUT data matrix that you can use to plot the results and the residuals.
4. Look at the plot of the residuals for the equation that has been selected. Does it pass the appropriate scrutiny? The residuals should scatter evenly around zero, with no curve or trend left in them.
5. Make sure to transform the coefficients from the GLM listing so that they can be used in a standard form in the equation that is chosen.
In the example transformations, only two changes are needed to put the equations into a standard mathematical format.
With the exponential function, transform the GLM-printed intercept value by taking its antilog (i.e., $a = e^{\text{intercept}}$). Then you can use this transformed value in the equation
$$\text{Y-value} = a \cdot e^{\,b \cdot \text{X-value}}$$
With the allometric function, do a similar transformation on the intercept value and then use it in the equation
$$\text{Y-value} = a \cdot \text{X-value}^{\,b}$$
In both cases, the slope b is used exactly as printed. The GLM listing does not print the back-transformed intercept, so it is probably easiest to do these transformations on a calculator.
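The same antilog arithmetic can also be sketched in a short DATA step using the EXP function, which is the inverse of the natural-log LOG function used to build LOGY and LOGX. The data set name COEF and the two numbers typed in below are made-up placeholders, not results from any analysis in this section. Substitute the intercept and slope from your own GLM listing.

```sas
/* Sketch only: back-transform a GLM intercept with EXP().          */
/* INTERCEPT and SLOPE hold hypothetical values for illustration.   */
DATA COEF;
   INTERCEPT = 2.50;         /* placeholder: intercept from MODEL LOGY=XVALUE */
   SLOPE     = -0.12;        /* placeholder: slope from the same listing      */
   A = EXP(INTERCEPT);       /* antilog of the intercept gives a              */
   B = SLOPE;                /* the slope is used as printed                  */
PROC PRINT DATA=COEF;
   TITLE1 'BACK-TRANSFORMED COEFFICIENTS';
RUN;
```

The printed A and B are the values to place in the standard-form exponential equation. The same two lines of arithmetic apply to the allometric model when the intercept and slope come from MODEL LOGY=LOGX.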
Example Using a Quadratic Equation
This example closely follows the one given for a linear, least-squares curve (page 179). That is not surprising, because the same mathematical procedure is used for both. The only change is that an additional term, consisting of the X-values squared, is added to the equation that describes the curve. The overall equation is called a "quadratic equation." Its form is:
$$\text{Predicted-Y} = a + bX + cX^2$$
You will see from the following example that you get your X-squared values simply by creating a new variable in your data matrix by squaring the variable that you choose for the X-variable. You then include this term in the equation when you do the least-squares analysis with PROC GLM.
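The reason the linear least-squares procedure still applies is that the quadratic equation, although curved in X, is linear in its coefficients a, b and c. As a sketch in the notation of the equation above (the observation index i and the count n are notation introduced here, not taken from the text), the fit chooses the coefficients that minimize the sum of squared residuals:

```latex
\mathrm{SSE}(a, b, c) \;=\; \sum_{i=1}^{n} \left( Y_i - a - bX_i - cX_i^{2} \right)^{2}
```

Setting c to zero gives the quantity minimized in the straight-line case, which is why the squared term can be treated as one more variable in the MODEL statement.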
    OPTIONS LS=64 NOOVP NOCENTER;
    TITLE1 'STANDARD BIVARIATE ANALYSIS PROCEDURE';
    DATA RAW;
       INPUT DEGR_C LENGTH;
       TEMP_SQ = DEGR_C * DEGR_C;      /* squared term for the quadratic */
       LABEL DEGR_C = 'TEMPERATURE (C)'
             LENGTH = 'RHIZOME LENGTH (MM)';
       CARDS;
    12.3 15.3
    14.2 11.9
    16.9 9.6
    20.1 7.5
    24.9 5.3
    13.4 13.8
    13.7 14.2
    15.5 10.3
    22.6 6.3
    17.8 8.4
    18.3 7.9
    19.6 7.2
    ;
    PROC PRINT DATA=RAW;
       TITLE2 'RAW DATA LISTING';
    PROC UNIVARIATE DATA=RAW NORMAL PLOT;
       VAR DEGR_C LENGTH;
       TITLE2 'CHECK FOR DISTRIBUTION OF EACH VARIABLE';
    PROC PLOT DATA=RAW;
       PLOT LENGTH * DEGR_C;
       TITLE2 'SCATTERGRAM OF RAW DATA';
    PROC GLM DATA=RAW;
       MODEL LENGTH = DEGR_C TEMP_SQ;
       OUTPUT OUT=PLOTINFO P=PRED_LEN R=RESIDUAL;   /* save predictions and residuals */
       TITLE2 'OBTAIN EQUATION FOR LINE OF BEST FIT';
    PROC PLOT DATA=PLOTINFO;
       PLOT LENGTH * DEGR_C = '0'
            PRED_LEN * DEGR_C = 'P' / OVERLAY;      /* data and fitted curve on one plot */
       TITLE2 'RAW DATA AND LINE OF BEST FIT';
    PROC PLOT DATA=PLOTINFO;
       PLOT RESIDUAL * DEGR_C;
       TITLE2 'INFORMATION ON RESIDUALS';
    RUN;
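To judge whether the squared term has earned its place, as step 3 of the fitting procedure suggests, one option is to fit the straight-line model to the same data and compare the two listings. The following is a minimal sketch that assumes the RAW data set from the program above has already been created. The names LININFO, PRED_LIN and RES_LIN are hypothetical names chosen for this illustration.

```sas
/* Sketch only: straight-line fit to the same data, for comparison */
/* with the quadratic model.                                       */
PROC GLM DATA=RAW;
   MODEL LENGTH = DEGR_C;
   OUTPUT OUT=LININFO P=PRED_LIN R=RES_LIN;   /* hypothetical output names */
   TITLE2 'LINEAR MODEL FOR COMPARISON';
PROC PLOT DATA=LININFO;
   PLOT RES_LIN * DEGR_C;
   TITLE2 'RESIDUALS FROM THE LINEAR MODEL';
RUN;
```

If the linear residuals show a clear bow shape that disappears in the quadratic residual plot, and the r² rises noticeably, the extra term is probably justified. If the two fits look much alike, the simpler line may be the better choice.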