Fitting a Quadratic/Parabola to Data (in R)

Introduction

In 2019, I spent some time on a project attempting to gently introduce some basic Machine Learning concepts to "career" learners who, like me, may have rusty calculus skills. In doing this project, I taught myself enough R to create the plots I needed. (R is a widely-used statistical programming language.) The project name is "Intro to ML via Curve Fitting" and its project page is here:

https://sites.google.com/view/rcreamer/home/techstem/mlintro

One of the slides in the above project was an example of a plausible application of curve fitting to the auto-focus algorithm a camera manufacturer might use. Here is this example slide from the project's slide deck:

(This plot page w/code is here: https://sites.google.com/view/rcreamer/home/techstem/mlintro/plotsandassocrcode/plot-33)

Recently, I ran across a real-world example of achieving the best focus for telescope astrophotography and realized that the [above slide | code | method] might be of some use to astrophotographers. As a result, I adapted the code from the above slide's plotting code to allow astrophotographers to plug their focusing data into my code in RStudio to fit a parabola to their data and hopefully read off the optimum focus setting value from the fitted parabola's vertex's x-coordinate.

Please note that in astrophotography the "edges" my camera auto-focus slide mentioned are usually star images. Star images are usually round in an astrophoto, so the radius or diameter of star circles should be the minimization goal. The image of a star is not a point because of the optical system of telescopes. In fact, it is close to being a Gaussian of revolution and is usually called a Point Spread Function.

In order to do the autofocus calculation properly for stars/astrophotography, it would be better to fit 2D Gaussians to star images and find the focus position setting which minimizes the diameter of star image Gaussians. However, this extra refinement is out of scope for this page's purpose/goal. Also, I should mention that most star images are usually not ideal Gaussians because at least some of the pixels in the center are over exposed and therefore their brightness is "clipped" resulting in a flattened Gaussian peak.

Examples and RStudio Background

This page contains some R code (#Rstats) for fitting a parabola/quadratic polynomial to data.

To illustrate this, I've included (3) plots: (code is included, below)

A hypothetical example camera manufacturer auto focus algorithm taken from my Intro to ML project
Another (astro) data set with all data points including some outliers
The same (astro) data set as prior bullet but with outliers omitted

Note that in curve fitting, better fits can be obtained if outlier points are omitted (judiciously).

The code for these plots is included below. Users can replace the code's (x, y) data with their own data and then run the code in RStudio (a free download).

Here is plot #1 (camera auto-focus example):

Here is plot #2 (all astro data points included):

Here is plot #3 (astro data with (3) outlier points omitted):

Here is a screenshot of RStudio:

How to Run the Code in RStudio

Note that the instructions below include pasting the R source code into the IDE's coding window. This isn't strictly required because you could tweak the code in a separate text editor on your computer and just paste this code directly into the IDE's Console pane. That said, using the coding pane editor to tweak and save your code can be more convenient.

The "Console" pane is on the bottom left.

The source code file pane is the upper-left pane

To run the included code:

File -> New File -> R Script [ Creates a new source code tab in code pane ]
Copy/paste the included code below into the new R file code pane (upper-left pane)
In the Console pane, execute these two commands:
- rm ( list = ls() ) [ Clears all loaded code ]
- Cntrl-L [ Clears the Console pane ]
Paste the included code into the Console pane
Click in the Console pane and press Enter to load the pasted code
The included code will automatically run after the last step (that's what runFunc() does)

The RStudio plot window is the lower-right pane. You can export a plot to a file using the plot pane's Export option.

Note: Permission to use the code and make derivative works for non-commercial uses is allowed:)

You can assign the (x, y) variables to whichever data you choose (or add your own new data variables/values) in the code excerpt, below:

### Assign (x, y) to the (xNoOutliers, yNoOutliers) data above

x = xNoOutliers

y = yNoOutliers

Below is the copy/paste friendly source code:

##############################################################

### Inquiries: email 2to32minus1@gmail.com

##############################################################

### Permission is granted for use and derivative works of this code for non-commercial purposes

### Globals

fontScale = 1.2 # Global font size variable

lf = 1.2 # Enlarge font of plot labels

verbose = FALSE # Verbose printing

lw = 2 # Line width

xMargin = 1 # X-axis margin for plots

yMargin = 1 # Y-axis margin for plots

de = 0.01 # Small constant for estimating derivatives numerically

smlPts = 1 # Small data point/circle

medPts = 1.5 # Medium data point/circle

lrgPts = 2.0 # Large data point/circle

### Run this function to generate this plot

runFunc = function() {

dw = 1000

dh = 750

fitQuadratic()

}

### R code to fit a parabola to data and print out the vertex x-coordinate

fitQuadratic = function() {

### data from Intro to ML project slide (camera auto focus)

xSlide = c( 2, 4, 6, 8, 10, 12 )

ySlide = c( 10, 6, 5, 4.8, 6.25, 10.3 )

### Real world data, all data points

xAllPts = c( 6916, 6946, 6976, 7006, 7036, 7066, 7096, 7126, 7156, 7186, 7216, 7246 )

yAllPts = c( 3.91, 2.88, 2.34, 2.20, 2.12, 2.27, 2.46, 2.62, 3.02, 3.71, 4.19, 4.21 )

### Real world data with (3) outlier points omitted

xNoOutliers = c( 6946, 6976, 7006, 7036, 7066, 7096, 7126, 7156, 7186 )

yNoOutliers = c( 2.88, 2.34, 2.20, 2.12, 2.27, 2.46, 2.62, 3.02, 3.71 )

### Assign (x, y) to the (xNoOutliers, yNoOutliers) data above

x = xNoOutliers

y = yNoOutliers

plotTitle = "Fit Quadratic"

plot( x, y, ylim=c(0, max(y)+2), main=plotTitle, ylab="y", xlab="x", cex.main=1.3, cex.axis=1.3, cex.lab=1.3, cex=2.0, lwd=2 )

lmFit = lm( y ~ x + I(x^2) )

coeff = lmFit$coefficients

xPred = seq( x[1] - 1, x[length(x)] + 1, length=100 )

yPred = evalPoly( xPred, coeff )

lines( xPred, yPred, col="blue", lwd=lw )

a = coeff[3]; b = coeff[2]; c = coeff[1]

vertexX = -b/(2*a)

vertexY = a * vertexX^2 + b * vertexX + c

minY = gpl()$yLim[1]

arrows( vertexX, minY, vertexX, vertexY, col="magenta", lwd=lw, angle=15, length = 0.3 )

label = sprintf( "Vertex x-coord: %.3f", vertexX );

fontSz = 1.1 * fontScale

legend( "topleft", bty="n",inset=c(0.005,0.01),

legend=label, col=c("transparent","transparent"),

cex=fontSz, pch=15, y.intersp=1.1 )

grid()

}

### Compute f(x) for a polynomial and a vector of x-coordinates

evalPoly = function( x, coeff ) {

if ( length( coeff ) < 1 ) return( c(0) )

termSum = 0

for ( i in 1:length(coeff) ) {

termSum = termSum + coeff[i] * x^(i-1)

}

return( termSum )

}

### Get Plot Limits : gpl()$xLim[1] --> left x-coord

gpl = function() {

u = par( "usr" )

return( list( xLim=u[1:2], yLim=u[3:4] ) )

}

runFunc()

Example Session in RStudio Console Pane

Below is a short illustrative session using the RStudio Console pane.

A few things to mention:

Be sure to clear all code and variables from the Console pane before trying to load/run new code.
This command does exactly that (clears all variables/code): rm( list = ls() )
To run code in RStudio, paste it into the Console pane and then press Enter.
The session below shows that you can invoke/call functions directly in the Console pane as well vs. how the included code executes via the runFunc() function.

In the (2) session screenshots below, magenta text is what I typed into the window. White text is the Console pane's output.

Page updated

Google Sites

Report abuse