Least Squares Fit of a Quadratic Curve to Data

This time around, I'll use an example that many people have seen in High School physics class. An apparatus is available that marks a strip of paper at even intervals in time. The paper is pulled through the marker by a falling weight. By measuring the positions of the points on the strip (distance fallen), it is possible to obtain a value for "g", the acceleration of gravity.

The laws of motion say that the distance fallen is given by the equation:

Gravitational acceleration can be deduced if we can find a best fit of the data to a general quadratic equation:

I've selected subscript notation for the coefficients in the quadratic to fit into matrix notation. I will also use subscript notation to keep track of data values. The time of the ith data point is ti, and the measured distance from the start to that point is di. The total number of data points is "n".

As with a linear fit, the concept of "best fit" requires definition of some measure of the error between the data and the fit curve. The functional form for error is a simple generalization of the linear error function:

As before, the minimum error is at the point where the partial derivatives of the error function with respect to the coefficients are all zero.

The equation resulting from evaluating the partial derivative with respect to c1 is:

Dividing both sides of the final form by 2 and rearranging gives:

The equation resulting from evaluating the partial derivative with respect to c2 is:

Dividing both sides of the final form by 2 and rearranging gives:

The equation resulting from evaluating the partial derivative with respect to c3 is:

Dividing both sides of the final form by 2 and rearranging gives:

:



Using standard notation for linear algebra, these three equations can be written as:



This can now be translated into Fortran for solution. The program lsq.f implements the equations in a form very similar to the above notation. However other approaches are more efficient. Take a look at lsq2.f, lsq3.f, and the following use of Fortran 90 intrinsic functions. Assume that the arrays "t" and "d" contain the "n" data points (see the file fall.data). Then we load a Fortran doubly dimensioned array as

a(1,1) = n

a(1,2) = sum(t(1:n))

a(1,3) = dot_product(t(1:n),t(1:n))

a(2,1) = a(1,2)

a(2,2) = a(1,3)

a(2,3) = sum(t(1:n)**3)

a(3,1) = a(1,3)

a(3,2) = a(2,3)

a(3,3) = sum(t(1:n)**4)

The array for the right hand side can be set as:

rhs(1) = sum(d(1:n))

rhs(2) = dot_product (d(1:n)*t(1:n))

rhs(3) = sum ( d(1:n)*t(1:n)**2)

These two arrays can be passed to an appropriate linear system solver to obtain the values for the coefficients.

My interest, here, is showing you how to generate a curve that in a well defined way comes close to a set of data. However, if I was serious about experimentally determining the value of "g", I would still have some work to do. Enough information exists in this problem to make a meaningful statement about the uncertainty in my value for "g". If a coefficient resulting from a least squares fit is important to you, take the time to learn the uncertainty methodology from a good text on statistical or experimental methods.


Back to the Lecture / Table of Contents / Home


Written and Maintained by John Mahaffy : jhm@psu.edu