Python:Interpolation

From PrattWiki
Revision as of 14:00, 16 April 2020 by DukeEgr93 (talk | contribs)
Jump to navigation Jump to search

Introduction

Interpolation is a process by which "gaps" in a data set may be filled using relatively simple equations. Interpolation differs from fitting in that:

  • Interpolations are required to exactly hit all the data points, whereas fits may not hit any of the data points, and
  • Interpolations are based on, often simple, mathematical formulas without regard to the underlying system which produced the data.

Basic Types of 1-D Interpolation

There are several basic types of interpolation; the examples below are based on the following data set:

$$\begin{array}{c|c} \mbox{Time}~t, \mbox{s} & \mbox{Temperature}~T, ^o\mbox{C}\\ \hline 0 & 5 \\ 2 & 3 \\ 8 & 10 \\ 12 & 15 \\ 20 & 14 \end{array}$$

Nearest Neighbor

Nearest neighbor interpolation means that for any given input, the output will be based on the dependent value in the data set obtained at the independent value of the data set closest to the input. For example, in the data set above, $$f(4)$$ would give a temperature of 3 since time 4 is closest to time 2 in the data set. Similarly, $$f(11)$$ would return a temperature if 15 since time 11 is closest to time 12.

There are several advantages of nearest neighbor:

  • Very simple "calculation" - really, there is no calculation other than finding out which independent value is closest
  • The interpolated values are always values in the data set - if you have some system that is only capable of producing particular values, nearest neighbor interpolation will never return an impossible value.

There are also disadvantages:

  • Technically undetermined half-way between measured data points,
  • Potentially large discontinuities between data points, and
  • No potential to estimate any kind of rate information between points.

Linear Interpolation

Linear interpolation involves figuring out the equation of a straight line between data points. The output will be based on the line connecting the points to the left and right of the input. For example, in the data set above, $$f(4)$$ would be found by finding the equation of the line between (2, 3) and (8, 10). The general formula for finding the value of $$f(x)$$ based on some value $$x$$ between the data points to $$x$$'s left $$x_L$$ (where $$y=y_L$$) and the data point to $$x$$'s right $$x_R$$ (where $$y=y_R$$) is:

$$\begin{align*} f(x)&=y_L+\frac{y_R-y_L}{x_R-x_L}(x-x_L) \end{align*}$$

so in this case, the process for finding $$f(4)$$ would be:

$$\begin{align*} f(4)&=3+\frac{10-3}{8-2}(4-2)=5.33 \end{align*}$$

Note that there will be a different equation for each interval; if you were to want to calculate $$f(16)$$, you would need to use the data points at (12, 15) and (20, 14) to get:

$$\begin{align*} f(x)&=y_L+\frac{y_R-y_L}{x_R-x_L}(x-x_L)\\ f(16)&=15+\frac{14-15}{20-12}(16-12)=14.5 \end{align*}$$

Cubic Spline Interpolation

Cubic spline interpolation involves coming up with a third-order equation for each interval between the data points. Because there are now four free coefficients for each equation, cubic spline interpolations can not only satisfy the requirement that the interpolation functions hit each of the data points but also satisfy additional requirements. In the case of cubic splines, two additional requirements are to have the slopes of the interpolating functions match at the interior data points and to have the curvatures of the interpolating functions match at the interior points. It turns out that these additional restrictions are not sufficient to completely determine all the coefficients for all the equations for all the intervals -- there are actually two too few restrictions. Given that, there are some different options for how to deal with the additional two equations:

  • Not-a-knot: this is the default option; basically, the calculations look at what would have happened if the second and penultimate data points didn't exist.
  • Clamped: this option allows you to specify the first derivative at the start and the first derivative at the end. In Python, the default case for clamped end conditions is to set these slopes to zeros.
  • Natural: this option allows you to specify the second derivative at the start and the second derivative at the end. In Python, the default case for natural end conditions is to set the second derivatives to zero.

Important Notes

  • Be sure to import the CubicSpline function with
from scipy.interpolate import CubicSpline
  • Note that the result of the CubicSpline function is a function; that is:
import numpy as np
from scipy.interpolate import CubicSpline
x = np.array([1, 2, 3, 4])
y = np.array([3, 1, 4, 1])
my_fun = CubicSpline(x, y, bc_type='natural')
my_fun([1.5, 2.5, 3.5])
will create a function to calculate interpolated values and then uses it to create a list of three estimates.

Example Code

In Python, interpolation can be performed using the interp1d method of the scipy.interpolate package. This method will create an interpolation function based on the independent data, the dependent data, and the kind of interpolation you want with options inluding nearest, linear, and cubic (which uses not-a-knot conditions). Alternately, if you want to do some form of cubic spline, especially some form that is not not-a-knot, you can use the CubicSpline method of the scipy.interpolate package.

2-D Interpolation

Starter