Today Professor Overton was out of town, so the acting chair of the CS Department, Professor Marsha Berger, handled the lecture. It was a good experience to have a completely different professor teach a session of a class this far into the semester; it helps me disconnect the information from the presentation a bit.
We covered issues surrounding trying to find a function, typically a polynomial, which interpolates some set of data points. This is a very basic problem, and there are lots of effective strategies in various disciplines for approaching it. Statistics, for example, uses linear regression through R^2 data to find a line which minimizes the sum of the squared errors over all points. We have also covered several ways of doing this sort of work in Machine Learning this semester.
We took a very general approach to the problem in class today. The core idea is that, for n+1 data points, you can find a degree-n polynomial, with n+1 coefficients, which interpolates those points. As an aside, this strategy can only go so far. Very high-order polynomials (starting around degree 10) behave a bit counter-intuitively, and end up not being very useful. In any case, it's more often than not possible to find a low-order fit for enormous datasets. Linear regression is an excellent example of this.
Our basic example was a set of two data points in R^2: (1, 1) and (2, 3). In order to do interpolation, you need to find a single polynomial that passes through every data point. Explicitly, you need to find coefficients {c_j} such that p(x_i) = y_i for i = 0..n. So for our example we need to fill in the polynomial p_1(x) = c_0 + c_1 x with our two data points:
c_0 + c_1(1) = 1
c_0 + c_1(2) = 3

This is obviously a linear system, and as such it can be represented by a matrix:
|1 1| |c_0|   |1|
|1 2| |c_1| = |3|

This is (trivially) solved to give us c_0 = -1 and c_1 = 2, so the polynomial p(x) = -1 + 2x is our interpolation of the test data. We really have no way of knowing, though, whether a line is the proper way to represent the underlying distribution of the data. It might be that a quadratic or a cubic or any other higher-order polynomial is a better fit. We can't know until we have more data points to evaluate. It turns out that there's no good algorithmic way to determine what is proper when that data arrives. It's basically a qualitative judgment.
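Just to double-check the arithmetic, here's a minimal sketch (my own, using NumPy, not something from the lecture) that sets up and solves that 2x2 system:

    import numpy as np

    # Rows of A are [1, x_i] for the data points (1, 1) and (2, 3).
    A = np.array([[1.0, 1.0],
                  [1.0, 2.0]])
    y = np.array([1.0, 3.0])

    c = np.linalg.solve(A, y)
    print(c)  # -> [-1.  2.], i.e. p(x) = -1 + 2x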
The general case of this technique generates a square matrix of size (n+1)-by-(n+1), with 1s in the first column (what I wouldn't give for a fixed-width font here):
|1 x_0 x_0^2 ... x_0^n| |c_0|   |y_0|
|1 x_1 x_1^2 ... x_1^n| |c_1| = |y_1|
|.  .    .   ...   .  | |...|   |...|
|1 x_n x_n^2 ... x_n^n| |c_n|   |y_n|

This is called a Vandermonde matrix. It's important to note that it would be singular if any two of the x values were identical, so all the x values must be distinct. If it turns out that multiple data points at the same x matter, one way to keep the matrix from becoming singular is to add information on the derivative at each point into the matrix. This would make something like the two points at the intersection of a loop distinct, as they'd have different slopes. A Vandermonde matrix augmented with this derivative information is called a Hermite matrix.
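To make the general construction concrete, here's a sketch (again mine, with NumPy; the data points are made up for illustration) that builds the Vandermonde matrix for arbitrary distinct x values and solves for the coefficients:

    import numpy as np

    # Made-up data with distinct x values.
    x = np.array([0.0, 1.0, 2.0, 4.0])
    y = np.array([1.0, 1.0, 3.0, 5.0])

    V = np.vander(x, increasing=True)  # columns are 1, x, x^2, ..., x^n
    c = np.linalg.solve(V, y)          # c[j] is the coefficient of x^j

    # The interpolant should pass through every data point exactly.
    # np.polyval expects highest-degree coefficients first, so reverse c.
    print(np.allclose(np.polyval(c[::-1], x), y))  # -> True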
I mentioned earlier that there's no way to know which degree of polynomial best represents a given set of data. It does turn out, though, to be easy to show that the interpolating polynomial of a given degree is unique: *any* two polynomials of that degree which interpolate the same data are really the same thing, just restatements of each other.
Suppose there exist two interpolating polynomials P(x) and Q(x) for some data, both of the same degree. Their difference D(x) is then another polynomial of at most that degree:
D(x) = P(x) - Q(x)

Since P and Q both interpolate the data, D(x_i) = P(x_i) - Q(x_i) = 0 for i = 0..n. At this point my notes break down a bit; the step they skip is that D has degree at most n but vanishes at all n+1 distinct points x_i, and the only polynomial of degree at most n with n+1 roots is the zero polynomial. So D is identically zero, and P(x) must equal Q(x) everywhere.
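Here's a quick numerical illustration of that uniqueness (my own sketch, not from class): solving the Vandermonde system and calling numpy.polyfit with full degree are two different routes to the interpolant, and they agree, up to rounding, at points between the data:

    import numpy as np

    # Same made-up data as before.
    x = np.array([0.0, 1.0, 2.0, 4.0])
    y = np.array([1.0, 1.0, 3.0, 5.0])

    c_vander = np.linalg.solve(np.vander(x, increasing=True), y)
    c_fit = np.polyfit(x, y, deg=len(x) - 1)  # highest-degree coefficient first

    t = np.linspace(0.0, 4.0, 9)
    p1 = np.polyval(c_vander[::-1], t)
    p2 = np.polyval(c_fit, t)
    print(np.allclose(p1, p2))  # -> True: same polynomial either way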
In any case, this fact that any two interpolating polynomials of an equal degree are really the same function (keep in mind that the coefficients associated with each representation can still be different) has uses primarily when you want to shift and scale data away from very large or very small magnitudes to reduce computational error.
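Here's a sketch of what that rescaling buys you (my own example; the year-like x values are made up): shifting and scaling the x values before building the Vandermonde matrix makes it far better conditioned, even though the underlying interpolant, viewed as a function of the original data, is unchanged:

    import numpy as np

    # Made-up year-like x values with large magnitudes.
    x = np.array([1998.0, 2000.0, 2002.0, 2004.0, 2006.0])
    x_scaled = (x - x.mean()) / 4.0  # centered, small-magnitude values

    print(np.linalg.cond(np.vander(x, increasing=True)))         # enormous
    print(np.linalg.cond(np.vander(x_scaled, increasing=True)))  # modest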
The last thing I wanted to note is that these polynomials end up looking a lot like Taylor series. In fact, a Taylor polynomial is just an extreme degenerate case of interpolation in which all of the interpolation points collapse into a single point and all of the information comes from derivatives (the 1st, 2nd, ..., nth derivatives of P) at that point.