Determining discrete data-set extremes

Mike J Courtney

When collecting or computing discrete data (particularly time-varying data), the resulting resolution often is limited by influences such as computational burden, A/D sampling rates, storage limitations, and transmit rates. This can be a problem when you are interested in the data that describes the minimum and/or maximum values within the continuous function. Examples of this include the determination of the true peak of a computed-signal cross-correlation profile and the reconstruction of the precise peaks and valleys of an environmental measurement. Determining these minima and maxima, collectively referred to as "extrema," requires interpolation between the known discrete data points. However, if the interpolation resolution is too course, the true extrema might be overlooked; if the resolution is too fine, the computation takes entirely too long. William Press et al. point out a number of algorithms for more efficiently finding the extrema of a data set in Numerical Recipes in C, Second Edition (Cambridge University Press, 1988). These algorithms typically consist of bracketed searches and are often iterative. In this article, I'll present an alternative approach derived from the well-established equation for cubic-spline interpolation. This approach performs a direct, noniterative determination of the extrema of a discrete data set.

Algorithm Overview

The cubic spline extrema algorithm computes the relative extrema (i.e. maximums and minimums) of the continuous function that describes a discrete data set.

It uses all the given data to compute second derivatives at each point, which are called knots in cubic spline parlance. Incidentally, the term 'spline' originates from drafting, in which pieces of wood were used to 'draw' smooth curves by bending them between knots. Those pieces of wood were called splines. The shape made by the spline between the knots is essentially a third degree (or cubic) polynomial. A disadvantage of tradition polynomial interpolators that attempt to solve complicated data-fitting is that they attempt to derive a single algorithm that satisfies all the data points. And furthermore, iterative searching and interpolation within those curves is required to hunt for the extrema.

However, spline interpolation computes a different polynomial at each interval. And my algorithm takes that a step further by forcing constraints at the knots while computing derivatives such that we yield equations that describe the existence and location of any extrema within each spline. This process directly yields the x values without the need for iterative searching. The y values are then computed based upon the associated spline equation. The original derivation of the algorithm spans several handwritten pages that I still possess. Here are the final equations that came out of that work.

Algorithm Derivation

The goal of cubic spline intepolation is to derive an interpolation formula in which the the first and second derivatives of the spline polynomials are equals at the knots. This results in a formula with interval splines that intersect at the knots while exhibiting a smooth transition from one interval to the next. Given a data set described by the general function yj = y(xj), the linear interpolation in the interval between xi and xi+1 can be expressed as equation 1:
  y = Ayi + Byi+1 + Cy" + Dy"i+1
where " denotes the second derivative and the following apply:
  A = (xi+1 - x) / (xi+1 - xi)
  B = 1 - A
  C = ⅙ [(A3 - A)(xi+1 - xi)2]
  D = ⅙ [(B3 - B)(xi+1 - xi)2]
The first derivative of our equation for y is denoted as dy/dx and is solved as equation 2:
  dy/dx = (yi+1 - yi) / (xi+1 - xi) - ⅙ [(3A2 - 1)(xi+1 - xi) * y"i] + ⅙ [(3B2 - 1)(xi+1 - xi) * y"i+1]

If you then take this further and set the above equation to 0, then a new equation can be derived that represents the extrema of our first equation. This then allows the identification of the points at which y remains constant with respect to finite changes in x.
Expressing equation 2 in the quadratic form ax2 + bx + c = 0 such that x can be solved, we get the coefficients of the quadratic:
  a = 3(y"i+1 - y"i)
  b = 6(xi+1y"i - xiy"i+1)
  c = 6yi+1 - 6yi - (2x2i+1 - x2i + 2xixi+1) * y"i - (x2i+1 - 2x2i - 2xixi+1) * y"i+1
and solving for that quadratic is essentially the foundation of cubic extrema equation.
Using these cubic-extrema quadratic coefficents and quadratic root solver yields the candidate extrema. If they lie within the current interval of examination then they are a valid abscissa value at which the ordinate extrema exists. If not, then no extrema lies within that particular spline interval.

Overall Code Flow

The code steps for applying the algorithm are:

Compute the Second Derivatives

The derivative form of the cubic spline has everything required to perform the calculation of the quadratic coefficents except the second derivatives of the input data ordinates y". Those can be computed as the solution of a system of N spline equations in N unknowns. The equations can be represented in symmetric triagonal form, and thus can be solved with a degenerate version of Gaussian elimination.

Compute the Quadratic Roots

Once we have the second derivatives at the knows, we can solve for the quadratic coefficients shown earlier. Because one or more coefficents may be very small, the results are computed using a quadratic solver that avoids accuracy errors. The software implementation includes checks for conditions that could result in distastrous operations, such as taking the square root of a negative number of dividing by zero. Either of these indicates early-on that an interval does not have an extrema.

Perform Bounds Checking

If at least one of the roots is a valid abscissa candidate, then bounds checking is performed to determine if the root lies within the current interval. It is does, then its a valid root and also an abscissa location at which an extremum lies.

Calculate the Extremum-Ordinate

If a valid abscissa has been found, then it's utilized to compute the corresponding ordinate value (aka the y value). Since the second derivative has already been computed, this calculation is easy using the standard equation for a cubic spline.