LINEAR MULTIPLE REGRESSION
SOLUTION EQUATIONS
This web page describes the linear least squares method for solving for the best combination of the two "constants," S1 and S2, that are to be multiplied with the two independent variables, X and Y, to predict the dependent variable, Z. The model for predicting Z is linear, which means that Z = S1 * X + S2 * Y (the multiplication symbol, *, seems to be higher than I'd like, so apologies for that!). The "best" combination of S1 and S2 values is the combination that minimizes the sum of squares of the differences between the measured dependent variable and the value predicted by the model. Since the model defines a plane surface with arbitrary tilts along both the X and Y axes, our task is to locate the tilted plane surface that best "fits" the dependent variable in the Z direction. A brute force method for determining the best combination of S1 and S2 would be to try all possible combinations of S1 and S2 values, evaluate each one by calculating the RMS difference between the Z values and the predicted values, and at the end of this infinitely long process choose as the solution the S1 and S2 values that in combination provide the best fit. However, there's a far easier method, described here.
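For illustration only, the brute force idea can be sketched in a few lines of Python (the programs written for this site are in QuickBASIC, so this is not the actual code). The function name brute_force_fit and the list of candidate values are hypothetical, and X, Y and Z are assumed to be lists of measurements with their averages already removed. A finite grid of candidates stands in for the "all possible combinations" of the text.

```python
def brute_force_fit(X, Y, Z, candidates):
    """Try every (S1, S2) pair drawn from `candidates` and keep the best one.

    X, Y, Z are equal-length lists with averages already removed.
    `candidates` is a hypothetical finite list of trial values; a real
    search over all possible values would never finish.
    """
    best = None  # (sum of squared differences, S1, S2)
    for s1 in candidates:
        for s2 in candidates:
            # Sum of squared differences between measured and predicted Z
            sds = sum((z - (s1 * x + s2 * y)) ** 2
                      for x, y, z in zip(X, Y, Z))
            if best is None or sds < best[0]:
                best = (sds, s1, s2)
    return best[1], best[2]
```

The answer can only be as fine as the grid of candidates, and the work grows with the square of the number of candidates, which is why the closed-form solution is preferred.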
I will use the notation x, y, z to represent the sequence of measured variables before averages are subtracted. For the task of this web site x represents "time" (i.e., horizontal location), y represents "theta," and z represents either U or V, the horizontal wind components. The sequences xi (i = 1 to N), yi and zi, have averages xbar, ybar and zbar. Subtracting the respective averages from the sequences yields sequences Xi, Yi, and Zi. (Note: I don't want to spend the time "subscripting," so the reader will have to get used to Xi meaning X with subscript i, etc. I also am unable to use Greek symbols, so I'll be using E to represent the Greek capital Sigma symbol. Apologies.) The independent variables are X and Y, and the dependent variable is Z. The model to be used is a linear combination of X and Y to predict Z, namely: Z = S1 * X + S2 * Y. The "constants" S1 and S2 determine the "fit" and are the two parameters we shall vary in search of a solution for the best fit. Each observation consists of a measured (with average removed) X, Y and Z, or Xi, Yi and Zi. For any combination of values for S1 and S2 we can calculate the predicted Z-values for all N observations according to Zpi = S1 * Xi + S2 * Yi. The parameter we wish to minimize is then E((Zi - Zpi)^2), or the sum of squares of all N differences. This sum of differences squared, SDS, is:
SDS = E((Zi - Zpi)^2) = E((Zi - (S1 * Xi + S2 * Yi))^2)
Expanding the square in the above expression gives 6 terms:
SDS = E(Zi^2) - 2 S1 E(XiZi) - 2 S2 E(YiZi) + S1^2 E(Xi^2) + 2 S1 S2 E(XiYi) + S2^2 E(Yi^2)
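As a quick numerical check, the sketch below (in Python, with hypothetical function names sds_direct and sds_expanded) evaluates SDS both directly from the differences and from the six-term expansion, so the two can be compared for any trial S1 and S2. The inputs are assumed to be lists with averages already removed.

```python
def sds_direct(X, Y, Z, s1, s2):
    """SDS computed directly as the sum of (Zi - Zpi)^2."""
    return sum((z - (s1 * x + s2 * y)) ** 2 for x, y, z in zip(X, Y, Z))


def sds_expanded(X, Y, Z, s1, s2):
    """SDS computed from the six-term expansion of the square."""
    sum_zz = sum(z * z for z in Z)               # E(Zi^2)
    sum_xz = sum(x * z for x, z in zip(X, Z))    # E(XiZi)
    sum_yz = sum(y * z for y, z in zip(Y, Z))    # E(YiZi)
    sum_xx = sum(x * x for x in X)               # E(Xi^2)
    sum_xy = sum(x * y for x, y in zip(X, Y))    # E(XiYi)
    sum_yy = sum(y * y for y in Y)               # E(Yi^2)
    return (sum_zz
            - 2 * s1 * sum_xz
            - 2 * s2 * sum_yz
            + s1 * s1 * sum_xx
            + 2 * s1 * s2 * sum_xy
            + s2 * s2 * sum_yy)
```

The two functions should agree to within floating-point rounding. The expanded form has the practical advantage that the six sums need to be computed only once per chunk, however many S1, S2 pairs are tried.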
Think of a 2-dimensional space, an S1-S2 space, and at each location there is a value for SDS, which we can use to define a surface in the third dimension. Our task is to find the lowest point of the surface and note the S1 and S2 values for that location. At this minimum location we know that the partial derivatives of SDS with respect to S1 and S2 will both be zero. So, the solution is to take the partial derivatives, leading to two equations, set them to zero, and solve the two equations for the two unknowns S1 and S2. Setting the derivatives to zero gives S1 E(Xi^2) + S2 E(XiYi) = E(XiZi) and S1 E(XiYi) + S2 E(Yi^2) = E(YiZi). When these are solved we get:
S1 = (c e - b f) / (a e - b b)
S2 = (c b - a f) / (b b - a e)
where a = E(Xi^2)
b = E(XiYi)
c = E(XiZi)
e = E(Yi^2)
f = E(YiZi)
Voila! We now have the equation that does the best job of predicting the dependent variable U (or V) from "time" and "theta" for a chunk of data with N sets of time/theta/U (or time/theta/V). S1 corresponds to the horizontal gradient, and S2 corresponds to the theta gradient (a version of vertical gradient).
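The closed-form expressions for S1 and S2 can be sketched in Python as follows. This is a minimal illustration and not the QuickBASIC code used for this site; the function name fit_plane is hypothetical, and the check for a zero denominator (which happens when X and Y are exactly proportional, or one of them is constant) is an addition of mine.

```python
def fit_plane(x, y, z):
    """Return (S1, S2) for the model Z = S1 * X + S2 * Y.

    x, y, z are equal-length lists of raw measurements; their averages
    are removed here before the sums a, b, c, e, f are formed.
    """
    n = len(x)
    xbar, ybar, zbar = sum(x) / n, sum(y) / n, sum(z) / n
    X = [v - xbar for v in x]
    Y = [v - ybar for v in y]
    Z = [v - zbar for v in z]

    a = sum(xi * xi for xi in X)               # E(Xi^2)
    b = sum(xi * yi for xi, yi in zip(X, Y))   # E(XiYi)
    c = sum(xi * zi for xi, zi in zip(X, Z))   # E(XiZi)
    e = sum(yi * yi for yi in Y)               # E(Yi^2)
    f = sum(yi * zi for yi, zi in zip(Y, Z))   # E(YiZi)

    det = a * e - b * b
    if det == 0:
        # Assumption: treat a singular system as an error rather than guess
        raise ValueError("X and Y are collinear or constant; no unique fit")

    s1 = (c * e - b * f) / det
    s2 = (a * f - b * c) / det   # same as (c b - a f) / (b b - a e)
    return s1, s2
```

For a chunk of data, calling fit_plane(time, theta, U) would give the S1 and S2 for U, and fit_plane(time, theta, V) those for V, where time, theta, U and V are placeholder names for the measured lists.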
The correlation coefficient squared is computed using concepts that I shall not describe, but with the following result:
R^2 = (E((S1 Xi + S2 Yi) Zi))^2 / ( E((S1 Xi + S2 Yi)^2) * E(Zi^2) )
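A sketch of this calculation in Python is given below, reading the formula as the squared sum of (predicted Z times measured Z) divided by the product of the two sums of squares. The function name r_squared is hypothetical, and the inputs are assumed to be lists with averages already removed, together with S1 and S2 values obtained beforehand.

```python
def r_squared(X, Y, Z, s1, s2):
    """Correlation coefficient squared between predicted and measured Z."""
    predicted = [s1 * x + s2 * y for x, y in zip(X, Y)]   # Zpi
    sum_pz = sum(p * z for p, z in zip(predicted, Z))     # E(Zpi Zi)
    sum_pp = sum(p * p for p in predicted)                # E(Zpi^2)
    sum_zz = sum(z * z for z in Z)                        # E(Zi^2)
    # Undefined (division by zero) if either sum of squares is zero
    return (sum_pz * sum_pz) / (sum_pp * sum_zz)
```

A perfect fit gives a value of 1, and the result cannot exceed 1 for any S1 and S2.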
QuickBASIC programs have been written that read in files containing the measured quantities time, temperature, theta, U, V, lapse rate and turbulence intensity (one line of data per second) and perform calculations that produce RRi for each chunk of the input data. Chunk sizes investigated so far are 31 seconds and 61 seconds. Weighted averages of these data are also computed, using the following weight for each chunk-based RRi:
Weight = C * (theta Population Standard Deviation) * (R^2 for U + R^2 for V) / (C + dUV/dt)
where dUV/dt = ((S1 for U)^2 + (S1 for V)^2)^0.5 is the rate of change of the wind vector with respect to time (horizontal location),
C is chosen to be 0.07, which is a typical value for the absolute value of dUV/dt,
and the Population SD of theta is (E(Yi^2))^0.5
Averaging times are subjectively selected. So far I've chosen
times of 30 seconds and 60 seconds.
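The weighting described above might be sketched in Python as follows. This is an illustration under stated assumptions, not the QuickBASIC code: the function names chunk_weight and weighted_average are hypothetical, the theta spread is passed in already computed, and the weighted average is assumed to be the usual sum of (weight times value) divided by the sum of weights.

```python
import math

C = 0.07  # value quoted in the text, typical of |dUV/dt|


def chunk_weight(theta_sd, r2_u, r2_v, s1_u, s1_v, c=C):
    """Weight for one chunk, following the Weight formula in the text.

    theta_sd   : theta Population SD for the chunk
    r2_u, r2_v : R^2 of the fits for U and for V
    s1_u, s1_v : S1 (time gradient) of the fits for U and for V
    """
    duv_dt = math.hypot(s1_u, s1_v)   # ((S1 for U)^2 + (S1 for V)^2)^0.5
    return c * theta_sd * (r2_u + r2_v) / (c + duv_dt)


def weighted_average(values, weights):
    """Weighted mean of chunk-based values (assumed standard definition)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero")
    return sum(v * w for v, w in zip(values, weights)) / total
```

With this form, chunks with a large theta spread and good fits for both U and V count for more, while chunks in which the wind vector changes rapidly with time count for less.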
____________________________________________________________________
This site opened: August 11, 2000. Last Update: August 11, 2000