CHAPTER 8

Statistical Retrieval Procedure

The previous chapter dealt with the Backus-Gilbert retrieval procedure, which is based on simple manipulations of weighting functions and corresponding manipulations of observables to arrive at a set of imaginary weighting functions, called averaging kernels, and the brightness temperatures that such averaging kernels would produce. The final step was to assign this hypothetical TB to an applicable altitude corresponding to the hypothetical TB's averaging kernel. The entire procedure had a minimum of assumptions, the principal one being the shape of the weighting function for each observable (frequency and elevation angle of viewing direction). A wide range of assumed temperature profiles would produce a similar set of weighting function shapes, so the Backus-Gilbert retrieval procedure should work in almost any atmospheric regime (tropical, mid-latitude, polar summer, polar winter, etc.).

In this chapter we will deal with a procedure that assumes we know what atmospheric regime we are flying in, and this added information leads to an incremental improvement in performance. However, it is often criticized for biasing the T(z) solution toward an average for that regime, so that when an unusual T(z) is encountered its unusual features will be under-represented in the T(z) solution. The user must be alert to this situation, and be prepared to take steps to avoid being misled. Again, I'm getting ahead of myself; let's describe what the "statistical retrieval procedure" is.

Concepts Underlying Statistical Retrieval Procedure

Consider the hypothetical situation of an MTP flying over many RAOB sites on many dates at altitude Zo. After completing this hypothetical set of observations it would be possible to create a data base of MTP observables associated with each known RAOB-based profile, T(z). The user could then pose the question "If I'm flying at altitude Zo and I want to know the most likely air temperature at altitude Z, how might I use this data base to determine T at Z?" One approach is to perform a multiple regression of T at Z (as the dependent variable) upon the MTP observables (as the independent variables). This could be done using a spreadsheet, for example. The multiple regression solution produces a coefficient for each observable, allowing T (at altitude Z) to be calculated in the following manner:

    T = C0 + C1 * O1 + C2 * O2 + ... + CN * ON

where there are N observables (N is typically between 10 and 30). The coefficients are called "retrieval coefficients." The number of RAOB flybys for such an analysis, R, would have to exceed N by a large amount, and it is generally recognized that R must exceed ~100 to assure that the coefficients Ci are useful.
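
The regression described above can be sketched in a few lines of Python with NumPy. This is only a minimal illustration under stated assumptions: the function names are hypothetical, and the "observables" are synthetic random numbers, not MTP or RAOB data. It fits C0 ... CN by ordinary least squares for a single target altitude Z and then applies them to one set of observables.

```python
import numpy as np

def fit_retrieval_coefficients(observables, temperatures):
    """Least-squares fit of T = C0 + C1*O1 + ... + CN*ON.

    observables:  (R, N) array, one row of N observables per RAOB flyby.
    temperatures: (R,) array, RAOB temperature at the target altitude Z.
    Returns an array of length N+1: [C0, C1, ..., CN].
    """
    obs = np.asarray(observables, dtype=float)
    temps = np.asarray(temperatures, dtype=float)
    # A leading column of ones lets the fit solve for the constant C0.
    design = np.column_stack([np.ones(obs.shape[0]), obs])
    coeffs, *_ = np.linalg.lstsq(design, temps, rcond=None)
    return coeffs

def retrieve_temperature(coeffs, observables):
    """Apply T = C0 + sum(Ci * Oi) to one set of N observables."""
    return coeffs[0] + np.dot(observables, coeffs[1:])

# Synthetic demonstration (made-up numbers, chosen only to exercise the code).
rng = np.random.default_rng(0)
R, N = 300, 10                        # R flybys well in excess of N observables
true_coeffs = rng.normal(size=N + 1)  # hypothetical "true" coefficients
O = rng.normal(loc=220.0, scale=5.0, size=(R, N))
T = true_coeffs[0] + O @ true_coeffs[1:]

fitted = fit_retrieval_coefficients(O, T)
```

Because the synthetic T values are noise-free, the fit recovers the coefficients used to generate them. With real data the fit would instead return the coefficients that minimize the RMS temperature residual over the R cases.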

In order to derive T for another altitude the above procedure could be repeated. If there are L altitudes for which coefficients are calculated then there will be an N x L matrix of coefficients that can be used to calculate a profile of T(z) for L discrete altitudes.
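
As a sketch of the multi-altitude case, the same least-squares fit can be run for all L altitudes at once. The function names and the synthetic data are assumptions made for illustration. Note that this sketch stores the constant terms as an extra first row, so its matrix has N+1 rows rather than N.

```python
import numpy as np

def fit_profile_coefficients(observables, profiles):
    """Fit retrieval coefficients for L altitudes in one call.

    observables: (R, N) array of observables, one row per RAOB case.
    profiles:    (R, L) array of RAOB temperatures at L altitudes.
    Returns an (N+1, L) matrix; row 0 holds the constant term C0
    for each altitude, rows 1..N hold C1..CN.
    """
    obs = np.asarray(observables, dtype=float)
    design = np.column_stack([np.ones(obs.shape[0]), obs])
    coeff_matrix, *_ = np.linalg.lstsq(
        design, np.asarray(profiles, dtype=float), rcond=None)
    return coeff_matrix

def retrieve_profile(coeff_matrix, observables):
    """Return T at each of the L altitudes for one set of N observables."""
    return coeff_matrix[0] + np.asarray(observables, dtype=float) @ coeff_matrix[1:]

# Synthetic demonstration (made-up numbers, not MTP data).
rng = np.random.default_rng(1)
R, N, L = 200, 8, 5
true_matrix = rng.normal(size=(N + 1, L))
O = rng.normal(220.0, 5.0, size=(R, N))
profiles = true_matrix[0] + O @ true_matrix[1:]

coeff_matrix = fit_profile_coefficients(O, profiles)
profile = retrieve_profile(coeff_matrix, O[0])   # L temperatures
```

Each column of the matrix is the coefficient set for one retrieved altitude, so retrieving a full profile is a single matrix product.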

For flight at another altitude the entire procedure, above, would have to be repeated. When this has been done for a range of altitudes, such as 6, 8, 10, ... 20 km, the collection of retrieval coefficients for all flight altitudes can be referred to as a "retrieval coefficient set." A retrieval coefficient set obtained in this way would be useable for flight at any altitude for future flights provided the flights were made in the same geographical region during the same season. Henceforth I shall use the term RCs to mean "retrieval coefficients" and also "a retrieval coefficient set."

Clearly, it is not feasible to fly in a specific area during a specific season for the hundreds of hours needed to obtain several hundred cases of simultaneous RAOB data simply to be prepared for the science flights at that location and season. Instead of using actual MTP observations as input to the multiple regression analyses it is possible to use "calculated observables." This oxymoron-sounding terminology is not misleading, since it should be possible to calculate what a perfectly calibrated MTP would observe if it were placed at a specified altitude in an atmosphere with T(z) given by a RAOB. This, in fact, is a practical way for deriving RCs.

Dealing With Stochastic Uncertainties and Systematic Errors

It is impossible to achieve a perfect calibration of any radiometer. Fortunately, there's a way to allow for an imperfectly calibrated MTP when calculating RCs. The process can best be described using matrix terminology, which I will not do in this web page. I will merely state that an "error matrix" can be created (the diagonal elements of an N x N array are used to specify estimated systematic uncertainties for the N observables), and this matrix is added to an auto-covariance matrix of the observables before it is inverted, etc. The matrix manipulation procedure which I will not describe is equivalent to repeating the perfect MTP calculation of RCs many times, where each time the perfect observables have been altered in a stochastic manner that simulates systematic uncertainties.
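
One common way to write the matrix step alluded to above is sketched here. It is an assumption, not a transcription of the actual MTP software: the slopes are taken as the cross-covariance of T with the observables multiplied by the inverse of (auto-covariance + error matrix), and the diagonal of the error matrix is assumed to hold squared uncertainties (variances). The function name and the synthetic data are hypothetical.

```python
import numpy as np

def fit_with_error_matrix(observables, temperatures, sigma):
    """Retrieval coefficients allowing for imperfect observables.

    observables:  (R, N) array of calculated ("perfect") observables.
    temperatures: (R,) array of RAOB temperature at the target altitude.
    sigma:        length-N array of assumed 1-sigma uncertainties,
                  one per observable (assumption: squared on the diagonal).
    Returns [C0, C1, ..., CN].
    """
    obs = np.asarray(observables, dtype=float)
    temps = np.asarray(temperatures, dtype=float)
    obs_mean = obs.mean(axis=0)
    temp_mean = temps.mean()
    d_obs = obs - obs_mean
    d_temp = temps - temp_mean
    n = obs.shape[0]
    cov_oo = d_obs.T @ d_obs / (n - 1)   # N x N auto-covariance of observables
    cov_to = d_temp @ d_obs / (n - 1)    # cross-covariance of T with observables
    error_matrix = np.diag(np.asarray(sigma, dtype=float) ** 2)
    # Solve (cov_oo + error_matrix) x = cov_to rather than inverting explicitly.
    slopes = np.linalg.solve(cov_oo + error_matrix, cov_to)
    c0 = temp_mean - slopes @ obs_mean
    return np.concatenate([[c0], slopes])

# Synthetic demonstration (made-up numbers, not MTP data).
rng = np.random.default_rng(2)
R, N = 400, 6
O = rng.normal(220.0, 5.0, size=(R, N))
T = 50.0 + O @ rng.normal(size=N) + rng.normal(scale=1.0, size=R)

coeffs_perfect = fit_with_error_matrix(O, T, np.zeros(N))      # no error matrix
coeffs_hedged = fit_with_error_matrix(O, T, np.full(N, 2.0))   # 2 K per observable
```

With a zero error matrix this reduces to the ordinary multiple regression. A non-zero error matrix shrinks the coefficients C1 ... CN, which makes the retrieved T less sensitive to errors in the observables at the cost of pulling the solution toward the archive mean.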

Even if the MTP were perfectly calibrated it is still subject to a stochastic component of noise on each observable. This stochastic component of observable uncertainty is usually less important than the systematic error component, but there is a way to allow for it when calculating RCs. Usually the two components (stochastic and systematic) are "lumped together" (orthogonally added) and treated in the manner alluded to above.
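
"Orthogonal addition" here means a root-sum-square combination. A minimal sketch follows, with made-up uncertainty values in kelvin and a hypothetical function name. Placing the squared totals on the diagonal of the error matrix is an assumption of this sketch.

```python
import numpy as np

def combined_uncertainty(stochastic, systematic):
    """Root-sum-square ("orthogonal") combination, element by element."""
    return np.hypot(stochastic, systematic)

# Hypothetical per-observable uncertainties in kelvin (made-up values).
stochastic = np.array([0.3, 0.1, 0.2])
systematic = np.array([0.4, 0.5, 0.2])

total = combined_uncertainty(stochastic, systematic)

# Assumption: the error matrix carries variances on its diagonal.
error_matrix = np.diag(total ** 2)
```

For the first observable, 0.3 K and 0.4 K combine to about 0.5 K. For the second, the 0.1 K stochastic term adds almost nothing to the 0.5 K systematic term, which illustrates why the larger component dominates.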

This completes my description of the "statistical retrieval" procedure. It has emphasized underlying concepts instead of procedural steps. Additional procedural information can be found in many places, including my Rockwell tutorial web page http://brucegary.net/RKW/. Also, Dr. M. J. Mahoney has become expert in many sophisticated versions of the statistical retrieval procedure and I prefer to refer the reader to his web pages on this subject (when they become available).

Comparison with Backus-Gilbert

It is interesting to compare the values of statistical RCs with the values of corresponding Backus-Gilbert coefficients. Recall that both approaches make use of the same expression for calculating temperature at a specific altitude:

    T = C0 + C1 * O1 + C2 * O2 + ... + CN * ON

When the coefficient values are ordered by their associated observable's applicable altitude, the plots for the two approaches are similar. As shown in the next figure the coefficient values typically start out small in absolute value, then go negative before abruptly growing to high positive values in the region corresponding to the observable that contains the most "information" about the altitude in question, then decrease and go negative, then oscillate above and below zero while decaying to vanishingly small values. This intuitively expected behavior is just a restatement of where information resides among the series of observables for the altitude under consideration. The presence of negative regions astride the main positive region simply means that the statistical retrieval procedure leads to retrieved T(z) that has sharper structural features than the "simplest possible" procedure described in Chapter 6 - in which TB(Ra) is converted to T(z) by assuming T = TB and z = Zo+Ra*sine(theta). One advantage of the statistical retrieval over Backus-Gilbert is that its RCs are easier to calculate.

[Figure: RC vs observable range]

Figure 8.1. Typical shape of RC values plotted versus the applicable altitude of the associated observable. This RC sequence is for retrieving temperature at an altitude of 2.4 km.

There's another property of the statistical retrieval procedure that should be kept in mind. Since it involves finding a solution that minimizes the RMS of residuals (in observable space) it will produce solutions that tend to resemble the average of the RAOBs used in the simulation archive. This is both a strength and a weakness of the statistical retrieval procedure. It is a strength when flying through air that resembles the archive, but it is a weakness when flying through air that differs from the archive. In other words, when a rare but significant atmospheric situation is encountered the simple statistical retrieval procedure is likely to produce T(z) results that are inferior to a Backus-Gilbert based retrieval. Fortunately, the statistical retrieval user has a way to recognize this situation and avoid naively accepting a misleading result. It involves comparing observables with the archive average observables, and when they differ by more than some subjective threshold an automatic search for a new set of RCs can be performed. This is a strength of the statistical retrieval procedure in which Dr. Mahoney has become expert.
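
The comparison with archive average observables might be sketched as follows. The text does not specify how the difference is measured, so this sketch assumes one plausible choice: each difference is scaled by the archive's standard deviation for that observable, and the RMS of the scaled differences is compared with a user-chosen threshold. The function names, the default threshold of 2.0, and the data are all hypothetical.

```python
import numpy as np

def archive_statistics(archive_observables):
    """Mean and standard deviation of each observable over the archive."""
    arc = np.asarray(archive_observables, dtype=float)
    return arc.mean(axis=0), arc.std(axis=0, ddof=1)

def departs_from_archive(observed, archive_mean, archive_std, threshold=2.0):
    """Flag a set of observables that is unlike the archive average.

    Returns (flag, rms), where rms is the RMS difference from the archive
    mean in units of the archive standard deviation, and flag is True
    when rms exceeds the (subjective) threshold.
    """
    z = (np.asarray(observed, dtype=float) - archive_mean) / archive_std
    rms = float(np.sqrt(np.mean(z ** 2)))
    return rms > threshold, rms

# Synthetic archive (made-up numbers, not MTP data).
rng = np.random.default_rng(3)
archive = rng.normal(220.0, 5.0, size=(300, 10))
mean, std = archive_statistics(archive)

typical = mean + 0.5 * std    # half a standard deviation from the average
unusual = mean + 4.0 * std    # far outside the archive's usual range

flag_typical, rms_typical = departs_from_archive(typical, mean, std)
flag_unusual, rms_unusual = departs_from_archive(unusual, mean, std)
```

In this sketch only the second case is flagged. A flag would be the cue to search for a different set of RCs rather than to accept the retrieved T(z) at face value.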

The next chapter describes ways to overcome the limitations of the statistical retrieval procedure by 1) stratifying RCs, 2) trial and error RC selection, and 3) calculation of RCs based on a post-mission selection of RAOBs resembling those encountered during the mission. The chapter after that describes a totally new procedure, a search for the best fit within an immense data base of pre-calculated observables for selecting T(z). It also describes a "mutation and evolution" retrieval procedure that is very amusing for its oddity.

____________________________________________________________________

This site opened:  July 24, 2005 Last Update:  June 22, 2007