Principal Components Analysis [1]

Principal components analysis (PCA) is a mathematical method for reorganising the information in a set of data. Its purpose is to derive a small number of linear combinations (principal components) of a set of variables (population characteristics) that retain the maximal amount of information about those characteristics. Here, "maximal amount of information" means that the components explain as much of the variance in the original data as possible.
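
The definition above can be written out as a short formal sketch. The notation is an assumption made here for illustration and does not come from the sources cited: x is the vector of p centred variables, \Sigma is its covariance matrix, w_k is the vector of weights defining the k-th linear combination, and z_k is the resulting principal component.

```latex
% First principal component: the unit-length weight vector whose
% linear combination of the variables has the largest variance.
\[
  w_1 = \arg\max_{\|w\| = 1} \mathrm{Var}(w^\top x)
      = \arg\max_{\|w\| = 1} w^\top \Sigma\, w ,
  \qquad z_1 = w_1^\top x .
\]
% Each later component solves the same problem while staying
% orthogonal to the weight vectors already chosen.
\[
  w_k = \arg\max_{\substack{\|w\| = 1 \\ w \perp w_1, \dots, w_{k-1}}} w^\top \Sigma\, w ,
  \qquad z_k = w_k^\top x .
\]
% The solutions are eigenvectors of the covariance matrix,
% ordered by decreasing eigenvalue.
\[
  \Sigma\, w_k = \lambda_k w_k ,
  \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0 .
\]
% Share of the total variance explained by the first m components.
\[
  \frac{\sum_{k=1}^{m} \lambda_k}{\sum_{k=1}^{p} \lambda_k}
\]
```

In this notation, retaining "a small number" of components means choosing m much smaller than p while keeping the last ratio close to 1.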

For example, a multi-country survey may collect the following data:

    age, sex, marital status, race, ethnicity, ancestry, national citizenship, refugee status, disability status, rural/urban residency, nation of residency, household size, number of dependents, health insurance status, immunisation status, general health status, nourishment status, daily protein intake, vitamin intake, medicines taken, adjusted gross income, alimony received, child support received, employment status, years of employment, worker class (blue collar, white collar, etc.), occupation, education level, school enrollment status, hours worked per week, housing unit type, enrollment in government assistance status, etc.

Interpreting that volume of data can be very difficult. PCA allows those different characteristics of the respondents to be summarised in a few derived variables (the principal components), which can then be compared across respondents.
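
As a rough illustration of that summarising step, the following Python sketch reduces a table of respondents to two component scores each. It is a minimal sketch under stated assumptions, not a recommended procedure: the data are synthetic stand-ins rather than real survey responses, the function name `pca_summary` is hypothetical, every variable is assumed to be already coded numerically (categorical items such as occupation would need encoding first), and each variable is standardised so that no single measurement unit dominates.

```python
import numpy as np


def pca_summary(data, n_components):
    """Summarise a respondents-by-variables array with a few principal components.

    Returns (scores, components, explained_ratio). Assumes every column is
    numeric and has non-zero variance.
    """
    # Centre and scale each column so all variables have mean 0 and variance 1.
    centred = data - data.mean(axis=0)
    standardised = centred / data.std(axis=0, ddof=1)

    # Singular value decomposition: the rows of vt are the component
    # directions, ordered from most to least variance explained.
    _, singular_values, vt = np.linalg.svd(standardised, full_matrices=False)

    # Variance carried by each component, and its share of the total.
    variances = singular_values ** 2 / (data.shape[0] - 1)
    explained_ratio = variances / variances.sum()

    components = vt[:n_components]
    # Each respondent's scores are the linear combinations of their
    # standardised answers, weighted by the component directions.
    scores = standardised @ components.T
    return scores, components, explained_ratio[:n_components]


# Synthetic stand-in for a survey: 500 respondents, 8 numeric variables
# driven by 2 hidden factors plus noise (purely illustrative values).
rng = np.random.default_rng(0)
n_respondents, n_variables = 500, 8
latent = rng.normal(size=(n_respondents, 2))
loadings = rng.normal(size=(2, n_variables))
survey = latent @ loadings + 0.3 * rng.normal(size=(n_respondents, n_variables))

scores, components, ratio = pca_summary(survey, n_components=2)
print("scores shape:", scores.shape)
print("share of variance explained by 2 components:", ratio.sum())
```

Each row of `scores` is one respondent described by two numbers instead of eight, so respondents can be compared or plotted on those two axes. The value of `ratio.sum()` indicates how much of the original variance that shorter description retains.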


1. This definition is based on the Wikipedia definition for Principal Components Analysis, www.statistics.com/resources/glossary/p/pca.php, and www.spectroscopyeurope.com/TD_16_6.pdf (accessed 28 December 2006).