
eBook - ePub
Exploratory Multivariate Analysis by Example Using R
- 262 pages
- English
About this book
Full of real-world case studies and practical advice, Exploratory Multivariate Analysis by Example Using R, Second Edition focuses on four fundamental methods of multivariate exploratory data analysis that are most suitable for applications. It covers principal component analysis (PCA) when variables are quantitative, correspondence analysis (CA) a
1 Principal Component Analysis (PCA)
1.1 Data — Notation — Examples
Principal component analysis (PCA) applies to data tables where rows are considered as individuals and columns as quantitative variables. Let $x_{ik}$ be the value taken by individual i for variable k, where i varies from 1 to I and k from 1 to K.
Let $\bar{x}_k$ denote the mean of variable k calculated over all I individuals:
$$\bar{x}_k = \frac{1}{I} \sum_{i=1}^{I} x_{ik},$$
and $s_k$ the standard deviation of the sample of variable k (uncorrected):
$$s_k = \sqrt{\frac{1}{I} \sum_{i=1}^{I} \left(x_{ik} - \bar{x}_k\right)^2}.$$
Data subjected to a PCA can be very diverse in nature; some examples are listed in Table 1.1.
This first chapter will be illustrated using the “orange juice” dataset chosen for its simplicity since it comprises only six statistical individuals or observations. The six orange juices were evaluated by a panel of experts according to seven sensory variables (odour intensity, odour typicality, pulp content, intensity of taste, acidity, bitterness, sweetness). The panel’s evaluations are summarised in Table 1.2.
1.2 Objectives
The data table can be considered either as a set of rows (individuals) or as a set of columns (variables), thus raising a number of questions relating to these different types of objects.
TABLE 1.1
Some Examples of Datasets
Field | Individuals | Variables | $x_{ik}$ |
Ecology | Rivers | Concentration of pollutants | Concentration of pollutant k in river i |
Economics | Years | Economic indicators | Indicator value k for year i |
Genetics | Patients | Genes | Expression of gene k for patient i |
Marketing | Brands | Measures of satisfaction | Value of measure k for brand i |
Pedology | Soils | Granulometric composition | Content of component k in soil i |
Biology | Animals | Measurements | Measure k for animal i |
Sociology | Social classes | Time by activity | Time spent on activity k by individuals from social class i |
TABLE 1.2
The Orange Juice Data

1.2.1 Studying Individuals
Figure 1.1 illustrates the types of questions posed during the study of individuals. This diagram represents three different situations where 40 individuals are described in terms of two variables: j and k. In graph A, we can clearly identify two distinct classes of individuals. Graph B illustrates a dimension of variability which opposes extreme individuals, much like graph A, but which also contains less extreme individuals. The cloud of individuals is therefore long and thin. Graph C depicts a more uniform cloud (i.e., with no specific structure).
Interpreting the data depicted in these examples is relatively straightforward as they are two dimensional. However, when individuals are described by a large number of variables, we require a tool to explore the space in which these individuals evolve. Studying individuals means identifying the similarities between individuals from the point of view of all the variables. In other words, the aim is to provide a typology of the individuals: which individuals are the most similar (and which the most dissimilar)? Are there groups of individuals which are homogeneous in terms of their similarities? In addition, we should look for common dimensions of variability which oppose extreme and intermediate individuals.

FIGURE 1.1
Representation of 40 individuals described by two variables: j and k.
In the example, two orange juices are considered similar if they were evaluated in the same way according to all the sensory descriptors. In such cases, the two orange juices have the same main dimensions of variability and are thus said to have the same sensory “profile.” More generally, we want to know whether or not there are groups of orange juices with similar profiles, that is, sensory dimensions which might oppose extreme juices to more intermediate juices.
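One simple way to make "similar from the point of view of all the variables" concrete is to compute a distance between every pair of individuals. The sketch below assumes the Euclidean distance on raw scores, which is only one possible choice and presumes that the variables are on comparable scales. The individuals `A` to `D` and their scores are hypothetical, as are the names `scores` and `euclidean`.

```python
import math
from itertools import combinations

# Hypothetical individuals described by three variables (invented values).
scores = {
    "A": [1.0, 2.0, 3.0],
    "B": [1.0, 2.5, 3.0],
    "C": [5.0, 1.0, 0.0],
    "D": [4.0, 6.0, 2.0],
}


def euclidean(u, v):
    """Euclidean distance between two individuals over all variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Distance for every unordered pair of individuals.
pair_distances = {
    (p, q): euclidean(scores[p], scores[q])
    for p, q in combinations(sorted(scores), 2)
}

most_similar = min(pair_distances, key=pair_distances.get)
most_dissimilar = max(pair_distances, key=pair_distances.get)

for pair, d in sorted(pair_distances.items(), key=lambda item: item[1]):
    print(pair, round(d, 3))
print("most similar:", most_similar)
print("most dissimilar:", most_dissimilar)
```

With six individuals, as in the orange juice example, such a table of pairwise distances is still readable; with many individuals and variables it quickly becomes unmanageable, which is the motivation for a synthetic representation.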
1.2.2 Studying Variables
Following the approach taken to study the individuals, might it also be possible to interpret the data from the variables? PCA focuses on the linear relationships between variables. More complex links also exist, such as quadratic, logarithmic, or exponential relationships, but they are not studied in PCA. This may seem restrictive, but in practice many relationships can be considered linear, at least as a first approximation.
Let us consider the example of the four variables (j, k, l, and m) in Figure 1.2. The clouds of points constructed by working from pairs of variables show that variables j and k (graph A) as well as variables l and m (graph F) are strongly correlated (positively for j and k and negatively for l and m). However, the other graphs do not show any signs of relationships between variables. The study of these variables also suggests that the four variables are split into two groups of two variables, (j, k) and (l, m), and that, within one group, the variables are strongly correlated, whereas between groups, the variables are uncorrelated. In exactly the same way as for constructing groups of individuals, creating groups of variables may be useful with a view to synthesis. As for the individuals, we identify a continuum with groups of both very unusual variables and intermediate variables, which are to some extent linked to both groups. In the example, each group can be represented by one single variable as the variables within each group are very strongly correlated. We refer to these variables as synthetic variables.
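The situation described above, with two groups of two variables, can be reproduced on a small invented example by computing the linear correlation coefficient for every pair of variables. This is a sketch only: the values of `j`, `k`, `l`, and `m` below are hypothetical and were constructed so that the between-group correlations are zero, and `pearson` is an illustrative helper written for this example.

```python
import math


def pearson(u, v):
    """Linear (Pearson) correlation coefficient between two variables."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


# Hypothetical values for six individuals (invented for illustration).
variables = {
    "j": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "k": [1.0, 3.0, 2.0, 5.0, 4.0, 6.0],
    "l": [1.0, 4.0, 6.0, 6.0, 4.0, 1.0],
    "m": [6.0, 3.0, 2.0, 2.0, 3.0, 6.0],
}

names = list(variables)
corr = {
    (a, b): pearson(variables[a], variables[b])
    for a in names
    for b in names
}

# Print the correlation matrix, one row per variable.
print("    " + " ".join(f"{b:>6}" for b in names))
for a in names:
    print(f"{a:>3} " + " ".join(f"{corr[(a, b)]:6.2f}" for b in names))
```

In the printed matrix, the pair (j, k) is strongly positively correlated, the pair (l, m) is strongly negatively correlated, and the correlations between the two groups are zero by construction.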

FIGURE 1.2
Representation of the relationships between four variables: j, k, l, and m, taken two-by-two.
When confronted with a very small number of variables, it is possible to draw conclusions from the clouds of points, or from the correlation matrix which groups together all of the linear correlation coefficients r(j, k) between the pairs of variables. However, when working with a great...
Table of contents
- Cover
- Title Page
- Copyright Page
- Table of Contents
- Preface
- 1 Principal Component Analysis (PCA)
- 2 Correspondence Analysis (CA)
- 3 Multiple Correspondence Analysis (MCA)
- 4 Clustering
- 5 Visualisation
- Appendix
- Bibliography of Software Packages
- Bibliography
- Index
Frequently asked questions
Yes, you can access Exploratory Multivariate Analysis by Example Using R by Francois Husson, Sebastien Le, and Jérôme Pagès in PDF and/or ePUB format, as well as other popular books in Mathematics & Computer Science General. We have over 1.5 million books available in our catalogue for you to explore.