Suppose you have a dataset with missing names, and you want to
quickly identify the relationships between the variables: find the 'Principal
Components'.
>library(ggplot2)
>data(mpg)
>data <- mpg[,c("displ", "year", "cyl", "cty", "hwy")]
# get the numeric columns only for this easy demo
>prcomp(data, scale=TRUE)
Standard deviations:
[1] 1.8758132 1.0069712 0.5971261 0.2658375 0.2002613
Rotation:
              PC1         PC2        PC3         PC4         PC5
displ  0.49818034 -0.07540283  0.4897111  0.70386376 -0.10435326
year   0.06047629 -0.98055060 -0.1846807 -0.01604536  0.02233245
cyl    0.49820578 -0.04868461  0.5028416 -0.68062021  0.18255766
cty   -0.50575849 -0.09911736  0.4348234  0.15195854  0.72264881
hwy   -0.49412379 -0.14366800  0.5330619 -0.13410105 -0.65807527
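Before interpreting the numbers, it may help to see where they come from. prcomp() works through a singular value decomposition of the centered and scaled data, so both parts of the output can be reproduced with svd(). The sketch below is only an illustration: it assumes the built-in mtcars dataset (and an arbitrary choice of five of its columns) as a stand-in for mpg, so that it runs without ggplot2.

```r
# Assumption: mtcars (shipped with base R) stands in for mpg here,
# and the five columns are an arbitrary numeric subset.
X <- as.matrix(mtcars[, c("mpg", "cyl", "disp", "hp", "wt")])

Z   <- scale(X)                  # center and scale every column
s   <- svd(Z)                    # Z = U D V'
pca <- prcomp(X, scale. = TRUE)

# Singular values divided by sqrt(n - 1) give the standard deviations
sdev_from_svd <- s$d / sqrt(nrow(Z) - 1)
print(all.equal(sdev_from_svd, pca$sdev))

# V is the rotation matrix (compared in absolute value, since the
# sign of each column is arbitrary)
print(all.equal(abs(s$v), abs(unname(pca$rotation))))
```

Both comparisons should agree up to floating point error, which is the sense in which the standard deviations are "the diagonal matrix in the middle".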
Here is how you interpret the result:
(1) The standard deviations come from the diagonal matrix in the middle of the
singular value decomposition: they are the singular values of the centered and
scaled data, divided by sqrt(n - 1). Their squares tell you how much of the
total variance in the matrix each 'Principal Component' (think of it as a
layer, or a transparency) explains. For example,
70% = 1.8758132^2 / (1.8758132^2 + 1.0069712^2 + 0.5971261^2 + 0.2658375^2 + 0.2002613^2)
which indicates that the first component by itself already explains 70% of the
variance in the whole matrix.
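The same arithmetic can be done for every component at once. This is a small sketch that simply copies the standard deviations from the prcomp() output above; the variable names are made up for the illustration.

```r
# Standard deviations copied from the prcomp() output above
sdev <- c(1.8758132, 1.0069712, 0.5971261, 0.2658375, 0.2002613)

variances <- sdev^2                      # variance carried by each component
share     <- variances / sum(variances)  # proportion of the total variance

print(round(share, 2))    # the first entry is about 0.70
print(cumsum(share))      # running total across components
```

If you keep the fitted object around, summary() on a prcomp result reports the same figures as "Proportion of Variance" and "Cumulative Proportion".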
(2) Now let's look at the first column of the rotation matrix (V):
PC1
displ 0.49818034
year 0.06047629
cyl 0.49820578
cty -0.50575849
hwy -0.49412379
We can see: displ has a positive relation with cyl and a negative relation
with cty and hwy. In this dominant layer, year barely contributes.
This makes sense: the more displacement or cylinders your car has, the lower
its MPG is likely to be.
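To make the "dominant layer" reading concrete: because the data was scaled, the correlation matrix equals V diag(sdev^2) V', so the first component alone contributes sdev[1]^2 times the outer product of the PC1 column with itself. The sketch below rebuilds that one layer from the numbers printed above. It is a rough rank-one approximation of the correlations, not the correlations themselves, and the names sdev1, pc1 and layer1 are made up for the illustration.

```r
# Values copied from the prcomp() output above
sdev1 <- 1.8758132
pc1 <- c(displ = 0.49818034, year = 0.06047629, cyl = 0.49820578,
         cty = -0.50575849, hwy = -0.49412379)

# Contribution of the first component to the correlation matrix:
# sdev1^2 * (pc1 %o% pc1)
layer1 <- sdev1^2 * outer(pc1, pc1)
print(round(layer1, 2))
```

In the printed matrix, the displ/cyl entry is strongly positive, the displ/cty and displ/hwy entries are strongly negative, and everything in the year row stays close to zero, which matches the reading of the PC1 column.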
Here is the pairwise scatter plot of the variables, just for your information.
pairs(data)