
# Multivariate Analysis, Chapter 1: Aspects of Multivariate Analysis

STAT 4020 Multivariate Analysis
Dr. Peng Xiaoling
Office: E409
Email: xlpeng@uic.edu.hk
Tel: 3620623
TA: Miss Zhou Haiying

2010-8-17

www.uic.edu.hk/~xlpeng


Text Book
Johnson, R.A. and Wichern, D.W. (2002), Applied Multivariate Statistical Analysis, The 6th Edition, Prentice-Hall International Editions

http://www.zker.cn/Book/1187999

Reference Books
1. Morrison, D.F. (1990), Multivariate Statistical Methods, 3rd Edition, McGraw-Hill International Editions.
2. Fang Kaitai (方开泰) (1986), Practical Multivariate Statistical Analysis (实用多元统计分析), East China Normal University Press.


Assessment
1. Exam: 60%
2. Mid Test: 15%
3. Mini Project: 15%
4. Continuous assessment: 10%

Software Support: R, SAS, MATLAB


- Introduction (with some examples)
- Graphical Presentation
- Multivariate Normal Distribution
- Statistical Inference
- Principal Components
- Factor Analysis
- Canonical Correlation Analysis
- Discriminant Analysis
- Cluster Analysis
- Decision Tree (optional)

What is multivariate analysis?

Multivariate statistical analysis is concerned with data collected on several dimensions of the same individual. Such observations are common in the social, behavioral, financial, life and medical sciences:


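Example: scores of 8 students in 5 subjects

| Student no. | x1 Maths | x2 Physics | x3 Chemistry | x4 Chinese | x5 English |
|---|---|---|---|---|---|
| 1 | 100 | 97 | 98 | 85 | 90 |
| 2 | 98 | 88 | 94 | 82 | 80 |
| 3 | 89 | 90 | 88 | 80 | 87 |
| 4 | 80 | 75 | 85 | 70 | 60 |
| 5 | 70 | 68 | 72 | 82 | 73 |
| 6 | 66 | 75 | 63 | 70 | 65 |
| 7 | 55 | 70 | 68 | 68 | 75 |
| 8 | 50 | 61 | 65 | 63 | 70 |
| Mean | 76 | 78 | 79 | 75 | 75 |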


Usually an m-dimensional data set on n individuals is denoted as an n × m matrix:

$$
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1m} \\
x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nm}
\end{pmatrix},
$$

where every row corresponds to the observed values for one individual and is denoted as $x_k = (x_{k1}, x_{k2}, \ldots, x_{km})'$, $k = 1, 2, \ldots, n$. In the example, $x_k$ holds the exam scores of student k, and every column of the matrix holds the values of one variable (all the students' scores for one exam). The notation $x_{ki}$ represents the value of the kth individual (student) on the ith variable (exam).
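As a minimal sketch of this row and column convention, the student-score example can be stored as a list of rows in Python. The names `scores`, `row`, `column` and `column_means` are illustrative choices, not part of the course material, and indices are kept 1-based to match the notation above.

```python
# Scores of the 8 students (rows) in 5 subjects (columns):
# Maths, Physics, Chemistry, Chinese, English.
scores = [
    [100, 97, 98, 85, 90],
    [98, 88, 94, 82, 80],
    [89, 90, 88, 80, 87],
    [80, 75, 85, 70, 60],
    [70, 68, 72, 82, 73],
    [66, 75, 63, 70, 65],
    [55, 70, 68, 68, 75],
    [50, 61, 65, 63, 70],
]


def row(data, k):
    """Return x_k, the observation vector of individual k (1-based)."""
    return data[k - 1]


def column(data, i):
    """Return the n values of variable i (1-based), one per individual."""
    return [r[i - 1] for r in data]


def column_means(data):
    """Return the mean of every variable (column)."""
    m = len(data[0])
    return [sum(column(data, i)) / len(data) for i in range(1, m + 1)]


student_4 = row(scores, 4)      # scores of student 4 on all five exams
maths = column(scores, 1)       # all students' Maths scores
means = column_means(scores)    # unrounded column means

print(student_4)
print(means)
```

The means are left unrounded here, so the Chemistry mean appears as 79.125 rather than the rounded 79 shown in the table.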

Examples
Example 1.1 (Industrial) Paper is manufactured in continuous sheets several feet wide. Because of the orientation of fibers within the paper, it has a different strength when measured in the direction produced by the machine than when measured across, or at right angles to, the machine direction. Variables are:
- X1: density (grams/cubic centimeter)
- X2: strength (pounds) in the machine direction
- X3: strength (pounds) in the cross direction


TABLE 1.2 PAPER-QUALITY MEASUREMENTS

| Specimen | Density | Strength, machine dir. | Strength, cross dir. |
|---|---|---|---|
| 1 | 0.801 | 121.41 | 70.42 |
| 2 | 0.824 | 127.7 | 72.47 |
| 3 | 0.841 | 129.2 | 78.2 |
| 4 | 0.816 | 131.8 | 74.89 |
| 5 | 0.84 | 135.1 | 71.21 |
| 6 | 0.842 | 131.5 | 78.39 |
| 7 | 0.82 | 126.7 | 69.02 |
| 8 | 0.802 | 115.1 | 73.1 |
| 9 | 0.828 | 130.8 | 79.28 |
| 10 | 0.819 | 124.6 | 76.48 |
| 11 | 0.826 | 118.31 | 70.25 |
| 12 | 0.802 | 114.2 | 72.88 |
| 13 | 0.81 | 120.3 | 68.23 |
| 14 | 0.802 | 115.7 | 68.12 |
| 15 | 0.832 | 117.51 | 71.62 |
| 16 | 0.796 | 109.81 | 53.1 |
| 17 | 0.759 | 109.1 | 50.85 |
| 18 | 0.77 | 115.1 | 51.68 |
| 19 | 0.759 | 118.31 | 50.6 |
| 20 | 0.772 | 112.6 | 53.51 |
| 21 | 0.806 | 116.2 | 56.53 |
| 22 | 0.803 | 118 | 70.7 |
| 23 | 0.845 | 131 | 74.35 |
| 24 | 0.822 | 125.7 | 68.29 |
| 25 | 0.971 | 126.1 | 72.1 |
| 26 | 0.816 | 125.8 | 70.64 |
| 27 | 0.836 | 125.5 | 76.33 |
| 28 | 0.815 | 127.8 | 76.75 |
| 29 | 0.822 | 130.5 | 80.33 |
| 30 | 0.822 | 127.9 | 75.68 |
| 31 | 0.843 | 123.9 | 78.54 |
| 32 | 0.824 | 124.1 | 71.91 |
| 33 | 0.788 | 120.8 | 68.22 |
| 34 | 0.782 | 107.4 | 54.42 |
| 35 | 0.795 | 120.7 | 70.41 |
| 36 | 0.805 | 121.91 | 73.68 |
| 37 | 0.836 | 122.31 | 74.93 |
| 38 | 0.788 | 110.6 | 53.52 |
| 39 | 0.772 | 103.51 | 48.93 |
| 40 | 0.776 | 110.71 | 53.67 |
| 41 | 0.758 | 113.8 | 52.42 |

SOURCE: Data courtesy of SONOCO Products, Inc.


Example 1.2 The 10 largest U.S. industrial corporations in 1990 yield the following data, where
- x1: sales (millions of dollars)
- x2: profits (millions of dollars)
- x3: assets (millions of dollars)


| Company | x1 = sales (millions of dollars) | x2 = profits (millions of dollars) | x3 = assets (millions of dollars) |
|---|---|---|---|
| General Motors | 126974 | 4224 | 173297 |
| Ford | 96933 | 3835 | 160893 |
| Exxon | 86656 | 3510 | 83219 |
| IBM | 63438 | 3758 | 77734 |
| General Electric | 55264 | 3939 | 128344 |
| Mobil | 50976 | 1809 | 39080 |
| Philip Morris | 39069 | 2946 | 38528 |
| Chrysler | 36156 | 359 | 51038 |
| du Pont | 35209 | 2480 | 34715 |
| Texaco | 32416 | 2413 | 25636 |

SOURCE: "Fortune 500," Fortune, 121 (April 23, 1990), 346-367.

Example 1.3 (Environment study) The following data are 42 measurements on air-pollution variables recorded at 12:00 noon in the Los Angeles area on different days.
- x1: Wind
- x2: Solar rad.
- x3: CO
- x4: NO
- x5: NO2
- x6: O3
- x7: HC


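TABLE 1.3 AIR-POLLUTION DATA

| Wind (x1) | Solar rad. (x2) | CO (x3) | NO (x4) | NO2 (x5) | O3 (x6) | HC (x7) |
|---|---|---|---|---|---|---|
| 8 | 98 | 7 | 2 | 12 | 8 | 2 |
| 7 | 107 | 4 | 3 | 9 | 5 | 3 |
| 7 | 103 | 4 | 3 | 5 | 6 | 3 |
| 10 | 88 | 5 | 2 | 8 | 15 | 4 |
| 6 | 91 | 4 | 2 | 8 | 10 | 3 |
| 8 | 90 | 5 | 2 | 12 | 12 | 4 |
| 9 | 84 | 7 | 4 | 12 | 15 | 5 |
| 5 | 72 | 6 | 4 | 21 | 14 | 4 |
| 7 | 82 | 5 | 1 | 11 | 11 | 3 |
| 8 | 64 | 5 | 2 | 13 | 9 | 4 |
| 6 | 71 | 5 | 4 | 10 | 3 | 3 |
| 6 | 91 | 4 | 2 | 12 | 7 | 3 |
| 7 | 72 | 7 | 4 | 18 | 10 | 3 |
| 10 | 70 | 4 | 2 | 11 | 7 | 3 |
| 10 | 72 | 4 | 1 | 8 | 10 | 3 |
| 9 | 77 | 4 | 1 | 9 | 10 | 3 |
| 8 | 76 | 4 | 1 | 7 | 7 | 3 |
| 8 | 71 | 5 | 3 | 16 | 4 | 4 |
| 9 | 67 | 4 | 2 | 13 | 2 | 3 |
| 9 | 69 | 3 | 3 | 9 | 5 | 3 |
| 10 | 62 | 5 | 3 | 14 | 4 | 4 |
| 9 | 88 | 4 | 2 | 7 | 6 | 3 |
| 8 | 80 | 4 | 2 | 13 | 11 | 4 |
| 5 | 30 | 3 | 3 | 5 | 2 | 3 |
| 6 | 83 | 5 | 1 | 10 | 23 | 4 |
| 8 | 84 | 3 | 2 | 7 | 6 | 3 |
| 6 | 78 | 4 | 2 | 11 | 11 | 3 |
| 8 | 79 | 2 | 1 | 7 | 10 | 3 |
| 6 | 62 | 4 | 3 | 9 | 8 | 3 |
| 10 | 37 | 3 | 1 | 7 | 2 | 3 |
| 8 | 71 | 4 | 1 | 10 | 7 | 3 |
| 7 | 52 | 4 | 1 | 12 | 8 | 4 |
| 5 | 48 | 6 | 5 | 8 | 4 | 3 |
| 6 | 75 | 4 | 1 | 10 | 24 | 3 |
| 10 | 35 | 4 | 1 | 6 | 9 | 2 |
| 8 | 85 | 4 | 1 | 9 | 10 | 2 |
| 5 | 86 | 3 | 1 | 6 | 12 | 2 |
| 5 | 86 | 7 | 2 | 13 | 18 | 2 |
| 7 | 79 | 7 | 4 | 9 | 25 | 3 |
| 7 | 79 | 5 | 2 | 8 | 6 | 2 |
| 6 | 68 | 6 | 2 | 11 | 14 | 3 |
| 8 | 40 | 4 | 3 | 6 | 5 | 2 |

SOURCE: Data courtesy of Professor G.C. Tiao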


Example 1.4 (Sport statistics)

The data on page 44 (Table 1.9) give national track records for men.
- x1: 100m (seconds)
- x2: 200m (seconds)
- x3: 400m (seconds)
- x4: 800m (minutes)
- x5: 1500m (minutes)
- x6: 3000m (minutes)
- x7: Marathon (minutes)

Data Structure


Sample covariance and sample correlation
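
The slide content for this topic did not survive extraction, so the following is only a hedged sketch of the standard definitions, applied to the Maths and Physics scores of the 8-student example. The divisor n - 1 is an assumption (some texts divide by n); the correlation is the same under either choice. The function names are illustrative.

```python
import math

# Maths and Physics scores of the 8 students in the earlier example.
maths = [100, 98, 89, 80, 70, 66, 55, 50]
physics = [97, 88, 90, 75, 68, 75, 70, 61]


def sample_cov(x, y):
    """Sample covariance of two variables, using the divisor n - 1 (assumed)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def sample_corr(x, y):
    """Sample correlation: covariance divided by both standard deviations."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))


var_maths = sample_cov(maths, maths)     # the variance is the covariance with itself
r_maths_physics = sample_corr(maths, physics)

print(var_maths)
print(r_maths_physics)
```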


What can we do on multivariate data?
The objectives of scientific investigations to which multivariate methods most naturally lend themselves include the following:

1. Data reduction or structural simplification: projection idea, factor analysis, structural model, etc.
2. Sorting & grouping: grouping variables, observations, ….
3. Investigation of the dependence among variables.
4. Prediction.
5. Hypothesis construction & testing.


Techniques
- Multivariate Linear Models
- Principal Component Analysis
- Factor Analysis
- Canonical Correlation Analysis
- Discriminant Analysis
- Clustering Analysis
- Decision Tree


Applications of Multivariate Techniques
- Medicine & Health
- Sociology
- Business & Economics
- Education
- Biology
- Environmental Studies
- Geology
- Psychology
- Sports
- Data Mining


Graphical Representation
Graphs are useful for discovering the underlying structure of a data set.
- When the data set has only 2 characteristics (variables or factors), the graph can be drawn on the plane as a scatter plot or curve plot.
- With 3 characteristics, we can still plot the graph on the plane using a 2-dimensional projection.
- When the number of characteristics is more than 3, more sophisticated graphical representations of multi-dimensional data can be used.


Graphs of multi-dimensional data
Topics to be covered:

- Multiple scatter plots
- Graphs of growth curves
- Stars
- Chernoff faces

Multiple Scatter Plots for paper strength measurements

The data are the paper-quality measurements of Example 1.1 (Table 1.2), with variables:

- X1: density (grams/cubic centimeter)
- X2: strength (pounds) in the machine direction
- X3: strength (pounds) in the cross direction


Scatterplots & boxplots of paper-quality data


3D scatter plot


Looking for group structure in three dimensions
SVL: snout-vent length
HLS: hind limb span

The genders for the lizard data in Table 1.3 are:

fmffmfmfmfmfm mmmfmmmffmff

Males are typically larger than females.


Profile
The profile is a simple way to represent the m-dimensional points graphically. Here, every individual is represented by m vertical / horizontal lines or bars. Their heights correspond to the values of variables respectively, which are arranged along a horizontal / vertical baseline. Sometimes, the tops of the lines are connected by a polygonal line.

Profiles for one individual

Graphs of Growth Curves
Example 1.10. Table 1.4 gives the weights (wt) in kilograms and lengths (lngth) in centimeters of seven female bears at 2, 3, 4 and 5 years of age.


Combined growth curves for weight for seven grizzly bears


Glyph
A glyph is a circle of fixed radius with m rays of various lengths representing the values of the m variables. In Figure 2.3, we have n = 3 individuals and m = 5 variables. Individual 1 has x1, x2 and x4 at medium level, and x3 and x5 at high level, while individual 3 has x1, x3 and x5 at low level, and x2 and x4 at medium level.

Glyphs

Polygon (Stars)
Steps to draw a polygon are:
1. Draw a circle & place on it m equally-spaced points.
2. Connect these m points to the center & obtain m radii. Every radius is regarded as a coordinate axis corresponding to a variable & is scaled properly according to the values of the variables.
3. Plot the m values of an individual on the m axes respectively & connect them to form a polygon with m sides.

The number of sides of the polygon is equal to the number of variables.
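
The steps above can be sketched as a small coordinate computation. This is only an illustration under stated assumptions: each axis is scaled linearly from a lower bound (at the center) to an upper bound (on the circle), and the bounds 0 and 100 for exam scores are assumed, since the slides only say the axes are "scaled properly". The function name `star_vertices` is hypothetical.

```python
import math


def star_vertices(values, lows, highs):
    """Return the polygon vertices for one individual.

    Axis i points at angle 2*pi*i/m on a unit circle. The distance from
    the center along that axis is the value rescaled linearly to [0, 1]
    using the per-variable bounds lows[i] and highs[i] (assumed scaling).
    """
    m = len(values)
    vertices = []
    for i, (v, lo, hi) in enumerate(zip(values, lows, highs)):
        radius = (v - lo) / (hi - lo)
        angle = 2 * math.pi * i / m
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    return vertices


# Student 4 from the exam-score example; scores assumed to lie in [0, 100].
student_4 = [80, 75, 85, 70, 60]
vertices = star_vertices(student_4, [0] * 5, [100] * 5)

for x, y in vertices:
    print(round(x, 3), round(y, 3))
```

Connecting the printed points in order, and closing back to the first one, gives the polygon with m = 5 sides.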

Polygon plot

Polygon plot - cobweb

Chernoff Face
The face graph was proposed by Chernoff (1973). Every multidimensional point is visualized as a cartoon face drawn by a computer. A Chernoff face is composed of the face's outline, nose, mouth, eyes, eyebrows, and so on. Each feature of the face can reflect the value of one variable. The face's fatness or thinness and its expression of pleasure, anger, sorrow or happiness can make a deep impression.


Face graph

Example 1.12 Utility data as Chernoff faces

22 public utility companies were represented as Chernoff faces. We have the following correspondences.


Example 1.13 Using Chernoff faces to show changes over time


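Example: Percentage of Republican Votes in U.S. Presidential Elections

| State | 1932 | 1936 | 1940 | 1960 | 1964 | 1968 |
|---|---|---|---|---|---|---|
| Missouri | 35 | 38 | 48 | 50 | 36 | 45 |
| Maryland | 36 | 37 | 41 | 46 | 35 | 42 |
| Kentucky | 40 | 40 | 42 | 54 | 36 | 44 |
| Louisiana | 7 | 11 | 14 | 29 | 57 | 23 |
| Mississippi | 4 | 3 | 4 | 25 | 87 | 14 |
| S. Carolina | 2 | 1 | 4 | 49 | 59 | 39 |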


Profiles, polygons, glyphs & face graphs


Andrews’ Method
Andrews (1972) suggests that every point $x = (x_1, x_2, \ldots, x_m)'$ in m-dimensional space corresponds to a trigonometric polynomial function as follows:

$$
f_x(t) = \frac{x_1}{\sqrt{2}} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \cdots, \qquad -\pi \le t \le \pi. \tag{2.6.1}
$$

This is a function of t which is easily plotted in 2 dimensions. For example, student 4 in Example 2.1 has the scores $x = (80, 75, 85, 70, 60)'$, so the corresponding function is:

$$
f_x(t) = \frac{80}{\sqrt{2}} + 75 \sin t + 85 \cos t + 70 \sin 2t + 60 \cos 2t, \qquad -\pi \le t \le \pi. \tag{2.6.2}
$$
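
To make the construction concrete, here is a minimal sketch that evaluates the Andrews function for student 4 on a grid of t values. The function name `andrews_curve` and the choice of 101 grid points are illustrative assumptions; plotting the resulting pairs (t, f_x(t)) with any plotting tool gives the curve.

```python
import math


def andrews_curve(x, t):
    """Evaluate f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..."""
    total = x[0] / math.sqrt(2)
    for j, value in enumerate(x[1:], start=1):
        harmonic = (j + 1) // 2              # 1, 1, 2, 2, 3, 3, ...
        if j % 2 == 1:                       # x2, x4, ... multiply sines
            total += value * math.sin(harmonic * t)
        else:                                # x3, x5, ... multiply cosines
            total += value * math.cos(harmonic * t)
    return total


student_4 = [80, 75, 85, 70, 60]

# Evaluate the curve on an evenly spaced grid over [-pi, pi].
ts = [-math.pi + 2 * math.pi * i / 100 for i in range(101)]
curve = [andrews_curve(student_4, t) for t in ts]

print(andrews_curve(student_4, 0.0))   # equals 80/sqrt(2) + 85 + 60
```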


In order that the variable with the larger absolute value does not stand out excessively, it is better to normalize the data before plotting, for example,

$$
x'_{ki} = \frac{x_{ki} - x_{\min,i}}{x_{\max,i} - x_{\min,i}}, \qquad k = 1, 2, \ldots, n; \; i = 1, 2, \ldots, m, \tag{2.6.3}
$$

where

$$
x_{\max,i} = \max_{1 \le k \le n} \{x_{ki}\}, \qquad x_{\min,i} = \min_{1 \le k \le n} \{x_{ki}\}. \tag{2.6.4}
$$

Since this method uses the Fourier basis, some people call it a Fourier series plot or trigonometric polynomial plot.
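
The normalization in (2.6.3) can be sketched as follows, using the 8-student score data. The function name `min_max_normalize` is an illustrative choice, and the sketch assumes no variable is constant (otherwise the denominator would be zero).

```python
# Scores of the 8 students (rows) in 5 subjects (columns).
scores = [
    [100, 97, 98, 85, 90],
    [98, 88, 94, 82, 80],
    [89, 90, 88, 80, 87],
    [80, 75, 85, 70, 60],
    [70, 68, 72, 82, 73],
    [66, 75, 63, 70, 65],
    [55, 70, 68, 68, 75],
    [50, 61, 65, 63, 70],
]


def min_max_normalize(data):
    """Rescale every column to [0, 1]: (x - column min) / (column max - column min)."""
    columns = list(zip(*data))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) for v, lo, hi in zip(r, lows, highs)]
        for r in data
    ]


normalized = min_max_normalize(scores)
print(normalized[3])   # normalized scores of student 4
```

After this step every variable ranges over [0, 1], so no single exam dominates the resulting curve.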

Fourier series plot


2.7 Constellation
Star & path
Every individual is represented as a star in a semicircle with radius one. The graph is constructed by the following steps:

1. Transform the data $x_{ki}$ to $\theta_{ki}$, where $0 \le \theta_{ki} \le \pi$, so that the $\theta_{ki}$ can be regarded as angular values:

$$
\theta_{ki} = \frac{x_{ki} - x_{\min,i}}{x_{\max,i} - x_{\min,i}} \, \pi, \qquad k = 1, 2, \ldots, n; \; i = 1, 2, \ldots, m, \tag{2.7.1}
$$

where

$$
x_{\max,i} = \max_{1 \le k \le n} \{x_{ki}\}, \qquad x_{\min,i} = \min_{1 \le k \le n} \{x_{ki}\}. \tag{2.7.2}
$$

2. Select weights $w_1, w_2, \ldots, w_m$ which satisfy

$$
w_i \ge 0 \quad \text{and} \quad \sum_{i=1}^{m} w_i = 1. \tag{2.7.3}
$$

3. Draw a semicircle with radius one & draw a diameter as its bottom side.
4. Map every point onto this semicircle:

$$
\xi_k = \sum_{i=1}^{m} w_i \cos\theta_{ki}, \qquad \eta_k = \sum_{i=1}^{m} w_i \sin\theta_{ki}, \qquad k = 1, 2, \ldots, n. \tag{2.7.4}
$$

Sometimes, (2.7.4) is replaced by the following:

$$
\xi_k(l) = \sum_{i=1}^{l} w_i \cos\theta_{ki}, \qquad \eta_k(l) = \sum_{i=1}^{l} w_i \sin\theta_{ki}, \qquad l = 1, 2, \ldots, m; \; k = 1, 2, \ldots, n. \tag{2.7.5}
$$

For every individual, we now plot not only a star $(\xi_k(m), \eta_k(m)) = (\xi_k, \eta_k)$ but also the m points $(\xi_k(l), \eta_k(l))$, $l = 1, 2, \ldots, m$. We can connect them by a polygonal line called a path. This is called a constellation graph & was proposed by Wakimoto & Taguri (1978).
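
The constellation construction can be sketched as below for the 8-student score data. Equal weights 1/m are an assumption made only for illustration (the slides leave the choice of weights open), and the function name `constellation_paths` is hypothetical. The last point of each path is the star of that individual.

```python
import math

# Scores of the 8 students (rows) in 5 subjects (columns).
scores = [
    [100, 97, 98, 85, 90],
    [98, 88, 94, 82, 80],
    [89, 90, 88, 80, 87],
    [80, 75, 85, 70, 60],
    [70, 68, 72, 82, 73],
    [66, 75, 63, 70, 65],
    [55, 70, 68, 68, 75],
    [50, 61, 65, 63, 70],
]


def constellation_paths(data, weights):
    """Return, for every individual, the path of partial sums (xi_k(l), eta_k(l))."""
    columns = list(zip(*data))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    paths = []
    for r in data:
        xi = eta = 0.0
        path = []
        for v, lo, hi, w in zip(r, lows, highs, weights):
            theta = math.pi * (v - lo) / (hi - lo)   # angle in [0, pi]
            xi += w * math.cos(theta)
            eta += w * math.sin(theta)
            path.append((xi, eta))
        paths.append(path)
    return paths


m = len(scores[0])
weights = [1 / m] * m                   # assumed: equal, non-negative, sum to 1
paths = constellation_paths(scores, weights)
stars = [path[-1] for path in paths]    # (xi_k, eta_k) for each individual

for k, (xi, eta) in enumerate(stars, start=1):
    print(k, round(xi, 3), round(eta, 3))
```

Because the weights are non-negative and sum to one, every star falls inside the unit semicircle. With this angle convention, individuals with high values on all variables end up near the left end of the diameter.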

Constellation graph

Constellation graph with paths
