
Multivariate Analysis


STAT 4020 Multivariate Analysis
Dr. Peng Xiaoling
Office: E409
Email: xlpeng@uic.edu.hk
Tel: 3620623
TA: Miss Zhou Haiying

2010-8-17

www.uic.edu.hk/~xlpeng


Text Book
Johnson, R.A. and Wichern, D.W. (2002), Applied Multivariate Statistical Analysis, The 6th Edition, Prentice-Hall International Editions

http://www.zker.cn/Book/1187999

Reference Books
1. Morrison, D.F. (1990), Multivariate Statistical Methods, 3rd Edition, McGraw-Hill International Editions.
2. Fang, Kaitai (方开泰) (1986), Practical Multivariate Statistical Analysis (实用多元统计分析), East China Normal University Press.


Assessment
1. Exam: 60%
2. Mid Test: 15%
3. Mini Project: 15%
4. Continuous assessment: 10%

Software Support: R, SAS, MATLAB


Main contents
- Introduction (with some examples)
- Graphical Presentation
- Multivariate Normal Distribution
- Statistical Inference
- Principal Components
- Factor Analysis
- Canonical Correlation Analysis
- Discriminant Analysis
- Cluster Analysis
- Decision Tree (optional)

What is multivariate analysis?

Multivariate statistical analysis is concerned with data collected on several dimensions (variables) of the same individual. Such observations are common in the social, behavioral, financial, life and medical sciences.


Example: 8 students' scores for 5 subjects

student no.   x1 Maths   x2 Physics   x3 Chemistry   x4 Chinese   x5 English
1             100        97           98             85           90
2             98         88           94             82           80
3             89         90           88             80           87
4             80         75           85             70           60
5             70         68           72             82           73
6             66         75           63             70           65
7             55         70           68             68           75
8             50         61           65             63           70
mean          76         78           79             75           75


Usually an m-dimensional data set on n individuals is denoted as an n × m matrix:

\[
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1m} \\
x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots &        & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nm}
\end{pmatrix},
\]

where every row corresponds to the observed values for one individual and is denoted as $x_k = (x_{k1}, x_{k2}, \ldots, x_{km})'$, $k = 1, 2, \ldots, n$. In the scores example, $x_k$ is the vector of exam scores of student k, and every column of the matrix corresponds to the values of one variable (all the students' scores for one exam). The notation $x_{ki}$ represents the value of the kth individual (student) on the ith variable (exam).
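As a minimal sketch of this notation, the students' scores from the example above can be stored as a list of rows in Python. The names `scores`, `row`, `column` and `column_means` are illustrative and not part of the course material. Python indexes from 0, so $x_{ki}$ corresponds to `scores[k - 1][i - 1]`.

```python
# Scores of 8 students (rows) on 5 subjects (columns):
# Maths, Physics, Chemistry, Chinese, English.
scores = [
    [100, 97, 98, 85, 90],
    [98, 88, 94, 82, 80],
    [89, 90, 88, 80, 87],
    [80, 75, 85, 70, 60],
    [70, 68, 72, 82, 73],
    [66, 75, 63, 70, 65],
    [55, 70, 68, 68, 75],
    [50, 61, 65, 63, 70],
]


def row(data, k):
    """Observation vector x_k (1-based k): all variables for individual k."""
    return data[k - 1]


def column(data, i):
    """All n values of variable i (1-based i)."""
    return [obs[i - 1] for obs in data]


def column_means(data):
    """Mean of each variable over the n individuals."""
    n = len(data)
    m = len(data[0])
    return [sum(column(data, i)) / n for i in range(1, m + 1)]


x4 = row(scores, 4)        # scores of student 4
maths = column(scores, 1)  # all Maths scores
means = column_means(scores)
```

Rounding `means` to whole numbers reproduces the "mean" row of the scores table.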

Examples
Example 1.1 (Industrial) Paper is manufactured in continuous sheets several feet wide. Because of the orientation of fibers within the paper, it has a different strength when measured in the direction produced by the machine than when measured across, or at right angles to, the machine direction. Variables are:
X1: density (grams/cubic centimeter)
X2: strength (pounds) in the machine direction
X3: strength (pounds) in the cross direction


TABLE 1.2 PAPER-QUALITY MEASUREMENTS

                         Strength
Specimen   Density   Machine dir.   Cross dir.
1          0.801     121.41         70.42
2          0.824     127.7          72.47
3          0.841     129.2          78.2
4          0.816     131.8          74.89
5          0.84      135.1          71.21
6          0.842     131.5          78.39
7          0.82      126.7          69.02
8          0.802     115.1          73.1
9          0.828     130.8          79.28
10         0.819     124.6          76.48
11         0.826     118.31         70.25
12         0.802     114.2          72.88
13         0.81      120.3          68.23
14         0.802     115.7          68.12
15         0.832     117.51         71.62
16         0.796     109.81         53.1
17         0.759     109.1          50.85
18         0.77      115.1          51.68
19         0.759     118.31         50.6
20         0.772     112.6          53.51
21         0.806     116.2          56.53
22         0.803     118            70.7
23         0.845     131            74.35
24         0.822     125.7          68.29
25         0.971     126.1          72.1
26         0.816     125.8          70.64
27         0.836     125.5          76.33
28         0.815     127.8          76.75
29         0.822     130.5          80.33
30         0.822     127.9          75.68
31         0.843     123.9          78.54
32         0.824     124.1          71.91
33         0.788     120.8          68.22
34         0.782     107.4          54.42
35         0.795     120.7          70.41
36         0.805     121.91         73.68
37         0.836     122.31         74.93
38         0.788     110.6          53.52
39         0.772     103.51         48.93
40         0.776     110.71         53.67
41         0.758     113.8          52.42

SOURCE: Data courtesy of SONOCO Products, Inc.


Example 1.2 (Business)

The 10 largest U.S. industrial corporations in 1990 yield the following data, where
x1: sales (millions of dollars)
x2: profits (millions of dollars)
x3: assets (millions of dollars)


(all values in millions of dollars)

Company            x1 = sales   x2 = profits   x3 = assets
General Motors     126974       4224           173297
Ford               96933        3835           160893
Exxon              86656        3510           83219
IBM                63438        3758           77734
General Electric   55264        3939           128344
Mobil              50976        1809           39080
Philip Morris      39069        2946           38528
Chrysler           36156        359            51038
du Pont            35209        2480           34715
Texaco             32416        2413           25636

SOURCE: "Fortune 500," Fortune, 121, 9 (April 23, 1990), 346-367.

Example 1.3 (Environment study) The following data are 42 measurements on air-pollution variables recorded at 12:00 noon in the Los Angeles area on different days.
x1: Wind
x2: Solar rad.
x3: CO
x4: NO
x5: NO2
x6: O3
x7: HO


TABLE 1.3 AIR-POLLUTION DATA

Wind (x1)   Solar rad. (x2)   CO (x3)   NO (x4)   NO2 (x5)   O3 (x6)   HO (x7)
8           98                7         2         12         8         2
7           107               4         3         9          5         3
7           103               4         3         5          6         3
10          88                5         2         8          15        4
6           91                4         2         8          10        3
8           90                5         2         12         12        4
9           84                7         4         12         15        5
5           72                6         4         21         14        4
7           82                5         1         11         11        3
8           64                5         2         13         9         4
6           71                5         4         10         3         3
6           91                4         2         12         7         3
7           72                7         4         18         10        3
10          70                4         2         11         7         3
10          72                4         1         8          10        3
9           77                4         1         9          10        3
8           76                4         1         7          7         3
8           71                5         3         16         4         4
9           67                4         2         13         2         3
9           69                3         3         9          5         3
10          62                5         3         14         4         4
9           88                4         2         7          6         3
8           80                4         2         13         11        4
5           30                3         3         5          2         3
6           83                5         1         10         23        4
8           84                3         2         7          6         3
6           78                4         2         11         11        3
8           79                2         1         7          10        3
6           62                4         3         9          8         3
10          37                3         1         7          2         3
8           71                4         1         10         7         3
7           52                4         1         12         8         4
5           48                6         5         8          4         3
6           75                4         1         10         24        3
10          35                4         1         6          9         2
8           85                4         1         9          10        2
5           86                3         1         6          12        2
5           86                7         2         13         18        2
7           79                7         4         9          25        3
7           79                5         2         8          6         2
6           68                6         2         11         14        3
8           40                4         3         6          5         2

SOURCE: Data courtesy of Professor G.C. Tiao


Example 1.4 (Sport statistics)

The data on page 44 (Table 1.9) give national track records for men.
x1: 100m (seconds)
x2: 200m (seconds)
x3: 400m (seconds)
x4: 800m (minutes)
x5: 1500m (minutes)
x6: 3000m (minutes)
x7: Marathon (minutes)

Data Structure


Sample covariance and sample correlation
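The slide content for this heading did not survive extraction, so the following is only a hedged sketch of the standard definitions, not a reconstruction of the slide. For two variables with observations $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$, the sample covariance averages the products $(x_j - \bar{x})(y_j - \bar{y})$, and the sample correlation divides the covariance by the square root of the product of the two variances. Whether the divisor is n or n − 1 is a convention that varies between texts, so it is left as the hypothetical parameter `ddof` below. The function names are illustrative.

```python
import math


def sample_cov(x, y, ddof=1):
    """Sample covariance of two equal-length sequences.

    ddof=1 divides by n - 1; ddof=0 divides by n. The divisor used in the
    course is not stated here, so it is left as a parameter.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - ddof)


def sample_corr(x, y):
    """Sample correlation r = s_xy / sqrt(s_xx * s_yy); the divisor cancels."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))


# Maths and Physics scores of the 8 students from the scores example.
maths = [100, 98, 89, 80, 70, 66, 55, 50]
physics = [97, 88, 90, 75, 68, 75, 70, 61]

s12 = sample_cov(maths, physics)   # covariance of Maths and Physics
r12 = sample_corr(maths, physics)  # correlation of Maths and Physics
```

Because the same divisor appears in the numerator and the denominator, the correlation does not depend on the choice of `ddof`.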


What can we do on multivariate data?
The objectives of scientific investigations, for which multivariate methods most naturally lend themselves, include the following:
1. Data reduction or structural simplification: projection idea, factor analysis, structural model, etc.
2. Sorting & grouping: grouping variables, observations, ...
3. Investigation of the dependence among variables
4. Prediction
5. Hypothesis construction & testing


Techniques
- Multivariate Linear Models
- Principal Component Analysis
- Factor Analysis
- Canonical Correlation Analysis
- Discriminant Analysis
- Clustering Analysis
- Decision Tree


Applications of Multivariate Techniques
- Medicine & Health
- Sociology
- Business & Economics
- Education
- Biology
- Environmental Studies
- Geology
- Psychology
- Sports
- Data Mining


Graphical Representation
Graphs are useful for discovering the underlying structure of a data set.
- When the data set has only 2 characteristics (variables or factors), the graph can be represented on the plane, and we have a scatter or curve plot.
- In case of 3 characteristics, we can also plot the graph on the plane using a 2-dimensional projection.
- When the number of characteristics is more than 3, more sophisticated graphical representations of multi-dimensional data can be used.


Graphs of multi-dimensional data
What I am going to talk about:

- Multiple scatter plots
- Graphs of Growth Curves
- Stars
- Chernoff Faces

Multiple Scatter Plots for paper strength measurements

The data are the paper-quality measurements of Example 1.1 (Table 1.2), with variables X1: density (grams/cubic centimeter), X2: strength (pounds) in the machine direction, and X3: strength (pounds) in the cross direction.


Scatterplots & boxplots of paper-quality data


3D scatter plot


Looking for group structure in three dimensions
SVL: snout-vent length
HLS: hind limb span

The genders (f = female, m = male) for the lizard data in Table 1.3 are:

fmffmfmfmfmfm mmmfmmmffmff

Males are typically larger than females.


Profile
The profile is a simple way to represent m-dimensional points graphically. Every individual is represented by m vertical (or horizontal) lines or bars arranged along a horizontal (or vertical) baseline, and the height of each line corresponds to the value of one variable. Sometimes the tops of the lines are connected by a polygonal line.

Profiles for one individual

Graphs of Growth Curves
Example 1.10. Table 1.4 gives the weights (wt) in kilograms and lengths (lngth) in centimeters of seven female bears at 2, 3, 4 and 5 years of age.


Combined growth curves for weight for seven grizzly bears


Glyph
A glyph is a circle of fixed radius with m rays of various lengths representing the values of the m variables. In Figure 2.3, we have n = 3 individuals and m = 5 variables. Individual 1 has x1, x2 and x4 at medium level, and x3 and x5 at high level, while individual 3 has x1, x3 and x5 at low level, and x2 and x4 at medium level.

Glyphs

Polygon (Stars)
Steps to draw a polygon are:
1. Draw a circle & place on it m equally-spaced points.
2. Connect these m points to the center & obtain m radii. Every radius is regarded as a coordinate axis corresponding to a variable & is scaled properly according to the values of the variables.
3. Plot the m values of an individual on the m axes respectively & connect them to form a polygon with m sides.

The number of sides of the polygon is equal to the number of variables.
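The steps above can be sketched as a small function that returns the polygon's vertices for one individual. This is a minimal illustration under stated assumptions: the values are taken to be already scaled to [0, 1], the first axis points along the positive horizontal direction, and the names `star_vertices` and `student4` are hypothetical. The drawing itself is left to whatever plotting tool is at hand.

```python
import math


def star_vertices(values, radius=1.0):
    """Vertices of the star (polygon) for one individual.

    `values` are assumed to be already scaled to [0, 1], one per variable.
    Axis i points at angle 2*pi*i/m, and the vertex sits at distance
    values[i] * radius from the centre of the circle.
    """
    m = len(values)
    vertices = []
    for i, v in enumerate(values):
        angle = 2.0 * math.pi * i / m  # m equally spaced axes
        r = v * radius                 # position along axis i
        vertices.append((r * math.cos(angle), r * math.sin(angle)))
    return vertices


# Student 4's scores divided by 100 (a simple illustrative scaling).
student4 = [0.80, 0.75, 0.85, 0.70, 0.60]
polygon = star_vertices(student4)

# Close the outline by connecting the last vertex back to the first.
outline = polygon + [polygon[0]]
```

With m = 5 variables the result is a five-sided polygon, one vertex per axis.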

Polygon plot

Polygon plot - cobweb

Chernoff Face
The face graph was proposed by Chernoff (1973). Every multidimensional point is visualized as a cartoon face drawn by a computer. A Chernoff face is composed of the face's outline, nose, mouth, eyes, eyebrows and so on. Each feature of the face can reflect the value of one variable. Because people readily notice whether a face is fat or thin, pleased, angry, sad or happy, differences between individuals make a strong impression.


Face graph

Example 1.12 Utility data as Chernoff faces

22 public utility companies were represented as Chernoff faces. We have the following correspondences.


Example 1.13 Using Chernoff faces to show changes over time


Example: Percentage of Republican Votes in U.S. Presidential Election

state         1932   1936   1940   1960   1964   1968
Missouri      35     38     48     50     36     45
Maryland      36     37     41     46     35     42
Kentucky      40     40     42     54     36     44
Louisiana     7      11     14     29     57     23
Mississippi   4      3      4      25     87     14
S.Carolina    2      1      4      49     59     39


Profiles, polygons, glyphs & face graphs


Andrews’ Method
Andrews (1972) suggested that every point $x = (x_1, x_2, \ldots, x_m)'$ in m-dimensional space corresponds to a trigonometric polynomial function as follows:

\[
f_x(t) = \frac{x_1}{\sqrt{2}} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \cdots, \qquad -\pi \le t \le \pi. \tag{2.6.1}
\]

This is a function of t which is easily plotted in 2 dimensions. For example, student 4 in Example 2.1 has the scores $x = (80, 75, 85, 70, 60)'$, so the corresponding function is:

\[
f_x(t) = \frac{80}{\sqrt{2}} + 75 \sin t + 85 \cos t + 70 \sin 2t + 60 \cos 2t, \qquad -\pi \le t \le \pi. \tag{2.6.2}
\]
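A minimal sketch of how (2.6.1) might be evaluated for any number of variables follows. The function name `andrews_curve` and the grid size of 201 points are assumptions made for illustration. The resulting pairs of t and function value can then be passed to any line-plotting routine.

```python
import math


def andrews_curve(x, t):
    """Value at t of the Andrews function for the point x.

    f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ...
    """
    total = x[0] / math.sqrt(2.0)
    for j, coeff in enumerate(x[1:], start=1):
        freq = (j + 1) // 2  # frequencies 1, 1, 2, 2, 3, 3, ...
        if j % 2 == 1:
            total += coeff * math.sin(freq * t)  # x2, x4, ... pair with sine
        else:
            total += coeff * math.cos(freq * t)  # x3, x5, ... pair with cosine
    return total


student4 = [80, 75, 85, 70, 60]

# Evaluate on an evenly spaced grid over [-pi, pi]; the grid size is arbitrary.
ts = [-math.pi + 2.0 * math.pi * i / 200 for i in range(201)]
curve = [andrews_curve(student4, t) for t in ts]
```

At t = 0 all sine terms vanish and all cosine terms equal their coefficients, which gives a quick hand check of the implementation.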

In order that the variable with the larger absolute value does not stand out excessively, it is better to normalize the data before plotting, for example,

\[
x'_{ki} = \frac{x_{ki} - x_{\min,i}}{x_{\max,i} - x_{\min,i}}, \qquad (k = 1, 2, \ldots, n;\ i = 1, 2, \ldots, m), \tag{2.6.3}
\]

where

\[
x_{\max,i} = \max_{1 \le k \le n} \{x_{ki}\}, \qquad x_{\min,i} = \min_{1 \le k \le n} \{x_{ki}\}. \tag{2.6.4}
\]

Since this method uses the Fourier basis, some people call it a Fourier series or trigonometric polynomial plot.
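The normalization in (2.6.3) can be sketched as follows, assuming the data are stored as a list of rows. The name `min_max_normalize` is illustrative, and the handling of a constant column is a choice made in this sketch because the formula leaves that case undefined.

```python
def min_max_normalize(data):
    """Column-wise scaling x'_ki = (x_ki - min_i) / (max_i - min_i).

    `data` is a list of n rows with m values each. A constant column
    (max_i == min_i) would divide by zero; this sketch maps it to 0.0,
    which is a choice made here, not something the formula specifies.
    """
    m = len(data[0])
    mins = [min(obs[i] for obs in data) for i in range(m)]
    maxs = [max(obs[i] for obs in data) for i in range(m)]
    result = []
    for obs in data:
        scaled = []
        for i in range(m):
            span = maxs[i] - mins[i]
            scaled.append((obs[i] - mins[i]) / span if span else 0.0)
        result.append(scaled)
    return result


# (Maths, Physics) scores of the 8 students from the scores example.
pairs = [
    [100, 97], [98, 88], [89, 90], [80, 75],
    [70, 68], [66, 75], [55, 70], [50, 61],
]
normalized = min_max_normalize(pairs)
```

After scaling, every variable lies in [0, 1], so no single variable dominates the plotted curve.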

Fourier series plot


2.7 Constellation
Star & path
Every individual is represented as a star in a semicircle with radius one. The graph is constructed by the following steps:

1. Transform the data $x_{ki}$ to $\theta_{ki}$, where $0 \le \theta_{ki} \le \pi$, so that the $\theta_{ki}$ can be regarded as angular values:

\[
\theta_{ki} = \frac{x_{ki} - x_{\min,i}}{x_{\max,i} - x_{\min,i}}\,\pi, \qquad (k = 1, 2, \ldots, n;\ i = 1, 2, \ldots, m), \tag{2.7.1}
\]

where

\[
x_{\max,i} = \max_{1 \le k \le n} \{x_{ki}\}, \qquad x_{\min,i} = \min_{1 \le k \le n} \{x_{ki}\}. \tag{2.7.2}
\]

2. Select weights $w_1, w_2, \ldots, w_m$, which satisfy

\[
w_i \ge 0 \quad \text{and} \quad \sum_{i=1}^{m} w_i = 1. \tag{2.7.3}
\]

3. Draw a semicircle with radius one & draw a diameter as its bottom side.

4. Map every point onto this semicircle:

\[
\xi_k = \sum_{i=1}^{m} w_i \cos\theta_{ki}, \qquad \eta_k = \sum_{i=1}^{m} w_i \sin\theta_{ki}, \qquad (k = 1, 2, \ldots, n). \tag{2.7.4}
\]

Sometimes, (2.7.4) is replaced by the following:

\[
\xi_k(l) = \sum_{i=1}^{l} w_i \cos\theta_{ki}, \qquad \eta_k(l) = \sum_{i=1}^{l} w_i \sin\theta_{ki}, \qquad (l = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n). \tag{2.7.5}
\]

For every individual, we now plot not only a star $(\xi_k(m), \eta_k(m)) = (\xi_k, \eta_k)$ but also the m points $(\xi_k(l), \eta_k(l))$, $l = 1, 2, \ldots, m$, and we can connect them by a polygonal line called a path. This is called a constellation graph & was proposed by Wakimoto & Taguri (1978).
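The construction above can be sketched in a few lines. This is a minimal illustration assuming that no variable is constant (so that the denominator in the angle transformation is nonzero) and that equal weights are acceptable. The name `constellation` and the choice of equal weights are assumptions, not part of the original method description.

```python
import math


def constellation(data, weights):
    """Stars and paths of a constellation graph.

    Returns, for each individual k, the list of path points
    (xi_k(l), eta_k(l)) for l = 1..m; the last point is the star
    (xi_k, eta_k). Assumes x_max,i > x_min,i for every variable i.
    """
    m = len(data[0])
    mins = [min(obs[i] for obs in data) for i in range(m)]
    maxs = [max(obs[i] for obs in data) for i in range(m)]
    paths = []
    for obs in data:
        xi, eta, path = 0.0, 0.0, []
        for i in range(m):
            # Angle in [0, pi] obtained from the min-max scaled value.
            theta = math.pi * (obs[i] - mins[i]) / (maxs[i] - mins[i])
            xi += weights[i] * math.cos(theta)
            eta += weights[i] * math.sin(theta)
            path.append((xi, eta))  # partial sums trace the path
        paths.append(path)
    return paths


scores = [
    [100, 97, 98, 85, 90],
    [98, 88, 94, 82, 80],
    [89, 90, 88, 80, 87],
    [80, 75, 85, 70, 60],
    [70, 68, 72, 82, 73],
    [66, 75, 63, 70, 65],
    [55, 70, 68, 68, 75],
    [50, 61, 65, 63, 70],
]
weights = [0.2] * 5  # equal weights summing to one (an illustrative choice)
paths = constellation(scores, weights)
stars = [path[-1] for path in paths]
```

Because the weights are nonnegative and sum to one, every star is a convex combination of points on the unit semicircle and therefore lies inside it. Student 1 has the highest score in every subject, so all of that student's angles equal π and the star sits at the left end of the diameter.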

Constellation graph

Constellation graph with paths
