High Dimensional Data Visualisation: the Textile Plot
Natsuhiko KUMASAKA
PhD Student, School of Fundamental Science and Technology, Keio University

Ritei SHIBATA
Department of Mathematics

Building good models from data
Exploring data through visualisation
Finding outliers
Clustering observations
Investigating relationships between variables
Hard to explore high dimensional data

Parallel Coordinate Plots
(Inselberg 1985, Wegman 1990)

Visualising a set of points in high dimensional space
Axes are placed in parallel (not at right angles)
Coordinates of each point are connected by segments

Example: Parallel coordinate plot of Iris data
[Figure: parallel coordinate plot of the Iris data, built up one axis at a time: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width]

One polygonal line indicates one observation

Difficult to understand any mechanism behind the data

The number of intersections increases

Location and scale of each axis are independently chosen

All coordinate points fill up the range of the axis.

Choosing appropriate locations and scales and the order of the axes
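
The conventional choice criticised above, where each axis gets its own location and scale so that its points fill the whole axis range, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `minmax_axis` and `polylines` are hypothetical, and min-max scaling to [0, 1] is assumed as the "fill up the range" rule.

```python
def minmax_axis(values):
    """Scale one data vector so that its points fill the axis range [0, 1].

    Assumption: "filling up the range" is taken to mean min-max scaling.
    """
    low, high = min(values), max(values)
    if high == low:
        # A constant vector has no spread; place every point mid-axis.
        return [0.5 for _ in values]
    return [(v - low) / (high - low) for v in values]


def polylines(columns):
    """Return one polygonal line per observation.

    columns: list of p data vectors (one per axis), each of length n.
    Each line is a list of (axis index, height on that axis) vertices.
    """
    scaled = [minmax_axis(c) for c in columns]
    n = len(columns[0])
    return [[(j, scaled[j][i]) for j in range(len(columns))] for i in range(n)]
```

Because every axis is scaled without reference to the others, segments between neighbouring axes cross freely, which is the difficulty the slide points out.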

Textile plot
(Kumasaka and Shibata, submitted)
A parallel coordinate plot
Locations and scales are simultaneously chosen
All polygonal lines are aligned as horizontally as possible

Order of axes is carefully chosen
To provide a clear image of the data to the user

Any kind of data can be displayed
Numerical data
Unordered categorical data
Ordered categorical data
Missing values

Named by analogy to a fabric
Warp and Weft

Good Textile!

Choice of locations and scales for numerical data
Data (p-dimensional n observations)

Data vector

Coordinate vector (for numerical data)

Criterion
Coordinate vector

Location parameter vector
Scale parameter vector
Ideal coordinate vector

The sum of squared deviations is minimised
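
The formulas on this slide did not survive extraction. One plausible reading of the criterion, written out as a sketch only, is given below. The symbols are assumptions and not taken from the slides: x_j is the j-th data vector, a_j and b_j are its location and scale parameters, y_j is its coordinate vector, and ξ is the ideal coordinate vector.

```latex
% Assumed notation: x_j data vector, a_j location, b_j scale,
% y_j coordinate vector, \xi ideal coordinate vector, p axes.
\[
  y_j = a_j \mathbf{1} + b_j x_j , \qquad j = 1, \dots, p ,
\]
\[
  S^2 = \sum_{j=1}^{p} \lVert y_j - \xi \rVert^2
  \quad \text{is minimised with respect to } a_j,\ b_j \text{ and } \xi .
\]
```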

Solution of the ideal coordinate vector

Constraint

Solution of location parameter

Solution of scale parameter

Optimal choice of locations and scales
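
The slides' own solution formulas are missing from this text, so the following is only a sketch of one way such a choice can be computed for numerical data, under assumptions that are mine. Each coordinate vector is taken to be y_j = a_j 1 + b_j x_j, the criterion is the sum over axes of squared deviations from the mean coordinate vector, and the constraint is assumed to fix the total squared variation of the coordinates at 1. Under those assumptions the scales come from the leading eigenvector of the correlation matrix, found here by power iteration. The function name `textile_coordinates` is hypothetical, and constant data vectors are not handled.

```python
import math


def textile_coordinates(columns, iterations=500):
    """Sketch: locations and scales that flatten the polygonal lines.

    columns: list of p numerical data vectors, each of length n, none constant.
    Returns (locations, scales, coords) where coords[j] = a_j + b_j * x_j.
    """
    p, n = len(columns), len(columns[0])
    means = [sum(c) / n for c in columns]
    centred = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centred]
    # Unit-length centred vectors: their inner products are correlations.
    unit = [[v / s for v in c] for c, s in zip(centred, norms)]
    corr = [[sum(a * b for a, b in zip(unit[j], unit[k])) for k in range(p)]
            for j in range(p)]
    # Power iteration for the leading eigenvector of the correlation matrix.
    # The uneven start vector avoids being orthogonal to it in simple cases.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(corr[j][k] * v[k] for k in range(p)) for j in range(p)]
        length = math.sqrt(sum(x * x for x in w))
        v = [x / length for x in w]
    # Assumed constraint: total squared variation of all coordinates equals 1.
    scales = [v[j] / norms[j] for j in range(p)]
    # Assumed location rule: every axis gets the same mean coordinate (zero).
    locations = [-scales[j] * means[j] for j in range(p)]
    coords = [[locations[j] + scales[j] * x for x in columns[j]]
              for j in range(p)]
    return locations, scales, coords
```

A negative scale flips an axis, which is how a negatively correlated variable can still produce horizontal segments.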

Order of axes
According to the squared distance
The further left an axis is placed, the closer it is to the mean vector
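
The ordering rule can be sketched as below. It assumes that the "mean vector" is the mean of the coordinate vectors over all axes and that axes are sorted by increasing squared distance from it. The function name `order_axes` is hypothetical.

```python
def order_axes(coords):
    """Return axis indices, leftmost first.

    coords: list of p coordinate vectors (one per axis), each of length n.
    Assumption: the mean vector is the element-wise mean over the p axes.
    """
    p, n = len(coords), len(coords[0])
    mean_vector = [sum(coords[j][i] for j in range(p)) / p for i in range(n)]

    def squared_distance(j):
        return sum((coords[j][i] - mean_vector[i]) ** 2 for i in range(n))

    # Smallest squared distance goes furthest left.
    return sorted(range(p), key=squared_distance)
```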

Categorical data vector
To determine a coordinate of each level
Encoding the categorical data vector by a set of contrasts
Example: using a treatment contrast

Coordinate vector
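
A treatment contrast encoding can be sketched as follows: the first level acts as the baseline and each remaining level gets one 0/1 column. The function name `treatment_contrast` is hypothetical, and taking the first listed level as the baseline is an assumption.

```python
def treatment_contrast(values, levels):
    """Encode a categorical data vector with treatment contrasts.

    values: observed levels, one per observation.
    levels: all possible levels; levels[0] is assumed to be the baseline.
    Returns an n x (k - 1) matrix of 0/1 entries for k levels.
    """
    others = levels[1:]  # the baseline level is encoded as a row of zeros
    return [[1 if v == level else 0 for level in others] for v in values]
```

For example, with levels setosa, versicolor and virginica, an observation of setosa is encoded as (0, 0) and one of virginica as (0, 1).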

Choice of locations and scales for numerical and categorical data
Data matrix
Encoded matrix for a categorical data vector with levels
Original data vector for a numerical data vector

Coordinate vector
Location parameter vector
Scale parameter vector

Sum of squared deviations is minimised

Solution of location and scale

Categorical data on parallel coordinate plot

[Figure: axis showing the levels versicolor, virginica and setosa]

Ordered categorical data
Using the specific contrast matrix

Example

Missing values
Indicator matrix reflecting missing information

Sum of squared deviations

Constraint
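
The formulas for this slide are not in the extracted text. As a rough sketch of how an indicator matrix can enter a sum of squared deviations, the code below marks observed entries with 1 and missing ones with 0, and only lets observed coordinates contribute. The names `indicator_matrix` and `masked_sum_of_squares` are hypothetical. Using `None` for a missing value, and taking each observation's ideal coordinate to be the mean of its observed coordinates, are both assumptions.

```python
def indicator_matrix(rows):
    """rows: one list per observation; None marks a missing value."""
    return [[0 if v is None else 1 for v in row] for row in rows]


def masked_sum_of_squares(coord_rows, indicator):
    """Sum of squared deviations over observed coordinates only.

    coord_rows: coordinates, one list per observation (one entry per axis).
    indicator: matching 0/1 matrix from indicator_matrix.
    """
    total = 0.0
    for y_row, w_row in zip(coord_rows, indicator):
        observed = [y for y, w in zip(y_row, w_row) if w]
        if not observed:
            continue  # an observation with no observed values adds nothing
        ideal = sum(observed) / len(observed)
        total += sum((y - ideal) ** 2 for y in observed)
    return total
```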

Design of display
Textile plot
Understanding various aspects of data
Points displayed on an axis are carefully chosen
Further classification of data types

Way of displaying points on an axis
Numerical data
Continuous data: continuous line
Discrete data: tick marks
Arrow head to show the orientation
Possible minimum and maximum

Non-numerical data
Possible levels
Ordered categorical data: arrows
Logical: coloured

All data
Multiplicity on the coordinate is represented by the area of the circle
Missing value
Label (with unit or numeral)

Textile plot of Iris data

TOPIX (Tokyo Stock Price Index) from Jan 1991 to Oct 2002


Two significant features
Knot
A point on an axis through which all polygonal lines pass
Isolated data vector

Parallel wefts
Segments horizontally aligned between two axes
Perfect linear relationship or mapping between two data vectors
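
Both features can be checked directly on coordinate vectors. The sketch below follows the definitions given here: a knot is an axis on which every coordinate coincides, and parallel wefts occur when every observation has the same coordinate on two axes. The function names and the tolerance `tol` are hypothetical.

```python
def is_knot(axis, tol=1e-9):
    """True if all polygonal lines pass through one point on this axis."""
    return max(axis) - min(axis) <= tol


def has_parallel_wefts(axis_a, axis_b, tol=1e-9):
    """True if every segment between the two axes is horizontal."""
    return all(abs(a - b) <= tol for a, b in zip(axis_a, axis_b))
```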

Preparation
Assumption
No missing values and no ordered categorical data
Normalisation

Matrix notations

Knot

Simplified condition for a knot to occur

Parallel wefts

TOPIX (Tokyo Stock Price Index) from Jan 1991 to Oct 2002

Textile plot
Visualisation for understanding data
Polygonal lines are aligned as horizontally as possible
Any kind of data can be displayed
Symbols for points displayed are carefully chosen
Knot and Parallel wefts

Implemented in R
DandDR (http://www.stat.math.keio.ac.jp/DandDIV/)
Add-on package for R
Interface between DandD and R
Receiving data and necessary information
Creating a dad object in R
List object which consists of data and attributes
Own plot method producing the textile plot

Further developments
Non-linear transformations
Design enhancements
Using colour
Line width and thickness
Dynamic or interactive display
Improving user interface
Java Language

References
A. Inselberg, The plane with parallel coordinates, The Visual Computer 1 (1985) 69-91.
E. Wegman, Hyperdimensional data analysis using parallel coordinates, Journal of the American Statistical Association 85 (1990) 664-675.