High Dimensional Data Visualisation: the Textile Plot

Natsuhiko KUMASAKA

PhD Student School of Fundamental Science and Technology Keio University

Ritei SHIBATA

Department of Mathematics

Building good models from data

Exploring data through visualisation

Finding outliers

Clustering observations

Investigating relationships between variables

Hard to explore high dimensional data

Parallel Coordinate Plots

(Inselberg 1985, Wegman 1990)

Visualising a set of points in high dimensional space

Axes are placed in parallel (not at right angles)

Coordinates of each point are connected by segments

Example: Parallel coordinate plot of Iris data

[Figure: Iris data; axes Sepal.Length, Sepal.Width, Petal.Length, Petal.Width]

One polygonal line indicates one observation

Difficult to understand any mechanism behind the data

The number of intersections increases

Location and scale of each axis are independently chosen

All coordinate points fill up the range of the axis.
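
The conventional scaling described above can be sketched as follows. This is a minimal illustration, assuming each variable is stored as a plain list; the function names and the sample numbers are made up for the example and are not taken from the talk.

```python
def minmax_axes(columns):
    """Scale each variable independently so its values fill the range [0, 1]."""
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant variable has no spread; place it in the middle of the axis.
        scaled.append([0.5 if span == 0 else (v - lo) / span for v in col])
    return scaled


def polylines(scaled):
    """Return one polygonal line per observation as (axis index, height) pairs."""
    n = len(scaled[0])
    return [[(j, col[i]) for j, col in enumerate(scaled)] for i in range(n)]


# Three observations on two variables (made-up numbers, not the Iris data).
cols = [[5.0, 6.0, 7.0], [3.5, 3.0, 2.5]]
lines = polylines(minmax_axes(cols))
```

Because each axis is scaled on its own, the first and last observations cross between the two axes here, which is the kind of intersection the textile plot tries to reduce.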

Choosing appropriate locations and scales and the order of the axes

Textile plot

(Kumasaka and Shibata, submitted)

A parallel coordinate plot

Locations and scales are simultaneously chosen

All polygonal lines are aligned as horizontally as possible

Order of axes is carefully chosen

To provide a clear image of the data to the user

Any kind of data can be displayed

Numerical data

Unordered categorical data

Ordered categorical data

Missing values

Named by analogy to a fabric

Warp and Weft

Good Textile!

Choice of locations and scales for numerical data

Data (p-dimensional n observations)

Data vector

Coordinate vector (for numerical data)

Criterion

Coordinate vector

Location parameter vector

Scale parameter vector

Ideal coordinate vector

The sum of squared deviations is minimised

Solution of the ideal coordinate vector
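
The formulas on this slide did not survive extraction. One plausible reading, in assumed notation (the symbols below are not from the slides): write the coordinate vector of axis j as an affine transform of the data vector x_j with location a_j and scale b_j, and let ξ be the ideal coordinate vector. For fixed coordinate vectors, the sum of squared deviations is then minimised by their mean:

```latex
S^2 = \sum_{j=1}^{p} \lVert y_j - \xi \rVert^2 ,
\qquad y_j = a_j \mathbf{1} + b_j x_j ,
\qquad
\hat{\xi} = \frac{1}{p} \sum_{j=1}^{p} y_j .
```

Without a constraint, the trivial choice b_j = 0 for every j would give S^2 = 0, which is why a constraint on the coordinate vectors is needed.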

Constraint

Solution of location parameter

Solution of scale parameter

Optimal choice of locations and scales
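
A numerical sketch of one way to obtain such a solution is given below. It rests on assumptions that are not stated in the surviving text: coordinates are affine in the data, all axes share mean zero, and the constraint fixes the total sum of squares of the coordinates at 1. Under those assumptions the scales are given by the leading eigenvector of the correlation matrix, found here by plain power iteration (a solver chosen for brevity, not the authors' method). The function name `textile_scales` is hypothetical.

```python
import math
import random


def textile_scales(columns, iters=500):
    """Locations, scales and coordinates for numerical columns (sketch).

    Assumes no column is constant and that the leading eigenvalue of the
    correlation matrix is not repeated.
    """
    p = len(columns)
    means = [sum(c) / len(c) for c in columns]
    centred = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centred]
    z = [[v / nrm for v in c] for c, nrm in zip(centred, norms)]
    # Correlation matrix of the columns (inner products of unit-norm columns).
    corr = [[sum(u * v for u, v in zip(z[j], z[k])) for k in range(p)]
            for j in range(p)]
    # Power iteration from a generic, reproducible start vector.
    rng = random.Random(0)
    v = [rng.uniform(0.5, 1.5) for _ in range(p)]
    for _ in range(iters):
        w = [sum(corr[j][k] * v[k] for k in range(p)) for j in range(p)]
        length = math.sqrt(sum(x * x for x in w))
        v = [x / length for x in w]
    if v[0] < 0:  # the sign of an eigenvector is arbitrary; fix it
        v = [-x for x in v]
    scale = [v[j] / norms[j] for j in range(p)]        # scale in original units
    location = [-scale[j] * means[j] for j in range(p)]  # puts every axis at mean 0
    coords = [[location[j] + scale[j] * x for x in columns[j]] for j in range(p)]
    return location, scale, coords


# Made-up data: the second column is a multiple of the first,
# the third is perfectly negatively related to it.
data = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
location, scale, coords = textile_scales(data)
```

In this example all three coordinate vectors coincide, so every polygonal line is exactly horizontal, and the third axis receives a negative scale, that is, it is flipped.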

Order of axes

According to the squared distance

The further left axis is closer to the mean vector
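
The ordering rule can be sketched as below, assuming the coordinate vectors have already been computed and that the "mean vector" is the mean of the coordinate vectors across axes. The function name and the sample coordinates are illustrative only.

```python
def order_axes(coords):
    """Order axis indices by squared distance to the mean coordinate vector.

    coords[j][i] is the coordinate of observation i on axis j.
    The axis closest to the mean vector comes first (leftmost).
    """
    p, n = len(coords), len(coords[0])
    mean_vec = [sum(c[i] for c in coords) / p for i in range(n)]
    dist = [sum((c[i] - mean_vec[i]) ** 2 for i in range(n)) for c in coords]
    return sorted(range(p), key=lambda j: dist[j]), dist


# Made-up coordinates for three axes and three observations.
coords = [[0.0, 1.0, 2.0], [0.0, 1.0, 5.0], [0.0, 1.0, 2.5]]
order, dist = order_axes(coords)
```

Here the third axis lies closest to the mean vector and the second lies furthest, so the axes would be drawn in the order third, first, second.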

Categorical data vector

To determine a coordinate of each level

Encoding the categorical data vector by a set of contrasts

Example: using a treatment contrast

Coordinate vector
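
As an illustration of the encoding step, the sketch below builds treatment-contrast columns, in which the first level is the baseline and every other level gets its own 0/1 indicator column. It then assumes that the coordinate of a level is the location plus the scale attached to that level's column. The location and scale values are hypothetical numbers chosen only to show the mapping.

```python
def treatment_contrast(values, levels):
    """Encode a categorical vector as k-1 indicator columns.

    levels[0] is the baseline and is encoded as a row of zeros.
    """
    return [[1 if v == lev else 0 for lev in levels[1:]] for v in values]


def level_coordinates(levels, location, scales):
    """Coordinate of each level under the assumed affine form.

    The baseline sits at `location`; every other level sits at
    `location` plus the scale of its indicator column.
    """
    coords = {levels[0]: location}
    for lev, b in zip(levels[1:], scales):
        coords[lev] = location + b
    return coords


species = ["setosa", "versicolor", "virginica", "setosa"]
levels = ["setosa", "versicolor", "virginica"]
rows = treatment_contrast(species, levels)
# Hypothetical location and scales, not values estimated from the Iris data.
coords = level_coordinates(levels, -1.0, [0.5, 1.5])
```

With k levels the encoded matrix has k-1 columns, so a three-level variable such as the Iris species contributes two scale parameters.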

Choice of locations and scales for numerical and categorical data

Data Matrix

Each block of the data matrix is either the encoded matrix (for a categorical data vector, with a size that depends on its number of levels) or the original data vector (for a numerical data vector)

Coordinate vector

Location parameter vector

Scale parameter vector

Sum of squared deviations is minimised

Solution of location and scale

Categorical data on parallel coordinate plot

versicolor

virginica

setosa

Ordered categorical data

Using the specific contrast matrix

Additional constraints

Example

Missing values

Indicator matrix reflecting missing information

Sum of squared deviations

Constraint
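
The slide's formulas are lost, so the following is only a sketch of how an indicator matrix can enter the criterion. It assumes that missing entries are simply left out of the sum of squared deviations, and that the ideal coordinate of an observation is the mean of its observed coordinates. The function name and the numbers are illustrative.

```python
def masked_criterion(coords, observed):
    """Sum of squared deviations over observed entries only.

    coords[j][i]   : coordinate of observation i on axis j
    observed[j][i] : 1 if that entry is present, 0 if it is missing
    Returns the criterion value and the ideal coordinate of each observation.
    """
    p, n = len(coords), len(coords[0])
    total = 0.0
    ideal = []
    for i in range(n):
        present = [coords[j][i] for j in range(p) if observed[j][i]]
        if not present:
            ideal.append(None)  # nothing observed for this observation
            continue
        xi = sum(present) / len(present)
        ideal.append(xi)
        total += sum((y - xi) ** 2 for y in present)
    return total, ideal


# Made-up coordinates; the second observation is missing on the second axis.
coords = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]
observed = [[1, 1], [1, 0], [1, 1]]
total, ideal = masked_criterion(coords, observed)
```

The placeholder value stored at the missing position (0.0 here) has no effect on the result, because the indicator removes it from both the mean and the sum.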

Design of display

Textile plot

Understanding various aspects of data

Points displayed on an axis are carefully chosen

Further classification of data types

Ways of displaying points on an axis

Numerical data

Continuous data

Continuous line

Discrete data

Tick marks

Arrow head to show the orientation

Possible minimum and maximum

Non-numerical data

Possible levels

Ordered categorical data

Arrows

Logical

Coloured

All data

Multiplicity on the coordinate is represented by the area of the circle

Missing value

Label (with unit or numeral)

Textile plot of Iris data

TOPIX (Tokyo Stock Price Index) from Jan 1991 to Oct 2002

Two significant features

Knot

A point on an axis through which all polygonal lines pass

Isolated data vector

Parallel wefts

Segments horizontally aligned between two axes

Perfect linear relationship or mapping between two data vectors
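
Both features can be checked numerically once coordinate vectors are available. The sketch below assumes each axis is given as a list of coordinates; the function names and the tolerance are illustrative and are not part of DandDR.

```python
def is_knot(axis_coords, tol=1e-9):
    """True if all polygonal lines pass through one point on this axis."""
    return max(axis_coords) - min(axis_coords) <= tol


def is_parallel_weft(left, right, tol=1e-9):
    """True if every segment between the two axes is horizontal."""
    return all(abs(a - b) <= tol for a, b in zip(left, right))


# Made-up coordinates: axes 0 and 1 coincide, axis 2 collapses to one point.
coords = [[-0.5, 0.0, 0.5], [-0.5, 0.0, 0.5], [0.0, 0.0, 0.0]]
wefts_01 = is_parallel_weft(coords[0], coords[1])
knot_2 = is_knot(coords[2])
```

A knot corresponds to an axis whose coordinates are all equal, and parallel wefts to two adjacent axes whose coordinate vectors are identical.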

Preparation

Assumption

No missing values and no ordered categorical data

Normalisation

Matrix notations

Knot

Simplified condition for a knot to occur

Parallel wefts

TOPIX (Tokyo Stock Price Index) from Jan 1991 to Oct 2002

Textile plot

Visualisation for understanding data

Polygonal lines are aligned as horizontally as possible

Any kind of data can be displayed

Symbols for points displayed are carefully chosen

Knot and Parallel wefts

Implemented on R

DandDR (http://www.stat.math.keio.ac.jp/DandDIV/)

Add-on package for R

Interface between DandD and R

Receiving data and necessary information

Creating a dad object on R

List object which consists of data and attributes

Own plot method producing the textile plot

Further developments

Non-linear transformations

Design enhancements

Using colour

Line width and thickness

Dynamic or interactive display

Improving user interface

Java language

Thank you for your attention.

Reference

A. Inselberg, The plane with parallel coordinates, The Visual Computer 1 (1985) 69-91.

E. Wegman, Hyperdimensional data analysis using parallel coordinates, Journal of the American Statistical Association 85 (1990) 664-675.
