
Gene expression analysis using fuzzy k-means clustering


Genome Informatics 14: 334–335 (2003)

Chinatsu Arima

Taizo Hanai

Masahiro Okamoto
Okahon@brs.kyushu-u.ac.jp
Laboratory for Bioinformatics, Graduate School of Systems Life Sciences, Kyushu University, 6-10-1 Hakozaki, Higashi-ku, Fukuoka 812-8581, Japan

Keywords: gene expression analysis, Fuzzy k-means, clustering



Recent advances in array technologies have made it possible to monitor the expression of a huge number of genes. Clustering methods such as hierarchical clustering, self-organizing maps (SOM), and k-means clustering have become important tools for analyzing such gene expression data. We previously applied Fuzzy adaptive resonance theory (Fuzzy ART) [5] to the clustering of genes in DNA microarray data, and the clustering result obtained with this method agreed better with biological knowledge than those of ordinary methods, including hierarchical clustering, SOM, and k-means clustering. In this study, therefore, the Fuzzy k-means clustering method [2, 3] was applied to DNA microarray data, since this method, like Fuzzy ART, incorporates fuzziness. We verified the clustering results of Fuzzy k-means clustering by comparing them with those of hierarchical clustering, k-means clustering, Fuzzy ART, and SOM.


Fuzzy K-Means Clustering

Fuzzy k-means clustering (Fig. 1) is based on the following objective function, equation (1).

$$J(K, m) = \sum_{k=1}^{K} \sum_{i=1}^{N} (u_{ki})^{m} \, d^{2}(x_i, c_k) \tag{1}$$


K and N are the number of clusters and the number of genes in the data set, respectively, m is a parameter that relates to the ‘fuzziness’ of the resulting clusters, $u_{ki}$ is the degree of membership of gene $x_i$ in cluster k, and $d^2(x_i, c_k)$ is the distance from gene $x_i$ to centroid $c_k$. The unknown parameters in this equation are the cluster centroid vectors $c_k$ and the components of the membership vectors $u_{ki}$. These unknown parameters can be optimized by the Lagrange method. The calculated $u_{ki}$ shows the belonging ratio of gene $x_i$ to cluster k, and the centroid $c_k$ shows the representative gene expression profile of cluster k. In this study, the parameter m was set to 2.0 and the number of clusters was set to 5. For the other clustering methods, we selected the same number of clusters as for Fuzzy k-means clustering in order to compare the clustering results.

Figure 1: Fuzzy k-means clustering.
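The optimization of equation (1) can be sketched as below. This is a minimal illustration of the standard fuzzy c-means iteration, not the authors' implementation. It assumes that the memberships of each gene sum to 1 over the clusters, that the distance is squared Euclidean (the paper does not name the metric), and that memberships start from random values. The names `fuzzy_kmeans` and `squared_distance` are hypothetical.

```python
import random


def squared_distance(x, c):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def fuzzy_kmeans(data, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Alternate the centroid and membership updates until J stops changing.

    data: list of N expression profiles (lists of floats).
    Returns (centroids, memberships, objective); memberships[k][i] is u_ki.
    """
    rng = random.Random(seed)
    n_genes = len(data)
    # Random initial memberships, normalised so each gene sums to 1.
    u = [[rng.random() for _ in range(n_genes)] for _ in range(n_clusters)]
    for i in range(n_genes):
        total = sum(u[k][i] for k in range(n_clusters))
        for k in range(n_clusters):
            u[k][i] /= total

    previous = float("inf")
    for _ in range(n_iter):
        # Centroid update: c_k = sum_i u_ki^m x_i / sum_i u_ki^m
        centroids = []
        for k in range(n_clusters):
            weights = [u[k][i] ** m for i in range(n_genes)]
            norm = sum(weights)
            centroids.append([
                sum(w * x[d] for w, x in zip(weights, data)) / norm
                for d in range(len(data[0]))
            ])
        # Membership update: u_ki = 1 / sum_j (d2_ki / d2_ji)^(1/(m-1))
        for i, x in enumerate(data):
            # Clamp to avoid division by zero when a gene sits on a centroid.
            d2 = [max(squared_distance(x, c), 1e-12) for c in centroids]
            for k in range(n_clusters):
                u[k][i] = 1.0 / sum(
                    (d2[k] / d2[j]) ** (1.0 / (m - 1.0))
                    for j in range(n_clusters)
                )
        # Objective J(K, m) of equation (1) for the current parameters.
        objective = sum(
            u[k][i] ** m * squared_distance(data[i], centroids[k])
            for k in range(n_clusters) for i in range(n_genes)
        )
        if abs(previous - objective) < tol:
            break
        previous = objective
    return centroids, u, objective
```

With the settings reported here, the call would be `fuzzy_kmeans(profiles, n_clusters=5, m=2.0)`, where `profiles` stands for the list of expression profiles of the selected genes.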




Data Preprocessing

In this study, we used expression data from the study of Chu et al. [1]. Saccharomyces cerevisiae cells were transferred to sporulation medium (SPM) at t=0 to maximize the synchrony of sporulation. RNA was harvested at t=0, 0.5, 2, 5, 7, 9 and 11.5 hours after transfer to SPM. Polyadenylated RNA was prepared by purification on an oligo(dT) cellulose column. Each gene’s mRNA expression level just before transfer to SPM was used as the control. The data set contains expression profiles of about 6100 genes [6]. From these, we followed the same method as Chu et al. [1] to extract the genes that showed a significant increase in mRNA levels during sporulation. Among them, we finally selected 45 genes whose functions have been biologically characterized by Kupiec et al. [4].
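The extraction of induced genes can be sketched as follows. The use of log2 ratios against the control and the cutoff `min_log2_ratio` are assumptions made purely for illustration; the actual selection criteria are those of Chu et al. [1] and are not reproduced here. The function names and gene names are hypothetical.

```python
import math


def log2_ratios(sample_levels, control_level):
    """Log2 expression ratios of one gene relative to its control level."""
    return [math.log2(level / control_level) for level in sample_levels]


def select_induced(profiles, min_log2_ratio=1.0):
    """Keep genes whose log2 ratio reaches the cutoff at any time point.

    profiles maps a gene name to its list of log2 ratios over time.
    min_log2_ratio is a hypothetical cutoff, not the criterion of [1].
    """
    return {
        gene: ratios
        for gene, ratios in profiles.items()
        if max(ratios) >= min_log2_ratio
    }


# Made-up profiles for two placeholder genes.
example = {
    "geneA": log2_ratios([1.0, 1.5, 6.0], control_level=1.0),
    "geneB": log2_ratios([1.0, 0.9, 1.2], control_level=1.0),
}
induced = select_induced(example)
```

Here only `geneA` passes the illustrative cutoff, because its ratio to the control exceeds twofold at the last time point.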


Results and Discussion

The result of Fuzzy k-means clustering is shown in Fig. 2. This figure shows the representative time course data of each cluster, and these values come from the centroids. ‘Early’, ‘Middle’, ‘Mid-late’ and ‘Late’ genes, which were characterized by Mitchell, were used as ‘index genes’. As a result, clusters 1, 2 and 3 contained only ‘Early’ genes, cluster 4 contained only ‘Mid-late’ genes, and cluster 5 contained three ‘Late’ genes and two ‘Middle’ genes.

Figure 2: The result of Fuzzy k-means clustering.

In order to compare the results of the clustering methods, we defined a correctness ratio for the clustering result based on the index genes. The correctness ratio was calculated as follows. The majority class among the index genes in a cluster defined the character of that cluster. The correctness ratio was then calculated from the number of minority genes (index genes that do not belong to the majority class of their cluster) relative to the total number of genes in the clusters. Table 1 shows the correctness ratios of the five clustering methods.

Table 1: The correctness ratios of the five clustering algorithms.

Method                    Correctness ratio
Fuzzy k-means             0.90
Fuzzy ART                 0.90
Hierarchical clustering   0.81
k-means clustering        0.86
SOM                       0.86
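The correctness ratio described above can be sketched as follows. The text does not give an explicit formula, so this sketch assumes one possible reading: the ratio is 1 minus the number of minority index genes divided by the total number of index genes, pooled over all clusters. The function name and the gene counts for clusters 1 to 4 are hypothetical; only the composition of cluster 5 is taken from the text.

```python
from collections import Counter


def correctness_ratio(clusters):
    """Correctness ratio under the assumed reading.

    clusters: one list of index-gene class labels per cluster.
    Returns 1 - (minority index genes) / (all index genes).
    """
    total = 0
    minority = 0
    for labels in clusters:
        if not labels:
            continue  # a cluster without index genes contributes nothing
        # The most frequent class defines the character of the cluster.
        majority_count = Counter(labels).most_common(1)[0][1]
        total += len(labels)
        minority += len(labels) - majority_count
    return 1.0 - minority / total


# Cluster 5 follows the text; the other counts are made up.
clusters = [
    ["Early"] * 4,
    ["Early"] * 3,
    ["Early"] * 2,
    ["Mid-late"] * 1,
    ["Late"] * 3 + ["Middle"] * 2,
]
ratio = correctness_ratio(clusters)
```

In this made-up example the two ‘Middle’ genes in the last cluster are the only minority genes, so they alone lower the ratio below 1.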

References

[1] Chu, S., Derisi, J., Eisen, M., Mulholland, J., Botstein, D., Brown, P.O., and Herskowitz, I., The transcriptional program of sporulation in budding yeast, Science, 282:699–705, 1998.
[2] Dembele, D. and Kastner, P., Fuzzy C-means method for clustering microarray data, Bioinformatics, 19:973–980, 2003.
[3] Gasch, A. and Eisen, M., Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering, Genome Biology, 3(11):research0059.1–research0059.22, 2002.
[4] Kupiec, M., Ayers, B., Esposito, R.E., and Mitchell, A.P., The molecular and cellular biology of the yeast Saccharomyces, Cold Spring Harbor, 889–1036, 1997.
[5] Tomida, S., Hanai, T., Honda, H., and Kobayashi, T., Gene expression analysis using Fuzzy ART, Genome Informatics, 12:245–246, 2001.
[6] The data set is available at http://cmgm.stanford.edu/pbrown/sporulation/


