9512.net

甜梦文库

甜梦文库

当前位置：首页 >> >> # Discriminative Topological Features Reveal Biological Network Mechanisms

Discriminative Topological Features Reveal Biological Network Mechanisms

arXiv:q-bio/0402017v1 [q-bio.MN] 9 Feb 2004

Manuel Middendorf1, Etay Ziv2, Carter Adams3, Jen Hom4 , Robin Koytcheff4, Chaya Levovitz5, Gregory Woods3, Linda Chen6, Chris Wiggins7,8

1 Department 4 Fu

of Physics, 2 College of Physicians and Surgeons, 3 Columbia College,

Foundation School of Engineering and Applied Sciences, 5 Barnard College, of Mathematics, 7 Department of Applied Physics and Applied Mathematics,

8 Center

6 Department

for Computational Biology and Bioinformatics;

Columbia University, New York NY 10027

Abstract

Recent genomic and bioinformatic advances have motivated the development of numerous random network models purporting to describe graphs of biological, technological, and sociological origin. The success of a model has been evaluated by how well it reproduces a few key features of the real-world data, such as degree distributions, mean geodesic lengths, and clustering coef?cients. Often pairs of models can reproduce these features with indistinguishable ?delity despite being generated by vastly different mechanisms. In such cases, these few target features are insuf?cient to distinguish which of the different models best describes real world networks of interest; moreover, it is not clear a priori that any of the presently-existing algorithms for network generation offers a predictive description of the networks inspiring them. To derive discriminative classi?ers, we construct a mapping from the set of all graphs to a highdimensional (in principle in?nite-dimensional) “word space.” This map de?nes an input space for classi?cation schemes which allow us for the 1

?rst time to state unambiguously which models are most descriptive of the networks they purport to describe. Our training sets include networks generated from 17 models either drawn from the literature or introduced in this work, source code for which is freely available [1]. We anticipate that this new approach to network analysis will be of broad impact to a number of communities.

1 Introduction

The post-genomic revolution has ushered in an ensemble of novel crises and opportunities in rethinking molecular biology. The two principal directions in genomics, sequencing and transcriptome studies, have brought to light a number of new questions and forced the development of numerous computational and mathematical tools for their resolution. The sequencing of whole organisms, including homo sapiens, has shown that in fact there are roughly the same number of genes in men and in mice.

Moreover, much of the coding regions of the chromosomes (the subsequences which are directly translated into proteins) are highly homologous. The complexity comes then, not from a larger number of parts, or more complex parts, but rather through the complexity of their interactions and interconnections. Coincident with this biological revolution – the massive and unprecedented volume of biological data – has blossomed a technological revolution with the popularization and resulting exponential growth of the Internet. Researchers studying the topology of the Internet [2] and the World Wide Web [3] attempted to summarize their topologies via statistical quantities, primarily the distribution P (k) over nodes of given connectivity or degree k, which it was found, was completely unlike that of a “random” or Erdos-Renyi graph 1 . Instead, the distribution obeyed a power-law P (k) ? k ?γ for large k. This observation created a ?urry of activity among mathematicians at the turn of the millennium both in (i) measuring the degree distributions of innumerable technological, sociological, and biological graphs (which generically, it turned out, obeyed such power-law distributions) and (ii) proposing myriad models of randomly-generated graph topologies which mimicked these degree distributions (cf. [6] for a thorough review). The success of these latter efforts reveals a conundrum for mathematical modeling: a metric which is universal (rather than discriminative) cannot be used for choosing the model which best describes a network of interest. The question posed is one of classi?cation, meaning the construction of an algorithm, based on training data from multiple classes, which can place data of interest within one of the classes with small test loss.

It will be a question for historians of science to ponder why the Erdos-R? nyi model of networks was used e as the universal straw man, rather than the Price model [4, 5], inspired by a naturally-occurring graph (the citation graph), which gives a power-law degree distribution.

1

Figure 1: Ambiguity in network mechanisms: we plot the degree distribution of two graphs generated using radically different algorithms. The red line results from an algorithm of the Barabasi class [3]; the blue from the “static” model of Kim et al [8]. The distributions are indistinguishable, illustrating the insuf?ciency of degree distributions as a classifying metric.

In this paper, we present a natural mapping from a graph to an in?nite-dimensional vector space using simple operations on the adjacency matrix. We then test a number of different classi?cation (including density estimation) algorithms which prove to be effective in ?nding new metrics for classifying real world data sets. We selected 17 different models proposed in the literature to model various properties of naturally occurring networks. Among them are various biologically inspired graph-generating algorithms which were put forward to model genetic or protein interaction networks. To assess their value as models of their intended referent, we classify data sets for the E. coli genetic network, the C. elegans neural network and the yeast S. cerevisiae protein interaction network. We anticipate that this new approach will provide a general tool of analysis and classi?cation in a broad diversity of communities. 2

and nnz map words to integers. This allows us to plot any graph in a high-dimensional data space: the coordinates are the integers resulting [D(A)]ij = Aij δij (1) from these path-based functionals of the graph’s adjacency matrix. and its complement U = I ? D. (Note that we do not use Einstein’s summation convention. The coordinates of the in?nite-dimensional data Indices i and j are not summed over.) We despace are given by integer-valued functionals ?ne the primitive alphabet {A; T, U, D} as the adjacency matrix A and the operations T, U, D F (L1 L2 . . . Ln A) (2) with the transpose operation T (A) ≡ AT distingushing walks “up” the graph from walks where each Li is a letter of the operational ”down” the graph. From the letters of this al- alphabet and F is an operator from the set phabet we can construct words (a series of op- {sum, sumD, sumU, nnz, nnzD, nnzU}. We erations) of arbitrary length. A number of re- found it necessary only to evaluate words with dundancies and trivial cases can be eliminated n ≤ 4 (counting all walks up to length 5) to con(for example, the projection operations satisfy struct low test-loss classi?ers. Therefore, our DU = UD = 0) leading to the operational al- word space is a 6 4 5i = 4680-dimensional i=1 phabet {A, AT, AU, AD, AUT }. The resulting vector space, but since the words are not linearly word is a matrix representing a set of possible independent (e.g., sumU + sumD = sum), the walks generated by the original graph. An ex- dimensionality of the manifold explored is acample is shown in Figure 2. tually much smaller. However, we continue to use the full data space since a particular word, Each word determines two relevant statistics of though it may be expressed as a linear combinathe network: the number of distinct walks and tion of other words, may be a better discriminathe number of distinct pairs of endpoints. These tor than any of its summands. two statistics are determined by either summing the entries of the matrix (sum) or counting the In [9], we discuss several possible interpretanumber of nonzero elements (nnz) of the ma- tions of words, motivated by algorithms for trix, respectively. Thus the two operations sum ?nding subgraphs. Previously studied metrics 3

The input space used for classifying graphs was introduced in our earlier work [9] as a technique for ?nding statistically signi?cant features and subgraphs in naturally occurring biological and technological networks. Given the adjacency matrix A representing a graph (i.e., Aij = 1 iff there exists an edge from j to i), multiplications of the matrix count the number of walks from one node to another (i.e., [An ]ij is the number of unique walks from j to i in n steps). Note that the adjacency matrix of an undirected graph is symmetric. The topological structure of a network is characterized by the number of open and closed walks of given length. Those can be found by calculating the diagonal or nondiagonal components of the matrix, respectively. For this we de?ne the projection operation D such that

Figure 2: The elements of the matrix AT A count these two walks. T A corresponds to one step “up” the graph, the following A to one step “down”. The last node could be either the same as the starting node as in the ?rst subgraph (accounted for by the diagonal part DAT A) or a different node as in the second subgraph (accounted for by the non-diagonal part UAT A).

can sometimes be interpreted in the context of words. For example, the transitivity of a network can be de?ned as 3 times the number of 3cycles divided by the number of pairs of edges that are incident on a common vertex. For a loopless graph (without self-interactions), this can also be calculated as a simple expression in word space: sum(DAAA)/sum(UAA). Note that this expression of transitivity as the quotient of two words implies separation in two dimensions rather than in one. However, there are limitations to word space. For example, a similar measure, the clustering coef?cient, de?ned as the average over all vertices of the number of 3cycles containing the vertex divided by the number of paths of length two centered at that vertex, cannot be easily expressed in word space because vertices must be considered individually to compute this quantity. Of course, the utility of word space is not that it encompasses previously studied metrics, but that it can elucidate new metrics in an unbiased, systematic way, as illustrated below.

ming problem with Lagrangian 1 L(w, b) = |w|2 ? C 2

m

ξi

i=1

(3)

with yi (w · xi + b) ≥ 1 ? ξi ; i = 1, . . . , m where f (x) = w · x + b is the equation of the hyperplane, xi are training examples and yi ∈ {?1, +1} their class labels. Here, C is a ?xed parameter determining the trade-off between small errors ξi and a large margin 2/|w|. 1 We set C to a default value ( m m x2 )?1 . We i=1 i observe that training and test losses have a negligible dependence on C since most test losses are near or equal to zero even in low-dimensional projections of the data space.

2.2 Robustness

Our objective is to determine which of a set of proposed models most accurately describes a given real data set. After constructing a classi?er enjoying low test loss, we classify our given real data set to ?nd a ‘best’ model. However, the real network may lie outside of any of the sampled distributions of the proposed models in word space. In this case we interpret our classi?cation as a prediction of the least erroneous model.

2

Classi?cation Methods

2.1 SVMs

A standard classi?cation algorithm which has been used with great success in myriad ?elds is the support vector machine, or SVM [10]. This technique constructs a hyperplane in a high-dimensional feature space separating two classes from each other. Linear kernels are used for the analysis presented here; extensions to appropriate nonlinear kernels are possible.

We distinguish between the two cases by noting the following: Consider building a classi?er for apples and grapefruit which is then faced with an orange. The classi?er may then decide that, based on the feature size the orange is an apple. However, based on the feature taste the orange is classi?ed as a grapefruit. That is, if we train our classi?er on different subsets of words and always get the same prediction, the given real network must come closest to the preWe rely on a freely available C-implementation dicted class based on any given choice of feaof SVM-Light [11], which uses a working set tures we might look at. We therefore de?ne a selection method to solve the convex program- robust classi?er as one which consistently clas4

si?es a test datum in the same class, irrespec- Ni = card(Fi ). We then maximize tive of the subset of features chosen. And we Ni 5 1 measure robustness as the ratio of the number of Q(λ) = log p(xfi (j) , λ) (5) 5 i=1 j=1 consistent predictions over the total number of subspace-classi?cations. as a function of λ. In all cases we found that Q(λ) had a well pronounced maximum as long as the data was not oversampled. Because words can only take integer values, too many training 2.3 Generative Classi?ers examples can lead to the situation that the data take exactly the same values with or without the A generative model, in which one infers the dis- hold-out set. In this case, maximizing Q(λ) cortribution from which observations are drawn, al- responds to p(x, λ) having single peaks around lows a quantitative measure of model assign- the integer values, so that λ tends to zero. Therement: the probability of observing a given word- fore, we restrict the number of training examples value given the model. For a robust classi?er, in to 4Nv , where Nv is the number of unique intewhich assignment is not sensitively dependent ger values taken by the training set. With this reon the set of features chosen, the conditional striction Q(λ) showed a well-pronounced maxiprobabilities should consistently be greatest for mum at a non-zero λ for all words and models. one class. We perform density estimations with Gaussian kernels for each individual word, allowing calculation of p(C = c|Xj = x), the probability of being assigned to class c given a particular value x of word j. By comparing ratios of likelihood values among the different models, it is therefore possible, for the case of non-robust classi?ers, to determine which of the features of an orange come closest to an apple and which features come closest to a grapefruit.

2.4 Word Ranking and Decision Trees

The simplest scheme to ?nd new metrics which can distinguish among given models is to take a large number of training examples for a pair of network models and ?nd the optimal split between both classes for every word separately. We then test every one-dimensional classi?er on a hold-out set and rank words by lowest test loss. We compute the estimated density at a word Below we show that this simple approach is alvalue x0 from the training data xi (i = 1, . . . , m) ready very successful. as 1 p(x0 , λ) = m(2λ2 π)1/2

m

e

i=1

? 1 (|xi ?x0 |/λ)2 2

(4)

where we optimize the smoothing parameter λ by maximizing the probability of a hold-out set using 5-fold cross-validation. More precisely, we partition the training examples into 5-folds Fi = {xfi (j) }j=1...Ni , where fi is the set of indices associated with fold i (i = 1 . . . 5) and 5

Extending these results, one can ask how many words one needs to distinguish entire sets of different models, as estimated by building a multiclass decision tree and measuring its test loss for different numbers of terminal nodes. We use Matlab’s Statistical Toolbox with a binary multiclass cost function to decide the splitting at each node. To avoid over-?tting the data, we prune trained trees and select the subtree with minimal test loss by 10-fold cross-validation.

Additionally, we propose a different approach using decision trees to ?nd most discriminative words. For every possible model pair (i, j) for 1 ≤ i < j ≤ Nmod where Nmod is the total number of models, we build a binary decision tree, but restricted so that at every level of each tree the same word has to be used for all the trees. At every level the best word is chosen according to the smallest average training loss over all binary trees. The model is not meant to be a substitution to an ordinary multi-class decision tree. It merely represents another algorithm which may be useful to ?nd a ?xed number of most discriminative words, for example for visualization of the distributions in a three-dimensional subspace.

can be given reasonable bounds. Such a generated graph is accepted if the number of edges M falls into the speci?ed interval IM around M0 , thereby creating a distribution of graphs associated to each model which could describe the real data set with given N0 and M0 .

4 Results

We apply our methods to three different real data sets: the E. coli genetic network [25](directed), the S. cerevisiae protein interaction network [26](undirected), and the C. elegans neural network [27](directed).

Each node in E. coli’s genetic network represents an operon coding for a putative transcrip3 Network Models tional factor. An edge exists from operon i to operon j if operon i directly regulates j by bindWe sample training data for undirected graphs ing to its operator site. This gives a very sparse from six growth models, one scale-free static adjacency matrix with a total of 423 nodes and model [12][8][13], the Small World model [14], 519 edges. and the Erd¨ s-R? nyi model [15]. Among the o e six growth models two are based on preferen- The S. cerevisiae protein interaction network tial attachment [16][7], three on a duplication- has 2114 nodes and 2203 undirected edges. Its mutation mechanism [17][18], and one on sparseness is therefore comparable to E. coli’s purely random growth [19]. For directed genetic network. graphs we similarly train on two preferential attachment models [20], two static models The C. elegans data set represents the organ[21][22][8], three duplication-mutation models ism’s fully mapped neural network. Here, each [23][24], and the directed Erd¨ s-R? nyi model node is a neuron and each edge between two o e [15]. More detailed descriptions and source nodes represents a functional, directed connection between two neurons. The network consists code are available on our website [1]. of 306 neurons and 2359 edges, and is therefore In order to classify real data, we sample train- about 7 times more dense than the other two neting examples of the given models with a ?xed works. total number of nodes N0 , and allow a small interval IM of 1-2% around the total number of We create training data for undirected or diedges M0 of the considered real data set. All rected models according to the real data set. All additional model parameters are sampled uni- parameters other than the numbers of nodes and formly over a given range which is speci?d by edges were drawn from a uniform distribution the model’s creators in most cases, otherwise over their range. We sampled 1000 examples 6

per model for each real data set, trained a pairwise multi-class SVM on 4/5 of the sampled data and tested on the 1/5 hold-out set. We determine a prediction by counting votes for the different classes. Table 1 summarizes the main results.

Ltr Ltst Nsv Winner Robustness E. coli 1.6% 1.6% 109 Kumar 1.0 C. elegans 0.5% 0.5% 51 MZ .97 S. cerevisiae 2.1% 1.8% 106 Sole 0.64

idea, but allows two free parameters, a probability controlling the number of edges copied and a probability controlling the number of random edges created. MZ is essentially a directed version of Sole. Moreover, we observe that none of the preferential attachment models came close to being a predicted model for one of our biological networks even though they, and other preferential attachment models in the literature, were created to explain power-law degree distributions. The duplication-mutation scheme arises as the more successful one. Kumar and MZ were classi?ed with almost perfect robustness against 500-dimensional subspace sampling. With 26 different choices of subspaces, E. coli was always classi?ed as Kumar. We therefore assess with high con?dence that Kumar and MZ come closest to modeling E. coli and C. elegans, respectively. In the case of Sole and the S. cerevisiae protein network we observed ?uctuations in the assignment to the best model. 3 out of 22 times S. cerevisiae was classi?ed as Vazquez (duplication-mutation) , other times as Barabasi (preferential attachment), Klemm (duplicationmutation), Kim (scale-free static) or Flammini (duplication-mutation) depending on the subset of words chosen. This clearly indicates that different features support different models. Therefore the con?dence in classifying S. cerevisiae to be Sole is limited. The preference of individual words for individual models is investigated using kernel density estimation 2.3 by ?nding words which maximize pi (x0 )/pj (x0 ) for two different models (i and j) at a word value of the real data set x0 . Figure 4 shows the sampled distribution and estimated density for the word which extremely disfavors the winning model over its follower. The opposite case is shown in 3 for E. coli, where the word supports the winning model and disfavors its follower. More speci?cally we are able to verify that most of the words of E. coli 7

Table 1: Results of multi-class SVM. Ltr is the empirical training loss averaged over all pairwise classi?ers, Ltst is the averaged empirical test loss. Nsv is the average number of support vectors. The winner is the model that got the highest number of votes when classifying the given real data set. All three classi?ers show very low test loss and two of them a very high robustness. The average number of support vectors is relatively small. Indeed, some pairwise classi?ers had as few as three support vectors and more than half of them had zero test loss. All of this suggests the existence of a small subset of words which can distinguish among most of these models. The predicted models Kumar, MZ, and Sole are based on very similar mechanisms of duplication and mutation. The model by Kumar et al was originally meant to explain various properties of the WWW. It is based on a duplication mechanism, where at every time step a prototype for the newly introduced node is chosen at random, and connected to the prototype’s neighbors or other randomly chosen nodes with probability p. It is therefore built on an imperfect copying mechanism which can also be interpreted as duplication-mutation, often evoked when considering genetic and proteininteraction networks. Sole is based on the same

Figure 3: Kernel Density Estimation of nnz D(AUT AUT AUAUT A) for two models of E. coli (Kumar and KrapivskyBianconi). Log-Likelihoods: log(pkumar ) = ?4.22, log(pkrap?bianc ) = ?12.0. are most likely to be generated by Kumar. Indeed, out of 1897 words taking at least 2 integer values for all of the models (density estimation for a single value is not meaningful), the estimated density at the E. coli word value was highest for Kumar in 1297 cases, for KrapivskyBianconi in 535 cases and for Krapivsky in only 65 cases.

Figure 4: Kernel Density Estimation of nnz D(AUADAT AUA) for two top- scoring models of C. elegans (Middendorf-Ziv and Grindrod). Log-Likelihoods: log(pmz ) = ?376, log(pgrind ) = ?6.23.

graph. This subgraph also has to be absent in the E. coli network since it gives a zero word value, showing that the Kumar and the KrapivskyBianconi models have both a tendence to give rise to a topological structure that does not exist in E. coli. This analysis gives an example of how these ?ndings are useful in re?ning network models and in deepening our understanding of real networks. For further discussions refer to Figure 3 shows the distributions for the word our website [1] nnzDAUT AUT AUAUT A which had a maximum ratio of probability density of Kumar over The SVM results suggest that one may only the one of Krapivsky-Bianconi at the E. coli po- need a small subset of words to be able to sition. E. coli in fact has a zero word count separate most of the models with almost zero meaning that none of the associated subgraphs test loss. The simplest approach to ?nd such shown in Figure 5 actually occur in E. coli. Four a subset is to look at every word for a given of those subgraphs have a mutual edge which is pair of models and compute the best split, absent in the E. coli network and also impossi- then ranking words by lowest training loss. ble to generate in a Kumar graph. Krapivsky- We ?nd that among the most discriminative Bianconi graphs allow for mutual edges which words some occur very often such as nnzAA could be one of the reasons for a higher count or nnzAT A, which count the pairs of edges atin this word. Another source might be that tached to the same vertex and either pointing in the ?fth subgraph showing a higher order feed- the same direction or pointing away from each forward loop is more probable to be generated other, respectively. Other frequent words inin a Krapivsky-Bianconi graph than in a Kumar clude nnzDAA, nnzDAT A and sumUAT A. 8

Figure 5: Subgraphs associated with the word nnzDAUT AUT AUAUT A. The word has a non-zero value iff at least one of these subgraphs Figure 6: E. coli and seven directed models. The occurs in the network distributions in word space are shown for a projection onto the subspace of the three most disA striking feature of this single-word analysis criminatve words. Subgraphs associated with is that the test loss associated with simple one- every word are also shown. dimensional classi?ers are comparable to the SVM test loss con?rming that most pairs of arating various kinds of models. Furthermore, models are separable with only a few words. To they allow us to de?ne a high-dimensional input consider all of the models at once and not just in space for classi?cation algorithms which for the pairs we apply both tree algorithms described in ?rst time are able to decide which of a given set 2.4 to all three data sets. Figures 6 and 7 show of models most accurately describes three exemscatter-plots of the training data using the most plary biological networks. discriminative three words. Taking those three words the average training-loss over all pairs of models is 1.7%, 0.8% and 0.2% for the E. coli, C. elegans and S. cerevisiae training data, re- 6 Acknowledgments spectively. It is a pleasure to acknowledge useful conversations with C. Leslie, D. Watts, and P. Ginsparg. 5 Conclusions We also acknowledge the generous support of NSF VIGRE grant DMS-98-10750, NSF ECS03-32479, and the organizers of the LANL It is not surprising that models with different CNLS 2003 meeting and the COSIN midterm mechanisms are distinguishable; however, the meeting 2003. fact that these models have not been separated in a systematic manner to date points to the inadequacy of current metrics popular in the network theory community. We have shown that a sys- References tematic enumeration of countably in?nite features of graphs can be successfully used to ?nd [1] Supplementary technical information and all source code are available from new metrics which are highly ef?cient in sep9

[12] Kim, D.-H., Kahng, B., & Kim, D. The q-component static model: modeling social networks, (2003) arXiv.org:cond-mat/0307184. [13] Caldarelli, G., Capocci, A., De Los Rios, P. & Munoz, M. A. Scale-free networks from varying vertex intrinsic ?tness, (2002) Phys. Rev. Lett. 89 25, 258702. [14] Watts, D. & Strogatz, S. Collective dynamics of small-world networks, (1998) Nature 363, 202–204. [15] Erd¨ s, P., & R? nyi, A., On random graphs, (1959) Publicao e tiones Mathematicae 6 , 290–297. [16] Bianconi, G. & Barabasi, A. Competition and multiscaling in evolving networks, (2001) Europhys. Lett. 54 4, 436–442. [17] Vazquez, A., Flammini, A., Maritan, A., & Vespignani, A. Modeling of protein interaction networks, (2001) arXiv:cond-mat/0108043. [18] Sole, R. V., Pastor-Satorras, R., Smith, E., & Kepler, T. B. A model of large-scale proteome evolution, (2002) arXiv.org:cond-mat/0207311. [19] Callaway, D., Hopcroft, J. E., Kleinberg, J. M., Newman, M. E. & Strogatz, S. H., Are randomly grown graphs really random?, (2001) arXiv:cond-mat/0104546. [20] Krapivsky, P. L., Rodgers, G. J., & Redner, S. Degree distributions of growing networks, (2001) Phys. Rev. Lett. 86 23, 5401–5404. [21] Grindrod, P. Range-Dependent Random Graphs and their application to modeling large small-world proteome datasets, (2002) Phys. Rev. E 66 6, 066702. [22] Higham, D. J. Spectral Reordering of a Range-Dependent Weighted Random Graph, (2003) University of Strathclyde Mathematics Research Report 14, May 2003. [23] Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D. Stochastic models for the web graph, (2000) FOCS: IEEE Symposium on Foundations of Computer Science (FOCS) [24] Vazquez, A. Knowing a network by walking on it: emergence of scaling, (2002) arXiv:cond-mat/0006132. [25] Shen-Orr, S., Milo, R., Mangan, S., & Alon, U. Network motifs in the transcriptional regulation network of Escherichia coli, (2002) Nature Genetics, 31, 64-68. [26] Jeong, H., Mason, S., Barabasi, A., & Oltvai, Z. N. Lethality and centrality of protein networks, (2001) Nature 411, 41–42. [27] White, J. G., Southgate, E., Thompson, J. N. & Brenner, S. The structure of the nervous system of the nematode C. elegans, (1986) Phil. Trans. of the Royal Society of London 314, 1– 340. [28] Bellman, R., (1961) in Adaptive Control Processes: A Guided Tour, Princeton University Press. [29] Jebara, T. (2003) in Machine Learning: Discriminative and Generative, Kluwer Academic. [30] Klemm, K., & Eguiluz, V. M. Highly clustered scale-free networks, (2002) Phys. Rev. E 65 3, 036123.

Figure 7: S. cerevisiae and 7 undirected models. The distributions in word space are shown for a projection onto the subspace of the three most discriminative words. Subgraphs associated with every word are also shown.

http://www.columbia.edu/itc/applied/wiggins/netclass [2] Faloutsos, C., Faloutsos, M., & Faloutsos, P. On power-law relationships of the internet topology, (1999) Computer Communications Review 29, 251–262. [3] Albert, R., Jeong, H. & Barab? si, A.-L. Diameter of the worlda wide web, (1999) Nature 401, 130–131. [4] Price, D. J. de S. Networks of scienti?c papers, (1965) Science 149, 510–515. [5] Price, D. J.de S. A general theory of bibliometric and other cumulative advantage processes, (1976) J. Amer. Soc. Inform. Sci. 27, 292–306. [6] Newman, M. The Structure and Function of Complex Networks, (2003) arXiv:cond-mat/0303516. [7] Barab? si, A. Emergence of scaling in random networks, a (1999) Science 286 , 509–512. [8] Goh, K.-I., Kahng, B., & Kim, D. Universal behavior of load distribution in scale-free networks, (2001) Phys. Rev. Lett. 87 27, 278701. [9] Ziv, E., Koytcheff, R., & Wiggins, C. H. Novel systematic discovery of statistically signi?cant network features, (2003) arXiv:cond-mat/0306610. [10] Vapnik, V. (1995) in The Nature of Statistical Learning Theory, Springer-Verlag, NY, USA. [11] Joachims, T. (1999) in Making large-Scale SVM Learning Practical. Advances in Kernel Methods - Support Vector Learning, eds. Schlkopf, B., Burges, C. & Smola, A. , MITPress.

10

[31] Klemm, K., & Eguiluz, V. M. Growing scale-free networks with small-world behavior, (2002) Phys Rev E 65 5, 057102.

11

A Supplementary tables

Name Bianconi Fundamental Mechanism Growth model with a probability of attaching to an existing node p ? ηi ki , where ηi is a ?tness parameter. Here we use a random ?tness landscape, where η is drawn from a uniform distribution in (0, 1) Growth model adding one node and several edges between randomly chosen existing nodes (not necessarily the newly introduced one) at every time step. A “static” model giving rise to a scale-free network. Edges are created between nodes chosen with a probability p ? i?α where i is the label of the node and α a constant parameter in (0, 1). Undirected random graph. Growing graph based on duplication modeling protein interactions. At every time step a prototype is chosen randomly. With probability q edges of the prototype are copied. With probability p an edge to the prototype is created. Growing graph using sets of active and inactive nodes to model citation networks. Interpolation between a regular lattice and a random graph. We replace edges in the regular lattice by random ones. Growing graph with a probability of attaching to an existing node p ? ki . (“Bianconi” with ηi = 1 for all i) Growing graph initialized with a 5-ring substrate. At every time step a new node is added and a prototype is chosen at random. The prototype’s edges are copied with a probability p. Furthermore, random nodes are connected to the newly introduced node with probability q/N , where p and q are given parameters in (0, 1) and N is the number of total nodes at the considered time step. References [16]

Callaway

[19]

Kim Erdos

[12],[8],[13] [15]

Flammini

[17]

Klemm Small World Barabasi

[30], [31] [14] [7]

Sole

[18]

Table 2: Undirected Network Models. ki is the degree of the i-th node.

12

Name

Fundamental Mechanism Directed version of “Kim”. A “static” model giving rise to a scale-free network. Edges are created between nodes chosen with probabilites Kim2 α p ? iα and q ? jout where αin and αout are in ?xed parameters chosen in (0, 1) and i(j) is the label of the i-th (j-th) node Directed random graph. Erdos Static graph. Edges are created between nodes i, j with probability p = bλ|i?j| , where b and λ Grindrod are ?xed parameters. Growing graph modeling the WWW. At every time step either a new edge, or a new node with an edge, are created. Nodes to connect Krapivksy are chosen with probability p ? ki,in + a and q ? kj,out + b based on preferential attachment with ?xed real-valued offsets a and b. Extension of “Krapivsky” using a random ?tness landscape multiplying the probabilities for Krapivskypreferential attachment. It is the directed analog Bianconi of “Bianconi” being an extension to “Barabasi”. Growing graph based on a copying mechanism to model the WWW. At every time step a prototype P is chosen at random. Then for every edge connected to P , with probability p an Kumar edge between the newly introduced node and P ’s neighbor is created, and with probability (1 ? p) an edge between the new node and a randomly chosen other node is created. Growing directed graph modeling biological network dynamics. A prototype is chosen at random and duplicated. The prototype or proMiddendorf- genitor node has edges pruned with probability Ziv (MZ) β and edges added with probability α ? β. Based loosely on the undirected protein network model of Sole et al. [18]. Growth model based on a recursive ‘copying’ mechanism, continuing to 2nd nearest neighVazquez bors, 3rd nearest neighbors etc. The authors call it a ‘random walk’ mechanism.

References

[8]

[15] [22],[21]

[20]

(original)

[23]

original

[24]

Table 3: Directed Network Models. ki,in (ki,out ) is the in-(out-)degree of the i-th node.

13

Kumar

votes 7/7

Kumar

Krapivsky-Bianconi f (x) = 1.48 Ltst = 5.3% Ltr = 4.4% Nsv =139

KrapivskyBianconi

6/7

f (x) = ?1.48 Ltst = 5.3% Ltr = 4.4% Nsv =139 f (x) = ?2.32 Ltst = 4.5% Ltr = 3.2% Nsv =122 f (x) = ?2.80 Ltst = 0.8% Ltr = 0.7% Nsv =194 f (x) = ?1.12 Ltst = 0.0% Ltr = 0.0% Nsv =9 f (x) = ?3.58 Ltst = 0.0% Ltr = 0.0% Nsv =10 f (x) = ?3.11 Ltst = 0.0% Ltr = 0.0% Nsv =9 f (x) = ?1.26 Ltst = 0.0% Ltr = 0.0% Nsv =9

Krapivsky f (x) = 2.32 Ltst = 4.5% Ltr = 3.2% Nsv =122 f (x) = 2.44 Ltst = 32.8% Ltr = 31.3% Nsv =1084

Kim f (x) = 2.80 Ltst = 0.8% Ltr = 0.7% Nsv =194 f (x) = 2.49 Ltst = 0.8% Ltr = 0.9% Nsv =178 f (x) = 2.56 Ltst = 0.8% Ltr = 1.6% Nsv =223

Vazquez f (x) = 1.12 Ltst = 0.0% Ltr = 0.0% Nsv =9 f (x) = 1.01 Ltst = 0.0% Ltr = 0.0% Nsv =14 f (x) = 0.95 Ltst = 0.0% Ltr = 0.0% Nsv =12 f (x) = 0.36 Ltst = 0.0% Ltr = 0.0% Nsv =47

Erdos f (x) = 3.58 Ltst = 0.0% Ltr = 0.0% Nsv =10 f (x) = 2.33 Ltst = 0.0% Ltr = 0.0% Nsv =13 f (x) = 2.67 Ltst = 0.0% Ltr = 0.0% Nsv =13 f (x) = 0.87 Ltst = 9.0% Ltr = 10.5% Nsv =498 f (x) = 0.60 Ltst = 0.0% Ltr = 0.0% Nsv =8

Grindrod f (x) = 3.11 Ltst = 0.0% Ltr = 0.0% Nsv =9 f (x) = 2.30 Ltst = 0.0% Ltr = 0.0% Nsv =11 f (x) = 2.69 Ltst = 0.0% Ltr = 0.0% Nsv =12 f (x) = 1.53 Ltst = 3.0% Ltr = 2.9% Nsv =180 f (x) = 1.25 Ltst = 0.0% Ltr = 0.0% Nsv =6 f (x) = 1.43 Ltst = 2.3% Ltr = 2.3% Nsv =130

MZ f (x) = 1.26 Ltst = 0.0% Ltr = 0.0% Nsv =9 f (x) = 1.64 Ltst = 0.0% Ltr = 0.0% Nsv =9 f (x) = 1.72 Ltst = 0.0% Ltr = 0.0% Nsv =9 f (x) = 1.06 Ltst = 0.0% Ltr = 0.1% Nsv =84 f (x) = 1.23 Ltst = 0.0% Ltr = 0.0% Nsv =10 f (x) = 1.36 Ltst = 0.0% Ltr = 0.0% Nsv =7 f (x) = 1.37 Ltst = 0.0% Ltr = 0.0% Nsv =12

Krapivsky

5/7

Kim

4/7

Vazquez

3/7

Erdos

2/7

Grindrod

1/7

MZ

0/7

f (x) = ?2.44 Ltst = 32.8% Ltr = 31.3% Nsv =1084 f (x) = ?2.49 Ltst = 0.8% Ltr = 0.9% Nsv =178 f (x) = ?1.01 Ltst = 0.0% Ltr = 0.0% Nsv =14 f (x) = ?2.33 Ltst = 0.0% Ltr = 0.0% Nsv =13 f (x) = ?2.30 Ltst = 0.0% Ltr = 0.0% Nsv =11 f (x) = ?1.64 Ltst = 0.0% Ltr = 0.0% Nsv =9

f (x) = ?2.56 Ltst = 0.8% Ltr = 1.6% Nsv =223 f (x) = ?0.95 Ltst = 0.0% Ltr = 0.0% Nsv =12 f (x) = ?2.67 Ltst = 0.0% Ltr = 0.0% Nsv =13 f (x) = ?2.69 Ltst = 0.0% Ltr = 0.0% Nsv =12 f (x) = ?1.72 Ltst = 0.0% Ltr = 0.0% Nsv =9

f (x) = ?0.36 Ltst = 0.0% Ltr = 0.0% Nsv =47 f (x) = ?0.87 Ltst = 9.0% Ltr = 10.5% Nsv =498 f (x) = ?1.53 Ltst = 3.0% Ltr = 2.9% Nsv =180 f (x) = ?1.06 Ltst = 0.0% Ltr = 0.1% Nsv =84

14

f (x) = ?0.60 Ltst = 0.0% Ltr = 0.0% Nsv =8 f (x) = ?1.25 Ltst = 0.0% Ltr = 0.0% Nsv =6 f (x) = ?1.23 Ltst = 0.0% Ltr = 0.0% Nsv =10

f (x) = ?1.43 Ltst = 2.3% Ltr = 2.3% Nsv =130 f (x) = ?1.36 Ltst = 0.0% Ltr = 0.0% Nsv =7

f (x) = ?1.37 Ltst = 0.0% Ltr = 0.0% Nsv =12

Table 4: SVM results for E. coli. f (x) = w · xE.coli + b, Ltst is the test loss, Ltr the training loss and Nsv the number of support vectors. Results are shown for SVMs trained between every pair of models. if f (x) > 0 E. coli is classi?ed as the row-header, if f (x) < 0 as the column-header.

Kumar Kumar

Krapivsky-Bianconi sum(ATA) Ltst = 0.3% Ltr = 0.1%

KrapivskyBianconi

Krapivksy sum(ATA) Ltst = 0.0% Ltr = 0.4% nnz(ADATA) Ltst = 27.8% Ltr = 26.9%

Kim nnz(ATA) Ltst = 0.0% Ltr = 0.0% nnz(ATA) Ltst = 0.0% Ltr = 0.0% nnz(ATA) Ltst = 0.0% Ltr = 0.0%

Vazquez nnz D(AATA) Ltst = 0.0% Ltr = 0.0% nnz D(ATA) Ltst = 0.0% Ltr = 0.0% nnz D(AATA) Ltst = 0.0% Ltr = 0.0% nnz(ATA) Ltst = 0.0% Ltr = 0.0%

Erdos nnz(ATA) Ltst = 0.0% Ltr = 0.0% nnz(ATA) Ltst = 0.0% Ltr = 0.0% nnz(ATA) Ltst = 0.0% Ltr = 0.0% sum U(AAUATA) Ltst = 7.8% Ltr = 10.1% nnz(ATA) Ltst = 0.0% Ltr = 0.0%

Grindrod nnz(ATA) Ltst = 0.0% Ltr = 0.0% nnz(ATA) Ltst = 0.0% Ltr = 0.0% nnz(ATA) Ltst = 0.0% Ltr = 0.0% sum U(AUTAA) Ltst = 5.0% Ltr = 5.6% nnz(ATA) Ltst = 0.0% Ltr = 0.0% sum D(AADATA) Ltst = 2.5% Ltr = 2.8%

MZ nnz(AA) Ltst = 0.0% Ltr = 0.0% sum D(AA) Ltst = 0.0% Ltr = 0.1% sum D(AA) Ltst = 0.0% Ltr = 0.0% sum D(AADAA) Ltst = 4.0% Ltr = 4.6% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0%

Krapivksy

Kim

Vazquez

15

Erdos

Grindrod

MZ

Table 5: Most discriminative words for the E. coli training data based on lowest test loss by 1-dimensional splitting for every pair of models. Ltst is the test loss and Ltr the training loss.

RANKING

WORD

Ltr

ASSOCIATED SUBGRAPHS

1

sum U(ATA)

5.8%

2

nnz D(AUTA)

2.4%

3

sum D(AADAA)

1.7%

4

nnzU(AUAUTAUTA)

1.4%

5

nnz D(AUTAUAUTAUA)

1.3%

6

nnz U(ADAUTA)

1.2%

7

sum D(AUTAUTAUTA)

1.2%

8

nnz U(AAUA)

1.1%

9

sum(AUTA)

1.1%

10

sumU(ADAUADAUTA)

1.0%

Table 6: Ranking of words found by binary pairwise trees for the E. coli training data. Ltr for a word ranked n is the average training loss over all pairwise trees, where every tree has depth n and 16 splits the data using words 1 to n in the given order.

MZ MZ

Grindrod sum(AA) Ltst = 0.0% Ltr = 0.0%

Grindrod

Kim nnz D(AAAAUA) Ltst = 3.8% Ltr = 4.3% sum(ATADATA) Ltst = 3.8% Ltr = 5.1%

Kim

Erdos sum(AA) Ltst = 0.0% Ltr = 0.0% sum D(AAUATA) Ltst = 1.5% Ltr = 1.3% sum D(ATAUATA) Ltst = 1.0% Ltr = 2.3%

Erdos

Kumar nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz D(AA) Ltst = 0.0% Ltr = 0.0% nnz D(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0%

Kumar

Krapivsky-Bianconi sum D(AATA) Ltst = 0.0% Ltr = 0.0% sum(AA) Ltst = 0.0% Ltr = 0.0% nnz D(ATA) Ltst = 0.0% Ltr = 0.0% sum(AA) Ltst = 0.0% Ltr = 0.0% nnz D(AA) Ltst = 0.0% Ltr = 0.0%

Krapivsky-Bianconi

Vazquez nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0%

Vazquez

Krapivksy sum D(AATA) Ltst = 0.0% Ltr = 0.0% sum(AA) Ltst = 0.0% Ltr = 0.0% nnz D(ATA) Ltst = 0.0% Ltr = 0.0% sum(AA) Ltst = 0.0% Ltr = 0.0% nnz D(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 16.5% Ltr = 15.4% nnz(AA) Ltst = 0.0% Ltr = 0.0%

17

Krapivksy

Table 7: Most discriminative words for the C. elegans training data based on lowest test loss by 1-dimensional splitting for every pair of models. Ltst is the test loss and Ltr the training loss.

RANKING

WORD

Ltr

ASSOCIATED SUBGRAPHS

1

sumD(AAUAUAA)

3.6%

2

nnz D(AUAUA)

1.1%

3

sumU(AUTA)

0.8%

4

sum D(AUAUTAUAUTA) 0.6%

5

nnzU(AUA)

0.5%

6

sum D(AA)

0.5%

7

sumU(AUTADAUAUA)

0.4%

8

nnz U(ADAUTA)

0.4%

9

nnzD(AUADATAUA)

0.4%

Table 8: Ranking of words found by binary pairwise trees for the C. elegans training data. Ltr for a word ranked n is the average training loss over all pairwise trees, where every tree has depth n and splits the data using words 1 to n in the given order. 18

MZ

votes 7/7

MZ

Kim f (x) = 1.82 Ltst = 0.0% Ltr = 0.0% Nsv =48

Kim

6/7

Grindrod

5/7

Krapivsky-Bianconi

4/7

Krapivksy

2/7

Erdos

2/7

Kumar

2/7

Vazquez K5

0/7

f (x) = ?1.82 Ltst = 0.0% Ltr = 0.0% Nsv =48 f (x) = ?0.49 Ltst = 0.0% Ltr = 0.0% Nsv =11 f (x) = ?3.19 Ltst = 0.0% Ltr = 0.0% Nsv =48 f (x) = ?2.28 Ltst = 0.0% Ltr = 0.0% Nsv =37 f (x) = ?1.18 Ltst = 0.0% Ltr = 0.0% Nsv =10 f (x) = ?0.91 Ltst = 0.0% Ltr = 0.0% Nsv =8 f (x) = ?1.25 Ltst = 0.0% Ltr = 0.0% Nsv =5

Grindrod f (x) = 0.49 Ltst = 0.0% Ltr = 0.0% Nsv =11 f (x) = 0.99 Ltst = 2.5% Ltr = 2.8% Nsv =165

f (x) = ?0.99 Ltst = 2.5% Ltr = 2.8% Nsv =165 f (x) = ?0.63 Ltst = 0.0% Ltr = 0.0% Nsv =14 f (x) = ?0.06 Ltst = 0.0% Ltr = 0.0% Nsv =21 f (x) = ?16.99 Ltst = 3.5% Ltr = 4.9% Nsv =294 f (x) = ?1.25 Ltst = 0.0% Ltr = 0.0% Nsv =12 f (x) = ?1.25 Ltst = 0.0% Ltr = 0.0% Nsv =4

Krapivsky-Bianconi f (x) = 3.19 Ltst = 0.0% Ltr = 0.0% Nsv =48 f (x) = 0.63 Ltst = 0.0% Ltr = 0.0% Nsv =14 f (x) = 0.46 Ltst = 0.0% Ltr = 0.0% Nsv =8

f (x) = ?0.46 Ltst = 0.0% Ltr = 0.0% Nsv =8 f (x) = ?0.39 Ltst = 0.0% Ltr = 0.0% Nsv =9 f (x) = ?55.68 Ltst = 2.0% Ltr = 1.6% Nsv =110 f (x) = ?7.88 Ltst = 0.0% Ltr = 0.0% Nsv =6 f (x) = ?3.87 Ltst = 0.0% Ltr = 0.0% Nsv =3

Krapivksy f (x) = 2.28 Ltst = 0.0% Ltr = 0.0% Nsv =37 f (x) = 0.06 Ltst = 0.0% Ltr = 0.0% Nsv =21 f (x) = 0.39 Ltst = 0.0% Ltr = 0.0% Nsv =9 f (x) = 0.44 Ltst = 6.5% Ltr = 6.8% Nsv =572

f (x) = ?0.44 Ltst = 6.5% Ltr = 6.8% Nsv =572 f (x) = ?0.10 Ltst = 0.0% Ltr = 0.0% Nsv =4 f (x) = ?0.25 Ltst = 0.0% Ltr = 0.0% Nsv =9 f (x) = ?0.58 Ltst = 0.0% Ltr = 0.0% Nsv =8

Erdos f (x) = 1.18 Ltst = 0.0% Ltr = 0.0% Nsv =10 f (x) = 16.99 Ltst = 3.5% Ltr = 4.9% Nsv =294 f (x) = 55.68 Ltst = 2.0% Ltr = 1.6% Nsv =110 f (x) = 0.10 Ltst = 0.0% Ltr = 0.0% Nsv =4 f (x) = 0.23 Ltst = 0.0% Ltr = 0.0% Nsv =6

f (x) = ?0.23 Ltst = 0.0% Ltr = 0.0% Nsv =6 f (x) = 0.00 Ltst = 0.0% Ltr = 0.0% Nsv =10 f (x) = ?0.15 Ltst = 0.0% Ltr = 0.0% Nsv =8

Kumar f (x) = 0.91 Ltst = 0.0% Ltr = 0.0% Nsv =8 f (x) = 1.25 Ltst = 0.0% Ltr = 0.0% Nsv =12 f (x) = 7.88 Ltst = 0.0% Ltr = 0.0% Nsv =6 f (x) = 0.25 Ltst = 0.0% Ltr = 0.0% Nsv =9 f (x) = ?0.00 Ltst = 0.0% Ltr = 0.0% Nsv =10 f (x) = 5.99 Ltst = 0.0% Ltr = 0.0% Nsv =6

f (x) = ?5.99 Ltst = 0.0% Ltr = 0.0% Nsv =6 f (x) = ?1.17 Ltst = 0.0% Ltr = 0.0% Nsv =4

Vazquez K5 f (x) = 1.25 Ltst = 0.0% Ltr = 0.0% Nsv =5 f (x) = 1.25 Ltst = 0.0% Ltr = 0.0% Nsv =4 f (x) = 3.87 Ltst = 0.0% Ltr = 0.0% Nsv =3 f (x) = 0.58 Ltst = 0.0% Ltr = 0.0% Nsv =8 f (x) = 0.15 Ltst = 0.0% Ltr = 0.0% Nsv =8 f (x) = 1.17 Ltst = 0.0% Ltr = 0.0% Nsv =4 f (x) = 168.96 Ltst = 0.0% Ltr = 0.0% Nsv =4

19

f (x) = ?168.96 Ltst = 0.0% Ltr = 0.0% Nsv =4

Table 9: SVM results for C. elegans. f (x) = w · xC.elegans + b, Ltst is the test loss, Ltr the training loss and Nsv the number of support vectors. Results are shown for SVMs trained between every pair of models. if f (x) > 0 C. elegans is classi?ed as the row-header, if f (x) < 0 as the column-header.

RANKING

WORD

Ltr

ASSOCIATED SUBGRAPHS

1

sum U(ATADAA)

0.090%

2

nnz D(ATAUAAA) 0.030%

3

nnz D(AA)

0.019%

4

nnz(ADATAUAA)

0.016%

5

nnz(ATAUAUAA)

0.014%

6

sum D(AAUAA)

0.013%

7

nnz D(ATAUAA)

0.013%

8

sum D(AAAUAA)

0.012%

9

nnz D(AAUAA)

0.012%

10

sum(ADAAUAA)

0.012%

Table 10: Ranking of words found by binary pairwise trees for the S. cerevisiae training data. Ltr for a word ranked n is the average training loss over all pairwise trees, where every tree has depth n and splits the data using words 1 to n in the given order.

20

Sole

votes 12/10

Sole

Callaway f (x) = 8.57 Ltst = 0.0% Ltr = 0.0% Nsv =28

Callaway

11/10

Flammini

9/10

Vazquez

9/10

Kim

7/10

Grindrod sym

7/10

Barabasi

6/10

Erdos

4/10

Klemm

2/10

Small World

2/10

Bianconi

1/10

f (x) = ?8.57 Ltst = 0.0% Ltr = 0.0% Nsv =28 f (x) = ?4.67 Ltst = 0.0% Ltr = 0.0% Nsv =36 f (x) = ?3.67 Ltst = 0.0% Ltr = 0.0% Nsv =10 f (x) = ?19.25 Ltst = 1.2% Ltr = 1.2% Nsv =306 f (x) = ?10.41 Ltst = 0.0% Ltr = 0.0% Nsv =20 f (x) = ?1.75 Ltst = 0.0% Ltr = 0.0% Nsv =16 f (x) = ?13.12 Ltst = 0.0% Ltr = 0.0% Nsv =14 f (x) = ?4.73 Ltst = 0.0% Ltr = 0.0% Nsv =41 f (x) = ?8.56 Ltst = 0.0% Ltr = 0.0% Nsv =14 f (x) = ?1.77 Ltst = 0.0% Ltr = 0.0% Nsv =15

Flammini f (x) = 4.67 Ltst = 0.0% Ltr = 0.0% Nsv =36 f (x) = 0.27 Ltst = 0.0% Ltr = 0.0% Nsv =7

f (x) = ?0.27 Ltst = 0.0% Ltr = 0.0% Nsv =7 f (x) = ?0.44 Ltst = 0.0% Ltr = 0.0% Nsv =2 f (x) = ?0.37 Ltst = 0.0% Ltr = 0.0% Nsv =48 f (x) = ?0.76 Ltst = 0.0% Ltr = 0.0% Nsv =4 f (x) = ?0.86 Ltst = 0.0% Ltr = 0.0% Nsv =10 f (x) = ?0.96 Ltst = 0.0% Ltr = 0.0% Nsv =3 f (x) = ?0.57 Ltst = 0.0% Ltr = 0.0% Nsv =13 f (x) = ?0.76 Ltst = 0.0% Ltr = 0.0% Nsv =4 f (x) = ?0.95 Ltst = 0.0% Ltr = 0.0% Nsv =11

Vazquez f (x) = 3.67 Ltst = 0.0% Ltr = 0.0% Nsv =10 f (x) = 0.44 Ltst = 0.0% Ltr = 0.0% Nsv =2 f (x) = ?0.86 Ltst = 0.0% Ltr = 0.0% Nsv =29

f (x) = 0.86 Ltst = 0.0% Ltr = 0.0% Nsv =29 f (x) = ?0.32 Ltst = 0.0% Ltr = 0.1% Nsv =94 f (x) = ?7.52 Ltst = 6.0% Ltr = 3.8% Nsv =529 f (x) = ?0.17 Ltst = 0.0% Ltr = 0.0% Nsv =10 f (x) = ?0.52 Ltst = 0.0% Ltr = 0.0% Nsv =30 f (x) = ?2.38 Ltst = 0.8% Ltr = 0.8% Nsv =147 f (x) = ?4.25 Ltst = 7.2% Ltr = 7.6% Nsv =384 f (x) = ?0.33 Ltst = 0.0% Ltr = 0.0% Nsv =14

Kim f (x) = 19.25 Ltst = 1.2% Ltr = 1.2% Nsv =306 f (x) = 0.37 Ltst = 0.0% Ltr = 0.0% Nsv =48 f (x) = 0.32 Ltst = 0.0% Ltr = 0.1% Nsv =94 f (x) = 0.35 Ltst = 0.0% Ltr = 0.0% Nsv =20

f (x) = ?0.35 Ltst = 0.0% Ltr = 0.0% Nsv =20 f (x) = 0.12 Ltst = 0.0% Ltr = 0.0% Nsv =15 f (x) = ?0.17 Ltst = 0.0% Ltr = 0.0% Nsv =8 f (x) = ?0.95 Ltst = 0.0% Ltr = 0.0% Nsv =7 f (x) = ?3.39 Ltst = 0.0% Ltr = 0.0% Nsv =23 f (x) = ?0.47 Ltst = 0.0% Ltr = 0.0% Nsv =5 f (x) = ?0.23 Ltst = 0.0% Ltr = 0.0% Nsv =8

Grindrod sym f (x) = 10.41 Ltst = 0.0% Ltr = 0.0% Nsv =20 f (x) = 0.76 Ltst = 0.0% Ltr = 0.0% Nsv =4 f (x) = 7.52 Ltst = 6.0% Ltr = 3.8% Nsv =529 f (x) = ?0.12 Ltst = 0.0% Ltr = 0.0% Nsv =15 f (x) = ?1.29 Ltst = 1.5% Ltr = 1.4% Nsv =107

f (x) = 1.29 Ltst = 1.5% Ltr = 1.4% Nsv =107 f (x) = ?1.41 Ltst = 0.0% Ltr = 0.0% Nsv =26 f (x) = ?4.55 Ltst = 12.2% Ltr = 16.7% Nsv =603 f (x) = ?1.15 Ltst = 0.0% Ltr = 0.0% Nsv =55 f (x) = ?5.60 Ltst = 0.2% Ltr = 0.4% Nsv =309 f (x) = ?1.44 Ltst = 0.0% Ltr = 0.0% Nsv =24

Barabasi f (x) = 1.75 Ltst = 0.0% Ltr = 0.0% Nsv =16 f (x) = 0.86 Ltst = 0.0% Ltr = 0.0% Nsv =10 f (x) = 0.17 Ltst = 0.0% Ltr = 0.0% Nsv =10 f (x) = 0.17 Ltst = 0.0% Ltr = 0.0% Nsv =8 f (x) = 1.41 Ltst = 0.0% Ltr = 0.0% Nsv =26 f (x) = ?0.10 Ltst = 0.0% Ltr = 0.0% Nsv =7

f (x) = 0.10 Ltst = 0.0% Ltr = 0.0% Nsv =7 f (x) = ?3.10 Ltst = 1.2% Ltr = 0.9% Nsv =66 f (x) = ?2.75 Ltst = 0.0% Ltr = 0.0% Nsv =44 f (x) = 2.11 Ltst = 10.5% Ltr = 10.1% Nsv =297 f (x) = ?0.08 Ltst = 0.0% Ltr = 0.0% Nsv =11

Erdos f (x) = 13.12 Ltst = 0.0% Ltr = 0.0% Nsv =14 f (x) = 0.96 Ltst = 0.0% Ltr = 0.0% Nsv =3 f (x) = 0.52 Ltst = 0.0% Ltr = 0.0% Nsv =30 f (x) = 0.95 Ltst = 0.0% Ltr = 0.0% Nsv =7 f (x) = 4.55 Ltst = 12.2% Ltr = 16.7% Nsv =603 f (x) = 3.10 Ltst = 1.2% Ltr = 0.9% Nsv =66 f (x) = 0.17 Ltst = 0.0% Ltr = 0.0% Nsv =7

f (x) = ?0.17 Ltst = 0.0% Ltr = 0.0% Nsv =7 f (x) = 2.26 Ltst = 3.5% Ltr = 5.6% Nsv =281 f (x) = ?0.37 Ltst = 0.0% Ltr = 0.0% Nsv =7 f (x) = ?2.48 Ltst = 2.2% Ltr = 3.0% Nsv =111

Klemm f (x) = 4.73 Ltst = 0.0% Ltr = 0.0% Nsv =41 f (x) = 0.57 Ltst = 0.0% Ltr = 0.0% Nsv =13 f (x) = 2.38 Ltst = 0.8% Ltr = 0.8% Nsv =147 f (x) = 3.39 Ltst = 0.0% Ltr = 0.0% Nsv =23 f (x) = 1.15 Ltst = 0.0% Ltr = 0.0% Nsv =55 f (x) = 2.75 Ltst = 0.0% Ltr = 0.0% Nsv =44 f (x) = ?2.26 Ltst = 3.5% Ltr = 5.6% Nsv =281 f (x) = 1.35 Ltst = 0.0% Ltr = 0.0% Nsv =24

f (x) = ?1.35 Ltst = 0.0% Ltr = 0.0% Nsv =24 f (x) = ?11.16 Ltst = 0.0% Ltr = 0.0% Nsv =10 f (x) = ?0.07 Ltst = 0.0% Ltr = 0.0% Nsv =9

Small World f (x) = 8.56 Ltst = 0.0% Ltr = 0.0% Nsv =14 f (x) = 0.76 Ltst = 0.0% Ltr = 0.0% Nsv =4 f (x) = 4.25 Ltst = 7.2% Ltr = 7.6% Nsv =384 f (x) = 0.47 Ltst = 0.0% Ltr = 0.0% Nsv =5 f (x) = 5.60 Ltst = 0.2% Ltr = 0.4% Nsv =309 f (x) = ?2.11 Ltst = 10.5% Ltr = 10.1% Nsv =297 f (x) = 0.37 Ltst = 0.0% Ltr = 0.0% Nsv =7 f (x) = 11.16 Ltst = 0.0% Ltr = 0.0% Nsv =10 f (x) = ?4.53 Ltst = 0.0% Ltr = 0.0% Nsv =33

f (x) = 4.53 Ltst = 0.0% Ltr = 0.0% Nsv =33 f (x) = ?2.14 Ltst = 1.8% Ltr = 0.9% Nsv =106

Bianconi f (x) = 1.77 Ltst = 0.0% Ltr = 0.0% Nsv =15 f (x) = 0.95 Ltst = 0.0% Ltr = 0.0% Nsv =11 f (x) = 0.33 Ltst = 0.0% Ltr = 0.0% Nsv =14 f (x) = 0.23 Ltst = 0.0% Ltr = 0.0% Nsv =8 f (x) = 1.44 Ltst = 0.0% Ltr = 0.0% Nsv =24 f (x) = 0.08 Ltst = 0.0% Ltr = 0.0% Nsv =11 f (x) = 2.48 Ltst = 2.2% Ltr = 3.0% Nsv =111 f (x) = 0.07 Ltst = 0.0% Ltr = 0.0% Nsv =9 f (x) = 2.14 Ltst = 1.8% Ltr = 0.9% Nsv =106 f (x) = ?0.02 Ltst = 0.0% Ltr = 0.0% Nsv =9

21

f (x) = 0.02 Ltst = 0.0% Ltr = 0.0% Nsv =9

Table 11: SVM results for S. cerevisiae (only 11 models out of 13 shown). f (x) = w · xS.cerevisiae + b, Ltst is the test loss, Ltr the training loss and Nsv the number of support vectors. Results are shown for SVMs trained between every pair of models. if f (x) > 0 S. cerevisiae is classi?ed as the row-header, if f (x) < 0 as the column-header.

Sole Sole

Callaway nnz(AAAAA) Ltst = 0.3% Ltr = 0.5%

Callaway

Flammini nnz(AAAAA) Ltst = 3.8% Ltr = 2.5% nnz(AAAAA) Ltst = 0.0% Ltr = 0.0%

Flammini

Vazquez nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz D(AAUAA) Ltst = 0.0% Ltr = 0.1%

Vazquez

Kim nnz(ADA) Ltst = 7.2% Ltr = 5.2% nnz(ADATAUAA) Ltst = 3.0% Ltr = 5.2% nnz D(AAA) Ltst = 13.8% Ltr = 11.1% nnz D(AA) Ltst = 0.0% Ltr = 0.0%

Kim

Grindrod sym nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz U(ATADAAA) Ltst = 14.0% Ltr = 13.4% nnz D(ATAUAA) Ltst = 0.0% Ltr = 0.0% nnz(AAAAA) Ltst = 0.8% Ltr = 0.6%

Grindrod sym

Barabasi nnz D(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz D(AAUAA) Ltst = 0.0% Ltr = 0.2% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz D(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0%

Barabasi

Erdos nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0% sum(ADAAA) Ltst = 0.0% Ltr = 0.1% nnz D(AA) Ltst = 0.0% Ltr = 0.0% sum U(ATADAA) Ltst = 8.0% Ltr = 9.2% nnz D(ATAUAAA) Ltst = 0.3% Ltr = 0.1% nnz(AA) Ltst = 0.0% Ltr = 0.0%

Erdos

Klemm nnz D(AA) Ltst = 0.0% Ltr = 0.0% nnz D(AA) Ltst = 0.0% Ltr = 0.0% nnz D(AAUAA) Ltst = 0.5% Ltr = 0.2% sum D(AAA) Ltst = 0.0% Ltr = 0.0% nnz D(AA) Ltst = 0.0% Ltr = 0.0% nnz(ADATAUAA) Ltst = 0.0% Ltr = 0.0% nnz D(ATAUAA) Ltst = 2.0% Ltr = 2.7% nnz D(AA) Ltst = 0.0% Ltr = 0.0%

Klemm

Small World nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0% sum D(AAUAA) Ltst = 8.5% Ltr = 8.9% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz D(ATAUAAA) Ltst = 18.5% Ltr = 19.2% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz D(AA) Ltst = 0.0% Ltr = 0.0%

Small World

Bianconi nnz D(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz(AAA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz D(AA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz D(ATAUAA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0% nnz D(ATAUAA) Ltst = 0.0% Ltr = 0.0% nnz(AA) Ltst = 0.0% Ltr = 0.0%

22

Bianconi

Table 12: Most discriminative words for the S. cerevisiae training data based on lowest test loss by 1-dimensional splitting for every pair of models. Ltst is the test loss and Ltr the training loss.

赞助商链接

- Online Selection of Discriminative Tracking Features
- On-line selection of discriminative tracking features
- The pyramid match kernel Discriminative classification with sets of image features
- Topological Structure of US Flight Network Based on Complex Network Theory
- Topological analysis of network attack vulnerability
- Discriminative Reordering with Chinese Grammatical Relation Features
- Efficient discriminative learning of bayesian network classifier via boosted augmented naiv
- Erhan 2009 Visualizing higher layer features of a deep network
- rfc4689.Terminology for Benchmarking Network-layer Traffic Control Mechanisms
- Representing TCPIP connectivity for topological analysis of network security
- Algorithms for biological network alignment
- Formation and Translation of Network English
- wen-Crime-Network
- Support vector tracking
- On-line selection of discriminative tracking features

更多相关文章：
更多相关标签：