Empirically-Derived Analytic Models of Wide-Area TCP Connections: Extended Report

Vern Paxson
Lawrence Berkeley Laboratory and EECS Division, University of California, Berkeley
1 Cyclotron Road, Berkeley, CA 94720
vern@ee.lbl.gov
LBL-34086
June 15, 1993

Abstract

We analyze 2.5 million TCP connections that occurred during 14 wide-area traffic traces. The traces were gathered at five “stub” networks and two internetwork gateways, providing a diverse look at wide-area traffic. We derive analytic models describing the random variables associated with telnet, nntp, smtp, and ftp connections, and present a methodology for comparing the effectiveness of the analytic models with empirical models such as tcplib [DJ91]. Overall we find that the analytic models provide good descriptions, generally modeling the various distributions as well as empirical models and in some cases better.

1 Introduction

Though wide-area networks have been in use since the early 1970’s, until recently we have known virtually nothing about the characteristics of the individual connections of different protocols. In the last few years a number of papers have appeared giving statistical summaries of traffic on a per-protocol basis [Cáceres89, Heimlich90, CW91, EHS92, WLC92], an important first step.

The next step in understanding wide-area traffic is to form models for simulating and predicting traffic. One such model, tcplib [DJ91, DJCME92], is now available. tcplib is an empirical model of wide-area traffic: it models the distribution of the random variables (e.g., bytes transferred, duration) associated with different protocols by using the distributions actually measured for those protocols at an Internet site.

Ideally we would like to have analytic traffic models: simple mathematical descriptions rather than empirical distributions. Such models are easier both to convey and to analyze. The key question is whether analytic models can describe the diverse phenomena found in wide-area traffic as well as empirical models. Our previous work [Paxson91] offered such models but suffered in part from flawed statistical methodology.

In this paper we analyze 14 wide-area traffic traces gathered at seven different sites, five “stub” networks and two internetwork gateways. We derive analytic models describing the random variables associated with telnet, nntp, smtp, and ftp connections, and present a methodology for comparing the effectiveness of the analytic models with tcplib and with another empirical model constructed from one of the datasets.

Table 1 summarizes our main results. Overall we find that the analytic models provide good descriptions, generally modeling the various distributions as well as the empirical models and in some cases better. We develop each of these findings in the remainder of the paper.

In the next section we give an overview of the 14 traffic traces. We describe the gross characteristics of the traces including their traffic mix, and discuss how we filtered the traffic to remove anomalous connections.

The following section presents our statistical methodology. We discuss how we transformed the data and dealt with outliers; our unsuccessful attempts to find “statistically valid” models; the metric we devised for comparing the fit of two different models to a dataset; and our methodology for modeling connection interarrivals, which is more complex than modeling the other random variables associated with a connection.

We then present one section each on modeling telnet, nntp, smtp, and ftp. These sections can be read independently if the reader is more interested in one protocol than another, except that the first section describes how to read the plots used in all four sections to compare models. By reading the text accompanying Figures 4, 5, 13, and 30, the reader can if desired skip the remainder of the telnet section.


In the last section we summarize the different analytic models and discuss findings in addition to those listed in Table 1. We also include appendices summarizing how we filtered the data prior to analysis, and exploring the effectiveness of modeling rlogin traffic using the telnet models.

• Random variables associated with wide-area network connections can be described as well by analytic models as by empirical models.
• When using either type of model, caution must be exercised due to frequent discrepancies in the upper 1% tails.
• While in general the analytic models do not match the observed distributions identically in a statistical sense, often a random subsample of hundreds of data points does result in a statistically valid fit, indicating that the analytic models are often close though not exact.
• Bulk-transfer traffic (ftpdata, smtp, nntp, and telnet response) is best modeled using lognormal distributions.
• Bulk-transfer traffic is not strongly bidirectional; the responses to bulk transfers show little variation relative to the variation in the size of the transfer.
• Network traffic varies significantly, both over time and more so from site-to-site, not only in traffic mix but in connection characteristics.
• Scaling usually helps significantly in modeling the bytes transferred by nntp, smtp, rlogin, and individual ftpdata connections, but is usually not necessary for adequate fits to telnet connections and full ftp conversations.
• Except for nntp, connection interarrivals are well modeled using nonhomogeneous Poisson processes with fixed hourly rates.

Table 1: Major Findings

2 Overview of Network Traffic Traces

To develop and then evaluate our models we acquired a number of traces of wide-area traffic. Our main data were from six month-long traces of all wide-area TCP connections between the Lawrence Berkeley Laboratory (LBL) and the rest of the world. With the help of colleagues we also were able to study traces from Bellcore, the University of California at Berkeley, the University of Southern California, Digital’s Western Research Laboratory, the United Kingdom–United States academic network link, and traffic between the coNCert¹ network and the rest of the world. We discuss the general characteristics of each of these datasets in turn and then provide summaries of their TCP traffic.

¹ Communications for North Carolina Education, Research and Technology.

2.1 The LBL Traces

All off-site communication at LBL funnels through a group of gateways that reside on a network separate from the rest of the Laboratory. The first two datasets were taken using a Sun 3/50 residing on the gateway network, using the tcpdump packet capture tool [JLM89] running the Berkeley Packet Filter [MJ93]. The Sun 3/50 had kernel modifications to gain a clock resolution of 10 msec. These are the traces discussed in [Paxson91]. When we took the last four traces the monitor workstation had been upgraded to a Sun SLC with a consequent improvement of clock resolution to 1 microsecond.

We used a tcpdump filter to capture only those TCP packets with SYN, FIN, or RST flags in their headers, greatly reducing the volume and rate of data (but at the cost of no analysis of intra-connection dynamics). From SYN and FIN packets one can derive the connection’s TCP protocol, connection duration, number of bytes transferred in each direction (excluding TCP/IP overhead), participating hosts, and starting time. In principle we could derive the same information using RST packets instead of FIN packets, but we found that often the sequence numbers associated with RST packets were erroneous. Since we could not derive reliable byte counts from RST-terminated connections we excluded them from subsequent analysis.

With this packet capture scheme there are two mechanisms by which packets can be lost. The first is that, if a packet arrives at the Ethernet controller and the controller has run out of kernel memory to buffer the packet, it drops the packet and sets a bit indicating that this event occurred. The Ethernet driver subsequently reads the bit and increments a corresponding counter. It is possible that more than one packet will be dropped before the driver is able to read the bit, so the actual number of dropped packets is unknown but at least as large as the driver’s counter.

The second packet-drop mechanism occurs when the kernel determines that the packet filter accepts a packet, but has no more buffer space for saving the packet (due to the user-level program failing to consume previously accepted packets). In this case the kernel drops the packet and increments a counter. Values reported by this counter thus correspond to exactly the number of acceptable packets (in our case, SYN/FIN/RST packets) dropped.
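The derivation of per-connection quantities from SYN and FIN packets can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tooling used for the traces: the `Endpoint` record and its field names are hypothetical, and `fin_seq` is assumed to be the sequence number occupied by the FIN flag itself (the segment's header sequence number plus any payload it carries). It relies only on the standard TCP rule that a SYN consumes one sequence number.

```python
from dataclasses import dataclass

SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


@dataclass
class Endpoint:
    """What a SYN/FIN-only trace records for one side of a connection (hypothetical layout)."""
    syn_time: float  # seconds, when this side's SYN was seen
    syn_seq: int     # sequence number carried by the SYN
    fin_time: float  # seconds, when this side's FIN was seen
    fin_seq: int     # sequence number occupied by the FIN flag (assumption, see lead-in)


def bytes_sent(ep: Endpoint) -> int:
    """Data bytes sent by one side, excluding TCP/IP headers.

    The SYN occupies one sequence number, so the first data byte is
    syn_seq + 1 and the FIN sits one past the last data byte.
    """
    return (ep.fin_seq - ep.syn_seq - 1) % SEQ_MOD


def duration(originator: Endpoint, responder: Endpoint) -> float:
    """Elapsed time from the originator's SYN to the later of the two FINs."""
    return max(originator.fin_time, responder.fin_time) - originator.syn_time


# Hypothetical connection: the originator sends 500 bytes, the responder 100 bytes
# across a sequence-number wraparound.
orig = Endpoint(syn_time=10.0, syn_seq=1000, fin_time=12.5, fin_seq=1501)
resp = Endpoint(syn_time=10.1, syn_seq=2 ** 32 - 10, fin_time=13.0, fin_seq=91)
```

The modulo handles a single wraparound of the sequence space; a transfer of 2^32 bytes or more cannot be distinguished from a shorter one with only the SYN and FIN in hand, which is one way anomalously large byte counts can arise.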

Dataset  Packets (days)  Start    End      Drops (driver / tcpdump)
LBL-1    124M (36)       01Nov90  01Dec90  0 / 0
LBL-2    ?               28Feb91  30Mar91  0 / ?
LBL-3    207M (47)       07Nov91  07Dec91  9 / 24
LBL-4    210M (36)       19Mar92  18Apr92  6 / 233
LBL-5    337M (35)       24Sep92  23Oct92  8 / 1808
LBL-6    447M (31)       24Feb93  26Mar93  3 / 0

Table 2: Summary of LBL Datasets

Table 2 summarizes the LBL datasets. The second column gives the total number of network packets received by the kernel for each dataset, along with the number of days spanned by the entire trace. (The statistics missing for the LBL-2 dataset are due to abnormal termination of the tracing program.) Each dataset was then trimmed to span exactly 30 days, beginning at midnight on a Thursday and ending at midnight on a Saturday (i.e., just after 11:59PM Friday night), except for LBL-6, which begins on a Wednesday and ends on a Friday midnight. The “Drops” column gives the drop count reported by the Ethernet driver followed by the drop count reported by tcpdump; this last value represents dropped SYN/FIN/RST packets.²

Finally, since the LBL datasets span 2.5 years at roughly regular intervals, they provide an opportunity to study how a site’s wide-area traffic evolves over time. Such a study is reported in [Paxson93].

² In the LBL-4 dataset we observed the heated exchange of nearly 400,000 RST packets sent between a lone remote host and three LBL hosts, separated by a geometric mean of 1.3 msec. In LBL-5 we observed the exchange of nearly 120,000 RST packets between a single pair of hosts, virtually all occurring during a 98 second period, separated by a geometric mean of 400 μsec. LBL-6 did not include any RST bursts. The LBL-5 RST bursts correspond to enough traffic to consume 500 kbit/sec. We suspect that the RST bursts are the cause of the relatively large number of dropped SYN/FIN/RST packets in LBL-4 and LBL-5.

2.2 The Additional Traces

As mentioned above, a number of colleagues generously provided access to traffic traces from other sites. The authors of [DJCME92] provided their traces of traffic from Bellcore, U.C. Berkeley, and U.S.C.; Jeffrey Mogul provided traces from DEC-WRL; Wayne Sung provided traces of traffic to/from the coNCert network in North Carolina; and the authors of [WLC92] provided their traces of the UK–US academic network. The first four traces all originate from “stub” sites, while the latter two represent inter-network traffic (though the authors of [WLC92] characterize the UK side of the UK–US traffic as similar to a large stub site since it comprises only a few hosts).

Site (dataset)  Starting Time      Duration  Drops
Bellcore (BC)   Tue 14:37 10Oct89  13 days   0
UCB (UCB)       Tue 10:30 31Oct89  24 hours  0
USC (USC)       Tue 14:24 22Jan91  26 hours  0.6%*
DEC (DEC-1)     Tue 16:46 26Nov91  24 hours  ?
DEC (DEC-2)     Wed 17:55 27Nov91  24 hours  ?
DEC (DEC-3)     Mon 15:02 02Dec91  24 hours  ?
coNCert (NC)    Wed 09:04 04Dec91  24 hours  ?
UK-US (UK)      Wed 05:00 21Aug91  17 hours  0

Table 3: Summary of Additional Datasets

The additional datasets are summarized in Table 3. Next to the site name we give in parentheses the abbreviation we will use to identify the dataset. The drop rates for the first three datasets correspond to those listed in [DJCME92]; for the last dataset, to that listed in [WLC92]; and the drop rates for the remaining datasets were unavailable. The USC dataset’s drop rate is marked because we found our copy of the trace plagued throughout by “blackouts” of missing packets, occurring almost exactly a minute apart and each blackout lasting roughly ten seconds.³ Because of these blackouts, we exclude the USC dataset from our interarrival models.

2.3 Filtering of non-WAN traffic

Before proceeding with our analysis we filtered out non-wide-area traffic from the datasets: internal and transit traffic. The details are given in Appendix A. In addition, we removed from the LBL datasets all traffic between LBL and U.C. Berkeley.⁴ While traffic with the University forms a significant fraction of LBL’s off-site traffic (20-40% of all connections), it is atypical wide-area traffic due to the close administrative ties and the short, high-speed link between the institutions.

³ These blackouts do not correspond to network outages; sequence numbers of TCP connections spanning outages show jumps.

⁴ Including nntp, unlike [Paxson93], which keeps the nntp traffic.

2.4 Traffic Overview

We now turn to characterizing the different datasets in order to gauge their large-scale similarities and differences. Of previous traffic studies, only [FJ70], the related [JS69], and [DJCME92] compare traffic from more than one institution. The first two papers found significant differences between their four traffic sites, which they attribute to the fact that the different sites engaged in different applications and had different hardware. The authors of [DJCME92] found that their three sites (which correspond to the USC and UCB datasets in this paper, as well as part of the BC dataset) had quite different mixes of traffic, but that the characteristics of any particular protocol’s traffic were very similar (though they did not quantify the degree of similarity).

Dataset  # Conn   nntp  smtp  ftpdata  ftpctrl  telnet  rlogin  finger  domain  X11   shell  other
LBL-1    146,209  40    26    16       3        4       1       4       4       0.2   0.5    0.5
LBL-2    170,718  34    30    16       3        4       1       5       4       0.2   0.2    0.7
LBL-3    229,835  20    33    17       3        4       1       4       11      0.4   0.3    5
LBL-4    449,357  16    21    15       3        2       1       32      5       0.4   0.2    4
LBL-4*   312,429  23    30    21       4        3       1       3       8       0.5   0.3    5
LBL-5    370,397  14    34    22       5        4       1       6       8       0.9   0.2    5
LBL-6    528,784  11    40    23       6        3       0.8     5       5       0.7   0.4    4
BC       17,225   2     49    30       4        4       2       5       0.1     0.1   0.5    2
UCB      37,624   18    45    18       2        2       0.9     12      0.1     0.02  0.2    0.8
USC      13,097   35    27    14       2        3       1       11      2       0.09  0.3    3
DEC-1    72,821   33    35    11       1        0.08    0.05    0.1     20      0     0.001  0.8
DEC-2    49,050   38    22    8        1        0.04    0.06    0.2     29      0     0.02   1
DEC-3    73,440   26    43    9        1        0.07    0.07    0.2     19      0     0.003  1
NC       62,819   1     42    30       4        5       0.3     5       0.8     0.03  0.3    5
UK       25,669   0.02  42    39       7        4       0.4     0.9     1       0.02  0.02   4

Table 4: Percentage Connection Mixes for All Datasets

Table 4 shows the “connection mix” for each of the datasets. The second column gives the total number of connections recorded, and the remaining columns the percentage of the total due to particular TCP protocols. The mixes for BC, UCB, and USC differ from those given in [DJCME92] because the latter reports conversation mixes, where multiple related connections have been combined into single conversations. (The authors also used twenty-minute silences to delimit the end of connections, instead of FIN packets.)

From the table it is immediately clear that traffic mixes for all protocols vary substantially, both from site-to-site and over time (for LBL). There are also a number of anomalies which merit comment:

• The huge spike in the LBL-4 finger connections, the large jump in other connections at LBL-3, and the increasing proportion of ftpctrl traffic (i.e., the control side of an ftp conversation), are all due to the use of background scripts to automate periodic network access. Reference [Paxson93] explores this phenomenon further. LBL-4* shows the LBL-4 connection mix with the periodic finger connections removed, as they significantly skew the mix profile.
• The large variance of LBL’s nntp mix is due to changes in LBL’s nntp peer servers and differences in the rate at which new news arrives. Again, see [Paxson93] for a discussion.
• DEC has a “firewall” in place which prohibits traffic other than nntp, smtp, ftp, and domain. The little remaining traffic due to other protocols originated on the outside of the firewall.
• The DEC-2 dataset includes part of the Thanksgiving holiday, accounting for the depressed number of connections.
• As mentioned in [WLC92], the United Kingdom receives its network news from Holland, hence the very low proportion of nntp connections.

Dataset  MB      nntp  smtp  ftpdata  ftpctrl  telnet  rlogin  finger  domain  X11  shell  other
LBL-1    2,852   19    5     65       0.2      6       0.8     0.1     1       3    1      0.1
LBL-2    3,785   14    6     67       0.2      5       1       0.1     0.9     1    3      2
LBL-3    6,710   7     4     67       0.1      4       1       0.1     0.7     3    11     1
LBL-4    11,398  21    4     52       0.1      4       0.9     0.0     0.6     6    10     1
LBL-5    19,269  17    3     57       0.1      3       0.7     0.1     0.4     11   8      1
LBL-6    22,076  22    5     57       0.2      2       0.7     0.1     0.5     8    3      0.8
BC       346     4     8     78       0.3      4       2       0.2     0.1     0.1  2      2
UCB      318     23    16    50       0.3      4       3       0.9     0.0     0.2  0.6    1
USC      362     62    3     18       0.1      2       0.9     0.3     0.3     5    7      2
DEC-1    981     43    17    38       0.2      0.1     0.2     0.0     0.7     0.0  0.0    1
DEC-2    819     54    14    30       0.1      0.0     0.2     0.1     0.6     0.0  0.0    2
DEC-3    1,379   52    16    30       0.1      0.1     0.2     0.1     0.6     0.0  0.0    1
NC       1,553   9     8     68       0.3      5       0.3     0.1     0.3     0.1  0.3    8
UK       625     0.5   11    80       0.4      3       0.5     0.0     0.3     0.1  0.5    4

Table 5: Percentage Byte Mixes for All Datasets

Table 5 shows the total number of data megabytes transferred (in either direction) for each of the datasets, along with the “byte mix”: the percentage of the total bytes due to each protocol. The LBL datasets show striking growth over time, which we explore further in [Paxson93]. The LBL datasets naturally total more bytes than the others because they span 30-day periods, as opposed to about 1 day for all the other datasets except BC (see Table 3).

We see immediately that, much as with the connection mix, the byte mix also varies considerably both from site-to-site and over time. Some sites (the first three LBL datasets, BC, NC, and UK) are wholly dominated by ftp traffic, while others (the last three LBL datasets, UCB, and the DEC datasets) show more of a balance between nntp and ftp traffic; and USC is dominated by nntp traffic. For some sites (UCB, DEC), smtp traffic contributes a significant volume, and for others (LBL, USC), traffic due to X11 and shell far outweighs the almost negligible proportion of connections due to those protocols (see Table 4).

We now turn to the development of the statistical methodology that we will use to characterize the individual connections that make up the data shown in Tables 4 and 5.

3 Statistical Methodology

As noted in [Pawlita89], one weakness of many traffic studies to date has been in their use of statistics. Often the studies report only first or perhaps second moments, and distributions are summarized by eye. Frequently they omit discussion of dealing with outliers, and rarely do they report goodness-of-fit methodologies and results. The few cases where goodness-of-fit issues have been discussed are somewhat unsatisfying (the authors of [FJ70] developed their own, apparently never-published goodness-of-fit measure; and in our own previous work [Paxson91] we used the Kolmogorov-Smirnov goodness-of-fit test as a goodness-of-fit metric, an inferior choice). We endeavor in this work to address these statistical shortcomings and to present a general statistical methodology that might serve future work as well.

Our initial goal was to develop “statistically valid” analytic models of the characteristics of wide-area network use. By statistically valid we mean models whose distributions for random variables could not be distinguished in a statistical sense from the actual observed distributions of the variables. In this attempt we failed. Most of the models we present do not reflect the underlying data in a statistically valid sense; that is, we cannot say that our analytic distributions do indeed precisely give the distributions of the random variables they purport to model. We discuss our failure in Section 3.8 below, and then in Section 3.9 develop a “metric” for determining which of two statistically invalid models better fits a given dataset. But first we discuss the value of statistically valid analytic models and our methodology for developing them, as these issues remain fundamental to putting our results in perspective.

3.1 Analytic vs. Empirical Models

For our purposes we define an analytic model of a random variable as a mathematical description of that variable’s distribution. Ideally the model has few bound parameters (making it easy to understand) and no free parameters (making it predictive), in which case it fully predicts the distribution of similar random variables derived from datasets other than the ones used to develop the model. But typically the model might include free offset and scale parameters, in which case it predicts the general shape of future distributions but not the exact form. If those parameters are known for a future dataset, then the model becomes fully predictive for that dataset.

In contrast, an empirical model such as tcplib describes a random variable’s distribution based on the observed distribution of an earlier sample of the variable. The empirical model includes a great number of bound parameters, one per bin used to characterize the variable’s distribution function; it may be predictive but not easy to understand.

There are a number of advantages of an analytic model compared to an empirical model for the same random variable:

• analytic models are often mathematically tractable, lending themselves to greater understanding;
• analytic models are very concise and thus easily communicated;
• with an analytic model, different datasets can be easily compared by comparing their fitted values for the model’s free parameters.

A key question, though, is whether an analytic model fully captures the essence of the quantity measured by a random variable. An empirical model perfectly models the dataset from which it was derived; the same cannot be said of an analytic model. If the analytic model strays too far from reality, then, while the above advantages remain true, the model no longer applies to the underlying phenomena of primary interest, and becomes useless (or misleading, if one does not recognize that the model is inaccurate).

The key question then is how to tell that an analytic model accurately reflects reality as represented by a dataset of samples. One approach is to require that the random variable distributions predicted by the model and those actually observed be indiscernible in a statistical sense. To test for such agreement we turn to goodness-of-fit techniques.
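To make the contrast between the two kinds of model concrete, the following Python sketch fits both to the same sample. It is only an illustration: the function names, the choice of 100 bins, and the synthetic training data are assumptions made here, and tcplib's actual representation may differ. The analytic model is a normal distribution on log2 of the value and is carried entirely by two numbers; the empirical model stores one quantile per bin.

```python
import math
import random
from bisect import bisect_right
from statistics import mean, stdev


def fit_analytic(samples):
    """Analytic model: mean and standard deviation of log2(x), two parameters in all."""
    logs = [math.log2(x) for x in samples]
    return mean(logs), stdev(logs)


def analytic_cdf(x, mu, sigma):
    """P[X <= x] under the fitted log2-normal model."""
    z = (math.log2(x) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_empirical(samples, bins=100):
    """Empirical model: the upper boundary of each equal-probability bin."""
    ordered = sorted(samples)
    n = len(ordered)
    return [ordered[(i * n - 1) // bins] for i in range(1, bins + 1)]


def empirical_cdf(x, quantiles):
    """P[X <= x] read off the stored bin boundaries (a step function)."""
    return bisect_right(quantiles, x) / len(quantiles)


rng = random.Random(0)
# Hypothetical training data: log2(bytes) drawn from a normal distribution.
training = [2 ** rng.gauss(10.0, 2.5) for _ in range(5000)]

mu, sigma = fit_analytic(training)   # the entire analytic model
quantiles = fit_empirical(training)  # the entire empirical model, 100 numbers
```

Comparing two sites under the analytic model amounts to comparing two pairs `(mu, sigma)`; under the empirical model it means comparing two tables of bin boundaries, which is the conciseness advantage listed above.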

3.2 Goodness-of-fit Tests

The random variables we model (amount of data transferred, connection duration, interarrival times, and ratios of these quantities) all come from distributions with essentially unbounded maxima. Furthermore, these distributions are either continuous or, in the case of data transferred, continuous in the non-negative integers. As such the values of the variables do not naturally fall into a finite number of categories, which makes using the well-known chi-squared test less than ideal because it requires somewhat arbitrary choices regarding binning [Knuth81, DS86].

The goodness-of-fit test commonly used with continuous data is the Kolmogorov-Smirnov test. The authors of [DS86], however, recommend the Anderson-Darling ($A^2$) test [AD54] instead. They state that $A^2$ is often much more powerful than either Kolmogorov-Smirnov or chi-squared, and that $A^2$ is particularly good for detecting deviations in the tails of a distribution, often the most important to detect. We followed their recommendation and, in attempting to develop statistically valid models, always used $A^2$ in assessing goodness-of-fit.

3.3 Logarithmic Transformations

When analyzing data drawn from distributions unbounded in one direction and bounded in the other, often it helps to re-express the data by applying a logarithmic transformation [MT77]. We found that for many of our models logarithmic transformations were required to discern patterns in the large range of values in the data. For convenience we developed and tested our models using a $\log_2$ transformation.

Note that, when converting from logarithmic models back to untransformed models, arithmetic means of transformed values become geometric means of the untransformed values, and standard deviations become factors instead of additive values. For example, a log-normal model with $\bar{x} = 4.0$ and $\sigma = 2.5$ specifies that any observation within a factor of 5.66 ($2^{2.5}$) of 16 ($2^{4.0}$) lies within one standard deviation of the geometric mean. Thus, 2.83 ($16 / 5.66$) and 90.56 ($16 \times 5.66$) are the boundaries of values lying within one standard deviation of the geometric mean, which is 16.

3.4 Dealing with Outliers

When applying a logarithmic transformation to non-negative data, one immediately runs into the problem of what to do with data equal to zero. Fortunately for us, in our data such values are rare (and confined to values representing number of data bytes transferred), so we decided to eliminate any connections in which the number of bytes transferred in either direction was zero. We report in Appendix B the number of connections thus eliminated for each dataset; in the worst case they comprised 0.5% of the total connections. An alternative approach would have been to bias our logarithms, by using $\log_2(x + 1)$ rather than $\log_2 x$; we rejected this approach as being error-prone when converting to and from the logarithmic models.

Some of our datasets also exhibited values so anomalously large that we removed their associated connections from our study. These outliers were much rarer than those discussed above. Often the values were clearly due to protocol errors (for example, connections in which the sequence numbers indicated $2^{32} - 1$ bytes transferred). We discuss these outliers also in Appendix B.

Finally, we restricted our analysis to datasets with at least 100 connections of interest, to prevent small, anomalous datasets from skewing our results.

3.5 Censored Data

Some of our models describe only a portion of the distribution of a random variable (such as the upper 80% of the distribution). Reference [DS86] discusses modified goodness-of-fit tests (including $A^2$) to use with such censored distributions, in which a known fraction of either tail has been removed from the measurements prior to applying the test. In addition, they describe a method (due to Gupta [Gupta52]) for estimating the mean and variance of such a censored distribution, which can be used to derive estimated parameters of a model from censored data.

3.6 Deriving Model Parameters from Datasets

Often a model has free parameters that must be estimated from a given dataset before testing the model for validity in describing that dataset. For example, a log-normal model may require that the geometric mean and standard deviation be estimated from the dataset. The authors of [DS86] make the important point that estimating free parameters from datasets alters the significance levels corresponding to statistics such as $A^2$ computed from the fitted model. They then provide both methods to estimate free parameters from datasets, and the required modifications for interpreting the significance of the resulting $A^2$ (and other) statistics. We followed their approach.
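The log-normal example in Section 3.3 (a mean of 4.0 and a standard deviation of 2.5 in the log2 domain) can be checked with a few lines of Python. The variable names are illustrative only.

```python
# Parameters of the example log-normal model, expressed in the log2 domain.
mean_log2 = 4.0  # arithmetic mean of log2(x)
sd_log2 = 2.5    # standard deviation of log2(x)

# Converting back to the untransformed domain: the mean becomes a geometric
# mean, and the standard deviation becomes a multiplicative factor.
geometric_mean = 2 ** mean_log2  # 16.0
factor = 2 ** sd_log2            # about 5.66

# Values within one standard deviation of the geometric mean.
lower = geometric_mean / factor  # about 2.83
upper = geometric_mean * factor  # about 90.51
```

The upper bound of 90.56 quoted in the text comes from multiplying 16 by the rounded factor 5.66; carrying the unrounded factor gives roughly 90.51. The difference is immaterial to the argument.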


3.7 Model Development vs. Testing

To know if a model is truly predictive, we must test it on data other than that used to develop the model. To this end, we developed all of our models using the first half of the LBL-1 through LBL-4 datasets. We refer to these below as the “test datasets”. We then tested the models against the second half of these LBL datasets along with the entirety of the remaining datasets (including all of LBL-5 and LBL-6).

Below we compare our analytic models with two empirical models: one derived from the UCB dataset, which is essentially the same as the tcplib model, and one derived from all of LBL-2. Thus, in keeping with our goal of testing models only on data other than that used to develop them, we do not report results for fits to these datasets. An exception is for our interarrival models, which in general we do not compare to the empirical models (see Section 3.11 below).


3.8 Failure to Find Statistically Valid Models

Using the methodology described above, we attempted to develop models for a number of random variables for TCP connections of various protocols. While we often could find fairly simple analytic models that appeared to the eye to closely match the distributions of the random variables for a given dataset, these models rarely proved valid at a significance level of 5%, or even 1%, when tested against other datasets.⁵

What we found tantalizing, though, is that often, when we subsampled the dataset, we did find valid fits to the smaller sample. This pattern held whether the subsamples were constructed randomly or chronologically (for example, testing each day in the LBL datasets separately). We tested whether the pattern was due to daily variations in the model’s parameters by using autocorrelation plots. We found such patterns only in the arrival process and bytes transferred of nntp, and bytes transferred by smtp connections. We discuss these findings below in Sections 5.2 and 6.2. We did not find any consistent patterns in the LBL telnet or ftp test datasets, ruling out simple hourly, daily, or weekly patterns in the parameters.

These findings are consistent with our models being close to describing the distributions but not statistically exact. In such a case it will take a large number of sample points for a goodness-of-fit test to discern a difference between the distributions. When we subsample we present the test with fewer points and the fit is then more likely to be found valid.

Figure 1 illustrates the problem. Here we see the distribution of $\log_2$ of the bytes sent by the telnet responder (i.e., not the host that began the connection) for the first half of the LBL-4 dataset. Fitted against the distribution is our responder-bytes model, which uses a normal distribution for the upper 80% of the data (and ignores the lower 20%). The horizontal line indicates the 20th percentile; the goodness-of-fit test applied only to the agreement above this line. While judging visually we might be tempted to call the fit “good”, it fails the $A^2$ test even at the 1% level. This sample consisted of 5,448 points. We then subsampled 1,000 points randomly, tested the validity of the model’s fit to the subsample, and repeated the process 100 times. Of these 100 tests, 79 were valid at the 1% level and 55 at the 5% level. Thus we feel confident that the model is close, though we know it is not exact.

[Figure 1: Censored Log-Normal Fit to Upper 80% of LBL-4 TELNET Responder Bytes. Horizontal axis: lg Responder Bytes.]

⁵ A significance level of 5% indicates a 5% probability that the $A^2$ test erroneously declares the analytic model to not fit the dataset. A 5% test is more stringent than a 1% test; it errs more often because it demands a closer correspondence between the model and the dataset before declaring a “good fit.” See [Ross87, pp. 205-206] for further discussion of significance levels.
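The subsampling experiment can be sketched in Python as follows. This is a simplified illustration rather than the procedure behind Figure 1: it tests an uncensored normal fit, whereas the responder-bytes model is fitted to the upper 80% only and would need the censored-data variant of the test. The critical values are the commonly tabulated ones for the modified $A^2$ statistic when mean and variance are both estimated from the sample (an assumption about which table applies), and the synthetic "close but not exact" population is entirely hypothetical.

```python
import math
import random
from statistics import mean, stdev

# Commonly tabulated upper-tail critical values for the modified A^2 statistic
# when both the mean and the variance are estimated from the sample.
CRITICAL = {0.05: 0.752, 0.01: 1.035}


def anderson_darling_normal(data):
    """Modified A^2 statistic for a normal fit with estimated mean and variance."""
    xs = sorted(data)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    scale = sigma * math.sqrt(2.0)
    tiny = 1e-300  # keeps log() finite for extreme tail values

    total = 0.0
    for i in range(1, n + 1):
        lower_tail = 0.5 * math.erfc(-(xs[i - 1] - mu) / scale)  # F(x_i)
        upper_tail = 0.5 * math.erfc((xs[n - i] - mu) / scale)   # 1 - F(x_{n+1-i})
        total += (2 * i - 1) * (
            math.log(max(lower_tail, tiny)) + math.log(max(upper_tail, tiny))
        )
    a2 = -n - total / n
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


def fraction_valid(data, subsample_size, trials, level, rng):
    """Share of random subsamples whose fit is not rejected at the given level."""
    passed = sum(
        anderson_darling_normal(rng.sample(data, subsample_size)) < CRITICAL[level]
        for _ in range(trials)
    )
    return passed / trials


rng = random.Random(1993)
# Hypothetical log2 values: normal with a mild skew added, so that a normal
# model is close to, but not exactly, the true distribution.
population = [rng.gauss(10.0, 2.5) + rng.expovariate(1.0) for _ in range(5448)]

full_statistic = anderson_darling_normal(population)
share_at_1pct = fraction_valid(population, 1000, 100, 0.01, rng)
share_at_5pct = fraction_valid(population, 1000, 100, 0.05, rng)
```

Comparing `full_statistic` against `CRITICAL` and then inspecting the two shares mirrors the 5,448-point and 1,000-point comparison described above; the actual numbers depend on the synthetic population and are not meant to reproduce the 79 and 55 reported for LBL-4.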

3.9 Comparing Analytic and Empirical Models

While we must abandon our initial goal of producing statistically valid, “exact” models, we still can produce useful analytic models by building on the work of [DJ91, DJCME92] in the following way. In those papers the authors argue that their empirical models are valuable because the variation in traffic characteristics from site to site and over time is fairly small. Therefore the tcplib models, which were taken from the UCB dataset, faithfully reproduce the characteristics of wide-area TCP connections. If we can develop analytic models that fit other datasets as well as tcplib does, then the analytic models are just as good at reproducing the characteristics of wide-area TCP connections; a network researcher is just as well off using either set of models, and may prefer the analytic descriptions for the advantages discussed in Section 3.1.

The question then remains how to compare an analytic model with an empirical one. Rather than a goodness-of-fit test, we need some sort of goodness-of-fit metric. While under certain conditions one can apply tests like $A^2$ as metrics [DS86], they are not appropriate metrics for measuring the fit of an empirical model; the tests are designed for comparing a continuous distribution (an analytic one) with an empirical distribution. We chose as our metric a measure of “bin” frequencies, similar to a chi-squared test. A chi-squared test computes:

$$X^2 = \sum_{i=1}^{N} \frac{(Y_i - n p_i)^2}{n p_i}$$

where $N$ is the number of bins, $p_i$ the fraction of all observations predicted to fall into the $i$th bin, $n$ the total number of observations, and $Y_i$ the number of observations actually falling into the $i$th bin. We make one important change, though. If a chi-squared test is used to compare non-identical distributions, then the resulting $X^2$ increases with $n$, making it difficult to compare $X^2$ values when testing a distribution against different-sized datasets to see which it more closely matches. If two distributions are different, then for large values of $n$, $Y_i / (n p_i)$ will approach some fixed factor $c_i$, and the squared term in the $X^2$ computation approaches $n^2 p_i^2 (c_i - 1)^2$. We then see that the metric:

$$\frac{X^2}{n}$$

remains invariant with increasing $n$. If the bins have equal width, then we have:

$$p_i = \frac{1}{N}, \qquad \frac{X^2}{n} = N \sum_{i=1}^{N} \left( \frac{Y_i}{n} - \frac{1}{N} \right)^2$$

which allows us to compute $\lambda$, the “average deviation” in each bin:

$$\lambda = \sqrt{\frac{X^2}{n}} \qquad (1)$$

We interpret $\lambda$ as follows: the value of $X^2$ we observed is consistent with what we would observe if in each bin the proportion of observations deviates from the predicted proportion by $\lambda$, i.e., $|Y_i / n - p_i| / p_i = \lambda$. While in general the deviation will vary from bin to bin, we can use $\lambda$ to summarize the “average” deviation.

We are faced with several problems when using this metric:

• Similar to the problems using chi-squared tests mentioned in Section 3.2 above, we are forced to make a somewhat arbitrary choice as to how many bins to use. We chose to use ten equal-sized bins, so as to measure the deviation from the predicted distribution within each 10th percentile.6

• An empirical model does not always allow us to create equal-sized bins. It may be that the model has a single-valued spike straddling a bin boundary (for an exaggerated example, suppose that the lower 20% of an empirical distribution are all equal and we want to create bins 10% wide). We deal with this case by placing the entire spike in the lower bin and adjusting the bin widths accordingly. If the spike is substantial and not aberrant, then this procedure will aid the fit of the empirical model more than that of an analytic model.

• Since an empirical model has bounds on the range of values it allows for, the tested dataset may have values not corresponding to any bin. We removed such values from the dataset prior to computing its fit to the model. We did, however, include these values in the summary of deviation in the tails (see Section 3.10 below).

• The metric does not inform us of deviations in the distribution tails, often the most important type of deviation. We address this shortcoming in the next section.

• The metric does not inform us of interesting, localized spikes or clumps. Within a single bin we may miss considerable departure from a model; the danger is particularly acute when testing analytic models, since their continuous nature does not usually allow for clumping. Empirical models, on the other hand, may exactly predict the clumping. We do not believe this problem to be major because in studying the LBL test datasets to form our models we rarely encountered consistent clumping (we make mention below of those occasions when we did). We also note that if clumping exists and is not accompanied by nearby sparseness, then the clump will “pull” more values into the bin than a model without the clump would predict, which will raise the value of $\lambda$. So a major clump may be detected as an overall poor fit by the model.

6 In one case below we use nine bins, to accommodate censored data.

We use the $\lambda$ metric to gauge how closely the distributions of different models match that of a particular observed distribution. We deem the model distribution with the lowest value of $\lambda$ as corresponding to the best-fitting model for the observed distribution. In general we tested each dataset against three model distributions: one produced by our analytic model, one produced using the empirical distributions found in the UCB dataset, and one drawn from the LBL-2 dataset. As mentioned in Section 2.4, the distributions in tcplib come from the UCB dataset, with some minor differences in the data reduction. Thus, how well the UCB dataset fits the other datasets should closely match the fit of tcplib to those datasets. If the analytic models fit the datasets as well or better, then we argue that the analytic models provide as good or better an overall model. Finally, to guard against the possibility that the UCB dataset is atypical and that better empirical models might exist, we also constructed and tested an empirical model consisting of the entire LBL-2 dataset. We developed and settled on this metric prior to observing the values it gave for the different models.

We tested two versions of each model. In the first version all parameters were fixed; none were derived from the dataset being tested. When developing our analytic models we picked for each free parameter a round value lying somewhere in the range the parameter exhibited in the LBL test datasets. We chose round values as reminders that there is in general considerable range in the possible values of the parameters, and that our choice was therefore somewhat arbitrary (nearby choices would work just as well). In the second version of each model we derived the model’s free parameters from the dataset being tested. For empirical models we applied a linear transformation to the empirical distribution so that its mean and standard deviation matched that computed for the tested dataset. We refer to this second type of model as scaled.
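The bin-frequency metric of this section can be illustrated with a short sketch. It assumes that the average per-bin deviation is computed as the square root of the chi-squared statistic divided by the number of observations, which matches the interpretation given above (each bin's observed proportion off from the predicted one by the same relative amount). The function names and the binning helper are hypothetical, introduced only for illustration.

```python
import bisect
import math


def chi_squared(observed_counts, predicted_fractions):
    """Chi-squared statistic: sum of (Y_i - n*p_i)^2 / (n*p_i) over the bins."""
    n = sum(observed_counts)
    return sum((y - n * p) ** 2 / (n * p)
               for y, p in zip(observed_counts, predicted_fractions))


def average_bin_deviation(observed_counts, predicted_fractions):
    """Assumed form of the metric: sqrt(X^2 / n), invariant under dataset size."""
    n = sum(observed_counts)
    return math.sqrt(chi_squared(observed_counts, predicted_fractions) / n)


def bin_counts(sample, upper_edges):
    """Count observations per bin.

    `upper_edges` are the model's interior quantile boundaries in ascending
    order; a value equal to an edge goes into the lower bin, and the last
    bin is open-ended.
    """
    counts = [0] * (len(upper_edges) + 1)
    for value in sample:
        counts[bisect.bisect_left(upper_edges, value)] += 1
    return counts
```

With ten equal bins and counts that alternate 10% above and 10% below the prediction, the function returns 0.1 whether the dataset has 1,000 or 10,000 observations, which is the size invariance the section asks of the metric.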

3.10 A Metric for Deviation in the Tails

We summarize each model’s fit to the extreme tails as follows. Suppose we test the model against $m$ datasets. For the $i$th dataset, let $P_i$ be the number of observations predicted to lie in the tail, and $A_i$ be the number actually found to do so. Define:

$$\xi = \frac{1}{m} \sum_{i=1}^{m} \log \frac{P_i}{A_i} \qquad (2)$$

then $\xi$ gives the mean of the natural logarithm of the proportion by which the model overestimates the population of the tail. Positive values of $\xi$ indicate that the model overestimates the tail, either consistently or in a few cases grossly. Similarly, negative values indicate the model underestimates the tail. With this definition, an underestimate by a factor of two ($P_i / A_i = 1/2$) is just as bad as an overestimate by the same factor ($P_i / A_i = 2$), though if the two occur in different datasets they will cancel out one another. Values of $\xi$ close to 0.0 indicate that either the model consistently does well in modeling the tail, or overestimates for some datasets and underestimates for others. In the latter case there probably is great diversity in the distribution’s tail across the different datasets, and the model’s estimate of the tail is a good compromise.

One problem arises when using this definition of $\xi$: if $A_i$ is 0 then $\xi$ becomes undefined. We address this problem by replacing $P_i / A_i$ with 100 in these cases.

In comparing models we summarize how well each model does in the 10% and 1% tails. For models describing bytes transferred, we only summarize the upper tails, as in these cases disagreement in the lower tails is a matter of predicting a few bytes too many (or few) in small connections, while disagreement in the upper tails can result in large connections that are megabytes too big or small. For other models we summarize both the upper and lower tails.

3.11 Modeling Interarrivals

The final aspect of our methodology is how we model connection interarrivals. Our hope was to successfully model interarrivals as Poisson processes, as these have many attractive properties and a natural interpretation (uncorrelated, memoryless arrivals). We cannot hope for much success, though, if we simply model the interarrival distribution directly: we expect that the arrival process will vary over the course of each day, since computer users tend to work during daylight hours, take lunch breaks, and so on; we do not expect a homogeneous Poisson arrival process. Instead we first look at the relative rate of connection arrivals over the course of a day in order to develop a nonhomogeneous Poisson model.

Figure 2 shows the mean, normalized, hourly connection rate for the test datasets. For each hour we plot the fraction of the entire day’s connections that occurred during that hour. We see, for example, that telnet connections are particularly prominent during the 8AM-6PM working hours, with a lunch-related dip at noontime; this pattern has been widely observed before. ftp file transfers have a similar hourly profile, but they show substantial renewal in the evening hours, when presumably users take advantage of lower networking delays. The nntp traffic hums along at a fairly constant rate, only dipping somewhat in the early morning hours (but the mean size of each connection varies over the course of the day; see Section 5.2). The smtp traffic is interesting because it shows more of a morning bias than either telnet or nntp. To explore this bias we have also plotted the hourly rates for the BC dataset’s smtp connections. Here we see a significant afternoon bias. As LBL lies on the west coast of the United States and Bellcore to the east, three time zones away, we can interpret this difference as being due to cross-country mail: mail sent by east-coast users arrives early in the day for west-coast users, and mail sent by west-coast users late in the day for east-coast users.

Figure 2: Mean Daily Variation in the Test Dataset Connection Rate (X axis: Hour; Y axis: Fraction of Total Connections; curves: Telnet, FTP, NNTP, SMTP, BC SMTP)

We can then use this data to attempt to model interarrivals as Poisson processes. First we compress datasets consisting of more than one day into a single “superday” by grouping together all connections beginning during each hour of the day. For example, all connections arriving between 9:00AM and 9:59AM are placed in one 9AM “superhour”, regardless of the day on which the connection arrived. The hope is that the daily variations are considerably less than the hourly variations, which is true in general except for weekends, during which much less traffic is generated. But because weekends have many fewer arrivals, the effect of aggregating them with weekday connections of the same hour is small.

Next we predict the number of connections occurring during each hour by multiplying that hour’s fraction as given in Figure 2 by the total number of connections during the superday. Call this quantity $a_h$, for the number of arrivals during hour $h$. If we have $a_h$ arrivals from a Poisson process during a single hour, then we expect the mean interarrival time in seconds to be

$$\mu_h = \frac{3600\ \text{sec}}{a_h} \qquad (3)$$

and if we divide the interarrival times by $\mu_h$, then they should be exponentially distributed with a mean of 1. Now that each hour’s interarrivals have been normalized to the same mean, we test the distribution of all of the superday’s normalized arrivals together against that predicted by an exponential model with mean 1.

We can also test a “scaled” version of this model which does not rely on the rates given by Figure 2. Instead of computing $\mu_h$ as given in Equation 3, we simply compute each superhour’s interarrival mean directly and divide by that value, guaranteeing a resulting mean of 1.

tcplib does not presently include empirical models for interarrivals, probably because creating such empirical models requires a fair amount of transformation to the raw interarrival times. We therefore do not compare the performance of the analytic interarrival model against that of empirical models, but instead compare the scaled version of the model against the unscaled. If we find that $\lambda$ for both versions is quite low, then the analytic model is successful and the rates given by Figure 2 are widely applicable. If $\lambda$ is only low for the scaled model, then the arrivals are indeed from a nonhomogeneous Poisson process, but with rates different from those given in Figure 2. If $\lambda$ is high for both versions, then the arrivals are not from a Poisson process with a fixed hourly rate. If $\lambda$ were to be high for the scaled model but low for the unscaled model, then we would be left with a puzzle, but fortunately this never happened.

Note that we do not model the arrival of a site’s inbound and outbound connections separately, though the two might well have different hourly rates; nor do we model the correlations between inbound and outbound arrivals. We leave these important refinements to future work.
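The superday normalization described in this section can be sketched as below. This is a minimal illustration under stated assumptions: arrival times are seconds measured from a midnight, days are overlaid by taking the time of day, and interarrivals are taken between successive arrivals within the same superhour. The function name and the `hourly_fraction` mapping (hour to fraction of the day's connections, as would be read from Figure 2) are hypothetical.

```python
from collections import defaultdict

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def normalized_interarrivals(arrival_times, hourly_fraction=None):
    """Return interarrival times rescaled so each superhour has mean 1.

    With `hourly_fraction` given (unscaled model), each hour's expected mean
    interarrival is 3600 / a_h, where a_h = fraction * total arrivals.
    With `hourly_fraction=None` (scaled model), each superhour's own observed
    mean interarrival is used instead.
    """
    superhours = defaultdict(list)
    for t in arrival_times:
        time_of_day = t % SECONDS_PER_DAY            # overlay all days
        hour = int(time_of_day // SECONDS_PER_HOUR)
        superhours[hour].append(time_of_day)

    total = len(arrival_times)
    normalized = []
    for hour, times in superhours.items():
        times.sort()
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        if not gaps:
            continue                                  # fewer than two arrivals
        if hourly_fraction is None:
            mean_gap = sum(gaps) / len(gaps)          # scaled version
        else:
            predicted_arrivals = hourly_fraction[hour] * total
            mean_gap = SECONDS_PER_HOUR / predicted_arrivals
        if mean_gap <= 0:
            continue                                  # degenerate hour, skip
        normalized.extend(gap / mean_gap for gap in gaps)
    return normalized
```

The returned values are what would then be compared against an exponential distribution with mean 1; running the function with and without `hourly_fraction` gives the unscaled and scaled variants the text contrasts.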

4 TELNET

We now turn to analyzing the characteristics of individual protocols and developing models to describe them. We begin with telnet.7

7 Appendix C presents a similar overview for rlogin traffic, along with results of modeling it with the telnet models developed in this section.

4.1 Overview of TELNET Connections

Table 6 summarizes some basic statistics of the datasets’ telnet connections. The Table is read as follows. The second column gives the number of “valid” connections recorded for the dataset and the third column the number of “rejected” connections; Appendix B details the rejected connections. As discussed in [Paxson93], the LBL-6 telnet traffic included 1,988 connections due to periodic traffic. LBL-6* summarizes the LBL-6 traffic with these connections removed. For the remainder of this section we use LBL-6* instead of LBL-6.

The 4th through 6th columns summarize the number of data bytes transmitted by the originator (the user end of the remote-terminal connection). The values given are the geometric mean, the geometric standard deviation, and the maximum. As noted in Section 3.3, except for interarrival times we applied logarithmic transformations to the data prior to analysis. This transformation is also important for summary statistics such as those presented in this Table, because arithmetic means and standard deviations are quickly dominated by upper-tail outliers; compare the figures given in this paper with those of our previous work [Paxson91]. The latter tend to be much larger. The 7th through 9th columns give the same summary for the number of bytes transmitted by the responder (remote computer), and the 10th through 12th columns the same for the duration of the connections, with ‘s’ used to indicate seconds and ‘h’ for hours.

Dataset   # Conn   # Rej   x̄_orig   σ_orig   max_orig   x̄_resp   σ_resp   max_resp   x̄_dur   σ_dur   max_dur
LBL-1      5,734       9    199B      4.4     207KB      4.2KB     7.9      1.9MB     266 s    6.8     90.5 h
LBL-2      7,582      12    199B      4.6     282KB      4.3KB     7.5      3.2MB     237 s    6.8     78.2 h
LBL-3      9,607      23    214B      4.7     537KB      4.1KB     7.6      5.5MB     226 s    6.9    167.9 h
LBL-4     10,897      58    237B      4.3     613KB      5.3KB     7.4     86.6MB     271 s    6.8    270.0 h
LBL-5     14,922      81    237B      3.9     215KB      5.2KB     6.8     19.3MB     248 s    7.1    386.8 h
LBL-6     17,425      52    147B      7.3     777KB      3.8KB     8.7     14.0MB     256 s    6.9    102.9 h
LBL-6*    15,437      52    242B      4.5     777KB      5.7KB     7.3     14.0MB     270 s    7.7    102.9 h
BC           744       2    145B      4.1     9.7KB      2.9KB     8.7      0.6MB     193 s    6.4      8.1 h
UCB          655       4    155B      4.7      27KB      2.5KB     9.1      0.7MB     166 s    6.9      7.9 h
USC          405       0    184B      4.3      12KB      4.1KB     7.2      0.6MB     168 s    6.5      5.5 h
NC         3,023      34    112B      3.9     146KB      2.6KB    10.6      3.4MB     106 s    7.4      6.8 h
UK           962      35    143B      3.6      30KB      2.5KB     9.3      0.7MB     175 s    5.2      7.2 h

Table 6: Summary of TELNET Connections

We note that the geometric mean duration of telnet connections ranges from 2 to 4 minutes, while Jackson and Stubbs [JS69] reported average connection lengths for local logins of 17 to 34 minutes, and [Bryan67] gives a local-login median of 20 minutes and a mean of 45-50 minutes. Jackson and Stubbs infer that connection time “may be considerably reduced by providing a high-speed channel from the computer to the user”, so we might suspect the difference between their measurements and the telnet data is due to the higher communication speeds of today’s computers. More recently (1985), Marshall and Morgan found that local-area remote logins had an average duration of 45 minutes [MM85], and non-network logins had an average duration of 150 minutes. Thus the distance between the user and the computer appears inversely correlated with the login duration. Since bandwidth usually decreases with distance, we appear to be seeing Jackson and Stubbs’ effect but rescaled to reflect today’s range of communication speeds.

The LBL telnet connections were on average substantially longer and consisted of more bytes than those at other sites. We would expect slightly longer average durations for LBL connections since the datasets span several weeks, giving an opportunity to detect long-lived connections that would be missed by the short spans of the other datasets (except for BC, which spans 13 days and has the next highest average). But this effect is small: if we eliminate from LBL-1 all connections spanning more than one day (i.e., crossing midnight), then $\bar{x}_{orig}$ drops to 197B, $\bar{x}_{resp}$ drops by 53B, and $\bar{x}_{dur}$ drops to 260 s. Given the difference in these parameters even after this adjustment, we are forced to conclude that, at least with regard to mean bytes transferred and duration, the LBL telnet traffic is significantly different from that at other sites.

We also note a definite trend over the LBL datasets towards increasing values of $\bar{x}_{orig}$, and a similar though less convincing trend in $\bar{x}_{resp}$, too, indicating that telnet connections are growing larger with time. Connection durations, on the other hand, are not growing longer, suggesting that higher network bandwidths are enabling users to engage in more work during each session.

Finally, we note that the data provide support for the observation in [DJCME92] that “interactive applications can generate 10 times more data in one direction than the other,” and actually suggest the factor is around 20:1. The observation that the computer end of a terminal session generates an order-of-magnitude more data than the user end can be found as far back as reference [JS69], though [Bryan67] found the ratio to be 2.85:1 on a line-by-line basis (the author also states, however, that the studied system was substantially different from a general-purpose, on-line, time-shared system). Marshall and Morgan found ratios as high as 35:1 for teletypewriters in technical use, with half that being a representative average, and as low as 3:1 for teletypewriters used for word processing [MM85]. In Section 4.5 below we present a model for this ratio.

4.2 TELNET Originator Bytes

With the bulk transfer protocols we examine in subsequent sections, we usually are only interested in modeling the number of bytes transferred and the connection interarrival process. With interactive applications, on the other hand, we not only are interested in the bytes transferred in both directions but also the connection duration and the relationships between these variables.

We begin by modeling the number of bytes sent by the originator of a telnet connection (typically a human typing at a keyboard). The best fit we found to the LBL telnet test datasets came using the extreme distribution:

$$F(x) = \exp\left(-\exp\left(-\frac{x - \alpha}{\beta}\right)\right) \qquad (4)$$

Reference [DS86] gives a procedure for estimating $\alpha$ and $\beta$ for a given dataset. For our originator-bytes model, $x$ in Equation 4 is $\log_2$ of the number of bytes transmitted by the connection originator.

Figure 3 shows the distribution for the first half of the LBL-2 dataset, along with the fitted model. We see apparently good agreement except in the tails, but when tested with $A^2$ the fit fails to be valid; the same holds for the other LBL half-datasets. For the four test datasets, $\alpha$ varied from 6.55 to 6.93; we chose $\alpha = \log_2 100 \approx 6.64$. $\beta$ varied from 1.74 to 1.92. For our fixed model we chose $\beta = \log_2 3.5 \approx 1.81$.

Figure 3: TELNET Originator-Bytes Model for LBL-2: Log-Extreme Distribution (X axis: lg Originator Bytes)

Figure 4 shows the computed values of $\lambda$ for this analytic model plotted against both the UCB and LBL-2 empirical models, where $\lambda$ is defined as in Equation 1. The X axis gives the value of $\lambda$ corresponding to one of the empirical models, and the Y axis the value corresponding to the analytic model. We read the plot as follows. Each point on the plot is labeled with the name of the corresponding dataset. “L1” through “L6” represent the LBL datasets and “D1” through “D3” the DEC datasets (not present in this particular plot). Labels written in lower case (e.g., “usc”) reflect values for unscaled models; that is, using the raw UCB or LBL-2 data for an empirical model, and the fixed version (no fitted parameters) for the analytic model. Labels in upper case (“USC”) reflect the scaled models. We plot the text label at the point corresponding to comparing the UCB empirical model, on the X axis, with the analytic model, on the Y axis. We then draw a line from that point to the corresponding point comparing the LBL-2 empirical model with the analytic model. This line is always horizontal because the two comparisons share the same value, for the analytic model, on the Y axis.

Figure 4: Empirical vs. Analytic Models for TELNET Originator Bytes (X axis: Empirical Fit; Y axis: Analytic Fit)

Thus for each dataset four different points are plotted: the unscaled analytic model vs. the UCB empirical model (e.g., “usc”); the scaled version of the same (“USC”); the unscaled analytic model vs. the LBL-2 empirical model (the line drawn from “usc”); and the scaled version of the same (line drawn from “USC”). For example, the lowest pair of points indicate that the UCB empirical model had $\lambda \approx 0.3$; the LBL-2 model, $\lambda \approx 0.1$; and the analytic model, $\lambda \approx 0.1$. Since the line drawn from “usc” goes to the left, the LBL-2 empirical model provided a better fit to the unscaled USC dataset than did the UCB empirical model. In general, if the lines head to the left of the labels then the LBL-2 empirical model surpasses the UCB model; and vice versa if the lines go to the right.

The diagonal line indicates where $\lambda_{analy} = \lambda_{emp}$, i.e., where the analytic and empirical models yield the same closeness-of-fit metric. Points below and to the right of this line indicate datasets for which the analytic model fitted better than the empirical model; points above and to the left, where the empirical model fitted better. For example, from this plot we see that the unscaled analytic fit to the LBL-5 dataset was much better than that of the UCB empirical model (“l5”) but about the same as that of the LBL-2 empirical model.

We see in this plot that the LBL-2 empirical model almost always does better than the UCB empirical model, and that the analytic model performs comparably. The points tend to lie either just above the diagonal, indicating a slightly better empirical fit, or a bit further away from and below the diagonal, indicating a better analytic fit. For this model, scaling sometimes results in a big improvement (NC, LBL-4), no improvement (BC), or an improved empirical model but worsened analytic model (USC, UK). Thus in this case it makes sense to scale the empirical models when predicting traffic, but not the analytic model.

Closer observation reveals that for every dataset except UK and NC (two extreme cases), the analytic model fits the dataset better than the UCB model, while the LBL-2 model fits best in every case except for a few points very close to the line of equality. Thus we can order the models: the LBL-2 empirical model is better than the analytic model, which in turn is better than the UCB model.

The overall fit of the model to the datasets does not tell the entire story, however. As is generally the case with bytes-transferred models, for telnet originator bytes the models’ fits to the upper tail are much more important than fits to the lower tail. Figure 5 summarizes the upper-tail fits. The plot is labeled with “a” for the unscaled analytic model, “u” for the unscaled empirical UCB model, and “l” for the unscaled empirical LBL-2 model. The upper-case versions of these letters correspond to the tails for the scaled versions of these models. The X axis gives the value of $\xi$ for the upper 10% tail, and the Y axis the value for the upper 1% tail, where $\xi$ is computed as given in Equation 2. In this plot we see that scaling had little effect on fitting the upper tails, as all of the uppercase letters are near their lowercase counterparts.

Figure 5: Tail Summary for TELNET Originator Bytes (X axis: Upper 10% Tail; Y axis: Upper 1% Tail)

A letter close to the origin, such as “u”, indicates excellent modeling in both the upper 10% tail and the upper 1% tail. That all of the models are clustered around the Y-axis indicates they all model the upper 10% tail well. But we see that both the analytic model and the LBL-2 model have high values of $\xi$ for the upper 1% tail. As explained in Section 3.10 above, this indicates that those models overestimate the distribution in the 1% tail. That is, they tend to predict more values in the 1% tail than were actually present in the datasets. As the axes are scaled logarithmically, the deviations shown are quite large. Indeed, the unscaled analytic model overestimates the upper 1% tail for every single dataset, and for all except the LBL and NC datasets not a single observation actually resided in the predicted tail. Thus a value of $\xi \approx 2$ corresponds to exceptionally poor tail fitting.

While the UCB empirical model does poorly versus the other models in fitting the datasets over the entire distribution of originator bytes, it is the obvious champion when it comes to fitting the upper 1% tail. Thus predicting telnet originator bytes leaves us in a quandary: we must decide which is more important to us, the overall fit to the distribution, in which case LBL-2 or the analytic model is recommended, or the upper 1% tail, in which case UCB is recommended. If fitting just the upper 10% tail well is adequate, then either the analytic model or LBL-2 is recommended. In the interest of conserving space, for the remaining models we relegate their outlier summaries to Appendix D.

4.3 TELNET Responder Bytes

We next turn to modeling the bytes transferred by the telnet responder. Figure 6 shows a log-normal fit to the upper 80%
of the responder bytes in the LBL-1 test dataset. This fit is excellent; it passes the $A^2$ test at the 25% significance level (compare with Figure 1, which shows the same fit for the LBL-4 dataset and fails $A^2$ even at 1% significance). We see, however, that the lower 20% (below the horizontal line, corresponding to less than 1 KB transferred) is not smoothly distributed, making it unlikely we might find a simple analytic model encompassing it. We speculate that this roughness is due to the varying sizes of log-in dialogs and message-of-the-day greetings. Fortunately the lower tail is the least important part of this distribution.

Figure 6: TELNET Responder-Bytes Model for LBL-2: Log-Normal Fit to Upper 80% (X axis: lg Responder Bytes)

We found in the test datasets that the log-mean ($\bar{x}$) varied from 12.0 to 12.4, generally closer to 12.0, and we chose for our fixed model $\bar{x} = \log_2 4500 \approx 12.1$. $\sigma$ varied from 2.79 to 2.89; we chose $\sigma = \log_2 7.2 \approx 2.85$. For this one model we evaluated the $\lambda$ metric using 9 bins, from 0.2 to 1.0, instead of 10 bins (0.1 to 1.0), because the analytic model only fits the upper 80% of the data and it did not seem worthwhile to develop a separate model for the lower 20%.

Figure 7 summarizes the fits. Except for NC, the analytic model uniformly performs well, with $\lambda$ always below 0.2. The LBL-2 model also fares quite well, while the UCB model is not as good except for UK and BC. Scaling these models does not always improve things (USC in particular) but in general helps.

Figure 7: Empirical vs. Analytic Models for TELNET Responder Bytes (X axis: Empirical Fit; Y axis: Analytic Fit)

Figure 8 explains the terrible performance fitting NC: the distribution suffers from two large clusterings, one between 240 and 265 bytes, and the other between 400 and 425 bytes. The first consumes 13% of all the connections, the second 5%. A single host originated virtually all of the connections in the first cluster, but to a number of different hosts, and two other hosts originated almost all of the connections in the second cluster, primarily to two remote hosts. We were unable to find obvious patterns in the interarrivals (see Figure 18 below for an example of clear one-minute patterns in connection arrivals); therefore, unlike many of the spikes discussed in [Paxson93], the connections were probably not generated by background scripts. Perhaps they correspond to cracking attempts, or more benign searches. Overall they remain puzzling.

Figure 8: Distribution of NC TELNET Responder Bytes (X axis: lg Responder Bytes)

Figure 9: Empirical vs. Analytic Models for TELNET Duration (X axis: Empirical Fit; Y axis: Analytic Fit)

Figure 28 in Appendix D shows the performance of the models with regard to the upper tails. Each model except for unscaled UCB does well in the upper 10% tail. All of the models overestimate the upper 1% tail somewhat, with the unscaled UCB model surprisingly doing the best. On the basis of these plots we would prefer the empirical models if the upper 1% tail is important to us; otherwise either the analytic model or LBL-2 is preferable. LBL-2 provides the best overall model.
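The tail comparisons above rest on the tail metric of Section 3.10: the mean, over the tested datasets, of the natural logarithm of predicted over observed tail counts. A minimal sketch follows. It assumes that the substitution of 100 for an empty observed tail applies to the predicted/observed ratio; that reading, along with the function and parameter names, is an assumption made for illustration.

```python
import math


def tail_xi(predicted_counts, actual_counts, empty_tail_ratio=100.0):
    """Mean natural log of predicted/actual tail populations across datasets.

    Positive results mean the model overestimates the tail on average;
    negative results mean it underestimates. When a dataset has no
    observations in the predicted tail, the ratio is taken to be
    `empty_tail_ratio` (assumed reading of the report's substitution).
    Predicted counts are expected to be positive.
    """
    logs = []
    for predicted, actual in zip(predicted_counts, actual_counts):
        ratio = empty_tail_ratio if actual == 0 else predicted / actual
        logs.append(math.log(ratio))
    return sum(logs) / len(logs)
```

An overestimate by a factor of two in one dataset and an underestimate by the same factor in another cancel to zero, which is why a value near 0.0 can mean either consistently good tail modeling or offsetting errors.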

Empirical Fit

Figure 10: Empirical vs. Analytic Models for TELNET Resp./Orig. Ratio worse than the other two. Figure 28 in Appendix D summarizes the tail performance of the models. In general the models do well in the upper 10% tail, though the unscaled analytic and LBL-2 models overestimate somewhat. In the upper 1% tail these same models do quite poorly, while the UCB models are excellent in both tails. Because the UCB model did well in the general ?tting shown in Figure 9, its good performance here makes it the model of choice for telnet duration.

4.5

TELNET Responder/Originator Ratio

4.4 TELNET Duration

We model telnet connection durations using a simple lognormal distribution. For the test datasets we found ? ranging from 7.67 to 8.03 and chose ? log2 240 7 91. ranged from 2.83 to 3.02; we chose log2 7 8 2 96. Figure 9 shows the ?ts for the duration models. In general the models are fairly good, with the metric falling between 0.1 and 0.3. NC again proves troublesome, though not so when scaled. No model emerges a clear winner, and, while the analytic model appears to do worst, it is not considerably

?

If we wish to use these models to generate or predict telnet traf?c, then we also need models giving the relationships between the various distributions. In particular, we would like to know how many responder bytes to expect given a particular number of originator bytes, and how long a connection will last given how many bytes it transfers. We model the ratio between the number of responder bytes and originator bytes using a simple log-normal distribution. For the test datasets we found ? ranged from 4.17 to 4.46, tending toward the high end, and from 1.77 to 1.89, also tending to the larger value. For the ?xed model we chose ? log2 21 4 39 and log2 3 6 1 85. Figure 10 shows the performance of each model. Other than the unscaled UK and NC datasets, the analytic model does quite well, with 0 2 except for the scaled NC, with 0 25. In general the LBL-2 empirical model does a little better than the analytic model, and almost always better than UCB. Scaling improves some ?ts considerably and has only


marginal effect on others. The overall success of the unscaled analytic model gives solid evidence that the ratio between the bytes generated by the computer in a remote login session and those generated by the user is about 20:1, since the fixed model uses a ratio of 21:1.

For the responder/originator ratio we are interested in agreement in both the upper and lower tails, as disagreement in either could result in skewed predictions when the number of originator bytes is large. Figure 28 in Appendix D shows the performance for the upper and lower tails. All of the models do fairly well for the upper tails except for the unscaled UCB model, which underestimates both upper tails. The analytic model does best. With the LBL-2 model, scaling trades off better performance in the 10% tail for worse in the 1% tail. In the lower tail, for both the analytic and LBL-2 models scaling helps the 10% tail but worsens the 1% tail, indicating that the 1% tail is distributed differently than the other 99%. The UCB model does well in the lower tails, though. All in all we are left with no clear best model, and none of the models is really bad.

One might wonder whether the responder/originator ratio’s distribution itself varies according to the number of bytes transferred; for example, perhaps when many originator bytes are transferred, the ratio tends to be low, because relatively speaking not so many responder bytes are transferred. For the test datasets we found that the correlation coefficient between $\log_2$ of the originator bytes and $\log_2$ of the responder/originator ratio varied from 0.07 to 0.10, indicating at most a mild positive correlation.

Figure 11: Responder/Duration Distributions for LBL-1: Exponential Fits (X axis: responder bytes / duration in seconds)

When using the responder/originator ratio to generate telnet traffic, a subtle point arises: one can either derive the originator bytes and the ratio, and multiply to obtain the responder bytes, or one can proceed in the opposite fashion, generating the responder bytes and the ratio, and dividing to obtain the originator bytes.

While these two approaches appear equivalent, they are not, and the former (deriving the responder bytes from the originator) is preferable. The difference arises because while both the responder bytes and the ratio are log-normal distributed, the originator bytes are extreme distributed. Multiplying the originator bytes’ extreme distribution by the ratio’s log-normal distribution yields a distribution close to log-normal; but dividing the responder bytes’ log-normal distribution by the ratio’s log-normal distribution yields exactly a log-normal distribution (since the difference of two normally distributed variables is again normally distributed), and not an extreme distribution. Alternatively, we can think of the originator bytes as having a somewhat skewed log-normal distribution. Multiplying this distribution by another log-normal distribution smears out the deviations, and the result is close to log-normal; but dividing two log-normal distributions will never reproduce the skewed distribution.

Thus, to generate traffic we should begin by generating the number of originator bytes and the responder/originator ratio, and then multiply to derive the responder bytes. This approach is not ideal, however, because it ignores the responder-bytes model we outlined above, which is more successful than the originator-bytes model.
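The asymmetry between multiplying and dividing can be written out in $\log_2$ space. The following is only a sketch, under the added assumption that the responder bytes $R$ and the ratio $Q$ are drawn independently; the subscripted parameters are introduced here for illustration and are not notation from the models above.

```latex
\begin{align*}
\log_2 R &\sim N(\bar{x}_R,\ \sigma_R^2), \qquad \log_2 Q \sim N(\bar{x}_Q,\ \sigma_Q^2), \\
\log_2 O &= \log_2 R - \log_2 Q \ \sim\ N(\bar{x}_R - \bar{x}_Q,\ \sigma_R^2 + \sigma_Q^2).
\end{align*}
```

Under that assumption, dividing gives originator bytes $O$ that are exactly log-normal for any choice of parameters, so it cannot reproduce a skewed originator distribution. Multiplying gives $\log_2 R = \log_2 O + \log_2 Q$, a non-normal term plus a normal term, which is only approximately normal.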

4.6 TELNET Responder/Duration Ratio

Just as we want a way to relate the originator bytes sent with the responder bytes, we also would like to relate these random variables to the connection duration. We investigated analytic models for three different ratios: originator bytes to duration, responder bytes to duration, and total bytes to duration. We found the best fits came using the responder/duration model. For most connections the responder/duration ratio was well modeled by an exponential distribution, but “large” connections (those whose responder bytes were in the upper 10% of all connections) had a different distribution. For these, the ratio was fairly well modeled by a log-normal distribution.

Figure 11 shows the responder/duration ratio for both the lower 90% of the LBL-1 connections (in terms of responder bytes) and the upper 10%. The distribution on the left is for the lower 90%; though it is hard to tell due to scaling, an exponential with the same mean has been drawn and lies squarely on top of it. This fit is very good; it passes the $\chi^2$ test at the 5% level. To the right we show the distribution of the upper 10%, plotted with an exponential with the same mean. We see that the distribution is qualitatively different, and the corresponding exponential is not a good fit.

We find the bimodality shown in this figure a bit puzzling. It says that very large connections (in terms of bytes transferred) occur over relatively short durations: while the geometric mean of the responder bytes in these large connections is 45 times that of the smaller (lower 90%) connections, the geometric mean of their durations is only 16 times that of the smaller connections. This phenomenon was also observed by the authors of [SC92], who found that “users transmitting large amounts of data over a link tend to transmit that data within 15 minutes.” We do not have a good explanation for this phenomenon.

For the lower-90% model, the test datasets gave a mean responder/duration ratio ranging from 27 to 33; we chose a mean of 30. For the upper-10% model, $\bar{x}$ ranged from 5.19 to 5.41 and $\sigma$ from 1.38 to 1.61; we chose $\bar{x} = 5.3$ and $\sigma = 1.5$.

Figure 12: Empirical vs. Analytic Models for TELNET Resp./Duration Ratio

Figure 12 shows the fit of the models for the lower 90% of the responders. The analytic fit is good, with $\lambda^2$ at most about 0.3 and often at most about 0.2; in general it fits better than either empirical model. For the upper 10% of the responders we compared considerably fewer datasets. Our requirement that each dataset include at least 100 measurements ruled out any dataset with fewer than 1,000 telnet connections, leaving just the LBL and NC datasets. The fit remains good, though: the analytic model does well, with $\lambda^2$ at most about 0.3 except for the unscaled NC dataset (where $\lambda^2$ is about 0.6 for all three models), quite a bit better than the UCB model and about equal to the LBL-2 model.

Figure 28 in Appendix D summarizes the upper and lower tail distributions for the fit to the lower 90% of the responders. In the upper tails the analytic model does best, only mildly underestimating the upper 1% tail; only the scaled LBL-2 model is roughly comparable. In the lower tails the scaled analytic model does very well, with the unscaled version overestimating the 1% tail somewhat. Again the empirical models do considerably worse and the scaled UCB model is completely inadequate, though the unscaled model is acceptable.

For the models of the upper 10% of the responders, every model underestimates the upper 1% tail somewhat, with the analytic models and the scaled LBL-2 model about the same, at roughly 0.5 in magnitude. The unscaled empirical models fare poorly in the 10% tail, too, considerably underestimating it, while the analytic models and the scaled empirical models match the 10% tail well. In the lower tails the unscaled models do fairly well with the 10% tail, and the scaled models do quite well. Except for the scaled UCB and analytic models, though, the lower 1% tail is considerably underestimated.
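One way the fixed responder/duration models of this section might be used to derive a connection duration from a responder byte count is sketched below. It is a minimal illustration, not the procedure used in the analysis: the function names are hypothetical, and the caller is assumed to decide separately whether a connection falls in the upper 10% of responder bytes, since no byte threshold is given here.

```python
import random

# Fixed-model parameters quoted in the text.
LOWER_MEAN_RATIO = 30.0   # exponential mean, responder bytes per second (lower 90%)
UPPER_LOG2_MEAN = 5.3     # log2-mean of the ratio (upper 10%)
UPPER_LOG2_SD = 1.5       # log2-standard deviation of the ratio (upper 10%)


def sample_ratio(is_large, rng):
    """Draw a responder/duration ratio in bytes per second.

    is_large is an assumption supplied by the caller: True when the
    connection's responder bytes fall in the upper 10% of all connections.
    """
    if is_large:
        # Log-normal in base 2: draw the exponent, then exponentiate.
        return 2.0 ** rng.gauss(UPPER_LOG2_MEAN, UPPER_LOG2_SD)
    # Exponential with the chosen mean; guard against an exact zero draw.
    return max(rng.expovariate(1.0 / LOWER_MEAN_RATIO), 1e-9)


def sample_duration(responder_bytes, is_large, rng):
    """Derive a duration in seconds as responder bytes divided by the ratio."""
    return responder_bytes / sample_ratio(is_large, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    print(sample_duration(3000, False, rng))
    print(sample_duration(200000, True, rng))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing generated traffic against a trace.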

4.7 TELNET Interarrivals

We now turn to modeling telnet interarrivals, using the methodology discussed in Section 3.11 above. Figure 13 compares the $\lambda^2$ values for the unscaled and scaled arrival models. As explained in Section 3.11, instead of comparing the analytic model to the empirical models, we compare the analytic model’s scaled version with its unscaled version. We plot $\lambda^2$ for the scaled analytic model on the Y axis vs. $\lambda^2$ for the unscaled model on the X axis. Also, as mentioned in Section 2.2, we omit the USC dataset from our interarrival models because of the trace’s periodic blackouts.

Figure 13: Interarrivals for TELNET

As expected, the scaled model does better nearly across the board, but we note that even for the unscaled model $\lambda^2$ stays at or below about 0.25, which, compared to the fits of the other models above, is quite good. The arrivals are thus well modeled as a non-homogeneous Poisson process with hourly rates given by Figure 2. This finding is at odds with that of [MM85], who found that “user interarrival times look roughly lognormal”. Perhaps the discrepancy is due to the authors characterizing all interarrivals lumped together, rather than postulating separate hourly rates.

Figure 30 in Appendix D summarizes the tail distributions for the scaled and unscaled arrival models. (See the text in Appendix D for an explanation of the symbols in the figure.) Note the range shown in the figure: even the worst fits stay within about 0.25 in magnitude. Thus both the unscaled and scaled models do quite well, and the scaled model does exceptionally well.

5 NNTP

5.1 Overview of NNTP Connections

Table 7 summarizes nntp connections. As nntp is non-interactive, the connection duration is not of much interest and has been omitted. Appendix B discusses the connections we rejected due to protocol errors.

We expect nntp connections to show considerable variation because they can come in at least three modes: a server contacts a peer and is informed that the peer presently cannot talk to the server; the server offers the peer news articles but the peer already has the articles; the server offers articles and the peer does not have the articles. Each of these modes will result in significantly different distributions of the bytes transferred during the connection. Furthermore, the second and third modes are somewhat indistinct, since the remote peer may have some but not all of the offered articles.

The first mode is easy to detect. If upon initially being contacted a responder peer is unable to communicate with the originating peer, it sends a message with response code 400 (“service discontinued”) as per [RFC977]. When the originating peer then replies with “QUIT” followed by a carriage-return and a line-feed, it will send a total of 6 bytes during the connection. Indeed, we find large spikes of 6 originator bytes in the nntp datasets, as did the authors of [DJCME92]. Thus we can recognize a connection in which the originating host sent 6 bytes as a “failure”. Not surprisingly, the failure rate varies greatly from site to site and from time to time, since it is often due to transient phenomena such as full disks. These failure rates are given in the “% Failures” column. Note that even over a period of 7 days, the DEC failure rate moved from 2% to 7%. To compute the remaining statistics in the Table, we first removed all failure connections from the datasets.

Not only can the failure rate vary significantly, but so can the bytes transferred during non-failure connections. For example, as can be seen by the large increase in $\bar{x}_{orig}$ between LBL-3 and LBL-4, the LBL nntp server became much more effective in propagating news over a five month period. LBL-5 and LBL-6 continue the impressive growth in $\bar{x}_{orig}$. A similar effect can be seen between DEC-1 and DEC-3, only a week apart. Such changes can be due in part to circumstances wholly outside of the local site. Whether the articles a server attempts to propagate to its peers are accepted depends on whether those peers already have the articles; a subtle change in the nntp peer topology can swing a server’s position from one of holding mostly “stale” news to holding mostly “fresh” news. The steadily increasing $\bar{x}_{orig}$ value for the last four LBL datasets, though, is most likely simply a reflection of the global growth in USENET nntp traffic, which increases in volume about 75%/year (see [Paxson93]).

Table 7: Summary of NNTP Connections

Dataset  # Conn   # Rej  % Failures  $\bar{x}_{orig}$  $\sigma_{orig}$  max_orig  $\bar{x}_{resp}$  $\sigma_{resp}$  max_resp
LBL-1    57,898   2      38%         2.0KB             9.2              4.2MB     305B              2.0              923KB
LBL-2    57,997   1      36%         2.4KB             7.8              1.1MB     328B              2.1              584KB
LBL-3    46,167   6      19%         2.4KB             6.2              1.9MB     384B              1.9              128KB
LBL-4    73,179   39     2%          6.0KB             8.5              5.6MB     398B              2.2              1.4MB
LBL-5    50,969   161    8%          14.5KB            8.5              16.5MB    633B              2.9              9.5MB
LBL-6    55,176   1048   8%          28.4KB            6.8              15.7MB    888B              2.2              1.3MB
BC       345      116    25%         15.5KB            6.2              2.4MB     1005B             3.0              81KB
UCB      6,899    0      1%          2.1KB             7.2              720KB     307B              2.0              1.7MB
USC      4,615    15     4%          11.5KB            10.3             3.6MB     709B              2.3              74KB
DEC-1    23,864   5      2%          1.1KB             11.6             5.8MB     264B              2.2              75KB
DEC-2    18,819   88     3%          1.3KB             11.7             26MB      292B              2.4              356KB
DEC-3    19,244   7      7%          2.2KB             14.1             18MB      339B              2.7              223KB
NC       904      206    9%          12.9KB            12.3             12MB      1182B             4.5              3.2MB

5.2 NNTP Originator Bytes

Figure 14 shows the distributions of bytes sent by the originator in non-failure nntp connections at LBL, DEC, and coNCert. The distributions show a large degree of variance (recall that the X axis is scaled logarithmically), suggesting that scaling is vital when modeling nntp traffic.

Figure 14: Distribution of NNTP Originator Bytes (X axis: lg NNTP originator bytes; curves for LBL-1, LBL-3, LBL-4, LBL-6, DEC-1, and NC)

Given the great variation in originator bytes transferred, we decided to simply use a log-normal model to describe the connections, with the caveat that we do not expect the model to perform well in either scaled or unscaled forms (but we also do not expect empirical models to do well, either). For the LBL test datasets, the log-mean ranged from 10.8 to 12.5, and the log-standard deviation from 2.6 to 3.3. For our fixed model we chose $\bar{x} = 11.5$ and $\sigma = 3$, respectively. We note that $\bar{x} \approx \log_2 3$ KB appears a bit higher than the median nntp article size of around 2 KB reported in [Adams92]. This difference probably means that, when an nntp server has a “fresh” article, it tends to have more than one.

Figure 15 shows the performance of the various models. As expected, none of the models does well due to the great variations from dataset to dataset, though scaling the models helps somewhat. In general, the analytic model performs acceptably only when scaled, and the empirical models only on the earlier LBL datasets and the scaled BC datasets. But the analytic model does as well as either empirical model, indicating that the log-normal approximation is no worse than the inherent variation in the distributions.

Figure 29 in Appendix D shows the tail performance of the models. Again we show only the upper tail because with bulk transfer the lower tail is not of much interest. The unscaled models do quite badly, not surprisingly, given their poor overall performance. The upper 10% tail is only safe when scaling the models, and the upper 1% tail only when scaling empirical models. From this figure we conclude that the nntp models must always be scaled, and even then the otherwise somewhat successful analytic model is problematic due to grossly overestimating the upper 1% tail.

One final important point regarding modeling nntp originator bytes is that the distribution is not stationary but changes over the course of a day. Figure 16 shows the hourly $\bar{x}_{orig}$ for LBL-1 and LBL-4 non-failure nntp connections (this plot was made by constructing “superhours” as discussed in Section 3.11). We see considerable but not consistent variation. The peak-to-peak difference for both datasets is about a factor of 3.4; but LBL-1’s connections tended to be largest in the middle of the night, with secondary peaks during “primetime” work hours. LBL-4’s connections peaked during working hours and were lowest at precisely the time when LBL-1’s were highest. The test datasets also showed a weekly pattern, with LBL-1 and LBL-4 (and to a lesser extent LBL-2) having minimal $\bar{x}_{orig}$ during weekends (with a peak-to-peak variation of about a factor of 3), while LBL-3 had a maximum $\bar{x}_{orig}$ on Saturdays (peak-to-peak about a factor of 2). The variation in the daily pattern may be due to the influence of key nntp gateways either propagating news as soon as it comes in (consistent with the LBL-4 case) or waiting till the

late-night hours to take advantage of minimal loads (LBL-1). The weekly variation is more difficult to explain. We expect that most news articles are written during the week, and it seems unlikely that nntp gateways would queue a significant number of articles till the weekend. So the strong LBL-3 Saturday peak remains a puzzle.

Figure 15: Empirical vs. Analytic Models for NNTP Originator Bytes (X axis: empirical fit; Y axis: analytic fit)

Figure 16: Daily Variation in log2-mean of LBL NNTP Originator Bytes (X axis: hour; Y axis: lg mean originator bytes; curves for LBL-1 and LBL-4)

Figure 17: Interarrivals for NNTP (X axis: unscaled fit; Y axis: scaled fit)

Figure 18: One-Minute Variation in DEC-2 NNTP Arrivals (X axis: seconds; Y axis: arrivals)
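To get a feel for what the fixed nntp originator-bytes model of Section 5.2 implies in bytes, the sketch below converts its $\log_2$ parameters ($\bar{x} = 11.5$, $\sigma = 3$) into quantiles. The function name and the use of Python's `statistics.NormalDist` are choices made here for illustration; a scaled model would pass a dataset's own log-mean and log-standard deviation instead of the defaults.

```python
from statistics import NormalDist

X_BAR = 11.5   # log2-mean of the fixed model, from the text
SIGMA = 3.0    # log2-standard deviation of the fixed model, from the text


def log2_normal_quantile(p, x_bar=X_BAR, sigma=SIGMA):
    """Return the p-quantile, in bytes, of a log2-normal distribution.

    p must lie strictly between 0 and 1.
    """
    # Quantile of the normal exponent, then back to bytes.
    return 2.0 ** NormalDist(mu=x_bar, sigma=sigma).inv_cdf(p)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        print(p, round(log2_normal_quantile(p)))
```

The median comes out at $2^{11.5}$, a little under 2,900 bytes, which matches the "about 3 KB" figure quoted above; the upper quantiles show how quickly a $\sigma$ of 3 stretches the tail.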

5.3 NNTP Responder Bytes

As seen in Table 7 above, there is in general much less variation in the bytes sent by an nntp responder than by the originator. For all of the datasets except LBL-5, BC and NC, the responder sent fewer than 1500 bytes in 82% or more of the connections. For LBL-5 this value was 77%, for BC 65%, and for NC 63%. Thus we decided not to model nntp responder bytes, as in general the datasets do not show interesting variations. We did compute the correlation coefficients between $\log_2$ of the originator and responder bytes and found a range from 0.37 (NC and UCB) to about 0.8 (USC, BC). In general we would expect to find positive correlation since the more articles offered by the originator to the responder, the more replies the responder must generate.
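The correlation coefficient described above, taken between $\log_2$ of the originator bytes and $\log_2$ of the responder bytes, can be computed as in the following sketch. The function name is hypothetical, and the code assumes every byte count is strictly positive (failure connections already removed) so that the logarithm is defined.

```python
import math


def log2_correlation(orig_bytes, resp_bytes):
    """Pearson correlation between log2(orig_bytes) and log2(resp_bytes).

    Both sequences must have the same length and contain only positive values.
    """
    xs = [math.log2(b) for b in orig_bytes]
    ys = [math.log2(b) for b in resp_bytes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares about the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up byte counts, for illustration only.
    print(log2_correlation([2000, 9000, 50000, 700], [300, 450, 1200, 280]))
```

Working in $\log_2$ space keeps a handful of very large transfers from dominating the coefficient, in line with the log-normal modeling used throughout.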

5.4 NNTP Duration

Since nntp is a bulk-transfer protocol and not interactive, we do not model connection durations, because these are presumably dominated by networking latencies and not a fundamental aspect of the nntp protocols. Similarly, below we do not model smtp or ftp durations.

5.5 NNTP Interarrivals

Figure 17 shows the scaled and unscaled interarrival models.⁸ The results appear puzzling. The UCB, LBL-3, LBL-6, and to some degree NC arrivals all appear well-modeled as Poisson processes with hourly rates corresponding to those in Figure 2. The other datasets, including the remaining LBL datasets, are poorly fitted both when scaled and when unscaled, indicating that they are not Poisson processes. The poor fit to the BC data is in part due to its very low nntp arrival rate: less than two connections per hour on average.

The other poorly-fitted datasets turn out to have interesting periodic behavior. In particular, nntp arrivals have a definite one-minute periodicity about them. Figure 18 shows the number of DEC-2 nntp connections that arrived during each second of the minute (i.e., ignoring minutes and larger units of the arrival time). Clearly, arrivals tended to show up at about 19 seconds past the minute, though some tended to arrive about 7 seconds past. All of the nntp datasets show this pattern to varying degrees except for LBL-3; LBL-4 shows two distinct spikes. With NC, UCB, and LBL-6, the spike is quite sharp. With the other datasets, it is broad, like in Figure 18. The sharp spikes mean that only a relatively small fraction of the interarrivals are skewed, evidently small enough to preserve a sufficient approximation to a Poisson process.

We also investigated one-hour variation. We found a three-minute pattern in LBL-1, five-minute patterns in LBL-5 and UCB, fifteen-minute patterns in LBL-6 and BC, and a less-strong twenty-minute pattern in DEC-2.

Figure 30 in Appendix D summarizes the tails corresponding to the scaled and unscaled models. Both models underestimate the lower 10% tail but overestimate the lower 1% tail; the upper 10% tail is well modeled but the upper 1% considerably underestimated. As in Figure 15 above, we see little difference between the scaled and unscaled models.

8 The actual points for DEC-3 and LBL-4, and for LBL-3, LBL-6 and UCB, overlapped on this plot and became hard to distinguish, so we have added some horizontal bias away from the diagonal. These points all actually lie on the diagonal.

6 SMTP

6.1 Overview of SMTP Connections

Table 8 summarizes the smtp connections. Again, Appendix B summarizes the reasons for removing the connections marked as rejects. Based on the values for max_orig it is clear that smtp is sometimes used to transfer quite large files, though that is not its main purpose.

There is quite a bit of variation in $\bar{x}_{orig}$ (and just about none in $\bar{x}_{resp}$). In [WLC92] the authors note that the UK smtp data show a substantially higher (arithmetic) $\bar{x}_{orig}$ than for the LBL-1 and LBL-2 datasets reported in [Paxson91]. They attribute this difference to the fact that since the U.K. academic network (JANET) was not at that time fully connected to the Internet, U.K. users were more likely to use smtp to transfer files. The large UK $\sigma_{orig}$ variance supports this hypothesis. The DEC traffic has similar $\sigma_{orig}$ values, and Mogul also states in [Mogul92] that an “FTP-by-mail” facility is responsible for about 150 rather lengthy smtp messages at DEC-WRL each day. It is less clear whether this theory explains the large NC message sizes.

Another explanation is that perhaps the DEC, NC, and UK traffic tends to make more smtp “hops”, each of which adds a Received header to the mail message [RFC822], pushing up the average number of bytes.⁹ One would expect the greater number of hops to be correlated with “wider” wide-area traffic, presumably a property of the NC and especially the UK traffic, as these sites are at inter-network gateways. But this explanation does not address why the DEC traffic might tend to make more hops, unless due to the structure of DEC’s internal mail gateways.

We see a definite trend in the LBL data indicating larger and larger mail messages. As discussed in [Paxson93], LBL’s wide-area traffic did become “wider” during the 29 month period spanned by the LBL datasets, in agreement with the “hops overhead” explanation.

9 A check of one of the author’s mail folders revealed an average Received header length of more than 100 bytes.

Table 8: Summary of SMTP Connections

Dataset  # Conn    # Rej   $\bar{x}_{orig}$  $\sigma_{orig}$  max_orig  $\bar{x}_{resp}$  $\sigma_{resp}$  max_resp
LBL-1    38,481    286     1.4KB             2.8              2.1MB     331B              1.2              1.9KB
LBL-2    51,240    572     1.5KB             2.9              7.2MB     334B              1.2              6.5KB
LBL-3    75,418    333     1.6KB             2.6              1.6MB     334B              1.2              2.9KB
LBL-4    92,694    1583    1.7KB             3.0              1.2MB     335B              1.3              2,980KB
LBL-5    123,741   446     1.7KB             2.9              2.4MB     320B              1.3              8.0KB
LBL-6    207,485   6,567   1.9KB             3.0              37.0MB    321B              1.3              9.5KB
BC       8,428     121     1.3KB             2.8              1.1MB     324B              1.3              10.2KB
UCB      16,929    61      1.3KB             3.0              0.5MB     334B              1.3              2.0KB
USC      3,498     3       1.4KB             2.3              0.1MB     337B              1.2              1.6KB
DEC-1    25,160    19      2.0KB             3.1              2.5MB     340B              1.2              4.7KB
DEC-2    10,777    5       2.1KB             3.5              4.9MB     341B              1.2              4.7KB
DEC-3    31,631    70      2.0KB             3.2              5.1MB     338B              1.2              3.5KB
NC       26,161    511     1.9KB             2.9              1.8MB     340B              1.4              10.6KB
UK       10,729    129     1.9KB             3.3              4.6MB     319B              1.3              6.0KB

6.2 SMTP Originator Bytes

When modeling the number of bytes sent by the smtp originator, we found that nearly all connections transferred more than 300 bytes, while the connections transferring fewer bytes showed sporadic distributions. We hypothesize that the first 300 bytes of these connections constitute a more-or-less fixed overhead, and that connections with fewer total originator bytes correspond to “failures”: either invalid email addresses or busy remote machines unable to accept mail at the moment. In constructing our models we therefore removed any connections of fewer than 300 bytes (anywhere from 0.6% to 2.3% of all connections) and subtracted 300 bytes from the remaining connections.

We found the distribution of smtp originator bytes to be bimodal, not surprisingly given that smtp is also used to transfer files. We model the distribution using two log-normal distributions, one (called $f$ here) for the lower 80% of the data, and one for the remaining 20% ($g$). Figure 19 shows this model’s fit to the LBL-3 test data after removing failures and subtracting 300 bytes; the horizontal line indicates the dividing line between using distribution $f$ (below the line) and $g$ (above).

Figure 19: Log-Normal Fit to LBL-3 SMTP Originator Bytes (X axis: lg bytes; curves: data, lower 80% fit $f$, upper 20% fit $g$)

For our fixed model, we found the mean of distribution $f$ to range from 9.90 to 10.16, and chose $\bar{x} = 10$; the standard deviation ranged from 1.42 to 1.52, and we chose $\sigma = \log_2 2.75 \approx 1.46$. For $g$, the upper distribution, the means ranged from 8.43 to 9.06, and the standard deviations from 2.55 to 3.52. We chose $\bar{x} = 8.5$ (since three of the test datasets had means quite close to 8.5) and $\sigma = 3$. Note that, while distribution $g$ has a lower mean than distribution $f$, we use only $g$’s upper 20% tail, which is larger due to $g$’s significantly higher standard deviation.

Figure 20 shows that this analytic model is highly successful compared to the empirical models. In virtually every case it performs as well as or better than the empirical models, usually better. We also see that scaling consistently improves the performance of all three models, often substantially, indicating that there is considerable site-to-site variation in bytes transferred.

Figure 20: Empirical vs. Analytic Models for SMTP Originator Bytes (X axis: empirical fit; Y axis: analytic fit)

Figure 29 in Appendix D shows the tail distribution for this model. The models all do well, in general slightly underestimating the upper tails, except for the unscaled UCB model, which is more severe in its underestimation.

Figure 21: Daily Variation in log2-mean of LBL SMTP Originator Bytes (X axis: hour; Y axis: lg mean originator bytes; curves for LBL-1 and LBL-4)

As was the case for nntp, for smtp we found that the originator bytes distribution is not stationary. Figure 21 shows the hourly $\bar{x}_{orig}$ for LBL-1 and LBL-4 smtp connections after removing those less than 300 bytes and subtracting 300 bytes from the remainder. Unlike nntp, which suffered from inconsistent variations, here the pattern is more stable: connection sizes peak during off-hours, the evening and early morning, and reach minima during peak working hours. We conjecture that uses of smtp to transfer files and not messages typed in by users tend to happen off-hours and cause this pattern. Of the four test datasets, LBL-4 shows the greatest peak-to-peak variation, about a factor of 3.3, comparable to the nntp variation. The other datasets are closer to a factor of 2. Unlike nntp, we did not detect a noteworthy weekly pattern.
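The two-part smtp originator-bytes model of Section 6.2 can be sampled by inverting its piecewise distribution function, as sketched below. This is one reading of the model rather than a confirmed procedure: it assumes the cumulative distribution follows $f$ below the 80% line and $g$ above it, and that the 300-byte overhead is added back at the end. The function names are hypothetical.

```python
import random
from statistics import NormalDist

# Fixed-model parameters quoted in the text, in log2 units.
F = NormalDist(mu=10.0, sigma=1.46)   # used for the lower 80%
G = NormalDist(mu=8.5, sigma=3.0)     # only its upper 20% tail is used
SPLIT = 0.80                          # dividing line between f and g
OVERHEAD_BYTES = 300                  # fixed overhead subtracted before fitting


def smtp_bytes_at_quantile(u):
    """Map a cumulative probability u (0 < u < 1) to originator bytes."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    # Below the split, read the quantile off f; above it, off g.
    dist = F if u < SPLIT else G
    return OVERHEAD_BYTES + 2.0 ** dist.inv_cdf(u)


def sample_smtp_bytes(rng):
    """Draw one originator byte count by inverse-transform sampling."""
    u = rng.random()
    while u == 0.0:          # random() may return exactly 0.0
        u = rng.random()
    return smtp_bytes_at_quantile(u)


if __name__ == "__main__":
    rng = random.Random(3)
    print([round(sample_smtp_bytes(rng)) for _ in range(5)])
```

With these fixed parameters the two pieces do not join exactly at the 80% point: $f$'s 80th percentile is about $2^{11.2}$ bytes, while $g$'s is about $2^{11.0}$. A generator that needs a strictly increasing quantile function would have to smooth over that small gap.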

6.3 SMTP Responder Bytes

We did not model the distribution of the responder bytes in smtp connections, as the responder’s role shows little variation. For the LBL test datasets, in about 75% (73% to 79%) of all connections the responder sent between 300 and 400 bytes, and more than 99% of the connections sent between 100 and 1000 bytes. Of all the datasets, LBL-5 had the lowest proportion of connections sending between 100 and 1000 bytes, still a very high 98%. We also found that the coefficient of correlation between $\log_2$ of the originator bytes and $\log_2$ of the responder bytes for the LBL datasets varied from .035 to .246; thus we found little interesting behavior to model in the responses. While reference [DJCME92] finds that smtp connections are strongly bidirectional, this finding must be interpreted with the rather fixed nature of the smtp responder in mind.

6.4 SMTP Interarrivals

Figure 22: Interarrivals for SMTP (X axis: unscaled fit; Y axis: scaled fit)

Figure 22 shows the unscaled and scaled fits to the smtp interarrivals.¹⁰ Both models do extremely well for almost all of the datasets, indicating these arrivals are well described by the pattern shown in Figure 2. We are somewhat puzzled that the BC interarrivals fared so well with the unscaled model, given the roughly three-hour shift between BC’s arrival activity and LBL’s as shown in Figure 2; evidently there is enough similar overlap during the busy 11AM-4PM times to bring the overall distributions into fairly close agreement. We do not have an explanation for the poor fit to DEC-2’s interarrivals, though the traffic, which spanned the Thanksgiving holiday, is certainly atypical in one sense: only 6% of the DEC-2 connections originated from a DEC host, while for DEC-1 and DEC-3 the figure is 40-46%.

Figure 30 in Appendix D summarizes the tail behavior for the interarrival models. Both the scaled and unscaled models do quite well in the lower and upper tails, only underestimating the upper tail somewhat.

10 Here again we have added some horizontal space between the points in the lower-left corner to aid legibility. The LBL points all have unscaled $\lambda^2$ near 0.05; the excursion up to 0.1 is an artifact of the plot.

7 FTP

7.1 Overview of FTP Connections

Table 9 summarizes ftpdata connections. Each connection is unidirectional, with sometimes data flowing from the connection originator to the responder (corresponding to an ftp get command) and sometimes in the other direction. The “Get” column shows the percentage of connections that were get commands; the remainder were put commands. The next three columns show the (geometric) mean, standard deviation, and maximum for the number of bytes transferred. As before, Appendix B gives details regarding the connections we rejected.

Two rows are given for each of LBL-5 and LBL-6. As discussed in [Paxson93], a considerable portion of LBL traffic, particularly in LBL-5 and LBL-6, was generated by background scripts fetching weather maps from a remote anonymous ftp site. LBL-5* and LBL-6* show the ftpdata statistics with this traffic removed.¹¹ Two rows are given for the UCB dataset. The first includes 2,315 connections of 74 bytes each, all but three of which were between the same two hosts, and 95% of which came between 30 and 45 seconds apart. The UCB* row summarizes the UCB data with this anomaly removed. In testing our models below we used LBL-5*, LBL-6*, and UCB*.

11 Weather-map ftp traffic comprised 3.2% of LBL-3 connections and 8.5% of LBL-4 connections. Excluding this traffic lowered LBL-3’s mean to 3.2KB and raised LBL-4’s mean to 4.0KB. The corresponding standard deviations rose to 18.0 and fell to 14.0, respectively.

There clearly is quite a range in $\bar{x}_{bytes}$, even day-to-day as shown in the DEC data (though the low-point there, DEC-2, includes Thanksgiving, and might therefore be uncharacteristic). We might be tempted to declare a trend towards increasing file sizes with time in the LBL datasets, save for the LBL-6 dataset, which shows a sharp drop. We do not know whether LBL-6 was atypical, or whether the mean file size simply fluctuates a great deal. The uniformly large values of $\sigma_{bytes}$ show that in general file sizes vary widely.

Finally, computing the coefficient of variation (i.e., the ratio of the standard deviation to the mean) for the ftpdata interarrival times gives values from 2.4 to 8.0, significantly higher than for the other protocols. If the arrivals came from a homogeneous Poisson process, then we would get a coefficient of variation of 1, and if they were perfectly periodic, 0. These high values show that the traffic is quite bursty. This result is not surprising, as a “multiple-get” file transfer results in a rapid succession of ftpdata connections, sometimes quite large (see below).

Table 10 summarizes the ftpctrl connections. We have not shown statistics for bytes transferred and duration of the ftpctrl connections themselves since the primary use of ftpctrl connections is to spawn ftpdata connections, either for file transfer or to list remote directories. Instead, we grouped with each ftpctrl connection its associated ftpdata connections. We considered an ftpdata connection to belong to an ftpctrl connection if it occurred during the span of the ftpctrl connection and was between the same two hosts. The starred LBL rows summarize the LBL datasets with the weather-map traffic removed.
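The rule for associating ftpdata connections with ftpctrl connections can be sketched as follows. The record layout and function name are hypothetical, and "occurred during the span" is interpreted here as the ftpdata connection lying entirely within the ftpctrl connection's start and end times, which is an assumption rather than something the text specifies.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    hosts: frozenset   # the two endpoint addresses, order-independent
    start: float       # connection start time, in seconds
    end: float         # connection end time, in seconds


def group_ftpdata(ctrl_conns, data_conns):
    """Attach each ftpdata connection to the first ftpctrl connection that
    involves the same two hosts and spans it in time.

    Returns (groups, unmatched): groups maps the index of an ftpctrl
    connection to its list of ftpdata connections; unmatched holds the
    ftpdata connections for which no ftpctrl connection was found.
    """
    groups = {i: [] for i in range(len(ctrl_conns))}
    unmatched = []
    for d in data_conns:
        for i, c in enumerate(ctrl_conns):
            if d.hosts == c.hosts and c.start <= d.start and d.end <= c.end:
                groups[i].append(d)
                break
        else:
            # No enclosing control connection between these two hosts.
            unmatched.append(d)
    return groups, unmatched
```

Storing the endpoints as a `frozenset` makes the host comparison independent of which side originated the connection, which matters because an ftpdata connection need not be opened from the same side as its ftpctrl connection.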
Unlike with ftpdata, here we include LBL-3 and LBL-4 in the ?ltering, since weather-map traf?c had a substantial in?uence on their ftpctrl connections. Again, we use the starred datasets in our analysis below. The “Orphans” column lists the percentage of ftpdata connections for which we could not associate an ftpctrl connection. High percentages of orphans were often due to ftpctrl connections that were terminated by RST packets instead of FIN packets, which, as explained in Section 2.1 above, were not included in our analysis. The authors of [EHS92] reported about 3% of ftpctrl connections were terminated by RST packets. As seen in the Table there appears to be considerable variation in this value. The “# Overlap” column lists the number of overlapping ftpctrl connections between two hosts. For our analysis we merged such overlaps into a single conversation. The large number of LBL-4 overlapping connections is almost all due to overlapping connections to one of the weather-map sites, as can be seen by the appreciably lower value for LBL-4*. In LBL-4 we observed up to ?ve overlapping connections (typically four), all virtually identical in bytes transferred and

? ? ?

?

? ?

?

Table 9: Summary of FTP Data Connections

Dataset LBL-1 LBL-2 LBL-3 LBL-3* LBL-4 LBL-4* LBL-5 LBL-5* LBL-6 LBL-6* BC UCB USC DEC-1 DEC-2 DEC-3 NC UK # Conn 3,757 5,312 7,920 6,916 11,587 7,941 18,501 9,968 31,734 12,470 669 756 272 727 491 811 2,500 1,733 # Rej 51 72 93 90 191 189 1,227 1,227 535 535 19 19 6 8 8 17 59 35 Orphans 5% 5% 5% 5% 6% 7% 5% 6% 7% 10 % 40 % 15 % 26 % 6% 6% 14 % 7% 5% # Overlap 57 49 135 135 1,012 112 160 108 212 196 2 7 2 18 14 18 49 133 0 xfer 19 % 25 % 19 % 21 % 15 % 17 % 15 % 26 % 21 % 24 % 32 % 26 % 22 % 26 % 13 % 25 % 31 % 24 %

?

Table 10: Summary of FTP Control Connections duration, repeating every half hour for days on end. Evidently a number of weather-map scripts were run in the background on the same host and managed to synchronize. The large number of overlapping UK connections, on the other hand, is due to the high frequency of connections between pairs of popular hosts, such as one vendor’s main Internet site in the U.K. connecting to the anonymous ftp archives of Washington University in Missouri. The authors of [WLC92] noted the Missouri site as the single most popular U.S. ftp site (and, indeed, Missouri was the most popular state in general for 25 U.K.-U.S. traf?c). The next four columns in Table 10 show statistics regarding the number of ftpdata connections that occurred during each ftpctrl connection. The “0 xfer” column lists the percentage of all ftpctrl connections that did not have any associated ftpdata connection. These numbers are lower than the 44% reported in [EHS92] because the authors of that paper were able to distinguish between ?le transfers and remote directory listings; we consider any ftpdata connection to be a “?le transfer”. Presumably a large proportion of the “0 xfer” connections are

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

3.3 3.2 2.7 3.1 2.8 3.3 2.2 3.0 2.2 3.1 3.3 3.9 3.8 5.4 5.0 4.8 5.0 3.4

29 28 28 29 27 30 25 30 25 29 27 26 28 32 30 29 29 30

1,006 388 612 612 1,951 1,951 975 975 2,996 2,996 426 350 133 961 106 232 392 368

28KB 27KB 24KB 30KB 24KB 33KB 28KB 31KB 30KB 31KB 13KB 12KB 20KB 36KB 36KB 36KB 26KB 22KB

?

xfers

?

?

? xfers

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

LBL-1 LBL-2 LBL-3 LBL-4 LBL-5 LBL-5* LBL-6 LBL-6* BC UCB UCB* USC DEC-1 DEC-2 DEC-3 NC UK

23,555 27,917 39,552 65,860 82,025 66,411 123,773 86,464 5,199 6,844 4,529 1,870 7,970 4,013 6,775 19,076 10,018

287 335 349 335 344 344 464 464 58 77 77 29 6 13 25 183 58

80 % 92 % 91 % 86 % 83 % 80 % 89 % 91 % 97 % 98 % 96 % 93 % 100 % 100 % 99 % 98 % 97 %

2.3KB 2.4KB 3.3KB 3.8KB 5.1KB 4.5KB 3.8KB 2.1KB 2.5KB 0.4KB 1.0KB 1.3KB 2.2KB 1.3KB 1.9KB 1.8KB 3.4KB

?

?

Dataset

# Conn

# Rej

Get

? bytes

bytes

maxbytes 54.0MB 124.3MB 61.6MB 67.2MB 176.5MB 176.5MB 291.6MB 291.6MB 16.1MB 22.1MB 22.1MB 4.7MB 4.9MB 7.1MB 12.8MB 43.8MB 6.8MB

15 3 17 4 17 7 14 7 16 0 16 0 16 2 14 9 12 6 11 5 13 5 14 5 16 5 17 1 16 7 19 0 14 2

maxxfers

? bytes

bytes

15 2 17 0 17 8 18 4 17 5 17 6 12 3 16 7 12 5 16 8 14 2 14 9 14 5 15 6 17 8 15 3 18 6 16 0

due to failed attempts to provide log-in information to the remote host. Still, the rates are surprisingly high. The ? xfers and xfers columns give the geometric mean and standard deviation for the number of ?les transferred, given that at least one ?le was transferred. That the mean is substantially higher than one is not surprising since we classify remote directory listings as ?le transfers, and probably the most common use of ftp is to connect to a remote archive site, do several listings to ?nd the ?le or ?les of interest, and then transfer those ?les. We did ?nd the values for maxxfers surprising, as we would expect that large sets of related ?les would be grouped together into a single archive ?le, unless the archive would be too large. This latter hypothesis turns out to be the case: the 1,951 ?les transferred at one time during LBL-4 together totaled 93MB, certainly too much to conveniently pack into an archive. Similarly, the 961 transferred during DEC-1 totaled 50MB. The ? bytes and bytes columns show the geometric mean and standard deviation for the total number of bytes transferred via ftpdata connections during each ftpctrl connection (for those connections with at least one ftpdata transfer). We note that these means are 5-10 times greater than those for ftpdata, an increase larger than that due simply to the multiplying effect of ? xfers . We suspect that this disparity is due to a typical ftpctrl connection including at least one true ?le transfer. As ?les will tend to be signi?cantly larger than directory listings, the mean number of transferred bytes will approach the mean ?le size, and not be held down, as are the ftpdata connection summaries, by a large number of smaller directory listings. The bytes values are quite large, again showing a wide range in transfer sizes.
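The summary statistics used in Tables 9 and 10 can be illustrated with a short sketch. It assumes that the geometric mean and geometric standard deviation are obtained by taking log2 of each value, computing the ordinary mean and sample standard deviation, and raising 2 to the result; this convention is our assumption and is not spelled out in this section. The function names are ours.

```python
import math
import random
import statistics


def geometric_mean_sd(values):
    """Geometric mean and geometric standard deviation of positive values.

    Assumption: both come from the mean and sample standard deviation of
    log2(value), mapped back with 2 ** x.
    """
    logs = [math.log2(v) for v in values]
    return 2 ** statistics.fmean(logs), 2 ** statistics.stdev(logs)


def coefficient_of_variation(values):
    """Standard deviation divided by the arithmetic mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


if __name__ == "__main__":
    rng = random.Random(0)
    periodic = [30.0] * 1000  # perfectly periodic interarrivals
    poisson = [rng.expovariate(1 / 30.0) for _ in range(20000)]
    print("periodic CV:", coefficient_of_variation(periodic))
    print("Poisson CV:", coefficient_of_variation(poisson))
    print("geometric mean, sd of [1, 4, 16]:", geometric_mean_sd([1, 4, 16]))
```

Under this convention a geometric standard deviation of about 16 corresponds to a log2 standard deviation of 4, which is the scale of the log-standard deviations quoted for the log-normal fits.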

7.2 FTP Connection Bytes

We model the bytes transferred during an ftpdata connection using a log-normal distribution. Figure 23 shows this model fitted to the first half of the LBL-4 dataset. While the model appears to match the overall shape, a number of clumps and spikes make the actual distribution rather irregular. For example, LBL-4 has a spike of 1,269 connections, each transferring 1,856 bytes. For the most part, unfortunately, these spikes do not occur in predictable locations, making it difficult to incorporate them into our analytic model. Such unpredictability also impairs the ability of empirical models to fit other datasets. One spike stands out, however, being present in all the DEC datasets, the NC dataset, LBL-4, and LBL-5 (but not LBL-6). This spike occurs at 524,288 bytes (i.e., exactly $2^{19}$ bytes), a size often used when splitting a large distribution archive into manageable pieces.

Figure 23: Log-Normal Fit to LBL-4 FTP Data Bytes (horizontal axis: lg Bytes)

Using the LBL test datasets, we found a range for the log-mean of 11.27 to 11.88, and chose $\bar{x} = \log_2 3000 \approx 11.55$ for our fixed model. The log-standard deviation varied from 3.83 to 4.21; we chose $\sigma = 4$.

Figure 24 shows the comparison between analytic and empirical models for the bytes transferred during an ftpdata connection. For the most part, scaling considerably improves the fit, as we might expect given the wide range in $\bar{x}_{bytes}$ shown in Table 9. The LBL-2 model almost always fits better than the UCB model, even though the UCB model has had its anomalous spike at 74 bytes removed. In general the analytic model does well, though it suffers somewhat when fitted with the DEC datasets, which are very noisy.

Figure 24: Empirical vs. Analytic Models for FTP Data Bytes (axes: Empirical Fit, Analytic Fit)

For modeling ftpdata data bytes we are not particularly concerned with the degree of fit in the lower tail. The upper tail, on the other hand, is of particular importance because large file transfers can consume a tremendous amount of network resources. Figure 29 in Appendix D summarizes the tail performance of the various models. While most of the models fit the upper 10% tail fairly well (with the exception of the unscaled UCB model), all except the scaled LBL-2 model fare poorly in the upper 1% tail, each overestimating the tail except, again, for unscaled UCB. Thus all of these models must be used with particular care concerning their predictions for large ftpdata connections. The overestimation of the analytic model might be understood at least in part by the tendency to split huge files into several pieces (see discussion of the $2^{19}$ byte spike, above). In this case we would expect such large files to be fetched together as a group, and hope that models of the total bytes transferred during an entire ftp conversation might more accurately predict the upper tail. Unfortunately those models actually prove worse; see below.
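To make the fixed log-normal model concrete, the sketch below evaluates its quantiles and draws sample connection sizes, using the parameters quoted above (log2 mean of log2 3000 and log2 standard deviation of 4). The function names, and the choice to round samples to at least one byte, are our own assumptions for illustration.

```python
import math
import random
from statistics import NormalDist

# Fixed-model parameters quoted in the text, in units of log2(bytes).
LOG2_MEAN = math.log2(3000)
LOG2_SD = 4.0
_LOG2_BYTES = NormalDist(LOG2_MEAN, LOG2_SD)


def model_quantile(p):
    """Size in bytes below which a fraction p of connections fall under the model."""
    return 2 ** _LOG2_BYTES.inv_cdf(p)


def model_tail_fraction(nbytes):
    """Fraction of connections the model expects to exceed nbytes."""
    return 1.0 - _LOG2_BYTES.cdf(math.log2(nbytes))


def sample_bytes(rng):
    """Draw one connection size; rounding to at least 1 byte is our assumption."""
    return max(1, round(2 ** rng.gauss(LOG2_MEAN, LOG2_SD)))


if __name__ == "__main__":
    print("median:", model_quantile(0.5))
    print("upper 10% tail starts at:", model_quantile(0.90))
    print("upper 1% tail starts at:", model_quantile(0.99))
    print("fraction above 2**19 bytes:", model_tail_fraction(2 ** 19))
    rng = random.Random(0)
    print([sample_bytes(rng) for _ in range(5)])
```

Because the upper 1% quantile grows as 2 raised to a multiple of the log-standard deviation, a small change in that parameter moves the predicted tail by a large factor, which is one reason the tail predictions deserve the caution urged above.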

7.3 FTP Conversation Bytes

Perhaps more important than an ftpdata bytes-transferred model is a model for the total number of ftpdata bytes transferred due to an ftpctrl connection. Such a model gives an indication of the total impact of each file transfer conversation. Figure 25 shows this distribution for the LBL-1 test dataset, again fitted to a log-normal model. In this case the fit is visually quite satisfying, and indeed a $\chi^2$ test indicates this fit is valid at the 1% level. Unfortunately this fit is also the best of those to the LBL test datasets; the others fail validity.

Figure 25: Log-Normal Fit to Bytes in LBL-1 FTP Conversations (horizontal axis: lg Bytes)

For the LBL test datasets, the log-mean ranged from 14.85 to 15.20, and the log-standard deviation from 3.82 to 4.18. For the fixed analytic model we took $\bar{x} = 15$ and $\sigma = 4$.

Figure 26 summarizes the performance of the models. The overall variance of the analytic model is fairly low; ignoring the BC datasets, the fits all fall in or quite close to the range 0.2-0.3. Scaling has only a minor effect on the caliber of the fits (except for BC), and in some cases worsens them. Since scaling is beneficial for modeling ftpdata connection bytes but not ftp conversation bytes, we conjecture that the “mix” between short directory listings and larger file transfers varies considerably from site to site. Such variation would mean that scaling would aid ftpdata bytes considerably more than ftp conversation bytes, since the latter are dominated by the actual files transferred and are relatively unaffected by directory listings. The analytic model performs noticeably better than the UCB empirical model, but not as well as LBL-2, which overall does quite well.

Figure 26: Empirical vs. Analytic Models for FTP Conversation Bytes (axes: Empirical Fit, Analytic Fit)

We note that the authors of [DJCME92] reported that 80% of ftp conversations transfer less than 10 KB. But once we remove the 20-30% of conversations that did not transfer any data, half of the remainder transfer more than 32 KB, and a sixth transfer more than 500 KB. Thus if a file transfer conversation is not a “failure”, it should not be assumed small.

As with ftpdata connections, we again are most concerned with the behavior of the models in the upper tails, summarized in Figure 29 of Appendix D. Each model except unscaled UCB does well in the upper 10% tail, but the analytic model greatly overestimates the upper 1% tail, even with scaling, and only scaled UCB performs well in both regards.

7.4 Data Items Per FTP Conversation

While ftpdata connection bytes and total ftp conversation bytes are closely related, we were unable to produce a good model of the number of ftpdata connections in each ftp conversation, both because the distribution varies considerably from dataset to dataset and because of the heavy upper tail in the distributions. Table 11 lists the range over all of the datasets for the distribution. For example, for one dataset 8% of all conversations transferred 2 items, while for another 15% did. Site-to-site variation is considerable. Furthermore, there is too much mass in the upper tail to accommodate a geometric distribution, our most likely candidate for modeling.

# Items  Range    Range (LBL)
0        13-32%   17-26%
1        10-24%   20-24%
2        8-15%    12-15%
3        5-11%    8-10%
4        6-8%     6-7%
5        4-6%     4-5%
6        3-5%     3-4%
7-10     7-13%    7-8%
11-20    5-14%    5-7%
21-99    3-9%     3-5%
100+     0-1%     0-1%

Table 11: FTP Data Items Per FTP Conversation

The third column lists the same ranges for just the six LBL datasets, with the weather-related ftp conversations removed. Here the variation is substantially less, indicating that the mix at a particular site is fairly stable over time. Looking at the Table one might wonder whether the LBL data is “holding down” the lower end of the ranges in the second column for 4 or more items. This turns out not to be the case; removing the LBL data from the tabulation does not change any of the ranges for 4 or more items except to narrow the 11-20 range from 5-14% to 6-14%. Thus the LBL data is not particularly atypical.

7.5 FTP Interarrivals

Figure 27 shows the fits of the scaled and unscaled arrival models for ftp conversations. Overall both models perform quite well, with the maximum for the unscaled model about 0.3 and for the scaled model 0.2. We found periodicity in the DEC datasets, with arrivals peaking on the hour and the half hour, too long an interval to affect individual interarrivals much. The deviation in the tails is similarly quite low, as shown in Figure 30 in Appendix D, with almost no distortion in the 10% tails except for the lower tail of the unscaled model, and only moderate distortion in the 1% tails.

Figure 27: Interarrivals for FTP (axes: Unscaled Fit, Scaled Fit)

8 Summary

We have presented a number of analytic models for describing the characteristics of telnet, nntp, smtp, and ftp connections, drawn from wide-area traces collected from seven different sites, comprising more than 2.5 million connections. While these models are rarely exact in a statistical sense, we developed a methodology for comparing their effectiveness to that of other models, and found that in general they capture the essence of the connections as well as or better than the empirical tcplib library. We also compared the models to an empirical model derived from a one-month trace of traffic at the Lawrence Berkeley Laboratory, which we found in general to be slightly better at modeling the traffic than the analytic models.

Table 12 summarizes the models characterizing the different protocols’ individual connections. The “Variable” column lists the random variable being modeled, where “orig.” stands for the bytes sent by the connection originator, “resp.” for those sent by the responder, “conn. bytes” for the total number of bytes transferred during the connection (for ftpdata), and “conv. bytes” for the total bytes transferred during a conversation (an entire ftp session). For telnet, we also modeled ratios between some of these variables, to capture their interdependence.

The “Model” column lists the models used. Almost all first apply a $\log_2$ transformation to the data. One model is log-extreme, where the extreme distribution is defined by Equation 4; one is exponential; and the remainder are log-normal. Four of the models have restrictions. The telnet responder bytes model describes only the upper 80% of the responses. The telnet “resp. / dur.” models describe the ratio of the responder bytes to the connection’s duration. The first such model does so for those connections whose number of responder bytes fell into the lower 90% of all connections. The second model describes this ratio for those connections in the upper 10% of all responses. Finally, the smtp originator model uses parts of two different log-normal distributions in its description. The lower 80% of the originator distribution is modeled using the lower 80% of the first log-normal distribution; similarly, the upper 20% is modeled using the upper 20% of the second log-normal distribution.

The “Parameters” column gives the parameters we used for the fixed (i.e., unscaled) version of the model.

The “Abs.” column summarizes the quality of the model’s fit in absolute terms: how well we assess the model as describing the random variable’s distribution. A “0” indicates the model describes it adequately, a “+” that the model describes it well, and a “-” that it does poorly. When two values are given, the first is for the unscaled version of the model and the second for the scaled version. When only one value is given, scaling did not significantly improve the model’s fit. The Table shows that we assess the models as being at least adequate for every random variable except when modeling nntp originator bytes; in that case the scaled version of the model is required for an adequate description.

The “Rel.” column compares the model’s performance to that of the two empirical models, one constructed from the UCB dataset and corresponding to tcplib, and one constructed from the LBL-2 dataset. A “0” indicates that the analytic model performs about equally and a “+” that it performs better. In all cases we found the analytic model does overall at least about as well as the empirical models, though for some of the “0” entries it did somewhat better than the UCB model and somewhat worse than the LBL-2 model.

The final column summarizes the analytic model’s performance in modeling the tails. A “u:” entry gives the fit to the upper tail and an “l:” entry to the lower tail. A value of “over” indicates the model substantially overestimates the 1% tail; “Over” that it also somewhat overestimates the 10% tail; and “OVER” that it grievously overestimates both the 1% and 10% tail. Similarly for “under”. For models that do well describing their tails, we chose subjective evaluations of “okay” and “good”. Some models have two evaluations reported for one of their tails; in this case the first is for the unscaled model and the second for the scaled model.

Protocol  Variable         Model                       Parameters                    Abs.  Rel.  Tails
telnet    orig. bytes      lg-extreme                  $\log_2 100$; $\log_2 3.5$    0     0     u: over
telnet    resp. bytes      lg-norm, upper 80%          $\log_2 4500$; $\log_2 7.2$   +     0     u: over
telnet    duration secs.   lg-norm                     $\log_2 240$; $\log_2 7.8$    0     0     u: Over
telnet    resp. / orig.    lg-norm                     $\log_2 21$; $\log_2 3.6$     +     0     u: good/okay; l: good/okay
telnet    resp. / dur.     exp., 0-90% resp.           30                            0     +     both good
telnet    resp. / dur.     lg-norm, 90-100% resp.      5.3; 1.5                      0     0     u: okay; l: OVER/good
nntp      orig. bytes      lg-norm                     11.5; 3                       -/0   0     u: Over
smtp      orig. bytes      lg-norm + 300B, lower 80%   10; $\log_2 2.75$             0/+   +     u: good
                           lg-norm + 300B, upper 20%   8.5; $\log_2 3$
ftp       conn. bytes      lg-norm                     $\log_2 3000$; 4              0/+   0     u: over
ftp       conv. bytes      lg-norm                     15; 4                         0     0     u: over

Table 12: Summary of Analytic Models of Connection Characteristics

Table 13 summarizes the interarrival models for each protocol. Here “Abs. Fit” summarizes the absolute fit of the model using the same notation as before. Since we did not compare the analytic interarrival models to empirical ones (due to difficulties in constructing such empirical models), we omit the “relative” fit. The “Scaling Helpful?” column indicates whether scaling substantially improved the model. The lone “No” entry, for nntp, reflects our finding that the nntp connection arrival process is not Poisson. The other “Sometimes” entries indicate that for many of the datasets scaling was not needed to produce good fits, and the arrivals can be modeled as a non-homogeneous Poisson process with hourly rates given by Figure 2. For those datasets requiring scaling, the arrivals can be better modeled as a non-homogeneous Poisson process with hourly rates different from those in Figure 2. The last two columns summarize the arrival models’ fit in the lower and upper tails. When two values are given, such as for ftp, then the first is for the unscaled model and the second for the scaled model.

Protocol  Abs. Fit  Scaling Helpful?  Lower Tail           Upper Tail
telnet    +         Sometimes         good                 good
nntp      -         No                1% over, 10% under   1% under
smtp      +         Sometimes         over                 under
ftp       +         Sometimes         okay/over            okay/good

Table 13: Summary of Analytic Interarrival Models

Table 1, at the beginning of the paper, states our major conclusions. Here we summarize our additional findings:

• The ratio between bytes sent by a user in a remote-login session and those sent back by the remote computer is about 1:20.
• Of ftp conversations that are not “failures” (no data transferred), half transfer more than 32 KB, and a sixth transfer more than 500 KB.
• smtp and nntp connections show variations in size over the course of the day, with the largest smtp connections coming during evening and early morning hours, while the peaks of nntp varied considerably.
• rlogin traffic can be described by models for telnet traffic (see Appendix C), but requires scaling for acceptable fits, and even then does not in general fit as well as telnet traffic.
• We believe the site-to-site and month-to-month variations in network traffic characteristics are in part responsible for the success of the analytic models: the inter-site differences are large enough that analytic models tend to be just as good a compromise among the varying datasets as empirical models.

The essence of the argument presented in this paper is that while wide-area traffic cannot be easily modeled exactly, if we can abide some inexactness then we can reap the benefits of using analytic models instead of empirical ones, without any relative loss of accuracy. We believe the approach discussed in this paper will prove beneficial for developing future analytic models and for gauging their effectiveness.

9 Acknowledgments

This work would not have been possible without the support and patience of Van Jacobson and Domenico Ferrari. I am also much indebted to Peter Danzig and his coauthors for the UCB, USC, and BC datasets; Jeff Mogul for the DEC datasets; Ian Wakeman and Jon Crowcroft for the UK dataset; Wayne Sung for the NC dataset; and especially to Ramón Cáceres and Sugih Jamin, who between them made all of the non-LBL datasets available. The LBL traces were gathered with the help of Craig Leres and Steve McCanne. Craig was also helpful in understanding nntp-related phenomena. The Bellcore traces were gathered by D. V. Wilson.

I also want to thank Terry Speed and particularly Sally Floyd for valuable discussions on both modeling the data and presenting the results; and Domenico Ferrari, Sally Floyd, Van Jacobson, and Jon Crowcroft for their many helpful comments on earlier drafts of this paper.

This work was supported by the Director, Office of Energy Research, Scientific Computing Staff, of the U.S. Department of Energy under Contract No. DE-AC03-76SF00098.

References

[Adams92] R. Adams, “USENET Readership Summary Report for Oct 91”, Internet Society News, 1(1), Winter, 1992.
[AD54] T. W. Anderson and D. A. Darling, “Asymptotic theory of certain goodness-of-fit criteria based on stochastic processes”, Ann. Math. Statist. 23, pp. 193-212, 1954.
[Bryan67] G. E. Bryan, “JOSS: 20,000 Hours At A Console”, Proc. of the Fall 1967 AFIPS Conference, Vol. 31.
[Cáceres89] R. Cáceres, “Measurements of Wide Area Internet Traffic”, Report UCB/CSD 89/550, Computer Science Division, University of California, Berkeley, California, 1989.
[CW91] J. Crowcroft and I. Wakeman, “Traffic Analysis of some UK-US Academic Network Data”, Proceedings of INET’91, Copenhagen, June, 1991.
[DS86] R. B. D’Agostino and M. A. Stephens, editors, “Goodness-of-Fit Techniques”, Marcel Dekker, Inc., 1986.
[DJ91] P. Danzig and S. Jamin, “tcplib: A Library of TCP Internetwork Traffic Characteristics”, Report CS-SYS-91-01, Computer Science Department, University of Southern California, 1991.
[DJCME92] P. Danzig, S. Jamin, R. Cáceres, D. Mitzel, and D. Estrin, “An Empirical Workload Model for Driving Wide-area TCP/IP Network Simulations”, Internetworking: Research and Experience, 3(1), pp. 1-26, 1992.
[EHS92] D. Ewing, R. Hall, and M. Schwartz, “A Measurement Study of Internet File Transfer Traffic”, Report CU-CS-571-92, Department of Computer Science, University of Colorado, Boulder, Colorado, 1992.
[FJ70] E. Fuchs and P. E. Jackson, “Estimates of Distributions of Random Variables for Certain Computer Communications Traffic Models”, Communications of the ACM, 13(12), pp. 752-757, December, 1970.
[Gupta52] A. K. Gupta, “Estimation of the mean and standard deviation of a normal population from a censored sample”, Biometrika 39, pp. 266-273, 1952.
[Heimlich90] S. Heimlich, “Traffic Characterization of the NSFNET National Backbone”, Proceedings of the 1990 Winter USENIX Conference, Washington, D.C.
[JS69] P. E. Jackson and C. D. Stubbs, “A study of multiaccess computer communications”, Proc. of the Spring 1969 AFIPS Conference, Vol. 34.
[JLM89] V. Jacobson, C. Leres, and S. McCanne, tcpdump, available via anonymous ftp to ftp.ee.lbl.gov, June, 1989.
[Knuth81] D. Knuth, “Seminumerical Algorithms”, Second Edition, Addison-Wesley, 1981.
[MM85] W. T. Marshall and S. P. Morgan, “Statistics of Mixed Data Traffic on a Local Area Network”, Computer Networks and ISDN Systems 10(3,4), pp. 185-194, 1985.
[Mogul92] J. C. Mogul, “Observing TCP Dynamics in Real Networks”, Proceedings of SIGCOMM ’92, Baltimore, Maryland, August 1992.
[MT77] F. Mosteller and J. W. Tukey, “Data Analysis and Regression”, Addison Wesley, 1977.
[MJ93] S. McCanne and V. Jacobson, “The BSD Packet Filter: A New Architecture for User-level Packet Capture”, Proceedings of the 1993 Winter USENIX Conference, San Diego, CA.
[Pawlita89] P. Pawlita, “Two Decades of Data Traffic Measurements: A Survey of Published Results, Experiences and Applicability”, Teletraffic Science for New Cost-Effective Systems, Networks and Services, ITC-12, M. Bonatti (Editor), Elsevier Science Publishers B.V. (North-Holland), 1989.
[Paxson91] V. Paxson, “Measurements and Models of Wide Area TCP Conversations”, Report LBL-30840, Lawrence Berkeley Laboratory, Berkeley, California, 1991.
[Paxson93] V. Paxson, “Growth Trends in Wide-Area TCP Connections”, in submission to IEEE Network. Available as WAN-TCP-growth-trends.ps.Z via anonymous ftp to ftp.ee.lbl.gov.
[RFC822] D. Crocker, “Standard for the Format of ARPA Internet Text Messages”, RFC 822, Network Information Center, SRI International, Menlo Park, CA, 1982.
[RFC977] B. Kantor and P. Lapsley, “Network News Transfer Protocol”, RFC 977, Network Information Center, SRI International, Menlo Park, CA, 1986.
[Ross87] S. Ross, “Introduction to Probability and Statistics for Engineers and Scientists”, John Wiley & Sons, 1987.
[SC92] A. Schmidt and R. Campbell, “Internet Protocol Traffic Analysis with Applications for ATM Switch Design”, Report No. UIUCDCS-R-921735, Department of Computer Science, University of Illinois at Urbana-Champaign, May, 1992.
[WLC92] I. Wakeman, D. Lewis, and J. Crowcroft, “Traffic Analysis of Trans-Atlantic Traffic”, Proceedings of INET’92, Kyoto, Japan, 1992.

A Details of filtering non-WAN traffic

We filtered the traffic datasets as follows:

• As mentioned in Section 2.3, we removed all connections between LBL and U.C. Berkeley.
• The LBL datasets did not include any internal or transit traffic.
• The BC dataset consisted of about 65% internal and transit traffic, which we removed, keeping only traffic between a Bellcore host and an external host.
• Similarly, 10% of the UCB dataset was transit or internal traffic, and another 2% was traffic with LBL. We removed these connections.
• The USC dataset had no transit or internal traffic.
• The DEC traffic consisted of about 3% transit traffic, virtually all (e.g., 99.8% of the DEC-1 transit traffic) of which was smtp. We left this traffic in the dataset. We did not identify any internal traffic in the DEC datasets.
• The NC dataset had no transit traffic. We removed internal traffic, comprising 2% of the connections.
• About 2.5% of the UK traffic was internal, which we removed. Another 3.2% of the traffic was transit (source and destination both outside of the United Kingdom); as the UK traffic represents a truly wide-area link, we felt it appropriate to include this traffic in our analysis.

B Outliers removed from datasets

As noted in Section 3.4, we removed from our analysis connections that did not transmit at least 1 byte in each direction (or, for ftp data transfers, 1 byte total). In addition, we removed connections that were clearly the result of protocol errors. These latter connections were identified by their high purported data volume, transmitted over time scales that required unlikely data rates:

• For telnet, the additional removals were an LBL-4 connection of 4GB at 600KB/s, four LBL-5 connections of 4GB at rates from 480KB/s to 6.2MB/s, and three NC connections, two of more than 100MB at rates greater than 1.2MB/s, and one of 33MB at 42KB/s. It is possible (though improbable) that this last was a legitimate connection. The very large value given in Table 6 for LBL-4’s $\max_{resp}$ appears to be due to a legitimate connection; it lasted just over 18 hours, for an average data rate of 1.3KB/s. We removed it as an extreme outlier. Each DEC dataset had fewer than 100 telnet connections due to the DEC firewall, so we did not include the DEC traffic in our telnet study.
• For nntp, we removed four LBL-5 connections of 3.8GB or more at rates of 5.9MB/s or higher. Two connections were removed from DEC-3 since they purported to have transferred in excess of 1GB at rates of 1MB/s or higher, along with two USC connections of 3GB or more at rates of 40KB/s or higher. We omitted the UK dataset because it contained only four connections.
• For smtp, we removed one DEC-3 connection as it purported to have transferred 2GB in 20 seconds.
• For ftpdata, we discarded three LBL-4 connections due to each involving 500MB or more at rates exceeding 400KB/s.
• For rlogin, we rejected one USC connection, purporting 4GB transferred in 40 seconds, as a protocol error. We removed the DEC and UK connections because in each case there were fewer than 100 connections.

C Modeling rlogin traffic using telnet models

Table 14 summarizes the rlogin traffic in the same manner as Table 6 does for telnet. We expect to find the rlogin characteristics similar to those of telnet connections, since the two protocols are very similar in purpose, though rlogin will usually involve a pair of Unix hosts, which would eliminate one source of variance in the characteristics. Indeed, the LBL rlogin traffic shown in Table 14 is quite similar to that of the other rlogin datasets, suggesting that the difference in telnet traffic shown in Table 6 is due to LBL telnet connections tending to be with different types of computers than the other telnet datasets.12

We tested all of the analytic and empirical telnet models discussed in Section 4 on the rlogin datasets as well. In general we found that scaling improved fitting considerably and was required to achieve adequate fits. From Tables 6 and 14 we see that rlogin connections have smaller $\sigma_{orig}$’s, larger $\bar{x}_{resp}$’s, and shorter $\bar{x}_{dur}$’s, so it is not surprising that the telnet models must be scaled. Performance in the tails also often improved when the models were scaled. For the ratio models, however, as with telnet, scaling sometimes improved the rlogin fits and sometimes did not.

Even after scaling, though, the rlogin fits tend not to be as good as those for telnet. The arrival fits, however, were quite good for the LBL datasets, not requiring scaling, and acceptable for BC, UCB, and NC, with scaling required for good fits.

12 Perhaps because the scientific community still favors mainframe and VMS machines.

Table 14: Summary of RLOGIN Connections

D Tail Summaries

Figure 28 shows tail summaries for the various telnet models. The text associated with Figure 5 in the main body of the text explains how to read the summaries. Figure 29 shows similar tail summaries for the nntp, smtp, and ftp models. Figure 30 shows tail summaries for the various connection arrival models. A lower-case “u” indicates the fit for the unscaled model in the lower tail; upper-case “U” the same model’s fit in the upper tail; and “s” and “S” the same for the scaled model.

Figure 28: Tail Summaries for TELNET Models (panels: Responder Bytes; Duration; Resp./Orig. Ratio, Upper Tail; Resp./Orig. Ratio, Lower Tail; Resp./Duration Ratio, Upper Tail; Resp./Duration Ratio, Lower Tail; axes: 10% Tail, 1% Tail)

NNTP Originator Bytes SMTP Originator Bytes TELNET NNTP

1 2 3

1 2 3

Upper 1% Tail

A L U l u

Upper 1% Tail

1% Tail

0.2

a

1% Tail

l A L U a u

-1

-1

0.0

-0.2

s S u -0.2 U 0.0 10% Tail 0.2

-1.0

0.0

1.0

-3

-3

-1

1

3

-3

-3

-1

1

3

Upper 10% Tail

Upper 10% Tail

FTP Data Bytes

FTP Conversation Bytes

SMTP

1 2 3

1 2 3

Upper 1% Tail

1% Tail

1% Tail

0.0

0.0

U a A l L u

Upper 1% Tail

a A L l U u

0.4

0.4

-1

-1

s

-0.4

u

S U

-0.4

-3

-3

-3

-1

1

3

-3

-1

1

3

-0.4

0.2 10% Tail

Upper 10% Tail

Upper 10% Tail

Figure 29: Tail Summaries for NNTP, SMTP, and FTP Models 34

Figure 30: Tail Summaries for Arrival Models

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

?

?

?

?

?

?

?

?

?

?

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

?

?

?

?

?

?

?

?

?

?

¤

¤

¤

¤

¤

¤

¤

¤

¤

¤

?

?

?

?

?

?

?

?

?

?

LBL-1 LBL-2 LBL-3 LBL-4 LBL-5 LBL-6 BC UCB USC NC

1,436 1,617 3,009 3,305 4,303 4,424 307 340 144 201

2 5 5 13 29 9 1 2 0 0

170B 199B 211B 240B 202B 201B 177B 185B 208B 134B

32 38 36 38 34 37 33 43 38 32

49KB 70KB 44KB 51KB 37KB 246KB 7.9KB 23KB 9.0KB 5.8KB

3.3KB 4.9KB 5.8KB 6.1KB 6.0KB 6.3KB 5.8KB 3.0KB 4.6KB 3.7KB

66 75 60 61 69 71 73 10 3 96 81

0.4MB 2.1MB 1.4MB 17.3MB 2.4MB 5.7MB 0.4MB 0.7MB 0.4MB 0.5MB

251 s 287 s 308 s 306 s 302 s 303 s 462 s 225 s 308 s 283 s

?

?

orig

?

resp

?

?

?

Dataset

# Conn

# Rej

? orig

maxorig

? resp

maxresp

? dur

dur

maxdur 89.3 h 143.9 h 185.4 h 109.3 h 170.4 h 257.3 h 36.8 h 6.1 h 21.2 h 4.3 h

65 70 65 72 73 86 68 68 69 61

s u -1.0

S U 0.0 10% Tail 1.0

FTP

s u S U -0.4 0.0 10% Tail 0.4
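The explanation of how to read a tail summary accompanies Figure 5 and is not reproduced in this appendix. Purely as an illustration, the Python sketch below assumes that a summary value compares the fraction of observations falling in a model's predicted tail with the nominal tail mass, on a log2 scale. That form is an assumption made here, not a definition taken from the report, and the function name, its parameters, and the sample data are all hypothetical.

```python
import math


def tail_summary(observations, model_cutoff, tail_mass, upper=True):
    """Return log2(observed tail fraction / nominal tail mass).

    model_cutoff is the value beyond which the model places tail_mass of
    its probability (for example, the model's 90th percentile for an
    upper 10% tail). The log2-ratio form is an assumed convention.
    """
    if upper:
        in_tail = sum(1 for x in observations if x > model_cutoff)
    else:
        in_tail = sum(1 for x in observations if x < model_cutoff)
    if in_tail == 0:
        return float("-inf")  # the model predicts a tail the data lack
    observed_fraction = in_tail / len(observations)
    return math.log2(observed_fraction / tail_mass)


# Made-up observations, purely illustrative: the integers 1 through 100.
sample = list(range(1, 101))

# A hypothetical model whose upper-10% cutoff is 90 matches the data.
matched = tail_summary(sample, model_cutoff=90, tail_mass=0.10)

# A hypothetical model whose upper-10% cutoff is 80 underestimates the
# tail: 20 of the 100 observations lie beyond it, twice the nominal 10.
too_light = tail_summary(sample, model_cutoff=80, tail_mass=0.10)

# Lower 10% tail with cutoff 5: only 4 observations fall below it.
lower = tail_summary(sample, model_cutoff=5, tail_mass=0.10, upper=False)

if __name__ == "__main__":
    print(round(matched, 3), round(too_light, 3), round(lower, 3))
```

Under this assumed convention a value near zero means the model's tail holds about as many observations as predicted, a positive value means the data have more mass in the tail than the model allows, and a negative value means less.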
