
Cache Consistency Techniques for Peer-to-Peer File Sharing Networks

MASTERS PROJECT REPORT

Jiang Lan
Department of Computer Science
University of Massachusetts
Amherst, MA 01003
jiang@cs.umass.edu

Advised by Prashant Shenoy and Brian Levine
June 26, 2002

1 Introduction
1.1 Cache Consistency Problems in Peer-to-Peer Networks
Peer-to-peer (P2P) file sharing systems provide infrastructure for communities to share storage space (e.g., Napster, Gnutella [1], Kazaa [2]). Unlike traditional distributed systems, P2P networks aim to aggregate large numbers of computers that join and leave the network frequently and might not have permanent network (IP) addresses. In a pure distributed P2P system such as Gnutella, peers communicate directly with each other and share resources without dedicated servers. In essence, these P2P systems build, at the application level, a virtual overlay network with its own routing mechanisms. Popular usage of P2P applications has focused on simple sharing of music or video files. However, P2P file sharing systems are not limited to this task. Because P2P networks are application-level networks, new functionality can be easily added and deployed. For example, the Gnutella protocol was designed to share any type of file, and new features have been continuously added or proposed to the existing model.

These observations have motivated us to consider cache consistency issues in a P2P system like Gnutella. Using the web model as a comparison, we assume that each file in a Gnutella network has an origin server, and correspondingly, the file at its origin server is called its master copy. Any user in the network can download any file she finds through querying, and thus files are replicated at users' initiative. We consider all non-master copies of a file to be replicas of the master copy, and assume that only the master copy of a file can be modified. Given this terminology, maintaining cache consistency in the Gnutella network means ensuring that when the master copy of a file changes, the peers that cache the file are made aware of it. This has two consequences: first, local access to a cached copy should only be allowed if the copy is consistent with the master copy (not stale); and second, peers send back query result messages only if the matching replica file is consistent with the master copy.

1.2 Research Contributions of this Report
This report presents the results of our investigation of several cache consistency techniques, broadly divided into push-based and pull-based methods. In addition, we propose using a combination of push and pull (PAP), which our evaluations show improves the consistency of replicas in the network. More specifically, we focused on the following techniques:

1. Pull with Adaptive TTR, our first algorithm, is presented in Section 4.1.2. The main idea is to poll the origin server of the replicas whenever the TTRs expire. TTR refers to Time-to-Refresh; it is an estimate of the duration during which the file is unlikely to change. We present an adaptive TTR algorithm where clients calculate TTR values based on the polling history. We refer to this algorithm as Adaptive Pull.

2. Push with Invalidation Messages, our second algorithm, is presented in Section 4.2. Whenever a file gets modified at its origin server, we construct an invalidation message and broadcast it into the P2P network. Peers that replicated the same file check their local replicas upon receiving the invalidation message, and change their local copies' status to stale if they are older than the master copy. We refer to this algorithm as Push.

3. Push with Adaptive Pull, our third algorithm, is presented in Section 4.3. It simultaneously employs both the Push and Adaptive Pull algorithms to provide replica consistency. The push part of the algorithm is exactly the same as the Push algorithm. However, the pull part is slightly modified from the Adaptive Pull algorithm so that it takes into account a peer's received invalidation messages and number of active connections to reduce the polling overhead. We refer to this algorithm as Push with Adaptive Pull.

We also briefly describe another variant of the poll algorithm, Pull with Static TTR, in Section 4.1.2. However, we do not emphasize it because it is less flexible in a dynamically changing network environment like Gnutella.

In order to evaluate the effectiveness and overhead of each technique, we performed a detailed simulation study. We performed simulation experiments for two types of network environments: one is a stable network

where all the peers remain in the network throughout the simulation; the other is a dynamic network where peers join and leave frequently. We are interested in both because different P2P applications suit different environments, and the cache consistency techniques we studied apply to different situations. We derive the following conclusions from the simulation results.

For a stable network environment:

1. Push alone achieves almost perfect fidelity when the update rate is lower than the query rate. Push with Adaptive Pull achieves only marginally better fidelity than pure Push when the update rate and query rate are comparable. Under the assumption that there are normally far fewer updates than queries in a Gnutella-style network, we expect Push alone to be sufficient to guarantee strong consistency.

2. Adaptive Pull alone does not guarantee strong consistency, although the consistency it provides improves severalfold as the update rate decreases. The poll message overhead remains relatively stable as the update rate changes, and is in general at least an order of magnitude lower than the invalidation message overhead when using Push or Push with Adaptive Pull. In applications where the consistency requirements are less stringent, the Adaptive Pull algorithm is a good choice.

3. The TTL value determines the reach of invalidation messages. Fidelity improves as the TTL value increases, and when the TTL gets large enough, both Push and Push with Adaptive Pull achieve almost perfect fidelity. The invalidation message overhead increases almost exponentially as the TTL value increases; however, it reaches a plateau once the TTL gets large enough in our simulation. In common Gnutella implementations [12] [13], the TTL is set so that broadcast messages cover most of the peers in the Gnutella network [8].

4. Network size affects fidelity when using the Push algorithm: we get increasingly worse fidelity as the network size grows. When using the Push with Adaptive Pull algorithm, however, fidelity degrades much more slowly as the network size increases. More importantly, the fidelity achieved with pure Push is several times worse than that with Push with Adaptive Pull. We attribute this to the leverage effect of the pulling part of the Push with Adaptive Pull algorithm.

5. The average number of active connections each peer maintains is another important parameter in a Gnutella-style network. The larger this value, the more peers each invalidation message gets broadcast to. The simulation results show that with both the Push and Push with Adaptive Pull algorithms we get better fidelity as this value increases, achieving almost perfect fidelity once it gets large enough in our simulation. The Push with Adaptive Pull algorithm is more immune to this parameter: for example, it achieves markedly better fidelity than Push when the average number of connections per peer is small. On the other hand, the invalidation message overhead involved increases exponentially, by several orders of magnitude, as this parameter increases. Therefore, in practice, we suggest that peers maintain a moderate number of active connections, which helps to achieve good fidelity while avoiding large overhead.

6. The query rate does not have much effect on fidelity when using the Push with Adaptive Pull algorithm. We achieved almost perfect fidelity even when the update rate was several times the query rate.

For a dynamic network environment:

1. A Gnutella application fixes its number of active connections when it falls below some threshold. This helps maintain the connectivity of the Gnutella network and thus helps Push achieve better fidelity. However, using Push with Adaptive Pull achieves better consistency than using pure Push, and the topology fix process is not necessary when using the Push with Adaptive Pull algorithm.

2. Both the Adaptive Pull and Push algorithms achieve worse fidelity as more disconnections occur in the network. Push gives better fidelity than Adaptive Pull when there are moderate disconnections, and possibly worse fidelity when the network gets highly disconnected. In general, neither of the two algorithms can guarantee strong consistency in a dynamically changing network.

3. The Push with Adaptive Pull algorithm gives good fidelity (with QFVR and DFVR both remaining low) even when the network gets highly disconnected. So if strong consistency is desired, this is the best algorithm among the three.

4. The invalidation message overhead depends on the update rate. If the update rate is low relative to the query rate, then the invalidation message overhead is also lower than the query message overhead. Both the query and invalidation message overheads decrease linearly as more disconnections occur in the network.

5. The poll message overhead is generally at least an order of magnitude lower than the invalidation message overhead. Our adaptive TTR algorithm adapts to changes in the number of active connections of each peer, so a peer polls more often when it has fewer connections. This adaptiveness helps the polling achieve good fidelity even in a highly disconnected network environment.

The remainder of this report is structured as follows: the next section briefly describes the Gnutella protocol and application. Section 3 introduces traditional cache consistency techniques for distributed (especially client-server) systems. In Section 4 we focus on applying the traditional techniques to the Gnutella P2P model and propose our extensions to those algorithms. Section 5 describes our simulation study and the results of the simulation experiments. We offer conclusions in Section 6.

2 Overview of Gnutella
Gnutella [1] is a fully decentralized peer-to-peer file sharing protocol. In essence, Gnutella is a routing protocol where messages are distributed by means of application-level flooding. Each node in a Gnutella network only knows its neighbor nodes (also called peers). When a node sends out a query message, it first routes the query message to its neighbor peers; the neighbor peers then pass the query message on to their neighbors, and so on. This process stops when the hop count of the message reaches the specified maximum, defined as the Time-to-Live (TTL) value. Nodes that can satisfy the query pass results back to the originating node along the reverse path of the query message. A peer forwards new query messages to all its neighbor peers whether or not it can satisfy the queries itself.

Each Gnutella node performs tasks normally associated with both servers and clients. As a client, it provides an interface through which users can issue queries and view search results; as a server, it accepts queries from other servents, checks for matches against its local store, and responds with the corresponding results. A node joins the Gnutella network by initially connecting to one of several known hosts that are intended to be always available (e.g., gnutellahosts.com). Once connected to the network, nodes send group membership messages (PING and PONG) to interact with each other. Users can then send queries and view search results through the client interface. Once a servent receives the search results, it may initiate a direct download of one of the files in the result set. File download uses the HTTP protocol and is referred to as "out-of-network" in the Gnutella protocol [1].

In the next section, we introduce some traditional cache consistency techniques for distributed systems. Most of them have been extensively studied and successfully deployed in client-server systems, such as the Web.
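To make the flooding mechanics concrete, the toy program below propagates a query across a small hard-coded overlay by breadth-first forwarding, decrementing the TTL at each hop. The topology and TTL value are illustrative assumptions of ours, not part of the Gnutella protocol.

    /* flood.c - toy illustration of TTL-limited flooding.
       Build: cc flood.c -o flood */
    #include <stdio.h>
    #include <stdbool.h>

    #define N 8

    /* Adjacency matrix of a small example overlay (symmetric links):
       a tree 0-1, 0-2, 1-3, 2-4, 3-5, 4-6, 5-7. */
    static const int adj[N][N] = {
        {0,1,1,0,0,0,0,0},
        {1,0,0,1,0,0,0,0},
        {1,0,0,0,1,0,0,0},
        {0,1,0,0,0,1,0,0},
        {0,0,1,0,0,0,1,0},
        {0,0,0,1,0,0,0,1},
        {0,0,0,0,1,0,0,0},
        {0,0,0,0,0,1,0,0},
    };

    static bool seen[N];   /* duplicate suppression: deliver once per peer */

    static void flood(int origin, int ttl)
    {
        int queue[N], hops[N], head = 0, tail = 0;
        seen[origin] = true;
        hops[origin] = 0;
        queue[tail++] = origin;
        while (head < tail) {
            int u = queue[head++];
            if (hops[u] == ttl)
                continue;              /* hop count reached the maximum */
            for (int j = 0; j < N; j++) {
                if (adj[u][j] && !seen[j]) {
                    seen[j] = true;
                    hops[j] = hops[u] + 1;
                    printf("peer %d received the query after %d hops\n",
                           j, hops[j]);
                    queue[tail++] = j; /* the neighbor passes it on */
                }
            }
        }
    }

    int main(void)
    {
        flood(0, 3);   /* peer 0 originates a query with TTL 3 */
        return 0;
    }

With a TTL of 3, peer 7 (four hops away from the originator) never receives the query, which is exactly the reach-limiting behavior the TTL is designed to provide.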

3 Traditional Cache Consistency Techniques for Distributed Systems
Generally there are two broad approaches to achieving cache consistency in a distributed system: the client-initiated approach (client-poll, for short) and the server-initiated approach (server-push, for short). Most studies of these techniques have been confined to the client-server model. We use the general terms client and server to denote the consumer and provider of resources, and use file to represent the resources hosted by servers. In a web environment, a server could be any web server, and a client could be a user operating a web browser on a local machine.

3.1 Client-initiated Algorithms
3.1.1 Poll-Every-Time

The simplest technique is poll-every-time. Each time a cached file is accessed by a client, an "if-modified-since" message is sent to the server that hosts the file. Upon receiving this message, the server checks whether its local copy of the specified file is newer than the cached copy at the client, and then sends a reply indicating the result. If the two copies are the same, the client proceeds to access the file; otherwise it fetches the newer copy from the server before further access. The advantage of this approach is its simplicity and its strong consistency guarantee (ignoring the inconsistency caused by network delay). However, polling every time incurs a large amount of network traffic and vastly increases server load. In addition, the client response time includes an additional round-trip delay even when the local copy is the same as the server's copy.
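The toy program below condenses poll-every-time into a single process: the client checks the server's version before each access. The in-memory variables are stand-ins for the real if-modified-since exchange.

    #include <stdio.h>

    static int server_version = 3;   /* master copy's version   */
    static int cached_version = 2;   /* client's cached version */

    /* Server side of the poll: does the client hold the latest copy? */
    static int is_up_to_date(int client_version)
    {
        return client_version == server_version;
    }

    int main(void)
    {
        /* Client side: poll before every access to the cached file. */
        if (!is_up_to_date(cached_version)) {
            cached_version = server_version;  /* fetch the newer copy */
            printf("cache refreshed to version %d\n", cached_version);
        }
        printf("accessing cached file (version %d)\n", cached_version);
        return 0;
    }

3.1.2 Poll with "Time-To-Refresh"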

In the time-to-refresh (TTR) approach [3] [4], a client assigns a TTR value to every file it downloads from the server. This TTR value is an estimate of the duration during which the file is unlikely to change. The client then assumes the file remains valid (i.e., consistent with the master copy) in its cache for the TTR period. Any request for the file before the expiration of its TTR is served from the client's local cache. Upon the expiration of the TTR, the client must poll the server to check the validity of the file, and it updates its local cache copy and the TTR value based on the server's feedback. The purpose of using TTR is to reduce network traffic and server load. A larger TTR results in less network traffic and a lighter server load; however, a larger TTR also increases the degree of inconsistency. Measured in time, the worst case inconsistency period is now the TTR, as opposed to the strong consistency of the poll-every-time approach. Ideally, perfect TTR values could be set if updates to objects were predictable, but this is hardly true in real life. In order to achieve both goals of reducing overhead and maintaining a high degree of consistency, TTR values must be chosen intelligently. Several techniques have been proposed by Srinivasan et al. for choosing good TTR values [3], as described below:

1. Static TTR, based on a priori assumptions. The simplest approach is to choose a low TTR value based on some heuristics and use it throughout. The difficulty of this approach is that good TTR values are hard to predict and usually fluctuate over time. Thus this technique works well only for applications

where updates to files are predictable and more or less follow a uniform distribution, such as online newspapers.

2. Adaptive TTR. In adaptive TTR, a client computes the TTR values based on the rate of observed changes at the server. Rapidly changing files result in a smaller TTR, whereas infrequent changes require less frequent polls to the server, and hence a larger TTR. Srinivasan et al. summarize a set of techniques for computing TTR values [3]. We present a variation of these for use in a Gnutella-style P2P system in Section 4. The main problem with adaptive TTR is that using the past to predict the future is not always accurate, so it cannot guarantee strong consistency. However, if some degree of inconsistency can be tolerated, this technique suffices.

3.2 Server-initiated Algorithms
There are two basic approaches in this category. The first is referred to as server-based invalidation, which requires the server to notify clients with invalidation messages when a file gets modified; the second is to let the server send the actual changes to the clients, instead of invalidation messages, at the time of updates. Both approaches are optimal in the number of control messages exchanged between the server and client (since messages are sent only when files get modified). However, the invalidation-based approach saves more bandwidth when changes are large. It is also more reasonable for clients that are not interested in the changes: in this case it costs less overhead than sending the actual updates, and still achieves strong consistency. An intermediate approach is to send invalidation messages when changes are large and send the actual changes when they are small, but this adds more complexity to the system. For simplicity, we will only discuss the invalidation-based approach in this report.

In the traditional push-based approach (which, for example, can be implemented in a distributed file system), the server must maintain per-file state consisting of a list of all clients that cache the file. In the case of popular files in a large distributed system, the amount of state maintained can be significant. In addition, if a client is unreachable due to network partition or other reasons, the server must either block on a write request until a timeout occurs, or risk violating consistency guarantees. These limitations are the main reasons that server-based invalidation is not adopted by the current Internet (HTTP servers are stateless). To overcome this limitation, Gray and Cheriton introduced an alternative approach called leases [5], which reduces server space overhead at the expense of more control message overhead. In the lease-based approach, the server assigns a lease to each download request from a client. The lease duration indicates the interval of time during which the server will notify the client about any update of the file. Once a file's lease expires, the client needs to contact the server to renew the lease. Since the server only maintains, for each file, the list of clients with valid leases, shorter leases imply smaller server space overhead.

Leases were first proposed in the context of cache consistency in distributed file systems [5]. Yin et al. [6] and Duvvuri et al. [7] have proposed techniques for using leases in the World Wide Web (WWW) environment. The advantage of using leases is that they guarantee strong consistency with less server space overhead. The difficult part is choosing a good lease value; static lease values may inhibit systems from scaling, and adaptive lease algorithms that address this problem have been proposed by Duvvuri et al. [7].

In the next section, we investigate how to apply the traditional cache consistency techniques to the Gnutella network, and how well they would perform. We also propose variants of those algorithms that promise a better degree of consistency. The follow-up simulation study evaluates their performance in terms of consistency level and control message overhead.

4 Cache Consistency Algorithms for the Gnutella Network
The Gnutella protocol is an open, decentralized group membership and search protocol. The term "Gnutella" also designates the virtual network of Internet-accessible hosts running Gnutella-speaking applications, often referred to as the "Gnutella network". The two main features of the Gnutella network are its self-organizing topology and its use of flooding/back-propagation as the routing protocol. In theory, all the traditional cache consistency techniques described in the previous section could be implemented with some success in such an environment. However, the highly dynamic nature of the Gnutella network requires the algorithms to be fault tolerant and scalable. In addition, file downloads are user-initiated: a user may choose to download any copy among the search results, not necessarily the master copy. If we use a parent-child relationship to describe the two peers that upload and download a file respectively, the distribution of a file forms a tree-like hierarchy, and this tree structure is very unstable. All of the above makes it very hard to implement a lease-based technique in the Gnutella network, since any peer failure is hard to recover from. In the following subsections, we describe the advantages and disadvantages of applying the pull-based and push-based techniques in the Gnutella network, and we propose a new combined push and pull technique that promises a better degree of consistency.

4.1 Client-initiated Techniques
Implementing client-initiated techniques in a P2P network is no different from implementing them in a client-server style network. We require that the following information be stored with each file for this scheme to work correctly:

1. Version Number — the file version number is incremented upon each update.

2. Origin Server IP Address — where to poll for the status of the master copy.

3. Consistency Status — we define three values here: (a) valid, when a file is consistent with its master copy; (b) stale, when a file is older than its master copy; (c) possibly-stale, when the origin server of a file is unavailable (e.g., it has left the network) and we cannot determine the definite consistency status of the file.

We maintain this information for each replica file. Each peer can maintain two local stores: one for the files it hosts (the master copies), and the other for all the replica files.
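For illustration, the per-replica state described above might be declared as follows; the concrete types and the enum are our own choices rather than anything prescribed by this report.

    #include <stdint.h>

    /* Consistency status of a cached replica. */
    enum status { VALID, STALE, POSSIBLY_STALE };

    /* Per-replica state kept in a peer's replica store (a sketch). */
    struct replica_info {
        char        file_id[128];  /* unique file identifier         */
        uint32_t    version;       /* incremented on each update     */
        uint32_t    origin_ip;     /* origin server's IP address     */
        enum status state;         /* valid / stale / possibly-stale */
    };

4.1.1 Poll-Every-Time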

A poll request is generated whenever there is a local access to a file or a query for the file. The poll request includes the file name and the file's version number, and it is sent to the origin server of the file. The file details listed above are sufficient for a correct implementation of this algorithm. The advantage of this technique is its simplicity and strong consistency guarantee. Its problem is that every local access involves an extra round-trip delay (unless it is to the master copy), and every query result is also delayed because of the polling. Because of the control message overhead and the degradation of performance, we consider this algorithm less appealing.

4.1.2 Poll with TTR

In order to implement polling with TTR, we need to maintain another piece of information for each downloaded file: the TTR value. The success of the pull-based technique hinges on accurate estimation of the TTR value. There are various techniques to compute the TTR value; they can be broadly divided into static and adaptive TTR algorithms.

1. Static TTR. This is the simplest TTR algorithm: it chooses a low TTR value (based on some heuristics) and uses it throughout. It is easy to implement, and the per-file state overhead is low since computing the TTR does not require any history. However, it is hard to choose a TTR value that both gives a good consistency guarantee and has low control message overhead.

2. Adaptive (also called Dynamic) TTR. To overcome the problems of static TTR, Srinivasan et al. [3] and Urgaonkar et al. [4] have developed adaptive TTR algorithms that compute TTR values based on

the history of file updates and other heuristics. Srinivasan et al. [3] summarize a set of techniques for computing TTR values. Urgaonkar et al. [4] proposed a linear increase multiplicative decrease (LIMD) algorithm to adapt the TTR value to the rate of change of the object. The LIMD algorithm gradually increases the TTR value as long as there are no updates at the master copy. On detection of updates, the TTR value is reduced by a multiplicative factor. In this manner it probes the server for the rate at which the object is changing and sets the TTR values accordingly. We developed an algorithm similar to the LIMD algorithm. The precise details are described as follows.

For each object, the algorithm takes as input two parameters, TTR_min and TTR_max, which represent the lower and upper bounds on the TTR values. These bounds ensure that the TTR computed by the algorithm is neither too large nor too small. TTR_min is typically set to the minimal duration between successive updates of the most frequently changed objects. Although this value changes over time and across systems, we can still use a statistically collected value as an estimate. The algorithm begins by initializing the TTR to TTR_min. After each poll, the server sends back its version number of the file, and the client then computes the next value using the following algorithm (we use VN to denote the file version number):

1. Each file maintains its most recent TTR value, TTR_last.

2. The estimated TTR value for the next poll is calculated as follows:

       If VN_server = VN_local, then TTR_estimate = C * TTR_last       (1)

       If VN_server > VN_local, then TTR_estimate = TTR_last / C       (2)

   where C > 1 is a constant. Our assumption here is that if the file undergoes no change at the server during the previous TTR period, we should probably use a larger TTR for the next poll; if the file did change, we still want to reduce the next TTR value.

3. We compute a dynamic TTR value based on both TTR_estimate and TTR_last:

       TTR_dynamic = w * TTR_estimate + (1 - w) * TTR_last             (3)

   where w is a configurable parameter. We suggest using w > 0.5 to put more emphasis on the current poll result.

4. The final TTR value must be within the lower and upper bounds:

       TTR_next = max(TTR_min, min(TTR_max, TTR_dynamic))              (4)

The adaptive TTR algorithm has the following features:

1. It provides several tunable parameters, namely TTR_min, TTR_max, C, and w, that can be used to control its behavior.

2. Each cached file only needs to store the last used TTR value and the file's version number in order to calculate the next TTR. This requires very small per-file state-space overhead.

3. Failure recovery is easy. When a peer recovers, it can simply reset the TTRs of all cached objects to TTR_min.
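As a concrete illustration of steps 1 through 4, the sketch below implements the TTR update in C. The bound and constant values are illustrative assumptions of ours; the report does not fix defaults at this point.

    #include <stdio.h>

    /* Illustrative parameter choices, not values prescribed here. */
    #define TTR_MIN   60.0   /* lower bound, seconds           */
    #define TTR_MAX 3600.0   /* upper bound, seconds           */
    #define C          2.0   /* growth/shrink factor, C > 1    */
    #define W          0.7   /* weight on the estimate, w > .5 */

    /* Compute the next TTR (Eqs. 1-4): grow the TTR when the poll saw
       no change, shrink it when the file was updated, blend, clamp. */
    double next_ttr(double ttr_last, int vn_server, int vn_local)
    {
        double estimate = (vn_server == vn_local)
                              ? ttr_last * C    /* Eq. (1): unchanged */
                              : ttr_last / C;   /* Eq. (2): updated   */

        /* Eq. (3): blend the estimate with the previous TTR. */
        double dynamic = W * estimate + (1.0 - W) * ttr_last;

        /* Eq. (4): clamp to [TTR_MIN, TTR_MAX]. */
        if (dynamic < TTR_MIN) dynamic = TTR_MIN;
        if (dynamic > TTR_MAX) dynamic = TTR_MAX;
        return dynamic;
    }

    int main(void)
    {
        double ttr = TTR_MIN;            /* the algorithm starts here */
        int updated[] = { 0, 0, 0, 1, 0 };  /* simulated poll outcomes */
        for (int i = 0; i < 5; i++) {
            ttr = next_ttr(ttr, updated[i] ? 2 : 1, 1);
            printf("poll %d: next TTR = %.0f s\n", i + 1, ttr);
        }
        return 0;
    }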

4.2 Server-initiated Techniques — Broadcasting with Invalidation Messages
We have described previously that the main message routing protocol in a Gnutella network is flooding. Our push-style cache consistency technique utilizes this feature and has the advantage of a stateless server. The idea is simple: whenever a master copy of a file gets updated at its peer, an invalidation message is constructed and flooded into the network. The message contains at least the following information: the origin server ID (e.g., its IP address), the file name, and the new version number of the file. A peer that receives the message checks whether it caches the file; if so, it compares the local version number with that in the message, and invalidates its local copy if the message contains a newer version number. It passes the message on to its neighbors whether or not it holds the specified file, just as it does with query or ping messages.

The advantage of this push-style algorithm is its simplicity and its stateless server. It guarantees strong consistency (ignoring inconsistencies caused by network delays), provided that all the peers caching the file are reachable from the origin server. However, it does add a lot of unnecessary control message overhead to the network, especially when an object is cached at only a few peers.

In order to implement this algorithm in the Gnutella network, a new descriptor (a descriptor is, in Gnutella terminology, a message with some fixed format), the invalidation descriptor, needs to be added to the protocol. This descriptor contains the Origin Server ID, the File ID, and the new Version Number of the file. The origin server ID could simply be that node's IP address; the File ID must uniquely identify a file in the whole system (an example could be Origin Server IP + Author Name + File Name); the new Version Number is necessary to determine whether a cached copy is out of date.

Whenever a file gets updated at its origin server, the server broadcasts an invalidation message in the same manner as it would send out a query message. Upon receiving this message, every node in the network compares the File ID with those in its cache; if a match is found, the cached copy's version number is compared with that in the message, and the cached copy is marked stale if the local copy is older. Unlike a query message, an invalidation message does not require reply messages when matches are found, and therefore results in less overhead. To further reduce the control message overhead, a peer can piggyback invalidation messages on outgoing ping or query messages.

Ideally, if all the peers are constantly connected in the Gnutella network and the TTLs of the flooding messages are large enough to reach every peer, the push-based algorithm described above should give us a strong consistency guarantee. However, things are not that simple in real life. In particular, we face the following problems in the real Gnutella network:

1. Not all the peers in the network will receive the broadcast messages. This can happen in two situations: when the network is partitioned, and when a peer is beyond the reach of the specified TTL.

2. Peers in the Gnutella network leave and join the network dynamically. After a peer leaves Gnutella, it cannot receive any further invalidation messages, and even after it rejoins the network, it has no way of knowing whether it missed any invalidation messages.

Given the above scenarios, we conclude that push alone is not sufficient for maintaining consistency in a large scale P2P environment such as the Gnutella network. To overcome this limitation, we propose a combined push and pull technique that promises good consistency guarantees to all the peers in a large scale network.
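As an illustration of the invalidation descriptor and the local check described above, the sketch below marks matching, older replicas stale upon receiving an invalidation; the types and the linear scan of the replica store are our own simplifications.

    #include <stdint.h>
    #include <string.h>

    enum status { VALID, STALE, POSSIBLY_STALE };

    struct replica_info {
        char        file_id[128];
        uint32_t    version;
        enum status state;
    };

    /* Fields carried by the proposed invalidation descriptor. */
    struct invalidation {
        uint32_t origin_ip;     /* origin server ID              */
        char     file_id[128];  /* globally unique file ID       */
        uint32_t new_version;   /* master copy's version number  */
    };

    /* Apply an invalidation to a local replica store of n entries;
       forwarding to neighbors (with TTL handling) is omitted here. */
    void on_invalidation(struct replica_info *store, int n,
                         const struct invalidation *msg)
    {
        for (int i = 0; i < n; i++) {
            if (strcmp(store[i].file_id, msg->file_id) == 0 &&
                store[i].version < msg->new_version) {
                store[i].state = STALE; /* local copy is older */
            }
        }
    }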

4.3 Push and Pull Combined Technique — Push with Adaptive Pull
Push with Adaptive Pull (PAP) combines the Push (PUSH) and adaptive poll (PULL) algorithms described earlier in this section. Using Push alone gives strong consistency guarantees to all the peers that remain online and reachable by the broadcast messages. However, it does not work well for those peers that are not reachable, or when the network is highly dynamic (i.e., nodes leave and join frequently). Therefore we add Adaptive Pull, and we see that it succeeds where Push leaves off.

The push part of this algorithm is exactly the same as the invalidation-based push algorithm described in Section 4.2. The pull part of the algorithm is similar to the adaptive poll algorithm. However, since we are now combining push and pull, ideally only those peers that could not receive the invalidation messages should poll the servers. It is hard to achieve this ideal case, but we can modify the adaptive poll algorithm to make the pulling less aggressive. In particular, we make the following changes:

1. We now take into account the number of active connections a peer has. The heuristic is that the more connections a peer has, the better the chance that it will receive a future invalidation message. Let N_conn denote the number of active connections a peer has, and N_avg denote the average number of connections per peer (this can be obtained from measurement studies of the Gnutella network). We then need to change Eq. (3) to the following form:

       TTR_dynamic = (w * TTR_estimate + (1 - w) * TTR_last) * (N_conn / N_avg)    (5)

   Based on Eq. (5), a file will be polled more actively if the peer that caches it has fewer than the average number of connections, and less actively if that peer has more than the average number of connections.

2. All the other parts of the adaptive TTR algorithm remain the same as in the pure pull-based approach.

3. Additionally, whenever a peer receives an invalidation message for a file, it will mark the local copy as STALE if the message contains a newer version number for the file. At the same time, the peer increases the file's next TTR value by adding a constant to it, for which we can use the same constant C as in the adaptive TTR algorithm. Although a peer will not poll for a STALE copy, when the copy is updated (for example, at the user's initiative), the peer will restart polling for the object using the TTR stored with the STALE copy.

By combining server-push and client-poll, we provide strong consistency for most of the peers and bounded consistency for a small portion of them. In addition, the traffic overhead caused by the adaptive polling part is generally much less than the traffic overhead caused by the pushing part, given carefully chosen parameters for the adaptive TTR algorithm.

In the next section, we describe a simulation study of the three cache consistency techniques described previously, i.e., Adaptive Pull, Push, and Push with Adaptive Pull.
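A sketch of the connection-aware adjustment of Eq. (5) follows, reusing next_ttr and the bounds from the earlier adaptive-TTR sketch; applying the scaling after the blend and re-clamping is our reading of the modified algorithm.

    /* Eq. (5): scale the dynamic TTR by n_conn / n_avg so that poorly
       connected peers (less likely to see an invalidation) poll sooner.
       next_ttr(), TTR_MIN, and TTR_MAX come from the earlier sketch. */
    double pap_next_ttr(double ttr_last, int vn_server, int vn_local,
                        int n_conn, double n_avg)
    {
        double dynamic = next_ttr(ttr_last, vn_server, vn_local);
        dynamic *= (double)n_conn / n_avg;
        if (dynamic < TTR_MIN) dynamic = TTR_MIN;  /* re-apply Eq. (4) */
        if (dynamic > TTR_MAX) dynamic = TTR_MAX;
        return dynamic;
    }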

5 Simulation Study
We conducted a simulation study of the Gnutella-style P2P network. The simulator is written in C, and it uses the CSIM18 simulation library.

5.1 Overview of the Simulator
The basic scheme of the simulation is to initialize a P2P network based on observed statistical data and educated assumptions. Each peer is simulated as a separate process, and each peer maintains a FIFO queue for incoming requests. A workload generator continuously generates query and download requests; the intervals between successive requests follow an exponential distribution. A peer process is an infinite loop which continuously services incoming requests and performs appropriate actions based on the request type. To simulate cache consistency, we created other processes, for example an update process and a poll process. To account for the dynamic nature of the P2P network, we created a process that emulates the behavior of peers leaving and rejoining the network. Our simulation uses many statistical distributions to simulate the behavior and resource distributions within the network. These data are derived from measurement studies by Ripeanu et al. [8] and Saroiu et al. [9], and therefore our simulation is close to a real Gnutella network. In addition, we argue that the performance of our proposed cache consistency techniques is not strongly dependent on some of the distributions we chose; we describe this further in the results section.
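As an illustration of the exponentially distributed request intervals, the sketch below uses inverse-transform sampling; it is a generic stand-in of ours, not the CSIM18 primitives the simulator actually uses.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Draw an exponentially distributed interval with the given mean,
       via inverse-transform sampling of a uniform variate in (0,1). */
    static double exp_interval(double mean)
    {
        double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        return -mean * log(u);
    }

    int main(void)
    {
        srand((unsigned)time(NULL));
        double clock = 0.0;
        for (int i = 0; i < 5; i++) {
            clock += exp_interval(1.0);  /* e.g., 1-second mean gap */
            printf("query %d issued at t = %.3f s\n", i + 1, clock);
        }
        return 0;
    }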

5.2 Simulator Components
The simulator is composed of the following components: a network topology generator, a workload generator, a peer module, an update module, a poll module, a failure module, a download module, and a statistics module. Below we describe each module in detail.

The network topology generator. Studies by Ripeanu et al. and Saroiu et al. showed that the link distribution of the current Gnutella network is quasi-heavy-tailed; we did not use this distribution since it hardly affects the cache consistency techniques we propose. To initialize a peer-to-peer network, the simulator requires two parameters: the number of peer nodes in the network, and the number of active connections each peer maintains. In addition, based on common Gnutella client implementations [12] [13], we defined two other parameters: the average number of links per node, and the maximum number of links per node. Currently we use three methods to initialize the P2P network:

1. The first method initializes the network randomly, and the links are all uni-directional. The end result is that peers have different numbers of links; the average-links parameter is not used here. The only guarantees in this case are that each peer has at least one outgoing link, and that the total number of links of each peer does not exceed the maximum.

2. The second method initializes the network using bi-directional links, and all the peers have the same number of links, defined by the average-links parameter. However, there is no guarantee that the resulting network will be fully connected.

3. The third method is similar to the second except that the resulting network is guaranteed to be fully connected, i.e., there is a connected path between any pair of nodes in the network.

In addition to generating the link topology, we also initialize the object distribution within the network. We assume all the objects in the network are present from the beginning; objects can be replicated and the replicated (or cached) copies can be deleted, but no new objects are injected into the network during the simulation. Given the total number of objects in the network, the objects are distributed among the peers following an 80/20-style rule, i.e., a small percentage of the peers own the large majority of the objects. The peer bandwidth distribution is also simulated based on the studies by Ripeanu et al. and Saroiu et al. We made a simplification by dividing peer bandwidth into two broad categories: modem bandwidth (including 56K modems and ISDN) and broadband bandwidth (including cable modems, DSL, T1, and T3). For the purpose of studying cache consistency techniques, this simplification is sufficient.

The workload generator. The workload generator is a process that continuously generates requests, with exponentially distributed interarrival times. A request contains the following information:

- Object ID. This is the target object of the request. It is chosen according to a Zipf distribution, based on the observed popularity of objects in the Gnutella network [7] (see the sketch after this list).

- Source peer ID. This is the peer that sends the request. It is selected randomly based on the following criteria: the peer must be connected to the network, and the peer must not have the specified object (though it may have a stale or possibly-stale copy of it).

- Request type. This can be QUERY, POLL, or REFRESH, depending on whether the target object is already in the source peer's local cache and on its consistency state. There are other request types used in the simulator; however, the workload generator only generates QUERY, POLL, and REFRESH requests, and a DOWNLOAD request may be generated following a QUERY request with a certain probability.

- TTL. This is necessary when the request is a broadcast message, such as a QUERY request.

- Destination peer ID. This is where the request is sent. Initially this is the same as the source peer ID.

- Arrival time. This is the time when the request arrives at the destination peer.
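The Zipf-distributed object selection mentioned above can be sampled as in this sketch; the object count and the exponent of 1.0 are illustrative assumptions.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N_OBJECTS 1000

    /* Cumulative Zipf weights: object i has weight 1/(i+1). */
    static double cdf[N_OBJECTS];

    static void build_zipf_cdf(void)
    {
        double sum = 0.0;
        for (int i = 0; i < N_OBJECTS; i++) {
            sum += 1.0 / (i + 1);
            cdf[i] = sum;
        }
        for (int i = 0; i < N_OBJECTS; i++)
            cdf[i] /= sum;              /* normalize to [0,1] */
    }

    /* Pick an object ID with Zipf-distributed popularity. */
    static int pick_object(void)
    {
        double u = rand() / ((double)RAND_MAX + 1.0);
        for (int i = 0; i < N_OBJECTS; i++)
            if (u <= cdf[i])
                return i;
        return N_OBJECTS - 1;
    }

    int main(void)
    {
        srand((unsigned)time(NULL));
        build_zipf_cdf();
        for (int i = 0; i < 5; i++)
            printf("request targets object %d\n", pick_object());
        return 0;
    }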

The workload generator first selects a target object and the source peer of the request. If the source peer does not have the object, a query request is generated, inserted into the source peer's request queue, and broadcast into the network from there. With a certain probability, a DOWNLOAD request is created some time after the query; the download server (i.e., the destination peer) is chosen among those peers that have valid copies of the object. If the source peer has a stale copy of the object, a REFRESH request is generated and the new version of the object is downloaded from the object's origin server, provided it is online. In the case where the source peer has a possibly-stale copy of the object, a POLL request to the origin server of the object is generated if polling is enabled; otherwise the request is discarded.

A request as described above carries all the information needed for processing itself and for collecting statistics. There are six types of requests in total: QUERY, DOWNLOAD, INVALIDATION, FAILURE, REFRESH, and POLL. Among them, the FAILURE request is a bogus one created purely for the simulation; it does not exist in real implementations. Its function is to notify a peer to stop functioning for a period of time, as a way to simulate a peer's leaving and rejoining the network. It is described in more detail under the failure module below.

The peer module. Peers are simulated as processes. Each peer maintains an incoming request queue. A peer process is essentially an infinite loop which continuously services new requests in FIFO order.

The update module. An update process is created to simulate updates to the master objects in the system. We adopt the object update distribution of the current web environment. Specifically, we divide the objects into four categories: very fast mutable, very mutable, mutable, and immutable. Each category has its own percentage of the objects and its own time between successive updates; in the above order, these are (0.5%, 15 minutes), (2.5%, 450 minutes), (7%, 1800 minutes), and (90%, 86400 minutes). Because the Gnutella network has to date been used primarily for sharing immutable files, a realistic distribution of object updates is unavailable, and the web model is used as an approximation. In the simulator, the average time between successive updates is a configurable parameter, and the objects to update are chosen according to the distribution described above. Each update increments the master copy's version number by one and also updates the last-modified-time of the object accordingly. If the push mechanism is used, an INVALIDATION message is flooded into the network. The message contains the following fields:

- The object ID.
- The object's origin server ID.
- The new version number.
- The new last-modified-time.

Peers that receive this message check their local caches. If a match is found, they compare the version number of the local copy with that in the message, and mark the local copy stale if its version number is smaller.

The poll module. A poll process is created to take care of all the polling for the entire system. This module includes an ordered list into which POLL requests are inserted according to their scheduled polling times. The poll process continuously takes requests off the head of the list and services them. In our implementation, a POLL request is not inserted into the corresponding origin server's incoming request queue to wait for a response; instead, we simplified the polling by directly comparing the object's version number with that of its master copy using global information. There are three types of poll results, as described below:

1. Server Unavailable. This happens when the origin server of the object has left the Gnutella network. The polling peer will mark its cached copy possibly-stale and stop polling. A possibly-stale copy will be polled again when there is a query request or local access for it.

2. Object Unmodified. This happens when the master copy has the same version number as the polling peer's cached copy. In this case, the polling peer marks its cached copy valid if it is not already, and calculates a new TTR value based on the TTR algorithm described earlier. A new POLL request is constructed and inserted into the polling list based on the new TTR expiration time.

3. Object Modified. This happens when the master copy has a different version number than the polling peer's cached copy. The polling peer simply marks its local copy stale and stops polling. A stale copy will be refreshed when there is a query request or local access for it.

The failure module. A failure process is created to simulate the dynamic behavior of the Gnutella network: it simulates peers leaving and rejoining the network. There are three parameters involved in this process: the maximum offline ratio, which is the maximum percentage of peers that are disconnected from the network at any given time; the time between successive disconnection occurrences; and the average duration peers remain offline after disconnecting. The failure process is a loop that continuously generates failure requests, with the intervals between requests following an exponential distribution. Each failure request includes the following information:

- Destination peer ID. This is the peer that is about to leave the network. The peer is chosen following an 80/20-style rule, i.e., a small percentage of the nodes leave and rejoin the network frequently, while the remaining nodes are relatively stable in the network.

- Offline duration. This is the duration during which the peer will remain disconnected. It is chosen from an exponential distribution centered around the average offline duration.

- Arrival time. Here the arrival time is when the peer gets disconnected from the P2P network.

After being generated, the failure request is inserted into the destination peer's incoming request queue. The destination peer (say p) processes this request by first marking its status as offline. All the un-serviced requests in p's incoming request queue are then discarded, all the links connected to p are torn down, and p's process sleeps for the specified duration. After that, peer p rejoins the network; the rejoining process consists of re-initializing its neighbors and marking its status as online. When a peer goes offline, all its scheduled downloads are cancelled, and its cached objects become possibly-stale when their TTRs expire.

The topology fix module. This module logically belongs to the failure module. When a peer leaves the network, it tears down all its active connections, which causes each of its neighbor nodes to lose one of their active connections. If many nodes leave the network, the network may become more or less disconnected. Real Gnutella client implementations [12] [13] solve this problem by letting a peer pick new neighbors from its host cache when its number of active connections falls below some threshold. Following the same idea, we created a topology fix process to simulate this functionality: it periodically checks whether there are peers with fewer than the average number of active connections, and adds links between such peers (a sketch follows at the end of this subsection).

The download module. A download process is created to take care of all the scheduled downloads for the entire system. We created an ordered list into which DOWNLOAD requests are inserted in order of download completion time. The download process continuously takes DOWNLOAD requests off the head of the list and services them. Each download checks whether the downloader already has the object. If not, it is a fresh download, and the object is simply inserted into the downloader's local cache; otherwise the cached copy is updated. If the state of a downloaded object is valid, the simulator starts polling for that object. If the download refreshes an old cached copy, the old TTR value is used to calculate a new TTR value. In this manner the download process takes care of both fresh downloads and refreshes of stale objects.

The statistics module. The statistics module is responsible for collecting statistical data in the simulation.
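The topology fix pass mentioned above might look like the following sketch over hypothetical per-peer link counts; the pairing strategy and the example numbers are our own.

    #include <stdio.h>

    #define N_PEERS 8                /* tiny example network */

    static int links[N_PEERS]  = { 1, 4, 2, 4, 4, 1, 4, 4 };
    static int online[N_PEERS] = { 1, 1, 1, 1, 1, 1, 0, 1 };

    static void add_link(int a, int b)
    {
        printf("adding link between peers %d and %d\n", a, b);
        links[a]++;
        links[b]++;
    }

    /* Reconnect online peers whose link count fell below the average. */
    static void topology_fix(double avg)
    {
        for (int i = 0; i < N_PEERS; i++) {
            if (!online[i])
                continue;
            for (int j = i + 1; j < N_PEERS && links[i] < avg; j++)
                if (online[j] && links[j] < avg)
                    add_link(i, j);  /* pair two under-connected peers */
        }
    }

    int main(void)
    {
        topology_fix(4.0);           /* assume an average of 4 links */
        return 0;
    }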

5.3 Simulation Parameters
Because our simulator synthesizes artificial traces instead of using real ones, it has a large set of configurable parameters for tuning its performance or testing against different hypotheses or realistic measurements. This also makes the simulator more extensible and versatile for different purposes. In this section we list the important parameters used in the simulation, along with their descriptions and the units or domains of their default values.

Table 1: General parameters for the simulation

    Description                                                Default Value
    Random seed                                                N/A
    Length of simulation                                       (hours)
    Number of peers in the network
    Number of unique objects in the network
    Average number of active connections each peer maintains
    Maximum number of active connections each peer may have
    Percentage of modem-type peers
    Average time between successive query requests             (seconds)
    Probability that a download request follows a query
    Average gap between a query and a download request         (seconds)
    Average time between successive update requests            (seconds)
    TTL of broadcast messages
    Cache consistency algorithm used in the simulation         [NONE, PULL, PUSH, PAP*]
    TTR algorithm for PULL                                     [STATIC, DYNAMIC]
    Static TTR value when the TTR algorithm is STATIC          (minutes)
    TTR low bound when using dynamic TTRs                      (minutes)
    TTR high bound when using dynamic TTRs                     (hours)
    Parameters for computing dynamic TTRs, see Eqs. (1)-(5)

    * PAP denotes the combined Push with Adaptive Pull algorithm.

Table 2: Parameters for the dynamic network environment

    Description                                                Default Value
    Flag that turns the failure mode on or off                 [FALSE, TRUE]
    Percentage of maximum offline nodes
    Average time between successive disconnections             (seconds)
    Average offline duration                                   (hours)
    Average time between successive topology checks            (minutes)

5.4 Metrics for Performance Analysis
To measure the effectiveness of our proposed cache consistency techniques, there are two main criteria we consider.

Fidelity. We introduce the term False Valid Ratio (FVR) to measure the degree of fidelity. When a query reaches a peer, a local copy matches the query, and the local object's state is valid, we want to know the likelihood that this object is actually NOT valid, i.e., that the master copy has a newer version number. Collecting the total number of query hits that are false valid, FV_q, and the total number of query hits where the object state is reported as valid, V_q, we define the query false valid ratio (QFVR) as

    QFVR = FV_q / V_q.

In the same fashion, we define the download false valid ratio (DFVR) as

    DFVR = FV_d / V_d,

where FV_d is the number of downloads of false valid copies and V_d is the total number of downloads of reportedly valid copies.

Control message overhead. Flooding invalidation messages into the network and/or polling with TTR both add control message overhead to the network. Comparatively, flooding with invalidation messages seems to incur more overhead than polling. However, we will show in the simulation results that the flooding overhead depends on several factors, including the time between successive updates, the dynamic behavior of the network, and the topology of the network. Based on our study, we will propose situations where flooding is more appropriate. We also note that flooding is the basic message routing protocol in the Gnutella network; although it has poor network efficiency, there is currently no better solution to this problem. With the rapid research and industry efforts in the P2P networking area, we believe that a better solution for efficient message routing in the Gnutella network is around the corner. One advantage of pushing invalidation messages is that the network overhead it incurs will be alleviated as soon as a better message routing scheme is available. Another advantage is that the servers are stateless, so the technique is more fault-tolerant. In addition, implementation techniques can be applied to reduce the invalidation message overhead; for example, we can piggyback invalidation messages on PING or QUERY messages.

5.5 Network Environments Used in the Simulation

The simulations are conducted in two different network environments, as described below:

The first set of experiments is performed in a perfectly stable network environment. All the peers remain online throughout the simulation, and there are no failures of any type. Although this situation is unrealistic for a global P2P network, it gives us some sense of how the techniques perform in the ideal case.

The second set of experiments is performed in a dynamically changing network environment. Here we assume peers leave and rejoin the network randomly, based on the three failure parameters defined previously. This situation is close to the Gnutella network in use today. However, the size of the network used in our simulation is much smaller than the real one; this is due to the limitations of the simulator, where the memory and time required to run large scale simulations are prohibitive. For most of the simulations, we use a network consisting of 500 peers. For the purposes of a cache consistency study, we believe such an environment is sufficient.

5.6 Simulation Results and Analysis
In the following two sections, we describe the simulation results for the two different network environments described above.

5.6.1 Simulation Results of a Stable P2P Network

All the experiments described in this section were conducted with the failure mode turned off. The other parameters are set to the defaults unless explicitly specified.

1. The first set of experiments measures the effect of the time between successive updates on the query/download FVR. More specifically, we fix the time between successive query requests at 1 second, and vary the time between successive update requests so that the ratio of update interval to query interval ranges up to 10. We ran three experiments in this batch: the first uses Adaptive Pull alone as the cache consistency technique, the second uses Push alone, and the third combines Push and Adaptive Pull. Figures 1.1 and 1.2 show our observations:

Figure 1.1: Effects of update rate on query false valid ratio. [Plot: Query False Valid Ratio (0 to 0.025) versus the ratio of time between successive updates to time between successive queries (0 to 10), for Adaptive Pull, Push, and Push with Adaptive Pull; Npeers=500, TTL=8, Iquery=1sec, stable network.]

Figure 1.2: Effects of update rate on download false valid ratio. [Plot: Download False Valid Ratio (0 to 0.025) versus the same update/query ratio for the same three algorithms; Npeers=500, TTL=8, Iquery=1sec, stable network.]

The above two figures imply the following:

when the ratio is greater than . Therefore if there are less updates than queries in the system, we expect Push alone to be suf?cient. 23

?

using pure Push when the ratio

is . Push alone achieves almost perfect ?delity

?

(a) Observe that in both graphs, the FVRs of using Push with Adaptive Pull are about

? ¤?

of that of

1 ? ! U 2" ?? ? ? #" T ? ? ?

?

(b) Adaptive Pull alone does not guarantee strong consistency. The FVRs of using Adaptive Pull are

using Adaptive Pull only when the consistency requirements are less stringent, and it is better to use it when the update rate is low relative to the query rate. The control message overhead (measured in terms of number of control messages) of the above experiements are also collected and showed in the following graphs:
Effects of update rate on control message overhead - Push, Npeers=500, TTL=8, Iquery=1sec, stable network 1e+08 Push overhead (number of control messages) in log scale

1e+07

1e+06

100000

10000

Number of query msgs (Push) Number of invalidation msgs (Push) 1000 0 2 4 6 8 (time between successive updates) / (time between successive queries) 10

Figure 1.3: Effect of update rate on control message overhead – PUSH

[Figure: Effects of update rate on control message overhead - Push with Adaptive Pull, Npeers=500, TTL=8, Iquery=1sec, stable network; x-axis: update/query interval ratio, 0-10; y-axis: overhead (number of control messages, log scale); curves: number of query msgs, number of invalidation msgs, number of poll msgs]


Figure 1.4: Effect of update rate on control message overhead – PAP

[Figure: Effects of update rate on control message overhead, Npeers=500, TTL=8, Iquery=1sec, stable network; x-axis: update/query interval ratio, 0-10; y-axis: pull overhead (number of control messages); curves: number of poll msgs (Adaptive Pull), number of poll msgs (Push with Adaptive Pull)]

Figure 1.5: Comparison of PULL overhead

Figures 1.3 and 1.4 show the control message overhead (in log scale) of using Push and Push with Adaptive Pull respectively, and Figure 1.5 compares the poll message overhead when using pull alone and when using push and adaptive pull. We can derive the following conclusions from these observations:

(a) Invalidation message overhead is dominant over poll message overhead; in our experiments it was roughly two orders of magnitude higher. The invalidation message overhead in pure Push is roughly the same as that in push and adaptive pull. In addition, the invalidation message overhead decreases linearly as the update rate decreases. This suggests that the Push technique is more appropriate for systems where update rates are lower than query rates.

(b) Although poll message overhead is relatively trivial compared to invalidation message overhead, we are still interested in a smart pull algorithm where poll message overhead is reduced; Figure 1.5 shows such a comparison. When pull is used alone, the poll message overhead decreases as the update rate decreases. This is expected, because when updates are more frequent the adaptive TTR algorithm tends to produce smaller TTR values, which results in slightly more polls. In comparison, when push and pull are used together, each invalidation hit will cancel or delay a currently scheduled poll and also increase the TTR for future polls. Therefore, in Figure 1.5 the poll message overhead in Push with Adaptive Pull is consistently lower than that in Adaptive Pull.
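The interplay between invalidations and scheduled polls can be made concrete with a short sketch. The Python fragment below is a minimal illustration of an adaptive-TTR poller in which a poll outcome adjusts the TTR and an invalidation hit both marks the replica stale and backs off future polls; the multiplicative update rule and the constants are illustrative assumptions, not the exact equations used in our simulator.

    TTR_MIN, TTR_MAX = 10.0, 1200.0   # assumed clamp bounds (seconds)
    SHRINK, GROW = 0.5, 2.0           # assumed multiplicative factors

    class Replica:
        def __init__(self, version, ttr=60.0):
            self.version = version    # version of the cached copy
            self.ttr = ttr            # current time-to-refresh estimate
            self.stale = False

        def on_poll(self, server_version):
            """Poll outcome: adapt the TTR from the polling history."""
            if server_version != self.version:
                self.stale = True
                self.ttr = max(TTR_MIN, self.ttr * SHRINK)  # changed: poll sooner
            else:
                self.ttr = min(TTR_MAX, self.ttr * GROW)    # unchanged: back off

        def on_invalidation(self, server_version):
            """Push hit: mark stale, skip the pending poll, back off future polls."""
            if server_version != self.version:
                self.stale = True
            self.ttr = min(TTR_MAX, self.ttr * GROW)

    r = Replica(version=3)
    r.on_poll(server_version=3)          # unchanged: TTR grows 60s -> 120s
    r.on_invalidation(server_version=4)  # stale; the next poll is pushed back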

2. The second set of experiments studies the Pull with Static TTR algorithm. We set I_query to 1 second and I_update to 2 seconds, and varied the static TTR values from 1 minute to 20 minutes across the experiments. The results are shown in Figure 2.1 below:

[Figure: Effects of static TTR values on pull - Static Pull, Npeers=500, TTL=8, Iquery=1sec, Iupdate=2secs, stable network; x-axis: TTR (mins), 0-20; y-axis: False Valid Ratio; curves: Download FVR (Static Pull), Query FVR (Static Pull)]

Figure 2.1: Effects of static TTR values on fidelity

As the static TTR values increase, there are fewer polls for the objects in the system, and thus the QFVR and the DFVR both increase. The QFVRs increase severalfold from a TTR of 1 minute to a TTR of 20 minutes, while the DFVRs increase more slowly. Our explanation for the slower increase of the DFVRs is that downloads are picked from the query hits that report valid replicas; since the majority of these replicas are indeed valid, the DFVRs tend to be smaller than the QFVRs.
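As a concrete reading of these two metrics, the sketch below shows one way the false valid ratios could be tallied; the event representation and names are illustrative assumptions rather than the simulator's actual bookkeeping.

    # A "false valid" event is a query hit (or download) that claims a
    # replica is valid although the master copy has already changed.
    def false_valid_ratio(events):
        """events: (claimed_valid, actually_valid) pairs; FVR over claims."""
        claimed = [(c, a) for c, a in events if c]
        if not claimed:
            return 0.0
        return sum(1 for c, a in claimed if not a) / len(claimed)

    # QFVR is tallied over all query hits; DFVR only over the hits that
    # were downloaded. Downloads are drawn from hits claiming validity,
    # and most of those are genuinely valid, hence DFVR <= QFVR.
    query_hits = [(True, True), (True, False), (True, True), (True, True)]
    downloads = [(True, True), (True, True)]
    print(false_valid_ratio(query_hits), false_valid_ratio(downloads))  # 0.25 0.0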

The control message overhead of this set of experiments is shown in Figure 2.2 below:

[Figure: Effects of static TTR values on control message overhead - Static Pull, Npeers=500, TTL=8, Iquery=1sec, Iupdate=2secs, stable network; x-axis: TTR (mins), 0-20; y-axis: number of poll messages]

Figure 2.2: Effects of static TTR values on control message overhead

The control message overhead decreases severalfold as the static TTR values increase from 1 minute to 20 minutes. Thus there is a tradeoff when selecting static TTR values: large TTR values give worse consistency but less overhead, while small TTR values give better consistency but more overhead. In practice, people tend to emphasize consistency and use small TTR values; the sketch below quantifies the tradeoff.
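A back-of-envelope illustration under assumed numbers (500 polled replicas, a one-hour horizon): the poll cost scales inversely with the TTR, while the worst-case staleness window grows linearly with it.

    def static_ttr_tradeoff(ttr_minutes, n_replicas=500, hours=1.0):
        polls = n_replicas * hours * 60.0 / ttr_minutes   # polls per horizon
        worst_staleness = ttr_minutes                     # minutes a copy can lag
        return polls, worst_staleness

    for ttr in (1, 5, 20):
        print(ttr, static_ttr_tradeoff(ttr))
    # TTR=1  -> 30000 polls/hr, at most  1 min stale
    # TTR=20 ->  1500 polls/hr, at most 20 min stale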

3. The third set of experiments investigates the effects of different TTL values on fidelity when using Push only. Since the TTL value sets the scope each broadcast message can reach, an invalidation message can reach more replicas with larger TTL values. Thus we expect the query/download FVRs to decrease as the TTL value increases; this is demonstrated in Figure 3.1 below:

[Figure: Effects of TTL values on fidelity - Push, Npeers=500, Iquery=1sec, Iupdate=1sec, stable network; x-axis: TTL, 0-12; y-axis: False Valid Ratio; curves: Query FVR (Push), Download FVR (Push)]

Figure 3.1: Effects of TTL values on fidelity when using push

In Figure 3.1, we observe that both the QFVR and the DFVR fall to zero once the TTL value grows large enough; setting the TTL any larger is thus unnecessary. The control message overhead of this set of experiments is shown in Figure 3.2 below:

[Figure: Effects of TTL values on control message overhead - Push, Npeers=500, Iquery=1sec, Iupdate=1sec, stable network; x-axis: TTL, 0-12; y-axis: number of invalidation messages]

Figure 3.2: Effects of TTL values on control message overhead when using push

From Figure 3.2, we observe that the invalidation message overhead increases nearly exponentially as the TTL value increases. However, for a given network configuration it reaches a plateau once the TTL is large enough.

This plateau is due to the Gnutella routing algorithm, where a peer will not route a message it has routed before; the sketch below shows the mechanism.
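Here is a minimal sketch of Gnutella-style TTL-limited flooding with duplicate suppression; the message IDs, the seen-set, and the Peer structure are illustrative assumptions rather than the simulator's actual data structures.

    import uuid

    class Peer:
        def __init__(self, name):
            self.name = name
            self.neighbors = []     # directly connected peers
            self.seen = set()       # IDs of messages already routed

        def broadcast(self, payload, ttl):
            msg_id = uuid.uuid4().hex
            self.seen.add(msg_id)
            for n in self.neighbors:
                n.receive(msg_id, payload, ttl, sender=self)

        def receive(self, msg_id, payload, ttl, sender):
            if msg_id in self.seen:
                return              # already routed: drop (causes the plateau)
            self.seen.add(msg_id)
            # here a peer would check local replicas and mark stale ones
            if ttl > 1:             # decrement TTL and forward onward
                for n in self.neighbors:
                    if n is not sender:
                        n.receive(msg_id, payload, ttl - 1, sender=self)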

4. We also conducted experiments to investigate the effects of network size on fidelity. The first experiment uses Push, and the second uses Push with Adaptive Pull. In both experiments, the network size ranges up to 2000 peers, and the TTL value for all the experiments in this group is set to six. The results are shown in the following graph:

[Figure: Effects of network size on fidelity, TTL=6, Iquery=1sec, Iupdate=1sec, stable network; x-axis: network size (number of peers), 0-2000; y-axis: False Valid Ratio; curves: Query/Download FVR for Push and for Push with Adaptive Pull]

Figure 4.1: Effects of network size on fidelity

Figure 4.1 implies that as the network size increases, the QFVR and DFVR in pure Push both increase severalfold, whereas in push with adaptive pull the increase is much smaller. More importantly, the fidelity achieved with pure Push is several times worse than that with Push with Adaptive Pull. These results suggest that pull offsets the effects of network size on fidelity.

The control message overhead associated with the above experiments is shown in Figures 4.2 and 4.3 below:

[Figure: Effects of network size on control message overhead - Push, TTL=6, Iquery=1sec, Iupdate=1sec, stable network; x-axis: network size (number of peers), 0-2000; y-axis: overhead (number of control messages, log scale); curves: number of query msgs, number of invalidation msgs]

Figure 4.2: Effects of network size on control message overhead – PUSH

[Figure: Effects of network size on control message overhead - Push with Adaptive Pull, TTL=6, Iquery=1sec, Iupdate=1sec, stable network; x-axis: network size (number of peers), 0-2000; y-axis: overhead (number of control messages, log scale); curves: number of query msgs, number of invalidation msgs, number of pull msgs]

Figure 4.3: Effects of network size on communication overhead – PAP

In both graphs, the invalidation message overhead increases by only a few folds as the network size grows; the slow increase is due to the fixed TTL value. The query message overhead is roughly the same as the invalidation message overhead, since I_query equals I_update in these experiments.

5. In a Gnutella-style network, the average number of active connections a peer maintains is an important factor in how well peers are connected. To evaluate this metric, we performed two groups of experiments using Push and Push with Adaptive Pull respectively. In both groups, the average number of connections per peer ranges from 0 to 5. The results are shown in Figure 5.1 below:

[Figure: Effects of average connections per peer on fidelity, Npeers=500, TTL=8, Iquery=1sec, Iupdate=1sec, stable network; x-axis: average number of connections per peer, 0-5; y-axis: False Valid Ratio; curves: Query/Download FVR for Push and for Push with Adaptive Pull]

Figure 5.1: Effects of the average number of connections per peer on fidelity

From the above graph, we observe that both the query and download FVRs decrease to zero as the average number of connections per peer increases, and Push with Adaptive Pull achieves severalfold better fidelity than Push when this number is small. For these reasons, we recommend that P2P clients maintain at least a few active connections to help achieve better consistency; and if the number of active connections differs greatly among peers, Push with Adaptive Pull is a better algorithm than pure Push. The control message overhead of these experiments is shown in the following graphs:

[Figure: Effects of average connections per peer on control message overhead - Push, Npeers=500, TTL=8, Iquery=1sec, Iupdate=1sec, stable network; x-axis: average number of connections per peer, 0-5; y-axis: overhead (number of control messages, log scale); curves: number of query msgs, number of invalidation msgs]

Figure 5.2: Effects of the average number of connections per peer on control message overhead – PUSH

[Figure: Effects of average connections per peer on control message overhead - Push with Adaptive Pull, Npeers=500, TTL=8, Iquery=1sec, Iupdate=1sec, stable network; x-axis: average number of connections per peer, 0-5; y-axis: overhead (number of control messages, log scale); curves: number of query msgs, number of invalidation msgs, number of poll msgs]

Figure 5.3: Effects of the average number of connections per peer on control message overhead – PAP

Figures 5.2 and 5.3 show the control message overhead of using Push and Push with Adaptive Pull respectively. Observe that both graphs have roughly the same invalidation message overhead, and it increases exponentially as the average number of connections per peer increases, which is expected given the broadcast routing protocol. The increase becomes slower once the average number of connections is large enough, because nodes stop forwarding messages they have seen before. The poll message overhead is not affected by this parameter, since all polls are made directly to the origin servers through HTTP.

6. Before we went on to study the dynamic P2P network environment, we conducted a set of experiments to study the effects of query rate on fidelity and control message overhead. We only performed these experiments with the Push with Adaptive Pull algorithm. The I_update in these experiments is set to 1 second, and the I_query values range from 1 second to 10 seconds. The results are shown below.

[Figure: Effects of query rate on fidelity - Push with Adaptive Pull, Npeers=500, TTL=8, Iupdate=1sec, stable network; x-axis: time between successive queries (secs), 0-10; y-axis: False Valid Ratio; curves: Query FVR, Download FVR]

Figure 6.1: Effects of query rate on fidelity

The plot shown above suggests that the query rate does not affect fidelity when using Push with Adaptive Pull: even when the update rate is ten times the query rate, the algorithm still achieves almost perfect fidelity. Below is the control message overhead associated with this set of experiments.

[Figure: Effects of query rate on control message overhead - Push with Adaptive Pull, Npeers=500, TTL=8, Iupdate=1sec, stable network; x-axis: time between successive queries (secs), 0-10; y-axis: overhead (number of control messages, log scale); curves: number of query msgs, number of invalidation msgs, number of poll msgs]

Figure 6.2: Effects of query rate on communication overhead

The query message overhead decreases almost linearly as the query rate decreases. The poll message overhead decreases with the same trend: when the query rate decreases there are also fewer downloads, so the number of replicas shrinks, which in turn causes fewer poll messages.

5.6.2 Simulation Results of a Dynamic P2P Network

All the experiments in this section were conducted with the dynamic-network option set to TRUE. The rest of the parameters are set to the defaults unless explicitly specified.

1. In our simulation, we created a process that fixes a peer's active connections when this number drops below a threshold. This process checks the network periodically and adds more connections between peers that have fewer than the average number of connections; a sketch follows. Our first set of experiments studies the effects of this process. Presumably, this process only affects the Push algorithm or the push part of the Push with Adaptive Pull algorithm, so the experiments are performed for both the Push and Push with Adaptive Pull algorithms.
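A minimal sketch of this topology-fix process, with the threshold, the check schedule, and the peer representation as illustrative assumptions:

    import random

    class Peer:
        def __init__(self):
            self.online = True
            self.neighbors = []

    def topology_fix(peers, min_conn=4):
        """One periodic check: add links to peers below min_conn connections."""
        online = [p for p in peers if p.online]
        for p in online:
            candidates = [q for q in online
                          if q is not p and q not in p.neighbors]
            while len(p.neighbors) < min_conn and candidates:
                q = candidates.pop(random.randrange(len(candidates)))
                p.neighbors.append(q)       # add a bidirectional link
                q.neighbors.append(p)

    # In the simulator this would run once per topology-check interval
    # (the x-axis of Figures 7.1-7.4); here one call is a single check.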

In both experiments, the time between successive topology checks ranges from a few seconds to 180 seconds. The results are shown in Figures 7.1 and 7.2 below.

[Figure: Effects of the topology fix process on QFVR, Npeers=500, TTL=8, Iquery=1sec, Iupdate=2secs, If=5secs, Rf=50%, Df=2hrs, dynamic network; x-axis: time between successive topology checks (secs), 0-180; y-axis: Query False Valid Ratio; curves: Push, Push with Adaptive Pull]

Figure 7.1: Effects of time between successive topology checks on query false valid ratio

[Figure: Effects of the topology fix process on DFVR, Npeers=500, TTL=8, Iquery=1sec, Iupdate=2secs, If=5secs, Rf=50%, Df=2hrs, dynamic network; x-axis: time between successive topology checks (secs), 0-180; y-axis: Download False Valid Ratio; curves: Push, Push with Adaptive Pull]

Figure 7.2: Effects of time between successive topology checks on download false valid ratio
In Figure 7.1, the QFVR under Push increases severalfold as the time between successive topology checks grows from a few seconds to 180 seconds; in Figure 7.2, the DFVR shows a similar increase. Both cases tell us that Push achieved much better consistency when the topology-fix process ran more regularly. However, when using Push with Adaptive Pull, the QFVR and DFVR increase only slightly over the same range. In addition, the QFVRs and DFVRs of using Push are several times higher than those of using Push with Adaptive Pull. All these imply that the pull part of the Push with Adaptive Pull algorithm is effective enough that the topology-fix process is not necessary.

The invalidation message overhead of running these experiments is shown in Figures 7.3 and 7.4 below.
[Figure: Effects of the topology fix process on control message overhead - Push, Npeers=500, TTL=8, Iquery=1sec, Iupdate=2secs, If=5secs, Rf=50%, Df=2hrs, dynamic network; x-axis: time between successive topology checks (secs), 0-180; y-axis: overhead (number of control messages); curves: number of query msgs, number of invalidation msgs]

Figure 7.3: Effects of time between successive topology checks on control message overhead – PUSH

[Figure: Effects of the topology fix process on control message overhead - Push with Adaptive Pull, Npeers=500, TTL=8, Iquery=1sec, Iupdate=2secs, If=5secs, Rf=50%, Df=2hrs, dynamic network; x-axis: time between successive topology checks (secs), 0-180; y-axis: overhead (number of control messages, log scale); curves: number of query msgs, number of invalidation msgs, number of poll msgs]

Figure 7.4: Effects of time between successive topology checks on control message overhead – PAP

Notice that the plots in Figure 7.4 are in log scale but those in Figure 7.3 are not. As we expected, the invalidation message overhead decreases slowly as the time between successive topology checks increases.

2. Among the three parameters that define the dynamics of the network, the first one we studied is the maximum failure ratio R_f, which defines the maximum percentage of offline nodes at any given time. Intuitively, the larger this value, the more disconnections occur in the network; the sketch below illustrates the churn model.
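As a rough illustration, the following sketch drives random disconnections under the three parameters R_f, I_f, and D_f; the scheduling discipline and the uniform peer choice are assumptions, and the simulator's exact policy may differ.

    import random
    from dataclasses import dataclass

    @dataclass
    class Node:
        online: bool = True
        rejoin_at: float = 0.0

    def churn_step(peers, now, r_f=0.5, d_f=7200.0):
        """One churn event, invoked every I_f seconds of simulated time."""
        offline = [p for p in peers if not p.online]
        online = [p for p in peers if p.online]
        if online and len(offline) / len(peers) < r_f:   # respect the R_f cap
            victim = random.choice(online)
            victim.online = False
            victim.rejoin_at = now + d_f                 # stays away about D_f
        for p in offline:
            if now >= p.rejoin_at:
                p.online = True                          # rejoin the network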

We performed these experiments for all three cache consistency algorithms. In these experiments, I_f is fixed at 5 seconds, D_f is fixed at 2 hours, and the R_f values range from 0 to 50%. Figures 8.1 and 8.2 below show the results.

[Figure: Effects of the percentage of maximum offline nodes on QFVR, Npeers=500, TTL=8, Iquery=1sec, Iupdate=2secs, If=5secs, Df=2hrs, dynamic network; x-axis: percentage of maximum offline nodes, 0-50; y-axis: Query False Valid Ratio; curves: Adaptive Pull, Push, Push with Adaptive Pull]

Figure 8.1: Effects of percentage of maximum of?ine nodes on query false valid ratio


[Figure: Effects of the percentage of maximum offline nodes on DFVR, Npeers=500, TTL=8, Iquery=1sec, Iupdate=2secs, If=5secs, Df=2hrs, dynamic network; x-axis: percentage of maximum offline nodes, 0-50; y-axis: Download False Valid Ratio; curves: Adaptive Pull, Push, Push with Adaptive Pull]

Figure 8.2: Effects of percentage of maximum offline nodes on download false valid ratio

From the above two graphs, we observe that when using Adaptive Pull, both the QFVR and DFVR increase severalfold as R_f increases from 0 to 50%; the same trend occurred when using Push, where the increase is roughly linear. The Q/DFVRs are considerably higher when using Adaptive Pull than when using Push, which suggests that even in a highly dynamic network environment, Push alone still achieves better consistency than Adaptive Pull. Using Push with Adaptive Pull gives good and consistent fidelity across different R_f values; for example, the maximum QFVR of this algorithm at R_f = 50% remains far below those of Adaptive Pull and Push.

The control message overhead of these experiments is shown in Figures 8.3 and 8.4 below.

[Figure: Effects of the percentage of maximum offline nodes on control message overhead - Push, Npeers=500, TTL=8, Iquery=1sec, Iupdate=2secs, If=5secs, Df=2hrs, dynamic network; x-axis: percentage of maximum offline nodes, 0-50; y-axis: overhead (number of control messages, log scale); curves: number of query msgs, number of invalidation msgs]

Figure 8.3: Effects of percentage of maximum of?ine nodes on control message overhead – PUSH

[Figure: Effects of the percentage of maximum offline nodes on control message overhead - Push with Adaptive Pull, Npeers=500, TTL=8, Iquery=1sec, Iupdate=2secs, If=5secs, Df=2hrs, dynamic network; x-axis: percentage of maximum offline nodes, 0-50; y-axis: overhead (number of control messages, log scale); curves: number of query msgs, number of invalidation msgs, number of poll msgs]

Figure 8.4: Effects of percentage of maximum offline nodes on control message overhead – PAP

Observe that as R_f increases from 0 to 50%, the invalidation and poll message overheads both decrease. The decrease of the invalidation messages is consistent with the assumption that there are more disconnections in the network as R_f increases. The decrease of the poll messages is due to the fact that when a peer leaves the network, it discards all scheduled and future polls. Although we tried to adapt the TTR algorithm so that it polls more actively when there are more disconnections, a peer can only judge this by local information, so this adaptiveness to network changes is limited. We also observe that the poll message overhead is insignificant compared to the invalidation message overhead, and as shown previously, we achieved much better consistency with this little overhead on top of the invalidation message overhead. Our TTR algorithm's adaptiveness to network conditions is shown in the following graph.
[Figure: Effects of the percentage of maximum offline nodes on control message overhead - Adaptive Pull, Npeers=500, TTL=8, Iquery=1sec, Iupdate=2secs, If=5secs, Df=2hrs, dynamic network; x-axis: percentage of maximum offline nodes, 0-50; y-axis: overhead (number of control messages); curves: number of poll msgs (Adaptive Pull), number of poll msgs (Push with Adaptive Pull)]

Figure 8.5: Comparison of PULL overhead

In Figure 8.5, the plot on top is the poll message overhead using the pure Adaptive Pull algorithm; the one below it is the poll message overhead using the Push with Adaptive Pull algorithm. Observe that with Push with Adaptive Pull we poll less than with pure Adaptive Pull. Also notice that the gap between the two plots becomes smaller as the degree of disconnection increases: the poll message overhead in Adaptive Pull is noticeably higher than that in Push with Adaptive Pull at low R_f, but only slightly higher at R_f = 50%. This is because the adaptive TTR algorithm used in Push with Adaptive Pull takes network dynamics into account and adjusts to them. However, this adaptiveness is limited by the fact that it only comes from the change in the number of active connections a peer has (see Equation 5).
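As a rough paraphrase of the idea behind Equation 5 (not its exact form), the sketch below scales the TTR by the peer's current connectivity, so that a poorly connected peer, which is less likely to receive invalidations, polls more aggressively. The scaling form and the constants are assumptions for illustration.

    def connection_scaled_ttr(base_ttr, active_conns, normal_conns=4,
                              ttr_min=10.0):
        """Shrink the TTR in proportion to lost connectivity."""
        if active_conns >= normal_conns:
            return base_ttr                  # fully connected: keep base TTR
        scale = max(active_conns, 1) / normal_conns
        return max(ttr_min, base_ttr * scale)

    # Example: a peer that normally keeps 4 connections but currently
    # has 1 polls at a quarter of its base interval.
    print(connection_scaled_ttr(240.0, 1))   # -> 60.0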


3. The second parameter we studied is the time between successive disconnection occurrences, I_f. Intuitively, as I_f increases, there are fewer disconnections in the network. Thus we expected this set of experiments to give us results similar to the previous set with the R_f parameter. In these experiments, R_f is fixed at 50%, D_f is fixed at 2 hours, and the I_f values range from a few seconds to 300 seconds. The results turned out to be consistent with our prediction, so we only list the graphs below without further explanation.
[Figure: Effects of the time between successive node disconnections on QFVR, Npeers=500, TTL=8, Iquery=1sec, Iupdate=2secs, Rf=50%, Df=2hrs, dynamic network; x-axis: average time between successive node disconnections (secs), 0-300; y-axis: Query False Valid Ratio; curves: Adaptive Pull, Push, Push with Adaptive Pull]

Figure 9.1: Effects of time between successive node disconnections on query false valid ratio

[Figure: Effects of the time between successive node disconnections on DFVR, Npeers=500, TTL=8, Iquery=1sec, Iupdate=2secs, Rf=50%, Df=2hrs, dynamic network; x-axis: average time between successive node disconnections (secs), 0-300; y-axis: Download False Valid Ratio; curves: Adaptive Pull, Push, Push with Adaptive Pull]

Figure 9.2: Effects of time between successive node disconnections on download false valid ratio

The above two graphs both show that Push with Adaptive Pull is the ideal cache consistency algorithm if strong consistency is desired.

The control message overhead of the experiments is shown in Figures 9.3 and 9.4 below.

[Figure: Effects of the time between successive node disconnections on control message overhead - Push, Npeers=500, TTL=8, Iquery=1sec, Iupdate=2secs, Rf=50%, Df=2hrs, dynamic network; x-axis: average time between successive node disconnections (secs), 0-300; y-axis: overhead (number of control messages, log scale); curves: number of query msgs, number of invalidation msgs]

Figure 9.3: Effects of time between successive node disconnections on control message overhead – PUSH

[Figure: Effects of the time between successive node disconnections on control message overhead - Push with Adaptive Pull, Npeers=500, TTL=8, Iquery=1sec, Iupdate=2secs, Rf=50%, Df=2hrs, dynamic network; x-axis: average time between successive node disconnections (secs), 0-300; y-axis: overhead (number of control messages, log scale); curves: number of query msgs, number of invalidation msgs, number of poll msgs]

Figure 9.4: Effects of time between successive node disconnections on control message overhead – PAP

The adaptiveness of our TTR algorithm to network conditions is shown in Figure 9.5.

[Figure: Effects of the time between successive node disconnections on control message overhead - Adaptive Pull, Npeers=500, TTL=8, Iquery=1sec, Iupdate=2secs, Rf=50%, Df=2hrs, dynamic network; x-axis: average time between successive node disconnections (secs), 0-300; y-axis: overhead (number of control messages); curves: number of poll msgs (Push with Adaptive Pull), number of poll msgs (Adaptive Pull)]

Figure 9.5: Comparison of PULL overhead

6 Summary and Conclusions
In this report we investigated different cache consistency techniques for peer-to-peer file sharing systems. In particular, we used Gnutella as our target framework and proposed and studied three different cache consistency algorithms: Push, Adaptive Pull, and Push with Adaptive Pull. We conducted a detailed simulation study of the performance of the proposed algorithms; our evaluation criteria are fidelity and control message overhead. Below is a list of our observations and conclusions.

For a stable network environment:

1. Push alone achieves almost perfect fidelity when the update rate is lower than the query rate, and Push with Adaptive Pull achieves only marginally better fidelity than pure Push when the update rate and query rate are comparable. Under the assumption that there are normally far fewer updates than queries in a Gnutella-style network, we expect Push alone to be sufficient for guaranteeing strong consistency. In addition, we did not count the query reply messages in our simulation; given low update rates, the invalidation message overhead could be magnitudes lower than the query (and reply) message overhead. For example, the invalidation overhead was many times lower when the update rate was a small fraction of the query rate. It is also worth mentioning that the large invalidation message overhead is due to the intrinsic message passing scheme of Gnutella; it would improve if a better message routing scheme replaced the existing one.

2. Adaptive Pull alone does not guarantee strong consistency. As the update rate decreases, Adaptive Pull gives correspondingly better consistency. The poll message overhead is relatively stable as the update rate changes, and it is in general at least one magnitude lower than the invalidation message overhead of Push or Push with Adaptive Pull. In applications where the consistency requirements are less stringent, the Adaptive Pull algorithm is a good choice.

3. The TTL value determines the reach of invalidation messages. The FVRs of replicas decrease as the TTL value increases, and once the TTL value gets large enough, both Push and Push with Adaptive Pull achieve almost perfect fidelity. The invalidation message overhead increases almost exponentially as the TTL value increases; however, it reaches a plateau after the TTL gets large enough. In common Gnutella implementations [12] [13], the TTL is set to 7, which allows broadcast messages to cover most of the peers in the Gnutella network [8].

4. Network size affects fidelity when using the Push algorithm: we get severalfold worse fidelity as the network size grows. When using the Push with Adaptive Pull algorithm, the degradation is much smaller. More importantly, the fidelity achieved with pure Push is several times worse than that with Push with Adaptive Pull. We attribute these results to the leverage effect of the pulling part of the Push with Adaptive Pull algorithm.

5. The average number of active connections each peer maintains is another important parameter in a Gnutella-style network. The larger this value, the more peers each invalidation message gets broadcast to. The simulation results show that with both the Push and Push with Adaptive Pull algorithms, we get better fidelity as this value increases, and we achieve almost perfect fidelity once it is large enough. The Push with Adaptive Pull algorithm is also more immune to this parameter; for example, it achieves severalfold better fidelity than Push when the average number of connections per peer is small. On the other hand, the invalidation message overhead increases exponentially, by approximately two magnitudes, as this parameter grows across its range. Therefore, in practice we suggest that peers maintain a moderate number of active connections, which helps to achieve good fidelity while avoiding large overhead.

6. The query rate does not have much effect on fidelity when using the Push with Adaptive Pull algorithm: we achieved almost perfect fidelity even when the update rate was ten times the query rate.

For a dynamic network environment:

1. A Gnutella application fixes its number of active connections when it falls below some threshold. This helps maintain the connectivity of the Gnutella network and thus helps Push achieve better fidelity. However, using Push with Adaptive Pull achieves considerably better consistency than using pure Push, and the topology fix process is not necessary when using the Push with Adaptive Pull algorithm.

2. Both the Adaptive Pull and Push algorithms achieve worse fidelity when more disconnections occur in the network. Push gives severalfold better fidelity than Adaptive Pull when there are moderate disconnections, and possibly worse fidelity when the network gets highly disconnected. In general, neither of the two algorithms can guarantee strong consistency in a dynamically changing network.

3. The Push with Adaptive Pull algorithm gives good fidelity, with the QFVR and DFVR both staying very low even when the network gets highly disconnected. So if strong consistency is desired, this is the best algorithm among the three.

4. The invalidation message overhead depends on the update rate. If the update rate is low relative to the query rate, then the invalidation message overhead is also lower than the query message overhead. Both the query and invalidation message overheads decrease linearly as more disconnections occur in the network.

5. The poll message overhead is generally at least one magnitude lower than the invalidation message overhead. Our adaptive TTR algorithm adapts to the changes in the number of active connections of each peer, so a peer polls more often when it has fewer connections. This adaptiveness helps the polling achieve good fidelity even in a highly disconnected network environment.

To sum up, the Push with Adaptive Pull algorithm achieves the strongest consistency: with this algorithm we measured very low QFVRs and DFVRs even in highly dynamic network environments. However, this algorithm also has the most control message overhead; for example, its invalidation message overhead could be two magnitudes higher than its poll message overhead. We argue, though, that in most Gnutella-style P2P applications the update rate is generally much lower than the query rate, and in that scenario the invalidation message overhead could be magnitudes lower than the query (and reply) message overhead. On the other hand, Adaptive Pull alone provides weaker consistency but has the advantage of the least control message overhead. Based on these observations, we conclude that both the Push with Adaptive Pull and Adaptive Pull algorithms can be good choices for a P2P system; which one to choose depends on the requirements of the application and user demands.

References

[1] The Gnutella Protocol Specification v0.4, Clip2 Distributed Search Solutions, http://dss.clip2.com.

[2] The KaZaA website, http://www.kazaa.com/

[3] R. Srinivasan, C. Liang, and K. Ramamritham. Maintaining Temporal Coherency of Virtual Data Warehouses. In Proceedings of the 19th IEEE Real-Time Systems Symposium, Madrid, Spain, December 2-4, 1998.

[4] B. Urgaonkar, A. Ninan, M. Raunak, P. Shenoy, and K. Ramamritham. Maintaining Mutual Consistency for Cached Web Objects. In Proceedings of the 21st International Conference on Distributed Computing Systems (ICDCS-21), Phoenix, AZ, April 2001.

[5] C. Gray and D. Cheriton. Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency. In Proceedings of the Twelfth ACM Symposium on Operating Systems Principles, pages 202-210, 1989.

[6] J. Yin, L. Alvisi, M. Dahlin, and C. Lin. Hierarchical Cache Consistency in a WAN. In Proceedings of the USENIX Symposium on Internet Technologies and Systems (USITS'99), Boulder, CO, October 1999.

[7] V. Duvvuri, P. Shenoy, and R. Tewari. Adaptive Leases: A Strong Consistency Mechanism for the World Wide Web. In Proceedings of IEEE Infocom'00, Tel Aviv, Israel, March 2000.

[8] M. Ripeanu, I. Foster, and A. Iamnitchi. Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design. IEEE Internet Computing Journal, 6(1), 2002.

[9] S. Saroiu, P. Gummadi, and S. Gribble. A Measurement Study of Peer-to-Peer File Sharing Systems. In Proceedings of Multimedia Computing and Networking 2002, January 2002.

[10] A. Chankhunthod, P. B. Danzig, C. Neerdaels, M. F. Schwartz, and K. J. Worrell. A Hierarchical Internet Object Cache. In Proceedings of USENIX'96, January 1996.

[11] K. Sripanidkulchai. The Popularity of Gnutella Queries and Its Implications on Scalability. February 2001.

[12] LimeWire web site, http://www.limewire.com/

[13] GTK-gnutella web site, http://gtk-gnutella.sourceforge.net/
