Date: Thu, 25 Apr 2002 23:01:12 -0500 (CDT)
From: Yu Hu

Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications [Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan]

Main contribution of the paper: This paper presents a powerful protocol (Chord) that solves a challenging problem: in distributed peer-to-peer applications, the system needs to determine which node stores a given data item. Chord's strengths include its fully distributed implementation, its simplicity, its provable correctness, and its resilience to concurrent node arrivals, departures, and failures. The protocol is useful for applications built on peer-to-peer distributed systems.

Critique the main contribution:
Significance: 4 (significant contribution)
Methodology: The paper introduces an implemented service, Chord. The authors first give the theoretical foundation of the protocol, which supports its correctness and effectiveness. After describing the implementation, they run simulations whose results give a clear view of the protocol's performance. I therefore find the method precise and convincing. The idea behind the protocol is also quite intuitive.
The most important limitation of the approach: The protocol is vulnerable to malicious attack. It can recover from the failure of some nodes, but it lacks a mechanism for dealing with incorrect finger tables, whether produced maliciously or by mistake.
Most interesting ideas: Each node maintains a finger table with at most log N entries. This is very interesting and useful: it makes the protocol correct, scalable, and efficient.
Weakness & Questions:
1. The protocol keeps working under the failure of some nodes, but what happens when finger tables are wrong?
2. Does the protocol have any mechanism to test whether a node is producing correct or incorrect information, which could be harmful to the whole peer-to-peer system?
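The finger-table mechanism praised above can be illustrated with a small single-process sketch. All names here (`successor`, `build_fingers`, `lookup`) are my own illustration, not the paper's API: each node's i-th finger points at the successor of n + 2^i, and greedy routing via the closest preceding finger reaches a key's owner in O(log N) hops with high probability.

```python
# Single-process sketch of Chord-style finger-table routing (illustrative
# names, not the authors' implementation).  Identifiers live on a ring of
# size 2^M; node n's i-th finger is successor(n + 2^i).
import bisect
import random

M = 10                 # identifier bits -> ring of size 2^M = 1024
RING = 2 ** M

def successor(nodes, ident):
    """First node id clockwise from ident (nodes must be sorted)."""
    i = bisect.bisect_left(nodes, ident % RING)
    return nodes[i % len(nodes)]

def dist(a, b):
    """Clockwise distance from a to b on the ring."""
    return (b - a) % RING

def build_fingers(nodes):
    """Each node keeps at most M fingers: successor(n + 2^i) for i < M."""
    return {n: [successor(nodes, n + 2 ** i) for i in range(M)] for n in nodes}

def lookup(nodes, fingers, start, key):
    """Greedy routing via the closest preceding finger: (owner, hop count)."""
    owner = successor(nodes, key)
    n, hops = start, 0
    while n != owner:
        if successor(nodes, n + 1) == owner:   # key lies in (n, owner]
            return owner, hops + 1
        # farthest finger that does not overshoot the key
        n = max((f for f in fingers[n] if 0 < dist(n, f) < dist(n, key)),
                key=lambda f: dist(n, f))
        hops += 1
    return owner, hops

random.seed(1)
nodes = sorted(random.sample(range(RING), 64))
fingers = build_fingers(nodes)
owner, hops = lookup(nodes, fingers, nodes[0], 777)
print(owner, hops)     # typically O(log N) hops for N nodes
```

Yu Hu's first question can also be probed with this sketch: corrupt a few entries of `fingers` and watch lookups route incorrectly or loop, which is exactly the vulnerability the review points out.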
Interesting Extension: This paper is pretty interesting; it would be worth doing more work on its partition problem, its reliability problem, and a real implementation in an actual peer-to-peer distributed system.
Comments: This is a good paper. Consistent hashing, I think, is a good idea for certain functions of applications in distributed systems. I expect Chord to be useful for peer-to-peer systems in the near future.

A Measurement Study of Peer-to-Peer File Sharing Systems [Stefan Saroiu, P. Krishna Gummadi, Steven D. Gribble]

Main contribution of the paper: Compared with earlier papers on peer-to-peer file sharing systems, this paper performs a detailed measurement study of two popular systems, Napster and Gnutella. Its main strength is that it precisely characterizes the population of end-user hosts that participate in these two systems, and it shows significant heterogeneity and a lack of cooperation across peers.

Critique the main contribution:
Significance: 4 (significant contribution)
Methodology: The authors used two crawlers to collect datasets from Napster and Gnutella, and based on these they analyzed peer characteristics such as latency, lifetime, and bottleneck bandwidth. The heterogeneity results are reasonable, even though the datasets cannot be called fully representative.
Most interesting ideas: The paper focuses on the characteristics of individual peers rather than on the whole peer-to-peer system.
Weakness & Questions:
1. The Napster crawler collected datasets that may not be representative of the real behavior of its peers.
2. Can the authors offer a hypothesis, or at least some possible explanations, for the heterogeneity of these peers?
Interesting Extension: If we trust the results of this paper, can we devise a mechanism to overcome the heterogeneity that runs counter to the design principle of peer-to-peer systems?
Comments: This paper is an average paper to me.
Still, it shows us a fresh and useful view of the peers in a peer-to-peer system. However, it lacks deep analysis, which would require more work.
##############################################################
Date: Thu, 25 Apr 2002 23:05:07 -0500 (CDT)
From: Rahul Santhanam

Stoica-Morris-Karger-Kaashoek-Balakrishnan:
1. The authors describe a distributed lookup protocol called "Chord" for peer-to-peer applications, based on consistent hashing, that is scalable and reliable and has small communication and storage costs.
2 a. 4 - significant contribution. The authors develop a compelling theoretical model and show that it functions fairly well in practice.
b. The theory is flawless, but the experiments could have been better designed. For example, the authors could have compared their system quantitatively with current systems. Also, the experiments are not completely specified in the paper. On the other hand, the results of the experiments do support the authors' claims. The simulation result for simultaneous node failures is very impressive.
c. The average number of messages per lookup is probably too high.
3. The central idea of extending hashing to a dynamic situation is an important one. The "Chord" system is simple while being provably efficient.
4. (1) The authors do not clearly specify the mechanism by which nodes/keys enter or leave in their simulation. (2) The authors do not consider the rate at which failures occur - their analysis is static.
6. Perhaps the system would be even more reliable if some redundancy were incorporated into it.

Saroiu-Gummadi-Gribble:
1. The authors perform a measurement study of Napster and Gnutella to determine the degree of homogeneity in these systems with respect to various parameters such as bottleneck bandwidths, IP-level latencies, frequency of connection/disconnection, co-operation between hosts, etc.
The measurements show that there is significant heterogeneity and lack of co-operation across peers, calling into question the validity of current theoretical studies of peer-to-peer architectures.
2. a. 2 - modest contribution.
b. Unfortunately, the authors' methodology is not very convincing. As they themselves admit, information provided by peers can be unreliable. Moreover, it is not clear that their crawls are representative of the peer population. This is particularly true in the study of Napster.
c. The main weakness of the approach is in the way experimental data is collected.
3. The authors gather a large amount of data, and their analysis does support their claim of heterogeneity in Gnutella and Napster.
4. (1) There is the question of whether the experiments are ethical. The authors do not display any awareness of the ethical issues raised by their experimental methodology. (2) It is not clear to what extent heterogeneity is a characteristic of the particular systems they are studying rather than of peer-to-peer systems in general. (3) The authors are wrong when they say that the distribution of latencies over the entire population of peers might be similar from different hosts - the distribution most likely depends critically on the position of the host in the network topology. (4) There are several redundancies in the paper.
6. An open problem is to actually design and test a P2P protocol that takes into account the characteristics of hosts.

Rahul.
########################################################
Date: Thu, 25 Apr 2002 23:29:20 -0500 (CDT)
From: Xinghua Shi

Reading 7:
Paper 1: Topologically-Aware Overlay Construction and Server Selection
Contribution: The paper not only sheds light on the importance of understanding network topology, but also demonstrates applications of a scheme built on this idea.
The authors present a simple distributed binning scheme and apply it to overlay network construction and server selection. They use simulation and Internet measurement traces to show significant performance improvements with only approximate topological information. They focus on using simple topological hints to solve certain application-level problems, which differs from conventional research, so their work has practical value for the study and design of networks. It is also the first paper I have read that mentions the drawbacks of traceroute, given that we have read several papers that use it to collect datasets.
Rate:
a. 4 in significance: This paper makes a significant contribution because its results have practical value in real networks.
b. 4 in convincingness of methodology: The argument is well organized, and the ideas and applications are clearly explained. The authors repeat their experiments on different topologies to draw unbiased conclusions. The claims and conclusions follow from the experiments, and the authors are careful in their claims, comparing their method against other mechanisms.
The strongest and/or most interesting ideas in the paper:
a. The simple idea of the distributed binning scheme, which resembles a clustering classifier.
b. The simple binning strategy is genuinely useful in applications on the actual Internet.
c. Their simulation shows that, for a given topology, the density of nodes being binned does not affect the gain ratio.
d. Even the coarse-grained topological information provided by their binning strategy can significantly improve the performance of systems such as overlay networks and CDNs.
The most striking weaknesses in the paper:
a. There are some uncertain elements that may affect their experimental results, e.g. the random assignment of link delays.
Questions:
a. What are the specific advantages and drawbacks of traceroute?
b.
Is it possible to find more applications suited to their binning strategy?
c. How does the random assignment of link delays affect their experimental results?

Paper 2: A Scalable Content-Addressable Network
Contribution: The paper presents a detailed study of Content-Addressable Networks (CANs) and demonstrates their scalability, robustness, and low-latency properties through simulation. It addresses two key problems in the design of CANs: scalable routing and scalable indexing. The authors argue that large-scale distributed systems could benefit from hash-table functionality, just as software systems do. They also describe one possible application of a CAN, a "grass-roots" content distribution system. They first lay out a basic design and then make successive improvements to it; this evolutionary method works well and is practical and useful.
Rate:
a. 4 in significance: This paper makes a significant contribution because its study of CANs is detailed and inspiring.
b. 3 in convincingness of methodology: The argument is well organized and the issues are clearly explained. The authors provide some quantitative analysis to make their argument sound, but it would be more convincing if they had built their simulation on different topologies and done some comparative work.
The strongest and/or most interesting ideas in the paper:
a. Large-scale distributed systems could benefit from hash-table functionality, just as software systems do.
b. Using multiple hash functions to improve data availability.
The most striking weaknesses in the paper:
a. Some questions are evaded because of limitations in the research; e.g. the authors decline to settle the tradeoff between improved routing performance and system robustness on the one hand and increased per-node state and system complexity on the other, because they do not yet understand the requirements well enough.
Questions:
a.
So far, are there any research results on the tradeoff between improved routing performance and system robustness on the one hand and increased per-node state and system complexity on the other?
b. Since the two papers contain very similar ideas, can we compare them to get a wider viewpoint?
####################################################
Date: April 25, 11:50 pm.
From: Ivona

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications [Stoica, Morris, Karger, Kaashoek, Balakrishnan]
1. State the main contribution of the paper: The authors propose and implement a new peer-to-peer protocol (for sharing a common large database). It is based on the theoretical method of consistent hashing and has many excellent provable properties: every node stores information of size O(log N), search takes O(log N) steps, and adding/removing/failing a node takes O(log^2 N). The authors even consider the case where several nodes join at the same time - they can guarantee a finite (but possibly long) search time.
2. Critique the main contribution.
a. Rate the significance: 5. The method is provably correct, and practical experiments suggest a very good constant hidden in the O-notation. It scales and performs well even in a dynamic environment.
b. Rate how convincing: 5. Clear, well-written, mathematically correct, experiments provided.
3. What are the three strongest and/or most interesting ideas in the paper?
- Distribution of the routing table: the finger tables, keeping pointers to roughly the 2^i-th successor.
- The solution to the dynamic setting.
- Many ideas for future work.
6. Detail an interesting extension to the work not mentioned in the future work section. Could the idea of consistent hashing be used for peer-to-peer networks of the Gnutella/Napster type? (Redistributing information when a node disconnects might be hard.)
7. Optional comments on the paper that you'd like to see discussed in class. See 6.
What are the implications/limitations of this study with respect to the statistics given in the other paper (even though those statistics are specific to Gnutella/Napster)?

A Measurement Study of Peer-to-Peer File Sharing Systems [Saroiu, Gummadi, Gribble]
1. State the main contribution of the paper: A detailed study of the user behavior and system characteristics of Gnutella/Napster. The authors use crawling to obtain information about bandwidth, latency, number of files shared/downloaded, etc. The statistics show that the user body is very far from homogeneous, and the authors conclude that this should be taken into account when designing peer-to-peer network protocols. Other interesting observations: a high percentage of users (>25% for Gnutella) are "free-riders", i.e. they only download files; in Napster, many users (30%) tend to lie about their bandwidth.
2. Critique the main contribution.
a. Rate the significance: 4. Many interesting statistics, important for future research, are provided.
b. Rate how convincing: 4. The authors clearly specify all the routines they used and present the data in a clear form. Obtaining any kind of information is hard, and therefore there are many limitations here (see below), but the authors seem to keep this in mind and discuss the data accordingly.
c. What is the most important limitation of the approach? The crawlers - the possibilities for investigating the graphs are limited, and the same is true of trying to estimate bandwidth. Another issue is users connecting from many different IP addresses.
3. What are the three strongest and/or most interesting ideas in the paper? The study itself, and the optimistic idea of using crawlers to capture the state of the network.
4. What are the three most striking weaknesses in the paper? See the limitations above.
5. Name three questions that you would like to ask the authors?
6. Detail an interesting extension to the work not mentioned in the future work section.
In a power-law network, if one floods neighbors up to a specified depth d, how likely is the crawl to capture statistical distributions accurately (e.g. if only sqrt(N) vertices are covered)? What is the right d for a given accuracy?
7. Optional comments on the paper that you'd like to see discussed in class.
- How could we use the statistics to design new search/replication algorithms?
- One would guess that the percentage of free-riders is even higher. Could free-riders be hiding somehow so that the crawler does not find them (e.g. by not responding to pings)?
- Is there another parameter that we implicitly use in search algorithms?
- What other methods could be used to obtain information about the Gnutella/Napster graphs and their parameters?
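Ivona's depth-limited-flood question can be explored empirically. The sketch below (all names and parameters are illustrative assumptions, not taken from either paper) builds a small preferential-attachment graph, whose degree distribution is roughly power-law, and measures what fraction of peers a flood to depth d reaches from a single start node.

```python
# Toy experiment for the depth-limited flood question: build a small
# preferential-attachment (power-law-ish) graph, flood from one peer to
# depth d, and report the fraction of nodes the crawl reaches.
import random
from collections import deque

def pref_attach_graph(n, m=2, seed=0):
    """Each new node links to m existing nodes, chosen degree-proportionally."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    adj[0].add(1); adj[1].add(0)
    stubs = [0, 1]                      # degree-weighted endpoint list
    for v in range(2, n):
        targets = set()
        while len(targets) < min(m, v):
            targets.add(rng.choice(stubs))
        for t in targets:
            adj[v].add(t); adj[t].add(v)
            stubs += [v, t]
    return adj

def flood_coverage(adj, start, depth):
    """Fraction of nodes reached by a BFS flood limited to `depth` hops."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        v, d = frontier.popleft()
        if d == depth:
            continue
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                frontier.append((w, d + 1))
    return len(seen) / len(adj)

adj = pref_attach_graph(2000)
for d in (2, 3, 4):
    print(d, round(flood_coverage(adj, 0, d), 3))
```

Varying the start node and depth here gives a rough feel for how quickly coverage grows in a hub-heavy topology, and hence how biased a shallow crawl's statistics could be; it says nothing, of course, about peers that deliberately refuse to answer pings.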