Date: Tue, 30 Apr 2002 22:26:02 -0500 (CDT)
From: Rahul Santhanam

Ratnasamy-Francis-Handley-Karp-Shenker:

1. The paper introduces the notion of a content-addressable network (CAN), a scalable, fault-tolerant, low-latency distributed scheme for indexing keys to values.

2. a. 4 - significant contribution. The paper introduces and analyzes an important new idea.
b. The simulations are well-designed and the conclusions are valid. The authors could perhaps have run the simulations over a wider range of design choices, to determine which combination is likely to be optimal in practice.
c. The most important limitation of the approach is that latency grows as a power of the number of nodes rather than as its logarithm.

3. (1) The paper is well-written. (2) The authors consider a variety of design choices. (3) The experimental results are impressive.

4. (1) The authors do not include any analytical justification for their claims that latency is low and that the partitioning of the coordinate space is typically more or less uniform. (2) The discussion of landmark orderings is naive. (3) The authors do not analyze how caching and replication perform in combination with their design. (4) There is no discussion of the process by which keys enter and leave the system.

6. An interesting extension would be to design a CAN whose overlay graph is expander-like, in that each node has a constant-size neighborhood and latency grows logarithmically with the number of nodes. Of course, it is important that the CAN routing and maintenance mechanisms not become too complex.

Ratnasamy-Handley-Karp-Shenker:

1. The paper describes a simple, scalable scheme to infer network proximity information and applies this scheme to the problems of topologically-aware overlay construction and server selection.

2. a. 3 - modest contribution. The idea is interesting but seems to have limited applicability.
b. The authors perform measurements on many different systems, all of which support their claims for the efficacy of their binning scheme.

3. (1) The simplicity of the binning scheme is compelling. Also, the measurements used for determining a node's bin are non-intrusive. (2) The results showing that binning performs almost as well as nearest-neighbor clustering are impressive.

4. (1) As the authors themselves remark, the system does not scale when one considers the load on the landmark nodes. Perhaps there is a feasible scheme that is even more distributed? (2) The authors do not sufficiently address the question of where the landmarks are positioned. This might have a significant impact on the scheme. (3) The authors do not discuss how the number of bins affects performance.

6. Theoretically analyze the binning scheme used.

Rahul.

Date: Tue, 30 Apr 2002 23:11:53 -0500 (CDT)
From: Xinghua Shi

Paper 1: Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications

Contribution: The paper presents a distributed lookup protocol called Chord that addresses the problem of efficiently locating the node that stores a particular data item. Its striking contribution is that Chord is a scalable lookup protocol for dynamic peer-to-peer systems with frequent node arrivals and departures.

Rate: a. 4 in significance: This paper makes a significant contribution because Chord is a genuinely nice protocol. It is valuable for peer-to-peer, large-scale distributed applications.
b. 5 in convincingness of methodology: The argument is well organized, and the ideas and applications are clearly explained. The pseudocode and proofs of theorems, along with the simulation and analysis of results, offer a sound basis for the arguments.

The strongest and/or most interesting ideas in the paper: a. Chord improves the scalability of consistent hashing by avoiding the requirement that every node know about every other node. b. The finger scheme.
c. With high probability, the number of nodes that must be contacted to find a successor in an N-node network is O(log N). d. The Chord simulator can be run in either an iterative or a recursive style, which is plausible and feasible in engineering terms.

The most striking weaknesses in the paper: a. The protocol is robust most of the time, but it is vulnerable to malicious attack by hackers and others.

Questions: a. How does the consistent hashing in this paper compare with that in the previous paper? b. How might Chord behave under malicious attack? How should errors in finger tables be handled?

Paper 2: A Measurement Study of Peer-to-Peer File Sharing Systems

Contribution: The paper offers a detailed measurement study of two popular peer-to-peer file sharing systems, Gnutella and Napster. It seeks to precisely characterize the population of end-user hosts that participate in these two systems. The study and its results provide a close look at the two systems and thus a practical understanding of peer-to-peer systems in general.

Rate: a. 4 in significance: This paper makes a significant contribution because it gives a detailed study of two typical real-world peer-to-peer systems. Some interesting observations emerge from this thorough study.
b. 4 in convincingness of methodology: The authors explain the issues clearly and organize the paper well. The results come from reasonable datasets and analyses, though they could go further with more representative datasets.

The strongest and/or most interesting ideas in the paper: a. The fact that, despite claims that every peer is both a server and a client, Gnutella has an inherently large percentage of free-riders.

The most striking weaknesses in the paper: a. The authors claim to have captured enough data points to believe that their results and conclusions are representative of the entire Napster population, but they do not explain in detail why we should believe this.
Questions: a. How would the results change with different datasets; that is, how much effect does the choice of dataset have on the results? b. Have the drawbacks of Gnutella and Napster been addressed in similar P2P systems since then?

From: Ivona Bezakova
To: Anda Iamnitchi

A Scalable Content-Addressable Network [RFHKS01]

1. State the main contribution of the paper:
A new concept for a distributed (decentralized) file-sharing system is proposed: nodes correspond to subsections of a d-dimensional space (the mapping is random) and keep track of their neighbors (neighboring subsections). Keys are hashed to the appropriate nodes, and a Manhattan-like metric is used for routing. Subsections can be merged or divided to accommodate a new or failed node. Many improvements are suggested, e.g. minimizing latency by assigning geographically close nodes to nearby subsections; a replication strategy (using several hash functions); reducing failure errors by using realities (multiple coordinate spaces); etc.

2. Critique the main contribution.
a. Rate the significance: 3-4. Nice concept, although I don't think it is a new idea. If it is, then 4. (Please don't compare with my previous ratings; I've become harsher since the start of the quarter.)
b. Rate how convincing: 3-4. See (4).

3. What are the three strongest and/or most interesting ideas in the paper?
- Simple idea, easy to implement.
- The design improvements (and their sheer number).

4. What are the three most striking weaknesses in the paper?
- Better CAN routing metrics: is delivery of the message always guaranteed?
- The comparison of design improvements: the authors always fix the other parameters (or don't assume the other improvements) and vary just one, finding its best value. For example, increasing realities reduces path lengths, but what happens if we also use, e.g., overloading of coordinate zones at the same time?

5. Name three questions that you would like to ask the authors?
- Overloading coordinate zones: what happens if a neighbor I am pointing to fails? (How do I find its peer?)

6. Detail an interesting extension to the work not mentioned in the future work section.
- What are the theoretical guarantees (e.g. deviation)?
- How should design improvements be compared?

7. Optional comments on the paper that you'd like to see discussed in class.
- Are the numbers used for the simulations reasonable / do they correspond to real life?
- Comparison to Chord: which would you prefer?

Topologically-Aware Overlay Construction and Server Selection [RHKS02]

1. State the main contribution of the paper:
This is a follow-up paper to [RFHKS01]. The authors detail possible uses of the binning method for finding geographically close neighbors. They argue that this is a local, distributed, and scalable approach that could be used both for overlay network construction and for server selection.

2. Critique the main contribution.
a. Rate the significance: 2-3. The binning idea was presented in [RFHKS01]. We are provided with more experiments supporting the idea, and the server selection application is a new contribution.
b. Rate how convincing: 3-4. I did not find the paper very interesting after reading the first one.

3. What are the three strongest and/or most interesting ideas in the paper?
- Application of the binning idea to the server selection problem.

4. What are the three most striking weaknesses in the paper?
- See (5).

5. Name three questions that you would like to ask the authors?
- I am able to compute my bin number by myself, but how do I know who else is in my bin?

Date: Wed, 01 May 2002 00:51:35 -0500
From: Oleg Pashko

Topologically-Aware Overlay Construction and Server Selection.

1. A binning scheme is proposed such that close nodes are partitioned into local bins (nodes within one bin are assumed to be close to one another in terms of topological latency). The scheme is applied to improve strategies for overlay network construction and server selection.

2a. 3.
An interesting approach to an important problem - mapping the existing network onto the underlying IP-level topology.
2b. The experiments seem fair, but not as numerous and extensive as I would expect.
3. To me the most interesting thing in the paper was the posed problem itself - an attempt to reflect the topology of nodes in distributed systems.
4-7. It is not clear whether the gathered topological information will be very reliable in real, dynamic peer-to-peer applications. It would be interesting to quantitatively compare this approach with other attempts to incorporate knowledge of the underlying topology into the network construction/server selection scheme. The described binning is not the only possible approach to node clustering. Perhaps the problem can be solved in a more general setting - by defining some metric/distance between the nodes (which does not have to be physical distance) and applying standard classification algorithms (a nearest-neighbor classifier, for example).

A Scalable Content-Addressable Network.

1. The traditional hash table scheme (now called a Content-Addressable Network) is extended to large-scale distributed systems in the hope of improving the scalability (in particular, the indexing) of P2P systems.
2a-7. 2.5. They consider an important problem of P2P scalability and apply an efficient tool - hash tables. Yet the idea does not seem very new. Also, allocating nodes to zones at random does not make much sense, for it does not improve the global overlay network structure. A topologically-sensitive construction of the CAN overlay network may be a more interesting direction. The security of the CAN system against denial-of-service attacks is an open problem, too. Testing the approach on real file-sharing applications would provide more insight into the effectiveness of CAN systems.
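The CAN mechanics that several of the reviews above describe - keys hashed to points in a d-dimensional space, nodes owning zones of that space, and messages forwarded greedily toward the key's point under a Manhattan-like metric on the torus - can be sketched roughly as follows. This is a toy in-memory model, not the paper's implementation: it uses an even grid of zones instead of CAN's incremental zone splits, and all names (key_to_point, owner, route, GRID) are illustrative.

```python
import hashlib

D = 2     # dimensions of the coordinate space
GRID = 4  # toy assumption: an even GRID x GRID partition instead of CAN's splits

def key_to_point(key):
    """Hash a key to a point in the unit d-torus (here d = 2)."""
    h = hashlib.sha1(key.encode()).digest()
    return tuple(int.from_bytes(h[4*i:4*i+4], 'big') / 2**32 for i in range(D))

def owner(point):
    """The grid cell (node) whose zone contains the point."""
    return tuple(min(int(c * GRID), GRID - 1) for c in point)

def neighbors(cell):
    """Zones adjacent along one axis, wrapping around the torus."""
    out = []
    for axis in range(D):
        for step in (-1, 1):
            n = list(cell)
            n[axis] = (n[axis] + step) % GRID
            out.append(tuple(n))
    return out

def torus_dist(cell, point):
    """Manhattan-like distance from a zone's center to the target, on the torus."""
    total = 0.0
    for i in range(D):
        center = (cell[i] + 0.5) / GRID
        d = abs(center - point[i])
        total += min(d, 1 - d)
    return total

def route(start_cell, key):
    """Greedy forwarding: hand off to the neighbor closest to the key's point."""
    target = key_to_point(key)
    path, cell = [start_cell], start_cell
    while cell != owner(target):
        cell = min(neighbors(cell), key=lambda n: torus_dist(n, target))
        path.append(cell)
    return path
```

On this uniform grid each greedy hop strictly shrinks the remaining torus distance, so delivery always terminates - which sidesteps, rather than answers, Ivona's question of whether delivery is guaranteed once zones are unevenly split and improved metrics are in play.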
Can the binning and hash table approaches be combined into a more efficient network whose topology is more congruent with the IP structure?
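The landmark binning scheme the reviewers discuss can be sketched as follows: each node measures its round-trip time to a small fixed set of landmark machines, and its bin is the ordering of the landmarks by latency, optionally refined by quantizing each latency into a coarse level. This is a minimal sketch under stated assumptions - the threshold values and the function name bin_of are illustrative, not the paper's parameters.

```python
def bin_of(rtts, levels=(20, 100)):
    """Compute a node's bin from its RTTs (ms) to the landmarks.

    The bin is (landmark indices sorted by RTT, quantized level per landmark);
    the thresholds in `levels` are illustrative. Nodes that land in the same
    bin are assumed to be topologically close to one another.
    """
    order = tuple(sorted(range(len(rtts)), key=lambda i: rtts[i]))
    level = tuple(sum(r > t for t in levels) for r in rtts)
    return order, level
```

For example, two nodes measuring [12, 45, 230] ms and [15, 60, 280] ms against three landmarks fall into the same bin, so an overlay-construction or server-selection scheme would treat them as nearby. This also makes Ivona's question concrete: computing your own bin is purely local, but discovering who else shares it still requires some rendezvous mechanism, and Rahul's scaling concern lands on the landmarks, which every node must probe.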