Date: Thu, 25 Apr 2002 23:01:12 -0500 (CDT)
From: Yu Hu

Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications [Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan]

Main contribution of the paper: This paper presents a powerful protocol (Chord) that solves a challenging problem: in distributed peer-to-peer applications, the system needs to determine which node stores a given data item. Chord's strengths include its fully distributed implementation, its simplicity, its provable correctness, and its resilience to concurrent node arrivals, departures, and failures. The protocol is useful for applications built on peer-to-peer distributed systems.

Critique the main contribution:
Significance: 4 (significant contribution)
Methodology: The paper introduces an implemented service, Chord. The authors first give the theoretical foundation of the protocol, which supports its correctness and effectiveness. After describing the implementation, they run simulations whose results give a clear view of the protocol's performance. I therefore find the method precise and convincing. The idea behind the protocol is also quite intuitive.
The most important limitation of the approach: The protocol is vulnerable to malicious attack. It can recover from the failure of some nodes, but it lacks a mechanism for dealing with incorrect finger tables, whether produced maliciously or by mistake.
Most interesting ideas: Each node maintains a finger table with at most log N entries. This is very interesting and useful: it makes the protocol correct, scalable, and efficient.
Weakness & Questions:
1. The protocol keeps working under the failure of some nodes, but what happens when finger tables are wrong?
2. Does the protocol have any mechanism to test whether a node is producing correct or incorrect information, which could be harmful to the whole peer-to-peer system?
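The finger-table mechanism praised above can be illustrated with a small single-process sketch. All names here (`successor`, `build_fingers`, `lookup`) are my own illustration, not the paper's API: each node's i-th finger points at the successor of n + 2^i, and greedy routing via the closest preceding finger reaches a key's owner in O(log N) hops with high probability.

```python
# Single-process sketch of Chord-style finger-table routing (illustrative
# names, not the authors' implementation).  Identifiers live on a ring of
# size 2^M; node n's i-th finger is successor(n + 2^i).
import bisect
import random

M = 10                 # identifier bits -> ring of size 2^M = 1024
RING = 2 ** M

def successor(nodes, ident):
    """First node id clockwise from ident (nodes must be sorted)."""
    i = bisect.bisect_left(nodes, ident % RING)
    return nodes[i % len(nodes)]

def dist(a, b):
    """Clockwise distance from a to b on the ring."""
    return (b - a) % RING

def build_fingers(nodes):
    """Each node keeps at most M fingers: successor(n + 2^i) for i < M."""
    return {n: [successor(nodes, n + 2 ** i) for i in range(M)] for n in nodes}

def lookup(nodes, fingers, start, key):
    """Greedy routing via the closest preceding finger: (owner, hop count)."""
    owner = successor(nodes, key)
    n, hops = start, 0
    while n != owner:
        if successor(nodes, n + 1) == owner:   # key lies in (n, owner]
            return owner, hops + 1
        # farthest finger that does not overshoot the key
        n = max((f for f in fingers[n] if 0 < dist(n, f) < dist(n, key)),
                key=lambda f: dist(n, f))
        hops += 1
    return owner, hops

random.seed(1)
nodes = sorted(random.sample(range(RING), 64))
fingers = build_fingers(nodes)
owner, hops = lookup(nodes, fingers, nodes[0], 777)
print(owner, hops)     # typically O(log N) hops for N nodes
```

Yu Hu's first question can also be probed with this sketch: corrupt a few entries of `fingers` and watch lookups route incorrectly or loop, which is exactly the vulnerability the review points out.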
Interesting Extension: This paper is pretty interesting; it would be worth doing more work on its partition problem, its reliability problem, and a real implementation in an actual peer-to-peer distributed system.
Comments: This is a good paper. Consistent hashing, I think, is a good idea for certain functions of applications in distributed systems. I expect Chord to be useful for peer-to-peer systems in the near future.

A Measurement Study of Peer-to-Peer File Sharing Systems [Stefan Saroiu, P. Krishna Gummadi, Steven D. Gribble]

Main contribution of the paper: Compared with earlier papers on peer-to-peer file sharing systems, this paper performs a detailed measurement study of two popular systems, Napster and Gnutella. Its main strength is that it precisely characterizes the population of end-user hosts that participate in these two systems, and it shows significant heterogeneity and a lack of cooperation across peers.

Critique the main contribution:
Significance: 4 (significant contribution)
Methodology: The authors used two crawlers to collect datasets from Napster and Gnutella, and based on these they analyzed peer characteristics such as latency, lifetime, and bottleneck bandwidth. The heterogeneity results are reasonable, even though the datasets cannot be called fully representative.
Most interesting ideas: The paper focuses on the characteristics of individual peers rather than on the whole peer-to-peer system.
Weakness & Questions:
1. The Napster crawler collected datasets that may not be representative of the real behavior of its peers.
2. Can the authors offer a hypothesis, or at least some possible explanations, for the heterogeneity of these peers?
Interesting Extension: If we trust the results of this paper, can we devise a mechanism to overcome the heterogeneity that runs counter to the design principle of peer-to-peer systems?
Comments: This paper is an average paper to me.
Still, it shows us a fresh and useful view of the peers in a peer-to-peer system. However, it lacks deep analysis, which would require more work.
##############################################################
Date: Thu, 25 Apr 2002 23:05:07 -0500 (CDT)
From: Rahul Santhanam

Stoica-Morris-Karger-Kaashoek-Balakrishnan:
1. The authors describe a distributed lookup protocol called "Chord" for peer-to-peer applications, based on consistent hashing, that is scalable and reliable and has small communication and storage costs.
2 a. 4 - significant contribution. The authors develop a compelling theoretical model and show that it functions fairly well in practice.
b. The theory is flawless, but the experiments could have been better designed. For example, the authors could have compared their system quantitatively with current systems. Also, the experiments are not completely specified in the paper. On the other hand, the results of the experiments do support the authors' claims. The simulation result for simultaneous node failures is very impressive.
c. The average number of messages per lookup is probably too high.
3. The central idea of extending hashing to a dynamic situation is an important one. The "Chord" system is simple while being provably efficient.
4. (1) The authors do not clearly specify the mechanism by which nodes/keys enter or leave in their simulation. (2) The authors do not consider the rate at which failures occur - their analysis is static.
6. Perhaps the system would be even more reliable if some redundancy were incorporated into it.

Saroiu-Gummadi-Gribble:
1. The authors perform a measurement study of Napster and Gnutella to determine the degree of homogeneity in these systems with respect to various parameters such as bottleneck bandwidths, IP-level latencies, frequency of connection/disconnection, co-operation between hosts, etc.
The measurements show that there is significant heterogeneity and lack of co-operation across peers, calling into question the validity of current theoretical studies of peer-to-peer architectures.
2. a. 2 - modest contribution.
b. Unfortunately, the authors' methodology is not very convincing. As they themselves admit, information provided by peers can be unreliable. Moreover, it is not clear that their crawls are representative of the peer population. This is particularly true in the study of Napster.
c. The main weakness of the approach is in the way experimental data is collected.
3. The authors gather a large amount of data, and their analysis does support their claim of heterogeneity in Gnutella and Napster.
4. (1) There is the question of whether the experiments are ethical. The authors do not display any awareness of the ethical issues raised by their experimental methodology. (2) It is not clear to what extent heterogeneity is a characteristic of the particular systems they are studying rather than of peer-to-peer systems in general. (3) The authors are wrong when they say that the distribution of latencies over the entire population of peers might be similar from different hosts - the distribution most likely depends critically on the position of the host in the network topology. (4) There are several redundancies in the paper.
6. An open problem is to actually design and test a P2P protocol that takes into account the characteristics of hosts.

Rahul.
########################################################
Date: Thu, 25 Apr 2002 23:29:20 -0500 (CDT)
From: Xinghua Shi

Reading 7:
Paper 1: Topologically-Aware Overlay Construction and Server Selection
Contribution: The paper not only sheds light on the importance of understanding network topology, but also demonstrates applications of a scheme built on this idea.
The authors present a simple distributed binning scheme and apply it to overlay network construction and server selection. They use simulation and Internet measurement traces to show significant performance improvements with only approximate topological information. They focus on using simple topological hints to solve certain application-level problems, which differs from conventional research, so their work has practical value for the study and design of networks. It is also the first paper I have read that mentions the drawbacks of traceroute, given that we have read several papers that use it to collect datasets.
Rate:
a. 4 in significance: This paper makes a significant contribution because its results have practical value in real networks.
b. 4 in convincingness of methodology: The argument is well organized, and the ideas and applications are clearly explained. The authors repeat their experiments on different topologies to draw unbiased conclusions. The claims and conclusions follow from the experiments, and the authors are careful in their claims, comparing their method against other mechanisms.
The strongest and/or most interesting ideas in the paper:
a. The simple idea of the distributed binning scheme, which resembles a clustering classifier.
b. The simple binning strategy is genuinely useful in applications on the actual Internet.
c. Their simulation shows that, for a given topology, the density of nodes being binned does not affect the gain ratio.
d. Even the coarse-grained topological information provided by their binning strategy can significantly improve the performance of systems such as overlay networks and CDNs.
The most striking weaknesses in the paper:
a. There are some uncertain elements that may affect their experimental results, e.g. the random assignment of link delays.
Questions:
a. What are the specific advantages and drawbacks of traceroute?
b.
Is it possible to find more applications suited to their binning strategy?
c. How does the random assignment of link delays affect their experimental results?

Paper 2: A Scalable Content-Addressable Network
Contribution: The paper presents a detailed study of Content-Addressable Networks (CANs) and demonstrates their scalability, robustness, and low-latency properties through simulation. It addresses two key problems in the design of CANs: scalable routing and scalable indexing. The authors argue that large-scale distributed systems could benefit from hash-table functionality, just as software systems do. They also describe one possible application of a CAN, a "grass-roots" content distribution system. They first lay out a basic design and then make successive improvements to it; this evolutionary method works well and is practical and useful.
Rate:
a. 4 in significance: This paper makes a significant contribution because its study of CANs is detailed and inspiring.
b. 3 in convincingness of methodology: The argument is well organized and the issues are clearly explained. The authors provide some quantitative analysis to make their argument sound, but it would be more convincing if they had built their simulation on different topologies and done some comparative work.
The strongest and/or most interesting ideas in the paper:
a. Large-scale distributed systems could benefit from hash-table functionality, just as software systems do.
b. Using multiple hash functions to improve data availability.
The most striking weaknesses in the paper:
a. Some questions are evaded because of limitations in the research; e.g. the authors decline to settle the tradeoff between improved routing performance and system robustness on the one hand and increased per-node state and system complexity on the other, because they do not yet understand the requirements well enough.
Questions:
a.
So far, are there any research results on the tradeoff between improved routing performance and system robustness on the one hand and increased per-node state and system complexity on the other?
b. Since the two papers contain very similar ideas, can we compare them to get a wider viewpoint?
####################################################
Date: April 25, 11:50 pm.
From: Ivona

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications [Stoica, Morris, Karger, Kaashoek, Balakrishnan]
1. State the main contribution of the paper: The authors propose and implement a new peer-to-peer protocol (for sharing a common large database). It is based on the theoretical method of consistent hashing and has many excellent provable properties: every node stores information of size O(log N), search takes O(log N) steps, and adding/removing/failing a node takes O(log^2 N). The authors even consider the case where several nodes join at the same time - they can guarantee a finite (but possibly long) search time.
2. Critique the main contribution.
a. Rate the significance: 5. The method is provably correct, and practical experiments suggest a very good constant hidden in the O-notation. It scales and performs well even in a dynamic environment.
b. Rate how convincing: 5. Clear, well-written, mathematically correct, experiments provided.
3. What are the three strongest and/or most interesting ideas in the paper?
- Distribution of the routing table: the finger tables, keeping pointers to roughly the 2^i-th successor.
- The solution to the dynamic setting.
- Many ideas for future work.
6. Detail an interesting extension to the work not mentioned in the future work section. Could the idea of consistent hashing be used for peer-to-peer networks of the Gnutella/Napster type? (Redistributing information when a node disconnects might be hard.)
7. Optional comments on the paper that you'd like to see discussed in class. See 6.
What are the implications/limitations of this study with respect to the statistics given in the other paper (even though those statistics are specific to Gnutella/Napster)?

A Measurement Study of Peer-to-Peer File Sharing Systems [Saroiu, Gummadi, Gribble]
1. State the main contribution of the paper: A detailed study of the user behavior and system characteristics of Gnutella/Napster. The authors use crawling to obtain information about bandwidth, latency, number of files shared/downloaded, etc. The statistics show that the user body is very far from homogeneous, and the authors conclude that this should be taken into account when designing peer-to-peer network protocols. Other interesting observations: a high percentage of users (>25% for Gnutella) are "free-riders", i.e. they only download files; in Napster, many users (30%) tend to lie about their bandwidth.
2. Critique the main contribution.
a. Rate the significance: 4. Many interesting statistics, important for future research, are provided.
b. Rate how convincing: 4. The authors clearly specify all the routines they used and present the data in a clear form. Obtaining any kind of information is hard, and therefore there are many limitations here (see below), but the authors seem to keep this in mind and discuss the data accordingly.
c. What is the most important limitation of the approach? The crawlers - the possibilities for investigating the graphs are limited, and the same is true of trying to estimate bandwidth. Another issue is users connecting from many different IP addresses.
3. What are the three strongest and/or most interesting ideas in the paper? The study itself, and the optimistic idea of using crawlers to capture the state of the network.
4. What are the three most striking weaknesses in the paper? See the limitations above.
5. Name three questions that you would like to ask the authors?
6. Detail an interesting extension to the work not mentioned in the future work section.
In a power-law network, if one floods neighbors up to a specified depth d, how likely is the crawl to capture statistical distributions accurately (e.g. if only sqrt(N) vertices are covered)? What is the right d for a given accuracy?
7. Optional comments on the paper that you'd like to see discussed in class.
- How could we use the statistics to design new search/replication algorithms?
- One would guess that the percentage of free-riders is even higher. Could free-riders be hiding somehow so that the crawler does not find them (e.g. by not responding to pings)?
- Is there another parameter that we implicitly use in search algorithms?
- What other methods could be used to obtain information about the Gnutella/Napster graphs and their parameters?
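Ivona's depth-limited-flood question can be explored empirically. The sketch below (all names and parameters are illustrative assumptions, not taken from either paper) builds a small preferential-attachment graph, whose degree distribution is roughly power-law, and measures what fraction of peers a flood to depth d reaches from a single start node.

```python
# Toy experiment for the depth-limited flood question: build a small
# preferential-attachment (power-law-ish) graph, flood from one peer to
# depth d, and report the fraction of nodes the crawl reaches.
import random
from collections import deque

def pref_attach_graph(n, m=2, seed=0):
    """Each new node links to m existing nodes, chosen degree-proportionally."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    adj[0].add(1); adj[1].add(0)
    stubs = [0, 1]                      # degree-weighted endpoint list
    for v in range(2, n):
        targets = set()
        while len(targets) < min(m, v):
            targets.add(rng.choice(stubs))
        for t in targets:
            adj[v].add(t); adj[t].add(v)
            stubs += [v, t]
    return adj

def flood_coverage(adj, start, depth):
    """Fraction of nodes reached by a BFS flood limited to `depth` hops."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        v, d = frontier.popleft()
        if d == depth:
            continue
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                frontier.append((w, d + 1))
    return len(seen) / len(adj)

adj = pref_attach_graph(2000)
for d in (2, 3, 4):
    print(d, round(flood_coverage(adj, 0, d), 3))
```

Varying the start node and depth here gives a rough feel for how quickly coverage grows in a hub-heavy topology, and hence how biased a shallow crawl's statistics could be; it says nothing, of course, about peers that deliberately refuse to answer pings.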