Date: Fri, 12 Apr 2002 00:41:36 -0500 (CDT)
From: Ivona Bezakova
To: Anda Iamnitchi
Subject: summaries 4

Experimental Study of Internet Stability and Wide-Area Backbone Failures
------------------------------------------------------------------------

Main contribution: The authors present various statistics on Internet
backbone failures (both inter- and intra-domain). The data were obtained
from recorded routing tables (?) over a period of three years (one year
for intra-domain).

Interesting points:
- The Internet is highly unstable compared to the telephone network. The
  authors speculate that global regulations would help maintain stability.
- Very few paths are responsible for most of the long failures, e.g.
  inter-domain backbones get congested.

Rate significance: 4, assuming no such study had been done before.

Rate methodology: 4, the authors give consistent statistics for data
collected over a long enough period of time, and they are aware of the
limitations of their method, e.g. that every intra-domain network behaves
very differently.

Weaknesses:
1. The facts are very hard to verify (the data are probably not public).
2. Trouble tickets are subject to human error too.

Ask the authors: I am a little suspicious about figure 7(a). I understand
that a backbone failure might affect other backbones and therefore the
backbone curve is jumpy. But why is the customer curve jumpy if customers
are nodes of degree 1?

Future work: What are other possible means of obtaining statistics on the
same subject without directly needing the providers' cooperation?


End-to-end WAN Service Availability
-----------------------------------

Main contribution: The authors compare datasets obtained by traceroute in
1995 and 1999 and by HTTP in 2000, and analyze possible solutions for
masking the failures.

Interesting points:
- For each dataset, the probability, location, and length of failures are
  presented. A user can expect to experience a failure on the server's
  side about 1% of the time.
- Different strategies for masking failures, such as caching, relaxed
  consistency, pre-fetching, replication, and rerouting, are compared
  experimentally (in a simulator). Replication seems to be the best
  strategy (under quite unrealistic assumptions).

Rate significance: 3.

Rate methodology: 3, see weaknesses.

Weaknesses:
1. The distributions picked for the model are not justified (Poisson vs.
   exponential).
2. Bug in the simulator!
3. Infinite-cache assumption.
4. The pre-fetching and replication algorithms are not described.

Possible future work: Study the strategies from a theoretical point of
view. Compare the results to other areas that use similar strategies,
e.g. operating systems.

#####################################################################

Date: Fri, 12 Apr 2002 11:06:40 -0500 (CDT)
From: Rahul Santhanam
To: Anda Iamnitchi
Subject: Paper evaluation (better late than not..)

Chandra-Dahlin-Gao-Nayate:

1. Contribution of the paper: The paper develops a model for failures in
wide-area networks, considering three parameters: failure rate, failure
duration, and failure location. Using prior experimental data, the authors
determine typical distributions of values for these parameters. Also using
the experimental data, the authors run simulations to evaluate the
viability of two classes of techniques for coping with network failures:
client-independence techniques such as caching, prefetching, and
replication of active objects, and network routing techniques.

2a. Significance: 3. The authors obtain a couple of interesting results,
but it is unclear how generally applicable they are.

2b. The claims and conclusions do follow from the experiments, but the
experiments are not well designed: the authors themselves acknowledge
several sources of bias whose impact they have no way of evaluating.

2c. The main limitation is that the simulation model for evaluating
techniques for coping with network failures is too simplistic. In
practice, the viability of these techniques, especially the
client-independent ones, depends heavily on the application.

3 & 4. The most interesting idea in the paper was the idea behind the
simulation - one would think that this is not a problem that can be
studied at such a high level. The authors' straightforwardness in
detailing sources of bias was a strength of the paper; the poverty of
statistical information obtained from the data was a weakness.

6. An interesting extension to the paper would be to incorporate time into
the model and allow for nonstationarity of network flow distributions, to
get a more accurate picture.

Rahul.

###########################################################################

From Matei:

End-to-End WAN Service Availability
Bharat Chandra et al.

1. The paper models the impact of network failures on end-to-end service
availability and quantifies the effectiveness of various techniques
(caching, hoarding, push-based content distribution, relaxed consistency,
mobile service, and overlay routing) used to improve service availability.

2a. Rate 3.5. The results presented are somewhat intuitive (e.g. one would
expect hoarding to improve service availability), but it is useful to have
a quantitative evaluation. I wonder how general these quantitative results
are, though.

2b, c. It is difficult to rate: the authors use a simple model for failure
rates, and the values of some parameters in this model are chosen somewhat
arbitrarily. These values do look reasonable but, I think, at least a
sensitivity study would be good.

3. The authors estimate network failure rate, duration, and location from
traceroute data, abstract this into a model, and use the model to explore
end-to-end service availability. Nice to see the whole chain.

4. The authors present, in terms of availability rates, the advantages of
using various availability-enhancing techniques. However, they do not
present the comparative costs of using these techniques.

A number of typos went unnoticed. For example, the exponent in the failure
time distribution is sometimes 1.85 and sometimes 0.85.
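That exponent matters. A minimal sketch of why, in Python (this is not the
authors' simulator; the Pareto-shaped duration model, the 1-second minimum
duration, and the 30-second cutoff are assumptions made purely for
illustration):

import random

def sample_failure_durations(alpha, n=100000, xmin=1.0, seed=0):
    # Illustrative only: draw n failure durations (seconds) from a
    # Pareto tail, P[D > d] = (xmin / d)**alpha for d >= xmin.
    # The paper's exact distribution and parameters may differ.
    rng = random.Random(seed)
    return [xmin / ((1.0 - rng.random()) ** (1.0 / alpha)) for _ in range(n)]

for alpha in (0.85, 1.85):
    durations = sorted(sample_failure_durations(alpha))
    median = durations[len(durations) // 2]
    long_frac = sum(d > 30.0 for d in durations) / len(durations)
    print("alpha=%.2f  median ~%.1fs  fraction longer than 30s ~%.3f"
          % (alpha, median, long_frac))

Under these assumptions, an exponent of 0.85 yields many more long outages
than 1.85 (and the mean duration is not even finite for exponents at or
below 1), so the two printed values describe qualitatively different
failure behavior.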