Date: Tue, 23 Apr 2002 22:36:50 -0500 (CDT)
From: Yu Hu

DNS Performance and the Effectiveness of Caching
[Jaeyeon Jung, Emil Sit, Hari Balakrishnan, and Robert Morris]

Main contribution of the paper:
This paper presents a detailed analysis of traces of DNS and associated TCP traffic collected on Internet links. DNS is an important component that strongly affects user applications and needs more analysis than it has received. The paper gives a clear picture of DNS and a detailed client-side analysis. The main contribution, I think, is leading us to a deeper and more detailed understanding of DNS and its caching.

Critique the main contribution:
Significance: 4 (significant contribution)

Methodology:
The method used in this paper is novel compared with earlier work. The authors collected not only the DNS traffic but also the corresponding TCP traffic, which makes the results more convincing and useful. The datasets were collected at two places, MIT LCS and KAIST, which have different connection structures. So I support the method used by the authors and trust the results of these experiments.

The most important limitation of the approach:
I would like to know whether the datasets the authors collected are representative of real DNS traffic.

Most interesting ideas:
DNS performance does not depend mainly on caching, and lowering A-record TTLs would not hurt DNS performance much. DNS depends more on partitioning the name space and avoiding overloading any single name server on the Internet.

Weaknesses & Questions:
1. Can the authors give a more detailed and better-supported relationship between DNS packets and TCP packets?
2. Are the datasets collected by the authors representative of real DNS traffic?
3. I can find many power laws in these distributions; is there some fundamental rule underlying them?

Interesting Extension:
Can we improve DNS performance by using better mapping algorithms or better grouping rules for shared caches?

Comments:
This is a good paper. After reading it, I am clearer about DNS and its caching. The authors' argument is reasonable and their treatment is comprehensive. The paper gives the reader a detailed and deep insight into DNS.

Web Caching with Consistent Hashing
[David Karger, Alex Sherman, et al.]

Main contribution of the paper:
The paper introduces a new idea for implementing web caching, namely consistent hashing. By describing their implemented system, the authors argue that the idea is practical for the real web.

Critique the main contribution:
Significance: 3 (modest contribution)

Methodology:
The paper does not present much methodology; it sets up a system and a test, reports some results, and draws conclusions from them. I would have preferred to learn more details about the experiments, even though the experimental method is fairly simple.

Most interesting ideas:
In this paper the authors use DNS in a different way, which the experiments show to be reasonable.

Weaknesses & Questions:
I believe consistent hashing is helpful for web caching. However, the authors do not give a very clear view of the new idea or a detailed view of their implemented system. Another question: did the authors consider cost when they designed their system?

Interesting Extension:
Can we set up a mechanism to decide which pages are hot and which are not? Does such a mechanism exist?

Comments:
This paper presents a good idea, consistent hashing. Even though the authors do not analyze it in much detail, we can benefit from it a lot. It may lead us to design another system that improves web caching performance using the intuition and ideas given in this paper. Anyway, I support this paper. (A small sketch of the consistent hashing idea follows below.)
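To make the consistent hashing idea discussed in the review above (and in the reviews that follow) concrete, here is a minimal sketch of the scheme: cache names and URLs are both hashed onto a circle, and each URL is served by the first cache encountered clockwise from its hash point, so adding or removing a cache only remaps the URLs nearest to its points. The class, cache names, and replica count below are illustrative assumptions, not the authors' implementation.

```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Minimal consistent hashing ring with virtual replicas per cache."""

    def __init__(self, caches, replicas=50):
        # Each cache gets several points on the ring so load spreads evenly.
        self.replicas = replicas
        self.ring = []            # sorted list of (point, cache)
        for cache in caches:
            self.add(cache)

    def _hash(self, key):
        # Map an arbitrary string to a point on the circle [0, 2**32).
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 32)

    def add(self, cache):
        for i in range(self.replicas):
            self.ring.append((self._hash("%s#%d" % (cache, i)), cache))
        self.ring.sort()

    def remove(self, cache):
        self.ring = [(p, c) for (p, c) in self.ring if c != cache]

    def lookup(self, url):
        # The first ring point clockwise from the URL's hash owns the URL.
        point = self._hash(url)
        idx = bisect_right(self.ring, (point, ""))
        if idx == len(self.ring):
            idx = 0               # wrap around the circle
        return self.ring[idx][1]

# Adding or removing a cache only remaps the URLs adjacent to its points,
# which is the property that makes the scheme attractive for web caching.
ring = ConsistentHashRing(["cacheA", "cacheB", "cacheC"])
print(ring.lookup("http://www.example.com/index.html"))
```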
Date: Tue, 23 Apr 2002 22:56:46 -0500 (CDT)
From: Ivona Bezakova

DNS Performance and the Effectiveness of Caching
[Jung, Sit, Balakrishnan, Morris]

1. State the main contribution of the paper:
As the title suggests, the paper discusses DNS performance (i.e. number of requests sent, percentage of errors and lookups receiving no response, number of "hops" - recursive or iterative calls to other DNS servers - needed, etc.) and the usefulness of caching in this process. The study is based on three sets of data obtained at two academic locations in 2000 and 2001. The percentage of errors is surprisingly high - above 20%; the authors give some reasons for this behavior, including mistakes in mappings and DNS loops. The unanswered queries contribute overwhelmingly to the total traffic because the requests are retransmitted several times. According to the datasets, the number of retries could be significantly reduced while the success rate of finding the address would stay the same. The cache study suggests that caching is helpful (only) for the most popular sites, and for these a small TTL (on the order of minutes) suffices.

2. Critique the main contribution.
a. Rate the significance: 4. I am surprised that very few experiments have been performed in this area. Every new experiment shedding some light on the problem is significant.
b. Rate how convincing: 4. I like the self-critique the authors use, presenting not only results but also pointing out possible problems or misinterpretations.
c. What is the most important limitation of the approach? The study is based on only two sites, both of them academic. The datasets for these sites cannot really be compared since the servers/routers use different strategies.

3. What are the three strongest and/or most interesting ideas in the paper?
- A detailed study of the subject; the authors are aware of the strengths as well as the weaknesses of their approach.
- Practical suggestions for DNS or cache behavior - e.g. not caching "off-stream" sites, correcting DNS servers that don't implement negative caching, etc.
- Results/conclusions for varying TTL values (a small simulation sketch of this follows the review).

4. What are the three most striking weaknesses in the paper?
- The authors realize that their first MIT dataset was not as precise as the other one because they limited the captured packet size. Why didn't they recollect the data when they found out that memory was not a constraint?
- For the simulation algorithm, why did they decide to use client groups of a fixed size s? Wouldn't it correspond better to reality if the sizes were drawn according to some (power-law?) distribution?
- How do they deal with websites that reload themselves automatically? Such reloads are not user queries, and these pages are likely to be in the cache. Don't they affect the statistics too much (being likely quite popular sites, e.g. cnn.com)?

5. Name three questions that you would like to ask the authors.
- See 4.

6. Detail an interesting extension to the work not mentioned in the future work section.
- Wouldn't the results for commercial domains be very different? In academia it is expected that users browse approximately the same sites (within .edu plus a few others), whereas for clients of AOL I would expect the interests to be much more diverse.

7. Optional comments on the paper that you'd like to see discussed in class.
See 4. and 6.
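The TTL questions raised in the review above can be explored with a small trace-driven simulation in the spirit of the paper's methodology. This is only a sketch under simplifying assumptions (one shared cache, a tiny synthetic trace, a single fixed TTL per run); the function and trace below are illustrative, not the authors' code.

```python
def dns_hit_rate(trace, ttl):
    """Trace-driven hit rate for one shared DNS cache.

    trace: iterable of (timestamp_seconds, domain_name) lookups in time order.
    ttl:   seconds a cached answer stays valid.
    """
    expires = {}          # name -> time at which the cached record expires
    hits = total = 0
    for t, name in trace:
        total += 1
        if expires.get(name, 0.0) > t:
            hits += 1                 # answer still cached
        else:
            expires[name] = t + ttl   # miss: fetch and cache the record
    return hits / total if total else 0.0

# Synthetic example: only the frequently looked-up name benefits from caching,
# echoing the finding that hit rates are driven by a few very popular names.
trace = [(0, "a.com"), (5, "a.com"), (20, "b.com"), (400, "a.com"), (410, "a.com")]
for ttl in (10, 60, 300, 3600):
    print(ttl, round(dns_hit_rate(trace, ttl), 2))
```

The same loop, run once per client group instead of once over the whole trace, is one simple way to ask the group-size question raised in item 4 above.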
Web Caching with Consistent Hashing
[Karger, Sherman, Berkheimer, Bogstad, Dhanidina, Iwamoto, Kim, Matkins, Yerushalmi]

1. State the main contribution of the paper:
An experimental study of the theoretical model of consistent hashing (Karger et al., STOC '97) used as a web-caching strategy. Some comparisons are made to other caching methods, e.g. Common Mode, and consistent hashing seems to be better with respect to both latency and miss rate.

2. Critique the main contribution.
a. Rate the significance: 2-3 - a successful (? - read below) practical verification of a theoretical result.
b. Rate how convincing: 2-3 - it is not clear how the numbers for the test (e.g. 1000 names, three proxy servers) were picked and whether these numbers are statistically significant.
c. What is the most important limitation of the approach?

3. What are the three strongest and/or most interesting ideas in the paper?
- Verifying a theoretical result in practice.
- Use of DNS to bypass the browser-configuration problem.
- Stating the original STOC '97 results in rigorous form as well as giving an intuitive explanation.

4. What are the three most striking weaknesses in the paper?
- The paper mentions CARP, but no comparison to this protocol is made.
- What is c in their experiment? (See the theorem.) As I understand it, there are three (geographically distributed) proxy caches, and for each there are a few (c) sub-caches.
- I think that the amount of data analyzed is not sufficient for any reliable conclusions. (Although I don't have any statistical background and therefore good support for this opinion.)

5. Name three questions that you would like to ask the authors.

6. Detail an interesting extension to the work not mentioned in the future work section.
- Compare to CARP or, even more interestingly, try to theoretically characterize the features that make them different and see which one is better.

7. Optional comments on the paper that you'd like to see discussed in class.
- The authors of this paper assume finite caches; papers discussed earlier assume infinite caches. How do these views fit together? Which one is better in which context?

Date: Tue, 23 Apr 2002 23:11:52 -0500 (CDT)
From: Rahul Santhanam

Jung-Sit-Balakrishnan-Morris:

1. The paper analyzes DNS traffic on the Internet and assesses the impact of caching on this traffic. The authors measure various parameters of actual DNS traffic, such as latency, failures, and number of retransmissions. They then use trace-driven simulations to assess the impact of cache sharing and choice of TTL on caching efficiency. Some of the conclusions reached in the paper: (1) Caching NS-records substantially reduces DNS lookup latency. (2) Because of the Zipf-like distribution of domain name popularity, cache sharing is highly beneficial (a toy illustration follows this review). (3) Lower TTLs than customary will work for A-records but not for NS-records.

2. a. 4 - significant contribution.
b. The methodology is convincing. The authors go into great detail in their description of the experiments and of the sources of bias. The experiments are well designed.

3. The separate analysis of A-records and NS-records is interesting, and is justified by the fact that different conclusions are drawn from them.

4. The section on negative caching is weak - the experiment is unenlightening, and the authors provide no alternative explanation for the large number of NXDOMAIN responses.
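Conclusion (2) above, that a Zipf-like popularity distribution makes cache sharing attractive, can be illustrated with a toy synthetic workload. The parameters below (100 clients, 2000 names, alpha = 1) are arbitrary assumptions rather than the paper's traces, and TTLs are ignored so that only the effect of sharing is visible.

```python
import random
from itertools import accumulate

def zipf_shared_vs_private(clients=100, lookups_per_client=200,
                           names=2000, alpha=1.0, seed=1):
    """Hit rate of one cache shared by all clients vs. per-client caches,
    with lookups drawn from a Zipf-like popularity distribution."""
    rng = random.Random(seed)
    # Popularity of name i is proportional to 1 / i**alpha (Zipf-like).
    cum_weights = list(accumulate(1.0 / (i ** alpha) for i in range(1, names + 1)))
    shared = set()
    private = [set() for _ in range(clients)]
    shared_hits = private_hits = total = 0
    for c in range(clients):
        for _ in range(lookups_per_client):
            name = rng.choices(range(names), cum_weights=cum_weights, k=1)[0]
            total += 1
            shared_hits += name in shared     # hit in the shared cache?
            shared.add(name)
            private_hits += name in private[c]  # hit in this client's own cache?
            private[c].add(name)
    return shared_hits / total, private_hits / total

shared_rate, private_rate = zipf_shared_vs_private()
print("shared cache hit rate: %.2f  per-client hit rate: %.2f"
      % (shared_rate, private_rate))
```

Because a few names account for most lookups, the shared cache sees a hit rate well above that of the isolated per-client caches, which is the intuition behind the paper's cache-sharing result.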
Karger-Sherman-Berkheimer et al.:

1. The paper describes the implementation of a new web caching strategy based on consistent hashing, as well as experiments to support the claim that it improves performance. The consistent hashing approach is taken from a previous theoretical paper; the authors' main contribution is an implementation method that uses the Domain Name System in a significant way.

2. a. 3 - modest contribution. There are no new ideas in the paper.
b. The experiments would be more convincing if more caches and clients were used, approximating a real-life situation. Also, only the cache miss rate is considered; perhaps latency should also be a factor (especially since DNS is used).

3. For once, an interesting theoretical idea is also shown to be useful in practice!

4. (1) The paper is not well written. (2) The theoretical model does not take into account the fact that web pages are requested more or less according to Zipf's law. (3) In the early part of the paper, the authors do not mention that they are using DNS in the implementation; moreover, they are defensive about this strategy throughout the paper. The authors state that DNS resolution did not affect the performance of their system, but give no evidence.

6. An interesting extension would be to actually modify a browser to support consistent hashing, so that we can get a better idea of the advantages of this technique.

Rahul.

--------------------------------------------------------------------------------
Rahul Santhanam,                          Phone: (773) 324-2583
1369, E. Hyde Park Blvd.,                 E-mail: rahul@cs.uchicago.edu
Apt. 905, Chicago, Illinois - 60615.
--------------------------------------------------------------------------------