Date: Tue, 16 Apr 2002 16:12:16 -0500 (CDT) From: Catalin Lucian Dumitrescu On Power-Law Relationships of the Internet Topology --------------------------------------------------- 1. State the main contribution (or focus?) The paper presents several observations about the Internet topology, and introduces the analysis of its properties by means of power law distributions of outdegree of nodes, frequency of an outdegree, and on the number of hopes between pairs of nodes. From these measurements several formulas for number of edges (N, rank exponent), or effective diameter (N, E) were derived. 2. Critique the main contribution a. Rate signifiance - 3 b. Rate the methodology - 4 3. Three strongest points - the tree data sets collected over a year - derived formulas 5. Three questions for the author - I am not sure if there cannot be better approximation 6. Detail an interesting future work - ------------------------------------------------------------------------------ Graph Structure in the Web -------------------------- 1. State the main contribution (or focus?) Analysis of the structure of the web by constructing its graph and classifying pages in different classes (core, in, out, tendrils). Three algorithms (BFS, WCC, SCC) are used to detect connected components, weak cc, strong cc and to measure different characteristics. 2. Critique the main contribution a. Rate signifiance - 2 b. Rate the methodology - 2 3. Three strongest points - 5. Three questions for the author - ################################################# From: "Adriana Iamnitchi" To: "Adriana Iamnitchi" Subject: evaluations for Wed. Date: Mon, 15 Apr 2002 18:02:31 -0500 [FFF99] Michalis Faloutsos, Petros Faloutsos, Christos Faloutsos, On Power-Law Relationships of the Internet Topology. (PDF) ACM SIGCOMM'99 1. State the main contribution of the paper The paper evaluates the Internet graph as of 1998 and shows its power-law characteristics at different moments in time (Nov. 97, April 98 and Dec 98) with respect to various graph parameters (node outdegree, outdegree rank and frequency) 2. Critique the main contribution. a. 4 (significant contribution). This paper's contribution is significant in many respects:20 i. It highlights the erroneous and (until then common) practice of modeling Internet topology based on hierarchies and uniform node degree ii. It proposes a more accurate model for Internet topology iii. It proposes some graph metrics that better incorporate graph density b. The methodology is convincing: 3 snapshots of the network in a period of significance growth of the Internet are sufficient to show the characteristics of the Internet during that period of time. The authors are careful in stressing that while they have no reasons to believe these characteristics will be maintained in the future, they also stress the fact that currently (as of '99) there are no reasons to believe these characteristics will change. c. What is the most important limitation of the approach? Perhaps not very significant, but the '95 snapshot at the routers' level may not be very relevant: the Internet grew significantly after '95, with various structural changes (e.g., NSFNet). I have no reasons to believe the results from '95 are obsolete now (i.e., there is no power-law at the routers' level), but neither that they are still true. A more recent snapshot would have been perhaps more convincing. 3. I've heard of the power-law result from other papers that cited this one, so for me the three most interesting ideas when I actually read the paper were: (1). The alternative graph metrics that approximate better the Internet than the traditional metrics of average node degree. (2). The effective diameter formula that encapsulates information on, for example, the optimum TTL for flooding the network efficiently. (3). The authors' explanation for the power-law as capturing the equilibrium of the tradeoffs between the gain and cost of adding a new edge from financial and functional perspective. 4. What are the three most striking weaknesses in the paper? (1). I did not understand the motivation for excluding the small percentage (apparently < 2%) of nodes with higher outdegree and frequency 1. This trick contributed to the "perfection" of the power-law plots, but for what accuracy cost? (2). I suspect some of the results are interrelated and I would have liked to understand in what measure: e.g., outdegree vs. rank and frequency vs. outdegree seem to be strongly related (or aren't they?) (3). I did not understand what was the analysis of the Internet that "suits" the graph generator models, and from here the significance (or even the meaning) of the results that show that 40-50% of nodes are in trees -- what trees? This explanation would have added more information than Table 1 and/or 2... 7. Optional comments on the paper that you'd like to see discussed in class. * Do we have any reasons to believe their result as of 98 is outdated? (i.e., the Internet is not a power-law anymore, or the exponents changed significantly) Note that a result by Barabasi & all shows that dynamic networks show the power-law characteristics in the presence of 2 simultaneous conditions: a) growth; b) preferential attachments * The authors claim an estimation of the number of edges within 9-20% of the real number: isn't 20% too far? * Are the predictions in Table 4 or Table 5 correct? [BKMR00] Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, Janet Wiener. 2000 Graph structure in the web Comput. Netw. 33 30920 1.. State the main contribution of the paper: the paper looks at the web graph and studies a set of characteristics, like the node degree distribution (confirming it's a power-law) and reachability and distance between nodes. The paper sketches a map of the web that reveals interesting characteristics (e.g., a significant portion of the graph is disconnected) 2.. Critique the main contribution. 1.. 4 (significant contribution): the paper contradicts an earlier result by Barabasi & others that showed the web is more connected than this paper reveals to be. Knowing the approximate map of the web is relevant to web crawlers, designing new algorithms, predicting the evolution of the web, sociology.20 2.. Rate how convincing the methodology is: the methodology is convincing. However, although the graph studied is larger than in the previous studies, it would have been more interesting to see an even larger snapshot of the Internet, by merging, for example, Altavista data with some other web crawlers' (Google already existed in 2000...) I think my worry is: is it anything in the Altavista crawler algorithm that favors some parts of the network or subjects, etc., that would lead to a biased snapshot of the web?20 3.. Most important limitation of the approach: See above. 3.. What are the three strongest and/or most interesting ideas in the paper?20 1.. An interesting result is due to studying the directed graph (the previous results treated the web as an undirected graph, I think?) 2.. A significant fraction of the web is disconnected. 3.. The map of the web looks more like an organism than like a convex polygon (as I was expecting....) 4.. What are the three most striking weaknesses in the paper? The results are too straightforward (although somewhat surprising) so I can't seem to find striking weaknesses. I'm rather surprised that the original paper that showed the web is a small world based its results only on the local graph, within nd.edu, which is hardly a representative snapshot of the Internet -- due to small size and particularity of the domain (educational). ########################################################## Date: Tue, 16 Apr 2002 20:49:34 -0500 (CDT) From: Rahul Santhanam To: Anda Iamnitchi Subject: Feedback for both papers! Faloutsos-Faloutsos-Faloutsos: 1. The authors study the topology of the internet. They show that the inter-domain graph is characterised by power laws expressing (1) the relationship between the degree of a node and its rank (2) the relationship between degree and the frequency with which this degree occurs in the graph (3) The relationship between the number of pairs of nodes within distance $d$ and $d$ (4) The relationship between eigenvalues of the graph and the ranks of eigenvalues. The power laws are shown to hold for measurements at three different times over the course of a year, and the claim is made that power laws are therefore fundamental properties of Internet topology. 2.a. 4 - significant contribution. The paper contains new and important insights about the structure of the Internet. b. The methodology is convincing. The power-law plots fit the data very closely. The fact that the power-law exponents change gradually over time is further evidence that something fundamental is being measured in the paper. c. One limitation of the approach is that it does not help in predicting the structure of the Internet in the future. Perhaps more measurements at different times will reveal structure in the evolution of power exponents.. 3. The most interesting idea in the paper is that the distribution of so many different parameters of the Internet topology can be described very accurately by power laws - thus the Internet displays an unexpected degree of structure. Another interesting idea is that graph metrics are highly skewed and hence should be described using the power law exponents rather than averages. 4. Weaknesses - 1. It's not clear how the parameter $P(h)$ is an adequate measure of the density of the graph, especially since it is estimated only for very small values of $h$. 2. The authors do not provide a convincing explanation of how the power-law behavior arises 3. Rather than supplying data toward fine-tuning parameters for graph generators that presumably do not generate power-law graphs, the authors could have used their insights to design a new one. 6. One possible extension of the work is to analyze power-law graphs in a theoretical setting and show correspondences between the various power laws. Broder-Kumar-Maghoul-Raghavan et al. : 1. The authors study the structure of the World Wide Web as a graph. They confirm previous findings of power law relationships between various parameters of the graph but suggest that "small world" phenomena do not actually hold, in the sense that the diameter of the graph is large even if the average diameter is small. They identify and analyse components of various types and sizes in the graph (depending on whether the graph is taken as directed or undirected). 2. a. 3 - modest contribution. The authors provide much data but little insight about the structure of the Web as a graph. b. The experiment is fairly well-designed but the authors do not properly identify sources of bias. c. The most important limitation of the approach is that the experiments are done over a limited time period. The results are unlikely to be instructive because of the dynamic nature of the web. (Also, all data is provided by Altavista - it would be nice to have other data to compare this to). 3. Strength - The experimental data is painstakingly analysed. The persistence of the power law exponent (also, its scale-invariance) for indegrees of vertices in the graph is rightly identified as a compelling phenomenon. 4. Weaknesses - 1. The authors provide vast amounts of data, especially w.r.t the components of the web graph, but do not interpret it sufficiently 2. The numbers are given to a degree of accuracy that is meaningless considering the scope of the experiment 3. Sometimes the authors are not precise about their methodology - for eg., which crawl is used for running the WCC and SCC algorithms? 6. Possible extension - To consider alternative sources of data; also to analyse other parameters of the graph such as density. Rahul. ################################################ Date: Tue, 16 Apr 2002 22:14:14 -0500 (CDT) From: Xinghua Shi To: Adriana Iamnitchi Subject: Summary5 Paper1: On Power-law Relationships of the Internet Topology. Contribution: The authors discovered some very simple power-laws of the Internet topology and used the exponents of the power -laws to characterize the Internet. This is really a novel way to study the Internet topology which finds out some order and principle from chaos. They study the Internet topology both at the router level and inter-domain level which covers the whole network in general. This paper is somewhat well-founded because ,it is based on sound mathematical analysis. Besides the study in the theoretical analysis, the paper also presents a number of practical applications of the power-laws which is useful for further research. Rate: a. 4 in significance: This paper has significant contribution because it provides a new way in the research of the Internet topology. b. 4 in convincing of methodology: The argument is based on sound mathematical definition and proof. The claims and conclusions follow from the experiments mentioned. The experiments are well designed but the assumptions are based on the data in the experiments, which may be not correct as the Internet evolves. c. The most important limitation of the approach is that there're too many definitions and terminology in the proof. The strongest and/or most interesting ideas in the paper: a. The three power-laws that characterize the inter-domain Internet topology b. The power-laws hold for three Internet instances with high correlation coefficients c. The power-laws can be used to characterize graph topologies The most striking weaknesses in the paper: a. Too many concepts and terminology b. Many proofs begin with the observation from intuition which may be misleading. Questions: Are the power-laws really correct when the Internet becomes more instable or expands too quickly? Paper2: Graph structure in the web. Contribution: The authors generated results and analysis based on their three sets of experiments on web crawls. All these show a macroscopic structure of the Internet. They studied both directed and undirected connected components of the web in the second set of experiments. This is somewhat covers the whole web. Disadvantages: Since the conclusions may be misleading if they're drawn from the study of the local structure of the Internet. We can imagine that as the Internet becomes much huge than at present, the current results the author present are probably incorrect or at least biased. Rate: a. 3 in significance: This paper has moderate contribution because it helps the understanding of the Web topology but is too straightforward and trivial. b. 3 in convincing of methodology: The argument is based on the results of experiments. But the argument is merely based on experiental data and results, which lack formal proof. c. The most important limitation of the approach is that there're too many definitions and terminology in the proof. The strongest and/or most interesting ideas in the paper: a. The connectivity of the web graph as an undirected graph is extremely resilient and does not depend on the existence of nodes of high in-degree. b. The nodes mentioned above with high PageRank or nodes that are considered good hubs and authorities are embedded in a graph that is well connected without them. c. The structure of SCC,IN, OUT and the analysis on this The most striking weaknesses in the paper: a. There're some trivial parts such as the brief primer on graphs and terminology. b. The argument lacks sound proof or analysis based on mathematical model. Questions: How the current results will be as the development of the Internet? Are their conclusions still hold or will become incorrect?How much the scope of the Internet have on the conclusions? Are there some thresholds or limitations of the scope which confine the conclusions? Are the IN, OUT sets reasonable or actually correct in the experiment they preformed? What's the relationship between these two sets and the real network world? Does the basic structure of SCC, IN, OUT remain stable over time? Is there further and detailed research following its way by far? #################################################### Date: Tue, 16 Apr 2002 23:01:01 -0500 (CDT) From: Yu Hu On Powe_Law Relationships of the Internet Topology [Michalis Faloutsos,Petros Faloutsos,Christos Faloutsos] Main contributeion of the paper: It showed to us a novel way to study such random internet topology, which is power_laws. These laws will help us to learn more about the skewed distributioncharaters of the internet "graph" and produce us some simple parameters to analyze,compare and predict the internet topology. It opened a new interesting field or method of the internet topology analyzing. Critique the main contribution: Significance : 4( significant contribution) Methodology: Most conclusion, I think, can be deduced from these experiments.However, some are not, such as: Based on different rank exponents of the inter_domain graphs and intra_domain graph, author said that rank exponent was a powerful metric for charaterizing famililies of graphs. I suspect. Most interesting ideas: In such chaos of internet,authors found such simple power_laws and using few parameters can depict main properties of internet topology. Weakness&Questions: 1. I find in the paper authors sometimes drawed conclusion based on some proofs or experiments' results, which I can't trust absolutely . 2. In discussion about figure 7,8, authors got his conclusions. Does it make sense to get conclusion from just 3 or 4 points ? 3. Can author give some application of the eigenvalue? Interesting Extension: Try to find some basic charaters of internet( or other similar system) that lead to the power_laws. Comments: After reading this paper,the first feeling of me is surprising,it is really novel that so random internet has such simple power_laws. The second feeling is encouraging, if all these are true, it will produce powerful tools to put much deeper insight into the real topology of internet. Graph Structure in the web [Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar] Main contributeion of the paper: The paper gave us a mathematical graph view of internet topology. Authors used graph theory methods to show us the graph structure in the web. Critique the main contribution: Significance : 2(incremental contribution) Methodology: I can't gotten much information about his methods. Maybe authors should provide us much more details the datasets and analyzing methods which will definitely make these results trustful. Most interesting ideas: Using BFS on internet topology and showing us a pure graph struture. Weakness&Questions: The paper didn't give me a very clear graphic structure of internet after reading. Maybe authors should give more details on the analyzing whereas not on someresults. Should the paper give some conclusions ? Interesting Extension: Can we just draw a graph of internet, after that, can we test on the graph to get some prediction of the internet or its future? Comments: This is a mathematical computer science paper.The main idea of the paper is pretty straighforward and right. Isupport the graphical method which was used in this paper, however,the paper did't give me a very clear conclusion of the "graph", maybe we can use the same method and experiments to shine more light on topology of internet. So I just agree this paper. ########################################################### From: "Matei Ripeanu" To: Subject: Paper evaluations - Wednesday April 17 Date: Tue, 16 Apr 2002 23:56:54 -0500 On Power Law Relationships on the Internet Topologies Faloustos, Faloustos & Flaoustos 1. The Internet is not a completely random assembly: at both AS level and router level the Internet is a power-law network. Benefits from a better understanding of Internet topology: we can design (1) protocols that take advantage of discovered topology and its derivate properties, and (2) better simulators and topology generators. 2.a. Rate=3.5. Provides credible evidence that the Internet is a power-law network. This evidence had a significant impact on recent network topology generators. 2.b. Rate=3. Apart from gathering BGP data and plotting node degree distributions (which is quite straightforward) I have some doubts. See point 4 below. Not sure the router level power-law still holds in 2002. 2.c. Like most "measurement" papers it would be nice if the paper tried to give (and validate) and hypothesis that explains the data found. 3. Strongest ideas: There is some macroscopic order in the apparently random Internet organization. The Internet is (at both AS level and router level) is a power law network. Also, the authors find that the eigenvalues of the adjacency graph representing the topology are distributed according to a power law too (they do not explore the significance of this fact further though!). 4. Weaknesses: - it seems to me that distributions in Sect 4.1 and 4.2 imply one another! (here is a paper that hints to this: "Zipf, Power-laws, and Pareto - a ranking tutorial" by L. Adamic, http://www.hpl.hp.com/shl/papers/ranking/ ) - a statistician would argue that, before doing regression analysis, they should bin data in exponentially larger bins. (see same reference) - figures 7 and 8 have to few points to do regression analysis. Thus conclusions in Sect 4.3 and on might look a bit far fetched. - they seem to call any distribution with and exponent a power-law: Zipf, Pareto, proper power-laws, all go in here. 5. Questions to authors: - are the other network characteristics (neighborhood size distribution, eigenvalues distribution) consequences of the power-law structure? - does these results hold today? if yes, what is the evolution in the power constant for the power laws 6. Extensions: Follow the evolution of the Internet at the AS level for a period of time and try to come up with an evolutionary explanation for the observed patterns. (Note: it is quite straightforward to collect data). 7. Comments to discuss in class: - one paper by Tangmunarunkit & all [1] that explains the emergent power-law network structure as a result of the underlying AS size distribution (as opposed to preferential attachment suggested by authors). - clean way to do data analysis? - I wish I have time to read [2] by the start of the class (I'm adding the link here hopping that one of you might find time to do read it if I don't). References: [1] H. Tangmunarunkit, J. Doyle, R. Govindan, S. Jamin, S. Shenker, W. Willinger, "Does AS Size Determine Degree in AS Topology?", To appear on Computer Communication Review, October 2001. [2] Q. Chen, H. Chang, R. Govindan, S. Jamin, S. Shenker, W. Willinger, The the Origin of Power-Laws in Internet Topology Revisited, Proceedings of IEEE Infocom, June 2002. (http://www.icsi.berkeley.edu/~ramesh/papers.html)