11.5 Scale-Free Networks

POWER LAWS

• The distribution of connections per node of a random graph follows a bell curve.
• Scale-free networks exhibit a power-law, or "fat-tail," distribution.

The Internet is one of the most important and influential man-made networks to arise in modern times. Like the phone networks that preceded it, it has connected people across vast distances and has done much to make our world seem smaller. By connecting libraries, universities, and schools with more and more people, the World Wide Web has greatly facilitated the flow of information around the globe.

Because the Web is open to anybody, it consists of hundreds of billions of pages all connected via differing numbers of hyperlinks. In 1999, physicist Albert-László Barabási and his colleagues at the University of Notre Dame in Indiana set out to map the connectedness of the Web. They constructed a program, called a crawler, to traverse the Web, collecting linkage data from the sites that it came across, operating much like modern search engines. They expected to find that most pages had about the same number of links, as would be the case in a randomly constructed network. What they found was somewhat surprising.

Item 3227 / Tamara Munzner, Eric Hoffman, K. Claffy, and Bill Fenner, VISUALIZING THE GLOBAL TOPOLOGY OF THE MBONE (1996). Courtesy of Munzner, Hoffman, Claffy, and Fenner.

Random networks have a certain, predictable, distribution of connections among their nodes. Because the process that creates them is indiscriminate, the majority of nodes tend to end up with about the same number of connections. There are, of course, always a few nodes that end up with significantly more connections than the majority, as well as a few nodes that end up with significantly fewer connections than the majority. Consequently, the distribution creates a bell curve when graphed with the number of connections represented on the horizontal axis and the number of nodes with that number of connections represented on the vertical axis.

The peak of this curve is the mean number of connections per node in the random network. The exact value of the mean is the total number of nodes divided by the total number of connections. Barabási expected the results of his web-crawler search to demonstrate a similar distribution, with a mean value determined by the overall number of pages and links.

What Barabási found was that the vast majority of web pages in his sample had very few links, while a few pages had the majority of the links. When graphed, the degree distribution looked like this:

This distribution pattern is quite different from the bell curve that arises in random networks. It roughly follows what is known as a "power law." In a power-law distribution, the number of nodes with a given number of connections is proportional to the number of connections, raised to a negative exponent.

EQ:

P(k)∼k−γ

where P(k) is the fraction of nodes of degree k and gamma is an exponent that determines the "fatness" of the tail of the distribution curve. Barabási found an exponent of about -2.2 in his 1999 Internet survey.

What are the qualitative features of networks that follow power-law distributions? Recall that random networks have very little structure and small-world networks have a fair amount of clustering. Power-law-type networks are characterized by a few highly connected nodes that serve as hubs and many nodes with only a few connections. This explains the shape of the graph.

To consider a specific example, a power-law network might have one node with 1,000 connections, two nodes with 250 connections each, three nodes with 111 connections, . . ., and k nodes with 1000 connections.

A convenient feature of graphs related to power-law distributions is that, for a given distribution, they look the same no matter what scale one chooses to examine. So, if we looked at only of the nodes in this network, thus shifting the scale of our observations, we would find that one node has 100 connections; two nodes have 25 connections each; three nodes have 11 connections each; and k nodes have 100 connections each. The distribution graph of this view would take the same shape as that of the larger network. The same exact structure appears, regardless of our chosen scale. This phenomenon is similar to what we observed with fractals in the unit on dimension.

back to top

AIRLINE MAPS

• Scale-free networks are identifiable by the existence of a small number of well-connected hubs.
• "Rich get richer"-type processes often lead to scale-free networks.

To get a sense of what a scale-free network looks like, imagine a map of airline routes.

Item 2812 / Muh-Tian Lee. IMAGE (2001). Courtesy of NASA/Virtual Skies at http://virtualskies.arc.nasa.gov. This is a route map for a major airline. Notice that it has major hubs in the cities of Newark, NJ and Houston, TX and also a minor hub in Cleveland, OH.

Most major airlines have a few busy hubs through which most of their routes pass. There are a greater number of medium-sized airports, each with fewer flights to and from them. Then there are the small airports, of which there are substantially more, but which have substantially less air traffic. Finally, there are a great number of tiny, municipal airports, which provide almost no major carrier service. This is a classic example of a scale-free network.

Item 3128 / United States Department of Transportation – Federal Highway Administration. NATIONAL HIGHWAY SYSTEM MAP (2002). Courtesy of United States Department of Transportation – Federal Highway Administration.

The airline route map can be contrasted with a standard road map. The distribution of connections on the roadmap follows a bell-shaped curve. That is, most cities have one major highway that connects them to the network, whereas a few cities have more than one major connection, and a few cities lie well off the beaten path, at some distance from a major highway.

Scale-free networks exhibit interesting distributions of clustering coefficients. The well-connected hubs tend to have lower clustering coefficients than those of the less-well-connected nodes. This situation arises because each node that connects to a hub creates as many potential neighborly connections as there are nodes that are already connected. The more neighbors, the more potential connections, which tends to lower the clustering coefficient. In simple mathematical terms, as the denominator of the fraction increases, the value of the fraction decreases.

By contrast, the nodes with fewer connections have fewer potential neighborly connections, so the ones that do exist contribute strongly to the clustering coefficient. By examining both the exponent of the power-law distribution and the shape of the clustering coefficient distribution, one can separate and classify scale-free networks in new ways.

How scale-free networks arise in the real world is somewhat interesting as well. Recall that Barabási assumed that most web pages had about the same number of links. In the absence of any contradicting evidence, this hypothesis was as good as any. When he found, however, that some pages served as extremely well-connected hubs, he searched for a reason that this might be the case. He hypothesized that hubs with more connections were more desirable links because they provided access to a greater number of other nodes. This became known as the "rich get richer" phenomenon, which applies not only to the Internet but also to human social networks. People with more acquaintances tend to meet more people than do those with fewer acquaintances. Hence, those with bigger clusters of friends tend to grow bigger clusters of friends. Barabási called this "preferential attachment" and showed that it tends to generate scale-free networks.

Discussing the mechanisms by which scale-free networks arise suggests an interesting question: What is to be done about networks that change with time? Up until this point, we have given lip-service to some of the processes by which networks can be created, but our analyses have tended to measure aspects of networks only after they have settled into a static state. This, of course, is a limited view of how real networks evolve. We are always making new acquaintances and losing touch with old ones. Web pages pop in and out of existence all the time. In assuming that networks are static, we are missing a significant portion of the picture. The study of networks in nature, of ecosystems, sheds some light on how and why we should think about networks that change with time.

back to top

Next: 11.6 Ecosystems

© Annenberg Foundation 2017. All rights reserved. Legal Policy