Zipf's Law

By Michael Goff

Zipf’s law, formulated by the linguist George Kingsley Zipf in 1932 (Zipf (1932), and also presented in Zipf (1949)), is an empirical observation about word frequency: the frequency of a word is roughly inversely proportional to its rank, so the most common words appear far more often than the rest. It has since been applied in many other contexts, most notably to the distribution of city sizes.

Zipf’s Law in Linguistics

Zipf’s law, as presented in Zipf (1932), is a claim about the relationship between the rank and the frequency of words. It holds that in a large text, the n-th most common word appears 1/n times as often as the most common word. In other words, the second most common word appears half as often as the most common word, the third most common word appears a third as often, and so on. In Zipf (1949), Zipf places the rank/frequency distribution in the context of the principle of least effort: language is structured so that individuals can communicate with maximum efficiency. In that work, Zipf argues that a Zipfian distribution minimizes the effort required to communicate.
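The rank/frequency claim can be checked directly on a text. The Python sketch below is a minimal illustration and is not Zipf’s own procedure. It assumes, as a simplifying tokenization choice, that words are lowercase runs of letters and apostrophes. It ranks the words by count and compares each count with the Zipfian prediction f(1)/n. The function names `rank_frequency` and `zipf_prediction` are hypothetical.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first.

    Tokenization is a simplifying assumption: lowercase runs of letters
    and apostrophes count as words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # most_common() with no argument sorts every word by descending count
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]


def zipf_prediction(top_count, rank):
    """Count that Zipf's law predicts for the word at `rank` (hypothetical helper)."""
    return top_count / rank


if __name__ == "__main__":
    # A toy text constructed to follow Zipf's law exactly: 12, 6, 4, 3 occurrences.
    sample = " ".join(["the"] * 12 + ["of"] * 6 + ["and"] * 4 + ["to"] * 3)
    table = rank_frequency(sample)
    top_count = table[0][2]
    for rank, word, count in table:
        print(f"{rank:>2} {word:<4} observed={count:>2} "
              f"zipf={zipf_prediction(top_count, rank):.1f}")
```

On real text the counts only approximate f(1)/n. For that reason, the city-size studies discussed below fit a straight line on log-log axes rather than checking for exact equality.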

Relatedly, and also on the grounds of least effort, Zipf proposed the law of abbreviation, which holds that the most frequent words in a language tend to be short. Kanwal et al. (2017) find evidence for the law of abbreviation in an experiment with a miniature artificial language. Participants learn the language, and when they must optimize communication for both accuracy and efficiency, they develop form-meaning mappings that conform to the law of abbreviation.
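One simple way to quantify the law of abbreviation in a table of word counts is a rank correlation between frequency and word length. A negative value means that frequent words tend to be shorter. The sketch below is only an illustration and is not the method of Kanwal et al. (2017). It computes Spearman’s rank correlation in plain Python, with tied values sharing their average rank. The word counts in the example are invented, and all function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def abbreviation_correlation(counts):
    """Rank correlation between word frequency and word length in characters."""
    words = list(counts)
    return spearman([counts[w] for w in words], [len(w) for w in words])


if __name__ == "__main__":
    # Hypothetical counts, invented for illustration.
    counts = {"a": 50, "of": 30, "the": 40, "house": 5, "elephant": 1}
    print(abbreviation_correlation(counts))  # -0.9
```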

Piantadosi (2014) shows that word distributions in human language exhibit patterns that go beyond Zipf’s law, so Zipf’s law alone is inadequate to capture their richness. Furthermore, Zipfian distributions arise very readily, even in randomly generated texts, as shown by Li (1992). This suggests that Zipf’s law may reflect a generic statistical mechanism, much as the central limit theorem accounts for the ubiquity of the normal distribution, rather than a deep insight into the nature of human language.
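Li’s (1992) point can be reproduced in a few lines. The sketch below is a hypothetical setup in the spirit of that paper, not its exact experiment. It generates “monkey-typed” text in which every character is one of five letters or a space, each with equal probability. It then fits a line to the log-log rank/frequency relationship of the most common “words”. Even though the text has no linguistic structure, the fitted slope comes out close to -1. The alphabet, text length, and cutoff of 200 words are arbitrary assumptions.

```python
from collections import Counter

import numpy as np


def random_text(n_chars, alphabet="abcde", seed=0):
    """'Monkey typing': each character is a letter or a space, all equally likely."""
    symbols = np.frombuffer((alphabet + " ").encode("ascii"), dtype=np.uint8)
    rng = np.random.default_rng(seed)
    picks = rng.integers(0, len(symbols), size=n_chars)
    return symbols[picks].tobytes().decode("ascii")


def rank_frequency_slope(text, top=200):
    """OLS slope of ln(frequency) on ln(rank) over the `top` most frequent words."""
    counts = np.array([c for _, c in Counter(text.split()).most_common(top)],
                      dtype=float)
    ranks = np.arange(1, len(counts) + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(counts), 1)
    return slope


if __name__ == "__main__":
    text = random_text(1_000_000)
    print(f"fitted rank/frequency slope: {rank_frequency_slope(text):.2f}")
```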

Zipf’s Law for Cities

Gabaix (1999) is among the studies that analyze Zipf’s law in the particular context of city sizes. He uses a variant of Zipf’s law: for a given size S, the probability that a city is larger than S, P(Size > S), is proportional to 1/S. This formulation follows from the rank/size criterion, under which the n-th largest city has a population equal to 1/n times the population of the largest city. The converse does not necessarily hold. Because of randomness, the largest city in a system that satisfies P(Size > S) ∝ 1/S might be significantly larger or smaller than the rank/size criterion predicts.
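The link between the two formulations can be made explicit with a short derivation. Suppose, hypothetically, that a system of N cities follows the rank/size criterion exactly, and let S_{(n)} denote the population of the n-th largest city. Counting the cities larger than a given size S then gives the 1/S form:

```latex
S_{(n)} = \frac{S_{(1)}}{n}, \qquad n = 1, \dots, N

\#\{\, n : S_{(n)} > S \,\} = \#\{\, n : n < S_{(1)}/S \,\} \approx \frac{S_{(1)}}{S}

P(\mathrm{Size} > S) = \frac{\#\{\, n : S_{(n)} > S \,\}}{N}
  \approx \frac{S_{(1)}}{N} \cdot \frac{1}{S} \;\propto\; \frac{1}{S},
  \qquad S_{(N)} \le S \le S_{(1)}
```

The reverse step does not go through. A distribution with P(Size > S) ∝ 1/S describes city sizes only on average, and in any single realization the largest city can deviate from S_{(1)}.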

Gabaix (1999) shows that if all cities in an urban system grow homogeneously, the distribution of city sizes in the system converges to a Zipfian distribution. Here homogeneity is given by Gibrat’s law, under which all cities in the system have the same mean and variance in their growth or decline rates. A further condition is that no cities appear or disappear. To ensure convergence, Gabaix adds the condition that there is a lower bound on the size of cities, and that cities at the lower bound can only increase in size. Gibrat’s law of proportional growth is, in turn, a general model of growth dynamics that was introduced in Gibrat (1931) and reviewed in the survey Sutton (1997).
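The mechanism can be illustrated with a small simulation. The Python sketch below is a hypothetical toy model, not Gabaix’s own code. A fixed number of cities, none appearing or disappearing, each draw an independent lognormal growth factor with mean 1 in every period, so all cities share the same mean and variance of growth regardless of their size. Sizes are then expressed relative to the average city, and a lower bound (`floor`) stops any city from falling below it. The parameter values are arbitrary assumptions.

```python
import numpy as np


def simulate_gibrat(n_cities=1000, n_steps=2000, sigma=0.1, floor=0.05, seed=0):
    """Toy Gibrat process with a lower bound (hypothetical parameters).

    Returns normalized city sizes, largest first.
    """
    rng = np.random.default_rng(seed)
    sizes = np.ones(n_cities)
    for _ in range(n_steps):
        # Same growth distribution for every city, independent of size.
        # A lognormal with log-mean -sigma**2/2 has expected value exactly 1.
        growth = rng.lognormal(mean=-sigma**2 / 2, sigma=sigma, size=n_cities)
        sizes = sizes * growth
        sizes = sizes / sizes.mean()       # express sizes relative to the average city
        sizes = np.maximum(sizes, floor)   # no city may fall below the lower bound
    return np.sort(sizes)[::-1]


def rank_size_slope(sizes, top=100):
    """OLS slope of ln(rank) on ln(size) for the `top` largest cities."""
    largest = np.sort(sizes)[::-1][:top]
    ranks = np.arange(1, len(largest) + 1)
    slope, _intercept = np.polyfit(np.log(largest), np.log(ranks), 1)
    return slope


if __name__ == "__main__":
    sizes = simulate_gibrat()
    print(f"rank/size slope over the 100 largest cities: {rank_size_slope(sizes):.2f}")
```

How close the fitted slope comes to -1 depends on the floor, the number of cities, and the number of periods. A single run should therefore be read as an illustration of the mechanism, not as a test of the model.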

Gabaix (1999) then shows how the conditions can be relaxed. Not all cities need to have the same mean and variance in the growth rate; it is only necessary that the growth rate does not depend on city size. The distribution of growth rates can also vary with time. New cities can be added to the urban system, so long as the rate of new city formation does not exceed the overall population growth rate.

In Gabaix (1999), an urban system is typically a country, within which people are free to move between cities. In the American context, a city is taken to be a metropolitan region. This is roughly a commutershed consistent with Marchetti’s constant, the observation that people tend to spend about an hour per day commuting. Gabaix finds that metropolitan regions fit the Zipfian pattern better than cities defined by political boundaries. He regresses the natural log of rank on the natural log of size for 135 metropolitan regions in the United States as of 1991, and finds a slope of -1.005 and an R² of 0.986. Under an ideal Zipfian distribution the slope would be exactly -1, so Zipf’s law fits the data quite well.
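Gabaix’s test is an ordinary least squares fit of ln(rank) on ln(size). The sketch below shows that regression in Python with NumPy, using a hypothetical helper name, `zipf_regression`. The example uses synthetic sizes that follow the rank/size rule exactly, not Gabaix’s data, so it prints a slope of -1.000 and an R² of 1.000.

```python
import numpy as np


def zipf_regression(sizes):
    """OLS of ln(rank) on ln(size); returns (slope, r_squared)."""
    s = np.sort(np.asarray(sizes, dtype=float))[::-1]   # largest city gets rank 1
    x = np.log(s)
    y = np.log(np.arange(1, len(s) + 1))
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r_squared = 1 - residuals.var() / y.var()            # 1 - SS_res / SS_tot
    return slope, r_squared


if __name__ == "__main__":
    # Synthetic data: 135 hypothetical cities obeying the rank/size rule exactly.
    sizes = 8_000_000 / np.arange(1, 136)
    slope, r2 = zipf_regression(sizes)
    print(f"slope={slope:.3f}, R^2={r2:.3f}")
```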

Under Gabaix’s (1999) model, a Zipfian distribution of city sizes is the logical consequence of reasonable assumptions about the dynamics of urban growth. It does not imply that a Zipfian distribution is what city sizes “should” be, or that policy should aim to attain such a distribution.

The idea of Zipf’s law for cities predates Zipf’s work and appeared in Auerbach (1913). That work finds that the Zipfian rank/size relationship holds among German cities in 1910.

A consequence of Gabaix’s (1999) model is that, for city sizes to follow a Zipfian distribution, cities must be free to grow without constraint within their urban system. If large cities face constraints on growth, the resulting distribution may have fewer large cities than a Zipfian distribution would predict. Dittmar (2011) finds that this was the case in Western European countries before 1500, whereas Zipf’s law generally held after 1500. His explanation is that city growth in the Middle Ages was constrained by land availability and high transportation costs, and that these constraints relaxed in the modern period. Dittmar’s conclusion is based on a dataset of European cities that reached a population of at least 5,000 at some point between 1000 and 1800. Because data are limited, however, the analysis is restricted to the years 1300 to 1800.

Soo (2005) tests Zipf’s law for 73 countries using two estimators. With ordinary least squares, he finds that the Pareto exponent is significantly greater than 1 (the value predicted by Zipf’s law) for 39 countries and significantly less than 1 for 14 countries. Deviations toward a more even distribution of city sizes than Zipf’s law predicts are therefore the more common kind. Using the Hill estimator, he finds that 24 countries have a Pareto exponent significantly greater than 1 and 12 countries have one significantly less than 1.
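The Hill estimator is a maximum-likelihood estimate of the Pareto exponent computed from the largest observations, and it offers an alternative to the OLS regression. The sketch below implements one common textbook form, which uses the k largest sizes and takes the (k+1)-th largest as the threshold. Soo’s (2005) exact variant and choice of sample cutoff may differ, and the name `hill_estimator` is hypothetical.

```python
import numpy as np


def hill_estimator(sizes, k):
    """Hill estimate of the Pareto exponent from the k largest sizes.

    1/alpha_hat = (1/k) * sum_{i=1..k} ln(S_(i) / S_(k+1)),
    where S_(i) is the i-th largest size.
    """
    s = np.sort(np.asarray(sizes, dtype=float))[::-1]
    if not 0 < k < len(s):
        raise ValueError("k must satisfy 0 < k < len(sizes)")
    log_excess = np.log(s[:k]) - np.log(s[k])   # s[k] is the (k+1)-th largest
    return 1.0 / log_excess.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # numpy's pareto() draws Lomax samples; adding 1 gives a classical Pareto
    # distribution with minimum 1 and tail exponent 1.5.
    sample = rng.pareto(1.5, size=100_000) + 1.0
    print(f"Hill estimate: {hill_estimator(sample, k=5_000):.3f}")  # should be near 1.5
    # Sizes that follow the rank/size rule exactly give an exponent close to 1.
    print(f"Zipf sizes:    {hill_estimator(1.0 / np.arange(1, 1002), k=1_000):.3f}")
```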

We argue in our discussion of Marchetti’s constant that urban agglomerations, or metropolitan regions, are a more relevant definition of a city than administrative boundaries. For the 26 countries with data on urban agglomerations, Soo (2005) finds the opposite of the city-level pattern. The overall Pareto exponent is 0.870 with ordinary least squares and 0.8782 with the Hill estimator, both significantly less than 1. For individual countries, the exponent is significantly less than 1 in 16 countries and significantly greater than 1 in two. This implies that agglomeration sizes tend to be more unequal than a Zipfian distribution would predict. That in turn suggests some factor that favors the growth of large agglomerations, which we would posit to be agglomeration economies.

Zipf’s Law in Other Contexts

There are many other examples of Zipf’s law, and the more general power law, appearing in natural distributions.

Manaris et al. (2005) examine Zipf’s law in the context of music. They study 40 metrics, such as pitch, duration, melodic intervals, and harmonic consonance, that can be computed from a MIDI (Musical Instrument Digital Interface) encoding of music. Across a corpus of 192 songs and the 40 metrics, the authors find a regression slope of -1.2023 and an R² of 0.8233. Across 24 control pieces generated from DNA sequences, white noise, and pink noise, they find a slope of -0.6757 and an R² of 0.7240. The music is therefore both closer to the Zipfian slope of -1 and better fit by a power law than the controls, which indicates a Zipfian pattern in the music.

Louridas, Spinellis, and Vlachos (2008) find widespread examples of power laws in software engineering. In particular, they find that power-law distributions describe both the rank/frequency relationship of programming-language tokens across many languages and the degree distributions of module dependency graphs. These distributions are not necessarily Zipfian with an exponent of -1.

Bender and Gill (1986) offer an example of Zipf’s law in a biological system. They consider the 5375-nucleotide DNA sequence of the bacterial virus ΦX174 and find that the “words” comprising the genetic code satisfy Zipf’s law.

References

Zipf, G. K. “Selected Studies of the Principle of Relative Frequency in Language”. Cambridge, MA: Harvard University Press. 1932.

Zipf, G. K. “Human Behavior and the Principle of Least Effort”. Cambridge, MA: Addison-Wesley. 1949.

Kanwal, J., Smith, K., Culbertson, J., Kirby, S. “Zipf’s law of abbreviation and the principle of least effort: Language users optimise a miniature lexicon for efficient communication”. Cognition 165, pp. 45-52. August 2017.

Gabaix, X. “Zipf’s Law for Cities: An Explanation”. The Quarterly Journal of Economics 114(3), pp. 739-767. August 1999.

Gibrat, R. “Les Inégalités Économiques”. Paris, France. 1931.

Sutton, J. “Gibrat’s legacy”. Journal of Economic Literature 35(1), pp. 40-59. March 1997.

Piantadosi, S.T. “Zipf’s word frequency law in natural language: A critical review and future directions”. Psychonomic Bulletin & Review 21(5), pp. 1112-1130. October 2014.

Li, W. “Random texts exhibit Zipf’s-law-like word frequency distribution”. IEEE Transactions on Information Theory 38(6), pp. 1842-1845. November 1992.

Dittmar, J. “Cities, Institutions, and Growth: The Emergence of Zipf’s Law”. University of California, Berkeley. August 2011.

Soo, S. K. “Zipf’s Law for cities: a cross-country investigation”. Regional Science and Urban Economics 35(3), pp. 239-263. May 2005.

Manaris, B., Romero, J., Machado, P., Krehbiel, D., Hirzel, T., Pharr, W., Davis, R.B. “Zipf’s Law, Music Classification, and Aesthetics”. Computer Music Journal 29(1), pp. 55-69. April 2005.

Louridas, P., Spinellis, D., Vlachos, V. “Power laws in software”. ACM Transactions on Software Engineering and Methodology (TOSEM) 18(1), pp. 1-26. October 2008.

Bender, M.L., Gill, P. “The Genetic Code and Zipf’s law”. Current Anthropology 27(3), pp. 280-283. June 1986.

Auerbach, F. “Das Gesetz der Bevölkerungskonzentration (The Law of Population Concentration)”. Petermanns Geographische Mitteilungen. Translated by Antonio Ciccone. 1913.
