How Zipf's Law Can Help You Understand the World Around You

Christine Ying
Published in Bootcamp · 5 min read · Sep 6, 2023


I’ve always been fascinated by languages, perhaps because of my early exposure to several of them and the linguistics courses I took in college. I grew up speaking Mandarin Chinese and lived in North Africa for three years, where I picked up a few Arabic words. I began learning English at 12 and then studied French for three years in high school. These experiences have let me have deeper, more meaningful conversations with people of diverse races and backgrounds.

Image source: https://www.youtube.com/watch?v=4dofBw9r0P4

Imagine my delight when I stumbled upon Zipf’s law, a mathematically simple and elegant pattern that relates how often a word appears in a language to its rank in a frequency list: a word’s frequency is roughly inversely proportional to its rank. Simply put, the most common word occurs about twice as often as the second most common word, three times as often as the third most common word, and so on.
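Written as a formula, with r the rank of a word and f(r) its frequency, Zipf’s law is usually stated with an exponent s that is close to 1 for word frequencies in natural languages (s is a fitted parameter, so its exact value varies from one body of text to another):

```latex
f(r) \approx \frac{f(1)}{r^{s}}, \qquad s \approx 1
\quad\Longrightarrow\quad
f(2) \approx \tfrac{1}{2}\, f(1), \qquad f(3) \approx \tfrac{1}{3}\, f(1)
```

Setting s = 1 recovers the “twice as often, three times as often” pattern: the second-ranked word has half the frequency of the first, the third-ranked word a third, and so on.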

Let’s take English as an example. In large collections of written English, such as the Brown Corpus, the most common word is “the”, which accounts for about 7% of all word occurrences. The second most common word, “of”, accounts for about 3.5%, and the third, “and”, for about 2.8%.
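To see how closely these figures follow the law, a short Python sketch can anchor Zipf’s prediction on the share of “the” and compare it with the observed shares for the next two ranks. It assumes the classic exponent s = 1; the function name zipf_predictions is just an illustrative helper, not part of any library.

```python
# Observed shares of all word occurrences, as quoted in the text (percent).
OBSERVED = {"the": 7.0, "of": 3.5, "and": 2.8}


def zipf_predictions(top_share: float, n: int, s: float = 1.0) -> list[float]:
    # Predicted share for ranks 1..n, anchored on the rank-1 share:
    # f(r) = f(1) / r**s, where s = 1.0 is the classic Zipf exponent.
    return [top_share / r**s for r in range(1, n + 1)]


if __name__ == "__main__":
    predicted = zipf_predictions(OBSERVED["the"], len(OBSERVED))
    for rank, ((word, observed), expected) in enumerate(
        zip(OBSERVED.items(), predicted), start=1
    ):
        print(f"{rank}. {word!r}: observed {observed:.1f}%, Zipf predicts {expected:.1f}%")
```

Running it shows “of” matching the prediction of 3.5%, while “and” comes in at 2.8% against a predicted 2.3%, a reminder that Zipf’s law is an approximation rather than an exact rule.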

Word-Based vs. Character-Based vs. Mixed Languages

Written languages can be broadly categorized into three types:

  • Word-based languages use words as the basic unit of meaning. Examples include English, French, and Spanish.
  • Character-based languages use characters as the basic unit: each character represents a word or a morpheme, the smallest unit of meaning in a language. Two or more characters can combine into a word or phrase whose meaning differs from that of the individual characters. For example, the Chinese characters 好 (hǎo) and 吃 (chī) mean “good” and “eat” respectively, but together they form 好吃 (hǎochī), which means “delicious”.
  • Mixed languages use both characters and words to represent meaning. For example, Japanese combines logographic kanji (漢字) with syllabic kana. Kanji are used for nouns, verb stems, and other content words, while kana are used for grammatical elements such as particles and verb endings, and for less common words.

Zipf’s law fits word-based languages more closely than character-based ones, because character frequencies do not fall off with rank as cleanly as word frequencies do. Even so, recent studies have found that Zipf’s law also holds for the character distribution of both classical and modern Chinese texts. Overall, Zipf’s law is a useful approximation of word-frequency distributions in natural languages.
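Checking whether a text follows Zipf’s law starts with counting tokens and sorting them by frequency, and the only difference between the word-level and character-level views is what counts as a token. The Python sketch below makes that concrete; its tokenizers are deliberately naive assumptions (lowercase-and-split for English, one token per character for Chinese), whereas real studies use proper word segmentation. The helper names are hypothetical.

```python
from collections import Counter


def word_tokens(text: str) -> list[str]:
    # Naive tokenizer for space-delimited languages such as English:
    # lowercase the text and split on whitespace (punctuation is not handled).
    return text.lower().split()


def char_tokens(text: str) -> list[str]:
    # One token per non-whitespace character, a rough stand-in for
    # character-level counting in a language such as Chinese.
    return [ch for ch in text if not ch.isspace()]


def rank_frequency(tokens: list[str]) -> list[tuple[str, int]]:
    # Count each token and sort from most to least frequent;
    # ties keep the order in which tokens were first seen.
    return Counter(tokens).most_common()


def rank_times_count(ranked: list[tuple[str, int]]) -> list[int]:
    # Under Zipf's law with exponent 1, rank * count stays roughly constant.
    return [rank * count for rank, (_, count) in enumerate(ranked, start=1)]


if __name__ == "__main__":
    ranked = rank_frequency(word_tokens("The cat and the dog and the bird"))
    print(ranked)  # [('the', 3), ('and', 2), ('cat', 1), ('dog', 1), ('bird', 1)]
    print(rank_frequency(char_tokens("好吃 好吃")))  # [('好', 2), ('吃', 2)]
```

On a real corpus, plotting the values from rank_times_count (or log frequency against log rank) is a quick way to see how far a word-level or character-level distribution departs from the ideal.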

On Learning a New Language

The implications of Zipf’s law for language learning are significant. Because a small number of words make up a large share of any text, learning the most common words first lets you acquire a basic understanding of a language quickly. By some estimates, the 750 most common words in English account for about 80% of the words in a typical text, although recognizing 80% of the words is not the same as understanding 80% of the meaning. Estimates like this can be derived from the Zipf-Mandelbrot distribution, a generalization of Zipf’s law that fits real word frequencies more closely.
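Such coverage estimates come from a simple calculation: under the Zipf-Mandelbrot distribution the word at rank r has weight 1/(r + q)^s, so the share of running text covered by the top n words is the sum of the first n weights divided by the sum of all of them. The Python sketch below assumes a fixed vocabulary size of 50,000 words and a few illustrative values of s and q; all of these are hypothetical and not fitted to any corpus, so treat the output as a demonstration of the method rather than a check of the 750-word figure.

```python
def zipf_mandelbrot_weights(vocab_size: int, s: float = 1.0, q: float = 0.0) -> list[float]:
    # Unnormalized weight of the word at each rank r = 1..vocab_size: 1 / (r + q)**s.
    # With q = 0 this is plain Zipf's law.
    return [1.0 / (rank + q) ** s for rank in range(1, vocab_size + 1)]


def coverage(top_n: int, vocab_size: int, s: float = 1.0, q: float = 0.0) -> float:
    # Fraction of running text made up of the top_n most frequent words.
    if not 0 <= top_n <= vocab_size:
        raise ValueError("top_n must be between 0 and vocab_size")
    weights = zipf_mandelbrot_weights(vocab_size, s, q)
    return sum(weights[:top_n]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration, not fitted to any corpus.
    VOCAB_SIZE = 50_000
    for s, q in [(1.0, 0.0), (1.1, 0.0), (1.1, 2.0)]:
        share = coverage(750, VOCAB_SIZE, s=s, q=q)
        print(f"s={s}, q={q}: top 750 words cover {share:.0%} of the text")
```

Changing the assumed vocabulary size or the parameters s and q moves the result noticeably, which is one reason published coverage figures vary.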

Naturally, this rule of thumb has exceptions. Some languages have more common words than others. In addition, which words count as “common” depends on context: the most common words in a medical journal differ from those in a fantasy novel.

Large Language Models

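Zipf’s law also plays a role in large language models (LLMs) like ChatGPT. LLMs are trained on massive text datasets whose word frequencies follow Zipf’s law. An LLM generates text one token at a time, predicting each next token from the ones before it, and the skew in its training data shows up in that process: frequent words such as “the” and “of” appear in countless training examples and are predicted easily, while rare words appear in far fewer examples and are harder for the model to learn well.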

Populations of US Cities

The same principle can be applied to cities. According to 2020 Census data from the US Census Bureau, the three largest US cities are:

  • New York (8.8M, 2.7% of the total US population)
  • Los Angeles (3.9M, 1.2%)
  • Chicago (2.7M, 0.8%)

Applying Zipf’s law to city sizes, we would expect the second-largest city to have about half of New York’s population (around 4.4 million) and the third-largest city about a third (roughly 2.9 million). Chicago, at 2.7 million, comes fairly close to that prediction, while Los Angeles, at 3.9 million, falls noticeably short. In other words, Zipf’s law does not fit the population distribution of US cities perfectly. Factors such as well-established transportation infrastructure and the growth of smaller, more efficient cities may contribute to these deviations.
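The same arithmetic is easy to script. This Python sketch uses the rounded 2020 populations listed above and assumes the simplest form of the law (exponent 1), in which the city of rank r has 1/r of the largest city’s population; zipf_city_sizes is an illustrative helper name.

```python
# Populations in millions, rounded from the 2020 Census figures listed above.
CITIES = [("New York", 8.8), ("Los Angeles", 3.9), ("Chicago", 2.7)]


def zipf_city_sizes(largest: float, n: int) -> list[float]:
    # Predicted sizes for ranks 1..n: the rank-r city has 1/r of the largest city's population.
    return [largest / rank for rank in range(1, n + 1)]


if __name__ == "__main__":
    predicted = zipf_city_sizes(CITIES[0][1], len(CITIES))
    for rank, ((name, actual), expected) in enumerate(zip(CITIES, predicted), start=1):
        ratio = actual / expected
        print(f"{rank}. {name}: actual {actual:.1f}M, predicted {expected:.1f}M ({ratio:.0%} of prediction)")
```

By this measure Los Angeles reaches about 89% of its predicted size and Chicago about 92%.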

Possible Explanations

Zipf’s law has been observed in a wide range of domains, but there is not yet a single, generally accepted explanation for why it applies so widely.

One possible explanation is that the pattern is a consequence of how our brains store and retrieve information. When we learn something new, we store it in a way that makes it easy to retrieve. The items we use most are the ones we retrieve most often, so, on this view, they are stored in the most efficient way.

Below are other mechanisms that have been proposed to explain why Zipf’s law appears in so many complex systems.

  • Preferential attachment is a mechanism that describes how new entities are more likely to be connected to existing entities that already have a lot of connections. This model can be used to explain why some websites become more popular than others. Websites that are already popular are more likely to receive new links from other websites, which can further increase their popularity.
  • The Pareto principle, also known as the 80/20 rule, is the observation that roughly 80% of results come from 20% of causes; in other words, most things in life are not distributed evenly. Like Zipf’s law, it describes a heavily skewed, power-law-like distribution. Consider a self-organizing system such as a beehive, where individuals spontaneously arrange themselves into patterns without any external guidance. The queen bee lays the hive’s eggs and is critical to its survival, so by loose analogy she stands for the few causes responsible for most of the results, while the far more numerous worker bees stand for the many causes behind the remaining share.
  • The principle of least effort states that people tend to choose the option that requires the least effort. Zipf himself proposed it as an explanation for word frequencies, and it can also be applied to the distribution of city sizes. Cities are often located near resources such as water, food, and transportation, which are essential for people to live and work. Because these resources are not evenly distributed across the world, people concentrate where they are easiest to reach, so a few well-endowed locations grow into very large cities while most settlements stay small.
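Preferential attachment is the easiest of these mechanisms to simulate. The Python sketch below implements a variant of Herbert Simon’s “rich get richer” model of word usage: most of the time the next word repeats one already used, chosen in proportion to how often it has appeared, and occasionally a brand-new word is introduced. The parameter values (10% new words, 100,000 steps, a fixed random seed) are arbitrary assumptions for illustration. It also reports the share of all occurrences held by the top 20% of words, an 80/20-style summary of how skewed the result is.

```python
import random
from collections import Counter


def simon_model(steps: int, new_word_prob: float = 0.1, seed: int = 0) -> Counter:
    # With probability new_word_prob, add a brand-new word; otherwise copy a
    # uniformly random earlier token, which picks each existing word with
    # probability proportional to its current count (preferential attachment).
    rng = random.Random(seed)
    text = [0]  # start with a single occurrence of word 0
    next_word = 1
    for _ in range(steps):
        if rng.random() < new_word_prob:
            text.append(next_word)
            next_word += 1
        else:
            text.append(rng.choice(text))
    return Counter(text)


def top_share(counts: Counter, fraction: float = 0.2) -> float:
    # Share of all occurrences held by the most frequent `fraction` of distinct words.
    ranked = sorted(counts.values(), reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


if __name__ == "__main__":
    counts = simon_model(100_000)
    print("distinct words:", len(counts))
    print("top 10 counts:", [c for _, c in counts.most_common(10)])
    print(f"share held by the top 20% of words: {top_share(counts):.0%}")
```

Because early, frequently repeated words keep attracting new occurrences, the counts end up heavily skewed, with a rank-frequency curve that looks Zipf-like rather than even.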

In conclusion, Zipf’s law is a fascinating and important phenomenon that has been observed in a wide range of systems. While the exact reasons for its ubiquity are still not fully understood, it is clear that Zipf’s law can be used to gain insights into the behavior of complex systems.


Silicon Valley product manager by day, mother of 2 by night. Writing about product leadership, art + technology, AI/ML, and everything in between.