November 25, 2021
Originally published at Economics from the Top Down
It’s been 20 years, but I still remember the feeling. It was a mix of curiosity and unease. I was curious because I was learning something new. But I was uneasy because something didn’t sit right. The place was Edmonton, Alberta, circa the year 2000. The situation? My first encounter with economics: Econ 101.
Interestingly, I can’t remember much of the course content. Instead, what I remember is the feeling. As I grappled with the language spoken by the economics textbook (what I’m calling econospeak), I felt that something was missing. But I couldn’t put my finger on what … and I was too busy to think much about it. So I ignored the feeling, memorized the course content, and moved on.
Today I’m a PhD-trained political economist and I know why I had a bad feeling in Economics 101. It’s because the course wasn’t teaching me about the real world. It was indoctrinating me in an ideology.
I’ve spent much of the last decade trying to understand this ideology. A key part of its appeal, I believe, is the language that it uses. Of course, many people recognize that the language of econospeak is part of its ideological potency. And many people have analyzed this language. But what nobody has done (as far as I can tell) is to quantitatively deconstruct econospeak. That’s what I’m going to do here.
I’ve created a word-counting bot that compares the language found in economics textbooks to the English language at large. I’m going to use this bot to analyze econospeak. The results (which I’m only beginning to unwrap) are fascinating. Let’s dive in.
A word-counting bot
Before we get to the specifics of my word-counting bot, I’ll give it some context.
When someone asks you to ‘think critically’ about a text, they want you to compare what the author has said (or not said) to what other people have said. Here’s an example. When I read an economics textbook and see the words ‘rational utility maximizer’, I think of Thorstein Veblen’s phrase ‘homogenous globules of desire’ — a satire of the utility maximizing human. I also think of the novels I’ve read, and how the characters are filled with emotions — emotions that are absent from economics textbooks. In short, I compare the words in the economics textbook to everything else I’ve read.
This example illustrates why critical reading is difficult. To read critically, you have to read widely. The problem, though, is that there’s too much to read — far more than any human could digest. And that’s got me thinking. Is there a way to automate the act of critical reading?
Enter my word-counting bot. No, my bot is not an AI literary critic. It’s far simpler. My bot counts words. The idea is that the words you use (or don’t use) tell us about what you’re thinking. To ‘read critically’, we compare your vocabulary to the vocabulary of other people. We see how frequently you use certain words relative to everyone else. If, for instance, you have sex on your mind, you’ll likely use the word ‘sex’ more than other people. And conversely, you might use the word ‘love’ less than everyone else.
When we (as humans) read a text, we get an intuitive sense for what’s there and what’s not. But my word-counting bot can take this a step further. It can quantify word frequency … and it can do it on a massive scale.
Now let’s get to the specifics. My word-counting bot takes a sample of writing and quantifies the frequency of each word found in the text. It then compares this frequency to what’s found in the English language at large. The bot returns the ratio of the two frequencies, which I’m calling relative word frequency:

relative word frequency = word frequency in the text sample / word frequency in English
The bot can take any sample of text as an input. Here, I feed it undergraduate economics textbooks. From Library Genesis, I’ve downloaded 43 economics textbooks that are standard fare in economics pedagogy. (Details here.) I’ve fed these books to my bot, and it spits out the frequency of each word in the text.
To measure ‘word frequency in English’ I’ve used data from the Google books database. According to Google, this database is the world’s largest repository of full-text books. Conveniently, Google has created the Ngrams database, which reports word frequency in its books corpus. I use word frequency in the Ngram database (which I’m calling the ‘Google English corpus’) to represent ‘word frequency in English’. I will revert, at times, to calling this Google sample ‘average English’. That doesn’t mean it’s what the ‘average person’ speaks. It’s the average of a huge sample of English text.
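To make the calculation concrete, here is a minimal Python sketch of a relative-frequency computation. It is not the bot's actual code. The tokenizer, the function names and the toy counts are all assumptions made for illustration, and a real run would load reference counts from the Ngram data instead.

```python
import re
from collections import Counter

def per_million(counts):
    """Convert raw word counts to occurrences per million words."""
    total = sum(counts.values())
    return {word: 1_000_000 * n / total for word, n in counts.items()}

def relative_frequency(sample_text, reference_counts):
    """Ratio of each word's sample frequency to its reference-corpus frequency."""
    # Hypothetical tokenizer: lowercase runs of ASCII letters.
    tokens = re.findall(r"[a-z]+", sample_text.lower())
    sample_freq = per_million(Counter(tokens))
    reference_freq = per_million(reference_counts)
    # Words absent from the reference corpus are skipped (the ratio is undefined).
    return {
        word: freq / reference_freq[word]
        for word, freq in sample_freq.items()
        if word in reference_freq
    }

# Toy data, invented for illustration only.
reference_counts = Counter({"the": 600, "price": 10, "murder": 20, "of": 370})
sample_text = "The price of the price of the price"
ratios = relative_frequency(sample_text, reference_counts)
```

In this toy example, 'price' lands well above 1 (overused) while 'the' and 'of' land below 1 (underused). A word like 'murder', which never appears in the sample, gets no ratio at all.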
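To summarize, my word-counting bot eats economics textbooks and spits out relative word frequency, defined as:

relative word frequency = word frequency in economics textbooks / word frequency in the Google English corpus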
In my sample of economics textbooks, there are about 34,000 unique words. My bot calculates relative frequency for all of them. But before we look at the whole output, let’s get a feel for the data. Table 1 shows the frequencies of four words found in the textbook sample.
Table 1: Examples of relative word frequency
| Word | Frequency in economics textbooks* | Frequency in Google English* | Relative frequency |
* Frequency = occurrences per million words
The first thing to notice is that word frequency varies wildly. Economics textbooks use the word ‘price’ about 100,000 times more than they use the word ‘ditchdigger’. Now, for these specific words that frequency difference isn’t surprising. But the wild variation in word frequency is actually a feature of language in general. Some words (like ‘and’) get used a lot. Other words (like ‘mesonemertini’) are so rare that you’ve never heard of them.
This huge variation in word use is why looking at the absolute frequency of words in economics textbooks isn’t very useful. What matters is not absolute frequency, but relative frequency — word use relative to the average. And this relative frequency, it turns out, is not necessarily related to absolute frequency.
Table 1 illustrates this fact. Economists use the word ‘price’ a lot. And as you’d expect, they use it more than average — about 40 times more. Conversely, economists almost never use the word ‘ditchdigger’. You’d have to read about 10 million words of econospeak to see ‘ditchdigger’ once. And yet economists use the word ‘ditchdigger’ 9 times more than average. So even though ‘price’ and ‘ditchdigger’ have wildly different absolute frequencies, both are overused by economists.
Let’s continue. Economics textbooks use the word ‘science’ far more than the word ‘ditchdigger’. Yet it turns out that they’re using ‘science’ less than average. (Given economics’ pseudoscientific status, I can’t help but laugh at this result.) And what about ‘murder’? Economics textbooks use it about 30 times more than ‘ditchdigger’. But this constitutes underuse. Economists use the word ‘murder’ 20 times less than average.
What’s important here is that a word’s absolute frequency (in economics textbooks) doesn’t predict its relative frequency. We’ll return to this fact later.
Now that you understand what my word-counting bot does, let’s dive into the data.
The ‘shape’ of econospeak
When we analyze language, most people are interested in the specific words that are used. (I will get to specific words, don’t worry.) The problem, though, is that my bot returns data for about 34,000 words. That’s far too many words to discuss individually. But what we can do is look at the ‘shape’ of these words. I’m calling this the ‘shape’ of econospeak.
I’ve plotted this shape in Figure 1. Here I show the distribution of relative word frequency in economics textbooks. The horizontal axis shows the ratio of textbook frequency to Google frequency. (Note that I’ve used a log scale, so each tick mark indicates a factor of 10). The vertical axis shows ‘word density’ — the portion of words with the given relative frequency.
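As a rough illustration of how such a distribution can be built, the sketch below bins words by the base-10 logarithm of their relative frequency and reports the share of words in each bin. The function name, the bin width and the toy ratios are assumptions for illustration. The actual figure may well use a smoothed density rather than a histogram.

```python
import math
from collections import Counter

def log_histogram(relative_freqs, bins_per_decade=2):
    """Share of words falling in each bin of log10(relative frequency)."""
    bins = Counter()
    for ratio in relative_freqs:
        # Ratios must be positive; floor() assigns each word to the bin
        # whose left edge sits at or below its log ratio.
        bins[math.floor(math.log10(ratio) * bins_per_decade)] += 1
    total = sum(bins.values())
    # Map each bin's left edge (in log10 units) to the share of words inside it.
    return {b / bins_per_decade: n / total for b, n in sorted(bins.items())}

# Toy relative frequencies, invented for illustration only.
toy = [0.002, 0.05, 0.2, 0.3, 0.5, 1.5, 2.0, 40.0]
density = log_histogram(toy, bins_per_decade=1)
```

With one bin per factor of 10, a key of 0.0 covers ratios from 1 up to 10, and a key of -1.0 covers ratios from 0.1 up to 1. In this toy data the fullest bin sits just left of parity.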
Let’s dissect this ‘shape’ of econospeak. We’ll start with the vertical red line that I call ‘frequency parity’. This line indicates that a word occurs at the same frequency in economics textbooks as it does in the Google corpus. (Its relative frequency is 1.) If the vocabulary in economics textbooks were identical to the vocabulary in the Google corpus, the distribution in Figure 1 would clump around the red line. But it doesn’t. That tells us that econospeak is different than average English. Hardly surprising.
Let’s talk about how econospeak is different. Notice that the peak of the econospeak distribution is to the left of frequency parity. That’s interesting. It suggests that economics textbooks use a large portion of English words less than average. I honestly didn’t expect this result, and am still trying to interpret it.
Here are two possibilities. First, the underuse of common English words could be a defining feature of econospeak. Alternatively, this underuse could be a feature of any branch of specialized writing. Either way, though, the result is important. It suggests that a key feature of econospeak is its underuse of a large chunk of the English language.
Let’s move on to the extremes of econospeak — the words that are most overused and most underused (relative to average English). These words live in the tails of the relative frequency distribution (shown in color in Figure 1). Notice that these tails are true extremes. In econospeak, some words are used 1000 times more than average. And other words are used 1000 times less than average.
I know you want to see these words. But before we get there, we need to do some statistics. Whenever we do empirical work, we need to make sure that our results aren’t caused by chance. We don’t want to get fooled by randomness. (Hat tip to Nassim Nicholas Taleb for this phrase). With randomness in mind, consider a thought experiment. Suppose that my econospeak data is actually a random sample of words taken from the Google English corpus. If this were true, what would the distribution of relative word frequency look like?
In statistics, this thought experiment is called the ‘null hypothesis’. To test the null hypothesis, we randomly draw words from the Google corpus and compare the result to our econospeak data. Figure 2 shows this comparison. Here, I’ve taken a random sample of 7.7 million words (the size of my econospeak sample) from the Google English corpus. For each randomly drawn word, I’ve calculated its frequency relative to the entire Google corpus. The red curve shows the resulting distribution of relative word frequency.
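One way to sketch this kind of null sample in Python is to draw words at random, weighted by their corpus counts, and then compute each drawn word's frequency relative to the corpus. This is an illustration under assumptions, not the procedure actually used for Figure 2: the function name, the seed and the toy corpus are invented, and a 7.7-million-word draw would likely call for a faster, vectorized multinomial sampler.

```python
import random
from collections import Counter

def null_sample_ratios(reference_counts, sample_size, seed=0):
    """Draw words in proportion to corpus frequency, then compute each
    drawn word's frequency relative to the corpus it came from."""
    rng = random.Random(seed)
    words = list(reference_counts)
    weights = [reference_counts[w] for w in words]
    drawn = Counter(rng.choices(words, weights=weights, k=sample_size))
    corpus_total = sum(weights)
    return {
        w: (n / sample_size) / (reference_counts[w] / corpus_total)
        for w, n in drawn.items()
    }

# Toy corpus, invented for illustration only.
toy_corpus = {"the": 9_000, "price": 900, "murder": 90, "outtell": 10}
ratios = null_sample_ratios(toy_corpus, sample_size=100_000)
```

Common words end up very close to a ratio of 1, while rare words wobble more, because a handful of extra or missing draws moves their ratio a lot.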
Let’s dissect this result. The null hypothesis has a huge peak around frequency parity. That means that most words in our random sample occur at the same frequency as in the Google corpus. That’s unsurprising. (We are, after all, sampling from the Google corpus.) What is surprising, though, is that our random sample produces about the same number of overused words as found in econospeak. To see this fact, look at the right tails of the distributions in Figure 2. The right tail of the null hypothesis is similar to the right tail of the econospeak distribution. Does this similarity mean that economists are randomly overusing words? Yes and no. As you’ll see shortly, there’s more to the story.
What’s most important, in Figure 2, is not word overuse, but word underuse. Looking at the two distributions, we see that the left tail of the econospeak distribution far outreaches the left tail of the random sample. This tells us that econospeak’s underuse of many English words cannot be due to chance.
This result is fascinating, and I’ll return to it throughout the post. It suggests that econospeak is defined not by what it says, but by what it doesn’t say.
The most overused and underused words in econospeak
Now that we’ve looked at the ‘shape’ of econospeak, let’s get more concrete. Let’s look at the words that economics textbooks most overuse and most underuse. The results will surprise you.
We’ll start with the words that economics textbooks most overuse relative to average English. Figure 3 shows these words in a cloud. The larger the font, the more the word is overused.
If you’ve ever read an economics textbook, the words in Figure 3 are not what you’d expect. You’d think that the most overused words would be economics jargon — terms like ‘supply’, ‘demand’ and ‘market’. And yet this jargon is nowhere to be found. Instead, Figure 3 shows a collection of bizarre words. (‘Grasshopperish’ … seriously?) What’s going on here?
I’ll be honest. I didn’t anticipate that the most overused words in econospeak would be oddballs. But I understand (now) how it happens. Our intuition is that we overuse words by writing them many times. But this is only one path to overuse. The other path is to pick an extremely rare word and use it a few times. It’s this other path to overuse that explains why Figure 3 is filled with oddballs.
Take the word ‘outtell’. In the Google corpus, it appears once every 10 billion words. That’s so rare that you’d likely not see it in a lifetime of reading. The word ‘outtell’ is also rare in econospeak. It occurs just 4 times in my sample of textbooks. But that’s enough to constitute massive overuse. The same is true for many of the words in Figure 3. They’re rare words that got used a few times by economists.
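The arithmetic behind this can be checked in a few lines of Python. The inputs are the rounded figures quoted in this post (4 occurrences, a sample of about 7.7 million words, and roughly one occurrence per 10 billion words in the Google corpus), so the result is only an order-of-magnitude estimate. The variable names are mine.

```python
occurrences_in_textbooks = 4            # 'outtell' count in the textbook sample
textbook_sample_size = 7_700_000        # approximate sample size, in words
google_frequency = 1 / 10_000_000_000   # about once every 10 billion words

textbook_frequency = occurrences_in_textbooks / textbook_sample_size
relative_frequency = textbook_frequency / google_frequency
print(round(relative_frequency))
```

With these rounded inputs, the ratio comes out at roughly 5,000. Four uses of a sufficiently rare word are enough to land it in the far right tail.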
Not all the words in Figure 3, however, are oddballs. Some of them are recognizable jargon (for instance, ‘loanable’, ‘monopolist’ and ‘oligopoly’). How do we distinguish this jargon from the quirks? We’ll get there shortly.
First, though, let’s look at the most underused words in econospeak. Figure 4 shows these words. Here, a larger font indicates that the word is more underused.
I could write an essay about the words in Figure 4. But I have other results to show you, so I’ll reflect on just a few of the words.
First, it seems that econospeak underuses many religious words (for instance, ‘jewish’, ‘jesus’, ‘god’, ‘gospel’, ‘islam’, ‘ritual’, etc.). This underuse is in some ways banal. We can think of English writing as having two sides — a secular side and a religious side. Secular writing will tend to underuse religious words. And religious writing will tend to underuse secular words. So what we’re seeing, in Figure 4, is that econospeak is secular. That’s no surprise.
Economics textbooks, however, are a very particular type of secular writing. They’re promoting a secular ideology. And that makes economists’ underuse of religious words more interesting. Framed this way, we can think of Figure 4 as showing two contrasting ideologies. The secular ideology of economics largely excludes the language used by religious ideologies. Fascinating.
Let’s move on to another important result. The most underused word in econospeak is … drum roll please … ‘anti’!
I didn’t expect this result. (Did you?) But I’ve had a few weeks to think about it, and I’ve realized that it’s quite revealing. Here’s why. The word ‘anti’ offers a succinct way of saying you’re opposed to something. As in:
Bob is anti slavery.
The near total absence of this word in economics textbooks speaks volumes about economics ideology. If you talk in econospeak, it’s difficult to voice opposition. That’s because economists frame opinions in terms of ‘preferences’. As in:
Bob has a preference for Cheerios.
Such banal opinions litter economic textbooks. What about more serious opinions? If you talk like an economist, it’s easy to voice support for something. For instance:
Bob has a preference for slavery.
But how do you use the language of ‘preferences’ to voice opposition? You must resort to a torturous double negative:
Bob has a preference for not having slavery.
Such indirect language, you’ll notice, defangs Bob’s opposition. Compare the turgid sentence above to the simple alternative:
Bob is anti slavery.
Now Bob’s opposition is clear. And that’s why it’s so revealing that econospeak almost never uses the word ‘anti’. Economic textbooks are selling an ideology that legitimizes the status quo. And the best way to do that is to mute any talk of opposition. Purge ‘anti’ from your vocabulary.
Quadrants of econospeak
Now that we’ve looked at the most overused and underused words in econospeak, let’s look again at the big picture. The most overused words tended to be oddballs (Figure 3). How do we separate these quirks from more common economic jargon?
Figure 5 shows one way to do so. Here I’ve divided econospeak into four quadrants. Before we talk about each quadrant, let’s discuss the whole chart. In Figure 5, each point is a word. The horizontal axis shows the word’s frequency in economics textbooks. The vertical axis shows the word’s frequency relative to the Google corpus.
Now to the quadrants. What’s important about the quadrants is that they identify different types of overuse and underuse. ‘Quirks’ and ‘jargon’ are both overused relative to the Google corpus. But they take different paths. ‘Jargon’ is used frequently in economic textbooks. But ‘quirks’ appear rarely. (I’ve used 50 occurrences per million words as the dividing line between quirks and jargon.)
Let’s start with ‘jargon’. You can see, in Figure 5, that the ‘jargon’ quadrant contains familiar words like ‘price’, ‘market’ and ‘demand’. These are among the most common words in econospeak. And they’re overused relative to the Google corpus. That’s not surprising.
Now to the ‘quirks’ quadrant. This is where the oddballs live. Sure, there are some jargony words here (like ‘nonmonopolistic’). But the further left we go, the odder the words become (e.g. ‘cumquat’). These quirks are rare in economics textbooks, and yet still overused relative to the Google corpus.
Now to the different types of underuse. The ‘under-represented’ quadrant contains words that are used frequently in economics textbooks, but are still under-represented relative to the Google corpus. Here you’ll find many words related to social groups and human institutions. (More on this later.)
Last, we have the ‘neglected’ quadrant. These are words that economists use rarely. But unlike ‘quirks’ (which are rare outside of economics), ‘neglected’ words are common in average English. So in the ‘neglected’ quadrant, we find words that are massively underused. This quadrant is a goldmine for economics critics. If something is missing from economic theory, its vocabulary is probably in the ‘neglected’ quadrant.
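The four quadrants boil down to two yes-or-no questions, which the Python sketch below spells out. Two things in it are assumptions rather than facts from the analysis: that the same 50-per-million cutoff separates 'under-represented' from 'neglected' words, and that a word sitting exactly at parity is placed on the underused side. The example frequencies are rough, partly invented values.

```python
def quadrant(textbook_per_million, relative_frequency,
             frequency_cutoff=50.0, parity=1.0):
    """Classify a word into one of the four quadrants of econospeak."""
    overused = relative_frequency > parity
    frequent = textbook_per_million >= frequency_cutoff
    if overused:
        return "jargon" if frequent else "quirk"
    # Assumption: the same frequency cutoff splits the two kinds of underuse.
    return "under-represented" if frequent else "neglected"

# Illustrative inputs: (occurrences per million in textbooks, relative frequency).
examples = {
    "price": (5000.0, 40.0),      # frequent and overused (frequency is hypothetical)
    "ditchdigger": (0.1, 9.0),    # rare but overused
    "history": (200.0, 0.5),      # hypothetical values: frequent but underused
    "murder": (3.0, 0.05),        # rare and underused
}
labels = {word: quadrant(*values) for word, values in examples.items()}
```

Here 'price' comes out as jargon, 'ditchdigger' as a quirk, 'history' as under-represented and 'murder' as neglected.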
I know you want to see more of the words in each quadrant. (Skip ahead if you’d like.) But first we need to do more statistics. Let’s again compare econospeak to the ‘null hypothesis’. The null hypothesis, to remind you, is what happens when we randomly draw words from the Google corpus. Figure 6 shows how econospeak stacks up against this random sample. Again, each point is a word. Blue points are econospeak. Red points are the null hypothesis.
Here’s what the null hypothesis tells us. Many econospeak ‘quirks’, it seems, can be chalked up to chance. We know this because in the ‘quirks’ quadrant, many of the red dots (the null hypothesis) overlap blue dots (econospeak). This means we shouldn’t make too much of economists’ overuse of words like ‘cumquat’ and ‘decafs’. It’s probably just a matter of chance.
What’s important, though, is that all the other forms of overuse/underuse cannot be caused by chance. The null hypothesis does not create jargon. Nor does it create under-represented words, or extremely neglected ones. So in statistical terms, econospeak is significantly different than average English. Of course, if you’ve ever read an economics textbook, you already knew that. But here’s a quantification of your intuition.
Let’s get concrete again and talk about actual econospeak words. Figure 7 shows the top econospeak jargon. These are the words in the ‘jargon’ quadrant that are the most overused relative to average English. There aren’t many surprises here — just typical econospeak jargon.
Let’s use some of this jargon to make a paragraph of econospeak:
The loanable funds staved off deadweight losses, brought on by firms acting monopolistically. Demanders, however, were not aware of the diseconomies of scale that caused recessionary trends away from equilibrium. But microeconomists knew that, ceteris paribus, prices were not respecting mpl or mpc. So they ate bushels of inelastic pizza. (jargon in bold)
OK, you’re unlikely to find such turgid writing in an undergraduate economics textbook. But this sentence is a fitting parody of the neoclassical economics literature. To the outsider, it’s incomprehensible gibberish. Actually, neoclassical economics is gibberish. The point of the jargon is to stop you from figuring that out.
Now to the top econospeak ‘quirks’, shown in Figure 8. These are words that economists use rarely, but still overuse relative to average English. Here, a larger font means that the word is more overused.
Unlike ‘jargon’, ‘quirks’ don’t jump out when you read economics literature. In fact, many of them are unique to a single textbook — they’re a quirk of a particular author. Lots of quirks result from non-hyphenation of usually hyphenated words. Some quirks may be typos. And a few of them, I’ll admit, could be an artifact of my word-counting bot. To analyze the textbooks, the bot converts PDF files to text files. The conversion isn’t perfect, and can introduce random errors. These show up in the ‘quirks’ quadrant.
Of the four quadrants of econospeak, ‘quirks’ are the least important, so I won’t analyze them much. Still, let’s try out a quirky paragraph:
Despite their grasshopperish legs, the superathletes tended to be homebodies. Their frontierlike, nondepreciating overdiscounting led to an underofficial refrainer. (quirks in bold)
This paragraph has the flavor of econospeak. But the quirks are mostly just oddballs. I won’t pay much attention to them here.
Under-represented in econospeak
Now we’re getting to the meat of the analysis. What’s most interesting about econospeak is not what it includes, but what it excludes. Economics textbooks underuse a large portion of the English language. Let’s have a look at this underuse.
We’ll start with the ‘under-represented’ quadrant. These are words that are used frequently in economics textbooks, but still less than in average English. Figure 9 shows the most under-represented words. Here, a larger font indicates more underuse.
Let’s write a sentence with some of these words. Unlike before, though, this sentence won’t be a parody of what economists say. It will capture what they don’t say. Here’s try number one:
Before his death, the man went to court. His child asked about the fire … but he looked away. Hope had no purpose. (under-represented words in bold)
This is a sentence you’d expect in a novel. It’s personal. It deals with a life or death situation. And it has emotion. These are things that econospeak tends to exclude.
Here’s try number two:
The woman’s status in the committee was a matter of history. The commission on professional organizations had decided that evidence-based administration was essential. (under-represented words in bold)
This sentence picks out bureaucratic language. The fact that such talk is under-represented in economics is telling. Economists pay attention to competition between groups, but not to the bureaucratic dynamics within groups.
Neglected by econospeak
Let’s now look at words that are neglected by econospeak. These are words that economists almost never use — and this rarity constitutes massive underuse relative to average English. Figure 10 shows the most neglected words. The larger the font, the more the word is neglected.
I’ll try my hand at a paragraph with these words:
The Jewish man was anti Islam. He believed he was God’s servant. His submission to the scriptures was based on his counselor’s teachings. God was his commander and savior. This was his eternal ritual. (neglected words in bold)
What we get, when we use these neglected words, is religious speak. If we treat mainstream economics as a science, then this result isn’t very surprising. It would be astonishing to find a science textbook that read like the Bible. But mainstream economics is not a science. It is an ideology. And so the fact that this ideology neglects religion is important. It highlights that there are two competing ideologies here.
There are many other neglected words (in Figure 10) worth discussing. But I have more results to show you, so onward.
Not speaking about power
The purpose of an ideology is, in large part, to legitimize the powers that be. In this regard, the ideology of economics is a bit odd.
Most ideologies legitimize power explicitly. They effectively say ‘this person is powerful, and you should obey their command’. Take, as an example, feudal ideology (i.e. religion). Feudal rulers boasted openly about their power, proclaiming that it stemmed from God. It’s no surprise, then, that religion is laced with terms like ‘commandments’ and ‘submission’. The devotion to justifying power is overt.
With economics, though, things are different. Economists don’t overtly praise the powerful. Instead they hardly talk about power at all. That leads some people to conclude that economics isn’t an ideology. But that’s a mistake. Economics is an ideology, but it wraps its justification for power under a pretense — namely ‘freedom’. In capitalism, corporate rulers don’t have the ‘power’ to command. They have the ‘freedom’ to command.
I’ve written about this subterfuge in The Free Market as a Double Lie. I showed how free-market speak became more popular at the same time that corporate power became more concentrated. If you take free-market speak literally, this trend makes no sense. But if free-market speak is subterfuge for justifying power, then the pieces fit together.
Here I want to look at the flip side of the equation: not speaking about power. What defines econospeak is that power is conspicuously absent. It’s a linguistic turn that George Orwell noticed almost a century ago. Politicians of the time, Orwell observed, had started to speak in torturous euphemisms. When militaries committed massacres, politicians called it pacification. Today, we’re so used to this euphemistic language that we hardly notice it. What we would notice is if a politician spoke plainly. Imagine a politician proclaiming:
Let the slaughter begin! The sons of this king will die because of their ancestors’ sins. None of them will ever rule the earth or cover it with cities.
This morbid passage, if you’re wondering, is from the Old Testament. It’s the ‘Good News’ translation of Isaiah 14:20. Surrounded by the euphemisms of modernity, we forget that people ever spoke so plainly. They did so, presumably, because the justification for power was overt. God was on their side.
Today, God is (mostly) off the table. And that means power is justified through subterfuge. Instead of praising power, you leave it unsaid. As the dominant secular ideology, economics reflects this subterfuge. In economics, talk about power is conspicuously absent.
If you’re a good critical reader, you can notice this absence. But here I’ll go a step further and quantify it. I’ve gone through the thesaurus and picked words that relate to wielding and submitting to power. Figure 11 shows their frequency in econospeak.
The results, in Figure 11, are fascinating. The majority of words about power fall in the neglected quadrant. This speaks volumes about economics ideology. Economists don’t talk openly about power. That would ruin the subterfuge.
In simpler times, rulers boasted of their power. British imperialists, for instance, celebrated openly as they conquered the world. (For them, ‘imperialism’ was a good word.) But today, rulers talk in econospeak euphemisms. It’s not imperialism … it’s ‘free trade’!
Missing from econospeak
So far we’ve discussed words that are overused in econospeak, and words that are underused. Now let’s talk about words that are absent.
My sample of econospeak contains about 7.7 million words. In such a large sample, it’s no small feat for a word to be missing entirely. Unless the word is utterly obscure, its absence is important.
So what words are missing from econospeak? Many, obviously. But to frame the question, ask yourself — what is the most popular English word that economists don’t utter?
I’ve asked people on Twitter to take a guess. (See the responses here, here and here.) I’ve plotted these guesses in Figure 12. Notice that this is a plot of words that are present in econospeak. That’s because almost no one managed to guess an actual missing word. Instead, the Twitterati were good at guessing neglected and under-represented words.
As with many of the plots in this post, I could write an essay about the words in Figure 12. But I have more results to show you. So let’s move on.
Let’s talk about the words that are actually missing from economics textbooks. I know you want to see the words themselves. (Skip ahead if you want.) But I first want to look at the structure of these missing words.
To understand this structure, it helps to have an analogy. Let’s think of the English language as a fully stocked buffet. The different foods represent words. Eating a food represents speaking a word. What we’re interested in here are the leftovers. These are the words that remain unspoken after economists finish talking.
Figure 13 shows one way of visualizing these ‘uneaten’ words. We start with the ‘English-language buffet’ (the red box). These are all the words in the English language (or in this case, a list of about 430,000 words from the Google English corpus). Before you’ve spoken anything, the language buffet is a square. The horizontal axis shows a word’s popularity, as indicated by its percentile in the Google corpus. The vertical axis is the portion of these words that you haven’t used.
Before you talk, the unused portion is 100% everywhere (you haven’t said anything). As you ‘speak’, you eat away at the buffet. If you speak in obscure prose, you eat away at the left side of the buffet. If you use only common words, you eat away at the right side.
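The buffet picture can be sketched in Python as follows: rank the corpus words by popularity, cut the ranking into equal-sized bins, and report the share of words in each bin that the sample never uses. The function name, the bin count and the toy data are assumptions for illustration, and the sketch assumes the corpus has at least as many words as bins.

```python
def unused_portion_by_percentile(reference_counts, used_words, n_bins=10):
    """For each popularity bin, the share of corpus words absent from the sample."""
    # Rank from least to most popular, so low bins hold the obscure words.
    ranked = sorted(reference_counts, key=reference_counts.get)
    portions = []
    for i in range(n_bins):
        start = i * len(ranked) // n_bins
        end = (i + 1) * len(ranked) // n_bins
        bin_words = ranked[start:end]  # non-empty when len(ranked) >= n_bins
        unused = sum(1 for w in bin_words if w not in used_words)
        portions.append(unused / len(bin_words))
    return portions

# Toy data, invented for illustration only.
toy_corpus = {"the": 9_000, "price": 900, "murder": 90, "mesonemertini": 1}
before_speaking = unused_portion_by_percentile(toy_corpus, set(), n_bins=2)
after_speaking = unused_portion_by_percentile(toy_corpus, {"the", "price"}, n_bins=2)
```

Before anything is said, every bin is fully unused. After 'speaking' only the two common words, the popular half of the toy buffet is eaten and the obscure half is untouched.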
What we’re interested in here are economists’ leftovers. When they speak (by writing textbooks), what words do they leave behind? Figure 13 shows the structure of these econospeak leftovers.
According to Figure 13, economics textbooks leave behind most of the English language — almost everything in the bottom 80% of words. The question is — what does this mean?
To interpret the econospeak leftovers, we’ll turn again to the ‘null hypothesis’. Recall that this is what happens when we randomly draw 7.7 million words from the Google corpus. Here, we’ll look at what the null hypothesis leaves behind. Figure 14 shows how the null hypothesis leftovers compare to what economists leave unspoken.
Like econospeak, the null hypothesis leaves behind most of the English language. (The bottom 70% of words remain largely unused.) So the fact that economists don’t use obscure words is unremarkable. It’s a basic feature of language. What makes words obscure, after all, is that few people use them.
There is, however, an important difference between econospeak and the null hypothesis. Econospeak leaves behind many popular words found in the top 10%. In contrast, the null-hypothesis leaves behind virtually none of these top words. So the fact that economists leave many popular words unspoken is statistically significant. (That said, it’s not clear if this is a distinguishing feature of econospeak, or if it’s found in all types of specialized writing. Figuring that out will take more digging.)
Let’s have a look at econospeak’s top 10% leftovers. Figure 15 zooms in on this part of the distribution. The histogram shows the shape of some 16,000 words that economics textbooks omit. That’s far too many words to discuss. But to give you a sense for what words are there, I’ve labelled some examples (of my choosing). These words appear at their corresponding percentile in the Google corpus. (Their vertical position doesn’t mean anything — it’s purely aesthetic.)
From the examples in Figure 15, I think you’ll agree that there are many important words that economists don’t utter. And what the statistics tell us is that this non-utterance is a choice. It cannot be chalked up to chance.
What’s interesting — and worth looking at more rigorously — is the absence of words about conflict. I’ve shown some of these words in Figure 15. (I’m sure there are many others.) It seems that economists don’t speak about ‘racism’, ‘defiance’, ‘patriarchy’, ‘treachery’, ‘sexism’, or ‘dispossession’. The absence of these conflict words is fascinating. Far from being random, I believe it’s a core part of economics ideology. Economics legitimizes power relations by pretending they don’t exist.
The top missing words
Now to the results that many of you have been waiting for. Let’s look at the most popular English words that are missing from economics. Figure 16 shows these words. The larger the font, the more popular the word.
The most popular word that’s absent from economics is … ‘Christ’. That’s an interesting result. But it says more about culture outside of economics than it does about econospeak. Economics is a secular ideology, so it’s no surprise that the last name of a Christian prophet goes unmentioned. What’s interesting is that outside of economics, the word ‘Christ’ is hugely popular. Again, this is a sign of religion’s lasting influence. In capitalist societies, religion may not be the dominant ideology, but its influence remains significant. (It’s informative that no one in my Twitter circle guessed the word ‘Christ’. It suggests that I live in a bubble of atheists.)
There’s much to be said about the other words in Figure 16. But I’ll conclude with just one observation — a fitting irony. We can use words that are absent from econospeak to describe economics ideology:
Economists are the high priests of capitalist society who worship at the altar of the free market. But their doctrines are not based on evidence. Instead, economics is a type of secular theology based on scripture. (absent words in bold)
Ideology as the unsaid
I started this word-counting project after skimming Gregory Mankiw’s textbook Principles of Economics. I noticed that he used the word ‘distort’ a lot. Peppered throughout his book were whoppers like this:
Almost all taxes distort incentives, cause people to alter their behavior, and lead to a less efficient allocation of the economy’s resources. (Mankiw in Principles of Economics)
Mankiw’s love for the word ‘distort’ got me thinking — how much does he use this word compared to the average? And so my word-counting bot was born. My initial focus was on the words that were overused. This overuse, I thought, would quantify economics ideology. (FYI: Mankiw does say ‘distort’ a lot. He uses it about 50 times more frequently than average.)
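To make the ‘50 times more frequently than average’ comparison concrete, here is a small Python sketch of the arithmetic. The counts below are invented for illustration (they are not Mankiw’s actual numbers), and the function name `relative_frequency` is hypothetical.

```python
def relative_frequency(sample_count, sample_total, corpus_frequency):
    """Ratio of a word's frequency in a sample to its frequency
    in the reference corpus. A value above 1 means overuse."""
    sample_frequency = sample_count / sample_total
    return sample_frequency / corpus_frequency

# Made-up numbers (assumption): a word appears 50 times in a
# 1,000,000-word book, and once per million words in the corpus.
ratio = relative_frequency(
    sample_count=50,
    sample_total=1_000_000,
    corpus_frequency=1e-6,
)
print(f"relative frequency: {ratio:.1f}")
```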
As I started to crunch the numbers, though, I realized that what is most interesting about econospeak isn’t what is overused. What’s interesting are the words that are underused or left unsaid. It’s this (relative) absence, I now believe, that’s key to understanding the ideology of economics.
It reminds me of George Lakoff’s book Don’t Think of an Elephant! If you want somebody to not think about something, the last thing you should do is tell them so. (You’re thinking of an elephant, aren’t you?) Herein lies the genius of economics ideology. Its purpose is to legitimize the status quo. It does so by getting you to think about a free-market fairy tale. While that’s got your attention, you don’t notice that power (and its many injustices) aren’t discussed.
To deconstruct economics ideology, in turn, entails talking about these absences. That’s difficult. What’s in a text is obvious. What’s not there is harder to see. Hence my unease when I took Economics 101. My gut was telling me that something was missing. But exactly what eluded me. Now I know … because I ran the numbers. The data shouts loud and clear that a large part of the English language is absent from economics textbooks. It’s ideology through the unsaid.
Download the econospeak data and code
I know that many of you want to explore my econospeak data. To quench your thirst, I’ve included lists of the top 500 overused, underused, and missing words. See them here. I’ve also provided links (below) to my whole econospeak dataset. Lastly, I’m going to make an interactive chart that lets you explore the structure of econospeak. Stay tuned for that.
Sources and methods
Table 2 shows my sample of economics textbooks. Although not exhaustive, this sample contains most of the standard textbooks used in undergraduate economics courses. When creating the sample, I required that the textbooks be published within roughly the same decade (here, 2004–2014) and that they be available on Library Genesis. When possible, I tried to get the ‘micro’, ‘macro’ and ‘general’ versions of each book.
The resulting sample of econospeak contains about 7.7 million words, with a vocabulary of roughly 34,000 words.
Table 2: The sample of economics textbooks
| Authors | Title | Year |
|---|---|---|
| Blanchard & Johnson | Macroeconomics | 2012 |
| Case, Fair & Oster | Principles of Microeconomics | 2008 |
| Case, Fair & Oster | Principles of Macroeconomics | 2011 |
| Case, Fair & Oster | Principles of Economics | 2012 |
| Cowen & Tabarrok | Modern Principles of Economics | 2011 |
| Frank & Bernanke | Principles of Economics | 2008 |
| Frank & Bernanke | Principles of Macroeconomics | 2008 |
| Frank & Bernanke | Principles of Microeconomics | 2008 |
| Hubbard & O’Brien | Economics | 2009 |
| Hubbard & O’Brien | Macroeconomics | 2011 |
| Hubbard & O’Brien | Microeconomics | 2013 |
| Krugman & Wells | Macroeconomics | 2005 |
| Krugman & Wells | Economics | 2009 |
| Krugman & Wells | Microeconomics | 2012 |
| LeRoy Miller | Economics Today: The Macro View | 2011 |
| LeRoy Miller | Economics Today: The Micro View | 2011 |
| LeRoy Miller | Economics Today | 2011 |
| Mankiw | Principles of Economics | 2008 |
| Mankiw | Principles of Macroeconomics | 2011 |
| Mankiw | Principles of Microeconomics | 2011 |
| McConnell, Brue & Flynn | Macroeconomics | 2006 |
| McConnell, Brue & Flynn | Economics | 2008 |
| McConnell, Brue & Flynn | Microeconomics | 2011 |
| Nicholson & Snyder | Microeconomic Theory | 2004 |
| Nicholson & Snyder | Microeconomic Theory | 2007 |
| Nicholson & Snyder | Microeconomic Theory | 2011 |
| Parkin, Powell & Matthews | Economics | 2005 |
| Pindyck & Rubinfeld | Microeconomics | 2012 |
| Pindyck & Rubinfeld | Microeconomics | 2014 |
| Rittenberg & Tregarthen | Principles of Economics | 2009 |
| Rittenberg & Tregarthen | Principles of Microeconomics | 2009 |
| Rittenberg & Tregarthen | Principles of Macroeconomics | 2009 |
| Samuelson & Nordhaus | Economics | 2009 |
Notes: I downloaded the textbooks as PDFs and extracted the text using the Linux utility pdftotext. This conversion can sometimes introduce errors (often due to non-standard fonts). It’s possible that some of the quirks in econospeak are caused by faults in the PDF-to-text conversion.
Google English corpus
I’ve used Google’s 2020 1-gram corpus, which measures the text frequency of one-word phrases in the Google Books database. You can download the data here. (Warning: the 1-gram dataset is about 46 GB).
I use Ngram data over the years covered by the textbooks (2004-2014). For each word, I calculated its mean frequency over these years, weighted by the portion of the textbook sample published in each year.
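The weighting described above can be sketched in a few lines of Python. This is an illustration, not the original code. The book counts per year are tallied from Table 2, but two things are assumptions: that ‘portion of the sample’ means the share of books (rather than the share of words), and the yearly frequencies in `toy_freq`, which are made up. The function name `weighted_mean_frequency` is hypothetical.

```python
# Number of textbooks per publication year, tallied from Table 2.
# (No book in the sample was published in 2010.)
BOOKS_BY_YEAR = {
    2004: 1, 2005: 2, 2006: 1, 2007: 1, 2008: 6,
    2009: 6, 2011: 10, 2012: 4, 2013: 1, 2014: 1,
}

def weighted_mean_frequency(freq_by_year, books_by_year=BOOKS_BY_YEAR):
    """Mean corpus frequency of a word, weighting each year by the
    share of the textbook sample published in that year."""
    total_books = sum(books_by_year.values())
    weighted_sum = sum(
        freq_by_year.get(year, 0.0) * n_books
        for year, n_books in books_by_year.items()
    )
    return weighted_sum / total_books

# Made-up yearly frequencies for one word (assumption).
toy_freq = {year: 2e-6 for year in range(2004, 2015)}
toy_freq[2011] = 4e-6  # a spike in the most heavily sampled year

result = weighted_mean_frequency(toy_freq)
print(f"weighted mean frequency: {result:.3e}")
```

Because 2011 accounts for 10 of the 33 books, a change in that year’s frequency moves the weighted mean far more than a change in, say, 2004.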
I restrict both the econospeak sample and Google English sample to words that are in a predetermined ‘dictionary’. My dictionary consists of the following:
- The Grady Augmented word list from the R lexicon package
- The Project Gutenberg word list from Moby Word II
- An English word list from Leah Alpert
From this word list I remove/change the following:
- remove words with fewer than 3 letters
- remove common first names (using R
- remove common last names (using R
- remove prepositions (using R
- remove English numerals from 1 to 100 (i.e. one, two, three …)
- remove ‘stop words’ (using R
- change British spellings to American (as in labour → labor)
- convert all words to lower case
- remove acronyms (words containing ‘.’)
- remove contractions (words containing apostrophes)
- remove hyphenated words
- remove words containing numbers 0-9
The resulting ‘dictionary’ contains about 500,000 words.
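As a rough illustration of the filtering steps listed above, here is a minimal Python sketch. It is not the original pipeline (which relied on R packages). The function name `clean_dictionary`, the tiny exclusion set, and the one-entry spelling map are all hypothetical stand-ins for the much larger lists of names, prepositions, numerals and stop words. The order of the steps is also an assumption.

```python
def clean_dictionary(words, exclude=frozenset(), british_to_american=None):
    """Apply the word-list filters and return a set of cleaned words.

    `exclude` stands in for names, prepositions, numerals and stop
    words (all lower case). `british_to_american` maps spellings.
    """
    british_to_american = british_to_american or {}
    cleaned = set()
    for word in words:
        word = word.lower()                         # convert to lower case
        word = british_to_american.get(word, word)  # labour -> labor
        if len(word) < 3:                           # fewer than 3 letters
            continue
        if any(ch in word for ch in ".'’-"):        # acronyms, contractions, hyphens
            continue
        if any(ch.isdigit() for ch in word):        # words containing 0-9
            continue
        if word in exclude:                         # names, stop words, etc.
            continue
        cleaned.add(word)
    return cleaned

# Toy inputs (assumptions, for illustration only).
raw_words = ["Labour", "labor", "of", "John", "U.S.A.", "don't",
             "well-being", "mp3", "seven", "between", "the", "market"]
toy_exclude = {"john", "seven", "between", "the"}
toy_spelling = {"labour": "labor"}

dictionary = clean_dictionary(raw_words, toy_exclude, toy_spelling)
print(sorted(dictionary))
```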
Econospeak word lists
Browse the top 500 overused, underused and missing words:
Top 500 overused words
Top 500 missing words