4 Examples Of The Profound Ways In Which Goodhart’s Law Subtly Affects Your Life
“When a measure becomes a target, it ceases to be a good measure.”
This adage is called Goodhart’s law, named after the British economist Charles Goodhart. If you look carefully, you can see its impact in pretty much every industry around you. Before we look at some real-life instances of Goodhart’s law in present-day society, let’s look at some easy-to-grasp examples.
The sketch below succinctly illustrates how Goodhart’s law works. The manager of a nail factory decides to increase the number of nails produced by rating workers based on the number of nails they make. This measure succeeds in increasing the production of nails. But the nails have become so small that they are unusable.
What has happened is that, in order to get good ratings, the workers have started focusing exclusively on producing as many nails as possible, and the best way to do this is to produce small nails that take less time and fewer resources to make. In the process, they have neglected the quality of the nails, because quality wasn’t used to rate them. To address this issue, the management changes the rating method and starts rating workers based on the weight of the nails they make instead of their number.
This succeeds in increasing the size of the nails. But because there is no limit on weight, the workers now produce super-sized, super-heavy nails, which consume a lot of time and resources. Consequently, the number of nails falls even below the level it was at before any ratings were introduced. And guess what? The giant nails are just as useless as the tiny ones.
This is Goodhart’s law in action. The moment the management started using the number of nails produced to measure the workers’ performance, it ceased to be a good measure, because the workers made it a target. Similarly, the moment the management started using the weight of the nails to measure performance, that too ceased to be a good measure, as it became the workers’ new target.
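Both rating schemes can be reproduced with a toy model. This is a sketch under made-up assumptions, not data from any real factory: every nail costs a fixed handling time plus extra time per gram of steel, a shift lasts 480 minutes, and only nails between 5 g and 20 g are usable. The constants and function names are hypothetical. A worker rated on a single number simply picks whichever nail weight maximizes that number.

```python
# Toy model of the nail-factory story. All numbers are hypothetical.
SHIFT_MINUTES = 480.0   # length of one shift
OVERHEAD_MIN = 0.5      # fixed handling time per nail, in minutes
MIN_PER_GRAM = 0.05     # extra production time per gram of steel
USABLE_GRAMS = (5, 20)  # nails outside this weight range are useless

def nails_per_shift(weight_g: float) -> float:
    """Average number of nails of a given weight one worker makes per shift."""
    return SHIFT_MINUTES / (OVERHEAD_MIN + MIN_PER_GRAM * weight_g)

def total_weight(weight_g: float) -> float:
    """Total grams of nails one worker produces per shift."""
    return weight_g * nails_per_shift(weight_g)

def is_usable(weight_g: float) -> bool:
    low, high = USABLE_GRAMS
    return low <= weight_g <= high

def worker_choice(metric, options=range(1, 201)):
    """The nail weight (1 g to 200 g) a worker picks when rated only on `metric`."""
    return max(options, key=metric)

for name, metric in [("count", nails_per_shift), ("weight", total_weight)]:
    w = worker_choice(metric)
    print(f"rated on {name}: {w} g nails, {nails_per_shift(w):.0f} per shift, usable={is_usable(w)}")
```

Because the count falls as nails get heavier while the total weight rises, the count rating pushes the worker to the lightest option (1 g) and the weight rating to the heaviest (200 g). Neither nail is usable.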
Another illustration of Goodhart’s law can be seen in the following scene from the classic 1950s sitcom I Love Lucy. When the boss tells the workers to maintain the specified rate of wrapping of chocolates no matter what, the consequences are disastrous, and hilarious.
Goodhart’s law can be observed in healthcare as well, where the way doctors are rated can influence their decisions about treating patients. And the 2000s crime drama The Wire illustrates its impact on law enforcement, when the police try to game the statistics to make the crime rate look better than it actually is. In theory, if doctors are rated based on the number of lives they save, they will be incentivized to refuse to treat patients who have a high chance of dying. And if cops are rated based on the number of arrests they make, they will be incentivized to make frivolous arrests.
You might think these doctors and cops would be too morally upright to do this. But many of these actions could be taken unintentionally, due to subconscious bias stemming from the rating system. Moreover, the consequences of poor ratings, like losing their job, could simply be too devastating for them not to try to game the rating system, no matter how morally upright they are.
An obvious solution to Goodhart’s law that might come to mind is to make the system of measurement more sophisticated. In the nail factory example, you could measure worker performance by both the number and the weight of the nails at the same time. Or you could measure their performance by randomly picking a nail from the worker’s batch and testing its quality by actually hammering it in.
In theory, this should counter Goodhart’s law. In real situations, however, it is easier said than done. One reason is that the targets set are often unrealistic. So even if you do manage to develop a good system for measuring worker performance, the number of nails you want made per unit time might simply not be achievable, whether because there are too few workers, the machinery has too little capacity, worker morale is low, or for any number of other reasons.
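The random hammer test suggested above can be sketched with the same kind of toy model. Assumptions, all hypothetical: the cost model and usable range are invented, real nails vary by up to 1 g around the weight a worker aims for, and a batch counts in full only if one nail picked at random from it passes the test (otherwise it scores zero). The names `hammer_test` and `average_rating` are illustrative, not part of any real rating system.

```python
import random

# Hypothetical cost model and quality check; every number here is an assumption.
SHIFT_MINUTES = 480.0
OVERHEAD_MIN = 0.5
MIN_PER_GRAM = 0.05
USABLE_GRAMS = (5.0, 20.0)   # a nail outside this range bends or splits when hammered
SPREAD_GRAMS = 1.0           # real nails vary by up to 1 g around the target weight

def nails_per_shift(target_g: float) -> float:
    return SHIFT_MINUTES / (OVERHEAD_MIN + MIN_PER_GRAM * target_g)

def hammer_test(actual_g: float) -> bool:
    """Pass/fail check on one nail actually hammered in."""
    low, high = USABLE_GRAMS
    return low <= actual_g <= high

def average_rating(target_g: float, rng: random.Random, trials: int = 1000) -> float:
    """Combined rating: the whole batch counts only if a randomly picked nail passes."""
    total = 0.0
    for _ in range(trials):
        picked = target_g + rng.uniform(-SPREAD_GRAMS, SPREAD_GRAMS)  # one nail from the batch
        if hammer_test(picked):
            total += nails_per_shift(target_g)
    return total / trials

rng = random.Random(42)
best_target = max(range(1, 61), key=lambda w: average_rating(w, rng))
print(f"best target weight under the combined rating: {best_target} g")
```

Under these assumptions the combined rating does reward usable nails, although the worker still aims for the lightest weight (6 g) that reliably survives the test.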
The reason you run into Goodhart’s law is that people apply methods of measurement designed for passive systems to systems that fight back. The speedometer of a car shows its speed through the deflection of a needle caused by eddy currents. It’s possible to use eddy currents to measure the car’s speed not just because we know that eddy currents increase with the car’s speed, but also because eddy currents are not sentient beings who will try to fake a high speed in anticipation of some reward.
When you use a glucometer to measure your blood glucose level, you do it knowing that the electronic circuitry in the glucometer will faithfully show the actual value of glucose levels, as it is a passive circuit, not a sentient being capable of showing fake glucose levels for some reward.
Goodhart’s law springs from the assumption that behavior can also be measured using the same methods, blissfully forgetting the fact that unlike a speedometer or glucometer, humans are sentient beings who know their behavior is being measured, who know that the outcome of this measurement will affect the prospect of them being rewarded or punished, and who will fight back by trying to game the method of measurement.
And this is a symptom of the mindset that humans can be treated like non-living objects when it comes to studying them or trying to influence their behavior. Let’s look at some more examples of Goodhart’s law pouring cold water on someone’s attempts to measure performance in order to reward it properly.
Education
As a kid who grew up in India, I have seen the country’s education system inside out. And I am not impressed by it. Not that it has no achievements to its credit. Far from it. But it could have done so much better. When kids, teachers or parents in India talk about education, especially the standards that involve board exams, namely the 10th and 12th standards, their conversation usually revolves around the questions that are frequently asked in exams and the marks those questions carry. Bring up any concept from the syllabus, and eventually the conversation drifts to whether or not it is relevant from an “exam point of view”.
Any educationist worth their salt will tell you that this is a worrying sign, because it shows that the focus of kids, teachers and parents alike has shifted from education itself to getting good scores in exams: a perfect example of Goodhart’s law at work. Exam scores, which were originally meant to assess whether or not kids have attained the goal of getting educated, have themselves become the goal.
The result is that most kids don’t care to actually understand and remember the concepts they are supposed to be taught. Even the coaching centers that many kids visit advertise the high scores their previous students have obtained to increase recruitment.
And if you think this isn’t a big problem, wait until you learn what percentage of Indian engineers are employable in the jobs they were trained for. This problem goes back many decades and started with decisions taken at the highest levels of government. Pankaj Chandra, former director of the Indian Institute of Management, Bangalore, summarized the issue in an interview with Business Today as follows:
“We expanded education very rapidly – India has larger number of institutions than China, both in terms of colleges and universities. The only way the government could manage is by standardisation. In that process, education got standardised and we forgot that education was about real people and real people are very different from each other. We created one big frame where examinations became the only way to judge merit. If examinations are the only way of getting merit, all the ills followed like coaching classes; anybody who could get 95 per cent is celebrated in society; those who got 50-60 per cent faced a loss of esteem in the society. People thought teaching in a standard way is the best thing to do because it leads to exams and outcomes. Along with standardisation, we said we don’t need to look at the world. We need to look at India,”
Source
There is an urgent need to reverse this trend and refocus teachers, students, parents and policymakers on properly imparting education. The government is starting to take steps. In 2019, the Ministry of Human Resource Development released a Draft National Education Policy. Among other things, it discusses reducing curriculum content so that essential learning can be enhanced, and it emphasizes critical thinking, analysis and discussion-based learning. Let’s hope it steers the education sector in the right direction.
Research
The careers of most researchers around the world today can be aptly summarized in a phrase heard all too frequently in research circles: publish or perish. The tradition of publishing your research in a journal goes back many decades. The modern system of publishing in commercial journals began in the 1950s with Robert Maxwell, a British soldier and businessman of Czech origin. Maxwell had grown up in a Czech village and fought for Britain in World War II in a contingent of European exiles. His service in the war earned him a Military Cross and British citizenship, letting him settle in Britain after the war.
At that time, it was felt that while the British scientific community was highly skilled, the publishing infrastructure was not strong enough to support the seamless dissemination of scientific literature among researchers. Businessmen like Maxwell took over publishing from researchers and built it into a massive, profit-making industry. This was a positive step in the sense that it transformed scientific publishing, making it more efficient.
Maxwell, for example, began by shipping scientific articles published by Springer, a German publisher, to Britain. Then, when the British government paired the British publisher Butterworths (which is now owned by Elsevier) with Springer, Maxwell joined the company as a manager. Today, Reed Elsevier is one of the biggest publishers in the world. Do read this previous post of mine if you want to know more about the basics of research methodology.
The reason scientific publishing is such a profitable business is that publishers get their raw material (scientific literature) from their customers (researchers), then get those same researchers to do quality control on that raw material (peer review), and then sell the finished product back to them. The cost of procuring the raw material and turning it into a finished product is therefore very low, while the finished product has a high market value.
Things started to go wrong, however, once the number of publications began to be used as a measure of a researcher’s skill, and became the basis for deciding whether a researcher should get a grant, be promoted, or even be hired in the first place. Today, metrics like the journal impact factor, which reflects how often articles in a journal are cited on average, are used as a proxy for the impact a research article has had on the scientific community, and that in turn is used to decide the hiring, promotion and grants of researchers. This is as green a pasture for Goodhart’s law as there can be. And sure enough, the results are starting to be felt. For most researchers today, what was supposed to be only a measure has become a target.
Researchers today choose the projects they undertake based on the number and quality of publications they can get out of them, which in turn depends on how sensational journals find their studies. This biases the decisions that shape the direction in which science progresses. What’s more, the number of frivolous scientific publications, as well as of shady journals that willingly publish them for a price, is steadily increasing.
This is because, as Goodhart’s law would predict, maximizing the number of publications has become the goal of researchers. The result is that while the number of scientific publications around the world has been steadily rising, the actual relevance and impact of most of this research is doubtful. This situation is bound to hurt the progress of science.
As the video below explains, a recent study actually found that the reliability of most published research is doubtful. Under pressure to publish, researchers are often tempted to use a statistical trick called p-hacking, which creates the appearance of statistically significant correlations where none exist. Scientific studies are supposed to be successfully replicated by other, independent teams of researchers before they are accepted as reliable. But the same publication pressure that leads to frivolous publications also discourages researchers from carrying out replication studies, as journals rarely accept them for publication.
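One common form of p-hacking, testing many outcomes and reporting whichever one crosses the significance threshold, can be reproduced with simulated studies in which the true effect is zero. This sketch assumes each study measures 20 unrelated outcomes on 30 subjects each and uses a two-sided z-test with a known standard deviation of 1; the function names are illustrative. A single pre-planned test should come out "significant" about 5% of the time, while the best of 20 tests does so about 1 − 0.95²⁰ ≈ 64% of the time.

```python
import math
import random

def p_value_no_effect(rng: random.Random, n: int = 30) -> float:
    """Two-sided z-test p-value for one outcome when the true effect is zero."""
    sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
    z = sample_mean * math.sqrt(n)            # population sd assumed known (= 1)
    return math.erfc(abs(z) / math.sqrt(2))   # P(|Z| >= |z|) for a standard normal

def false_positive_rate(outcomes_tested: int, studies: int = 2000,
                        alpha: float = 0.05, seed: int = 1) -> float:
    """Share of no-effect studies that still report a 'significant' finding."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        # p-hacking: test many outcomes, report the study if any single one is significant
        if any(p_value_no_effect(rng) < alpha for _ in range(outcomes_tested)):
            hits += 1
    return hits / studies

honest = false_positive_rate(1)
hacked = false_positive_rate(20)
print(f"one pre-planned outcome: {honest:.2f}")
print(f"best of 20 outcomes:     {hacked:.2f}  (theory: {1 - 0.95 ** 20:.2f})")
```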
YouTube Algorithm
YouTube is changing the way people around the world consume content in profound ways. What sets YouTube apart from traditional TV and the movie industry is that it has democratized video creation: anyone can become a video creator instantly, without a gatekeeper deciding whether they are skilled enough, and their success is decided by the number of views they get. This democratization has made the amount of content on YouTube orders of magnitude larger than that on traditional media.
For example, the TV and movie industries create just a few hundred movies and TV shows every year. YouTube, on the other hand, receives hundreds of hours of new video every single minute. This creates a unique problem for YouTube: how do you help viewers find the videos they like in such a staggeringly vast library? To address this, YouTube came up with an algorithm.
Originally, this algorithm treated channels with a high number of subscribers as good quality and included their videos in viewers’ suggestions more often. This made sense, as channels attracting more subscribers could be expected to make high-quality videos worth suggesting to viewers. However, it created a positive feedback loop. Channels that got more subscribers early on were suggested more by the algorithm, which increased their subscribers even more, which in turn made the algorithm suggest them even more, and so on.
This vastly amplified even tiny differences between channels. Channels that initially had slightly more subscribers saw their subscriber bases grow to massive levels, while other channels got hardly any views or subscribers. To address this, YouTube started suggesting channels whose content was similar to that of channels the viewer was already subscribed to. But while this broke the positive feedback loop and gave all channels the opportunity to be seen, it created an environment where content creators had to vie for viewers’ attention with clickbait titles and thumbnails, among other tactics.
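The feedback loop described above can be sketched as a small simulation. It is a deliberately simplified model, not YouTube’s actual algorithm: ten channels start with 100 subscribers each, channel 0 gets a single extra subscriber, and every new viewer subscribes to one channel, following the recommendation half the time and picking a channel at random otherwise. Recommending a random channel stands in for the “similar channels” fix, which the model does not try to reproduce; all names and numbers are hypothetical.

```python
import random

def simulate(recommend, channels=10, viewers=10_000, follow_prob=0.5, seed=7):
    """Hypothetical model: each new viewer subscribes to exactly one channel."""
    rng = random.Random(seed)
    subs = [100] * channels
    subs[0] += 1  # channel 0 starts with a single extra subscriber
    for _ in range(viewers):
        if rng.random() < follow_prob:
            pick = recommend(subs, rng)       # viewer follows the recommendation
        else:
            pick = rng.randrange(channels)    # viewer finds a channel independently
        subs[pick] += 1
    return subs

def recommend_most_subscribed(subs, rng):
    """Feedback loop: always suggest the channel that currently has the most subscribers."""
    return max(range(len(subs)), key=lambda i: subs[i])

def recommend_at_random(subs, rng):
    """Stand-in for spreading suggestions across many channels."""
    return rng.randrange(len(subs))

def top_share(subs):
    return max(subs) / sum(subs)

print(f"most-subscribed recommender: top channel holds {top_share(simulate(recommend_most_subscribed)):.0%}")
print(f"spread-out recommender:      top channel holds {top_share(simulate(recommend_at_random)):.0%}")
```

With the most-subscribed recommender, a one-subscriber head start turns into roughly half of all subscribers. With suggestions spread out, each channel ends up with about a tenth.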
Over time, as new needs arose, the algorithm was modified accordingly, and it is still modified on a regular basis. Among the parameters the algorithm measures to decide how much to suggest a channel or video are view count, number of subscribers, video length, how frequently the channel uploads new videos, and the type of content in the video and how popular that type of content currently is.
In this journey of trying to devise a perfect algorithm, a trend has emerged that, while frustrating for YouTube, is not surprising in the context of Goodhart’s law: YouTubers have been chasing the algorithm. In other words, their focus is no longer solely on catering to viewers but also on catering to the algorithm, so that their videos get noticed. What was supposed to be an effort to create good-quality videos for the benefit of viewers is now also an effort to create the kind of videos the algorithm will highlight.
This means that decisions like how long a video should be, how frequently videos should be uploaded, what titles they should have, and even what they should be about are often based not on what viewers might like or what the creator loves to create (which is vital for creativity), but on what the algorithm prioritizes. If, for example, you tweak the algorithm to highlight dog videos and not cat videos, a content creator who notices will be incentivized to make more dog videos, even if she likes making cat videos.
In the case of YouTube, therefore, once the algorithm starts using a parameter as a measure, that parameter also becomes a target for content creators and ceases being a good measure. This forces YouTube to make more tweaks to the algorithm, until the content creators adapt to those new tweaks as well, and the cycle repeats.
Search Engine Optimization
An algorithm ranking and rewarding content creators isn’t unique to YouTube. In fact, it is a common feature of the internet, and it can also be seen in how search engines like Google decide which web pages turn up in a user’s search results. Search engines’ use of algorithms to select web pages for their results has led content creators to pursue a process called search engine optimization.
Search engine optimization, or SEO, is the process of shaping a web page so that it meets all of a search engine’s criteria for a good web page, making it as visible as possible in search results. This is something I have first-hand experience of, as I do SEO for this blog myself. Some SEO requirements make sense and are obvious. For example, notice that the paragraphs in my blog posts are usually short. This is because short paragraphs make a post more readable, improving its SEO ranking. Similarly, including images and videos in a post also improves its SEO ranking by making it more lively and readable.
Some other requirements, on the other hand, are downright weird. For example, the reason I have included the number 4 in the title of this post is that, for some reason, posts with at least one number in their title attract more readers. Also notice that I have included the word profound in the title. This is because there is a group of words called “power words”, which tempt readers to read a post by arousing emotions. Then there are requirements such as including the focus keyword (which in this post is Goodhart’s law) in the title and meta description, and using it with reasonable frequency in the rest of the post.
Google’s search algorithm uses all these parameters to decide how high in the search results your web page should appear. Other factors in this decision include the number of external web pages that link to yours, the ranking of those pages, and the number of external and internal links on your own page.
While some bloggers use these clues judiciously to improve their blog’s SEO ranking, others get overzealous and make meeting the algorithm’s criteria a higher priority than writing a well-rounded post. Not all my posts, for example, contain numbers in their titles, as I don’t always find the opportunity to include one. And I won’t settle for a subpar title just to include a number. Not every blogger would give up the number to preserve a title’s appeal, though. The same is true of including power words in the title.
There are also ways in which some content creators try to game the algorithm. For example, they might build multiple websites and interlink their pages, to give the algorithm the impression that their posts are frequently cited by other sites. They might also agree with each other to cite each other’s posts without any legitimate reason, for the same purpose.
As soon as Google discovers such attempts at gaming the algorithm, it tries to remedy them by tweaking the algorithm and by taking punitive action against the content creators, such as removing their web pages from the search results. This is a continuous process triggered by Goodhart’s law: the algorithm uses a parameter as a measure for ranking web pages, bloggers make that parameter a target and chase it, Google tweaks the algorithm to use other parameters as measures, bloggers go after those other parameters, and so on.
A phenomenon related to attempts to game the YouTube and Google algorithms is click farming. Click farms are made up of large numbers of hired people whose only task is to like and click on clients’ social media posts, so that the platform’s algorithm, be it Facebook’s, Instagram’s, YouTube’s or another’s, ranks those posts higher and they go viral. This is yet another instance of Goodhart’s law: the number of clicks and likes, used as a measure by social media algorithms, has become the target.
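Why interlinked sites work can be sketched with the textbook PageRank formula, in which a page’s score depends on how many pages link to it and how highly those pages score themselves. Google’s actual ranking algorithm is not public and uses many more signals, so treat this only as a stand-in for the link-based factors mentioned above. The page names, the ten-site farm and the damping factor of 0.85 are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration; `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # a page with no outgoing links spreads its rank over every page
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

def position(rank, page):
    """1-based position of `page` when pages are sorted by rank, highest first."""
    return sorted(rank, key=rank.get, reverse=True).index(page) + 1

honest_web = {
    "news": ["wiki", "blog"],
    "wiki": ["news"],
    "blog": ["wiki", "news"],
    "spam": ["wiki"],  # nobody links to this page
}

farm = [f"farm{i}" for i in range(10)]
farmed_web = dict(honest_web)
farmed_web["spam"] = farm                       # spam now links only to its own sites...
farmed_web.update({f: ["spam"] for f in farm})  # ...and every one of them links back

print(position(pagerank(honest_web), "spam"), "of", len(honest_web))
print(position(pagerank(farmed_web), "spam"), "of", len(farmed_web))
```

In this toy web, the page nobody links to ranks last (4 of 4). Once it is wrapped in ten interlinked sites of its own, it ranks first (1 of 14), without any real site citing it.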
So What Can Be Done?
How to counter the effects of Goodhart’s law is a tricky question. The immediate answer that comes to mind is to not rate anything at all. But obviously, that’s not always practical. You do need to rate things, be it to decide which student should be admitted to a course, which candidate should be hired, which country or business is worth investing in, which researchers deserve to be rewarded for their work with promotions or grants, or which video or web page should be included in an individual’s search results.
One solution is to keep modifying the method of measurement to counter attempts to target the measures. It takes time and effort for people to learn how to target a measure. So if the system of measurement changes frequently enough, it could become too exhausting for them to retrain themselves to target new measures every time. Google and YouTube indeed seem to be adopting this strategy by continuously modifying their algorithms. But whether this will one day lead to algorithms so perfect that they can’t be gamed, or will remain an endless cat-and-mouse game of content creators chasing ever-changing algorithms, remains to be seen.
Recall that in the nail factory example, if the manager rated workers based not on the number or the size of nails alone, but on both, it would, at least in theory, incentivize the workers to produce nails that are the right size and also reasonably numerous. This is analogous to Google and YouTube trying to perfect their algorithms. In practice, however, it’s possible that constraints on material resources, energy and worker enthusiasm will still limit the quality of the nails produced. In fact, it’s likely that people are tempted to make a measure their target when they lose faith in the system’s ability to reward their actual talents and skills.
We can also see cases in which a measure being used to rate individuals has been later discarded by the rating authority. In India, for example, the University Grants Commission (UGC) requires PhD students to publish at least one research article in a peer reviewed journal for the award of PhD. This requirement, which was expected to vet PhD students based on the quality of their research, backfired. It not only led to the proliferation of predatory journals that publish dubious data for a fee, but also caused delays in completion of PhDs by students.
India is now taking steps to combat predatory journals, such as compiling a list of reliable journals and recognizing only literature published in them. China is taking similar steps against predatory journals. But India also seems to have realized that merely narrowing students’ publication options won’t solve the problem unless the pressure to publish is itself reduced. A committee of researchers has now recommended scrapping the publication requirement so that the business of predatory journals loses steam.
Indian regulators are now trying to prioritize the quality of publications over their quantity when rating researchers. To this end, the Consortium for Academic and Research Ethics (CARE) has been founded, which is expected to improve the quality of research in Indian institutions. However, this shift in emphasis from quantity to quality could prove just as useless as the shift from the number of nails to their size in the nail factory example, unless a serious effort is made to actually incentivize researchers to do good-quality research.
Research, by its very nature, involves venturing into uncharted territory. You never know what you might find, if you find anything at all. So expecting every research project to produce a publication in a high-impact-factor journal is like expecting to find a diamond in any random ditch you dig. The quality of research must not be measured by parameters like the impact factor of the journal in which it is published. Instead, a culture of recognizing and rewarding good research practices needs to be instilled in academia.
There is one other way to keep Goodhart’s law from coming into play: hiding the system of measurement from the people whose performance is being measured. A classic example of this approach is a blinded experiment, in which information that might influence the behavior of the experimental subjects is withheld from them. For example, in double-blind clinical trials, neither the doctor nor the patient knows which drug is being administered and in what dose.
This is because numerous psychological biases, like observer bias and confirmation bias, can come into play if they have this knowledge. If the doctor, for example, has a favorable attitude towards the drug, he might unintentionally overlook its poor performance in the trial. The same applies to the patient when asked to report the drug’s effects. A similar approach could be taken in other instances. For example, it could at least in theory be possible to conceal the algorithms used by Google and YouTube. In practice, however, concealment isn’t easy or practical in most cases.
The requirement to publish for a researcher, for example, has little scope for concealment. And the questions asked in exams are also public knowledge, enabling students to focus on them. Moreover, any attempt to conceal the method of rating people is bound to arouse suspicion in people and lead to questioning of the integrity and ethics of the authority carrying out the measurement. In fact, this is a scenario that increases the scope for corruption inside the rating authority, which is why there are rightful demands for greater transparency in institutions.
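The double-blind allocation described earlier can be sketched in a few lines: patients are randomly split between drug and placebo, and doctor and patient see only an opaque kit label, while the key linking labels to arms stays with a third party until all outcomes are recorded. This is a hypothetical illustration of the idea, not how any real trial system works; `BlindedTrial`, `allocate` and the `KIT-` labels are invented names.

```python
import random
from dataclasses import dataclass

@dataclass
class BlindedTrial:
    kit_codes: dict  # patient -> opaque kit label; the only thing doctor and patient see
    key: dict        # kit label -> "drug" or "placebo"; held by a third party until the end

    def unblind(self):
        """Reveal each patient's arm, done only after all outcomes have been recorded."""
        return {patient: self.key[code] for patient, code in self.kit_codes.items()}

def allocate(patients, seed=None):
    rng = random.Random(seed)
    arms = ["drug", "placebo"] * (len(patients) // 2) + ["drug"] * (len(patients) % 2)
    rng.shuffle(arms)                                        # random, balanced assignment
    numbers = rng.sample(range(1000, 10000), len(patients))  # unique, non-sequential labels
    kit_codes = {p: f"KIT-{num}" for p, num in zip(patients, numbers)}
    key = {kit_codes[p]: arm for p, arm in zip(patients, arms)}
    return BlindedTrial(kit_codes, key)

trial = allocate(["p1", "p2", "p3", "p4", "p5", "p6"], seed=3)
print(trial.kit_codes)   # what the doctor sees: labels only
print(trial.unblind())   # what the analyst sees once the trial is over
```

The labels carry no information about the arm, so nothing the doctor or patient sees during the trial can bias how outcomes are judged or reported.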
The ultimate solution to Goodhart’s law, in my opinion, lies not in trying to design a perfect algorithm or changing the parameters being measured. It lies in changing the culture in which people work, and changing people’s mindset. People who have made truly world-changing discoveries, inventions or developments have one thing in common: they have a passion for what they do, and are not interested in chasing algorithms or measures.
Steve Jobs could make Apple what it is today because his objective was not maximizing profits, but designing devices that the world would love and, more importantly, that he himself would love. And Gordon Ramsay is famous the world over not for his love of the profits from his food business, but for his love of cooking good food. Systems of measurement and rating are necessary, but conforming to them must not be overemphasized. There has to be a healthy balance.
The passion for what you do should supersede any perceived need to pursue measures. For kids, the passion for understanding the world around them should supersede the pressure to obtain high marks. For researchers, the passion for knowing the unknown should supersede the pressure to publish. For content creators, the passion for creating informative and entertaining videos and blog posts should supersede the pressure to be recognized by the algorithm.
And for this to happen, many things need to come together. The quality and reach of education need to be improved, because that’s what makes you a responsible and mature individual. Economic growth needs to be strong so that people feel economically secure enough to pursue their passions instead of chasing measures. And finally, the government needs to become more efficient so that people’s faith in it is restored. It helps when you trust the system to appreciate your talent enough that you don’t feel the need to game the rating method.
Goodhart’s law comes into play when you are measuring not passive physical attributes like speed or glucose levels, but the performance of sentient beings who know they are being measured and can game the method of measurement to their benefit. And ultimately, it is these sentient beings alone who can counter Goodhart’s law by resolving to change their ways.
I agree entirely that the use of metrics to evaluate people or journals is invalidated by Goodhart’s law (among other things).
But you say that “The tradition of publishing your research in a journal goes back many decades. The modern system of publishing in commercial journals began in 1950s with Robert Maxwell”. I think that publishing started in 1665, with Philosophical Transactions (of the Royal Society). Maxwell certainly exploited the post-war expansion of universities pretty ruthlessly, but I’d maintain that the pressure to publish or perish started in universities. Publishers exploited that to make unreasonable amounts of money.
That’s a valid point.
Great article, excellent examples!
Thank you