Discernment matters even more

12 Mar 2024 | A statement is not fact, Data is not evidence, Diversity

In 2015, 2018, and 2020, McKinsey released a trio of papers claiming that diversity has a positive causal impact on firm performance, titled “Diversity Matters”, “Delivering Through Diversity”, and “Diversity Wins”. These studies make basic errors, as highlighted by Green and Hand (2021) and others, yet were widely cited – perhaps due to confirmation bias.

McKinsey have now doubled down in their latest (November 2023) report, entitled “Diversity Matters Even More”. They claim that “the business case is the strongest it has been since we’ve been tracking”, that their new report presents “The most compelling business case yet”, and that “The business case for gender diversity on executive teams has more than doubled over the past decade.” Not only is the claimed performance boost from diversity greater than ever before, but the dataset is larger than ever before, suggesting the reader should have even greater confidence in these results: “we drew on our largest dataset yet – spanning 1,265 companies, 23 countries, and six global regions”.

But as is well known, “garbage in, garbage out”. It doesn’t matter how many numbers you crunch – if the methodology is flawed, the results are meaningless. McKinsey make the same basic errors that they did in all their earlier papers, despite the numerous problems having been pointed out, including by other researchers, and covered in non-academic outlets such as the Wall Street Journal. In addition, McKinsey almost certainly know of the Green-Hand critique, as Green and Hand called one of the authors of the new paper to ask questions about the earlier studies while trying to replicate them. In their new paper, McKinsey repeatedly justify their methodology by appealing to consistency with their other reports. But entrenching yourself in the status quo is the opposite of diversity, which is about learning from different viewpoints.

Another aspect of diversity is to form diverse teams that can bring dissenting perspectives. But the six-person research team was composed solely of women, the majority of whom are also ethnic minorities. Due to confirmation bias, they might want to find that diversity matters (just as I would, being an ethnic minority myself) and thus turn a blind eye to the glaring errors. In addition, none has a PhD in economics, finance, or any business discipline, which is a basic qualification for doing scientific research. McKinsey is a premier consulting firm, but that is very different from having expertise in scientific research. The team composition makes it more likely that the study is advocacy, rather than scientific research.

The purpose of this post is to highlight the flaws in the new study to help readers know what they can take away from it. Note that absence of evidence is not evidence of absence. As I have written many times elsewhere, that one particular study on diversity is problematic does not mean that there is definitely no business case for diversity (more careful studies might uncover one); in addition, even if there is no business case, there may still be a moral and ethical case. But it is neither moral nor ethical to parade flimsy studies as “the most compelling business case yet”, in particular when the flaws have already been highlighted.

1. Correlating Diversity in 2022 With Financial Performance in 2017-21

Many diversity studies use dubious measures of financial performance, throw away data, and have inadequate controls. This study has all of these problems, as I will shortly describe. But the McKinsey study makes an even more basic error absent from the other studies: they measure diversity after they measure financial performance! In their own words, “The analysis of this report is based on 2022 data on diversity in leadership teams and 2017-2021 data on financial performance”.

This makes it very likely that any relationship is due to reverse causality: it is financial performance that allows companies to invest in diversity, rather than diversity causing financial performance. (Indeed, my own work finds that financial strength is associated with superior future diversity, equity, and inclusion).
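The reverse-causality concern can be illustrated with a small simulation. This is a hypothetical sketch, not McKinsey's data or method: the function names, the 0.5 coefficient, and the sample size are all assumptions chosen for illustration. By construction, diversity has no effect on performance; past performance simply feeds into later diversity.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient, written out to avoid dependencies."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_reverse_causality(n_firms=5000, seed=0):
    """Correlate later diversity with earlier performance in a made-up world.

    Assumption: performance in the earlier period is pure noise, and
    diversity measured afterwards is partly funded by that performance.
    Diversity never enters the performance equation.
    """
    rng = random.Random(seed)
    past_performance = [rng.gauss(0, 1) for _ in range(n_firms)]
    later_diversity = [0.5 * p + rng.gauss(0, 1) for p in past_performance]
    return pearson(later_diversity, past_performance)


if __name__ == "__main__":
    r = simulate_reverse_causality()
    print(f"correlation between later diversity and earlier performance: {r:.2f}")
```

The correlation comes out clearly positive even though the causal effect of diversity on performance is zero in this toy setup, which is why measuring diversity after performance cannot support a claim that diversity drives performance.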

In a box, McKinsey claim they have also conducted an “analysis of synchronous data”, but they never display the results of this analysis, and it is still subject to reverse causality concerns: financial performance in 2022 could drive diversity in 2022. Worryingly, they write that, for this additional analysis, “for ethnicity, we limited them to analyses with years that had shown a statistical significance in our baseline scenario”. In other words, they cherry-picked the years that were particularly likely to lead to a significant result.

2. Inappropriate Measure of Financial Performance

The paper is remarkably non-transparent about its methodology. The body of the paper never describes the sample of firms included in the study, what their dependent and independent variables are, and so on. This may be to stop people replicating their study as their prior research was found to be irreplicable. It is as if they hope that people will accept the results because they want them to be true, and not ask any questions (again, the opposite of diversity of thought).

The paper repeatedly refers to “financial outperformance”, but never explains the financial performance measure used until the Appendix, on p47 of the 52-page study. Aside from footnotes, it refers to “profitability” only once in passing on p13, which is inadequate as there are many measures of profitability. Only on p47 do we learn that financial performance is measured by profitability, which in turn is measured by EBIT alone. This is problematic, because there are very many ways to measure profitability (gross profit margin, EBITDA, return on equity, return on assets). McKinsey’s earlier results were previously shown to be untrue for all of these alternative profitability measures, leading to concerns about cherry picking the one measure that worked. Given these concerns, it is important to show robustness to alternative profitability measures in this new study – but they don’t.

Moreover, it is not clear whether you should be measuring profitability at all. The most relevant performance is (long-term) Total Shareholder Return. TSR is what investors actually receive. TSR is far more comprehensive than EBIT (or any profitability measure). If a company announces a new patent or wins a big customer contract, it will boost the stock price but not immediately lift EBIT. More importantly, TSR is forward looking. Many tech companies have enjoyed soaring TSR despite modest profits due to their long-term potential.

3. Throwing Away Data

The standard way to investigate a relationship is to run a regression: to relate a company’s actual level of diversity to its actual level of performance. But instead, McKinsey only considers the link between diversity and whether profitability is above or below average. Their headline result is that increasing diversity (either gender or ethnic) from the bottom to the top quartile raises the likelihood of above-average profitability by 39%. If profitability is evenly distributed on a scale of 0 to 10, diversity will “count” if it increases profitability from 4.5 to 5.5, but not if it increases it from 6 to 10 or 0 to 4.

This is odd, since diversity’s supposed benefits are to avoid the disasters associated with groupthink (which might lead to profitability of 0) or harness innovation (which might lead to profitability of 10). Their methodology throws away the actual improvement in performance, and only considers whether performance rises from below to above average. But do we only care about being above average? McKinsey’s mission statement is “to help our clients make distinctive, lasting, and substantial improvements in their performance” (emphasis added), not just help them be above average. The average marathon finish time is 4:21 for a man and 4:49 for a woman, but a running shoe or a training programme would not advertise itself on increasing the likelihood that you finish below 4:21 or 4:49. In the UK, the average GCSE (exam taken at 16) grade is 4.78. A school rarely advertises itself by saying it will help you get a grade of 5 or more. Many kids want grades of 8 or 9.
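The 0-to-10 illustration can be made concrete in a few lines. This is a minimal sketch under the same stylised assumption as the example in the text (profitability evenly spread from 0 to 10, so the average is 5); the function names are hypothetical and not taken from the report.

```python
def above_average(profitability, average=5.0):
    """Binary outcome of the kind used in an above/below-average analysis."""
    return profitability > average


def compare(before, after, average=5.0):
    """Return (actual gain, whether the binary measure registers the change)."""
    gain = after - before
    crossed = above_average(after, average) and not above_average(before, average)
    return gain, crossed


if __name__ == "__main__":
    # The three moves discussed in the text: a small one that crosses the
    # average, and two large ones that stay on the same side of it.
    for before, after in [(4.5, 5.5), (6.0, 10.0), (0.0, 4.0)]:
        gain, crossed = compare(before, after)
        print(f"{before} -> {after}: gain {gain}, counted by binary measure: {crossed}")
```

Only the 4.5 to 5.5 move is counted, even though its gain of 1 is a quarter of the gain in each of the other two cases. A regression on the actual level of profitability would use all three improvements.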

This methodology is particularly bizarre since, in the second half of the paper, McKinsey switches towards studying the “impact” on environmental and social (ES) performance. They state that “a 10% increase in ethnic representation is associated with a rise of nearly 4 points in climate-strategy scores”. This analysis correctly considers the magnitude of the increase, not just the likelihood of being above average. But, for their main analysis on profitability, they ignore it, perhaps because the results don’t work.

A separate concern about the ES analysis is that they use scores from a single data provider. This is despite another McKinsey article stating that “[Investors] understand that ESG scores today, unlike financial ratings, don’t correlate fully among ESG score providers. While financial ratings correlate at around 99 percent among providers, ESG ratings can correlate at less than 60 percent because of the different elements and weighting each agency assigns to various ESG metrics.” This highlights the importance of showing robustness to different measures, but just like the profitability result, they don’t do it.

4. Inadequate Controls

As is well known, correlation doesn’t imply causation. On p49, in the Appendix, the authors write “Correlation is not causation, and we are not asserting causal links.” This is not true. They assert causal links throughout the paper, claiming that “Diversity Matters Even More” and repeatedly referring to the “impact” of diversity.

One reason why correlation doesn’t imply causation is reverse causality: financial performance allows companies to invest in diversity, rather than diversity improving financial performance. This is particularly likely given the incorrect timing of their measures, and has been covered by point 1 above. The second reason is omitted variables: factors that drive both diversity and financial performance. The authors only control for industry and region. However, firm size, firm age, growth opportunities, corporate governance, and a whole host of other variables may jointly determine both: for example, good governance might improve both diversity and firm performance. Given the plethora of other McKinsey studies claiming to identify several variables that determine financial performance, it is surprising that they do not control for them.
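The omitted-variables problem can also be sketched with simulated data. Everything here is an assumption for illustration: a single hypothetical "governance" factor drives both diversity and performance with a made-up coefficient of 0.6, and diversity has no direct effect on performance. Controlling for governance is done by correlating the residuals left after regressing each variable on it.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """Residuals from a one-variable least-squares regression of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


def simulate_omitted_variable(n_firms=5000, seed=1):
    """Return (raw correlation, correlation after controlling for governance)."""
    rng = random.Random(seed)
    governance = [rng.gauss(0, 1) for _ in range(n_firms)]
    # Assumed data-generating process: governance drives both variables.
    diversity = [0.6 * g + rng.gauss(0, 1) for g in governance]
    performance = [0.6 * g + rng.gauss(0, 1) for g in governance]
    raw = pearson(diversity, performance)
    controlled = pearson(residuals(diversity, governance),
                         residuals(performance, governance))
    return raw, controlled


if __name__ == "__main__":
    raw, controlled = simulate_omitted_variable()
    print(f"raw correlation: {raw:.2f}, after controlling for governance: {controlled:.2f}")
```

In this toy setup the raw correlation is clearly positive, while the correlation after removing the influence of governance is close to zero. A study that controls only for industry and region cannot rule out this kind of explanation.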

The Big Picture

While flimsy studies on the supposed benefits of diversity are not new, this study is particularly concerning because the flaws in the McKinsey methodology have already been clearly pointed out. Rather than responding to the concerns by improving the methodology (or explaining why the concerns are invalid and justifying the methodology), McKinsey have gone in the opposite direction, which is to obfuscate the methodology to try to hide the flaws.

This paper, like all the prior ones, has had a substantial impact despite its flaws, potentially because people want the findings to be true. Most readers should be able to spot the basic mistakes I have highlighted here; none of them require an advanced knowledge of statistics, only common sense. If they see a result that they don’t want, they will argue that “correlation is not causation”, so they should apply the same discernment to results that they do want. But even if readers are unable to do so, they can perform the simple checks in Part III of May Contain Lies:

1. What are the credentials of the authors? They unfortunately have very few credentials in economics or business research. They are likely superb management consultants and business leaders (one was the Managing Partner of McKinsey UK & Ireland) but that is different from being experts in conducting scientific research.

2. What is the potential bias of the authors? They may have a significant interest in concluding that diversity matters, not only for personal reasons but because this result is great for McKinsey’s reputation. In a similar vein, I suggest asking the question: “Would the authors have published the paper if it had found the opposite result?” Almost certainly, the answer is No. I very rarely see consultancies release reports finding that diversity has no or a negative correlation with performance. Not because this is never the case, since academic research has repeatedly found that it is, but because such a conclusion would not help their reputation.
