Does gender diversity really boost financial performance?

2 Nov 2023 | A fact is not data, A statement is not fact, Confirmation bias, Data is not evidence

Today, BlackRock released a study, ‘Lifting Financial Performance By Investing in Women’, which has already been covered by the Financial Times (‘BlackRock study finds gender-balanced companies outperform peers’), Investment Week (‘BlackRock study finds companies with more women in workforce outperform rivals’), and City AM (‘Blackrock report finds ‘sweet spot’ for gender diversity’). It has also been widely shared across social media, including by some universities.

The question BlackRock is asking is an important one. If indeed investing in women has a positive causal effect on financial performance, companies should urgently increase their gender diversity, investors should use gender diversity as a key investment criterion, and regulators might consider mandating gender diversity (if they think that companies don’t understand its benefits). Particularly laudable is how BlackRock goes beyond just boardroom diversity to study diversity throughout the company, and at many different levels – the wider workforce, middle management, and senior management. Their conclusions are what most people – including me, given my own research on employee satisfaction in general and DEI in particular — would like to be true, which may explain why it’s already been widely shared.

But does the analysis actually support the claims? An academic study undergoes peer review, where world-leading experts scrutinise the manuscript. But studies by companies never go through peer review, because publishing in academic journals isn’t their objective. This doesn’t mean that they’re definitely unreliable, but just that we have no way of knowing without a rigorous review.

The purpose of this post is to provide a review to help readers understand what they can take away from the study. Unfortunately, the answer is: almost nothing. The study makes fundamental errors, such as using dubious measures of financial performance (and switching between them, perhaps cherry-picking the ones that work), using dubious measures of gender diversity (and switching between them), and omitting basic controls. I will go through its main claims in turn.

1. An intermediate ‘sweet spot’ of diversity maximises financial performance

Measuring financial performance

Chart 1 divides companies into quintiles by gender diversity, and finds that Quintile 3 (the middle quintile) performs the best. Thus, neither very low nor very high diversity is optimal; instead, the optimal level is in the middle. This is arguably the most novel result of the paper, since other studies claimed to find that gender diversity always boosts financial performance without limit. BlackRock lists it as #1 of its key findings, and the Financial Times and City AM headlines refer to this result. I too would love to believe this result, given my work on how ESG is beneficial but only up to a point, like any other long-term investment.

But the measure of performance is Return on Assets. This is odd, since BlackRock is an investor, and the most relevant measure of performance to an investor is (long-term) Total Shareholder Return. TSR is what investors actually receive; BlackRock would never market its funds on their RoA, only on their TSR. TSR is also far more comprehensive than RoA. If a company announces a new patent or wins a big customer contract, it will boost the stock price but not immediately lift RoA.

Still, accounting performance may be of interest in its own right. But there are so many different ways to measure accounting performance that relying on a single measure is highly dubious. In my own research on diversity, equity, and inclusion, we used eight different measures: not just RoA but also return on equity, return on sales, return on employees (which is particularly relevant if diversity increases employee productivity), earnings per share growth, profit growth, sales growth, and sales per employee.

Inadequate controls

This headline result simply relates gender diversity to financial performance with no controls. Chart 4 looks within an industry and country, i.e. Apple would be put into a quintile according to its gender diversity compared to other U.S. tech firms. The authors write ‘The linkages between workforce diversity and corporate performance remain robust, even when controlling for country- and industry-specific factors’ (emphasis in original).

This claim is false. The results do not remain robust. The ‘sweet spot’ result — the most novel result — is totally gone. Indeed, Quintile 3 now performs the second worst out of the 5 quintiles. The top performer is Quintile 5.

Could BlackRock drop all their ‘sweet spot’ results and instead sell their paper as showing that very high diversity performs the best, including only their analysis controlling for industry and country? No, because even this analysis is flawed. First, in Table 1 the statistical significance is weak: 10%, compared to the normal 5% threshold. (Table 1 is the only table in the whole paper, even though regressions are needed to show statistical significance, i.e. to ensure that differences are large enough not to be caused by luck.) Second, it only controls for industry and country; it omits the very many firm-level factors that might drive RoA. Examples are firm age (older firms may be able to invest more in diversity, and older firms tend to be more profitable), value vs growth (similar reasons), corporate governance (better governed firms may have more diversity and better financial performance), and so on.
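To see why omitted firm-level controls matter, here is a minimal simulation sketch. It uses made-up data, not BlackRock's: a hypothetical firm-level factor (called `quality` here, standing in for something like governance) drives both diversity and RoA, while diversity has no causal effect on RoA at all. All names and numbers are assumptions for illustration.

```python
import random

def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def residuals(x, y):
    """Residuals from regressing y on x (with an intercept)."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]

random.seed(0)
n = 5000
# Hypothetical latent firm-level factor that affects both variables.
quality = [random.gauss(0, 1) for _ in range(n)]
# Diversity and RoA each depend on quality plus noise; diversity does NOT cause RoA.
diversity = [q + random.gauss(0, 1) for q in quality]
roa = [q + random.gauss(0, 1) for q in quality]

# Regression with no controls: picks up the shared dependence on quality.
naive = slope(diversity, roa)
# Controlling for quality: strip quality out of both variables, then regress.
controlled = slope(residuals(quality, diversity), residuals(quality, roa))
print(f"no controls: {naive:.2f}, controlling for quality: {controlled:.2f}")
```

In this simulated setup the uncontrolled slope should come out clearly positive (around 0.5), while the controlled slope should be close to zero. A positive link without firm-level controls therefore says little about whether diversity itself drives performance.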

Measuring gender diversity

Table 1 runs a second analysis which finds that the link is stronger if you focus only on companies with below 50% gender diversity — i.e. where the ‘sweet spot’ is 50% representation. This does not address the problem of inadequate controls. Moreover, the authors have suddenly switched the ‘sweet spot’ from Quintile 3 (i.e. 40%–60% within the range of firms’ actual diversity) to 50%. If firms’ gender diversity ranged uniformly from 0% to 50%, Quintile 3 would have diversity of 20–30%, so the ‘sweet spot’ would be 20–30%; now it is suddenly 50%. There is no reason for this sudden shift, and the reader wonders if 50% has been cherry-picked.
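The arithmetic in that example can be checked with a short sketch. It assumes, purely for illustration, that diversity is spread uniformly across the stated range; the helper `quintile_bounds` is hypothetical and not something from the BlackRock paper.

```python
def quintile_bounds(low, high, quintile):
    """Return the (lower, upper) diversity bounds of a quintile,
    assuming diversity is uniformly distributed between low and high."""
    width = (high - low) / 5
    return (low + (quintile - 1) * width, low + quintile * width)

# Hypothetical range: firms' diversity spread uniformly from 0% to 50%.
lower, upper = quintile_bounds(0.0, 50.0, 3)
print(f"Quintile 3 covers {lower:.0f}% to {upper:.0f}%")  # Quintile 3 covers 20% to 30%
```

Under this assumption the middle quintile sits well below 50% representation, so a ‘sweet spot’ defined by Quintile 3 and one defined by 50% are not the same thing.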

2. Aligning diversity in middle management with firm-wide diversity improves financial performance

This result is claimed by Chart 7, which studies diversity in middle management. It finds that diversity in middle management should not be too high or too low, but aligned with the firm as a whole.

Measuring financial performance

The authors suddenly switch from measuring financial performance by RoA to alpha: TSR compared to benchmarks (the world market, a size factor, and a value factor). They offer no explanation for such a switch.

Measuring gender diversity

The “sweet spot” now switches a third time. In Chart 1 it was “relative to peers”: being in Quintile 3 compared to peers. In Table 1 it was “absolute”: 50%. Now it is “relative to yourself” – compared to the rest of the workforce. Again, there is no explanation for such a switch.

In addition to being inconsistent, the measure defies basic common sense. A company with zero diversity in both its wider workforce and its middle management would be top of the league for alignment.
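A small sketch makes the problem concrete. The firms and percentages below are invented, and `alignment_gap` is a hypothetical stand-in for an alignment measure based on the gap between middle management and the wider workforce; the paper's exact formula may differ.

```python
def alignment_gap(middle_mgmt_pct, workforce_pct):
    """Unsigned gap between the share of women in middle management and in
    the wider workforce; a smaller gap means 'more aligned'."""
    return abs(middle_mgmt_pct - workforce_pct)

# Invented firms: (share of women in middle management, share in wider workforce).
firms = {
    "no women at any level": (0.0, 0.0),
    "balanced workforce, slightly fewer women managers": (45.0, 50.0),
}
gaps = {name: alignment_gap(*shares) for name, shares in firms.items()}
best = min(gaps, key=gaps.get)
print(best)  # no women at any level
```

On this measure, the firm with no women at all ranks as better aligned than the firm with a nearly balanced management team.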

Inadequate Controls

There are no controls at all – not at the country, industry, or firm level. Coal mining is male dominated at the ground level (and thus likely unaligned at middle management), and has performed poorly. The previously-mentioned firm-level factors could also drive both diversity and performance.

3. Companies that Promote Women to Higher Ranks Outperform

Chart 8 moves from studying middle management to senior management, and finds that companies that promote women perform better.

Measuring Financial Performance

The financial performance measure switches for a third time. It is now alpha compared to the world market alone, and not the size factor or value factor.

Measuring Gender Diversity

While the headline of the section (“Companies that promote women tend to show higher returns”) suggests a simple diversity measure, the small-print below Chart 8 says that “The hypothetical long-short portfolio is created by optimizing the MSCI World Index on the women promotion score subject to 100 bps ex-ante tracking error, industry and full investment constraints”. The methodology is very different from Chart 7, which simply buys high-aligned companies and sells low-aligned companies, period – there is no muddying of the strategy with tracking error, industry and full investment constraints (none of which are described in detail). The Chart 8 strategy does not simply buy high-promotion firms and sell low-promotion firms, perhaps because this does not give the results the authors want, so they overlay the promotion criterion with other criteria.

Inadequate Controls

There are none.

4. Closing Women’s Underrepresentation At Higher Ranks is Associated With Lower Employee Turnover Rates

Chart 9 shows that, if women’s underrepresentation at senior levels is closed, then employee turnover falls.

Measuring Performance

The performance measure switches for a fourth time. Now it is no longer financial performance, but employee turnover. It is not clear why the authors are no longer studying financial performance. All the prior arguments for why diversity improves financial performance also apply to closing gaps.

Measuring Gender Diversity

The “sweet spot” changes for a fourth time. It seems that they are back to the “relative to yourself” measure, but they are not. In Chart 7, they looked at the unsigned difference between diversity in middle management and in the wider workforce. If the former were 20% and the latter were 25%, the difference is 5%; if the former were 25% and the latter were 20%, the difference is also 5% – both cases are treated the same. Now it is the signed difference, in which case the first gap would be -5%, not 5% – the two cases are treated differently.
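The distinction between the two measures can be sketched in a few lines. The function names and percentages are illustrative assumptions, chosen so that the two firms have gaps of equal size but opposite sign.

```python
def unsigned_gap(level_pct, workforce_pct):
    """Size of the gap only: under- and over-representation look identical."""
    return abs(level_pct - workforce_pct)

def signed_gap(level_pct, workforce_pct):
    """Negative when women are under-represented at this level, positive when over-represented."""
    return level_pct - workforce_pct

# Illustrative firms: (share of women at the management level, share in the wider workforce).
cases = {"under-represented": (20, 25), "over-represented": (25, 20)}
for label, (level, workforce) in cases.items():
    print(label, unsigned_gap(level, workforce), signed_gap(level, workforce))
# under-represented 5 -5
# over-represented 5 5
```

The unsigned measure scores both firms as 5, whereas the signed measure separates them, so results based on one are not directly comparable with results based on the other.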

Inadequate Controls

They now suddenly add industry and country controls, as well as firm size. But they miss many other firm-level controls. In addition to the ones mentioned above, profitability is now a relevant control, since profitability is no longer the performance measure. More profitable firms may promote more women and also have lower employee turnover.

5. Women-Friendly Workplaces Help Boost Performance

I was particularly excited by this result, since it moves beyond demographic diversity to culture, which I have studied myself. Chart 12 finds that you beat the market by buying companies where employees take long maternity leave and selling companies where employees take short maternity leave. (The mysterious tracking error and other constraints have disappeared again.)

Measuring Financial Performance

The performance measure shifts for a fifth time, because now it is alpha over the Russell 1000 index, not the MSCI World. The size and value factors are still missing.

Measuring Women-Friendly Workplaces

The headline refers to women-friendly workplaces, which depend on very many factors – not just policies such as maternity leave, but behaviours. I appreciate that behaviours are hard to measure, but then the headline should not refer to workplaces when the analysis studies a single narrow measure. Moreover, if parental leave is relevant, it is not clear why the study focuses only on maternity leave. Indeed, one barrier to the advancement of women is that women often take their full maternity leave but men don’t take their paternity leave, which both advances the man’s own career and hinders his partner’s career, as she takes a disproportionate share of parental duties. Including paternity leave taken would lead to a more comprehensive measure.

Inadequate Controls

There are none. Well-performing firms can both offer more parental leave and also deliver higher shareholder returns.

6. Further Cherry-Picking

The authors buttress their own results by claiming that they are consistent with other studies. However, these papers are also cherry-picked. The vast majority of them are not published in any peer-reviewed journals. Indeed, the academic consensus (including work by strong diversity advocates) is that the link is mixed or negative. This includes several papers published in the most elite peer-reviewed journals, but all are ignored.

The Big Picture

Unfortunately, exaggerated claims about diversity are not new. I previously wrote about how the widely circulated McKinsey studies on diversity are very weak, as was a study by a regulator. However, this does not mean that diversity initiatives have no value; just that they should be pursued for reasons other than “do it to make money”:

The misrepresentation of the business case for diversity is particularly disappointing since it may be that no business case is needed at all. Even without a business case for diversity, there are strong moral and ethical cases. Some people argue that you should choose the best person for the job, regardless of characteristics. However, others believe that, due to systemic and chronic discrimination against minorities, companies have a role to play in levelling up by actively recruiting under-represented groups. Perhaps doing so might not maximise profits, but many shareholders and stakeholders are willing to accept that trade-off — just as consumers buy organic food, despite its greater cost, due to non-financial considerations.

Moreover, study-based arguments for diversity are problematic because they relegate dimensions of diversity for which no study exists. I know of no rigorous evidence on the business case for hiring people with disabilities, but again there is a strong moral and ethical case. Making more money is not the only reason to pursue an initiative.

It also highlights the danger in taking research at face value, particularly if it claims a conclusion we want to be true and our confirmation bias is at play. Newspapers should not write about a study without scrutinising it first (or asking the opinion of experts in research), otherwise they spread misinformation. I previously wrote a simple guide that readers can use to discern whether a study is reliable before writing about it or sharing it.
