Are we underestimating COVID-19 racial disparities?
Calculating COVID-19 racial disparities is complicated. Here’s how to do it right.
The figures have been stark and shocking: from fairly early in the COVID-19 pandemic, it’s been reported that Black people in the United States are dying of COVID-19 at twice the rate of White people. Commentators have been keen to point out the longstanding racial inequities that may lie behind these statistics. Others have tried to disentangle the various factors behind the disparity. But how accurately do the headlines portray what’s really going on?
We need to take great care when analyzing mortality data; otherwise, we may draw the wrong conclusions from it. Might statements about COVID-19 death rates across the U.S. population as a whole mask the reality in the hardest-hit populations? And what adjustments should we be making to get a more accurate picture of the risks to which racial minorities are exposed?
To understand the relative risk of death from a disease, you have to consider the racial make-up of people living in the areas most affected, as well as whether different racial groups skew older or younger. Doing this gives a different picture of the disparities in COVID-19 death rates between racial groups. This has implications for health experts, policymakers and program implementers, because the figures are even worse than the headlines have reported. Let’s dig a little deeper into the numbers.
To trust, you must adjust (for population)
For our analysis, we’ve used the CDC’s data on COVID-19 deaths across the U.S., disaggregated by race and age group, as of August 26, 2020. If we look at the distribution of COVID-19 deaths purely by race (Figure 1), we see that White, Black and Hispanic people account for around 52%, 22% and 20% of these deaths, respectively.
Can we therefore conclude that White people are more at risk of dying from COVID-19? Certainly not. As a first step, we must adjust these figures to account for the racial make-up of the entire U.S. population. Figure 2 shows how to do that.
Figure 2 shows the number of deaths per 100,000 people for each racial group, and allows us to calculate relative risk — in other words, how much more likely members of any racial group are to die from COVID-19, compared with a reference group (in this case, White people). Across the entire U.S. population, the risk for a Black person of dying from COVID-19 is 2.03 times the risk for a White person, and for Hispanic people it’s 1.30 times the risk. These are the statistics that have provoked many of the headlines about racial inequity in COVID-19 mortality.
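To make the two quantities in Figure 2 concrete, here is a minimal sketch of how deaths per 100,000 and relative risk are computed. The function names and the counts are invented for illustration; they are not the CDC figures behind the chart.

```python
def deaths_per_100k(deaths, population):
    """Crude death rate, expressed per 100,000 people."""
    return deaths * 100_000 / population


def relative_risk(group_rate, reference_rate):
    """How many times the reference group's rate a group's rate is."""
    return group_rate / reference_rate


# Invented counts, NOT CDC data: a reference group and a comparison group.
reference_rate = deaths_per_100k(deaths=500, population=1_000_000)
comparison_rate = deaths_per_100k(deaths=200, population=200_000)

rr = relative_risk(comparison_rate, reference_rate)
print(f"reference: {reference_rate:.1f} per 100k, "
      f"comparison: {comparison_rate:.1f} per 100k, relative risk: {rr:.2f}")
```

With these toy counts the comparison group has fewer deaths in absolute terms but twice the death rate, which is why raw shares of deaths cannot be read as risk.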
So now we have an accurate picture of the situation faced by racial minorities around the country, right? Well, not quite. Those relative risks are correct for the U.S. as a whole — but COVID-19 hasn’t affected all parts of the U.S. equally, and it hasn’t affected all age groups equally, either. The nationwide picture is not the whole story; we’re going to have to dig deeper and take a more contextualized view of the data.
Don’t forget to correct for geographic distribution
The CDC’s data come with an explicit warning not to compare the racial distribution of COVID-19 deaths with the racial distribution of the U.S. population “because COVID-19 deaths are concentrated in certain geographic locations where the racial and ethnic population distribution differs from that of the United States overall.” So, if we account for the fact that the virus has hit certain regions harder than others, how does that affect the relative risk figures by race?
When we recalculate the racial make-up of the population, we assign greater weight to counties that have experienced higher numbers of COVID-19 deaths. And, just like the CDC, we find that the weighted population share of White people is lower than their overall share in the U.S. population, whereas that of Black people, Hispanics and Asians is higher (Figure 3). This reflects the fact that COVID-19 deaths are occurring in areas with disproportionately high populations of Black, Hispanic and Asian people.
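The weighting described above can be sketched roughly as follows. This is one plausible reading of the approach, in which each county's racial make-up is averaged using its COVID-19 death count as the weight; the exact scheme behind Figure 3 is not spelled out in this post. The two counties, their field names and all numbers are hypothetical.

```python
def average_share(counties, weight_key):
    """Average each group's county-level population share, weighted by `weight_key`."""
    total_weight = sum(county[weight_key] for county in counties)
    groups = counties[0]["share"].keys()
    return {
        group: sum(c[weight_key] * c["share"][group] for c in counties) / total_weight
        for group in groups
    }


# Hypothetical counties, NOT real data: one hard-hit and diverse, one lightly hit.
counties = [
    {"population": 1_000_000, "covid_deaths": 900,
     "share": {"White": 0.30, "Black": 0.30, "Hispanic": 0.40}},
    {"population": 3_000_000, "covid_deaths": 100,
     "share": {"White": 0.80, "Black": 0.10, "Hispanic": 0.10}},
]

# Weighting by population gives the ordinary overall make-up.
overall = average_share(counties, "population")
# Weighting by deaths gives the make-up of the places where deaths occurred.
death_weighted = average_share(counties, "covid_deaths")

for group in overall:
    print(f"{group}: overall {overall[group]:.1%}, death-weighted {death_weighted[group]:.1%}")
```

In this toy setup the White share falls and the Black and Hispanic shares rise once the hard-hit county counts for more, which is the same direction of change the post reports for the real data.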
This adjustment leads to a conclusion that may seem surprising. Using this weighted population estimate, we don’t see a large racial disparity in COVID-19 deaths between Black people and White people. How do we figure that? Figure 3 shows that Black people have a weighted population distribution of 15.7% but account for 21.6% of COVID-19 deaths. Dividing the share of deaths by the share of population gives a multiplier of 1.37, so they are indeed at disproportionate risk of dying from COVID-19. On the other hand, the weighted population distribution of White people is 40.9%, and they account for 51.5% of deaths, a multiplier of 1.26, meaning that they’re also at higher risk than their population numbers would suggest. The multiplier for Black people is 1.09 times that for White people, which tells us that on this measure, Black people are “only” 9% more likely to die of COVID-19 than White people. That is a far cry from the 2.03 times as likely we saw before adjusting for geography. The weighted risk of Hispanic people is 0.52 relative to White people (compared with the unweighted risk of 1.29), meaning that on this measure Hispanic people are about half as likely to die of COVID-19 as White people.
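The multiplier arithmetic in this paragraph can be checked with a few lines of Python. The percentages are the ones quoted from Figure 3; the function name is ours. Because the quoted percentages are rounded to one decimal, the result can differ from the published multipliers in the last digit (the Black multiplier comes out nearer 1.38 than 1.37 here).

```python
def multiplier(death_share_pct, population_share_pct):
    """Share of deaths divided by (weighted) share of population."""
    return death_share_pct / population_share_pct


# Percentages as quoted in the text for the geography-weighted comparison.
black = multiplier(death_share_pct=21.6, population_share_pct=15.7)
white = multiplier(death_share_pct=51.5, population_share_pct=40.9)

# Relative risk of Black people versus White people on this measure.
black_vs_white = black / white

print(f"Black multiplier: {black:.2f}")
print(f"White multiplier: {white:.2f}")
print(f"Black relative to White: {black_vs_white:.2f}")  # about 1.09
```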
It’s important to remember here the distinction between presenting data and interpreting it. When we take into account the population make-up by race in the places where the virus has struck hardest, the relative risk of dying of COVID-19 is similar for Black and White people. But that does not mean that they are at risk of dying for the same reasons. It’s entirely possible that racial inequity contributes to some portion of the deaths among Black people, and that other factors (such as age) are at play for deaths among White people. More on that at the end of this post, but for now, let’s turn our attention to the question of how age distribution affects relative risk.
Age matters, too
So far in our calculations, we’ve made no adjustments for age. But it’s clear that COVID-19 affects people of different age groups differently. While 58% of the U.S. population is aged 44 or under, only 3% of all COVID-19 deaths occur in this age group (Figure 4a). By contrast, those aged 85 and older, who make up a mere 2% of the population, account for 31% of the deaths. Figure 4b shows how drastically the death rate increases with age: nearly 1% of the entire U.S. population aged 85 and older has perished from COVID-19.
Differences in death rates between Black, Hispanic, and White people may occur because their age distributions are different. When we break down the data by age (unweighted for where the virus has hit), as in Figure 5a, we see that across all age groups, both Black people and Hispanic people have higher deaths per capita from COVID-19 than White people, and correspondingly higher relative risk of dying from COVID-19 (Figure 5b). Not only that, but relative risk increases sharply as age decreases. Across the U.S. population as a whole, in the highly at-risk 85+ age group, Black people are twice as likely to die as White people, and Hispanic people are 1.6 times as likely. This is striking enough, but among those aged 44 and under (the lowest-risk population segment), both Black and Hispanic people are 7.3 times as likely to die as White people. Racial inequality in health outcomes is most pronounced in the younger age groups.
But wait a minute: didn’t we just say that Black people were only 1.09 times as likely to die of COVID-19 as White people, and Hispanics only half as likely? Considering only geography, that was true. But when age is considered, we see that across the country as a whole, in every age group, Black people and Hispanic people are more likely to die than White people: anywhere from 1.5 to 7.3 times as likely, depending on the age group.
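How can a group be at higher risk in every age band and still look similar, or even better off, in an overall comparison? The toy example below shows the mechanism: when one group skews younger, its overall death rate is pulled down by its age mix. All counts are invented for illustration and are not CDC data; the two age bands and group labels are hypothetical.

```python
# Invented counts, NOT CDC data. Each entry: age band -> (deaths, population).
group_a = {"under_65": (40, 40_000), "65_plus": (1_200, 60_000)}  # skews older
group_b = {"under_65": (240, 80_000), "65_plus": (800, 20_000)}   # skews younger


def crude_rate(group):
    """Overall death rate, ignoring age."""
    total_deaths = sum(deaths for deaths, _ in group.values())
    total_population = sum(population for _, population in group.values())
    return total_deaths / total_population


def age_specific_rate(group, band):
    """Death rate within a single age band."""
    deaths, population = group[band]
    return deaths / population


# Relative risk of group B versus group A, overall and within each age band.
crude_rr = crude_rate(group_b) / crude_rate(group_a)
age_specific_rr = {
    band: age_specific_rate(group_b, band) / age_specific_rate(group_a, band)
    for band in group_a
}

print(f"crude relative risk: {crude_rr:.2f}")
for band, rr in age_specific_rr.items():
    print(f"{band}: relative risk {rr:.2f}")
```

In this sketch group B's crude relative risk is below 1, yet it is three times as likely to die in the younger band and twice as likely in the older band. The real data follow the same logic, with different magnitudes.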
Completing the picture
The next logical step is to account for both factors, geographical distribution of COVID-19 and age, at the same time. This can give us a still more accurate picture of the relative risks. Epidemiologists apply standardization techniques to account for such demographic differences, but these adjustments are mostly absent from the reporting on COVID-19 inequities. The CDC reports these age-standardized, population-weighted numbers on its website. They show that White people, who make up 39.7% of the population, account for just 22.6% of deaths, while Black people, with 15.6% of the population, account for 28.1% of deaths (Figure 6). This means that, accounting for both geography and age, Black people are 3.2 times as likely to die of COVID-19 as White people, and Hispanic people are 2.2 times as likely. Shockingly, for American Indians/Alaska Natives, the risk relative to Whites is more than 16 times as high.
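As a quick check on the 3.2 figure, the snippet below divides each group's share of age-standardized deaths by its weighted population share, then compares the two ratios. The percentages are the ones quoted above; the helper name is ours, and the inputs are rounded, so the result is approximate.

```python
def share_ratio(death_share_pct, population_share_pct):
    """Share of deaths divided by share of population (1.0 means proportionate)."""
    return death_share_pct / population_share_pct


# Age-standardized, geography-weighted percentages as quoted in the text.
white_ratio = share_ratio(death_share_pct=22.6, population_share_pct=39.7)
black_ratio = share_ratio(death_share_pct=28.1, population_share_pct=15.6)

# Relative risk of Black people versus White people after both adjustments.
black_vs_white = black_ratio / white_ratio

print(f"White: {white_ratio:.2f}, Black: {black_ratio:.2f}")
print(f"Black relative to White: {black_vs_white:.1f}")  # about 3.2
```

After both adjustments, the White ratio falls below 1 (fewer deaths than the population share would suggest) while the Black ratio rises well above it, which is what drives the larger disparity.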
Factoring in geography + age is key
Each of the analytical steps we’ve taken above has given us a different angle on the racial disparities in COVID-19 deaths. Figure 7 shows how the relative risk changes as we correct for age and geographic distributions, and why it’s important to do so when talking about race and COVID-19 (or, indeed, any disease). Nationwide figures may be easiest to grasp, but they’re not necessarily useful for understanding racial disparities in coronavirus hotspots, just as national poll ratings of presidential candidates can’t tell us how each state is going to vote come November.
We encourage health experts, data journalists and policymakers to explain to their audiences exactly what they are measuring and why, and to make clear the limitations of national-level statistics when dealing with multiple localized epidemics. Factoring in geography and age is key to getting a clearer picture of how COVID-19 affects different racial groups. For everyone concerned about racial inequities, it’s the local context that counts.
This post was made possible by the following Surgo Foundation team: James Baer, Mokshada Jain, Sema Sgaier, Peter Smittenaar, and Grace Charles. See our public Tableau page to learn more about our process.
- Provisional Death Counts for Coronavirus Disease (COVID-19): Weekly State-Specific Data Updates (2020), CDC
- Table 2, Single-Race Population Estimates 2010–2018 Request (2020), CDC
- Provisional COVID-19 Death Rates by Sex, Age and State (2020), CDC
- Deaths Involving Coronavirus Disease 2019 (COVID-19) by Race and Hispanic Origin Group and Age, by State (2020), CDC
- Health Disparities: Race and Hispanic Origin (2020), CDC