HDG #009: Healthcare data is biased... and so are we
Read time: 3 minutes
Greetings, Gurus! Today we’re talking about all the ways we may inadvertently fall victim to bias in healthcare data… and in ourselves.
What is Bias?
Normally when you think of bias, you might think about data being skewed or inherently one-sided. In healthcare, that skew is amplified by a number of major systemic issues, so this kind of data bias is a real problem.
But now, compound that with the fact that we, ourselves, tend to analyze that data through our own personally biased lens, because a person’s beliefs inadvertently shape their assumptions, views, and approach to the world. This is another definition of bias, and it is a highly problematic one. You see, “bias” doesn’t always carry a connotation of prejudice; literally everything we do in our life is biased in some way.
First order of business: the data in healthcare is biased.
In healthcare, we particularly need to be on the lookout for:
1. Sampling/Selection Bias. Sampling bias is when some members of a population are systematically more likely to be selected in a sample than others. Selection bias can be introduced via the methods used to select the study population of interest, the sampling methods, or the recruitment of participants (aka, flawed design in how you select what you’re including).
In the context of healthcare, these types of bias can refer to how we construct our analyses and what we include/exclude to understand a certain “population.” In healthcare, a “population of interest” can mean many different groups depending on the study, for instance: line of business; sub-lines of business, or groups; demographics; conditions; sub-groups of some combination of the above. As you can see, the design of the analysis and the “population” in question can get complex very quickly.
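To make selection bias a little more tangible, here is a minimal sketch in Python. Everything in it is made up for illustration: the 30% share of "high-need" members, the cost figures, and the assumption that high-need members are 5x as likely to end up in the sample. It simply compares a simple random sample against a sample where some members are systematically more likely to be selected.

```python
import random
import statistics


def simulate_selection_bias(n_members=10_000, sample_size=1_000, seed=42):
    """Compare a simple random sample with a sample that over-selects one group.

    All numbers here are hypothetical and exist only to illustrate the idea.
    """
    rng = random.Random(seed)

    # Hypothetical population: about 30% "high-need" members with higher annual cost.
    population = []
    for _ in range(n_members):
        high_need = rng.random() < 0.30
        mean_cost, spread = (12_000, 2_000) if high_need else (3_000, 1_000)
        cost = max(0.0, rng.gauss(mean_cost, spread))
        population.append((high_need, cost))

    true_mean = statistics.mean(cost for _, cost in population)

    # Simple random sample: every member is equally likely to be selected.
    random_sample = rng.sample(population, sample_size)
    random_mean = statistics.mean(cost for _, cost in random_sample)

    # Biased sample: high-need members are 5x as likely to be selected
    # (assumed weight; drawn with replacement to keep the sketch short).
    weights = [5.0 if high_need else 1.0 for high_need, _ in population]
    biased_sample = rng.choices(population, weights=weights, k=sample_size)
    biased_mean = statistics.mean(cost for _, cost in biased_sample)

    return true_mean, random_mean, biased_mean


if __name__ == "__main__":
    true_mean, random_mean, biased_mean = simulate_selection_bias()
    print(f"True average cost:       {true_mean:,.0f}")
    print(f"Random-sample estimate:  {random_mean:,.0f}")
    print(f"Biased-sample estimate:  {biased_mean:,.0f}")
```

In this toy setup the random sample should land close to the true average, while the biased sample overstates it, simply because of who was more likely to be included.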
2. Undercoverage Bias occurs when a part of the population is excluded from your sample. Again, this definition makes sense in the traditional survey/sample/research design. In the context of healthcare, it can bite us in a tangible way (wasted money, wasted effort, shame, having no impact or change in outcomes, or… all of the above) if we are not careful.
For instance, we can’t see anything that happens “outside of our four walls.” We (generally) only have access to the data our organization has generated, which is inherently only part of the entire picture (think about a health plan that only has data about members who actually had a claim). This is not a deal breaker, depending on what kind of analysis you’re doing and why, but it is another consideration to be aware of. Our [patients / members / employees / residents / etc.] may not all act similarly or even represent ALL [other patients / other groups / other types of employers / other regions / etc.].
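The health plan example can be sketched with a few lines of Python. The member IDs, claim amounts, and function names below are hypothetical; the point is only that an average built from the claims table alone silently drops every enrolled member who never had a claim.

```python
from collections import defaultdict

# Hypothetical enrollment roster and claims table (member_id, paid_amount).
enrolled_members = ["M1", "M2", "M3", "M4", "M5"]
claims = [("M1", 200.0), ("M1", 300.0), ("M3", 1500.0)]


def average_cost_per_member(claims, members):
    """Average total paid amount across the given members (0 for no claims)."""
    totals = defaultdict(float)
    for member_id, amount in claims:
        totals[member_id] += amount
    return sum(totals.get(m, 0.0) for m in members) / len(members)


def coverage_report(claims, enrolled):
    """Contrast a claims-only denominator with the full enrolled population."""
    claimants = sorted({member_id for member_id, _ in claims})
    claimant_set = set(claimants)
    return {
        # Denominator = only members who show up in the claims data.
        "claims_only_avg": average_cost_per_member(claims, claimants),
        # Denominator = everyone enrolled, including members with no claims.
        "all_enrolled_avg": average_cost_per_member(claims, enrolled),
        # Members the claims table cannot "see" at all.
        "members_without_claims": [m for m in enrolled if m not in claimant_set],
    }


if __name__ == "__main__":
    report = coverage_report(claims, enrolled_members)
    # With the toy data above: claims_only_avg is 1000.0, all_enrolled_avg is 400.0.
    print(report)
```

Neither number is "wrong" on its own; they answer different questions. The trouble starts when the claims-only figure is presented as if it described the whole membership.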
3. Historical or Systemic Bias occurs when socio-cultural prejudices and beliefs are mirrored in systematic processes [that are then reflected in the data]. Systemic biases result from institutions operating in ways that disadvantage certain groups.
Obviously this is a biggie for me, writing about and analyzing health equity every week. This major issue becomes particularly challenging when data from historically-biased sources are used for data science models or analyses. It is also particularly important when you’re analyzing historically broken systems, such as healthcare.
As an example, you can see this in action on the front-and-center disclaimer on the Health Equity Tracker from Satcher Health Leadership Institute: https://healthequitytracker.org/exploredata
It (rightly) proclaims:
"Structural racism and oppression create health inequities, and lead to missing data. Our reports reflect the best data we can source, but we're working to close these known gaps which, in turn, will help create more effective health policies in the United States. Read more about missing and misidentified people on our methodology."
An entire newsletter can (and will) be written about these kinds of systemic biases at a later date.
But more than this… we are biased, and we’re the ones analyzing that data.
That’s right. In healthcare, bias can show up in things you don’t even think about, like “I’m looking to analyze health inequities, so I’m going to see how different African Americans’ outcomes are from Caucasians’ and pull in socioeconomic factors, demographics, race/ethnicity, and disease history.” See how this works? Even if we don’t consider ourselves biased in any way, we still tend to formulate hypotheses and analyses based on things we know anecdotally or things that have been analyzed before.
Or, how many times have you been asked for insights about something (or done the asking), only to see the findings discounted because they didn’t mirror what people anecdotally or emotionally felt to be true? Similarly, people tend to gravitate to the key talking points that confirm their assumptions in order to justify a decision they’ve already made mentally, and discount the rest. This is very hard to overcome with data alone, even when folks claim to be data-driven.
These are just a few examples of many different types of hidden biases we are all susceptible to.
Still don’t believe you could be biased in your daily life? Read [the absolutely brilliant] Cassie Kozyrkov’s article entitled “Data-Driven? Think again” here: https://kozyrkov.medium.com/data-inspired-5c78db3999b2
. . .
So what can you do?
Your Actionable Idea of the Week:
Your #1 goal is awareness.
We need to focus on breaking that anecdotal bias and use data to identify the unknown unknowns. That might mean new and novel sources of data, throwing in inputs or features that you would never have thought of using before, or developing more unsupervised methods, etc.
Your #2 goal: Do not let this cause analysis paralysis.
While it is important to keep bias top of mind, it is even more important that we don’t get so hung up on it that we never finish (or even start) the analysis.
. . .
For more specifics about how bias is most likely creeping into your data life, I wrote about some concrete examples and potential mitigations in a Towards Data Science article. Read it here: Healthcare Data Is Inherently Biased—Here’s how to not get duped by it
. . .
See you next week!
-Stefany
2 more ways I can help you:
1. If you want to learn more about health data quickly so you can market yourself, your company, or just plain level up your health data game, I'd recommend checking out my free Guides. Courses and more resources are coming soon, so check back often.
2. Book some time to chat about team training or event speaking, pressing data questions, project or product ideas, and data consulting.