Left unchecked, algorithmic approaches can perpetuate bias in health care. Implementing responsible AI can help reverse that.
Many health systems have diversity, equity, and inclusion efforts in place, but these rarely address the algorithms they routinely use for millions of patients. Health care leaders are often unaware of algorithmic bias or, if they are aware of it, don’t have a way to address the issue. While a model might look like it performs well overall, it may not perform well for certain groups when its results are broken down by race.
It’s been documented time and time again how people, institutions, and algorithms single out or underserve minority groups, including in health care.
Being a Black person in data science, I’m constantly focused on how methods and studies might be underserving my family and community.
This problem is both professional and personal. When I encounter potentially biased approaches in health care, I think about my grandmother having her reports about pain dismissed, or family members who have a tough time accessing the care they need. Some call algorithmic bias a “data problem,” but the data reflect the biases of the systems and people who generated them.
Each time a provider ignores or denies care based on a patient’s race, this biases the dataset. When people choose not to seek care because it’s too expensive or difficult to access, the dataset is biased toward the people who can afford care. When health systems use biased data to conduct studies or help prioritize outreach and patient engagement, they further bias the dataset.
Data science teams, like the one I’m on at SymphonyRM, see the results of years of accumulated bias in the form of rows, columns, and tables in a database. We might not be able to change the people who created the bias, but we figured we could work with algorithms in ways to help reduce it.
Optimizing AI for fairness is challenging technically, but it’s even more challenging philosophically. How do you quantify fairness? How might measures and metrics in AI disadvantage minorities? Is there an acceptable tradeoff between fairness and accuracy? These become deep, difficult, philosophical discussions too complex for Google and the published literature to answer.
Our team developed and debated many approaches to ensuring responsible, fair AI algorithms. Since this is a complex problem to solve, we realized we needed external guidance and validation. We noted Ziad Obermeyer’s extensive research on ethics and artificial intelligence and collaborated with his team at the University of Chicago Booth School of Business. Here’s a look at what we learned.
How to determine if a model is biased
First, it’s important to have a diverse team. You can’t determine if a model is biased without first asking whether it’s biased, and groups that lack diversity tend not to ask.
Second, assume all models are biased, or guilty until proven innocent, as Christopher Penn, the founder and chief data scientist of Trust Insights, likes to say. Teams should put rigorous checks in place at every step of their data science pipeline. A review article titled “Ethical Machine Learning in Health Care” outlines a powerful approach to looking at the pipeline:
Problem selection. Ensure that the problems and purposes of desired algorithms are ethical.
Data collection. Review data collection practices. Are you collecting enough data, such as race, ethnicity, gender, and sex, to identify and address disparities?
Outcome definition. Identify the factors that may lead to disparities in how outcomes are defined. For example, if overall historical medical spending is used as a proxy for sickness, does that produce disparities by race, gender, and income?
Algorithm development. Identify the right modeling approach, performance metrics, and optimization strategies that don’t produce disparities.
Post-deployment considerations. Consider whether, despite rigorous checks, end-users might still use the model to introduce bias into the process. Consider how people might ignore the algorithm’s suggestions, especially if they don’t understand how it works. Teaching users and leaders about bias mitigation efforts is key to driving the right type of adoption.
There’s a myth that simply removing variables such as race and gender will keep a model from producing biased results. That isn’t true. In fact, in major cases today where bias has been found in models, in criminal justice or in health care, those variables had been removed. It’s not as simple as just removing a few columns from a spreadsheet.
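One way to see why dropping a column is not enough is to ask how well a remaining column predicts the one that was removed. The Python sketch below is a minimal illustration, not a description of any audited model: the rows, the "north"/"south" neighborhood values, the group labels "A" and "B", and both function names are invented for this example.

```python
from collections import Counter, defaultdict

# Synthetic, illustrative rows only: (neighborhood, group).
# "group" stands in for an attribute that was dropped from the model's inputs.
rows = (
    [("north", "A")] * 8 + [("north", "B")] * 2
    + [("south", "A")] * 2 + [("south", "B")] * 8
)

def proxy_leakage(pairs):
    """Share of rows whose dropped attribute is guessed correctly from the
    proxy alone, by guessing the most common attribute for each proxy value."""
    by_proxy = defaultdict(Counter)
    for proxy_value, attribute in pairs:
        by_proxy[proxy_value][attribute] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_proxy.values())
    return correct / len(pairs)

def majority_baseline(pairs):
    """Share guessed correctly with no proxy at all (always guess the largest group)."""
    counts = Counter(attribute for _, attribute in pairs)
    return counts.most_common(1)[0][1] / len(pairs)

leakage = proxy_leakage(rows)
baseline = majority_baseline(rows)
```

In this toy data, the neighborhood column alone recovers the removed attribute far more often than a blind guess, so a model that uses neighborhood can still treat the two groups differently.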
How to account for improved performance across all groups
Collecting the right data is paramount to measuring and addressing algorithmic bias. Without known values for race and ethnicity, gender, income, sex, sexual preference, and other social determinants of health, there is no way to test and control for fairness. Imagine, for example, trying to measure the impact an algorithm has on Black or Latinx patients without knowing which patients are Black or Latinx. Without the right data, addressing bias is a nonstarter.
Next, it’s important to consider how certain metrics may carry hidden bias. While it may be tempting to use a metric like “prior medical spending” as a proxy for medical need, this can also hide disparities. People in underrepresented groups — Black LGBTQIA+ people, for example — may be less likely to access services even when they have the same levels of medical need and risk as white people.
With any metric, label, or variable, checking its impact and distribution across race, gender, sex, and other factors is key. Before choosing a target variable (what the algorithm is optimizing for), it’s essential to measure its potential to introduce bias. Without checking for this early in the process, teams may waste time looking for other sources of bias when the real problem is how they defined the outcome.
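A check like this can be as simple as comparing a candidate target variable across groups at the same level of need. The sketch below is illustrative only: it assumes a hypothetical dataset with a group label, a count of chronic conditions as a rough stand-in for need, and prior spending as the candidate label. All values and function names are made up.

```python
from collections import defaultdict
from statistics import mean

# Synthetic, illustrative records only:
# (group, number_of_chronic_conditions, prior_spending_in_dollars)
records = [
    ("A", 2, 4000), ("A", 2, 4400), ("A", 4, 9000), ("A", 4, 9400),
    ("B", 2, 2500), ("B", 2, 2900), ("B", 4, 6000), ("B", 4, 6400),
]

def mean_label_by_need_and_group(rows):
    """Average the candidate label (spending) within each (need, group) cell."""
    cells = defaultdict(list)
    for group, need, spending in rows:
        cells[(need, group)].append(spending)
    return {cell: mean(values) for cell, values in cells.items()}

def label_gaps(rows, reference="A", comparison="B"):
    """For each need level, the comparison group's mean label divided by the
    reference group's mean label. A ratio well below 1 at equal need suggests
    the label understates need for the comparison group."""
    means = mean_label_by_need_and_group(rows)
    needs = sorted({need for need, _ in means})
    return {n: means[(n, comparison)] / means[(n, reference)] for n in needs}

gaps = label_gaps(records)
```

Here group B shows lower spending than group A at every level of need, which is the pattern that would make spending a risky target: a model trained on it would learn to rank group B as less in need than it is.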
When addressing bias for certain populations, it’s important to ensure the methods in use aren’t creating bias for other populations. Teams should design fairness metrics that are applicable across all groups and test against them continually. Steps taken to tune or adjust the model may raise performance on a given metric for one group while lowering it for another. To catch this, teams must test at each stage of the analysis and at each stage of the delivery pipeline.
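As a rough sketch of what testing at each stage could look like, the Python below compares a per-group score before and after a model change and flags any group that got worse. The scores, group names, tolerance, and the find_regressions helper are all hypothetical; the metric itself could be whatever fairness measure a team has chosen.

```python
def find_regressions(before, after, tolerance=0.01):
    """Return {group: (old_score, new_score)} for every group whose score
    fell by more than `tolerance` between two pipeline stages."""
    flagged = {}
    for group, old_score in before.items():
        new_score = after[group]
        if old_score - new_score > tolerance:
            flagged[group] = (old_score, new_score)
    return flagged

# Synthetic, illustrative per-group scores only (higher is better).
baseline = {"group_1": 0.70, "group_2": 0.55, "group_3": 0.68}
tuned = {"group_1": 0.71, "group_2": 0.66, "group_3": 0.61}

flagged = find_regressions(baseline, tuned)
```

In this made-up case the adjustment helps group_2 considerably but quietly hurts group_3, which is exactly the kind of side effect an all-groups check is meant to surface before the change ships.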
How to get started on the path to responsible AI
Education is the first step. Many people aren’t aware of algorithmic bias, and those who are may not understand how serious it is. Those who understand how serious it is may not understand ways to address it. And those who understand ways to address it may not understand how to get their organizations to buy in. In other words, some education is needed at all levels.
It’s also important to revisit performance metrics and see that they align with strategy. The ideal metrics will vary depending on how the algorithm will be used and the problems a team is focused on solving, but here are some of the most common options:
Minimize false positives. False alarms — identifying that someone needs care when they actually don’t — are costly.
Minimize false negatives. Omitting someone in need, such as failing to identify someone who may be at risk for sepsis, is even more costly.
Our work at SymphonyRM, for example, is based on driving outreach and communication. We determined that it was essential to look across groups and minimize the false negatives by optimizing for a score called “recall.” This is a metric that punishes an algorithm for missing people with a need. In practice, the better the algorithm is at picking up on people with needs, the higher its recall score.
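Recall is the number of people with a need whom the algorithm correctly flags, divided by everyone who has that need: true positives / (true positives + false negatives). The sketch below computes it separately for each group. It is a minimal illustration with invented rows and group labels "X" and "Y", not a depiction of our production code.

```python
from collections import defaultdict

def recall_by_group(rows):
    """rows: iterable of (group, actually_needs_care, flagged_by_model).
    Returns recall = TP / (TP + FN) for each group that has at least one
    person who needs care."""
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for group, needs_care, flagged in rows:
        if needs_care:
            if flagged:
                true_pos[group] += 1
            else:
                false_neg[group] += 1
    groups = set(true_pos) | set(false_neg)
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}

# Synthetic, illustrative rows only.
rows = [
    ("X", True, True), ("X", True, True), ("X", True, True),
    ("X", True, False), ("X", False, True),
    ("Y", True, True), ("Y", True, False), ("Y", True, False),
    ("Y", True, False), ("Y", False, False),
]

per_group = recall_by_group(rows)
# Overall recall ignores group membership entirely.
overall = sum(1 for _, need, flag in rows if need and flag) / sum(
    1 for _, need, _ in rows if need
)
```

In this toy data the overall recall sits midway between the two groups, which hides the fact that the model misses most of the people in group Y who need care.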
How to set benchmarks and track progress
Here’s the good news: Even though the field of data ethics is still nascent, there are many ways to measure and track progress on the path to responsible AI. The first step involves starting with ways to measure the impact on the population of interest. For example:
Day zero bias check. Identify the racial distribution or impact of a model before making any changes or adjustments.
Post-adjustment check. After reviewing and revising data collection and modeling practices, identify the resulting racial distribution and how it compares with the day zero check.
Impact check. After the model has been launched and is in use, continually monitor for changes. What are the model’s results? Are they being adopted, and are previously underserved groups now receiving the right attention? Are there areas within the model or business processes that could be improved?
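The first two checks above boil down to comparing who the model selects before and after adjustment. A minimal Python sketch, assuming hypothetical outreach lists tagged only with a group label, might look like this. The counts are invented and are unrelated to any real deployment.

```python
from collections import Counter

def outreach_share(selected_groups):
    """Fraction of the outreach list that belongs to each group."""
    counts = Counter(selected_groups)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

def relative_change(day_zero, post_adjustment):
    """Relative change in each group's share versus the day zero check."""
    return {
        group: (post_adjustment.get(group, 0.0) - share) / share
        for group, share in day_zero.items()
    }

# Synthetic, illustrative outreach lists only (one group label per person).
day_zero_list = ["A"] * 80 + ["B"] * 20
post_adjustment_list = ["A"] * 70 + ["B"] * 30

day_zero = outreach_share(day_zero_list)
post = outreach_share(post_adjustment_list)
change = relative_change(day_zero, post)
```

Storing the day zero shares alongside each later check gives the impact check a fixed baseline to monitor against once the model is in use.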
While some objectives are harder to quantify, such as client or team receptiveness to approaches, it’s important to keep an eye on goals and determine aspects that can be measured.
As teams and organizations set out to design and execute responsible AI, the biggest takeaway is to prepare to do significant education in other areas of the organization — this isn’t just a data science initiative. It requires buy-in from other areas, especially from people who will be on the ground employing the insights from these algorithms.
Put learnings to the test
What happens when these methods are actually put into practice? Our organization deploys this approach across algorithms that aim to drive outreach to consumers who may need specific health care services. These algorithms help health systems engage their patients with important communications about specific illnesses before these conditions become bigger problems. Hospitals use algorithms and communications for many different health care services, such as cardiology and orthopedics.
As an example, we used a data ethics pipeline for a health system’s cardiology outreach, which was intended to drive communications to patients who might need cardiology services. Comparing the “day zero bias check” with the check after our algorithmic bias adjustments showed that the adjustments increased outreach to underserved Black and Asian populations by 23%. Although this required a huge amount of upfront work, questions, debates, and collaboration, responsible AI approaches are now embedded in our data science pipeline.
That’s a starting point, not a finish line. Our team is working to apply rigorous, responsible AI methods for gender, sex, income, and other forms of bias.
I believe that AI is at an important crossroads. Some organizations will focus heavily on ethics and responsible approaches to help undo some of the problems wrought by systemic bias and racism. Others may leave their algorithms unchecked, embedding, retaining, spreading, and growing the bias and racism that created the imbalanced data. If technology companies don’t address bias themselves, health care leaders must pressure and lean on their vendors to develop responsible approaches.
Chris Hemphill is vice president of applied AI and growth at SymphonyRM, a health care technology company focused on driving healthy decisions backed by evidence and science.