    HACLab: Mitigating Bias in Machine Learning and AI Programming

    What causes algorithmic bias in health care? How might biases be mitigated? Finding the answers and implementing improvements are the work of the University of Pennsylvania’s Human Algorithm Collaboration lab (HACLab), directed by Master of Health Care Innovation faculty member Ravi B. Parikh, MD, MPP, FACP.

    HACLab conducts research projects involving health care technology, data, and cancer care delivery. For example, HACLab and its partners work with the Department of Veterans Affairs to investigate bias in the VA’s predictive mortality and hospitalization algorithms. With a multidisciplinary team of biostatisticians, informaticians, physicians, and health equity scholars, we look for the causes of bias and for ways to mitigate it.

    Avoiding statistical and social bias when programming predictive machine learning and artificial intelligence (AI) algorithms is an essential part of our work, and Dr. Parikh addresses it in his course, Using Data for Transformation, in the Master of Health Care Innovation curriculum.

    Types of Bias in Predictive Algorithms

    Algorithms can leverage existing data to predict an outcome, using inputs that are associated with that outcome. Quantitative predictions built on large datasets can be applied in any data-driven setting to address complex challenges. Yet one problem with the rising tide of predictive algorithms in health care and elsewhere is that such algorithms can be “unfair” or biased.
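    To make that concrete, here is a minimal sketch of such a predictive algorithm: a logistic regression trained on simulated patient data to estimate outcome risk. The features, coefficients, and rates below are illustrative assumptions, not HACLab’s or the VA’s actual model.

```python
# Minimal sketch of a predictive algorithm: logistic regression mapping
# inputs associated with an outcome to a predicted risk. All features,
# coefficients, and rates below are simulated assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical clinical inputs: age, comorbidity count, a lab value.
X = np.column_stack([
    rng.normal(65, 12, n),  # age in years
    rng.poisson(2, n),      # comorbidity count
    rng.normal(0, 1, n),    # standardized lab value
])

# Simulate an outcome that depends on those inputs.
logit = -8.0 + 0.08 * X[:, 0] + 0.5 * X[:, 1] + 0.7 * X[:, 2]
y = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The model's output: a predicted risk for each held-out patient.
risk = model.predict_proba(X_test)[:, 1]
print(f"mean predicted risk on held-out patients: {risk.mean():.3f}")
```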

    Dr. Parikh and our team at HACLab define bias as a systematic mischaracterization of risk in favor of, or against, a person or group of people.
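    That definition can be made operational. One common check, sketched below with hypothetical group labels and toy numbers, compares mean predicted risk with the observed outcome rate within each group; a gap that consistently runs in one direction for one group is exactly the kind of systematic mischaracterization the definition describes.

```python
# Hedged sketch: surface systematic mischaracterization of risk by
# comparing mean predicted risk to the observed outcome rate per group.
# Groups, predictions, and rates are all toy assumptions.
import numpy as np

def calibration_by_group(y_true, y_pred, groups):
    """Print mean predicted risk vs. observed event rate for each group."""
    for g in np.unique(groups):
        mask = groups == g
        predicted, observed = y_pred[mask].mean(), y_true[mask].mean()
        print(f"group {g}: predicted {predicted:.3f}, "
              f"observed {observed:.3f}, gap {predicted - observed:+.3f}")

rng = np.random.default_rng(1)
n = 2_000
groups = rng.choice(["A", "B"], size=n)

# A toy model that overestimates risk for group B only.
y_pred = np.where(groups == "A", 0.20, 0.35) + rng.normal(0, 0.02, n)
y_true = rng.random(n) < 0.20  # the true event rate is 0.20 for everyone

calibration_by_group(y_true, y_pred, groups)
```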

    Bias can affect algorithms in many ways. Statistical bias, for example, occurs when an algorithm’s output does not reflect the underlying true value: the algorithm predicts the wrong value or misrepresents reality for reasons that include suboptimal sampling and heterogeneity of treatment effects.
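    A toy simulation can show how suboptimal sampling alone produces statistical bias. The rates and sampling probabilities below are assumptions chosen only for illustration: if hospitalized patients are far more likely to appear in the data, the estimated event rate drifts well above the true population rate.

```python
# Toy illustration of statistical bias from suboptimal sampling.
# All rates and sampling probabilities are assumed for illustration.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical population: 25% of patients are hospitalized with a 30%
# event rate; the remaining 75% have a 5% event rate.
hospitalized = rng.random(n) < 0.25
event = rng.random(n) < np.where(hospitalized, 0.30, 0.05)
print(f"true population event rate: {event.mean():.3f}")   # ~0.113

# Suboptimal sampling: the record system captures hospitalized patients
# far more often, so they dominate the observed data.
p_sampled = np.where(hospitalized, 0.90, 0.10)
sampled = rng.random(n) < p_sampled

print(f"event rate in the biased sample: {event[sampled].mean():.3f}")  # ~0.238
```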

    Social bias, on the other hand, is discrimination for or against a person or group, or a set of ideas or beliefs, in a way that is prejudicial or unfair. It is harder to detect and correct than statistical bias because its contributors are not always well defined in the data.

    Theoretically, you could apply different statistical methods or outcome labels within your own data to “fix” statistical bias. But researchers and developers cannot “fix” social bias in algorithms because they do not have the data necessary to do so. 
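    As a sketch of what such a statistical “fix” can look like, the snippet below reuses the sampling example above: if the sampling probabilities are known (a strong assumption), inverse-probability weighting recovers the true rate. No analogous correction exists when the drivers of social bias never appear in the data.

```python
# Sketch of one statistical fix: inverse-probability weighting, which
# assumes the sampling probabilities are known. The setup repeats the
# toy sampling example above.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
hospitalized = rng.random(n) < 0.25
event = rng.random(n) < np.where(hospitalized, 0.30, 0.05)
p_sampled = np.where(hospitalized, 0.90, 0.10)
sampled = rng.random(n) < p_sampled

# Weight each sampled record by 1 / P(sampled) to undo the skew.
weights = 1.0 / p_sampled[sampled]
corrected = np.average(event[sampled], weights=weights)
print(f"naive estimate:      {event[sampled].mean():.3f}")  # biased upward
print(f"reweighted estimate: {corrected:.3f}")              # ~ true rate
```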

    Bias in Health Care Algorithms

    In health care algorithms, inputs and outcomes can be biased due to inequities and systemic racism in care delivery, as well as the overrepresentation of white men in historical data. Genetic testing, for example, is often used in breast cancer care to detect markers of risk. But genetic testing is less likely to be performed for Black women. When a group is less likely to be tested, an algorithm trained on datasets of genetic test results is less able to discern high-risk mutations within that population.
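    The snippet below simulates that mechanism with made-up numbers: two groups whose outcomes depend on different risk markers, with one group tested far less often. A model trained only on tested patients ends up discriminating risk noticeably worse for the under-tested group.

```python
# Toy simulation of differential testing (all numbers hypothetical):
# when one group is tested less often, a model trained only on tested
# patients learns that group's risk markers poorly.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 20_000
group_b = rng.random(n) < 0.5    # two equal-sized groups
x = rng.normal(0, 1, (n, 2))     # hypothetical risk markers

# A different marker drives risk in each group (marker 0 in A, marker 1 in B).
logit = np.where(group_b, 2.0 * x[:, 1], 2.0 * x[:, 0]) - 2.0
y = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

# Differential testing: group A is tested 60% of the time, group B 15%.
tested = rng.random(n) < np.where(group_b, 0.15, 0.60)

model = LogisticRegression(max_iter=1000).fit(x[tested], y[tested])
risk = model.predict_proba(x)[:, 1]

for name, mask in [("A", ~group_b), ("B", group_b)]:
    print(f"group {name}: trained on {np.sum(tested & mask):>5} patients, "
          f"AUC {roc_auc_score(y[mask], risk[mask]):.3f}")
```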

    Algorithmic bias is a field of study in its own right, drawing on theories and concepts from areas as varied as mathematics and social justice. At HACLab, our work synthesizes and extends that research to improve how these algorithms are maintained and operated nationally. Our work with the Department of Veterans Affairs has directly influenced the upcoming update of its predictive mortality and hospitalization algorithm, which will be rolled out in the coming months.

    Learn about this work and more on the HACLab website. And learn more about constructing effective algorithms from Dr. Parikh in his Using Data for Transformation course.

    Caleb Hearn (he/him/his) is a project manager working with Dr. Ravi Parikh. Prior to joining the Department of Medical Ethics & Health Policy, he worked in UPenn Student Health Services and Jefferson Health’s Epic@Jeff department. He received his MPH from Drexel University and BS from the University of Tennessee at Chattanooga.

    References

    Agniel, Denis, Isaac S. Kohane, and Griffin M. Weber. “Biases in Electronic Health Record Data Due to Processes within the Healthcare System: Retrospective Observational Study.” BMJ 361 (April 30, 2018): k1479.

    Kontopantelis, Evangelos, Matthew Sperrin, et al. “Investigating Heterogeneity of Effects and Associations Using Interaction Terms.” Journal of Clinical Epidemiology 93 (January 2018): 79–83.

    McCarthy, Anne Marie, Mirar Bristol, et al. “Health Care Segregation, Physician Recommendation, and Racial Disparities in BRCA1/2 Testing Among Women with Breast Cancer.” Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology 34, no. 22 (August 1, 2016): 2610–18.

    National Museum of African American History and Culture. “Bias.”

    Oommen, Thomas, Laurie G. Baise, and Richard M. Vogel. “Sampling Bias and Class Imbalance in Maximum-Likelihood Logistic Regression.” Mathematical Geosciences 43, no. 1 (January 1, 2011): 99–120.

    Webster, Craig S., Saana Taylor, et al. “Social Bias, Discrimination and Inequity in Healthcare: Mechanisms, Implications and Recommendations.” BJA Education 22, no. 4 (April 2022): 131–37.