A team of researchers from Columbia University and Hunter College led by computer scientist Ansaf Salleb-Aouissi crafted one of seven winning proposals for the National Institutes of Health Decoding Maternal Morbidity Data Challenge.
The challenge was established to advance research on maternal health by identifying factors that impact maternal morbidity so clinicians may more quickly and accurately identify and treat pregnancy-related conditions and prevent severe illness or death for a pregnant person. Researchers were encouraged to devise new ways to analyze a multifaceted dataset on first-time pregnancies from the National Institute of Child Health and Human Development (NICHD) Nulliparous Pregnancy Outcomes Study. The dataset includes information gathered from a racially, ethnically, and geographically diverse sample of more than 10,000 people who were pregnant for the first time, including data from interviews, questionnaires, clinical measurements, patient charts, and biological specimens.
The Columbia and Hunter team used machine learning to predict and understand preeclampsia, a complex condition characterized by high blood pressure, damage to other organs, and poor tissue perfusion, and a leading cause of maternal morbidity in the U.S. While a number of preeclampsia risk factors have been identified, none are specific to the condition, and there is no clear-cut process to screen for or diagnose it. The researchers, whose interests span machine learning, genetics, sequential decision-making, obstetrics, and gynecology, combined preeclampsia risk factors into a composite predictive model for the condition that could serve as the basis for a screening tool.
“Suppose every woman is a data point with a multidimensional position based on multiple factors,” said team lead Salleb-Aouissi, who is a lecturer in the discipline of computer science at Columbia Engineering and an affiliated member of the Data Science Institute’s Foundations of Data Science and Health Analytics centers. “We look into the data and try to find patterns. Given these factors, what is the likelihood that person X will develop [preeclampsia]?”
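For readers who want a concrete picture of the "data point" idea, the short Python sketch below fits a simple logistic regression to made-up data and then estimates a likelihood for two hypothetical people. It is an illustration only and not the team's model: the two risk factors, the synthetic cohort, the assumed coefficients, and every function name are assumptions introduced here, and no real patient data is involved.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function mapping a score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic_cohort(n, seed=0):
    """Generate made-up data points. This is NOT real patient data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        # Two hypothetical, standardized risk factors (assumed for illustration).
        factor_a = rng.gauss(0, 1)
        factor_b = rng.gauss(0, 1)
        # Assumed "true" relationship used only to label the synthetic data.
        p = sigmoid(-2.0 + 1.5 * factor_a + 0.8 * factor_b)
        X.append([factor_a, factor_b])
        y.append(1 if rng.random() < p else 0)
    return X, y


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit weights and an intercept by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_risk(w, b, x):
    """Estimated likelihood of the outcome for one data point x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


X, y = make_synthetic_cohort(1000)
w, b = fit_logistic(X, y)
low = predict_risk(w, b, [-1.0, -1.0])   # person with low values on both factors
high = predict_risk(w, b, [2.0, 2.0])    # person with high values on both factors
print(f"estimated likelihood, low factors: {low:.2f}; high factors: {high:.2f}")
```

Each row of `X` plays the role of one person's position in the space of factors, and `predict_risk` answers the question in the quote for a new point. A model suited to clinical data would need many more factors, careful validation, and the kind of bias checks the researchers describe.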
Salleb-Aouissi collaborated with Columbia colleagues Itsik Pe’er, Ron Wapner, and Qi Yan; Hunter computer science professor Anita Raja; Columbia graduate students Andrea Clark-Sevilla and Adam Lin; and Hunter graduate students Adam Catto, Alisa Leshchenko, and Daniel Mallia. The group has already worked with the NICHD data to predict and potentially prevent pre-term birth; this recent challenge was an opportunity to look at the data from a new perspective.
But the team also discovered bias in the medical data, according to Salleb-Aouissi. “While we were able to mitigate these distortions, the fact that bias does exist warrants further research into the sources or causes of the bias. Is it in the clinical practice, in the data set, or in the model? Is it a disparity in how each section of population is represented, or a problem of mislabeling or screening?”
A publication based on this work is forthcoming.
— Karina Alexanyan, Ph.D.