A pandemic push for data sharing could pay off for pregnancy research

Despite stubbornly high maternal mortality in the United States, pregnancy is still woefully under-researched. But thanks to the Covid-19 pandemic, technology that makes it easier to study pregnancy is starting to catch up.

During the public health emergency, federal agencies, health systems, and medical data companies were motivated to open up their data and to create privacy-preserving methods that help hospitals share sensitive patient records for research. Those steps have given researchers a near real-time window into Covid-19 outcomes and the efficacy of therapeutics and vaccines — including in pregnant people, who researchers have learned are at greater risk of severe illness.

Those new networks could pay dividends for the understanding of pregnancy beyond the pandemic. Right now, clinicians don’t fully understand why dangerous complications like preeclampsia occur, or whether many drugs and procedures are safe during pregnancy. The clinical trials that could help answer those questions have long been considered unethical — and despite recent regulatory changes to encourage enrollment, prospective research in pregnancy is still rare.

“The tolerance for risk is so low, and for that reason you do see very few studies — and when you do, it’s voluntary and at your own risk,” said Jose Figueroa, a physician and hospitalist at Brigham and Women’s Hospital who researches health policy and management. “So real-world data is super important.”

But the data that rolls in is extremely messy, and uniquely so in pregnancy. A single birth encompasses a huge variety of clinical encounters — from family planning and prenatal tests to delivery and a child’s first pediatric visits — each of which could come with its own provider, record system, and standards for data collection.

“You can have the best questions in the world, but if you don’t have a good data set to answer those questions, there’s only so much you can do,” said Figueroa. “The trick is, are we able to develop tools and technology to fill in and make the data better?”

Researchers like Figueroa are trying — and their first challenge is simply determining who’s pregnant. Despite the more than 3 million births in the U.S. every year, electronic medical records don’t have a consistent place to enter a pregnancy start or end date. And even if they did, there’s no guarantee providers would fill in those fields consistently. Even state death certificates didn’t consistently include a place to report a recent pregnancy until 2019.

“It’s a big disservice to research on pregnancy,” said Elaine Hill, a health economist at University of Rochester Medical Center, “because it takes us so much effort even to get to a place where you’ve indicated pregnancy.”

To study pregnancy, population health researchers have to construct their own definition of it in the data. A common way to do that is to lean on codes that are almost always associated with pregnancy outcomes — live births, stillbirths, ectopic pregnancies, and procedure codes for different types of deliveries. That’s the approach Figueroa took with his colleagues in a recent paper published in JAMA Network Open, capitalizing on discounted access to a nationwide hospital database that healthcare data company Premier, Inc. offered to some research groups during the pandemic.
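
To make the idea concrete, here is a minimal Python sketch of a code-based cohort definition, assuming each encounter carries a list of diagnosis codes. The prefixes are illustrative ICD-10-CM categories chosen for this example (Z37 for outcome of delivery, O00 for ectopic pregnancy, O80 for a full-term uncomplicated delivery); they are not the code list the study used, and a real definition would also add delivery procedure codes. `Encounter`, `pregnancy_outcome`, and `find_pregnant_patients` are hypothetical names.

```python
# Minimal sketch of a code-based pregnancy cohort definition.
# ASSUMPTION: the prefixes are illustrative ICD-10-CM categories, not the study's code list.
from dataclasses import dataclass
from datetime import date

# Hypothetical mapping from diagnosis-code prefix to the outcome it signals.
OUTCOME_PREFIXES = {
    "Z37": "delivery outcome (live birth or stillbirth)",
    "O00": "ectopic pregnancy",
    "O80": "full-term uncomplicated delivery",
}


@dataclass
class Encounter:
    patient_id: str
    encounter_date: date
    codes: list[str]


def pregnancy_outcome(codes):
    """Return the label of the first code that signals a pregnancy outcome, else None."""
    for code in codes:
        for prefix, label in OUTCOME_PREFIXES.items():
            if code.startswith(prefix):
                return label
    return None


def find_pregnant_patients(encounters):
    """Return IDs of patients with at least one pregnancy-outcome code."""
    return {e.patient_id for e in encounters if pregnancy_outcome(e.codes) is not None}


encounters = [
    Encounter("patient_a", date(2021, 3, 1), ["Z37.0", "O80"]),
    Encounter("patient_b", date(2021, 4, 2), ["J18.9"]),  # pneumonia only
]
print(find_pregnant_patients(encounters))  # {'patient_a'}
```

Matching on code prefixes keeps the sketch short; a production definition would more likely rely on curated, versioned code lists and handle conflicting outcomes within a single stay.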

It’s a massive dataset: Pulling from 463 hospitals around the country, it included records on nearly 850,000 pregnant patients before the pandemic hit, and more than 800,000 during the pandemic. And unlike much of the data that population health researchers rely on, there wasn’t much of a lag time — data from Covid-19 patients was being fed in by Premier’s hospital partners on an ongoing basis.

Drawing from that cohort, Figueroa’s group arrived at a grim statistic: During the acute phase of the Covid-19 pandemic, maternal death during hospitalization for delivery increased from 5.17 to 8.69 deaths for every 100,000 patients.
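
The two rates can be restated as an absolute and a relative change. This is simple arithmetic on the reported figures, not an additional finding from the study:

$$
\Delta_{\text{abs}} = 8.69 - 5.17 = 3.52 \text{ deaths per } 100{,}000, \qquad
\Delta_{\text{rel}} = \frac{8.69 - 5.17}{5.17} = \frac{3.52}{5.17} \approx 0.68
$$

In other words, roughly 3.5 additional deaths per 100,000 patients, a relative increase of about 68%.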

What they couldn’t do, though, was tie those excess maternal deaths to a particular cause. “Our data was limited in the sense that it was hospital-level data,” said Figueroa. “We didn’t have data in the prenatal period, and we didn’t have data in the postnatal period. We just knew there was a pregnant person admitted.”

To uncover causes like these in patient records, researchers will have to reconstruct each patient’s entire journey through pregnancy.

Melissa Haendel calls that problem “putting the patient back together again.” A health informatics researcher at the University of Colorado Anschutz Medical Campus, she co-leads a sweeping federal effort to build a centralized electronic health record database called the National Covid Cohort Collaborative, or N3C. “The EHRs are set up to be very encounter-based,” she said. “A person is not really a person moving through time, they’re just a collection of snapshots.”

So in a recent preprint, she and colleagues built a combination of algorithms that turns those snapshots into a full pregnancy journey, even before it comes to an end — allowing them to identify a broader range of pregnant people to include in research. It also pieces together detailed data to backfill that missing “start date” field in EHRs. “We can say with a degree of accuracy of about a week, ‘this is when the pregnancy started,’” said Hill, another author on the paper and co-lead of N3C’s pregnancy team, which designed the method with records from more than 70 hospitals in the federal database.
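
One simple way to backfill a start date, sketched below in Python, is to take every record that notes a gestational age, subtract that age from the date it was recorded, and combine the implied start dates with a median so that a single miscoded entry does not dominate. This is an assumption-laden illustration, not the N3C team's published algorithm; `estimate_start_date` and the sample observations are hypothetical.

```python
# Hypothetical sketch: backfill a pregnancy start date from gestational-age notes.
# ASSUMPTION: inputs are (date recorded, gestational age in whole weeks) pairs;
# this is an illustration, not the N3C team's published algorithm.
from datetime import date, timedelta
from statistics import median


def estimate_start_date(observations):
    """Back-calculate a start date from each observation and return their median."""
    if not observations:
        raise ValueError("need at least one gestational-age observation")
    implied = [recorded - timedelta(weeks=weeks) for recorded, weeks in observations]
    # Take the median on day ordinals so one miscoded entry cannot drag the estimate.
    return date.fromordinal(round(median(d.toordinal() for d in implied)))


observations = [
    (date(2021, 5, 3), 10),   # implies 2021-02-22
    (date(2021, 6, 28), 18),  # implies 2021-02-22
    (date(2021, 7, 5), 20),   # implies 2021-02-15
]
print(estimate_start_date(observations))  # 2021-02-22
```

Here the third observation implies a start date a week earlier than the other two, and the median sets it aside rather than averaging it in.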

Pinpointing gestational age from EHRs, even before birth has occurred, is meant to help researchers ask time-sensitive questions. Scientists could use the technology to identify patients and pool them — whether by gestational age or another characteristic — for research on risk and development during pregnancy. And because researchers can collect data in the middle of a pregnancy, the method allows for more real-time analysis as, say, new Covid-19 variants emerge. That’s especially important for assessing the impact of vaccination in pregnant people, who weren’t explicitly included in clinical trials.

“It just took us two years to define pregnancy, so we’re ready to roll with new questions,” said Hill.

The work could also make it easier to study infant health outcomes. It’s hard to do real-world research in this area, because babies and their birth parents have distinct health records, often stored in different health systems.

Previous work from N3C and other groups has improved methods for tying those pieces together. But the new identifier allows researchers to go a step further by identifying not just individuals, but each of their pregnancies. “It’s really important in the context of the baby outcomes that we’re looking at the right data for the right pregnancy episode from the mother,” said Haendel. In N3C’s records, the method found more than 600,000 pregnant patients with more than 800,000 individual pregnancy episodes between the beginning of 2018 and mid-2022.
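
Separating one patient's records into distinct pregnancies can be illustrated with a gap rule: sort the dates of pregnancy outcomes and start a new episode whenever the gap since the previous outcome exceeds a threshold. The Python sketch below assumes that rule and a hypothetical 168-day (24-week) threshold; it is not a description of the N3C method, and `split_episodes` is an invented name.

```python
# Hypothetical sketch: split one patient's pregnancy-outcome dates into episodes.
# ASSUMPTION: the gap rule and the 168-day threshold are illustrative, not N3C's parameters.
from datetime import date

MIN_GAP_DAYS = 168  # hypothetical: about 24 weeks between distinct pregnancies


def split_episodes(outcome_dates, min_gap_days=MIN_GAP_DAYS):
    """Group sorted dates; a gap longer than min_gap_days starts a new episode."""
    episodes = []
    for d in sorted(outcome_dates):
        if episodes and (d - episodes[-1][-1]).days <= min_gap_days:
            episodes[-1].append(d)  # close to the previous outcome: same pregnancy
        else:
            episodes.append([d])  # first outcome, or far from the previous one: new pregnancy
    return episodes


dates = [date(2021, 8, 10), date(2019, 3, 1), date(2019, 3, 4)]
for i, episode in enumerate(split_episodes(dates), start=1):
    print(i, [d.isoformat() for d in episode])
# 1 ['2019-03-01', '2019-03-04']
# 2 ['2021-08-10']
```

Two outcome records a few days apart collapse into one episode, while a delivery two years later becomes a second episode, which is the distinction that lets a baby's records be matched to the right pregnancy.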

There’s still work to be done to refine the method, which has yet to be peer-reviewed, and make sure it works on patient records from hospitals not in the federal database. The risk of real-world research, conducted without the rigorous controls of prospective, randomized trials, is that unseen bias in the data will distort the results. N3C’s database, for example, is skewed toward academic hospitals, which may tend to see more complex deliveries. And patient records are always inconsistently filled out.

“As with any dataset, it’s not always fully accurate,” said Figueroa. “As researchers, if you have incomplete data, you have to find different ways of inferring what that missingness is.”

But efforts are underway to build better, more representative sources of patient data before, during, and after pregnancy — making it more likely that real-world research can deliver meaningful results.

Premier has a contract with the Department of Health and Human Services to support its Perinatal Improvement Collaborative. The effort automatically pulls standardized information from EHRs and other sources at 220 partner hospitals, with the goal of capturing patients who are often left out of such databases for lack of resources. “Those 220 hospitals were chosen to be very diverse,” said Deb Kilday, principal for Premier’s women and infant services. “Everything from the tiniest critical access hospitals all the way up to the largest birthing facilities across the country.”

There are also plans to develop pregnancy research frameworks that fit neatly into existing patient records. The National Institutes of Health has started developing standards and guidelines for recording pregnancy outcomes in a format that lets health systems exchange information easily. Guides are being built to specifically support the use of that data for research from pre-pregnancy through a year after birth. And the latest version of the United States Core Data for Interoperability, the standardized set of data elements that enables national health information exchange, contains a new field for pregnancy status.

Just because those frameworks exist doesn’t mean that providers — already burdened by extra work associated with digital record-keeping — will use them consistently enough to supercharge pregnancy health research. But academics are enthusiastic about their potential.

One of the biggest challenges in real-world research is harmonizing electronic health records that capture the same clinical data in totally different ways. It takes a huge amount of time and money to cross-link those records — one reason why, in the absence of pandemic discounts, databases like Premier’s can often be out of reach for researchers. It’s unclear whether federal funders, hospitals, and for-profit medical data companies will be as eager to prioritize data sharing as the threat of Covid-19 subsides; Figueroa has already noticed vendors returning to “business as usual,” charging higher rates for their datasets.
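
Part of what makes harmonization expensive is that each site's local values must be mapped to a shared vocabulary. The Python sketch below shows the idea with a lookup table; the site names, local codes, and the `PREGNANT`/`NOT_PREGNANT` labels are invented for illustration, and real projects typically map to standard terminologies instead. Values with no mapping are flagged rather than guessed.

```python
# Hypothetical sketch: harmonize pregnancy-status values recorded differently at each site.
# ASSUMPTION: site names, local values, and shared concept labels are invented examples.
from collections import Counter

LOCAL_TO_SHARED = {
    ("hospital_a", "PREG_Y"): "PREGNANT",
    ("hospital_a", "PREG_N"): "NOT_PREGNANT",
    ("hospital_b", "1"): "PREGNANT",
    ("hospital_b", "0"): "NOT_PREGNANT",
}


def harmonize(site, local_value):
    """Translate one site-specific value to the shared concept, or flag it as unmapped."""
    return LOCAL_TO_SHARED.get((site, local_value), "UNMAPPED")


def harmonize_all(rows):
    """Harmonize (site, value) pairs; also count values still needing manual mapping."""
    shared = [harmonize(site, value) for site, value in rows]
    return shared, Counter(shared)["UNMAPPED"]


rows = [("hospital_a", "PREG_Y"), ("hospital_b", "0"), ("hospital_c", "yes")]
shared, unmapped = harmonize_all(rows)
print(shared)    # ['PREGNANT', 'NOT_PREGNANT', 'UNMAPPED']
print(unmapped)  # 1
```

Every new site adds rows to the table, which is why a common field such as a standardized pregnancy status, used consistently at the point of care, would shrink this work considerably.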

“It would be amazing if people spoke the same language,” said Figueroa.

With more interoperable records, Figueroa said, it would be that much faster to understand the messages hidden in thousands of pregnancy records around the country. That work could go a long way toward filling the gaps in our understanding of pregnancy — and give patients some much-needed guidance.

Source: STAT