Opinion: Real-world evidence must not become evidence for abortion-related prosecution

The Supreme Court’s decision to overturn federal protection for abortion changed in an instant how many people think about pregnancy. But it is also changing how health systems need to think about their current and future sharing and monetization efforts for real-world evidence.

While the commercial and research benefits of using real-world evidence have been well described, the potential unintended consequences and privacy implications of using such data have received far less attention. The Dobbs v. Jackson Women’s Health Organization decision necessitates a fundamental reexamination of the benefits and risks of health data markets, because the data they contain could be used as evidence in the prosecution of patients, their health care providers, or both in states where abortion is illegal.

With abortion already illegal in eight states and more on deck, health care data from these states could be highly sensitive and provide potential evidence in criminal prosecution, although the specifics are rapidly evolving. South Dakota’s governor has claimed that doctors who provide abortions will be targeted for felony prosecution, though patients may not be. Kentucky, Louisiana, and Missouri are proceeding similarly.


Real-world evidence is the application of real-world data to drug and medical device regulatory filings. Legal and privacy experts have identified the types of surveillance and data that could be used as evidence of having an abortion or providing one, such as GPS data, purchasing history, social media activity, phone call records, prescriptions, online drug purchases, and personal health information. These are the very definition of real-world data, according to the Food and Drug Administration, which broadly defines this category of data as “data relating to patient health status and/or the delivery of health care routinely collected from a variety of sources.”

The FDA specifically refers to “electronic health records, claims and billing activities, patient generated data in home settings and data gathered from other sources that can inform on health status such as mobile devices” as real-world data.


Paradoxically, the data that have become so valuable to researchers are the data that can be used to prosecute people who have abortions or those who participate in their care. But is it possible to quantify the risks of these data to patients and providers?

The most prevalent threat models for technology and data come from the field of cybersecurity. Most commonly used is the cyber equation, in which risk represents the likelihood of damage to sensitive data, critical assets, finances, reputation, or people.

Put simply: Risk = (Threat) x (Vulnerability) x (Impact)

This equation can be applied to risks for patients, clinicians, or both, depending on the specific type of data.

Threat refers to the motivation and capabilities of those who would cause harm. Politicized state government officials and activists have been highly motivated, visible, and active in their pursuit of enforcing abortion bans and prosecuting those who violate them, and now have the ability to do so. Vulnerability refers to the ability of a victim to recover from an incident. In this case, it refers to the ability of a patient or health care provider to withstand prosecution or punishment, and the short- and long-term financial and reputational harms that would occur. Impact refers to the magnitude of potential harm, such as the loss of rights that accompanies a felony conviction for a patient or loss of professional licensure for a clinician.

Without getting into the math, in states where abortion is illegal, the risk, threat, vulnerability, and impact stemming from the prosecutorial use of real-world data have all gone up significantly, for patients and health care providers. For patients, these risks are additive to any preexisting social vulnerabilities that limit their options for reproductive care.
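
As a rough illustration only, the cyber equation above can be sketched in a few lines of Python. The 1-to-5 scores below are hypothetical and chosen purely for illustration, not measurements of any state or dataset. The point is simply that because the three factors multiply, moderate increases in each one compound into a much larger increase in overall risk.

```python
def risk(threat: float, vulnerability: float, impact: float) -> float:
    """Cyber equation from the text: Risk = Threat x Vulnerability x Impact."""
    return threat * vulnerability * impact


# Hypothetical scores on a 1-5 scale, invented for illustration only.
before = risk(threat=2, vulnerability=2, impact=2)
after = risk(threat=4, vulnerability=3, impact=4)

# Each factor rose by a factor of 2 or less, yet the product rose sixfold.
print(before, after, after / before)
```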

Even before the Dobbs decision, the health care data market was prone to questionable uses of data, improper tracking of patients across social media platforms, and failed or flawed privacy approaches. All of these will be exacerbated in the proposed legal environments of a post-Roe United States.

Ironically, one of the most common and virtuous uses of real-world health data is to study health equity and disparities. Tragically, with abortion soon to be illegal in more states, the Supreme Court has now created and codified the greatest health disparity in the nation. Many states have privacy laws that differ greatly from even their neighboring states, making some forms of data protected in some states and not in others.

Does health data follow patients across state lines? What will health care institutions do when served with subpoenas for medical records or prescribing histories? What about HIPAA? Who will it protect? Those are questions that desperately need to be answered. The regulatory environment is evolving too quickly to truly comprehend the effects in real time, and it’s possible that the interconnectedness of health care may end up serving prosecutors better than patients or doctors.

Real-world evidence is a rapidly evolving science, and errors in data collection, linking, and interpretation are common. Improper design, confirmation bias, poor statistical and analytical design, low internal data validity, and lack of quality control are all well-understood issues. It is all too easy to assume that real-world data hold more correlative and predictive power than they do.

This is where privacy experts and I agree on the greatest potential for harm. Just as the concurrence of multiple prescriptions for mental health therapies and trauma from an auto accident doesn’t provide evidence that a parent is unfit to raise a child, prescriptions or procedure codes in an electronic health record or from real-world data may not be solid evidence that a person had an illegal abortion or a clinician performed one.

The complexity that makes electronic medical records difficult to use and burdensome to clinicians also makes them fertile ground for irrational associations and assumptions. If the Supreme Court moves on to attempting to ban contraceptives and same-sex marriage, as some have warned, real-world data analyzed without skilled data science could misrepresent relationships in the data and lead to spurious accusations.

Finding a way forward should start with determining if the benefits of real-world evidence and the resulting for-profit health data economy are worth the ever-escalating risks. According to cybersecurity theory, data carrying this level of potential harm simply should not be shared in ways that disproportionately raise threat, vulnerability, or impact. I believe that hospitals, at least those in states where abortion is illegal, should temporarily stop sharing data outside of the care continuum for aggregation or commercial research purposes until the risks are better understood and mitigated.

Beyond abstaining from data sharing, mitigations are possible but complex. One way to enable health data to continue to flow, but safely, would be to move to an “opt in” informed consent approach for patients and clinicians who are included in datasets that are being used externally. Many institutions currently use opt-out models, since they are less burdensome to administer and usually yield the highest participation, because opting out requires individuals to act. Switching to an opt-in approach places the burden onto the institution to document that each participant has explicitly agreed to be included in each data set, and/or in a new release of a preexisting data set. It also seems only fair that institutions selling and/or sharing data bear the burden of providing better privacy controls, given that they, rather than the participants, are profiting from the data. Also, the addition of clinician consent is essential given the intent of many states to prosecute clinicians.
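
The difference between the two consent models can be made concrete with a minimal sketch. The `Participant` class, its field names, and the release labels are all hypothetical, and a real consent system would need audit trails and much more. Under opt-out, everyone is included unless they acted to decline; under opt-in, a person appears in a release only if the institution has documented explicit agreement to that specific release.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical consent record for a patient or clinician."""
    participant_id: str
    opted_out: bool = False
    # Releases this person has explicitly agreed to join (opt-in model).
    opted_in_releases: set[str] = field(default_factory=set)


def eligible_opt_out(people):
    """Opt-out model: included by default unless the person declined."""
    return [p.participant_id for p in people if not p.opted_out]


def eligible_opt_in(people, release):
    """Opt-in model: included only with documented consent to this release."""
    return [p.participant_id for p in people if release in p.opted_in_releases]


people = [
    Participant("p1"),                                   # never asked, never acted
    Participant("p2", opted_out=True),                   # actively declined
    Participant("p3", opted_in_releases={"2022-v1"}),    # agreed to one release
]

print(eligible_opt_out(people))            # default inclusion
print(eligible_opt_in(people, "2022-v1"))  # explicit consent only
print(eligible_opt_in(people, "2022-v2"))  # a new release needs new consent
```

In this sketch, the person who never acted is swept into the opt-out dataset but is absent from every opt-in release, which is exactly the shift of burden from the individual to the institution.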

A second step is to move toward having all reproductive health research that includes human subjects governed by institutional review boards that include people with adequate expertise to determine the associated risks of these data in consideration of new state laws. Today, so-called de-identified research is exempt from institutional review board oversight, but these data have become too sensitive, and it has also become too easy to identify individuals in de-identified datasets.

The types and ubiquity of surveillance technologies are expected to increase exponentially post-Roe. When medical record data are linked to location data, prescription and over-the-counter drug data, and even supermarket data, it is far too easy to identify people in real-world evidence datasets — and such identification or reidentification is not illegal in most states.
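
One way a data steward might gauge this exposure before sharing is to count how many records are unique on a handful of seemingly harmless fields. The sketch below is a simplified check under stated assumptions: the field names (`zip3`, `birth_year`, `visit_month`) and every record are invented, and real assessments consider many more attributes. A record that is the only one with its combination of values is the kind most easily matched against an outside source such as location or purchase data.

```python
from collections import Counter

# Hypothetical quasi-identifiers: fields that are not names but can still
# single a person out when combined.
QUASI_IDENTIFIERS = ("zip3", "birth_year", "visit_month")


def unique_records(records, keys=QUASI_IDENTIFIERS):
    """Return records whose quasi-identifier combination appears only once."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return [r for r in records if counts[tuple(r[k] for k in keys)] == 1]


# Invented "de-identified" records, for illustration only.
records = [
    {"zip3": "571", "birth_year": 1994, "visit_month": "2022-07"},
    {"zip3": "571", "birth_year": 1994, "visit_month": "2022-07"},
    {"zip3": "570", "birth_year": 1988, "visit_month": "2022-07"},
    {"zip3": "577", "birth_year": 2001, "visit_month": "2022-08"},
]

at_risk = unique_records(records)
print(f"{len(at_risk)} of {len(records)} records are unique on {QUASI_IDENTIFIERS}")
```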

A third step involves considering implementing a version of the right to be forgotten for any aggregated real-world evidence datasets and associated systems. Within the General Data Protection Regulation, the right to be forgotten means that individuals have the right to ask “data controllers” to have their personal data erased from the dataset and that request should be acted on “without undue delay.”

This regulatory control is quite effective because, in order to remove individuals should a request be made, data managers need to know who is in each dataset and where those datasets have been shared. Beyond real-world evidence, in a situation where abortion is illegal in some states but not others, such an approach may become necessary within electronic health records to protect patients and clinicians. For now, it may be an important opportunity to keep health data flowing for real-world evidence purposes.
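
The bookkeeping this implies can be sketched briefly. The `DatasetRegistry` class, the dataset names, and the recipient names below are hypothetical, and this is a sketch of the idea rather than a description of how any institution or regulation implements erasure. It shows why the control forces good data management: an erasure request can only be honored if the institution has tracked both who is in each dataset and where each dataset has gone.

```python
from collections import defaultdict


class DatasetRegistry:
    """Hypothetical registry of dataset membership and downstream sharing."""

    def __init__(self):
        self.members = defaultdict(set)     # dataset name -> participant ids
        self.recipients = defaultdict(set)  # dataset name -> who received it

    def add(self, dataset, participant_id):
        self.members[dataset].add(participant_id)

    def share(self, dataset, recipient):
        self.recipients[dataset].add(recipient)

    def erase(self, participant_id):
        """Remove a person everywhere; return who must be told, per dataset."""
        notices = {}
        for dataset, ids in self.members.items():
            if participant_id in ids:
                ids.remove(participant_id)
                notices[dataset] = sorted(self.recipients.get(dataset, set()))
        return notices


registry = DatasetRegistry()
registry.add("rwe-2022", "p1")
registry.add("rwe-2022", "p2")
registry.add("claims-2021", "p1")
registry.share("rwe-2022", "Vendor A")

# Erasing p1 removes them locally and lists recipients to notify.
notices = registry.erase("p1")
print(notices)
```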

In addition to affecting individuals and their personal freedom, Dobbs v. Jackson will also affect commerce and other social structures. Although aggregated health data can be found in other ways, the secondary and tertiary markets for health data (those based on the commercialization of health data outside the health care continuum) are currently unregulated and not even subject to the Health Insurance Portability and Accountability Act, which serves as the most comprehensive health data privacy and security regulation, despite being assembled and enacted before smartphones existed.

Many existing health datasets make it far too easy to target individuals, specific groups of people, specific neighborhoods or geographies, and institutions for high-consequence surveillance. In human research, the first step following a major safety event is to stop the experiment to prevent further harm. Real-world evidence remains a valuable tool for biomedical research, but the human price required for it may soon become too high.

With many valid sources for aggregated health data in existence, for now it would be best if individual hospitals and health systems considered a serious pause of data sharing or further data sharing plans until the risks to their clinical staff and patients are fully understood and mitigated.

Eric D. Perakslis is a researcher and the chief science and digital officer at the Duke Clinical Research Institute, a faculty member in population health sciences at Duke University School of Medicine, and a lecturer in biomedical informatics at Harvard Medical School.

Source: STAT