Why big data didn’t deliver on its big promises to combat Covid-19

When the pandemic hit, technology companies pledged to do their part by cracking open their secretive datasets and letting public health researchers mine them for clues about how to bring Covid-19 under control. Two years in, it’s clear that big data isn’t the panacea many had hoped for.

“Early on, when everyone was freaking out, there was a perception that there would be almost like a magic moment where the data would materialize and it would answer our questions, and we would adapt and control the pandemic and things would be great,” said Andrew Schroeder, a co-leader of the Covid-19 Mobility Data Network, a group of academic researchers and nonprofit partners that pulled in smartphone location data shared by tech companies so that public health officials could analyze it for insights into lockdown and distancing measures.

“And then this moment never came,” said Schroeder, also the vice president of research and analysis at Direct Relief, a disaster relief nonprofit.

In part, that’s because the pandemic has stretched far longer than most anticipated. But as Schroeder and his co-leaders lay out in an opinion published Tuesday in PLOS Digital Health, it’s also because public health goals ran headlong into the business interests of the companies that provided their data for analysis, including Facebook and a cluster of ad-tech firms that tie clicks to location data.

“You have this really, really large, opaque ecosystem of companies which generate, buy, sell, and modify these datasets,” said Nishant Kishore, who worked in the lab of Caroline Buckee, another of the network’s leaders. Kishore, who worked closely with public health authorities using the network’s data, said that while it’s useful to researchers to have that data available, “unfortunately, what is collected is decided by entities whose priorities are different than that of the general public health.”

Data collection and analysis are typically optimized for maximum commercial impact, not public health. “You might imagine that the way that GPS points or behavior data that’s collected off of Waze is going to be very different than how that data is collected from Tinder or from Grubhub,” said Kishore. And the decisions that determine which questions can be answered are made well before researchers ever look at that data.

Facebook, for example, mitigated privacy risks by sharing only pre-calculated mobility metrics, such as percent change in movement and percent of Facebook users staying home. That made it impossible to calculate certain granular measures of mobility, which researchers thought, early in the pandemic, might serve as a proxy or input to calculate contact rates. (A study by the network, published recently in the Lancet Digital Health, found there are too many confounding factors to tie mobility data to that metric.)

Facebook’s metrics were also based on users’ average location in an eight-hour chunk of time. But the time chunks were only reported in UTC, making them more difficult to use in time zones where an eight-hour chunk didn’t align with a night spent sleeping at home.
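To see why UTC-only windows are awkward, a minimal sketch can translate eight-hour UTC windows into local clock time. It assumes, purely for illustration, that the windows start at 00:00, 08:00, and 16:00 UTC (the article does not give the boundaries) and uses fixed UTC offsets rather than real time-zone rules; the function name `local_windows` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: three eight-hour windows starting at 00:00, 08:00 and 16:00 UTC.
WINDOW_STARTS_UTC = (0, 8, 16)
WINDOW_HOURS = 8


def local_windows(utc_offset_hours, day=datetime(2020, 4, 1, tzinfo=timezone.utc)):
    """Return (start, end) local clock times as 'HH:MM' strings for each UTC window."""
    # Fixed offset only; daylight saving time is deliberately ignored here.
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    windows = []
    for start_hour in WINDOW_STARTS_UTC:
        start = day.replace(hour=start_hour)
        end = start + timedelta(hours=WINDOW_HOURS)
        windows.append((
            start.astimezone(local_tz).strftime("%H:%M"),
            end.astimezone(local_tz).strftime("%H:%M"),
        ))
    return windows


if __name__ == "__main__":
    for label, offset in (("UTC", 0), ("UTC-5", -5), ("UTC+5:30", 5.5)):
        print(label, local_windows(offset))
```

Under these assumptions, a location at UTC has one window that runs from midnight to 08:00 and roughly covers a night at home, while a location at UTC-5 sees its night split at 03:00 between two windows, so neither one cleanly captures where people sleep.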

Those challenges are compounded by the fact that every tech company that volunteered its data did so in a different way, sometimes producing conflicting mobility metrics.

“From week to week it could be that — I’m picking random providers — Google, provider A, tells me that more people are staying at home, versus Facebook, provider B, is telling me that fewer people are staying at home,” said Kishore. “And now the question comes down to me as a researcher: Who do I trust?”

Schroeder said that the way personal data is shared in a crisis should be tailored to the needs of public health officials, but that’s unlikely to happen as long as companies are the ones generating it. “We just simply don’t have enough leverage” to encourage a behemoth like Facebook to collect and analyze data in a way that aligns perfectly with public health needs, he said.

That’s not stopping Schroeder and his colleagues from building with the tools at hand. The Covid-19 Mobility Data Network has evolved into a broader project called Crisis Ready, which aims to develop proactive data-sharing agreements with companies that allow for data pipelines that only open when necessary in a crisis, minimizing privacy risks.

They are also attempting to better preserve privacy when data is shared, testing systems that add noise to datasets so it’s more difficult to identify individuals. One ad-tech company, Cuebiq, is developing a platform that lets researchers analyze its data without giving them direct access. But “there needs to be research into the tradeoff between privacy protection and potential health benefit,” said Kishore. “We’re pretty far away from doing that right now because we’re not in the rooms where the data are.”
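The article does not say which noise-adding technique is being tested. As one common illustration of the idea, the sketch below applies the Laplace mechanism from differential privacy to aggregated counts; the function name, the region labels, and the parameter values are all hypothetical, and this is not a description of any system the researchers or Cuebiq actually use.

```python
import random


def add_laplace_noise(counts, epsilon, sensitivity=1.0, seed=None):
    """Return a copy of `counts` with Laplace noise added to each value.

    Smaller epsilon means more noise: stronger privacy, less accurate counts.
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    noisy = {}
    for region, count in counts.items():
        # The difference of two independent Exp(1) draws is Laplace(0, 1).
        noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
        noisy[region] = count + noise
    return noisy


if __name__ == "__main__":
    # Hypothetical counts of devices observed at home, per region.
    at_home = {"region_a": 120, "region_b": 8}
    print(add_laplace_noise(at_home, epsilon=1.0, seed=42))
    print(add_laplace_noise(at_home, epsilon=0.1, seed=42))
```

The tradeoff Kishore describes shows up directly in the `epsilon` parameter: lowering it makes individuals harder to single out, but it can swamp small counts such as the hypothetical `region_b`, which is exactly where a public health signal may be hardest to recover.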

In the future, the researchers also hope to address the potential for commercial data sharing agreements to exacerbate inequity. “What will end up happening over and over again is that it’ll be researchers at Harvard or Stanford who have access to these datasets and continue producing the research,” said Kishore. “And oftentimes researchers and individuals who are truly representative of the populations on which we are doing the research can get cut out or left out of that process.”

Even among the cities, states, and countries that worked directly with the Covid-19 Mobility Data Network and its ivory tower researchers, low-resourced public health departments benefited less because they had less bandwidth to learn the ropes.

Mobility data also appear to be more reliable and more informative in urban areas than in rural areas, and researchers assume that the data are skewed by age and socioeconomic status as well. “Facebook has been doing some research, because they do have information on age and gender and things of that nature, to develop a better representativeness index,” said Kishore. But those variables will remain hidden as long as researchers get their data secondhand.

Academics and public health officials can’t change the fact that a potential gold mine of epidemiological insight lies beyond their control. But they can try to train global health bodies to take advantage of these kinds of expanding data sources outside research’s ivory towers. “We’re not going to answer it with just more dashboards,” said Schroeder. “This has to be something where we have a much more concerted effort to create that layer of data translators throughout the world.”

Source: STAT