
Alexander Huth settled into an MRI machine in the Austin, Texas, neuroscience research building where he worked, a cozy blanket draped over him to stave off the chill from the machine’s magnet, and soundproof earbuds to drown out its drone. The sound in the earbuds, though, came through loud and clear.
“From the New York Times and WBUR Boston, this is ‘Modern Love,’” the podcast began.
Listening to those lines spurred brain activity, neurons firing and using up oxygen in his blood. As the deoxygenated blood flowed out, back to his lungs and heart, the magnet picked up its signal, betraying which parts of his brain were processing what he’d heard. And on the other side of the glass, a group of neuroscientists looked nervously at the data, trying to listen to the podcast just by looking at their adviser’s brain scans.
In a new Nature Neuroscience paper published Monday, Huth and a team of researchers from the University of Texas at Austin introduced a new “brain decoder” enabled by GPT-1, an earlier version of the artificial neural network technology that underpins ChatGPT. After digesting 16 hours of training data, the new tool was able to describe the gist of stories the three participants in the proof-of-concept experiment listened to — just by looking at their functional MRI scans.
While it would be an overstatement to say that the scientists have developed a way to read minds, combining a large language model with an unusually large amount of fMRI data allowed Huth and co-workers to get closer than anyone else has. Earlier brain decoders focused on the parts of the brain that control motor functions like speech; with the new decoder, scientists were able to output descriptions of videos participants watched that contained no spoken words, such as Pixar short films.
Huth, an assistant professor of neuroscience and computer science at the University of Texas at Austin, hesitated to say he and his team are measuring thoughts. But, he said, “I think we are decoding something that is deeper than language.”
Francisco Pereira, a staff scientist at the National Institute of Mental Health who has worked on brain decoders for over a decade, said that with the technology available the last time scientists tried this, around 2015, researchers would have been “lucky if we would get two words together that made sense,” and would not have come anywhere close to the performance reported here.
Of the findings, “the only thing I could be is impressed, because I know how hard it is to do this,” said Pereira, who was not affiliated with the research.
The main portion of the study used just three participants. Part of the reason for the tiny sample is that it is so hard to find participants who stay still enough — for roughly two hours at a time, over 16 sessions — in the scanner. “We’ve gone through a lot of participants who just don’t keep still and you can’t use their data,” said Shailee Jain, a Ph.D. candidate in Huth’s lab and a co-author on the paper. Even “twitching your fingers or your toes can move muscles in your body that affects the data,” she said.
Because good data is such a bottleneck in this field, it’s rather routine for neuroscientists to go under the magnet in their own studies. Having skin in the game means the researchers are highly motivated to pay attention and stay still, which is one of the reasons Huth said he became one of the participants. As to the question of whether that could affect the data, “it just means that what they report is probably as good as it could get given the current techniques and methods that they use,” said Greta Tuckute, a Ph.D. candidate in the Massachusetts Institute of Technology’s department of brain and cognitive sciences, who was not involved with the study.
Normally, for these kinds of studies, words flash on a screen in front of the participant at a fixed rate, or they hear individual words, like “cat,” “dog,” or “horse,” said Jain. But for this study, which aimed to decode “continuous language” instead of individual words, the researchers used audio stories from “Modern Love” and “The Moth Radio Hour” for the 16 hours of training data, which kept participants’ interest much more easily.
“The Moth stories have been great. I’ve cried after listening to them. I’ve laughed really hard,” an anonymous study participant told STAT. That laughter, unfortunately, also made it hard not to move. “It’s a double-edged sword.”
It turns out that the engaging stories are an important component of getting good data; if a participant stops paying attention to the stimuli, or their mind wanders elsewhere, the researchers can’t train the model and the experiment doesn’t work.
In some ways, the fact that it’s so hard to get good data for the brain decoder is a feature, not a bug — needing a patient’s cooperation to build the model creates a built-in safeguard for patient privacy.
“It is important to constantly evaluate what the implications are of new brain decoders for mental privacy,” said Jerry Tang, a Ph.D. candidate in Huth’s lab and lead author on the paper, in a press briefing.
In devising ways to protect privacy, the authors asked participants to try, in several different ways, to prevent the decoder from reconstructing the words they were hearing. Mentally listing off animals, and silently telling a different story while the podcast was playing, were particularly effective at stopping the decoder, said Tang. The authors also found that the decoder had to be trained on each subject’s data and wasn’t effective when used on another person.
Between these findings and the fact that any movement would make the fMRI scans worse, the authors concluded that it’s not currently possible for a brain decoder to be used on someone against their will.
“I don’t think that this is going to be used to violate people’s privacy,” said Pereira, noting that there are other behavioral ways to determine if someone is lying or if they recognize a picture. “Or if it is, and people are in a situation where they can compel you to be in the scanner for 16 hours and somehow get you to think about things, you’re already in a pretty compromised situation.”
Pereira appreciated why the team conducted the mental privacy experiments, having been asked similar questions about mental privacy over the years. But he laughed a bit out of frustration, knowing how hard it is to get a participant to focus on a particular topic to begin with. “Come on!” he said. “We spend all this time in the scanner, scanning ourselves for many hours to make sure we get one subject who’s not thinking about lunch!”
Huth has been working on language decoders since he was a graduate student in Jack Gallant’s lab at the University of California, Berkeley. Gallant was focused on figuring out how the brain processes vision, but Huth was the first one to say, “‘Well, wait a minute … we should be able to take these same tools and apply them to language,’” said Gallant, a professor of psychology and neuroscience at Berkeley. “And that worked great; it actually worked better than it worked in vision,” he said.
There are two key components that made the new brain decoder possible: the trove of data collected on a few participants, instead of the usual few hours of data on a lot of participants, and the advent of language models.
“Frankly … [MRI] measurements today are just as bad as they were 10 years ago,” said Gallant. What has changed, he said, is the power and availability of language models like GPT, BERT, and PaLM, which Huth used to “dramatically improve the performance of his encoding and decoding models.”
A brain encoding model goes from the stimulus — in this case, the words being said in the podcast — to a prediction of what the brain activity will look like. A decoding model does exactly the opposite: It takes brain scans and predicts what stimulus — in this case, words — produced the brain activity. Thus, if you can solve one of those problems, you can solve the other, said Gallant.
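In rough computational terms, an encoding model of this kind can be sketched as a regularized regression from language-model features of the stimulus to each voxel's measured response. The snippet below is only a hypothetical sketch along those lines; the feature extractor, data shapes, and preprocessing are placeholders, not the paper's pipeline.

```python
# Minimal sketch of an encoding model: a regularized linear map from
# language-model features of the stimulus text to fMRI voxel responses.
# The feature extractor and data here are random stand-ins.
import numpy as np
from sklearn.linear_model import Ridge

def embed(phrases):
    """Stand-in for a GPT-style feature extractor: one 768-d vector per phrase."""
    return np.stack([
        np.random.default_rng(abs(hash(p)) % (2**32)).normal(size=768)
        for p in phrases
    ])

# Training data: phrases the participant heard, and the fMRI volumes recorded
# while they listened (n_samples x n_voxels). Both are placeholders here.
phrases = ["i got on the bus", "she laughed at the joke", "the rain started"] * 50
scans = np.random.default_rng(0).normal(size=(len(phrases), 10_000))

# Fit stimulus -> predicted brain activity, then predict a scan for a new phrase.
encoder = Ridge(alpha=1.0).fit(embed(phrases), scans)
predicted_scan = encoder.predict(embed(["i got on the bus"]))[0]   # shape: (10_000,)
```

The decoding direction then falls out of this fit: rather than inverting the regression, candidate phrases can be scored by how closely their predicted scans match the one actually recorded.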
This relationship between encoding and decoding is what causes neuroscientists like Tuckute to say that the work is “a cool proof of concept,” but mostly “a good piece of engineering work.”
“Brain decoding is not particularly a valuable scientific thing to do,” Gallant agreed. “It’s basically applying the science that you have learned to build a device.”
The decoder Huth created never directly asks what a person was thinking about. Instead, it uses an “encoding” model in a loop to create the “decoding” effect.
The researchers used GPT-1 to generate possible guesses for what phrases the person heard. Then, using the “forward” or “encoding” model that predicts brain activity based on a phrase, they modeled the brain activity that phrase might evoke. By comparing the predicted scan to the actual scan, they ranked the guessed phrases from best to worst and iterated, adding more words to the phrase.
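That loop is, in effect, a beam search, and a schematic version of it might look like the sketch below. Both helper functions are hypothetical stand-ins: `propose_continuations` takes the place of the GPT-1 sampler and `score` takes the place of the encoding-model comparison to the real scan; neither reflects the study's actual code.

```python
# Schematic beam-search decoder: the language model proposes continuations of
# each candidate phrase, the encoding model scores how well each continuation's
# predicted scan matches the observed one, and only the best guesses survive.
# Both helpers are hypothetical stand-ins for the real components.
import random

def propose_continuations(phrase, n=5):
    """Stand-in for the language model: return n plausible extensions of `phrase`."""
    words = ["i", "was", "walking", "home", "when", "she", "called", "back"]
    return [(phrase + " " + random.choice(words)).strip() for _ in range(n)]

def score(candidate, actual_scan):
    """Stand-in for the encoding-model comparison: in the real pipeline, the
    similarity between the predicted scan for `candidate` and the actual scan."""
    return random.random()

def decode(actual_scan, beam_width=10, n_words=20):
    beam = [""]                                     # start from an empty guess
    for _ in range(n_words):
        candidates = {c for phrase in beam for c in propose_continuations(phrase)}
        # Rank guesses by how well their predicted brain activity matches the
        # actual scan, keep the best few, and extend them on the next pass.
        beam = sorted(candidates, key=lambda c: score(c, actual_scan),
                      reverse=True)[:beam_width]
    return beam[0]

print(decode(actual_scan=None))   # with real components, this would print the gist
```

Because the ranking depends only on the forward model, the decoder never has to invert the mapping from brain activity back to words, which is the sense in which solving one problem solves the other.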
Though decoding entire stories might seem harder than decoding individual words, as has been attempted in the “dog,” “cat,” and “horse” studies, predicting stories is actually easier because of the way brain data is currently collected.
Because a person can process multiple words in a single second, measuring language via fMRI — whose signal is sluggish, taking place over several seconds — means that any signal is an average of several words, said Gallant. “You can squint at it and be like a fortune teller reading the tea leaves and try to make yourself believe that you’re decoding. And maybe it’s good sometimes, but overall, it’s really, really bad.”
This effect made it nearly impossible to decode single words or even clumps of words in past studies. It’s also why the brain decoder is generally only able to capture the gist of what the participant heard.
Huth noted that the model is particularly bad at pronouns. It’s unclear if this is because of the artificial language model the study used or because that information is represented somewhere in the brain that’s hard to see in this data.
However, the researchers did find that there are three different brain networks that seem to process language, perhaps even somewhat redundantly. “[This] is exciting because there are different views as to how meaning is structured in the brain,” said Tuckute at MIT, “and given that they show that you can actually use different networks, that does provide an interesting perspective on what type of information is in these different networks.”
Furthermore, because the brain seems to process different languages similarly, and because the decoder seems to be detecting some sort of meaning or thought rather than speech itself, the decoder might work across languages: if the decoder were trained in English on someone who is bilingual, and that person then listened to audio in another language they understood, the decoder should theoretically be able to output what they heard in English.
For now, though, that is preliminary research. Huth and his team are currently pursuing work to make the approach more practical for people who have suffered a stroke, have ALS, or have other health problems that impact their speech. The first step in that process is using a less cumbersome technology than MRI. A method called fNIRS — functional near-infrared spectroscopy — that measures similar signals could potentially do the same thing, but also be worn on the head, said Gallant.
In the meantime, Huth is writing research protocols with clauses that haven’t been needed before. For example, “telling people that we’re not going to try to decode anything except for scans where we explicitly say at the beginning, ‘This is a scan during which we might try to decode your brain activity,’” he said with a smile. “Which seems like a good policy.”