KEY POINTS

  • A brain-computer interface (BCI) can decode brain activity into text, speech, and facial expressions.
  • The BCI was developed for a woman named Ann who was left quadriplegic and unable to speak.
  • BCIs offer hope for people with locked-in syndrome and other conditions that impair speech and communication.
Source: DeltaWorks/Pixabay

High school math teacher Ann was only 30 when she suffered a brainstem stroke in 2005 that left her quadriplegic and unable to communicate with her toddler and eight-year-old.

A pioneering neuroscience study shows how, for the first time in 18 years, Ann can communicate more naturally, with facial expressions, through a digital avatar that uses artificial intelligence (AI) and brain-computer interface (BCI) technologies to interpret her brain activity.

In a major scientific breakthrough, researchers at the University of California, San Francisco (UCSF) and the University of California, Berkeley combined neurotechnology and AI deep learning to power an animated digital avatar capable of facial gestures, enabling richer, more natural speech communication. This was the first brain-computer interface to synthesize both speech and facial expressions from brain activity.

“Here we demonstrate flexible, real-time decoding of brain activity into text, speech sounds, and both verbal and non-verbal orofacial movements,” wrote Professor Edward Chang, M.D., neurosurgeon and chairman of the Department of Neurological Surgery at UCSF, along with study co-authors Gopala Anumanchipalli, Karunesh Ganguly, Adelyn Tu-Chan, Inga Zhuravleva, Michael Berger, Peter Wu, Jessie Liu, Maximilian Dougherty, Ran Wang, Margaret Seaton, David Moses, Alexander Silva, Kaylo Littlejohn, and Sean Metzger.

The team aimed to decode Ann’s intended sentences from brain activity recorded by electrodes on a thin array placed over speech-related regions of the brain’s sensorimotor cortex (SMC) and superior temporal gyrus.

A digital avatar of Ann was created using machine learning to bridge facial-animation software from Edinburgh-based Speech Graphics with the brain activity generated while she attempted speech and facial movements. Berger, co-founder and CTO of Speech Graphics, is a study co-author who contributed to this research.

To gather brain-signal data, a high-density electrocorticography (ECoG) array with 253 disc-shaped electrodes, manufactured by PMT Corporation, was surgically implanted on the surface of the left hemisphere of the brain over regions related to language perception and speech production. A Blackrock Microsystems percutaneous pedestal connector links the thin array to a headstage device, the Blackrock Microsystems CerePlex E256, which converts analog brain activity into digital signals and sends the minimally processed data to a computer for further processing.
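
The article doesn't detail the downstream signal processing, but as a minimal sketch of how multichannel recordings like these are commonly handled (the 1 kHz sampling rate and the 70-150 Hz "high-gamma" band below are illustrative assumptions, not details from the study), the digitized data can be treated as a channels-by-samples array and band-pass filtered before feature extraction:

    import numpy as np
    from scipy.signal import butter, filtfilt

    # Minimal sketch, not the study's actual pipeline. Assumed values:
    # a 1 kHz sampling rate and a 70-150 Hz "high-gamma" band, a common
    # choice for speech-related ECoG features.
    FS = 1000            # samples per second (assumption)
    N_CHANNELS = 253     # electrodes on the implanted array (from the study)
    N_SAMPLES = 5 * FS   # five seconds of digitized signal

    # Stand-in for the digitized data streamed from the headstage:
    # one row per electrode, one column per time sample.
    ecog = np.random.randn(N_CHANNELS, N_SAMPLES)

    # Band-pass filter each channel, then take mean power in the band
    # as a crude per-channel activity feature.
    b, a = butter(4, [70, 150], btype="bandpass", fs=FS)
    filtered = filtfilt(b, a, ecog, axis=1)
    features = (filtered ** 2).mean(axis=1)
    print(features.shape)  # (253,)

Per-channel features along these lines are the kind of input the decoding models described next would consume.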

Next, the research team gathered data to train the AI deep learning algorithms for decoding. Ann was prompted to attempt to say specific text or perform a specified action while her brain activity was recorded. Instead of training the deep learning models to identify whole words, the UCSF and UC Berkeley researchers trained the algorithms to decode words from phonemes, the smallest units of sound in a language. For example, the word “went” consists of four phonemes: the “w,” “e,” “n,” and “t” sound units. Using this approach, the AI algorithm could decode any word in the English language from just 39 phonemes.
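
As a toy illustration of the phoneme-based approach (the lexicon entries below are made up for this example, not taken from the study), each word the decoder targets can be expanded into a short sequence drawn from a fixed 39-unit, ARPAbet-style phoneme inventory:

    # Illustrative sketch only: a tiny phoneme "lexicon" showing how words
    # break down into units from a 39-phoneme (ARPAbet-style) inventory.
    TOY_LEXICON = {
        "went":  ["W", "EH", "N", "T"],
        "hello": ["HH", "AH", "L", "OW"],
        "how":   ["HH", "AW"],
        "are":   ["AA", "R"],
        "you":   ["Y", "UW"],
    }

    def words_to_phonemes(sentence: str) -> list[str]:
        """Convert a sentence into the phoneme sequence a decoder would target."""
        phonemes = []
        for word in sentence.lower().split():
            phonemes.extend(TOY_LEXICON[word])  # raises KeyError for unknown words
        return phonemes

    print(words_to_phonemes("how are you"))  # ['HH', 'AW', 'AA', 'R', 'Y', 'UW']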

During training of the AI decoders, the researchers used a connectionist temporal classification (CTC) loss function, which is widely used in automatic speech recognition. CTC scores a neural network's output when the exact time alignment between each letter or speech sound (phone) and the input waveform is not known, so the input and output sequences do not need to be aligned at every timestep.

“We used CTC loss during training of the text, speech, and articulatory decoding models to enable prediction of phone probabilities, discrete speech-sound units, and discrete articulator movements, respectively, from the ECoG signals,” the researchers reported.
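
As a rough sketch of how a CTC loss is typically applied in practice, and not the researchers' actual code, the PyTorch example below assumes a hypothetical decoder that emits per-timestep scores over the 39 phonemes plus a CTC "blank" symbol; the loss sums over all possible alignments between the neural time steps and the shorter phoneme sequence:

    import torch
    import torch.nn as nn

    # Hypothetical sizes: T neural time steps, N sequences in a batch,
    # C = 39 phonemes + 1 CTC "blank" symbol (index 0).
    T, N, C = 200, 4, 40
    S = 30  # maximum target phoneme-sequence length in the batch

    # Stand-in for a decoder's per-timestep phoneme scores; a real model
    # would compute these from ECoG features, not random numbers.
    log_probs = torch.randn(T, N, C).log_softmax(dim=2)

    # Random phoneme targets (indices 1..39; 0 is reserved for the blank).
    targets = torch.randint(low=1, high=C, size=(N, S))
    input_lengths = torch.full((N,), T, dtype=torch.long)
    target_lengths = torch.randint(low=10, high=S, size=(N,), dtype=torch.long)

    # CTC marginalizes over all alignments between the T input frames and
    # the shorter target sequence, so no frame-level labels are needed.
    ctc_loss = nn.CTCLoss(blank=0)
    loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
    print(loss.item())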

According to the researchers, their neurotechnology can decode brain signals into text in real time at a median rate of 78 words per minute (WPM). This far exceeds the 14 WPM of Ann’s current assistive device, which requires her to select letters, a much slower process.

The scientists wrote:

Faster, more accurate, and more natural communication are among the most desired needs of people who have lost the ability to speak after severe paralysis. Here we have demonstrated that all of these needs can be addressed with a speech-neuroprosthetic system that decodes articulatory cortical activity into multiple output modalities in real time, including text, speech, audio synchronized with a facial avatar, and facial expressions.

Recent advancements in artificial intelligence and brain-computer interfaces offer hope for those with locked-in syndrome, such as Ann. The pioneering breakthroughs in this study show how innovations in artificial intelligence and brain-computer interfaces have the potential to restore the ability to communicate and vastly improve the quality of life for those who need it the most.