AI and speech: how artificial intelligence can help paralyzed people communicate again


How speech AI can help paralyzed people to regain their voices.

Paralysis is a condition that affects millions of people around the world, causing them to lose the ability to move or control some or all of their body parts. Paralysis can result from various causes, such as spinal cord injury, stroke, brain injury, or neurodegenerative diseases. Depending on the location and extent of the damage, paralysis can affect different functions of the body, including speech.

Speech is one of the most important forms of human communication, allowing us to express our thoughts, feelings, and needs. Speech also plays a vital role in our social interactions, personal relationships, and professional activities. Losing the ability to speak can have a devastating impact on one's quality of life, self-esteem, and mental health.

Fortunately, advances in artificial intelligence (AI) and neuroscience have opened new possibilities for restoring speech function in paralyzed people.

First, let's understand what artificial intelligence is all about.

Artificial intelligence, or AI, is the field of computer science that studies how to create machines or software that can perform tasks normally requiring human intelligence, such as reasoning, learning, decision-making, perception, and natural language processing. AI is commonly divided into two groups: "narrow AI" and "general AI".

Narrow AI is the type of AI that is designed to perform a specific task or function, such as playing chess, recognizing faces, or driving cars. Narrow AI systems are usually based on rules, algorithms, or statistical models that are trained on large amounts of data. Narrow AI systems can be very effective and efficient at their tasks, but they cannot handle situations that are outside their scope or domain.
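To make the idea of a statistical model trained on data concrete, here is a minimal sketch (a hypothetical toy example using the scikit-learn library, not taken from any of the systems discussed in this post):

```python
# Toy illustration of a narrow AI system: a statistical model
# trained on labeled examples for one narrow task.
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: each row is a feature vector,
# each label says which of two classes that row belongs to.
X_train = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.3]]
y_train = [0, 0, 1, 1]

model = LogisticRegression()
model.fit(X_train, y_train)

# The trained model handles new inputs from the same narrow domain...
print(model.predict([[0.15, 0.85]]))   # -> [0]
# ...but it has no notion of any task outside that domain.
```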

General AI is the type of AI that aims to achieve human-like intelligence and abilities across a wide range of domains and tasks. General AI systems would be able to understand and communicate in natural language, reason and solve problems, learn from experience and feedback, and adapt to new situations and goals. General AI systems are still a theoretical and long-term goal of AI research, as they require a deeper understanding of the nature and mechanisms of intelligence.

Some examples of narrow AI applications are:

Search engines, such as Bing, that can process natural language queries and rank web pages based on relevance and popularity.

Voice assistants, such as Cortana, that can understand spoken commands and questions, and provide answers or actions.

Recommendation systems, such as Netflix, that can analyze user preferences and behavior and suggest relevant movies or shows to watch.

Self-driving cars, such as Waymo, that can navigate complex road environments and traffic situations, and avoid collisions and accidents.

Facial recognition systems, such as Face ID, that can identify and authenticate people based on their facial features.

Some examples of general AI challenges are:

The Turing Test, which asks whether a machine can exhibit human-like intelligence and behavior in a conversation with a human judge.

The Winograd Schema Challenge, which tests whether a machine can resolve linguistic ambiguities in natural language sentences using common-sense reasoning.

The Allen AI Science Challenge, a competition to build a machine that can pass an eighth-grade science exam.

Crucially for speech restoration, AI can also analyze and interpret large amounts of data, such as brain signals or speech sounds.

In recent years, researchers have developed various AI-powered technologies that can help paralyzed people to regain their voices. These technologies can be broadly classified into two categories: "speech synthesis" and "speech decoding".

Speech synthesis

Speech synthesis is the process of generating artificial speech sounds from text or other inputs. Speech synthesis can be used to create voice assistants, text-to-speech applications, or speech prostheses for people who cannot speak.
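For the text-to-speech end of this process, off-the-shelf software already exists. As a minimal sketch, assuming the open-source pyttsx3 package (which wraps the speech engine built into the operating system), turning text into audible speech can look like this:

```python
# Minimal text-to-speech sketch using the pyttsx3 library,
# which drives the speech engine bundled with the operating system.
import pyttsx3

engine = pyttsx3.init()                 # pick the default speech engine
engine.setProperty("rate", 150)         # speaking rate in words per minute
engine.say("Hello, I can speak for you.")
engine.runAndWait()                     # block until the utterance finishes
```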

One example of speech synthesis technology is a brain-computer interface (BCI) that can translate brain activity into speech sounds. A BCI is a system that can establish a direct communication channel between the brain and an external device, such as a computer or a robotic arm. A BCI typically consists of three components: a "sensor" that can measure brain signals, such as electrical activity or blood flow; a "decoder" that can process and interpret the brain signals; and an "actuator" that can execute commands based on the decoded signals.
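To make the sensor, decoder, and actuator roles concrete, here is a bare-bones structural sketch in Python; the class and method names are hypothetical and purely illustrative, not taken from any real BCI system:

```python
# Hypothetical skeleton of the three BCI components described above.
from typing import Protocol
import numpy as np

class Sensor(Protocol):
    def read(self) -> np.ndarray:                    # one window of raw brain signals
        ...

class Decoder(Protocol):
    def decode(self, signals: np.ndarray) -> str:    # signals -> text or phonemes
        ...

class Actuator(Protocol):
    def output(self, text: str) -> None:             # speak or display the result
        ...

def run_bci(sensor: Sensor, decoder: Decoder, actuator: Actuator) -> None:
    """One pass of the pipeline: measure, interpret, act."""
    signals = sensor.read()
    text = decoder.decode(signals)
    actuator.output(text)
```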

A BCI for speech synthesis can use different types of sensors to measure brain activity related to speech production or perception. For example, some sensors can be implanted on the surface or inside the brain (invasive BCI), while others can be attached to the scalp or worn on the head (non-invasive BCI). The sensor then sends the brain signals to the decoder, which uses AI algorithms to extract features and patterns from the signals that correspond to speech elements, such as words, syllables, or phonemes. The decoder then converts these features into text or speech commands that are sent to the actuator, which can be a speaker or a screen that produces synthetic speech sounds or displays text.
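The decoding step itself usually boils down to two operations: turning each window of raw signal into a feature vector, and classifying that vector as a speech unit such as a phoneme. The toy sketch below illustrates the shape of that computation; the frequency bands, label set, and assumed pre-trained weight matrix are invented for the example, not taken from a real clinical decoder:

```python
import numpy as np

PHONEMES = ["AH", "EE", "OO", "silence"]          # toy label set

def band_power_features(window: np.ndarray, fs: float = 1000.0) -> np.ndarray:
    """Very simplified features: average spectral power in a few frequency
    bands for each recording channel (window shape: channels x samples)."""
    spectrum = np.abs(np.fft.rfft(window, axis=1)) ** 2
    freqs = np.fft.rfftfreq(window.shape[1], d=1.0 / fs)
    bands = [(4, 8), (8, 30), (70, 150)]           # theta, beta, high-gamma
    feats = [spectrum[:, (freqs >= lo) & (freqs < hi)].mean(axis=1)
             for lo, hi in bands]
    return np.concatenate(feats)

def decode_phoneme(features: np.ndarray, weights: np.ndarray) -> str:
    """Map a feature vector to the most likely phoneme with a
    (hypothetically pre-trained) linear classifier."""
    scores = weights @ features
    return PHONEMES[int(np.argmax(scores))]
```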

One of the challenges of speech synthesis using BCI is to achieve high accuracy and naturalness of the synthetic speech. This requires a deep understanding of how the brain encodes and decodes speech information, as well as sophisticated AI models that can learn from large datasets of speech samples. Another challenge is to ensure the safety and reliability of the BCI system, especially for invasive BCI that require surgery and pose potential risks of infection or rejection.

Despite these challenges, several studies have demonstrated the feasibility and potential of speech synthesis using BCI. For example, in 2023, researchers from Stanford University reported that they used an invasive BCI with electrodes implanted on the surface of the brain to help three patients with amyotrophic lateral sclerosis (ALS), a progressive neurodegenerative disease that causes muscle weakness and paralysis, including speech impairment. The BCI system used an artificial neural network (ANN), a type of AI model inspired by biological neurons, to decode brain activity related to vocal tract movements and translate it into synthetic speech sounds. The system achieved an average word error rate (WER) of 25%, which means that one out of four words was incorrectly synthesized. The system also produced natural-sounding speech with intonation and prosody.
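Word error rate is the standard yardstick in these studies: the number of substituted, deleted, and inserted words divided by the number of words in the intended sentence. A small self-contained implementation of the usual word-level edit-distance calculation (not the exact scoring code used in the study) looks like this:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with the usual word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

# A 25% WER means roughly one word in four comes out wrong:
print(word_error_rate("please bring me a glass of water now",
                      "please bring me a cup of water how"))   # -> 0.25
```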

Another example is a study conducted in 2023 by researchers from Switzerland and France, who used a non-invasive BCI with electrodes attached to the scalp to help a woman who was paralyzed after suffering a stroke that damaged her brainstem, leaving her unable to speak or move any part of her body except her eyes. The BCI system used an ANN to decode brain activity related to speech perception and facial expressions and translate it into synthetic speech sounds and digital avatar animations. The system achieved an average WER of 12%, which means that roughly one out of eight words was incorrectly synthesized. The system also produced expressive speech with emotions and facial gestures.



Speech decoding

Speech decoding is the process of reconstructing speech sounds from sources other than text or brain activity. Speech decoding can be used to enhance speech recognition, voice conversion, or speech restoration for people who have lost their voice due to injury or disease.

One example of speech decoding technology is a nerve-stimulation device that can activate the muscles involved in speech production. A nerve-stimulation device is a system that can deliver electrical pulses to specific nerves that control the movement of the vocal cords, tongue, lips, and jaw. The device typically consists of two components: a "stimulator" that can generate and deliver the electrical pulses; and a "controller" that can adjust the timing, intensity, and frequency of the pulses.

A nerve-stimulation device for speech decoding can use different types of controllers to determine the optimal stimulation parameters for producing speech sounds. For example, some controllers can be operated manually by the user or a caregiver using a switch or a touchscreen (external controller), while others can be controlled automatically by AI software that can detect the user's intention to speak from brain signals or other cues (internal controller). The controller then sends the stimulation commands to the stimulator, which can be implanted near the target nerves (implanted stimulator) or attached to the skin over the target nerves (transcutaneous stimulator). The stimulator then delivers the electrical pulses to the nerves, which trigger the contraction of the muscles, resulting in speech sounds.
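In software terms, the controller's job reduces to choosing a small set of stimulation parameters and revising them from feedback. The sketch below is a hypothetical illustration of that loop; the parameter names, values, and safety limit are invented for the example and are not taken from any real device:

```python
from dataclasses import dataclass

@dataclass
class StimulationParams:
    amplitude_ma: float     # pulse intensity in milliamps
    frequency_hz: float     # pulses per second
    pulse_width_us: float   # duration of each pulse in microseconds

MAX_AMPLITUDE_MA = 10.0     # invented safety ceiling for this example

def adjust(params: StimulationParams, measured: float, target: float,
           gain: float = 0.1) -> StimulationParams:
    """Simple proportional controller: nudge the amplitude toward whatever
    level produces the target muscle response, never exceeding the ceiling."""
    error = target - measured
    new_amplitude = min(MAX_AMPLITUDE_MA,
                        max(0.0, params.amplitude_ma + gain * error))
    return StimulationParams(new_amplitude, params.frequency_hz,
                             params.pulse_width_us)

# Example: the measured muscle activation (0.4) is below the target (0.7),
# so the controller raises the amplitude slightly for the next pulse train.
params = StimulationParams(amplitude_ma=2.0, frequency_hz=40.0, pulse_width_us=300.0)
params = adjust(params, measured=0.4, target=0.7)
print(params.amplitude_ma)   # -> roughly 2.03 (amplitude nudged upward)
```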

One of the challenges of speech decoding using a nerve-stimulation device is to achieve precise and coordinated stimulation of the multiple nerves and muscles involved in speech production. This requires fine-tuning the stimulation parameters for each user, as well as a feedback mechanism that can monitor and adjust the stimulation effects. Another challenge is to ensure the comfort and safety of the user, especially for implanted stimulators that require surgery and pose potential risks of infection or damage to the nerves or muscles.

Despite these challenges, several studies have shown the promise and benefit of speech decoding using nerve-stimulation devices. For example, in 2022, researchers from Switzerland and the Netherlands reported that they used an implanted stimulator with electrodes placed on the spinal cord to help three patients who were paralyzed after a spinal cord injury that affected their lower body and left them unable to walk, cycle, or swim. The stimulator was controlled by AI software that used an ANN to decode brain activity related to motor intention and translate it into stimulation commands. The system enabled the patients to walk, cycle, and swim again by stimulating the nerves and muscles that control their legs.

Another example is a study conducted in 2023 by researchers from the Netherlands and Switzerland, who used a transcutaneous stimulator with electrodes placed on the neck to help a man who was paralyzed after falling on ice and damaging his spinal cord at the level of his chest. The stimulator was controlled by AI software that used an ANN to decode brain activity related to movement intention and translate it into stimulation commands. The system enabled the man to regain some movement of his arms, hands, and fingers by stimulating the nerves and muscles that control his upper limbs.

Conclusion

AI can help paralyzed people regain their voices by using various technologies that can either synthesize or decode speech sounds from different sources. These technologies can improve the communication and quality of life of paralyzed people, as well as provide new insights into how the brain and body work together to produce speech. However, these technologies also face several challenges and limitations that need to be addressed before they can be widely adopted and applied. Therefore, further research and development are needed to optimize and validate these technologies, as well as to address their ethical and social implications.

