A group of researchers led by Elke Rundensteiner has developed a machine learning technology that screens voice recordings for signs that a speaker is depressed, an advance that could alert physicians and other clinicians to people who need help.
Audio-assisted Bidirectional Encoder Representations from Transformers (AudiBERT), the system developed by the researchers, leverages the words a speaker uses as well as the speaker’s tone, says Rundensteiner, William Smith Dean’s Professor of Computer Science and founding director of WPI’s Data Science Program.
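The article does not detail the system's internals, but the general idea of pairing a text branch with an audio branch can be sketched in a few lines of PyTorch. In this illustration, a pretrained BERT encoder summarizes the words while a small network summarizes acoustic features, and the two summaries are concatenated for classification; the bert-base-uncased checkpoint, the audio branch, the layer sizes, and the fusion-by-concatenation are all assumptions for illustration, not the published AudiBERT architecture.

# A minimal sketch of a bimodal (words + tone) screening model.
# The specific layers and fusion strategy are illustrative assumptions,
# not the authors' published design.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BimodalDepressionScreen(nn.Module):
    def __init__(self, audio_feature_dim=128):
        super().__init__()
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        self.audio_encoder = nn.Sequential(           # stand-in for a real audio model
            nn.Linear(audio_feature_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
        )
        self.classifier = nn.Linear(768 + 128, 2)     # depressed / not depressed

    def forward(self, input_ids, attention_mask, audio_features):
        text_vec = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).pooler_output                                # (batch, 768) summary of the words
        audio_vec = self.audio_encoder(audio_features) # (batch, 128) summary of the tone
        return self.classifier(torch.cat([text_vec, audio_vec], dim=-1))

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BimodalDepressionScreen()
batch = tokenizer(["How are you doing today?"], return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"], torch.randn(1, 128))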
“Clinicians can detect depression and other mental ailments based on the content and tone of interviews with patients,” Rundensteiner says. “With deep learning data science techniques, we have developed a digital technology that examines a speaker’s words and tone for signs of depression. If widely deployed, this tool could dramatically expand mental health screening at low costs.”
The researchers’ innovation was selected for presentation in November 2021 at the Association for Computing Machinery Conference on Information and Knowledge Management, where it received the Best Applied Research Award. The authors are Rundensteiner; Ermal Toto ’20 (Ph.D.), previously a graduate student in computer science with Rundensteiner and now WPI assistant director of academic research computing; and ML Tlachac, a Ph.D. student in data science with Rundensteiner who has since accepted a position as an assistant professor at Bryant University.
AudiBERT builds on the researchers’ previous work, which examined the feasibility of using machine learning to analyze voice samples and other digital data from smartphones and social media, and explored audio-based depression screening as a way to address the twin societal problems of depression and limited mental health resources. At the core of the research is the idea that a person’s voice can reveal hidden issues.
“If a person is depressed, their vocal tone becomes a monotone,” Toto says. “Their voice might jitter, or shake, a little bit. Trained clinicians can intuitively detect these variables during conversations. Now we can automate the detection in the human voice through machine learning models.”
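The acoustic cues Toto describes can be roughly quantified from a recording. As a minimal sketch using the librosa library: low variability in the pitch track suggests a monotone voice, and frame-to-frame pitch instability is a crude proxy for jitter. The exact features AudiBERT relies on are not spelled out in this article, so these formulas are illustrative assumptions.

# A minimal sketch of monotone and jitter proxies from a voice recording.
# These are illustrative measurements, not the system's actual features.
import numpy as np
import librosa

def voice_cues(path):
    y, sr = librosa.load(path, sr=16000)
    # Fundamental-frequency (pitch) track; unvoiced frames come back as NaN.
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    f0 = f0[voiced_flag]                       # keep only voiced frames
    pitch_variability = np.std(f0)             # small value -> more monotone
    jitter_proxy = np.mean(np.abs(np.diff(f0))) / np.mean(f0)  # relative pitch instability
    return pitch_variability, jitter_proxy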
AudiBERT also addresses a critical research challenge: Relatively few voice data sets exist that have been labeled for indicators of depression. This limits the amount of data available for training deep learning models, a type of machine learning that automatically analyzes raw digital data to produce a model that can make predictions. Generally, more data leads to better models.
“Voice recording technologies are everywhere, from our smartphones to digital home assistants, but privacy concerns about recordings mean that it’s difficult to find large voice data sets that label spoken words as signs of mental ailments,” Tlachac says. “We set out to innovate a depression-screening solution that could be trained, even using small data sets. In addition, we wanted to demonstrate that voice is an excellent modality for screening.”
To evaluate AudiBERT, the researchers experimented with 15 voice data sets consisting of clinical interviews in which a virtual agent asked participants questions such as “How are you doing today?” The data sets were labeled with scores indicating the depression status of each participant based on a clinical depression screening questionnaire. The researchers found that AudiBERT accurately detected depression in the voice recordings.
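The article does not report the exact metrics, but evaluations of this kind are typically scored against labels derived from a questionnaire cutoff. A minimal sketch follows, assuming a PHQ-8-style questionnaire with the commonly used cutoff of 10; the source says only that a clinical depression screening questionnaire was used, so the threshold, scores, and predictions here are hypothetical.

# A minimal sketch of scoring a screening model against questionnaire-derived
# labels. The cutoff and all numbers below are assumptions for illustration.
from sklearn.metrics import f1_score, recall_score

PHQ_CUTOFF = 10  # assumed threshold: questionnaire score >= 10 labeled "depressed"

questionnaire_scores = [3, 12, 15, 7, 11, 2]   # hypothetical participant scores
model_predictions    = [0, 1, 1, 0, 0, 0]      # hypothetical model output (1 = depressed)

labels = [int(s >= PHQ_CUTOFF) for s in questionnaire_scores]
print("F1:", f1_score(labels, model_predictions))
print("Recall (sensitivity):", recall_score(labels, model_predictions))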
Rundensteiner is excited about the potential of this promising nonintrusive screening technology. AudiBERT could be deployed by doctors for universal mental health screening and to monitor the mental health of patients with depression over time, according to the researchers. They envision the day when a patient visiting a doctor’s office and filling out a health questionnaire on a computer tablet could seamlessly be screened for mental health concerns.