Board 209: Bridging Language Barriers in Healthcare Education: An Approach for Intelligent Tutoring Systems with Code-Switching Adaptation

Zechun Cao; German Zavala Villafuerte; Ali Jalooli; Renu Balyan; Sanaz Rahimi Moosavi; Francisco Iacobelli

Conference: 2024 ASEE Annual Conference & Exposition
Location: Portland, Oregon
Publication Date: June 23, 2024
Start Date: June 23, 2024
End Date: July 12, 2024
Conference Session: NSF Grantees Poster Session
Tagged Topics: Diversity and NSF Grantees Poster Session
Permanent URL: https://strategy.asee.org/46776

Paper Authors

Zechun Cao, Texas A&M University, San Antonio (orcid.org/0000-0002-4542-7791)

Zechun Cao received his master's and Ph.D. degrees in computer science from the University of Houston. His research lies at the intersection of cybersecurity, privacy, and artificial intelligence (AI). His doctoral thesis centers on network and host intrusion detection methods that leverage intelligent user behavior recognition. He also collaborates with economists and city planners on devising AI algorithms with long-lasting real-world impact. More recently, he has focused on designing algorithms and tools that keep users' private and confidential data secure in an AI-driven world. Dr. Cao's work has been published in international conferences and journals. He is a member of ACM and IEEE and has served as a TPC member and reviewer for various journals and international conferences.

Dr. Iacobelli is a computer scientist whose research sits at the intersection of human-computer interaction, natural language processing, education, and artificial intelligence. He has applied this research to healthcare and to bridging health disparities. Dr. Iacobelli is an associate professor in the Computer Science Department at Northeastern Illinois University, where he has taught since 2011. He is also an associated faculty member of the Center for Advancing Safety in Machine Intelligence (CASMI) at Northwestern University.

Abstract

Recent rapid progress in Natural Language Processing (NLP) has greatly enhanced the effectiveness of Intelligent Tutoring Systems (ITS) as tools for healthcare education. These systems have the potential to improve health-related quality of life (HRQoL) outcomes, especially for low-literacy populations such as members of the Hispanic community with limited reading and writing skills. However, despite the progress in pre-trained multilingual NLP models, a noticeable research gap remains around code-switching in the medical context. Code-switching is a prevalent phenomenon in multilingual communities, where individuals move seamlessly between languages within a conversation. It poses a distinctive challenge for healthcare ITS that serve multilingual communities: such systems must understand code-switched input and adapt to it accurately, a problem that has so far received limited research attention.

We hypothesize that developing an ITS for healthcare education that is culturally appropriate for the Hispanic population, where code-switching is frequent, is both achievable and practical. Because text classification is a core problem underlying many ITS tasks, such as sentiment analysis, topic classification, and smart replies, we target text classification as the application domain to validate our hypothesis.

Our model relies on pre-trained word embeddings to provide rich representations for understanding code-switched medical text. However, training such word embeddings, especially within the medical domain, is difficult because training corpora are limited. To address this challenge, we identify separate English and Spanish embeddings, each trained on medical corpora, and merge them into a unified vector space via a space transformation. We demonstrate that singular value decomposition (SVD) can be used to learn a linear transformation (a matrix) that aligns monolingual vectors from the two languages in a single meta-embedding. As an example, we measured the cosine similarity between the words “cat” and “gato” before and after alignment. Before alignment, the similarity score was 0.52; after alignment, it increased to 0.64. This example illustrates that aligning the word vectors in a meta-embedding increases the similarity between words that share the same meaning in their respective languages. To assess the quality of the meta-embedding representations under code-switching, we used a neural network to perform text classification on code-switching datasets. Our results show that our model achieves high performance on text classification tasks while using significantly fewer parameters than pre-trained multilingual models.
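The alignment step described in the abstract can be illustrated with a minimal sketch. It assumes the orthogonal Procrustes formulation, which is one common way to learn an alignment matrix with SVD; the abstract does not state which variant the authors use. The function names `learn_alignment` and `cosine`, the toy dimensions, and the random vectors are all hypothetical stand-ins, not the paper's embeddings or data.

```python
import numpy as np


def learn_alignment(source, target):
    """Return the orthogonal matrix W that minimizes ||source @ W - target||_F.

    Rows of `source` and `target` are vectors for known translation pairs.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy data (assumption): random vectors stand in for medical embeddings.
rng = np.random.default_rng(0)
dim, n_pairs = 8, 50
en = rng.normal(size=(n_pairs, dim))  # "English" vectors
# The "Spanish" space is a rotated, slightly noisy copy of the English one.
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
es = en @ rotation + 0.05 * rng.normal(size=(n_pairs, dim))

W = learn_alignment(es, en)  # maps the Spanish space onto the English space
es_aligned = es @ W

# Compare translation-pair similarity before and after alignment.
before = np.mean([cosine(e, s) for e, s in zip(en, es)])
after = np.mean([cosine(e, s) for e, s in zip(en, es_aligned)])
print(f"mean cosine before: {before:.2f}, after: {after:.2f}")
```

In this sketch the aligned Spanish vectors and the original English vectors live in one shared space, so a code-switched sentence could be embedded by looking up each token in whichever vocabulary contains it. Constraining `W` to be orthogonal preserves distances within the Spanish space, which is a design choice of this sketch rather than a detail taken from the paper.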

Citation

Cao, Z., Zavala Villafuerte, G., Jalooli, A., Balyan, R., Moosavi, S. R., & Iacobelli, F. (2024, June). Board 209: Bridging Language Barriers in Healthcare Education: An Approach for Intelligent Tutoring Systems with Code-Switching Adaptation. Paper presented at 2024 ASEE Annual Conference & Exposition, Portland, Oregon. https://strategy.asee.org/46776

TY  - CPAPER
AU  - Zechun Cao
AU  - German Zavala Villafuerte
AU  - Ali Jalooli
AU  - Renu Balyan
AU  - Sanaz Rahimi Moosavi
AU  - Francisco Iacobelli
CY  - Portland, Oregon
DA  - 2024/06/23
PB  - ASEE Conferences
TI  - Board 209: Bridging Language Barriers in Healthcare Education: An Approach for Intelligent Tutoring Systems with Code-Switching Adaptation
UR  - https://strategy.asee.org/46776
ER  -