Board 72: Adaptive Affect-Aware Multimodal Learning Assessment System for Optimal Educational Interventions

Conference

2024 ASEE Annual Conference & Exposition

Location

Portland, Oregon

Publication Date

June 23, 2024

Start Date

June 23, 2024

End Date

July 12, 2024

Conference Session

Educational Research and Methods Division (ERM) Poster Session

Tagged Division

Educational Research and Methods Division (ERM)

Tagged Topic

Diversity

Page Count

8

Permanent URL

https://strategy.asee.org/48370

Paper Authors

Andres Gabriel Gomez, University of Florida

I am a second-year MS student in the Department of Electrical & Computer Engineering at the University of Florida. My research interests include, but are not limited to, computer vision in healthcare (e.g., medical image segmentation), AI for clinical workflows, and education technologies. I am currently working on cardiac magnetic resonance imaging (CMRI) segmentation and pursuing an independent study project in education technology. I hope to pursue a PhD degree where I can work on AI problems that serve science and society.

Catia S. Silva, University of Florida

Catia S. Silva is an Instructional Assistant Professor in the ECE department at the University of Florida. Her expertise is in machine learning, data science, and engineering education. Dr. Silva is a GitHub Campus Advisor and can help integrate GitHub with your courses.

Abstract

This abstract is for a Work in Progress (WIP) paper and addresses adaptive computer-based learning and personal response systems with potential for mobile applications. Given the decreasing cost of sensors and the increasing emphasis on harnessing multimodal data in education, researchers are exploring how to use these diverse data to improve student engagement and enhance academic performance [1]. Over the past several years, the performance of classifiers for each modality has improved significantly [2], and several studies have investigated leveraging multimodal data and combinations of classifiers to model user engagement, knowledge, and preferences [1]. This paper aims to leverage affect-aware classifiers and other personal data to better determine individualized, optimal educational interactions, thereby replicating some of the benefits of one-on-one learning experiences.

We present research focused on designing, developing, and evaluating a learning system that integrates facial expression, head pose, and task performance to construct rich models of users’ affective mannerisms, answer proficiencies, and interaction preferences. These models inform the selection of an optimal educational interaction, such as suggesting when to review content, proceed to new content, take a break, or provide emotional support, leading to highly adaptive and engaging educational experiences. The proposed approach collects webcam and interaction data between a user and the learning system. Features are extracted from the webcam footage and fed to multiple modules, each designed for a particular task: head pose estimation, facial expression recognition, and American Sign Language (ASL) recognition. Although this system is designed to help parents of deaf children learn ASL, it is available to anyone who wants to learn the language, and it can be adapted to deliver content other than ASL. The pose estimation and expression recognition modules feed the affect unit, which is designed to learn the user’s idiosyncratic emotional expressions. Answer accuracy is derived from the hand-gesture recognition module and combined with interaction information, such as answer rate. These data are aggregated with the features generated by the affect unit. The aggregator fuses the diverse features and labels, evaluating the state of each in relation to user performance. The aggregated data are then passed to a recommendation agent, which, guided by criteria including answer accuracy, response rate, and user affect, selects the most suitable educational interaction.

In this abstract, we have outlined three pivotal aspects that our research addresses: comprehending and modeling human multimodal data (i.e., affect states); flexibly adapting interactive user and task models; and optimizing the selection of educational interactions. In completing this research, we hope to inform future work on using affect-aware, adaptive multimodal systems to optimize interactions and boost user engagement and curricular performance.

[1] W. Chango, J. A. Lara, R. Cerezo, and C. Romero, "A review on data fusion in multimodal learning analytics and educational data mining," WIREs Data Mining and Knowledge Discovery, April 2022.
[2] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields," in CVPR, 2017.
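
To make the data flow described above concrete, the following minimal Python sketch illustrates how affect-unit features and task-performance signals might be fused and mapped to one of the four interventions named in the abstract (review content, proceed to new content, take a break, or provide emotional support). The class names, feature scales, and thresholds are illustrative assumptions for exposition only, not the authors' implementation.

# Illustrative sketch only: names, scales, and thresholds are assumed,
# not taken from the paper.
from dataclasses import dataclass
from enum import Enum, auto

class Intervention(Enum):
    REVIEW_CONTENT = auto()     # revisit material the learner is struggling with
    NEW_CONTENT = auto()        # proceed to the next lesson
    TAKE_BREAK = auto()         # pause the session
    EMOTIONAL_SUPPORT = auto()  # offer encouragement

@dataclass
class AffectFeatures:
    engagement: float   # 0..1, from head pose + facial expression (assumed scale)
    frustration: float  # 0..1 (assumed scale)

@dataclass
class TaskFeatures:
    answer_accuracy: float  # fraction of ASL responses recognized as correct
    answer_rate: float      # answers per minute (assumed unit)

def recommend(affect: AffectFeatures, task: TaskFeatures) -> Intervention:
    """Toy rule-based stand-in for the recommendation agent."""
    if affect.frustration > 0.7:
        return Intervention.EMOTIONAL_SUPPORT
    if affect.engagement < 0.3 or task.answer_rate < 0.5:
        return Intervention.TAKE_BREAK
    if task.answer_accuracy < 0.6:
        return Intervention.REVIEW_CONTENT
    return Intervention.NEW_CONTENT

# Example: an accurate but disengaged learner is nudged toward a break.
print(recommend(AffectFeatures(engagement=0.2, frustration=0.4),
                TaskFeatures(answer_accuracy=0.85, answer_rate=2.0)))

A learned policy (for example, a contextual bandit or reinforcement-learning agent) could replace the hand-written rules while keeping the same inputs and intervention set.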

Gomez, A. G., & Silva, C. S. (2024, June), Board 72: Adaptive Affect-Aware Multimodal Learning Assessment System for Optimal Educational Interventions Paper presented at 2024 ASEE Annual Conference & Exposition, Portland, Oregon. https://strategy.asee.org/48370

ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2024 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference. - Last updated April 1, 2015