A Comparative Study of Topic Models for Student Evaluations

Conference

2024 ASEE North Central Section Conference

Location

Kalamazoo, Michigan

Publication Date

March 22, 2024

Start Date

March 22, 2024

End Date

March 23, 2024

Page Count

12

DOI

10.18260/1-2--45589

Permanent URL

https://peer.asee.org/45589

Paper Authors

Joseph Carpenter Sheils, Marshall University

Joseph C. Sheils is an undergraduate researcher at Marshall University. With a background in statistics, he has conducted research on machine learning, probability theory, and natural language processing.

David A. Dampier, Marshall University

Dr. Dave Dampier is Dean of the College of Engineering and Computer Sciences and Professor in the Department of Computer Sciences and Electrical Engineering at Marshall University. In that position, he serves as the university lead for engineering and computer sciences. He also serves as Director of the Institute for Cyber Security.

Haroon Malik, Marshall University

Dr. Malik is an Associate Professor in the Department of Computer Sciences and Electrical Engineering, Marshall University, WV, USA.

Abstract

Student evaluations, whether collected within a higher education institution or externally on RateMyProfessors.com, draw on the individual experiences of students to provide comprehensive assessments of their teachers and schools. RateMyProfessors.com, the world’s largest crowd-sourced web service for student evaluations, has accumulated a rich set of historical evaluation data and, over time, has become a vast repository of student perceptions.

To foster a more cohesive educational environment among students, university administrators, educators, and policymakers, topic models offer an efficient way to analyze large-scale collections of student evaluations. This research evaluates the efficacy of multiple topic modeling techniques on academic feedback and recommends an appropriate method the higher education community can use to uncover themes and patterns in student perceptions of their educators and schools.

Though quantitative Likert-scale student evaluations can be compared and analyzed easily, the unstructured nature of textual comments poses challenges for analysis. To facilitate efficient, large-scale analysis of the textual data in student evaluations, past research efforts have successfully used topic modeling, a natural language processing (NLP) technique. Topic models automatically discover the main topics present within a collection of student evaluations, making it easy to compare student comments across the discovered topics. The most widely used topic modeling technique is Latent Dirichlet Allocation (LDA); however, it is often poorly suited to short texts, tends to produce overlapping topics, and requires extensive text preprocessing to yield interpretable topics. Since LDA was introduced in 2003, more advanced topic modeling methods that leverage neural techniques such as transformers and word embeddings have been developed, reducing the need for extensive preprocessing while improving performance on short texts such as those found in student evaluations.
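
As a concrete illustration of the conventional workflow described above, the following minimal sketch fits LDA to a handful of hypothetical student comments with scikit-learn. It is not the paper's implementation; the example comments, the number of topics, and the number of top words shown are illustrative placeholders.

    # Minimal LDA sketch (illustrative only; not the paper's code).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    comments = [
        "Lectures were clear and the grading was fair.",
        "Too much homework, but office hours helped a lot.",
        "Exams were hard and the textbook was rarely used.",
    ]

    # LDA operates on a bag-of-words matrix, so stop-word removal and other
    # preprocessing decisions are made in the vectorizer.
    vectorizer = CountVectorizer(stop_words="english")
    doc_term_matrix = vectorizer.fit_transform(comments)

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(doc_term_matrix)

    # Print the top words of each discovered topic.
    terms = vectorizer.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        top_words = [terms[i] for i in weights.argsort()[::-1][:5]]
        print(f"Topic {k}: {top_words}")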

To pinpoint a suitable topic modeling technique for use with academic feedback on educators and higher education institutions, we conduct a comparative study of the performance of four topic modeling techniques: (1) Latent Dirichlet Allocation (LDA), (2) Nonnegative Matrix Factorization (NMF), (3) BERTopic, and (4) Top2Vec. LDA and NMF are traditional techniques that extract topics statistically from the term structure of documents, while BERTopic and Top2Vec extract topics through word embeddings. The four techniques are chosen to represent both conventional approaches to topic modeling and those that use recently developed, more complex approaches. Comments from student evaluations of schools and educators, collected from RateMyProfessors.com, serve as the textual basis on which model performance is assessed.
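
For the embedding-based models, a hedged sketch of how BERTopic and Top2Vec might be fit to the same kind of data is given below. It assumes that comments is a reasonably large list of raw evaluation strings (both libraries take unprocessed text and need many documents to form stable clusters) and is not the paper's implementation; NMF can be run analogously with scikit-learn's sklearn.decomposition.NMF on a TF-IDF matrix.

    # Embedding-based topic models (illustrative sketch only).
    from bertopic import BERTopic
    from top2vec import Top2Vec

    # BERTopic: transformer sentence embeddings -> dimensionality reduction ->
    # clustering -> class-based TF-IDF to describe each cluster as a topic.
    bertopic_model = BERTopic()
    topics, probabilities = bertopic_model.fit_transform(comments)
    print(bertopic_model.get_topic_info().head())

    # Top2Vec: joint document and word embeddings; topics correspond to dense
    # regions of documents in the embedding space.
    top2vec_model = Top2Vec(documents=comments)
    topic_words, word_scores, topic_nums = top2vec_model.get_topics()
    print(topic_words[:3])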

Though the chosen techniques span a wide range of implementation processes, the topics produced by each are held to the same evaluation standard. The metrics used to evaluate the performance of the chosen topic modeling techniques are topic coherence, topic diversity, and human interpretation of the topics. Topic coherence is a measure of topic quality described by Hoyle et al. (2021) as “An intangible sense, available to human readers, that a set of terms, when viewed together, enable human recognition of an identifiable category.” Along with coherence, we evaluate topic diversity, which measures how different the discovered topics are from one another. While these automatic metrics are important proxies for model performance, we also investigate the human interpretability of the topics and provide visualizations of model results.
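
The two automatic metrics can be computed along the lines sketched below, assuming topics is a list of top-word lists (one per topic) and tokenized_docs is the tokenized corpus: coherence is taken from gensim's CoherenceModel and diversity is computed as the fraction of unique words among the top words of all topics. The variable names and the c_v coherence variant are illustrative choices, not necessarily those used in the paper.

    # Evaluation sketch: topic coherence (gensim) and topic diversity.
    from gensim.corpora import Dictionary
    from gensim.models import CoherenceModel

    def topic_diversity(topics, top_n=10):
        """Fraction of unique words among the top_n words of every topic."""
        top_words = [word for topic in topics for word in topic[:top_n]]
        return len(set(top_words)) / len(top_words)

    dictionary = Dictionary(tokenized_docs)
    coherence = CoherenceModel(
        topics=topics,            # list of top-word lists, one per topic
        texts=tokenized_docs,     # tokenized reference corpus
        dictionary=dictionary,
        coherence="c_v",
    ).get_coherence()

    print(f"coherence = {coherence:.3f}, diversity = {topic_diversity(topics):.3f}")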

Sheils, J. C., Dampier, D. A., & Malik, H. (2024, March). A Comparative Study of Topic Models for Student Evaluations. Paper presented at the 2024 ASEE North Central Section Conference, Kalamazoo, Michigan. 10.18260/1-2--45589

ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2024 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference. - Last updated April 1, 2015