Evaluating Stereotypical Biases and Implications for Fairness in Large Language Models

Conference

2024 ASEE North East Section

Location

Fairfield, Connecticut

Publication Date

April 19, 2024

Start Date

April 19, 2024

End Date

April 20, 2024

Page Count

10

DOI

10.18260/1-2--45767

Permanent URL

https://peer.asee.org/45767

Download Count

40

Paper Authors

Danushka Bandara, Fairfield University

DANUSHKA BANDARA received the bachelor’s degree in Electrical Engineering from the University of Moratuwa, Sri Lanka, in 2009. He received his master’s and Ph.D. degrees in Computer Engineering and Electrical and Computer Engineering from Syracuse University, Syracuse, NY, USA, in 2013 and 2018, respectively. From 2019 to 2020, he worked as a Data Scientist at Corning Incorporated, Corning, NY, USA. Currently, he is an Assistant Professor of Computer Science and Engineering at Fairfield University, Fairfield, CT, USA. His current research interests include applied machine learning, bioinformatics, human-computer interaction, and computational social science.

Abstract

In this study, we investigate the types of stereotypical bias in Large Language Models (LLMs). We highlight the risks of ignoring bias in LLMs, ranging from perpetuating stereotypes to affecting hiring decisions, medical diagnostics, and criminal justice outcomes. To address these issues, we propose a novel approach to evaluating bias in LLMs using the metrics developed by StereoSet [1]. Our experiments evaluate several proprietary and open-source LLMs (GPT-4, Gemini Pro, OpenChat, Llama) for stereotypical bias and examine the attributes that influence bias. We used a selection of 100 prompts from the StereoSet dataset to query the LLMs via their respective APIs. The results were evaluated using the language modeling score, the stereotype score, and the combined iCAT score [1]. In particular, open-source LLMs showed higher levels of bias in handling stereotypes than proprietary LLMs (40% average stereotype score for the open-source LLMs versus 47% for the proprietary ones, with 50% being the ideal, unbiased stereotype score). The language modeling score was comparable across models, with the open-source models achieving 94% and the proprietary ones 91%. The combined average iCAT score was 76.6% for the proprietary models and 62.5% for the open-source models. This disparity in stereotypical bias could be due to the regulatory inspection and user testing through reinforcement learning from human feedback (RLHF) that the proprietary models undergo. We present our findings and discuss their implications for mitigating bias in LLMs. Overall, this research contributes to the understanding of bias in LLMs and provides insights into strategies for improving fairness and equity in NLP applications.

[1] Nadeem, M., Bethke, A., & Reddy, S. (2020). StereoSet: Measuring stereotypical bias in pretrained language models. arXiv preprint arXiv:2004.09456.
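
For readers less familiar with the StereoSet metrics, the following is a minimal Python sketch of the combined iCAT score as defined by Nadeem et al. [1]: the language modeling score (lms) is scaled by how close the stereotype score (ss) is to the unbiased ideal of 50%. The function name and the example values below are illustrative only and are not drawn from this paper's results.

    def icat_score(lms: float, ss: float) -> float:
        """Idealized CAT (iCAT) score following the StereoSet definition [1].

        lms: language modeling score in [0, 100]; higher is better.
        ss:  stereotype score in [0, 100]; 50 is the unbiased ideal.
        The result equals lms when ss == 50 and falls to 0 when the model is
        fully stereotypical (ss == 100) or fully anti-stereotypical (ss == 0).
        """
        return lms * min(ss, 100.0 - ss) / 50.0

    # Hypothetical per-model scores, for illustration only.
    print(icat_score(lms=94.0, ss=40.0))  # 75.2
    print(icat_score(lms=91.0, ss=47.0))  # 85.54

Because iCAT is computed per model and then averaged, the average iCAT values reported in the abstract are not, in general, recoverable from the averaged lms and ss alone.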

Cao, C., & Bandara, D. (2024, April). Evaluating Stereotypical Biases and Implications for Fairness in Large Language Models. Paper presented at the 2024 ASEE North East Section Conference, Fairfield, Connecticut. https://doi.org/10.18260/1-2--45767

ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2024 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference. - Last updated April 1, 2015