A Novel Method for Analysing Racial Bias: Collection of Person Level References: Conclusion

14 May 2024

This paper is available on arXiv under the CC 4.0 license.


(1) Muhammed Yusuf Kocyigit, Boston University;

(2) Anietie Andy, University of Pennsylvania;

(3) Derry Wijaya, Boston University.


To the best of our knowledge, we conduct the first large-scale, longitudinal investigation of racial bias towards African Americans in books in the United States. Our study is also unique in that we use person names directly, which we crawled from Wikipedia, instead of referring phrases. We also adapt our representation learning method to account for both the absence of positive context and the presence of negative context, in order to better analyze the bias towards African Americans.

We show that toxicity around African American figures in books is remarkably higher, by 50%. This is problematic since books constitute a large part of our cultural heritage. In Table 1 we illustrate the differences in representation that take into account the learned semantic space. Along the presented semantic axes, the difference is significant, with African Americans positioned closer to the negative poles, which are defined by adjectives such as lowly, obscure, inferior, subordinate, or illegitimate. These results support the claim that African Americans are presented in a more negative and toxic context in books over time.
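To make the idea of positioning a group along a semantic axis concrete, the following is a minimal sketch of one common construction: the axis is the difference between the mean vector of a positive pole and the mean vector of a negative pole, and a representation is scored by its cosine similarity with that axis. This is an illustrative assumption, not necessarily the exact procedure used in the paper. All names (`semantic_axis`, `axis_score`, `group_a`, `group_b`) are hypothetical, and the three-dimensional vectors are made-up toy values rather than learned embeddings or results.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_axis(positive_pole, negative_pole):
    """Axis pointing from the negative pole towards the positive pole."""
    pos = mean_vector(positive_pole)
    neg = mean_vector(negative_pole)
    return [p - q for p, q in zip(pos, neg)]


def axis_score(vector, axis):
    """Positive scores lean towards the positive pole, negative towards the negative pole."""
    return cosine(vector, axis)


# Toy, made-up vectors standing in for adjective embeddings at each pole.
positive_pole = [[1.0, 0.0, 0.2], [0.8, 0.1, 0.0]]
negative_pole = [[-1.0, 0.0, 0.1], [-0.9, 0.2, 0.0]]
axis = semantic_axis(positive_pole, negative_pole)

# Toy, made-up vectors standing in for two group representations.
group_a = [0.5, 0.3, 0.1]
group_b = [-0.4, 0.3, 0.1]

score_a = axis_score(group_a, axis)
score_b = axis_score(group_b, axis)
print(f"group_a: {score_a:+.3f}  group_b: {score_b:+.3f}")
```

In this sketch, a group whose score is lower than another's along a given axis sits closer to that axis's negative pole; comparing scores across many axes is what allows differences in representation to be summarized in a table.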

Additionally, we display the timeline of how African American representation has changed; the biggest change occurred around the beginning of the 20th century. This decade, the 1900s, when considered together with the previous results, is also when we start observing much higher toxicity, which further emphasizes that the context around African Americans may have been negatively inclined for a long time.

Our study provides important insights into the representation of African Americans in literature and written media over time. The observed discontinuities in the representation of African Americans highlight the ongoing struggle for representation and equality in society, and emphasize that while a lot of progress has been made, there remain steps to be taken. Finally, our proposed method for analyzing person-level bias provides a promising approach to understanding the implicit biases present in books, and may serve as a useful tool for addressing racial bias in literature and media.

Our method also enables a new approach to analyzing the differences between group representations without the need to select referring phrases. This is especially useful when context changes over time, which makes selecting referring phrases difficult without making assumptions, or when selecting unbiased referring phrases is itself a roadblock.