Uploaded by Illia Kozlov

The African Americans Language Project

advertisement
Post-emancipation media references to African Americans
and how they changed over time
Introduction
The aim of this research was to trace changes in the language used to refer to African
Americans in American media after the Emancipation Proclamation in 1863. A corpus of
newspaper articles about Emancipation celebrations in the Shenandoah Valley during the
1868-1934 time period was used as an object of research.
Methods and Tools
●
●
3 main steps were followed to obtain the result:
○ Filtering the corpus text into a list of articles, where each article is bound to its
date.
○ Using a Large Language Model GPT-3.5 to extract specific terms referring to
African Americans from each article in the list.
○ Preparing the obtained data for display in the form of a year-correlated term
chart.
Every step of text processing was automated with a Python script. The corpus and the
term chart were represented as a Google Document and a Google Sheet respectively.
Python Scripts
Step 1: Data filtering algorithm
Step 2: Interacting with GPT-3.5
Prompts:
Term retrieving function:
Step 3: Creating the term chart
Results
The term chart represents qualitative analysis of the language. It can be seen that the most
frequent lexeme is colored, which is used in a wide range of collocations throughout most
years. The second most used lexeme is negro and the third is black.
It can also be seen that the result does not allow us to visualize changes in the language. That
is largely due to the lack of corpus data, as well as the GPT-3.5 error.
Conclusions
●
●
In order to obtain meaningful statistics of language changes over time, a more extensive
dataset is required.
The error of LLMs like GPT-3.5 and the cost of its API use should be taken into
consideration when performing similar analyses.
Download