US Census Bureau Data (Undergrad Research Project)

Project objective and overview:

I was very fortunate to work with Dr. Heather Kitada-Smalley during my undergraduate years at Willamette University. Her goal was to analyze public opinion on the controversy over adding a citizenship question to the 2020 Census. The project grew mostly out of my professor's personal interest in the topic, and I was excited to work with a random sample of a massive dataset from X (formerly Twitter). The dataset took up 10 GB of space, so I had to work from external storage.

The premise of the controversy is that advocates asserted that a citizenship question would benefit the legitimacy of the then-upcoming 2020 Census, which would be critical to political redistricting and, in effect, to the outcome of the 2020 US presidential election. From the data, it seemed that proponents argued the question could be used to distinguish people living in the United States with citizenship from those without it, and that this information could then be used to ensure that only US citizens who are registered to vote would be voting in the election.

The opposition argued that a citizenship question had no place on the Census because the point of a census is to survey every member of a population. They ultimately made the case that the goal of the US Census should be to understand the relevant features of the country's population in order to better allocate resources to cities and their residents, so the question of one's citizenship would be irrelevant.

The dataset was compiled by scraping X for all tweets containing specific keywords, which were ostensibly "census," "census bureau," "(Donald) Trump," "Wilbur Ross," "DOJ," and other words and names that were critical to the controversy. The exact scraping methodology was abstracted away from me, so this description is inferred from my experience with the sample dataset. My primary responsibility was labeling the sample dataset with a human-set ground truth: whether or not a given tweet was specifically referring to the 2020 US Census as opposed to, say, a census of animals for wildlife preservation purposes.
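To make the labeling task concrete, here is a minimal Python sketch of keyword matching followed by a human-set label. It is not the scraper or the tooling actually used on the project, since that methodology was not visible to me. The keyword list contains only the terms quoted above, and the names LabeledTweet, match_keywords, and set_ground_truth are hypothetical, introduced purely for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Only the keywords quoted in the text; the real scraper's full list is unknown.
KEYWORDS = ["census", "census bureau", "trump", "wilbur ross", "doj"]


@dataclass
class LabeledTweet:
    """Hypothetical record: a tweet plus its human-set ground truth."""
    text: str
    matched_keywords: List[str] = field(default_factory=list)
    # None means "not yet reviewed by a human".
    about_2020_census: Optional[bool] = None


def match_keywords(text: str) -> List[str]:
    """Return the keywords that appear in the text as whole words or phrases."""
    lowered = text.lower()
    return [
        k for k in KEYWORDS
        if re.search(r"\b" + re.escape(k) + r"\b", lowered)
    ]


def set_ground_truth(tweet: LabeledTweet, is_about_census: bool) -> LabeledTweet:
    """Record the human judgment; keyword matches alone cannot decide this."""
    tweet.about_2020_census = is_about_census
    return tweet


# A keyword match does not imply relevance: this tweet matches "census"
# but a human reviewer would label it as unrelated to the 2020 US Census.
text = "Annual bird census at the wildlife refuge"
tweet = LabeledTweet(text, match_keywords(text))
set_ground_truth(tweet, False)
```

The point of the sketch is the gap between the two steps: keyword matching casts a wide net, and the human label is what separates tweets about the 2020 Census from incidental matches.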

This page and the deliverable are being created asynchronously.

Project outline
Feature Scope Milestones and Dates Status Roadblocks Deliverable Notes


Feature matrix
Ideas Backlog Priority Work in Progress Completed