The Census records dataset contains information on sex, age, race, birthplace, mother’s and father’s birthplace, literacy, occupation, and the relationship between a person and the head of household, if one exists. The US Census Bureau categorizes all people who are not in housing units as living in group quarters. Examples of group quarters include prisons, missions, and college housing. If a person on the census is not part of a household, their group quarter is listed instead, along with its type, funding, and other details.
Geographically, this dataset covers the city of Albany, New York. It contains census records covering 90 years, with data for at least 708,000 people. A majority of the data was collected by enumerators who visited residential and group quarter locations and asked questions of the people living there. The original sources were the people questioned as well as the enumerators, since some determinations, such as race, were made by the enumerator. This dataset does not contain any data from the 1860, 1870, and 1890 censuses. The city boundaries for Albany have also changed over time, so different people would have been included or excluded depending on the year. It’s also likely that the same individuals show up in multiple censuses if they lived in Albany for a long period of time.
Anticipated Data Cleaning
This dataset appears to be very clean already. A preliminary look shows only a need to facet related occupations and group them into broader fields. A more in-depth look using OpenRefine could reveal more data that needs to be cleaned up, so this may not be the full extent of the data cleaning necessary.
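The occupation grouping described above could also be sketched outside OpenRefine. This is only a minimal illustration in Python: the `OCCUPATION_FIELDS` mapping, the `occupation` key, and the sample values are all hypothetical and not taken from the dataset.

```python
# Hypothetical mapping from cleaned occupation strings to broader fields.
# The real categories would come from looking at the actual data.
OCCUPATION_FIELDS = {
    "carpenter": "Construction",
    "mason": "Construction",
    "teacher": "Education",
    "school teacher": "Education",
    "laborer": "General labor",
}


def occupation_field(raw, mapping=OCCUPATION_FIELDS, default="Other"):
    """Normalize case and whitespace, then look up the broader field."""
    key = " ".join(raw.strip().lower().split())
    return mapping.get(key, default)


# Made-up records, only to show the lookup in use.
records = [
    {"occupation": " Carpenter "},
    {"occupation": "School  Teacher"},
    {"occupation": "Blacksmith"},
]
fields = [occupation_field(r["occupation"]) for r in records]
```

Anything not in the mapping falls into "Other", which makes it easy to spot occupations that still need a field assigned.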
One source of secondary information is the US Census website, which can provide more detailed information on the process of census taking for each year, as well as provide historical context.
Anticipated Research Questions
Possible research questions for this project involve seeing whether there is any correlation between birthplace and other categories such as occupation, literacy, family size, and whether the individual is in a household or a group quarter. This project will focus on changes in birthplace over time that suggest a rise or fall in immigration, and will provide context for those changes.
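One way to look at changes in birthplace over time is to compute each birthplace's share of the records for every census year. The sketch below assumes each record has `year` and `birthplace` fields, which are hypothetical names, and the sample rows are invented purely for illustration.

```python
from collections import Counter, defaultdict


def birthplace_shares(records):
    """Return {year: {birthplace: share of that year's records}}."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec["year"]][rec["birthplace"]] += 1

    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {place: n / total for place, n in counter.items()}
    return shares


# Invented rows, not real census data.
sample = [
    {"year": 1900, "birthplace": "New York"},
    {"year": 1900, "birthplace": "Ireland"},
    {"year": 1900, "birthplace": "New York"},
    {"year": 1900, "birthplace": "Germany"},
    {"year": 1910, "birthplace": "New York"},
    {"year": 1910, "birthplace": "Italy"},
]
shares = birthplace_shares(sample)
```

Using shares rather than raw counts helps when comparing years, since the total number of people recorded changes from one census to the next.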
Goals for Data Visualization
One visual already planned is a map of birthplaces. Ideally, the reader will be able to interact with it by selecting different census years in order to see the differences between them. I’d also like to include a visualization to show different correlations between categories, if they exist. For this, I’m thinking of a chart of some kind, though that may change.
Timeline of Project Milestones
- March 26 – April 3: Data cleaning using OpenRefine
- April 3 – 9: Final Project wireframe
- April 11: Feedback form
- April 11 – 16: Review feedback and tweak as much or as little as necessary; work on refining the project and developing solid ideas for how to work with the data.
- April 16: Block out a whole day to just sit and stare at all the open project files on my computer. Cry, maybe multiple times.
- April 16 – 23: Start with Tableau to help work on visualizations
- April 18 – 25: Experiment with Gephi to see if I can get any good visualizations with it.
- April 25 – May 15: Review feedback and refine the project to its final form.
- May 16: Submit the Final Project