Final Proposals

Final Proposal or: How I Learned to Stop Worrying and Love the Data Set

Data Critique

University of Pennsylvania Museum of Archaeology and Anthropology – Online Collections

According to the museum’s website, “the online database contains over 379,499 object records representing over 1,141,045 objects (or components of objects) with 259,376 images illustrating 84,360 object records.” In other words, the online database holds roughly 380,000 object records covering more than 1.1 million objects, and about 259,000 images illustrate about 84,000 of those records. The entire data set available through the museum includes all 379,499 object records.

Information in this data set was possibly pulled from two sources: the online database and the museum’s files on each object. The fields of the open source online database account for the majority of the fields in the data set, including object name, title, number, culture, provenience, site, cultural area, measurements, and materials (wood, stone, etc.). The data set also includes accession credit, dates made, and the emulRN. The emulRN column does not appear anywhere in the online database; this number may come from another database used by the museum that is not open to the public.

Looking closely at one data set, Early Modern America, as an example of the data available, we can begin to ask questions about the objects’ origins. All data sets include “native_name” and “culture” columns. However, in the Early Modern America data set, both columns are missing data for almost all of the objects: only 19 objects have native names and 62 objects are associated with a specific culture. Although the remaining objects lack these identifications, that does not mean they did not originate in the Americas. In the “culture_area” column, every object is associated with a region in North America, with over 300 from the Greater Southwest.
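The counts above can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes the data set is exported as CSV with the column names quoted above, and the sample rows are invented for illustration, not taken from the museum's data.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real export has many more rows and columns.
SAMPLE_CSV = """object_name,native_name,culture,culture_area
Jar,,Hopi,Greater Southwest
Basket,,,Greater Southwest
Pipe,Calumet,,Plains
"""


def count_filled(rows, column):
    """Count rows whose value in `column` is not blank."""
    return sum(1 for row in rows if (row.get(column) or "").strip())


def tally(rows, column):
    """Tally the non-blank values found in `column`."""
    return Counter(
        row[column].strip() for row in rows if (row.get(column) or "").strip()
    )


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
native_name_count = count_filled(rows, "native_name")
culture_count = count_filled(rows, "culture")
area_counts = tally(rows, "culture_area")
```

To run this against a real export, the `io.StringIO(SAMPLE_CSV)` part would be replaced with an open file handle for the downloaded CSV.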

While we can glean from the data set the general areas objects originated from, we cannot narrow down the culture an object represents. However, the “accession_credit_line” column, which includes donor information, lets us do a little more research on what types of objects specific donors gave to the museum.

Anticipated Data Cleaning Using OpenRefine

  • Clean “object_name”, “Provenience”, “Material”
  • Keep blanks
  • Split “accession_credit_line” to “Gift/Purchase”, “Name”, “Date”
  • Delete “date_made_early” and “date_made_late”
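
The split of “accession_credit_line” could also be prototyped outside OpenRefine. The sketch below assumes credit lines look like "Gift of <name>, <year>" or "Purchased from <name>, <year>". That shape is a guess on my part, the function name is hypothetical, and the example strings in the usage note are invented. Lines that do not match are left blank, in keeping with the "keep blanks" step.

```python
import re

# ASSUMPTION: credit lines look like "Gift of <name>, <year>" or
# "Purchased from <name>, <year>". The real column may use other shapes.
CREDIT_PATTERN = re.compile(
    r"^(?P<kind>Gift|Purchase)\w*\s+(?:of|from)\s+"  # "Gift of", "Purchased from"
    r"(?P<name>.+?)"                                  # donor or seller name
    r"(?:[,;]\s*(?P<year>\d{4}))?$",                  # optional trailing year
    re.IGNORECASE,
)


def split_credit_line(text):
    """Split one credit line into Gift/Purchase, Name, and Date fields."""
    blank = {"Gift/Purchase": "", "Name": "", "Date": ""}
    match = CREDIT_PATTERN.match((text or "").strip())
    if match is None:
        # Unrecognized or empty lines stay blank rather than being guessed at.
        return blank
    return {
        "Gift/Purchase": match.group("kind").capitalize(),
        "Name": match.group("name").strip(),
        "Date": match.group("year") or "",
    }
```

For example, `split_credit_line("Gift of Jane Doe, 1915")` yields "Gift", "Jane Doe", and "1915", while a line such as "Museum Expedition" comes back with all three fields blank.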


Dyson, Mary C., and Kevin Moran. “Informing the Design of Web Interfaces to Museum Collections.” Museum Management and Curatorship 18, no. 4 (2000): 391-406.

Turner, Hannah. “The Computerization of Material Culture Catalogues: Objects and Infrastructure in the Smithsonian Institution’s Department of Anthropology.” Museum Anthropology 39, no. 2 (2016): 163-177.

Research Questions

  • In the data set, how are objects categorized?
  • Is there a correlation between the amount of metadata about an object, how the object was gifted, and when/where the object is from?
  • Is there a bias on what types of objects are collected/donated/gifted?
  • What terms are used to describe similar objects? Terms to describe unique objects?
  • What objects have the most metadata and why? What objects have the least metadata and why?
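
One rough way to approach the metadata questions is to score each record by the share of descriptive fields that are filled in, then average that score per group (for example, per gift or purchase). This is a sketch under assumptions: the field list and its lowercase spellings are my guess at the cleaned column names, and "acquisition" is a hypothetical column holding the Gift/Purchase value.

```python
from collections import defaultdict

# ASSUMPTION: these are the cleaned column names; adjust to the real export.
FIELDS = [
    "object_name",
    "native_name",
    "culture",
    "culture_area",
    "provenience",
    "material",
]


def completeness(record, fields=FIELDS):
    """Return the fraction of `fields` that hold a non-blank value."""
    filled = sum(1 for name in fields if (record.get(name) or "").strip())
    return filled / len(fields)


def mean_completeness_by(records, key):
    """Average the completeness score for each distinct value of `key`."""
    scores = defaultdict(list)
    for record in records:
        group = (record.get(key) or "").strip() or "(blank)"
        scores[group].append(completeness(record))
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}
```

A score like this only measures how many fields are filled, not why, so it can point to patterns but cannot explain them by itself.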

Goals for Data Visualization

  • Heat map or a map to show frequency and location
  • Text analysis to show frequency or usage of terms
  • Picture slide bar, either to show landscape or to compare the image available in the online database with a drawing based on the available description
  • Timeline
  • Pop up of the context of an object’s collection history
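
For the text analysis goal, a first pass could simply count how often each word appears in a single column. The sketch below is a minimal illustration, assuming the values come from one column such as “object_name”; the function name is hypothetical and the tokenizer keeps only runs of ASCII letters, which would need revisiting for native names with diacritics.

```python
import re
from collections import Counter


def term_frequencies(values):
    """Count lowercase word tokens across the values of one column."""
    counts = Counter()
    for value in values:
        # Blank or missing cells contribute no tokens.
        counts.update(re.findall(r"[a-z]+", (value or "").lower()))
    return counts
```

Calling `term_frequencies(...).most_common(20)` on a column's values would give the twenty most frequent terms, which could feed a bar chart or word cloud.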


  • March 26: Ask Prof. Kane to approve my half-baked idea and use class time to create a better milestone/timeline for my project. Begin cleaning up the data set and pray my laptop will cooperate.
  • March 30-31: Clean data set and second-guess my project for the first time. Complete Infographic assignment.
  • April 2: Use class time to continue cleaning data and get feedback from peers and Prof. Kane. Work on Twine assignment.
  • April 6-7: Scramble to create a rough draft, complete the final project wireframe assignment, and second-guess my project for the second time.
  • April 9: Use class time to get feedback from peers and Prof. Kane. Rethink project milestones.
  • April 11: Use class time to work on the project.
  • April 13-14: Work on Text Analysis and think “wow, I got this!”
  • April 16: Use class time to work on the project and receive feedback from peers and Prof. Kane. Think “Yay! I’m a year older!” then cry in my birthday wine once I get home.
  • April 20-21: “Play” with maps. By “play” I mean to mess it up a couple of times, cry some more, then pull myself together and figure it out.
  • April 23: Use class time to work on the project and receive feedback from peers and Prof. Kane.
  • April 25: Use class time to work on the project and receive feedback from peers and Prof. Kane.
  • April 26: Take a step back to really think about what the end product will look like, then decide on the next data visualizations.
  • April 27: Analyze the context of objects’ collection history, probably with more crying and second-guessing myself for the last time.
  • April 28: Work on other data visualizations.
  • April 30: Use class time to work on the project and receive feedback from peers and Prof. Kane.
  • May 3: Pull all the data visualizations together and create a cohesive project.
  • May 7: Use class time to work on the project and receive feedback from peers and Prof. Kane. Cry because this is the last time I can receive feedback.
  • May 11-14: Cry as I scramble to complete my Comps Exam and ignore everything else.
  • May 14-15: Clean up any last-minute stuff and edit.
  • May 16: Submit project. Graduate. Hang out with family. Cry for the last time. Sleep.
The level of sleep I hope to achieve on May 16 as shown by Oscar.

One Comment

  • Maeve Kane

    What do you mean by unique objects? How are you going to group “similar” objects? Be careful with the last metadata question–the completeness of the metadata may only be determined by how many staff were on hand when something was accessioned.

    If you have donation date, it would be very interesting to see if there’s change over time in the geographic focus or temporal focus of the collection in line with, e.g., the Egyptian fad of the 1920s or similar.

    For your text analysis, you’ll need to narrow your question to one category or column for it to have meaning, so let’s talk more about this.