Final Proposals

International Pirate Trials

Data Critique

This dataset is being created from the Law Library of Congress’ collection of trial records on piracy. The documents range from 1696 to 1905. Because I am still creating the dataset, the specifics of what is recorded are subject to change. Currently, the dataset identifies the name of the person on trial, the date and location of the trial, where they are from (if known), the charges against them, and the outcome of the trial. It also includes the ship, if relevant, and any extra notes that seem pertinent.
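As a rough sketch of how one row of the dataset might be represented, the Python below mirrors the fields listed above. The class name, field names, and outcome labels are all assumptions for illustration; none of them are fixed yet, and the example values are placeholders, not taken from a real record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrialRecord:
    """One row of the dataset: one person at one trial (hypothetical layout)."""
    name: str                      # person on trial
    trial_date: str                # date as written in the source document
    trial_location: str            # where the trial was held
    origin: Optional[str] = None   # where the defendant was from, if known
    charges: str = ""              # charges as stated in the record
    outcome: str = ""              # e.g. "death", "prison", "acquitted"
    ship: Optional[str] = None     # ship involved, if relevant
    notes: str = ""                # anything else that seems pertinent


# Placeholder values only; not taken from a real record.
example = TrialRecord(
    name="Example Defendant",
    trial_date="1700",
    trial_location="Example Port",
    outcome="acquitted",
)
```

Leaving the origin and ship fields optional keeps "unknown" distinct from an empty string, which should matter later when counting how many defendants have a known origin.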

As these are court records, they show how piracy was dealt with in the courts of different countries. They give detailed accounts of the crimes and trial proceedings. They will not describe how pirates organized or even necessarily where they came from. They will also not explain why some pirates were brought to trial and others were not, or what happened to those who were not (i.e. were they just allowed to reintegrate into society?). One thing that has stood out is that, so far, most of the defendants were, or had previously been, privateers. One main challenge is that a handful of documents are written in a language other than English. I have yet to decide how to handle these records.

Data Cleaning

Cleaning will probably not be necessary, as I am the one creating the dataset and I try hard to keep entries uniform. Of course, things do change as I go through the documents, so I may have some reordering or cleanup to do when I am finished.


Rossignol, K. (2012). Pirate Trials: Dastardly Deeds & Last Words. CreateSpace Independent Publishing Platform.

Cordingly, D. (2006). Under the Black Flag: The Romance and the Reality of Life Among the Pirates. Random House.


As this dataset is still in creation, these initial questions may change. As these are legal documents, there is a lot here worth exploring about how different countries tried piracy, how they decided jurisdiction, and how these ideas changed over time. Some questions: How many people were actually convicted of piracy? Were people more likely to be sentenced to death in some countries or periods than in others? How do privateering and wars between nations relate to piracy?
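Once the transcription is done, questions like these mostly come down to counting rows. A minimal sketch in Python, assuming each row has a "country" column and an "outcome" column with a "death" label (hypothetical names, and made-up sample rows):

```python
from collections import Counter, defaultdict


def outcomes_by_country(rows):
    """Count trial outcomes per country from an iterable of dict rows."""
    counts = defaultdict(Counter)
    for row in rows:
        country = row["country"].strip()
        outcome = row["outcome"].strip().lower()  # treat "Death" and "death" alike
        counts[country][outcome] += 1
    return counts


def death_sentence_share(counts):
    """Fraction of recorded outcomes that were death sentences, per country."""
    return {
        country: tally["death"] / sum(tally.values())
        for country, tally in counts.items()
    }


# Made-up rows for illustration; real rows could come from csv.DictReader.
sample_rows = [
    {"country": "Country A", "outcome": "Death"},
    {"country": "Country A", "outcome": "acquitted"},
    {"country": "Country B", "outcome": "prison"},
]
counts = outcomes_by_country(sample_rows)
shares = death_sentence_share(counts)
```

The same tally could be grouped by decade instead of country to look at change over time, provided the trial dates are recorded consistently.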

Data Visualizations

1: A map depicting where each trial took place, potentially linking to the corresponding documents on the Law Library’s website.

2: A graph showing how many of those tried for piracy began as privateers.

3: Possibly a graph showing the outcomes of trials, i.e. death, prison, etc.

4: Since I have all the documents on hand and OCR’d, it might be interesting to do a word analysis and include it if it adds to the overall argument.


4/2     Finish transcription

4/8     Clean data (if necessary) + word analysis

        Start planning graphs, maps and texts

        Gather additional data

4/9     Rough draft due

4/16    Create map

4/23    Work on text

        Create graphs

5/16    Due

My timeline is vague. I am concerned I have created too much work for myself by also creating the dataset. I am excited about it though and hope it will all come together in the end.

One Comment

  • Maeve Kane

    I mentioned this in class, but you should also think about doing a comparison against Old Bailey robbery trials to give some context for the piracy convictions.

    If you want to do topic modeling, start thinking about the units of analysis in your text: can you break them down into paragraphs, chapters, etc., so that you can get a finer-grained analysis of the topics? If there are OCR errors and they’re fairly regular, they can be corrected, but you should do a spot check on the OCR quality by copying a couple of docs to a .txt file before committing to topic modeling. If you do decide to commit, the file should be formatted as:

    year-docID {tab} [text]

    where year is a numeric year, and docID is some identifier for each different trial that has a numeric ID following (so like 1822-Heamann1, 1822-Heamann2, etc.). The file should then be saved as a .tsv and given to that topic modeling site we used, and you can use the same stopwords list.
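A minimal sketch of producing that format in Python, assuming the text has already been split into chunks and each chunk is a (year, label, text) tuple. The function names and the sample text are hypothetical; only the "year-docID, tab, text" layout comes from the description above.

```python
import io
from collections import defaultdict


def make_doc_ids(trials):
    """Yield (doc_id, text) pairs with IDs like 1822-Heamann1, 1822-Heamann2."""
    seen = defaultdict(int)
    for year, label, text in trials:
        seen[(year, label)] += 1  # running count per trial gives the numeric suffix
        yield f"{int(year)}-{label}{seen[(year, label)]}", text


def write_tsv(trials, handle):
    """Write one 'year-docID<TAB>text' line per document."""
    for doc_id, text in make_doc_ids(trials):
        # Collapse tabs and line breaks so each document stays on one line.
        clean = " ".join(text.split())
        handle.write(f"{doc_id}\t{clean}\n")


# Placeholder text; a real run would pass an open file such as
# open("trials.tsv", "w", encoding="utf-8") instead of a StringIO buffer.
sample = [
    (1822, "Heamann", "first chunk of\ttext"),
    (1822, "Heamann", "second chunk\nof text"),
]
buffer = io.StringIO()
write_tsv(sample, buffer)
```

Stripping tabs and newlines from the text matters because a stray tab or line break inside an OCR'd passage would otherwise split one document into extra columns or rows.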