Final Project

AskHistorians Visualization Project

AskHistorians is leading the charge online in community engagement between the public and historians. Over 1.5 million users frequent the community, asking over 4,000 questions a month of a panel of experts and knowledgeable users. This project aimed to discover trends in the questions asked by the community: which topics are most popular and how they change over time. I looked at a six-month span from July to December 2018, and only at threads that were not deleted.

The Dataset and Methods

The data set was a scraped database of thread data from the subreddit. It was provided to me on request by the de facto head moderator, “Georgy_K_Zhukov,” a lecturer at George Washington University, and was compiled by “Terminus-Trantor,” another moderator of the subreddit. The data set consisted of 27 separate files with multiple sheets each. Each file covered a single month during the 2016-2018 period. Each includes a summary of the month, the thread data, user activity, mod activity, gold list, and script logs.

The relevant sheet for this project is the thread data, which includes the date and time of the thread, the thread title or question, the score of upvotes/downvotes on the thread, the text of the thread submitted by the author, the number of comments, whether the thread was removed/deleted, and metadata about the comments made on the thread. The thread title and the thread text were the most relevant columns.
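The scoping described so far (July to December 2018, deleted threads excluded) can be sketched in a few lines of Python. This is only an illustration: the field names "created", "title", and "removed" and the sample rows are assumptions, not the real column names or data from the sheets.

```python
from datetime import datetime

# Hypothetical rows standing in for the thread-data sheet. The field names
# and values are invented for illustration only.
threads = [
    {"created": "2018-06-30 23:10", "title": "Why did Rome fall?", "removed": False},
    {"created": "2018-07-04 09:15", "title": "How did the Soviet Union industrialize?", "removed": False},
    {"created": "2018-09-12 18:40", "title": "What did Hitler think about jazz?", "removed": True},
    {"created": "2018-12-31 22:05", "title": "Who built Edo Castle?", "removed": False},
]

START = datetime(2018, 7, 1)
END = datetime(2019, 1, 1)  # exclusive upper bound, so all of December counts


def in_scope(thread):
    """Keep threads created in July-December 2018 that were not removed."""
    created = datetime.strptime(thread["created"], "%Y-%m-%d %H:%M")
    return START <= created < END and not thread["removed"]


kept = [t for t in threads if in_scope(t)]
```

With the sample rows above, only the July 4th and December 31st threads survive the filter.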

To extract the topic of each thread I used Named Entity Recognition software. Specifically, I used the Dandelion API, whose makers were gracious enough to provide an educational key for this project. The API could be used from within OpenRefine. To give Dandelion the best possible chance of a match, I concatenated each thread title and any thread text into a single column and then ran the named entity software on it. Although including the comment text was an option, I felt that it could produce incorrect topic detections, and the limited amount of text from the combined thread title and text was enough to provide sufficient hits.
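The concatenation step was done in OpenRefine, but the idea can be sketched in Python. The function name build_ner_input is hypothetical, and skipping the "[deleted]" and "[removed]" placeholder bodies is an assumption about how empty thread text might appear in the data. The call to the entity extraction service itself is left out.

```python
# Placeholder bodies that carry no topical information (an assumption about
# how missing thread text might be represented in the scraped data).
PLACEHOLDERS = {"[deleted]", "[removed]"}


def build_ner_input(title, body=None):
    """Join a thread title and its optional body into one string for NER."""
    parts = [title.strip()]
    cleaned = (body or "").strip()
    if cleaned and cleaned not in PLACEHOLDERS:
        parts.append(cleaned)
    return " ".join(parts)


combined = build_ner_input(
    "Why did Rome fall?", "  I mean the Western Empire. "
)
```

The combined string is what would then be sent to the named entity software, one string per thread.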

I used OpenRefine both to clean the resulting data and to bin the topics. For clarity, I consolidated different names for a country into a single name. For example, Britain and the United Kingdom both became simply “Britain.” The exceptions to this rule were “Nazi Germany” and the “Soviet Union,” both of which differ significantly, historically, from “Germany” and “Russia” respectively. Furthermore, I hand-binned topics through OpenRefine. The largest of these was the “Military Science” topic, which did not exist before. I added this topic to any thread referencing warfare, military technology, units, or weapons, making it the catch-all term for military science.
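A rough Python equivalent of the consolidation and binning is sketched below. The binning in this project was done by hand in OpenRefine, so the keyword match here is a simplified stand-in, not the method actually used. The CANONICAL mapping and the MILITARY_KEYWORDS list are illustrative assumptions.

```python
# Illustrative synonym table. "Nazi Germany" and "Soviet Union" are left out
# on purpose so they stay distinct from "Germany" and "Russia".
CANONICAL = {
    "United Kingdom": "Britain",
}

# Hypothetical keywords standing in for the hand-binning judgment calls.
MILITARY_KEYWORDS = ("warfare", "weapon", "artillery", "regiment", "tank")


def normalize(entity):
    """Map a detected entity to its consolidated name, if it has one."""
    return CANONICAL.get(entity, entity)


def bin_topics(entities, text):
    """Return de-duplicated topics, adding the catch-all military bin."""
    topics = []
    for entity in entities:
        name = normalize(entity)
        if name not in topics:
            topics.append(name)
    lowered = text.lower()
    # Plain substring matching: crude, and it can over-match (e.g. "tankard").
    if any(keyword in lowered for keyword in MILITARY_KEYWORDS):
        if "Military Science" not in topics:
            topics.append("Military Science")
    return topics


example = bin_topics(
    ["United Kingdom", "Britain", "Nazi Germany"],
    "What weapons did British infantry carry in 1940?",
)
```

A keyword list is far blunter than binning by hand, which is one reason the manual pass was worth the time.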

Conclusions

The History Channel, before devolving into pawn stores and a man slowly being abducted by aliens, was chided for being the World War II channel. This project has shown that the public really does love its World War II history. Adding together “Nazi Germany” and “World War II,” there are as many topics as in the entirety of American history questions, setting aside the fact that many American history questions are also World War II questions. Looking at this topic count, The History Channel’s old programming makes far more sense.

The criticisms leveled at the History Channel could then be leveled back at the viewers, or at the community of AskHistorians. With such a fascination with World War II, Nazis, Rome, and “Great Men,” the lesser known and the periphery become sidelined. Women are strikingly underrepresented, even women in leadership roles. Oliver Cromwell is asked about more than Elizabeth I or Elizabeth II. Catherine the Great is asked about 100% less than Peter the Great. This could very well be tied to AskHistorians’ gender distribution. Back in 2016 the moderators ran a census of the community and found that 81% of the community was male, with the majority being young and American. Unfortunately no census was done closer to the period covered by the current data, although it seems unlikely that the numbers would have shifted to any considerable degree, as the previous two censuses had similar numbers, 84% and 85% respectively.

The lack of diversity is evident when staring at the visualization. The moderation team has spent time trying to increase the breadth of involvement beyond World War II, Nazis, and war; however, the visualization shows that the changes are not sticking. This raises a fundamental question about public history as a whole. Can we as historians change what kind of questions people ask, and by doing so will we alienate the public? As tiring as it is to read another question that asks “What Hitler thought about…” if that is what the public is interested in, should we discourage those questions? The answer is neither straightforward nor easy, and it is one the community has wrestled with in different forms throughout the years.

The first major idea was to increase the diversity of the community in the hope that it would then lead to more diverse questions. The second, the Weekly Features, has shown evidence of working within the visualization: the features have had a positive impact on questions being asked about less popular topics. For example, in the week of September 9th, 2018, topics relating to the Soviet Union saw an uptick, followed by a corresponding uptick for China the next week. Both correspond to weekly feature campaigns run by the moderators to promote questions on the subject. These methods of promoting less popular topics seem to retain an organic nature while encouraging thinking outside what the community normally asks. There is also some organic rising and falling in less popular topics, such as World War I around November 9th, 2018. The rise here is in response to the centenary of the Armistice that ended the war.
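Upticks like these show up when topics are counted per week. A minimal sketch of that counting follows, assuming weeks that start on Sunday (September 9th, 2018 was a Sunday) and using invented (date, topic) pairs in place of the real tagged threads. The helper name week_start is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day):
    """Return the Sunday that begins the week containing `day`."""
    # date.weekday() gives Monday=0 ... Sunday=6, so shift to Sunday=0.
    return day - timedelta(days=(day.weekday() + 1) % 7)


# Invented sample of tagged threads, for illustration only.
tagged = [
    (date(2018, 9, 3), "World War II"),
    (date(2018, 9, 10), "Soviet Union"),
    (date(2018, 9, 12), "Soviet Union"),
    (date(2018, 9, 15), "World War II"),
    (date(2018, 9, 17), "China"),
]

# Count how often each topic appears in each week.
weekly = Counter((week_start(day), topic) for day, topic in tagged)
```

Comparing a topic's count in one week against its counts in neighboring weeks is what makes a feature-driven bump stand out from the baseline.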

The sad fact is that the visualization has shown that people ask questions about the most popular subjects. The community asks not about what they do not know, but about what they do know. Unless coaxed by the moderators or experts to think outside the confines of Europe or America, the users will gladly ask about Hitler’s favorite color. The rise of Middle Ages questions could be conveniently tied to the popularity of Game of Thrones. With that in mind, I guess we can hope that the next HBO show is about a Mayan woman, or that Spielberg does a movie on the Edo period.