Reflections on the Spring Semester and Year 1 as a Digital History Fellow

It seems like just yesterday we walked into the Center for History and New Media a bit unsure about what our first year as DH fellows would entail. Looking back it has been an extremely rewarding and valuable experience. Last fall we blogged about our rotations in both the Education and Public Projects divisions. In the Spring we moved to Research for seven weeks where we worked on a programming project for THATCamp and on the PressForward project before moving onto a seminar about the history of CHNM. I want to use this blog post to reflect on the spring semester and look back at the year as a whole.

Our first stop during the spring semester was the Research division. We began our seven weeks by taking on a topic modeling project which aimed to mine all the posts from the THATCamp individual websites and blog about the process. As we used the Programming Historian to learn python (or at least attempt to), we thought a lot about tools and the scholarly research process. We discussed Zotero as a tool and the values and community behind THATCamp as a training network and community for the Digital Humanities. Although we struggled with the programming aspect of this assignment and managed to miss important concepts behind Topic Modeling, the assignment gave us some insight into what kinds of challenges and opportunities topic modeling holds. From this project I learned first hand the importance of understanding the black box behind Digital Humanities tools. After finishing with our topic modeling project we moved onto the PressForward project. We spent a week working as Editors-at-Large and helped second year fellow Amanda Morton with her Editor-in-Chief duties. Thinking about scholarly gray literature and measuring reception of scholarly works on the internet we also spent time researching AltMetrics.

At the end of the three rotations we were left with a very clear understanding of each division, its current and past projects, the audiences it creates for and the overlap between each division. We then began a seminar with Stephen Robertson that explored the history of RRCHNM. In this seminar we tried to understand how RRCHNM developed over the years into its current state and how RRCHNM fits into the larger history of the digital humanities. Beginning with an overview of what a Digital Humanities Center is and how its defined, we collaboratively looked at all 150 centers in the United States and tried to get a sense of the different models that exist and just how many actually fit the definition of a digital humanities “center” as defined by Zurich. What we realized is that the Center for History and New Media stands out from other Digital Humanities centers due to its unique attachment to the History Department but also because of the origins of the center and because of Roy Rosenzweig’s vision.

After we defined just what a center was and looked at the different models, we started to look at the origins of RRCHNM and try to create a genealogy of the different projects and trace the development of the center. Each of the first year fellows took a different major project and traced its history through grant documents and reports. I read up on Zotero in its different iterations and learned a lot about how Zotero was originally conceived as well as how it has grown, expanded, and changed since 2004.

I think one of the things that has been immensely useful for the first year fellows is the ways much of our work at the center was paralleled by our coursework. In the PhD program at GMU we’re required to take a two course sequence in digital history. The first sequence focuses on the theory of Digital History and the second is largely a web design course that introduces us to the basics of HTML and CSS. Often times the topics in Clio I related directly to why we were doing at the center and the dual exposure allowed us to see the application of things we had discussed in Clio first hand.

At the suggestion of Spencer Roberts, the fellows decided to begin a Digital History Support Space in the Fall. The support space offers “advice, guidance, and assistance for students doing digital history projects.”  Every Monday from noon to 5pm (and sometimes even on weekends) we met with students taking the Clio courses, offered advice about and brainstormed potential projects, helped to debug code, and offered a space to work where help was available if needed. We were able to draw on experience from the center and offer advice about what kinds of tools are available and where resources might be found. We weren’t experts but working with the other students in our Clio classes was equally beneficial. It left me with a better understanding of the issues, topics, and tools discussed in our classes. As many of the PhD students move onto Clio III: Programming for Historians with Lincoln Mullen this fall, I’m looking forward to continuing the Support Space.

The fellowship has been structured in such a way that each element has built on itself to provide us with experience and an understanding of digital history, digital humanities, and the debates, methodologies, and histories of the discipline. This fall I’ll be working in the Research Division on the PressForward project and helping to manage both Digital Humanities Now and the Journal of Digital Humanities. Our first year as Fellows has gone by extremely fast but I’m looking forward to beginning a new year and moving into the role of mentor to the new group of DH Fellows.

THATCamp Mallet Results

We have spent the last few weeks working to build a python script that would allow us to download and prep all of the THATCamp blog posts for topic modeling in MALLET (for those catching up, we detailed this process in a series of previous posts). As our last post detailed, we encountered a few more complications than expected due to foreign languages in the corpus of the text.  After some discussion, we worked through these issues and were able to add stoplists to the script for German, French, and Spanish.  Although this didn’t solve all of our issues and some terms do still show up (we didn’t realize there was Dutch too), it led to some interesting discussion about the methodology behind topic modeling.  Finally we were able to rerun the python script with the new stopwords and then feed this new data into MALLET.

Continue reading

Unexpected Challenges Result in Important and Informative Discussions: a transparent discussion about stripping content and stopwords

As described in previous posts, the first year Digital Fellows at CHNM have been working on a project under the Research division that involves collecting, cleaning, and analyzing data from a corpus of THATCamp content. Having overcome the hurdles of writing some python script and using MySQL to grab content from tables in the backend of a WordPress install, we moved on to the relatively straightforward process of running our stripped text files through MALLET.

As we opened the MALLET output files, excited to see the topic models it produced, we were confronted with a problem we didn’t reasonably anticipate and this turned into a rather important discussion about data and meaning.

Continue reading

Pre-processing Text for MALLET

In our previous post, we described the process of writing a python script that pulled from the THATCamp MySQL Database. In this post, we will continue with this project and work to clean up the data we’ve collected and prepare it for some analysis. This process is known as “pre-processing”. After running our script in the THATCamp database all of the posts were collected and saved as text files. At this stage, the files are filled with extraneous information relating to the structure of the posts. Most of these are tags and metadata that would disrupt any attempts to look across the dataset. Our task here was to clean them up so they could be fed into MALLET. In order to do this, we needed to strip the html tags, remove punctuation, and remove common stopwords. To do this, we used chunks of code from the Programming Historian’s lesson on text analysis with python and modified the code to work with the files we had already downloaded.

Continue reading

Extracting Data from the THATCamp Database Using Python and MySQL

This week we’ve continued to work on building a python script that will extract all of the blog posts from the various THATCamp websites. As Jannelle described last week, our goal was to write a script that downloads the blog posts in plain text form and strips all of the html tags, stopwords, and punctuation so that we can feed it into MALLET for topic modeling and text analysis. After several long days and a lot of help from second year fellow Spencer Roberts, we’ve successfully gotten the code to work.

Continue reading

Spring Semester in Research and a THATCamp Challenge

The spring semester is here and the first year DH fellows have begun our rotation into the Research division of CHNM.

To get the ball rolling, we spent a week working through the helpful tutorials at the Programming Historian. As someone new to DH, with admittedly limited technical skill and knowledge, these were immeasurably useful. Each tutorial breaks content into smaller, less intimidating units. These can be completed in succession or selected for a particular topic or skill. While there is useful content for anyone, we focused our attention on Python and Topic Modeling with the aim of solving our own programming dilemma.

Our central challenge was to extract content across the THATCamp WordPress site to enable us to do some text analysis.

Continue reading