Research Division Reflections

My first few weeks as a Digital History Fellow at RRCHNM have been both an amazing experience and a complete challenge. Before I began my PhD program, I didn’t really understand digital history and I wasn’t quite sure what I would be doing during the first year of my fellowship. I had a hunch that I would learn some computer programming, do some blogging, and for some crazy reason thought that I would be digitizing historical documents. However, my first few weeks in Research taught me that my ideas about digital history and the RRCHNM were a little off.

First, I’ve never had much experience with websites, blogs, etc. For the 2012 Society for Military History conference, I managed a WordPress website for the program committee to rate and select papers and panels. However, all I was asked to do was upload posts, pages, and rating systems. Most of what I was doing was simple copy and paste. When I heard we would be working with PressForward and Digital Humanities Now, I was excited because I thought I would have a leg up on understanding the basic components of a WordPress site. Not so much. For the first few days my colleagues Jordan, Alyssa, and I kept looking at each other in complete confusion. There were so many acronyms and so much lingo that we’d never heard, and jumping into the digital world made me worried that I wouldn’t be able to figure everything out.

However, the Research Division never let us slip and I am so grateful everyone on the team knew that I walked into the fellowship with very little experience. We were so lucky to have Mandy, Amanda, and Stephanie sit down with us to explain the components of PressForward (and, for me, what an RSS feed even meant) and Digital Humanities Now. Without a doubt, working on Digital Humanities Now was my favorite part of the last four weeks. Having the opportunity to be an editor-at-large and select dozens of articles collected by PressForward made me feel like I was living in a digital world, but kept me in my comfort zone, i.e., intense sessions of reading. We got about two or three days to look through PressForward, mining for articles that would be worthy of a front-page spot on Digital Humanities Now, and then had the opportunity to pick which articles were chosen for that week. Mandy showed us how to use WordPress to publish the top stories and how to set up Twitter to tweet out our selections at different times throughout the day.

Learning PressForward and what goes into updating Digital Humanities Now was challenging, but fun, because after a couple of mistakes playing in the sandbox, the material started to click and I could then recognize and regurgitate steps. However, the bottom kind of dropped out when we were told our next assignment was to work with the Programming Historian website and learn the ins and outs of Python and Zotero. I started having problems almost immediately because of my “awesome” PC. All of the digital history fellows, young and old, have Macs; I do not. I had to download different programs than Jordan and Alyssa, such as Text Wrangler, and any time a problem occurred, not many people in the room could figure out what was going on to help me. After about a day we had things up and going and I started the Programming Historian tutorials. Like I said before, I’ve never worked with programming, and I was excited to start these tutorials because “Programming Historian” sounds like a step-by-step guide for those in the humanities who have never used a computer for more than research, Facebook, and e-mail. This was not necessarily the case. While the first four lessons were simple enough, I felt like I started fighting the tutorials and the tutorials started fighting me. After eight years of higher education in the liberal arts, I’ve been trained to question everything. Why does a “/” go in front of this phrase? Why do I have to have a variable? Why do I have to set up a string? I wasn’t questioning programming to be a big liberal arts jerk; I honestly just wanted to know why. I’ve been trained for almost a decade to understand how and why patterns work, and after that it sticks in my head and becomes a natural reflex. The problem I faced was not with the programming, but with letting go of having only one very specific way of learning a new tool.

Jordan told me quite a few times that there doesn’t have to be a reason for the number of spaces and slashes in programming; you just do it. Once I stopped fighting with myself, I finally started learning how to write code.

Once I was past my natural reflex of being stubborn, I started having a few more problems. Programming Historian on a Windows PC really has a tendency to just “run away with itself.” I kept getting multiple errors and could never get further than the sixth tutorial because no one around me really understood why my computer was being crazy, and Programming Historian doesn’t show what to do when you hit common errors. I knew there were free coding tutorials online, but very few offered lessons in Python. Jordan suggested that I use Codecademy, and within minutes I was on a tutorial page and learning code at double the speed of Programming Historian. I was doing well with Codecademy and even got to yell out to the fellows’ table every time I collected a new badge. However, within about two days, Jordan and Alyssa were in entirely different places in their Programming Historian tutorials than I was in Codecademy, and I could no longer ask them questions about how to do “this” and “that,” and they couldn’t really ask me anything either.

I appreciate Programming Historian because it taught two out of three people how to code in Python. It’s not that we didn’t like each other; it’s just that our relationship really wasn’t working out, and Programming Historian and I decided to see other people.

Challenges are a huge part of a PhD program and I knew I would have them starting out in a field of history in which I had very little experience. I’m sad that my time in Research is over because I was just getting the hang of Python and I really want to explore more options and how to manipulate Zotero for my personal research needs. Everyone in Research was constantly at my side, making sure that I had all of the tools I needed to learn PressForward, Digital Humanities Now, Programming Historian, and Zotero. I can honestly say that I am glad the first part of my fellowship was spent in the Research Division and I’m excited at the possibility of working with them once again.

My rotation through the Research Division

I am not sure what I was expecting when the first year fellows were assigned to the Research division. I came with a preconceived notion of what Digital History research is and what historians do with it. It turned out that the scope of my understanding was actually quite limited. My time in the division has taught me a lot about the vast applications and possibilities of Digital History. We (the first year fellows) were given chances to get our hands dirty and it proved very rewarding. Sadly, this blog post marks the end of our rotation through the Research division.

Our first assignment was to PressForward. We started from the ground up by familiarizing ourselves with the project. We installed the plugin on the sandbox server and got to bang around on it. We explored the PressForward.org site as well as the digitalhumanitiesnow.org site. I must admit that my initial reaction was that PressForward was a glorified RSS reader with some added features for promoting articles. I use Feedly (an RSS reader) on my phone to follow various history blogs and I, at first, did not see a big difference between the two. It wasn’t until someone explained “gray literature” that the full purpose of PressForward came into view. Until that point, I had been ignorant of the issue of online scholarship. The PressForward site explains “gray literature” to be “conference papers, white papers, reports, scholarly blogs, and digital projects.” Online scholarship is being under-appreciated and forgotten in a discipline that has focused so heavily for so long on printed material. My assignment as an Editor-at-Large and then as an Editor-in-Chief brought this issue into focus for me.

Working as an Editor-at-Large and Editor-in-Chief really solidified the importance of PressForward. As Editors-at-Large, we worked through the live feed of articles and websites coming into Digital Humanities Now. I learned that it can be labor intensive to sift through the various websites and articles to find important, relevant material. It is not always easy to find the scholarship and pertinent information. I also learned firsthand about the limitations of the software. On a couple of occasions I fell victim to the browser’s back button instead of closing a window. I then found myself back at the beginning of the feed instead of where I was before I had clicked on the article. After shadowing Amanda and Mandy when they were Editors-in-Chief, the first year fellows were able to make decisions on what would be published to DH Now. It was a very fun experience that helped me begin to grasp the extent of online scholarship and publishing. In addition, reading through the articles helped us stay informed about the various projects in the field. I even found articles that did not qualify for DH Now but were of interest to me. I bookmarked more than a handful that I wanted to return to later.

The second assignment was Programming Historian. While our time in PressForward gave us an overview of one of the projects, Programming Historian introduced us to the “nuts and bolts” of the division. It was here that my experience differed from Alyssa’s and Stephanie’s experiences. I came into this program with a background in computer programming. While I am not a computer science “person,” I did take classes as an undergraduate on C#, HTML, CSS, and JavaScript. I struggled at first with the syntax of Python, but my background in programming proved very helpful in picking up the language and quickly moving through the lessons. However, I found the lessons to be more focused on the task of the program (manipulating strings, working with web pages, etc.) than on learning the language itself. I think it would be beneficial for those without programming experience to work through the Python lessons at Codecademy before starting the Programming Historian lessons. I found the lessons to be very interesting and fun to do. I am excited to use these programs, such as frequency counts and n-grams, in my own research.
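For readers who have not met these two ideas, here is a minimal sketch of a frequency count and an n-gram listing in Python. It is not the code from the Programming Historian lessons; the function names and the sample sentence are my own, made up for illustration.

```python
from collections import Counter


def word_frequencies(words):
    """Count how many times each word occurs in a list of words."""
    return Counter(words)


def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Hypothetical sample text, lowercased and split on whitespace.
text = "the quick brown fox jumps over the lazy dog and the quick cat"
words = text.lower().split()

frequencies = word_frequencies(words)
bigrams = ngrams(words, 2)

print(frequencies.most_common(2))  # [('the', 3), ('quick', 2)]
print(bigrams[:3])  # [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```

Passing the list of bigrams back into `Counter` gives a frequency count of word pairs, which is roughly how the two ideas combine when looking for recurring phrases in a text.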

The final part of Programming Historian was the set of lessons on APIs, more specifically the Zotero API. I had never used Zotero, so these lessons introduced me to both Zotero and the Zotero API. Before I began the lessons I played around on Zotero, starting my own library and learning to love the program. From the beginning, I wanted to use my personal library in the lessons and not the sample one provided. By doing this, Spencer and I found a problem in the lessons when my program couldn’t access my library. Alyssa has since reported it, along with a problem she had, on GitHub. After finishing the API lessons, I wanted to do things that the lessons did not delve into. With help from Spencer I was able to bang around on the API in an attempt to add/edit the author field of an item. While we did not find a solution, we did make headway, and it really piqued my interest in working on the Zotero API.

I am leaving a much-improved Digital Historian. The Research division had to help the first year fellows through a learning curve that, in some ways, Education and Public Projects don’t. We now know the Center and feel comfortable in it. We got our feet wet and our hands dirty. The Research division was a great place to do that.

Research Division Reflection

It’s hard to believe that the first year fellows have already completed our first rotation within a division. I was nervous to begin the fellowship in the Research Division, since I’m not super-technical (I was rightly told that I can no longer claim to not be a “technology person”), but I have had quite a learning experience. I learned new skills – I can now effectively explain to someone what a plugin actually does and how it works – and went out of my comfort zone in learning Python.

In our first week, we began with PressForward. After playing around with the sandbox site, I installed the PressForward plugin onto my dev site to get a better handle on how it worked. Once I was more comfortable with the logistics of the plugin, I moved on to working as an editor-at-large of Digital Humanities Now. It was incredibly interesting to see how the plugin can be used for academic purposes and how it aggregates and organizes content. I was astounded by the quantity of content in the all content feed, especially since a disproportionate number of the posts were not related to digital humanities.

In our second week, we shadowed Tuesday’s editors-in-chief, Amanda and Mandy, and watched them go through the process of examining the articles under review and deciding which pieces should be published. Prior to Thursday, I familiarized myself with the editors-at-large corner and read several editors’ choice articles. I especially enjoyed reading “Thoughts on feminism, digital humanities and women’s history,” since my area of research is women and gender. On Thursday we were editors-in-chief, which was such a fun experience.

It was beneficial to begin work with PressForward from the ground up. We started with the sandbox, moved on to seeing how the plugin worked for DH Now, and then used the plugin to publish an issue of DH Now. It is a fantastic tool for disseminating often overlooked material to a wide audience and for collecting and curating information. Overall, I had a positive experience with PressForward and DH Now.

After PressForward, we started learning Python through the Programming Historian lessons. I had minimal experience using HTML, CSS, and XML to create a website from scratch when I was in library school, but programming is not something I am comfortable with. At first Programming Historian was fairly easy and the first few lessons seemed straightforward, but once I got past the “Manipulating Strings in Python” lesson I started to feel lost. After completing those lessons I moved on to the Zotero API lessons. These were more difficult for me to comprehend, especially since, as Stephanie pointed out, they are not in layman’s terms. With help from Jordan and Spencer, I was able to get through the lessons using the sample Zotero library.

I cultivated my own Zotero library and then went back through the API lessons using it instead of the sample in order to see how much of the lessons I could understand on my own. I was able to get through the first two lessons, which was very exciting. I ran into some problems with the third lesson when Text Wrangler was not reading the URLs from the first two items in my library. It was working when I used the sample library because its URLs are links to simple HTML pages, but the links in my library point to more complicated sites, such as the source’s record in EBSCO. Jordan had discovered another problem earlier with the user and group tags, and I went into GitHub and reported both of our problems. I am excited to see how I will use Python in the future with other digital humanities projects.

It was an illuminating contrast to work with both PressForward and Python and to see how the latter influences the former. I can understand why we began in the Research Division since the technical skills we learned are necessary in order to have a solid foundation and understanding of digital history.

THATCamp MALLET Results

We have spent the last few weeks working to build a Python script that would allow us to download and prep all of the THATCamp blog posts for topic modeling in MALLET (for those catching up, we detailed this process in a series of previous posts). As our last post detailed, we encountered a few more complications than expected due to foreign languages in the corpus of the text. After some discussion, we worked through these issues and were able to add stoplists to the script for German, French, and Spanish. Although this didn’t solve all of our issues and some terms do still show up (we didn’t realize there was Dutch too), it led to some interesting discussion about the methodology behind topic modeling. Finally, we were able to rerun the Python script with the new stopwords and then feed this new data into MALLET.

Unexpected Challenges Result in Important and Informative Discussions: a transparent discussion about stripping content and stopwords

As described in previous posts, the first year Digital Fellows at CHNM have been working on a project under the Research division that involves collecting, cleaning, and analyzing data from a corpus of THATCamp content. Having overcome the hurdles of writing a Python script and using MySQL to grab content from tables in the backend of a WordPress install, we moved on to the relatively straightforward process of running our stripped text files through MALLET.

As we opened the MALLET output files, excited to see the topic models it produced, we were confronted with a problem we didn’t reasonably anticipate and this turned into a rather important discussion about data and meaning.

Pre-processing Text for MALLET

In our previous post, we described the process of writing a Python script that pulled from the THATCamp MySQL database. In this post, we will continue with this project and work to clean up the data we’ve collected and prepare it for some analysis. This process is known as “pre-processing”. After we ran our script on the THATCamp database, all of the posts were collected and saved as text files. At this stage, the files are filled with extraneous information relating to the structure of the posts. Most of this is tags and metadata that would disrupt any attempt to look across the dataset. Our task here was to clean the files up so they could be fed into MALLET. In order to do this, we needed to strip the HTML tags, remove punctuation, and remove common stopwords. We used chunks of code from the Programming Historian’s lesson on text analysis with Python and modified the code to work with the files we had already downloaded.
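To give a rough idea of what those three steps look like, here is a minimal sketch using only Python’s standard library. It is not our actual script, and it does not reuse the Programming Historian code; the tiny stoplist, the function names, and the sample post are all made up for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stoplist; a real run would load a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "this"}


class TagStripper(HTMLParser):
    """Collect only the text that appears between HTML tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html):
    """Return the text content of an HTML string, with the tags dropped."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()
    return " ".join(stripper.chunks)


def preprocess(html, stopwords=STOPWORDS):
    """Strip tags, lowercase, remove punctuation, and drop stopwords."""
    text = strip_tags(html).lower()
    # Replace every character that is neither a word character nor
    # whitespace with a space, then split the result into words.
    words = re.sub(r"[^\w\s]", " ", text).split()
    return [word for word in words if word not in stopwords]


# Hypothetical sample post.
sample = "<p>This is a <strong>sample</strong> post for THATCamp, in HTML!</p>"
print(preprocess(sample))  # ['sample', 'post', 'thatcamp', 'html']
```

Joining the returned words with spaces and writing them back to a text file gives one cleaned document per post, which is the kind of plain-text input that can then be handed to MALLET.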

Extracting Data from the THATCamp Database Using Python and MySQL

This week we’ve continued to work on building a Python script that will extract all of the blog posts from the various THATCamp websites. As Jannelle described last week, our goal was to write a script that downloads the blog posts in plain text form and strips all of the HTML tags, stopwords, and punctuation so that we can feed it into MALLET for topic modeling and text analysis. After several long days and a lot of help from second year fellow Spencer Roberts, we’ve successfully gotten the code to work.

Spring Semester in Research and a THATCamp Challenge

The spring semester is here and the first year DH fellows have begun our rotation into the Research division of CHNM.

To get the ball rolling, we spent a week working through the helpful tutorials at the Programming Historian. As someone new to DH, with admittedly limited technical skill and knowledge, these were immeasurably useful. Each tutorial breaks content into smaller, less intimidating units. These can be completed in succession or selected for a particular topic or skill. While there is useful content for anyone, we focused our attention on Python and Topic Modeling with the aim of solving our own programming dilemma.

Our central challenge was to extract content across the THATCamp WordPress site to enable us to do some text analysis.
