A Tale of Two Projects

In a week’s time, the semester, and by extension the DH Fellowship, will come to an end. As such, it is time for the end-of-semester blog post. In the time since my last post, I have divided my time between two projects associated with Digital Humanities Now. The first (Web Scraping) focused on the content published by DHNow, while the second (Web Mapping) focused on DHNow’s Editors-at-Large base.

Web Scraping

Over the years, Digital Humanities Now has published hundreds of Editor’s Choice pieces. For 2014 alone, roughly 165 Editor’s Choice articles from numerous authors were featured. Such a large corpus of documents provided a ready source of data about the publishing patterns of DHNow. To translate the documents into usable data, we needed to convert the Editor’s Choice articles into machine-readable text. The task, then, was to go through each Editor’s Choice article and scrape the body text into a .txt file. I had never scraped a website before, so this project was going to be a great learning opportunity.

I began the project by reading through Jeri Wieringa’s web scraping tutorial on Programming Historian, which uses a Python library called Beautiful Soup to pull data out of a website. During my rotation in the Research Division last semester, the three first-year Fellows had quickly worked through the Beautiful Soup tutorial, but I needed a refresher. However, I ended up switching from Python to R. This change came at the suggestion of Amanda Regan, who has experience using R. As she explained it, R is a statistical computing language and would be a better resource than Python for analyzing the corpus of Editor’s Choice articles. After downloading RStudio (a great IDE) and playing around with R, I found it to be a fairly intuitive language (more so for those who have some background in coding). I came to rely on Mandy and Lincoln Mullen when running into issues, and they were both extremely helpful. Learning R was fun, and it was also exciting because R is the primary language taught and used in the Clio Wired III course, which I plan on taking the next time it is offered.

In order to scrape the body text of each post, I relied on the class names of the HTML tags containing the text. I imported a .csv file listing all the Editor’s Choice articles and searched each page for a specific class name. When found, R would scrape all the text in that tag and place it in a .txt file whose name corresponds to the article’s ID number. Finding the right class name was a hang-up, but I was able to use the Selector Gadget tool to expedite the process: it essentially makes a webpage’s CSS structure interactive, allowing you to click on elements to view their extent and class names. I learned a lot about website structures while identifying each body text’s class name. In the end, I was able to scrape 150 of the 165 Editor’s Choice articles.

You can find my code on my GitHub account here.
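The code linked above is written in R, but for anyone who has only worked through the Beautiful Soup tutorial mentioned earlier, here is a minimal sketch of the same workflow in Python. The CSV columns, class name, and file names are hypothetical stand-ins for the ones in the real data.

```python
# A minimal sketch of the scraping workflow, assuming a CSV with
# hypothetical "id" and "url" columns and a hypothetical "entry-content"
# class on the tag that holds each article's body text.
import csv

import requests
from bs4 import BeautifulSoup

with open("editors_choice.csv", newline="", encoding="utf-8") as f:
    articles = list(csv.DictReader(f))

for article in articles:
    try:
        response = requests.get(article["url"], timeout=30)
        response.raise_for_status()
    except requests.RequestException:
        # Some of the 165 articles could not be scraped; skip and move on.
        continue

    soup = BeautifulSoup(response.text, "html.parser")
    body = soup.find(class_="entry-content")  # class name found via Selector Gadget
    if body is None:
        continue

    # Save the body text to a .txt file named after the article's ID number.
    with open(f"{article['id']}.txt", "w", encoding="utf-8") as out:
        out.write(body.get_text(separator="\n", strip=True))
```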

Web Mapping

The second project I was fortunate to work on was displaying our Editors-at-Large spatially on a map. My undergraduate work is in Geographic Information Systems (GIS), so this project in part came out of my interests and prior experience. In association with this project, I am writing two blog posts for the soon-to-be-launched DHNow blog. The first will detail the process of developing and designing the map, while the second will delve into what the map is “telling us.” For the sake of the Fellows blog, I will instead reflect on my experience creating the Editors-at-Large map and will link to the other two posts when they are published.

It had been almost a year since I devoted any real time to cartography, so I decided to follow the same model I used in my undergraduate capstone class on web mapping. To begin with, I needed a dataset that I could use on the web. As an undergraduate, I used ArcGIS to convert a .csv into a GeoJSON file that could be used on the web. However, since coming to GMU and the Center, I have embraced open source (both by choice and by financial force) and instead relied on Quantum GIS (QGIS). I had no real experience with QGIS, so this project provided an opportunity to become familiar with the platform, an added benefit that I both appreciated and enjoyed. In the end, converting the data to GeoJSON was fairly straightforward.
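The conversion itself happened inside QGIS rather than in code, but to show what it amounts to, here is a minimal Python sketch that turns a spreadsheet of editors into GeoJSON point features; the file and column names (name, lat, lon) are hypothetical.

```python
# Minimal sketch: convert a CSV of points into a GeoJSON FeatureCollection.
# Assumes hypothetical "name", "lat", and "lon" columns; the real conversion
# was done in QGIS.
import csv
import json

features = []
with open("editors_at_large.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON coordinates are ordered [longitude, latitude].
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": {"name": row["name"]},
        })

with open("editors_at_large.geojson", "w", encoding="utf-8") as out:
    json.dump({"type": "FeatureCollection", "features": features}, out, indent=2)
```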

To render the web map, I used Leaflet, which I was introduced to in my undergraduate coursework. As an undergraduate, I found Leaflet somewhat difficult to use, but that was probably because I was simultaneously learning HTML, CSS, and JavaScript while working with it. Returning to Leaflet, I was struck by how easy it was to use and how intuitive its design is. I attribute this change in attitude to the training and supportive atmosphere of the Research Division, where I was exposed to Python and other coding languages. In the end, the map turned out well, and my work on the project has reignited my passion for cartography and all things spatial.
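The map itself talks to Leaflet through its JavaScript API, but to keep the code samples in this post in one language, here is a rough Python equivalent using folium, a library that wraps Leaflet; the file names and starting view are placeholders, not the settings of the actual map.

```python
# Rough Python equivalent of the Leaflet setup, using folium (a Python
# wrapper around Leaflet). File names and the starting view are placeholders.
import folium

# Start with a wide view; the real map centers on the Editors-at-Large.
m = folium.Map(location=[20, 0], zoom_start=2, tiles="OpenStreetMap")

# Add the GeoJSON layer produced in the previous step.
folium.GeoJson("editors_at_large.geojson", name="Editors-at-Large").add_to(m)

folium.LayerControl().add_to(m)
m.save("editors_at_large_map.html")
```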

In the final days of the Fellowship, I feel both excited and melancholy. I am sad that the fellowship is coming to an end and that I am moving out of the Center. It has been a wonderful experience working with great people on interesting and engaging projects. Yet it is exciting to think back to myself on the first day of the Fellowship and realize how far I have come in my digital work.

Reflections on Spring Semester

This semester I’ve continued my work on the PressForward project in the Research division. Throughout the semester I’ve served as editor-in-chief, helped troubleshoot and test the latest version of the PressForward plugin for public release, and continued to develop my PHP and web development skills by working on the PressForward TurnKey WordPress theme. In addition to working on PressForward, I’ve helped out in the support space, organized a brown bag, and spent some time mentoring Stephanie Seal. My time in the Research division on PressForward has allowed me to develop my programming skills and further acquaint myself with the software development process. I’ve learned a great deal about programming in general over the last two years, but I’ve also gained valuable experience in things like UI/UX design principles and the workflow for developing and maintaining an open source piece of software.

The PressForward All Content page in 3.5 features improved navigation, filtering, sorting, and searching.

In March, PressForward released version 3.5, which included some significant User Interface (UI) and User Experience (UX) changes. This version was the result of several months of work by the PressForward team and included a redesigned toolbar in ‘Nominated’ and ‘Under Review’ as well as some reorganization of tools and options in the plugin. Throughout the first months of this semester, I attended development meetings, tested new features, and helped rewrite our documentation to reflect them. Releasing a new version of the software is a big task, as it involves updating all of our documentation, screenshots, and descriptions of the plugin.

Output of the Subscribed Feeds Shortcode in the PressForward TurnKey Theme

Building the PressForward TurnKey Theme allowed me to apply many of the concepts I was picking up through bug-testing and in the weekly discussions with our developer Aram. For example, I helped write a shortcode that displays a list of the subscribed feeds and aims to allow PressForward users to further expose the metadata collected by the plugin. We came up with this idea after realizing how many of DHNow’s feeds were broken and how poor the metadata associated with those feeds often is. Attributing credit to the posts we feature is often difficult and problematic when the author is not clearly listed in the metadata. The shortcode allows users to highlight the RSS metadata pulled in by the plugin by providing options for displaying both active and inactive feeds. We hope that allowing administrators to make their feed list (as well as each feed’s title and author) visible outside of the plugin will prompt scholars to revisit the metadata contained in their RSS feeds. By participating in development meetings this semester, I have not only continued to further my understanding of the backend of the plugin but have also learned more about PHP and WordPress core.

My work on PressForward has been immensely helpful in building my programming skills, and as I look back at the last two years of this fellowship, I’m struck by how much my skills have grown. In addition to technical skills, I’ve also gained experience in managing an active publication and an open source project. Thanks to projects like our cohort’s THATCamp topic modeling experiment in Python, the Clio Wired sequence, the support space, and my time in Research, my skills have vastly improved. As I finish up this fellowship and look toward beginning my dissertation and developing a digital component, the skill set I’ve cultivated through this fellowship will be immensely useful. At the very least, the skills I’ve developed here have given me a foundation in computational thinking, and I feel confident in learning whatever new programming skills my own research will require.

Aside from our duties in our respective divisions, the fellows have also worked on some common projects. Stephanie Seal and I produced several episodes of Digital Campus this semester and continued to maintain the blog. Producing Digital Campus involves finding stories for everyone to discuss, managing and scheduling the recording, and preparing a blog post summarizing the episode for the Digital Campus blog.

Additionally, each year the fellows are asked to host and organize a brown bag at the Center. This year I invited Micki Kaufman down from the City University of New York to talk about her dissertation research, entitled “‘Everything on Paper Will Be Used Against Me’: Quantifying Kissinger, A Computational Analysis of the DNSA’s Kissinger Collection Memcons and Telcons.” I had previously met Kaufman at the 20th Anniversary conference, and the brown bag was an excellent opportunity for the fellows to invite down another graduate student and to participate in conversations about digital methodologies and approaches as they apply to a dissertation.


PressForward Workshop

This year PressForward has been focused on outreach. The PressForward team has been working to develop the plugin’s user interface and to help several pilot partners get PressForward publications up and running. As the fellow positioned on this project, I’ve been involved with the continued development of the plugin. Last weekend, Amanda Morton, a former DH Fellow, and I were given the opportunity to lead a PressForward workshop at the Advancing Research Communication and Scholarship (ARCS) conference in Philadelphia. ARCS is “a new conference focused on the evolving and increasingly complex scholarly communication network.” Interdisciplinary in nature, the conference featured a set of workshops on Sunday and a set of diverse panels on Monday. Many of the panels focused on linked and open data, alternative publishing models, altmetrics and other ways of measuring impact, and open access digital repositories. The conference was a great opportunity to interact with organizations and communities that might be interested in PressForward and to get an idea of what features might matter to these groups.

Our workshop focused on PressForward and covered topics such as the origins of the project, the features that make the plugin stand out, and an overview of how we use the plugin to maintain DHNow’s editorial process. Lastly, we set up a sandbox and gave users logins so they could follow along as we walked through the plugin’s important features. About thirty people from libraries and science organizations attended, and it was interesting to hear different ideas about how the plugin might be useful. The workshop was a nice break from some of the more technical things I’ve been doing this semester, and it was great to talk about the project as a whole and how it fits into the scholarly communication ecosystem.

Below is a copy of the PowerPoint we put together for the workshop.