About Anne McDivitt

My name is Anne Ladyem McDivitt. I am a graduate student pursuing a PhD in history at George Mason University and a first-year Digital History Fellow at the Center for History and New Media. I received my MA and BA from the University of Central Florida. My research focuses on the US video game industry and masculinity from 1958 to 1986. I am also involved in digital and public history research.

THATCamp MALLET Results

We have spent the last few weeks building a Python script that downloads all of the THATCamp blog posts and prepares them for topic modeling in MALLET (for those catching up, we detailed this process in a series of previous posts). As our last post explained, non-English text in the corpus caused more complications than we expected. After some discussion, we worked through these issues and added stoplists for German, French, and Spanish to the script. This didn’t solve all of our issues, and some terms still show up (we didn’t realize there was Dutch too). Even so, it led to some interesting discussion about the methodology behind topic modeling. Finally, we reran the Python script with the new stopwords and fed the resulting data into MALLET.
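Filtering tokens against several merged stoplists can be sketched as follows. This is a minimal sketch under stated assumptions. The tiny word lists, the letters-only tokenizer, and the function names are hypothetical and are not a reconstruction of our actual script. Real stoplists contain hundreds of words per language. The example also shows why Dutch words slipped through: a word is removed only if it appears in one of the loaded lists.

```python
import re

# Hypothetical, heavily abbreviated stoplists for illustration only;
# real stoplists contain hundreds of words per language.
STOPLISTS = {
    "english": {"the", "and", "of", "to", "a", "in", "on"},
    "german": {"und", "der", "die", "das", "ein"},
    "french": {"le", "la", "les", "et", "des", "de"},
    "spanish": {"el", "los", "las", "y", "del", "de"},
    # Not loaded by default: stands in for the language we did not anticipate.
    "dutch": {"een", "het", "voor", "op", "te", "de"},
}


def build_stopwords(languages):
    """Merge the stoplists for the requested languages into one set."""
    stopwords = set()
    for language in languages:
        stopwords |= STOPLISTS[language]
    return stopwords


def tokenize(text):
    """Lowercase the text and split it into runs of Unicode letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def remove_stopwords(text, stopwords):
    """Return the tokens of `text` that are not in `stopwords`."""
    return [token for token in tokenize(text) if token not in stopwords]


if __name__ == "__main__":
    post = "Los talleres de humanidades digitales and the workshop voor de historici"
    four = build_stopwords(["english", "german", "french", "spanish"])
    print(remove_stopwords(post, four))
    # ['talleres', 'humanidades', 'digitales', 'workshop', 'voor', 'historici']
    print(remove_stopwords(post, four | STOPLISTS["dutch"]))
    # ['talleres', 'humanidades', 'digitales', 'workshop', 'historici']
```

Merging everything into one set keeps each lookup a single membership test. It also means a word that is a stopword in one language is removed from every language's text, which is a trade-off worth keeping in mind for a multilingual corpus.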

MALLET (MAchine Learning for LanguagE Toolkit) is an open-source Java package for natural language processing. We followed the Programming Historian’s tutorial on MALLET. Topic modeling is an important digital tool that analyzes a corpus of text and seeks to extract ‘topics’: sets of words that tend to occur together across the documents. The result is a chosen number of these word sets, the “topics.” In our case, we asked MALLET to return twenty topics based on our set of THATCamp blog posts. The topics returned by MALLET were:

  1. xa digital art history research university scholarship graduate field center publishing open today institute cultural knowledge professor online world
  2. university games pm humanities http digital september knowledge kansas saturday game conference state registration play information representation workshop boise
  3. thatcamp session sessions day participants free technology page unconference university nwe conference discussion information google propose hope event proposals
  4. people make time questions things idea access process ideas world work great making lot build add kind interesting nthe
  5. digital humanities data tools text projects research scholars omeka texts tool analysis scholarly archive reading online based book scholarship
  6. digital humanities session dh library libraries support projects open discussion librarians amp talk work journal sessions propose faculty list
  7. history public digital historical collections museum media project projects mobile online maps museums collection historians users sites site applications
  8. games zotero thinking place game code end cultural chnm hack year documentation humanists version number pretty application visualization set
  9. session open area data workshop tool knowledge teach interested bay prime bootcamp gis workshops reality night thatcampva virginia lab
  10. work interested students ways teaching post working talk writing blog love issues don conversation create collaborative thinking start discuss
  11. project web content information tools community resources archives experience research create learn creating learning share development materials specific provide
  12. xb xa del se humanidades digitales xad al madrid www mi este aires buenos digital personas taller cuba parte
  13. caption online id align width open attachment accessibility women read university american building accessible gender media november floor race
  14. data http org session www open twitter texas good wikipedia nhttp status wiki start commons drupal metadata people crowd
  15. xa workshop session omeka publishing http gt propose org workshops friday docs open hands amp doc studies topic discuss
  16. students digital learning technology education media college faculty humanities research game pedagogy student courses classroom assignments skills arts social
  17. xa oral digital humanities video event local application community offer interviews planning center education software jewish weekend college histories
  18. een het voor op te zijn deze met workshop kunnen om digitale bronnen data onderzoek historici nl wat worden
  19. social media technology studies arts performance museums xcf play participants cultural performing reading st email object platforms interaction technologies
  20. xa thatcamp org http thatcamps details read movement published planned access nthatcamp browse software follow break series google join

As you can see, we have an impressive list of terms. Before we organize them in a meaningful way, we will briefly point out a common problem that scholars may confront when working with MALLET. You may notice that quite a few errors, such as ‘xa’, appear in the results. We don’t have a definitive answer for why this is. We think it has to do with encoding issues that arise when content from a WordPress post, stored in a MySQL database, is moved into Python. Each of these handles text encoding differently, and the error appears to be related to non-breaking spaces.

A little bit of Googling revealed that WordPress stores a non-breaking space as the HTML entity ‘&nbsp;’. Python, by contrast, represents the non-breaking space character itself (Unicode U+00A0) with the escape sequence ‘\xa0’. When the post’s HTML is decoded, Python understands ‘&nbsp;’ as a space but represents it as ‘\xa0’. As second-year fellow Spencer Roberts explained, the issue is that meaning is lost in translation. He used this analogy: Python reads and understands the French word for “dog,” then translates it and returns the English word instead.

In this case, what shows up in our results is not ‘\xa0’ but rather ‘xa’. This is because we had stripped out all of the non-alphanumeric characters before running the data through MALLET, which removed the backslash. The trailing digit apparently was discarded when the text was split into words. We suspect that similar stray terms, such as ‘xb’, also stem from these encoding issues. Anyone interested in clarifying or continuing this discussion with us can do so in the comments.
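One plausible mechanism is that the saved files contained Python’s escaped representation of each post (for example, the output of repr()) rather than the text itself. This is an assumption that has not been checked against our script. It would turn U+00A0 into the literal characters \xa0 and a newline into \n. That would also explain tokens such as ‘nthe’, ‘nwe’, and ‘nhttp’ in the topic list.

The sketch below reproduces that hypothesized path alongside a safer alternative. The function names and the letters-only tokenizer are hypothetical.

```python
import html
import re


def letter_tokens(text):
    """Turn every non-alphanumeric character into a space, then keep runs of
    ASCII letters. Digits are dropped (an assumed letters-only tokenizer)."""
    stripped = re.sub(r"[^A-Za-z0-9 ]", " ", text)
    return re.findall(r"[a-z]+", stripped.lower())


def tokens_via_repr(raw_html):
    """Hypothesized faulty path: decode the HTML, then save repr() of it."""
    text = html.unescape(raw_html)  # the entity &nbsp; becomes U+00A0
    # repr() writes U+00A0 as the four characters \xa0 and a newline as \n,
    # so stripping the backslash leaves "xa0..." and "nThe".
    return letter_tokens(repr(text))


def tokens_direct(raw_html):
    """Safer path: tokenize the decoded string itself, never its repr()."""
    return letter_tokens(html.unescape(raw_html))


if __name__ == "__main__":
    post = "Welcome&nbsp;to THATCamp!\nThe sessions start Friday."
    print(tokens_via_repr(post))
    # ['welcome', 'xa', 'to', 'thatcamp', 'nthe', 'sessions', 'start', 'friday']
    print(tokens_direct(post))
    # ['welcome', 'to', 'thatcamp', 'the', 'sessions', 'start', 'friday']
```

In the direct path, U+00A0 and the newline are themselves non-alphanumeric, so they simply become spaces. No stray ‘xa’ or ‘nthe’ tokens are produced.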

Returning to our MALLET results, our next challenge was to present and analyze this large amount of data. Drawing on the approaches of Cameron Blevins and Robert K. Nelson, we grouped the topics by theme so that trends could be identified more easily. We identified seven broad themes in the corpus of THATCamp blog posts from 2008 to the present:

  1. Accessibility
  2. Building
  3. Community
  4. International
  5. Pedagogy
  6. Public Digital Humanities
  7. THATCamp Structure

Using these larger categories, we created several charts that show how the topics changed over time across THATCamps. The charts are available below; note that we have graphed them using percentages. Each percentage represents how often that topic occurred within the posts at a given camp.

Topics Overall

Topics relating to Accessibility

Topics relating to Community

Topics relating to THATCamp Structure

Topics relating to Pedagogy

Topics relating to Public Digital Humanities

Topics relating to Building

International Digital Humanities
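The charts above report percentages rather than raw counts. The sketch below shows one way such figures could be computed, assuming each post is assigned its single most prominent topic and each topic is mapped to one theme. The TOPIC_THEMES assignments are guesses suggested by each topic’s keywords. Those assignments, the camp labels, and the function name are all hypothetical. This is not a reconstruction of how our charts were actually built.

```python
from collections import Counter, defaultdict

# Hypothetical topic-to-theme assignments for three of the twenty topics;
# any topic not listed here is counted as "Other".
TOPIC_THEMES = {
    3: "THATCamp Structure",  # session, sessions, participants, unconference...
    12: "International",      # humanidades, digitales, madrid...
    16: "Pedagogy",           # students, learning, pedagogy, classroom...
}


def theme_percentages(posts):
    """Given (camp, topic) pairs, one per post, where topic is that post's most
    prominent topic, return {camp: {theme: percentage of that camp's posts}}."""
    counts = defaultdict(Counter)
    for camp, topic in posts:
        counts[camp][TOPIC_THEMES.get(topic, "Other")] += 1
    percentages = {}
    for camp, theme_counts in counts.items():
        total = sum(theme_counts.values())
        percentages[camp] = {
            theme: 100 * n / total for theme, n in theme_counts.items()
        }
    return percentages


if __name__ == "__main__":
    # Hypothetical posts from two unnamed camps.
    posts = [("camp_a", 3), ("camp_a", 3), ("camp_a", 16), ("camp_a", 12),
             ("camp_b", 16)]
    print(theme_percentages(posts))
    # {'camp_a': {'THATCamp Structure': 50.0, 'Pedagogy': 25.0,
    #  'International': 25.0}, 'camp_b': {'Pedagogy': 100.0}}
```

Normalizing within each camp makes camps with very different numbers of posts comparable. The cost is that a small camp’s percentages can swing sharply on just a few posts.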

We found these results particularly interesting. Our broadest conclusion is that THATCamp content emphasizes the many applications of digital technology to scholarship, from public uses to tool building and teaching. Since THATCamp was founded, it has become a more varied community. However, close examination of the topic model reveals that a number of the same terms appear frequently across the topics (“digital,” for instance, appears in 8 of the 20 topics). This reflects the way ideas circulate among camps and unify the community, as well as the subjects on which the community focuses.
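A count like “digital appears in 8 of the 20 topics” can be checked mechanically. The sketch below counts, for each word, how many topics list it among their keywords. It assumes each input line holds one topic. The keywords are assumed to sit in the last tab-separated field, as in MALLET’s topic-keys output; lines without tabs are used whole. The function name is hypothetical.

```python
from collections import Counter


def topics_per_word(lines):
    """Return a Counter mapping each keyword to the number of topics it appears in.

    Each line is assumed to hold one topic, with its space-separated keywords
    in the last tab-separated field (lines without tabs are used whole).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        keywords = line.split("\t")[-1].split()
        counts.update(set(keywords))  # count each word at most once per topic
    return counts


if __name__ == "__main__":
    # The first ten keywords of topics 5, 6, and 8 from the list above.
    sample = [
        "digital humanities data tools text projects research scholars omeka texts",
        "digital humanities session dh library libraries support projects open",
        "games zotero thinking place game code end cultural chnm hack",
    ]
    counts = topics_per_word(sample)
    print(counts["digital"], counts["games"])  # 2 1
```

Because matching is exact, variant forms such as ‘digitales’ (topic 12) and ‘digitale’ (topic 18) are counted as separate words from ‘digital’.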

If you’re interested in the data, you can view the various files here:

Reflections on Public Projects

This week, we finished our rotation block with Public Projects. I both struggled with and thoroughly enjoyed working in Public Projects. I learned many new and helpful things, while also discovering my weaknesses in some of the more technical aspects of digital history. This block included many different types of projects, such as live-testing a new website at the National Mall, writing entries for that project, testing Omeka, and even transcribing letters for the Papers of the War Department.

During this rotation I also got to venture into DC for work for the first time, which I enjoyed immensely. I was very thankful to test the new National Mall project with the other first-year fellows, and you can read more about that experience here. I am excited to see it go live, and I hope that once it does, many first-time and returning visitors to the Mall will use it.

I also overcame some difficulties in this block, which makes me feel incredibly accomplished. Although I felt comfortable with Omeka coming into this block, I have learned much more about how it functions and its different uses. I also learned that transcribing handwritten letters and pulling keywords from them are entirely different experiences. This was difficult, especially deciphering particular words. Still, it was useful and interesting to read these letters from when the US was a brand-new country, and I felt a real connection to them.

I loved working within this block, and I liked being challenged by many of the projects we worked on. I have learned many useful skills that I can apply to my future career or dissertation as a historian. I came to George Mason University with an MA in Public History, and I have a real passion for making history accessible to the public. I believe that much of the work being done in CHNM’s Public Projects division applies this concept, and I take great inspiration from the people and projects I have encountered while working here.

Education Reflection

My time in the Education block of the Center for History and New Media as a Digital History Fellow has been quite interesting. Previously, my teaching experience was limited to working as a Graduate Teaching Assistant for introductory-level history courses and teaching fourth graders as a Public History Educator at a museum in Sanford, Florida. Because my experience with K-12 education is admittedly limited, this block has been revealing about how technology can support teaching history to students at those levels.

Although historians always analyze information and primary documents, it is much more difficult to determine the best way for students to use those resources for learning. For example, while writing reviews for Teaching History, I had to consider the things historians typically weigh, such as bias, type of information, and the quality and quantity of the primary documents. What was new to me was thinking about how these items could enhance a teacher’s lesson plan for their class. I also had to consider the usability of these websites and tools. If a website is too difficult or confusing for a student to use, it is hard to consider it a valuable teaching resource, even if its information is good.

I have previously mentioned the challenges of thinking as an educator. These challenges continue to be something I must tackle as I continue in the Education block of CHNM, as well as in my future as a historian. I believe these are some of the valuable lessons I can take from working as a Digital History Fellow at CHNM. I will be able to use the skills I have gained from these projects in future endeavors.

The Challenges of Making a Challenge

For the past few weeks at the Center for History and New Media, my fellow first-year Digital History Fellows and I were assigned to the Education division. This division produces projects designed to teach history to a wide range of people through various educational resources. While there, we have been working on a new web project meant to engage and educate its audience by letting them examine liberty in the United States in a new and interesting way. It does this by incorporating age- and ability-appropriate “challenges” and access to primary documents and images. The project’s intended audience includes teachers, K-12 students, and the general public.

Creating a challenge for students raises intriguing questions of method. While creating our own challenge for the project, we had to ask ourselves several questions. First, what was the goal of the project? What did we want students to achieve by doing the challenge? What skills would they use? In examining the sources, we tried to approach them analytically, but with basic guidance so that students would not be overwhelmed. We wanted students to come away understanding the importance of not only the document itself but also its context. Giving students a choice of which documents to use for their own project allows them to study our examples. They can then apply the skills they gained to create an interesting project based on their own understanding.

Although this project has yet to launch publicly, I have been testing the website from multiple angles to ensure that it will work properly for end users. This has certainly been a fun process for me, as I have had to work as both a teacher and a student! It meant getting into the mindset of “If I were in tenth grade, how would I complete this assignment? What would I know? What would I not know?” It was also quite engaging to use the primary documents and photographs together with the provided tools to create interesting projects on the website. I imagine that K-12 students would also find this exciting, and I think it would be a fun experience for teachers designing challenges for their students, as well. I know that all of the DH Fellows who worked on this project took our assignments very seriously beyond the testing phase, and we worked for hours to perfect our challenge assignments!

Originally posted on Center for History and New Media Blog