Keystone DH 2017
Now in its third year, Keystone DH is an annual conference and a network of institutions and practitioners committed to advancing collaborative scholarship in digital humanities research and pedagogy across the Mid-Atlantic.
Thursday, July 13 • 2:00pm - 3:30pm
#s2a Data / Data & Civic Engagement


The Challenge of Anomalous Historical Data in DH (Kira Homo)

Working with historical documents presents many challenges to the conscientious researcher. Sources can be fragmentary, frustratingly elusive (or allusive), or simply absent. Extant documents may have been written for specific purposes that affected how the author chose to present the information. Indigenous or minority populations may not be represented in the archives at all, or at least not in their own words. For the digital scholar attempting to construct a regularized dataset, however, the nature of the historical record presents additional problems beyond the everyday challenges most historians confront in their primary sources. Most fundamentally, of course, the vast majority of pre-20th-century historical documents do not currently exist in a digital format. As a result, DH scholars engaged in historical research must often undertake wholesale transcription prior to using text analysis tools to study an entire corpus, or construct their own datasets from existing fragmentary sources to use in network analysis, geospatial analysis, and statistical analysis. In addition to negotiating gaps in the historical record, scholars must contend with irregular orthography, irregular or missing dates, archaic place-names, and much more. This paper examines a few common anomalies I have encountered in my own research and offers strategies and approaches for negotiating these potential pitfalls.

Dat Dataset Prototype Tho: Using Dat for Data Preservation (Rachel Appel, Chad Nelson)
Open civic data portals are a relatively new and growing trend in cities and states that hope to bridge the gap between citizens and government and stimulate civic engagement by making datasets originating from governmental agencies and civic organizations easily accessible online. OpenDataPhilly.org is unusual in that the portal is not managed centrally by the city itself, but instead by members of the community through a collection of links and descriptive metadata about those datasets. Digital preservation and versioning are often not considered, both because the data is fluid and because maintaining it online ceases to be an organizational priority. Our project, “Future-Proofing Civic Data,” is an attempt to learn how libraries can and should use their expertise in digital preservation and curation to provide long-term access to those datasets by using OpenDataPhilly.org as a testbed. One approach we used was tracking and sharing downloaded files in Dat (datproject.org). Dat is a secure and distributed package manager for data which can be used locally or for sharing and syncing versioned data over the Internet with an optional peer-to-peer network. We will discuss data sharing as preservation, our use cases for Dat, how this approach allows other members of the open data community to easily store live-updated copies, our curatorial decisions, workflows for update monitoring and versioning, and discovery.
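
The abstract mentions workflows for update monitoring and versioning without describing them. As a rough illustration only, one common way to notice that a downloaded dataset has changed is to compare content hashes between two snapshots. The sketch below assumes this approach; it is not the presenters' workflow and does not use or imitate Dat itself. The function names `sha256_of` and `detect_changes` and the file names in the usage note are hypothetical.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def detect_changes(previous: dict, current: dict) -> dict:
    """Compare a stored snapshot against freshly downloaded files.

    previous maps file name -> digest recorded at the last check.
    current maps file name -> raw bytes downloaded now.
    Returns a dict of file name -> "added", "modified", or "removed".
    Unchanged files are omitted.
    """
    changes = {}
    for name, data in current.items():
        digest = sha256_of(data)
        if name not in previous:
            changes[name] = "added"
        elif previous[name] != digest:
            changes[name] = "modified"
    for name in previous:
        if name not in current:
            changes[name] = "removed"
    return changes
```

A curator could store the digests from each check alongside the files, so that a later call such as `detect_changes(last_snapshot, {"permits.csv": new_bytes})` flags only the datasets that need a new version recorded.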

Metadata Analysis in the Age of Email (Shane Lin)
Digital technology and the Internet have dramatically altered the study of the late 20th century. My paper examines how embedded metadata, specifically in mailing list and Usenet documents, aids historical and broader humanities inquiries into the late 20th century in ways unavailable to studies of previous eras.
My dissertation project explores the construction of digital privacy rights in the 1970s through the 1990s, a process that took place in part over sprawling crypto-anarchist mailing lists and technical Usenet newsgroups. Collections of the digital age at the mega-, giga-, and petabit scale dwarf the linear-distance measures of archives of earlier eras. Through my research, I've found that the impressive volume of such data is not the only benefit. Because born-digital communication formats natively relied on metadata for sorting and function, that metadata is an especially powerful tool for researchers. Beyond the basic ability to order by category, author, or date-time, the structured organization of such protocols allows scholars to isolate important topics and influential participants by the volume of downstream response. With more advanced mechanisms of analysis, researchers can trace the directional flow of ideas across time and discussion domains, mechanizing and quantifying the methodologies of intellectual history and deploying them against collections of vast scale and broad comprehension.
These techniques do not replace traditional text analysis. Metadata is, after all, not text. My paper reflects on the ways that metadata, used in conjunction with those techniques, can focus and deepen readings, isolate important individuals and periods, and illuminate networks of influence.
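
The idea of ranking messages by "volume of downstream response" can be sketched using standard email headers. The example below is a minimal illustration, not the author's method: it assumes each message carries a `Message-ID` header and that replies name their parent in `In-Reply-To`, and it counts every later message in a thread as a descendant of each of its ancestors. The function names `parse_headers` and `downstream_counts` are hypothetical.

```python
from collections import Counter
from email import message_from_string


def parse_headers(raw_messages):
    """Extract threading metadata from raw message strings.

    Returns two dicts keyed by Message-ID:
    parent (Message-ID -> parent Message-ID or None) and
    author (Message-ID -> From header).
    Messages without a Message-ID are skipped.
    """
    parent, author = {}, {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        mid = (msg.get("Message-ID") or "").strip()
        if not mid:
            continue
        author[mid] = (msg.get("From") or "").strip()
        refs = (msg.get("In-Reply-To") or "").split()
        parent[mid] = refs[0] if refs else None
    return parent, author


def downstream_counts(parent):
    """Count, for each message, how many messages descend from it."""
    counts = Counter()
    for mid in parent:
        seen = {mid}  # guards against malformed, cyclic reply chains
        cur = parent[mid]
        while cur is not None and cur not in seen:
            counts[cur] += 1
            seen.add(cur)
            cur = parent.get(cur)
    return counts
```

Summing these counts per `From` address would give a rough measure of which participants drew the most response. Real archives often need extra handling, for example falling back to the `References` header when `In-Reply-To` is missing.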


Kira Homo

PhD Student, Penn State University

Shane Lin

Graduate Student, University of Virginia

Chad Nelson

Lead Technology Developer, Temple University

Thursday July 13, 2017 2:00pm - 3:30pm EDT
Ullyot South Chemical Heritage Foundation