Keystone DH 2017 has ended
Now in its third year, Keystone DH is an annual conference and a network of institutions and practitioners committed to advancing collaborative scholarship in digital humanities research and pedagogy across the Mid-Atlantic.
Friday, July 14 • 11:15am - 12:30pm
#s3b Text Analysis


I Want to Be Believed: Content and Structure as Value Propositions in Ufology Websites (S. E. Hackney)

This project analyzes online documents related to Ufology, the pseudo-scientific field dedicated to the study of UFOs, and focuses on two websites, http://ufocasebook.com and http://roswelltruth.homestead.com. These sites were chosen because they are run by individuals rather than as official branches of Ufology groups, and for their longevity (online since 2001, possibly earlier). From these sites, documents falling into two categories were collected: UFO encounter narratives (400 individual stories, ~219,000 words) and supporting government/media reports (5 news reports and the text of the Robertson Panel Report, ~11,000 words). Content analysis was carried out on these two corpora through word-frequency analysis in Voyant and topic modeling in MALLET.
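As a rough illustration of the word-frequency step, the sketch below counts tokens in a short text using only the Python standard library. It is not the Voyant workflow used in the project; the stopword list, the `word_frequencies` helper, and the sample sentences are all invented for this example.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only; Voyant applies its own lists.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "it", "i", "we"}

def word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercase word tokens in text, skipping stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

# Invented sample sentences, not drawn from either corpus.
sample = (
    "We saw a bright object in the sky. "
    "The object moved north, and the sighting was reported."
)
freqs = word_frequencies(sample)
top = freqs.most_common(1)  # the single most frequent non-stopword
```

Running the same function over each corpus separately would give two frequency tables that can be compared side by side, which is the kind of contrast the abstract draws between encounter narratives and government/media reports.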
The documents analyzed come from disparate sources, with most lacking provenance. Because the sites that host them do not provide verifiable sources, readers must trust that they are ‘real.’ By analyzing the content of these documents, I home in on what is considered "valuable" or "true" by the sites’ creators, regardless of any outside verifiability.
Analyses of these corpora show themes of physical location and the movement of objects within the UFO encounter stories, whereas the government/media reports focus on reactions to, and explanations of, previously observed phenomena. These sites collect different types of documentation under the same umbrella of "Ufology," and the documents share much of the same vocabulary, such as "sighting," "report," and the crucial "ufo."
From this foundation, I consider the structure of the sites themselves as formative to how value is expressed by their creators. This is done by mapping both websites and highlighting the location of the documents used for the content analysis. The resulting site maps suggest that at the foundation of these sites is an interest in highlighting experience and description, rather than analyzing data or posing questions to move the field forward.

Taming Text and Locating Imperial Language in Japan's "Taiyō" Magazine (Molly Des Jardin)
Taiyō, a Japanese general-interest magazine (1895-1928), was one of the most popular publications in the early twentieth century and provides an opportunity to examine the ways in which Japanese authors were thinking about the expanding Japanese empire and its relationship to its colonies, as well as the idea of the Japanese nation itself. I have obtained a hand-keyed corpus of selected years of the magazine compiled by the National Institute of Japanese Language & Linguistics and spent recent months converting it from Shift-JIS XML to Unicode plain text, scraping and formatting its rich metadata, tokenizing it into whitespace-separated words, and making first forays into exploring what the text data reveals about imperial relations at the time. I will share the surprisingly involved process of preparing this text for analysis and collecting metadata from the article XML tags. I will also discuss preliminary results regarding turn-of-the-century wars involving China, Russia, and the newly annexed Taiwan. My work with Taiyō illustrates the special challenges of dealing with Japanese (and other languages written without whitespace between words) and with various text encodings, and introduces the specific tools needed to prepare the text in an iterative and interpretive process. This presentation explores the ways in which I have dealt with the text: the visualization and modeling tools I used for initial analysis, iterative text cleaning, and the identification of "imperial language" usage specific to imperial foreign relations in early twentieth-century Japan.
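The encoding conversion and metadata collection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the presenter's actual pipeline: the tag name `article`, the attribute names, the `read_articles` helper, and the sample XML are hypothetical, and the real corpus schema may differ. Tokenization is left out because it requires a Japanese morphological analyzer, which is outside the standard library.

```python
import re
import xml.etree.ElementTree as ET

def read_articles(raw_bytes):
    """Decode Shift-JIS XML bytes and return a list of (metadata, text) pairs."""
    text = raw_bytes.decode("shift_jis")
    # Drop the XML declaration so the parser never sees the old encoding name.
    text = re.sub(r"^\s*<\?xml[^>]*\?>", "", text)
    root = ET.fromstring(text)
    records = []
    for article in root.iter("article"):  # "article" is a hypothetical tag name
        metadata = dict(article.attrib)   # attributes stand in for the metadata
        body = "".join(article.itertext()).strip()  # text with inline tags removed
        records.append((metadata, body))
    return records

# Invented sample document, encoded as Shift-JIS to mimic the source format.
sample_xml = (
    '<?xml version="1.0" encoding="Shift_JIS"?>'
    '<issue><article title="太陽" year="1895">日本の<b>雑誌</b>です。</article></issue>'
)
raw = sample_xml.encode("shift_jis")
records = read_articles(raw)
utf8_bytes = records[0][1].encode("utf-8")  # plain text, now as UTF-8
```

Decoding to a Python string before parsing sidesteps the question of whether the XML parser itself supports multi-byte encodings, at the cost of handling the declaration by hand.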

Transitioning among Gender Markers: A Computational Approach to Style, Gender, and Modernist Literature (Sean Weidman, Aaren Pastor)
James Pennebaker claims in The Secret Life of Pronouns (2011) that there may be “no better way to start a discussion of language and differences among people than with gender. Do men and women use words differently?” The question, of course, is not a new one: since the critical explosion that followed Robin Lakoff’s study “Language and Woman’s Place” (Language in Society 2 [1973]), linguists (and, more recently, digital humanists) have studied and theorized the differences in communication between men and women. However, while there has been extensive linguistic criticism around genre and gender in writing, very few projects have addressed these potential differences in the literature of the 20th century. Our project thus asks a relatively simple question: do men and women writers of modernist fiction have distinct stylistic markers unique to their gender? The modernist period offers an interesting case study in gender and style, given writers’ preoccupations with stylistic experimentation and formal innovation: one might expect non-traditional forms to give rise to non-traditionally “gender-marked” texts. Yet a series of stylometric analyses of a modernist corpus of 30 “canonical” female and male authors (about 120 novel-length works, split between genders) suggests that a number of distinct features separate the styles of men and women in modernist literature.
Unsurprisingly, but most interestingly, we also find exceptions to the rule. The cases wherein our assumptions about gender don’t hold, we suggest, offer a critique of the current DH debate surrounding gender markers in writing, and productively direct the debate to more fertile critical territory. Virginia Woolf’s Orlando: A Biography, for instance, features both narrative and stylistic gender transition, and while the novel is stylistically consistent with the rest of her corpus, its gender signals differ from those of her contemporaries. What might these exceptions tell us about the limits of our critical ability, even aided by computational techniques, to generalize “features” of gender? What do they reveal about the implicit biases we bring to our analyses? Do we “gender” texts more than we realize? Pairing our stylometric methods with some more conventional literary ones, we hope to showcase the benefits of a more nuanced framing of the fluidity of gender when scholars analyze gender markers in literature.
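The abstract does not say which stylometric features were measured. As one common example of such a feature, the sketch below computes the rate of pronouns per thousand words in a text. The marker list, the `rate_per_thousand` helper, and the sample sentence are invented for illustration and should not be read as the authors' method.

```python
import re

# Small illustrative marker set; an assumption, not the authors' feature list.
PRONOUNS = {"i", "me", "my", "she", "her", "he", "him", "his", "we", "you", "they"}

def rate_per_thousand(text, markers=PRONOUNS):
    """Return how often marker words occur per 1,000 tokens of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    hits = sum(1 for t in tokens if t in markers)
    return 1000.0 * hits / len(tokens)

# Invented sample sentence, not taken from the modernist corpus.
sample = "She said he would come, and I believed her."
rate = rate_per_thousand(sample)
```

Normalizing by text length lets works of different sizes be compared, and a vector of such rates for many marker words is a typical input to stylometric clustering or classification.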

S. E. Hackney

Doctoral Student, University of Pittsburgh

Molly Des Jardin

Japanese Studies Librarian, University of Pennsylvania
I have a PhD in Japanese studies (book history and literature) and an MS in Information Science, and currently work as the Japanese librarian at Penn. My research focus is on 19th-century publishing and authorship, but I am also branching into digital scholarship on foreign relations...

Aaren Pastor

Pennsylvania State University

Sean Weidman

Pennsylvania State University

Friday July 14, 2017 11:15am - 12:30pm EDT
Ullyot North, Chemical Heritage Foundation