The Edge

LDNA at Digital Humanities Congress 2016, Sheffield

LDNA organised two panels at the 2016 Digital Humanities Congress (DHC; Sheffield, 8th-10th September). Both focused on text analytics, with the first adopting the theme ‘Between numbers and words’, and the second ‘Identifying complex meanings in historical texts’. Fraser reports:

Iona began the first panel, chaired by James O’Sullivan, discussing the kind of information that can be lost in big data projects when data about texts is abstracted from the contents and contexts of the texts themselves. Drawing on LDNA’s work with EEBO-TCP, Iona showed examples of data that can slip through the cracks of large-scale automated analysis and described the development of a systematic approach to supporting such research with close reading techniques.

Rosie Shute outlined her approach to using vector space modelling to study William Caxton’s compositors, beginning with a clear explanation of how vector space modelling works. Taking sections of text known to have been typeset by the same compositor, Rosie has applied statistical analysis of the frequencies of variant spellings to create models of an individual’s lexical usage, which can then be compared with each other. She also discussed the difficulties in applying statistical analysis to the complex reality of spelling variation, and the capabilities and limitations of digital methods for this work.
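For readers unfamiliar with the general idea, a minimal sketch of a vector space comparison is given below. It is an illustration only and not Rosie's actual method: the list of tracked variants, the sample strings, and the function names `spelling_profile` and `cosine_similarity` are all assumptions made for the example. Each text sample becomes a vector of relative frequencies of variant spellings, and two samples are compared by the cosine of the angle between their vectors.

```python
from collections import Counter
import math

# Hypothetical variant spellings to track (illustrative, not a real feature set).
VARIANTS = ["the", "thee", "and", "ande", "that", "thatt"]

def spelling_profile(text, variants=VARIANTS):
    """Return the relative frequency of each tracked variant in a text sample."""
    counts = Counter(text.lower().split())
    total = sum(counts[v] for v in variants)
    if total == 0:
        return [0.0] * len(variants)
    return [counts[v] / total for v in variants]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Invented samples standing in for sections set by two different compositors.
sample_a = "the man and the hors that ran"
sample_b = "thee man ande thee hors thatt ran"

profile_a = spelling_profile(sample_a)
profile_b = spelling_profile(sample_b)
similarity = cosine_similarity(profile_a, profile_b)
```

In this toy case the two samples share no variant forms, so their profiles point in entirely different directions; real data would be far messier, which is the difficulty the talk addressed.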

The first panel concluded with description and demonstration of tools for exploring digital corpora developed by Harri Siirtola, Terttu Nevalainen, and Tanja Säily, with all three taking a turn in presenting their contribution to the work. The tools, including the Text Variation Explorer (TVE), are primarily aimed at establishing the extent to which different corpora are comparable, thus allowing researchers to make informed decisions on the data suitable for their purposes. Tanja demonstrated the capabilities of TVE using the Brown, ICE, and CEEC corpora, and discussed the further enhancements being made for TVE version 2.0.

The second of the two panels was chaired by Michael Pidd, and began with Seth discussing the applicability of computational distributional semantics approaches to the identification of concepts in historical text. He examined the diverging theories, frameworks, and goals which underlie current linguistic research and humanities research, discussing the extent and areas in which the former can form part of the latter and providing examples from his investigations using distributional semantics methods with Early Modern English texts in LDNA.

Gabriel Recchia, a computational linguist working with the Cambridge Concept Lab, considered the use of quantitative data in humanities research, describing the interaction of researchers with methods such as topic modelling and the statistical models employed for distributional semantics research. Gabriel discussed the extent to which the complexity of the mathematical underpinnings of these methods may hinder researchers’ ability to interpret the results accurately, before going on to demonstrate some of the ways in which count-based distributional methods can produce results which might not be found using topic modelling or word embedding methods in isolation.
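As a rough indication of what a count-based distributional method involves, the sketch below counts how often words co-occur within a small window and then weights those counts with positive pointwise mutual information (PPMI). This is a generic textbook technique chosen for illustration, not a description of Gabriel's own analysis; the window size, the toy sentence, and the function names `cooccurrence_counts` and `ppmi` are assumptions.

```python
from collections import Counter
import math

def cooccurrence_counts(tokens, window=2):
    """Count ordered (word, context) pairs occurring within `window` tokens of each other."""
    pairs = Counter()
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs[(word, tokens[j])] += 1
    return pairs

def ppmi(pairs):
    """Weight raw co-occurrence counts by positive pointwise mutual information."""
    total = sum(pairs.values())
    word_totals = Counter()
    for (word, _context), n in pairs.items():
        word_totals[word] += n
    scores = {}
    for (word, context), n in pairs.items():
        # The counts are symmetric, so a context's total equals its total as a word.
        pmi = math.log2((n * total) / (word_totals[word] * word_totals[context]))
        scores[(word, context)] = max(0.0, pmi)
    return scores

# Invented toy input; a real study would use a large historical corpus.
tokens = "the king and the queen and the king".split()
pairs = cooccurrence_counts(tokens)
scores = ppmi(pairs)
```

Every number here can be traced back to a visible count, which is one reason count-based methods can be easier to inspect than topic models or word embeddings.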

The second panel closed with the demonstration of new tools for conducting historical sociolinguistic research, led by Eetu Mäkelä, Tanja Säily, and Terttu Nevalainen. These allow researchers to use contextual metadata attached to their linguistic data as a means of achieving results which take account of factors external to the language itself. The team demonstrated their tools’ abilities by examining the suffix -er used as an inflection indicating comparison and as a method of deriving a noun from a verb, observing sociolinguistic factors affecting the development of both uses.

The conference as a whole offered many more fascinating papers, and was an excellent opportunity to learn about the ever-widening scope of digital humanities whilst gaining valuable feedback on the project from an engaged audience. Twitter users can see tweets from the conference by searching for the hashtag #dhcshef.