Linguistic DNA at SRS 2018: Abstracts

Knowledge, truth and expertise: experiments with Early English Books Online

Wondering what Linguistic DNA is bringing to the Society for Renaissance Studies? Here are the abstracts for two panels of papers, and information about our hands-on demonstration session (drop in).

United by a common interest in data-driven approaches to meaning and a focus on the transcribed portions of Early English Books Online (EEBO-TCP), this interdisciplinary panel brings together new research from the Linguistic DNA project and the Cambridge Concept Lab. 

What is EEBO anyway? Contextual study of a universe in print
Iona Hine and Susan Fitzmaurice (University of Sheffield)

Since 2015, the Linguistic DNA team has been developing methods for mapping meaning and change-in-meaning in Early Modern English. Our work begins with the hypothesis that meanings are not equivalent to words, and can be invoked in many different ways. For example, when Early Modern writers discuss processes of democracy, there is no guarantee they will also employ a keyword such as democracy. We adopt a data-driven approach, using measures of frequency and proximity to track associations between words in texts over time. Strong patterns of co-occurrence between words allow us to build groups of words that collectively represent meanings-in-context (textual and historical). We term these groups “discursive concepts”.
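For readers unfamiliar with co-occurrence measures, the idea of combining frequency and proximity can be sketched roughly as follows. This is an illustrative sketch only, not the Linguistic DNA processing pipeline: the function name `cooccurrence_pmi`, the window size, and the choice of pointwise mutual information (PMI) as the association measure are all assumptions made for the example.

```python
import math
from collections import Counter


def cooccurrence_pmi(tokens, window=5):
    """Score word pairs that occur within `window` tokens of each other.

    Hypothetical helper for illustration: returns a dict mapping an
    alphabetically sorted word pair to its pointwise mutual information.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, word in enumerate(tokens):
        # Look ahead only, so each pair of positions is counted once.
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                pair_counts[tuple(sorted((word, other)))] += 1

    total_words = len(tokens)
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (w1, w2), n in pair_counts.items():
        p_pair = n / total_pairs            # observed pair probability
        p1 = word_counts[w1] / total_words  # individual word probabilities
        p2 = word_counts[w2] / total_words
        # PMI is high when the pair co-occurs more often than the
        # individual frequencies of the two words would predict.
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return scores


# Toy input (assumed): a real run would use tokenised EEBO-TCP text.
tokens = "the people chose their rulers by voice and the people gave consent".split()
scores = cooccurrence_pmi(tokens, window=3)
strongest = sorted(scores, key=scores.get, reverse=True)[:3]
```

To follow associations over time, one could run the same calculation separately on the texts of each decade and compare the resulting scores; how best to group and normalise the texts is a separate design decision.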

The task of modelling discursive concepts in textual data has been absorbing and challenging, both theoretically and practically. Our main dataset, transcriptions of texts from Early English Books Online (EEBO-TCP), contains more than 50 000 texts. These include 9000 single-page broadsheets and 162 volumes that span more than 1000 pages. There are 127 items printed pre-1500, and nearly 7000 from the 1690s. The process of analysis therefore requires us to think carefully about how best to control and report on this variation in data distribution.

One particular question that has arisen affects all who attempt to use EEBO: what is in it? To what extent is its material from pre-1500 similar in kind (genre, immediacy, etc.) to that of the messy 1550s (as the English throne shifted speedily between Edward VI and his siblings), the 1610s (era of Shakespeare and the King James Version), or the 1640s (when Civil War raged)? This paper is a sustained reflection on attempts to find out “What’s in EEBO?”

In the beginning was the word? EEBO-TCP and another universe of meaning
Seth Mehl (University of Sheffield)

When a new idea is conceived, how does it find expression in language? Between 1450 and 1750, the English lexicon expanded dramatically, and literary scholars, philologists, linguists, and historians have sought to document and demonstrate the paths taken by key social and cultural vocabulary, charting the history of what would become key social and cultural ideas, discourses, and concepts. In such cases, the topic and language for investigation has been intuited on the basis of extended qualitative reading, and the objects of investigation tend to be individual words. With the advent of a searchable database of early modern texts, such intuitions can be tested at scale, and the initial object of inquiry can shift from individual words to relationships between sets of words.

What happens when we invert the traditional process, taking the thousands of texts digitised in EEBO-TCP and applying computational techniques to model language change independent of human intuition? Can such techniques indicate meaningful relationships between key words that human researchers had not intuited or observed? To what extent do observations founded on over 1 billion words of early modern English correspond to and diverge from what scholarly readers have already inferred? Is it possible to identify discourses around key ideas even when the apparently related key words are absent? Combining insights from the Keywords Project with tools developed by the Linguistic DNA project, this paper will explore how concept modelling can be applied to re-examine meaning in early modern texts.

Beyond Power Steering: re-constituting structures of knowledge in 17th-century texts
John Regan (University of Cambridge)

One of the axioms of the Cambridge Concept Lab is that digital means of enquiry should provide qualitatively new kinds of knowledge, if we are to realise their full value. That is to say, computation should not merely provide ‘power steering for the humanities’, but allow one to discover something different in kind about how knowledge was structured in the past.

Making good on this axiom necessitates judgements on the part of the user of digital technology about how to design one’s modes of address to natural language data sets such as Early English Books Online (EEBO-TCP), so that one is not merely adding ‘power steering’ to existing, familiar types of enquiry. It also necessitates decisions about when to come to rest at results (that is, when to cease enquiry), and judgements about where digital data can be said to be producing discrete and unfamiliar forms of knowledge.

This paper will present tentative first signs of what the Cambridge Concept Lab believe are historically-discrete conceptual structures, based on data from the early seventeenth-century portion of EEBO-TCP. Two such structures will be described, one entitled ‘Mutual Dependence’, the other ‘Self-Consistency’. As will be shown, familiar forms of knowledge that are held and expressed in sentences and paragraphs, organised by grammar and understood by readers largely as explicit sense, may be contrasted with this evidence of qualitatively different conceptual structures in the textual record. While this paper does not set out to debunk existing theories of the structuration of knowledge and its transmission in the seventeenth century as have become established through centuries of close reading, it does seek to enrich our understanding of these traditions by attending to conceptual, and not exclusively semantic, thematic or rhetorical, structures.

It appears uncontroversial to assert that concepts are determining with regard to features of language use such as explicit and implicit semantic fields, theme, word order, and syntactic relations at the level of the sentence. Nevertheless, recognising that concepts have lexical and semantic extension is not the same as accepting that the two are identical in kind. This paper’s claims about conceptual structure will be based upon evidence from the early decades of seventeenth-century data from EEBO-TCP.

Our afternoon panel is a little depleted (by ill-health) but features Jose M. Cree (Sheffield) on Neologisms and the English Reformation, Lucas van der Deijl (Amsterdam) on The collaborative Dutch translations of Descartes by Jan Hendrik Glazemaker (1620-1682), and a little extra time for discussion.


All SRS delegates are very welcome to drop in to our demo workshop, where we will be providing a 10-15-minute introduction to our tools (3:30pm, repeated at 4:30pm) and the opportunity for hands-on experimentation. This is in the Hicks Building, Floor G, room 29 (about 2 minutes’ walk from Jessop West, across the main road and a little uphill).

Snapshot from campus map, featuring the Hicks Building.

From Spring to Summer: LDNA on the road

June 2016:
For the past couple of months, our rolling horizon has looked increasingly full of activity. This new blogpost provides a brief update on where we’ve been and where we’re going. We’ll be aiming to give more thorough reports on some of these activities after the events.

Where we’ve been

Entrance to University Museum, Utrecht.

In May, Susan, Iona and Mike travelled to Utrecht, at the invitation of Joris van Eijnatten and Jaap Verheul. Together with colleagues from Sheffield’s History Department, we presented the different strands of Digital Humanities work ongoing at Sheffield. We learned much from our exchanges with Utrecht’s AsymEnc and Translantis research programs, and enjoyed shared intellectual probing of visualisations of change across time. We look forward to continued engagement with each other’s work.

A week later, Seth and Justyna participated in This&THATCamp at the University of Sussex (pictured), with LDNA emerging second in a popular poll of topics for discussion at this un-conference-style event. Productive conversations across the two days covered data visualisation, data manipulation, text analytics, digital humanities and even data sonification. We hope to hear more from Julie Weeds and others when the LDNA team return to Brighton in September.

Next week, we’ll be calling on colleagues at the HRI to talk us through their experience visualising complex humanities data. Richard Ward (Digital Panopticon) and Dirk Rohman (Migration of Faith) have agreed to walk us through their decision-making processes, and talk through the role of different visualisations in exploring, analysing, and explaining current findings.

Where we’re going

The LDNA team are also gearing up for a summer of presentations:

  • Justyna Robinson will be representing LDNA at Sociolinguistics Symposium (Murcia, 15-18 June), as well as sharing the latest analysis from her longitudinal study of semantic variation focused on polysemous adjectives in South Yorkshire speech. Catch LDNA in the general poster session on Friday (17th), and Justyna’s paper at 3pm on Thursday. #SS21
  • Susan Fitzmaurice is in Saarland, as first guest speaker at the Historical Corpus Linguistics event hosted by the IDeaL research centre, also on Thursday (16th June) at 2:15pm. Her paper is subtitled “Discursive semantics and the quest for the automatic identification of concepts and conceptual change in English 1500-1800”. #IDeaL
  • In July, the Glasgow LDNA team are Krakow-bound for DH2016 (11-16 July). The LDNA poster, part of the Semantic Interpretations group, is currently allocated to Booth 58 during the Wednesday evening poster session.
  • Later in July, Iona heads to SHARP 2016 in Paris (18-22 July). This year, the bilingual Society are focusing on “Languages of the Book”, with Iona’s contribution drawing on her doctoral research (subtitle: European Borrowings in 16th and 17th Century English Translations of “the Book of Books”) and giving attention to the role of other languages in concept formation in early modern English (a special concern for LDNA’s work with EEBO-TCP).
  • In August, Iona is one of several Sheffield early modernists bound for the Sixteenth Century Society Conference in Bruges. In addition to giving a paper in panel 241, “The Vagaries of Translation in the Early Modern World” (Saturday 20th, 10:30am), Iona will be hosting a unique LDNA poster session at the book exhibit. (Details to follow.)
  • The following week (22-26 August), Seth, Justyna and Susan will be at ICEHL 19 in Essen. Seth and Susan will be talking LDNA semantics from 2pm on Tuesday 23rd.

Back in the UK, on 5 September, LDNA (and the University of Sussex) host our second methodological workshop, focused on data visualisation and linguistic change. Invitations to a select group of speakers have gone out, and we’re looking forward to a hands-on workshop using project data. Members of our network who would like to participate are invited to get in touch.

And back in Sheffield, LDNA is playing a key role in the 2016 Digital Humanities Congress, 8-10 September, hosting two panel sessions dedicated to textual analytics. Our co-speakers include contacts from Varieng and CRASSH. Early bird registration ends 30th June.