Looking back, looking forward: Linguistic DNA in 2016 and 2017

As we move into 2017, we’ve been looking back at achievements in 2016, and ahead to what we aim to achieve in the coming year.

2016 was an outwardly busy year as we travelled to Bruges, Essen, Krakow, Lausanne, Leeds, Brighton, Murcia, Nottingham, Paris, Saarbrucken, and Utrecht, sharing more of our thinking and early data with different audiences. Closer to “home”, we benefitted from the exchange of ideas with LDNA-hosted panels at Sheffield DH Congress and our second methodological workshop in Sussex. In 2017, we will be focusing back on our interface development and some more in-depth research, though we intend to be present at DH, SHEL, ICAME and SHARP, in order to continue some fruitful conversations.

On the blog, we have been reflecting on representativeness and the nature of EEBO-TCP. We’ve also documented our decision not to use ECCO’s OCR data to analyse eighteenth century print. You can expect to hear about the alternative 18th century datasets we’re choosing to work with later in 2017.

During the Autumn, the LDNA researchers collaborated on two articles about the project, its theory and praxis, both (hopefully) to be published this year following peer review. Generating examples from each research theme based on our early data and tying these together effectively was an enjoyable challenge, and we have already used the draft of one piece as part of our briefing materials for upcoming MA placements at The Digital Humanities Institute | Sheffield (formerly known as HRI Digital).

In the past six months, the Sheffield team have captured funding for two additional applications of the Linguistic DNA “concept modelling” tools:

The ESRC project Ways of Being in a Digital Age combines our quantitative insights with a qualitative literature survey of academic publications. Scheduled to inform the ESRC’s next programme of digital society funding, this impact-full study has compelled us toward rapid prototype development. The interface being put together to serve ‘WoBDA’ colleagues will also form the kernel of the subsequent LDNA workbench.
From next month, we are involved in another funded impact-related project, collaborating with the University of Leeds to explore the conceptual structure of millions of YouTube video comments on the theme of militarisation, as part of a larger project funded by the Swedish Research Council. This is a six-month commitment, bringing in a further research associate to theorise what’s involved in applying our measures to some very different data.

We also have three significant applications in place for other pots of funding, including Horizon 2020 collaborations, attesting confidence about our nascent processes and the multifarious opportunities for their application and impact.

Meanwhile, Glasgow has been using the present word co-occurrence data to develop its methodology for investigating processor data from the perspective of key Historical Thesaurus categories. We have continued to develop analysis of Thesaurus categories, looking for those which show abnormal instances of growth or decline; a provisional methodology for establishing statistical ‘baselines’ has been plotted out which is now being implemented and refined. Further possibilities are being tested, such as amalgamating data across whole layers of the HT hierarchy rather than by individual category, and the effects of separating out part of speech within categories or layers.