LDNA Blog

  • Visualising Patterns in Historical Concept Mapping

    Curating Patterns with Visualisations. To interpret the patterns emerging from our metric calculations, we’re deploying visualisations that make these complex relationships easier to grasp. Using tools like network graphs and heatmaps, we can identify clusters of words that form cohesive themes. These visualisations allow us to observe how concepts interrelate, helping us to curate distinct…

  • Mapping Historical Concepts with LDNA

    Our current project employs the LDNA (Linguistic DNA) demonstrator to investigate historical texts, aiming to uncover how ideas evolved in Early Modern English. By analysing ‘concept constellations’, which are clusters of quads (sets of four highly frequent, commonly co-occurring lemmas, including a central node lemma), we gain insights into how groups of words represented…

  • Talk About Change: LDNA at Festival of the Mind

    Last weekend, Linguistic DNA & friends took over the Spiegeltent in Sheffield city centre, as part of the University’s Festival of the Mind. Spiegeltents are a Belgian invention: tents decorated internally with mirrors, creating the perfect space to share myriad reflections. Over the course of two hours, we hosted a performance of new writing that emerged…

  • Linguistic DNA at SRS 2018: Abstracts

    Knowledge, truth and expertise: experiments with Early English Books Online. Wondering what Linguistic DNA is bringing to the Society for Renaissance Studies? Here are the abstracts for two panels of papers, and information about our hands-on demonstration session (drop in). United by a common interest in data-driven approaches to meaning and a focus on the transcribed…

  • Talk About Change

    In a time when events seem ever and ever out of our control, writing is resistance. –Our Mel. In April (2018), Linguistic DNA began collaborating with local social entrepreneurs Our Mel to do some collective thinking about the power of language. This work is funded by the University of Sheffield’s Festival of the Mind and our…

  • Translation, Gender, Sexuality: a report from Genealogies of Knowledge 2017

    In December 2017, Sheffield MA student Nathaniel Dziura attended part of the Genealogies of Knowledge conference in Manchester. While the LDNA team were exchanging conceptual insights with other data-driven scholars, Nathaniel participated in sessions connected to a different field of interest. He writes: As a member of the LGBTQ+ community, I am keen to contribute…

  • Quantity and quality: lessons from an MA work placement

    Sheffield MA student Nadia Filippi reflects on her experience after 100 hours with the Linguistic DNA team at DHI | Sheffield: As part of my MA studies in English Language and Linguistics, I had the opportunity to undertake a work placement of 100 hours at the University of Sheffield’s Digital Humanities Institute. The placement offered…

  • Looking back, looking forward: Linguistic DNA in 2016 and 2017

    As we move into 2017, we’ve been looking back at achievements in 2016, and ahead to what we aim to achieve in the coming year. 2016 was an outwardly busy year as we travelled to Bruges, Essen, Krakow, Lausanne, Leeds, Brighton, Murcia, Nottingham, Paris, Saarbrucken, and Utrecht, sharing more of our thinking and early data…

  • On “Lost Books” (ed. Bruni & Pettegree)

    Review: Lost Books: Reconstructing the Print World of Pre-Industrial Europe. Ed. Flavia Bruni and Andrew Pettegree. Library of the Written Word 46 / The Handpress World 34. Leiden & Boston: Brill, 2016. 523 pages. We solicited this book for review because we have been keenly aware that we cannot take what has been transcribed and…

  • LDNA at Digital Humanities Congress 2016, Sheffield

    LDNA organised two panels at the 2016 Digital Humanities Congress (DHC; Sheffield, 8th-10th September). Both focused on text analytics, with the first adopting the theme ‘Between numbers and words’, and the second ‘Identifying complex meanings in historical texts’. Fraser reports:

  • Language, visualisation and methodology: our second workshop

    Monday 5 September saw the Linguistic DNA team camping out at the University of Sussex for our second methodological workshop. This year the theme was “Visualisation and Language Change”, and we’ve harnessed the powers of Storify to put together a short account of a long and enjoyable day’s work. See more (on Storify).  

  • What does EEBO represent? Part II: Corpus linguistics and representativeness

    What exactly does EEBO represent? Is it representative? Often, the question of whether a corpus or data set is representative is answered first by describing what the corpus does and does not contain. What does EEBO contain? As Iona Hine has explained here, EEBO contains Early Modern English, but it is much larger than that…

  • Under the surface: SHARP, LDNA and sundry sources

    This blog post excerpts material Iona wrote reflecting back on her contribution to the SHARP conference in Paris in July 2016, building on the work of her PhD thesis and incorporating material and processes that have formed part of the Linguistic DNA project. The full post can be found on Iona’s personal blog. In preparation…

  • What does EEBO represent? Part I: sixteenth-century English

    Ahead of the 2016 Sixteenth Century Conference, Linguistic DNA Research Associate Iona Hine reflected on the limits of what probing EEBO can teach us about sixteenth-century English. This is the first of two posts addressing the common theme “What does EEBO represent?” The 55 000 transcriptions that form EEBO-TCP are central to LDNA’s endeavour to…

  • Digital Humanities 2016, Kraków

    Conference reflections jointly written with Justyna Robinson. Four members of the LDNA team (Marc Alexander, Justyna Robinson, Brian Aitken, and Fraser Dallachy) attended this year’s Digital Humanities (DH) conference in Kraków, Poland. With over 800 attendees, the conference is an excellent opportunity to exchange ideas, learn of new areas of potential interest, and network with academics from…

  • Text Analytics at Sheffield DH Congress

    Earlier in the year (2016), we issued a special call for papers, inviting others to join LDNA panel sessions at the Sheffield Digital Humanities Congress. We were delighted by the responses, and further delighted that the full DHC programme includes plenty of other material relevant to our text analytics interests, and a noticeable body of book historical input too. As a…

  • Experimenting with the imperfect: ECCO & OCR

    When the Linguistic DNA project was first conceived, we aimed to incorporate more than 200 000 items from Eighteenth Century Collections Online (ECCO). Comparing findings for one portion of ECCO that has been digitised in different ways, this 2016 blogpost details why that ambition proved impractical. The public database uses ECCO-TCP as its main eighteenth-century…

  • Sociolinguistics Symposium 21: conference reflections

    Linguistic DNA Co-Investigator Justyna Robinson attended the Sociolinguistics Symposium at the University of Murcia, Spain, 15-18 June. This year’s conference theme was ‘attitudes and prestige’, and the event included over 1,000 presentations. Justyna represented LDNA with a poster in the general poster session entitled ‘Linguistic DNA: Modelling concepts and semantic change in English, 1500-1800’. Below,…

  • QPL Semantic Spaces Workshop, University of Strathclyde

    The workshop day, titled ‘Semantic Spaces at the Intersection of NLP, Physics, and Cognitive Science’, was part of a larger Quantum Physics and Logic (QPL) conference held at the University of Strathclyde. The workshop focussed on computational approaches to modelling semantics and semantic relations in language. The day was divided into three parts: the first…

  • From Spring to Summer: LDNA on the road

    June 2016: For the past couple of months, our rolling horizon has looked increasingly full of activity. This new blogpost provides a brief update on where we’ve been and where we’re going. We’ll be aiming to give more thorough reports on some of these activities after the events. Where we’ve been. In May, Susan, Iona and Mike…

  • Errors, searchability, and experiments with Thomason’s Newsbooks

    Back in 2012, HRI Digital ran a project, with the departments of English, History, and Sociological Studies, looking at participatory search design. The project took as its focus a subset of George Thomason’s 17th-century newsbooks, transcribing every issue of Mercurius Politicus plus the full selection of newsbooks published in 1649 (from the images available through ProQuest’s Early English Books Online). Building the…

  • Conference report: Diachronic corpora and genre in Nottingham

    On Friday 8 April 2016, Susan Fitzmaurice and Seth Mehl attended Diachronic corpora, genre, and language change at the University of Nottingham, where Seth gave a paper entitled Automatic genre identification in EEBO-TCP: A multidisciplinary perspective on problems and prospects. The event featured researchers from around the globe, exploring issues in historical data sets; the…

  • LDNA’s first year: Reflections from RA Seth Mehl

    In wrapping up the first year of LDNA, I’ve taken a moment to consider some of the over-arching questions that have occupied much of my creative and critical faculties so far. What follows is a personal reflection on some issues that I’ve found especially exciting and engaging. Semantics and concepts. The Linguistic DNA project sets…

  • Learning with Leuven: Kris Heylen’s visit to the HRI

    In 2016, Dr Kris Heylen (KU Leuven) spent a week in Sheffield as an HRI Visiting Fellow, demonstrating techniques for studying change in “lexical concepts” and encouraging the Linguistic DNA team to articulate the distinctive features of the “discursive concept”. Earlier this month, the Linguistic DNA project hosted Dr Kris Heylen of KU Leuven as…

  • Dr Kris Heylen: Tracking Conceptual Change

    In February 2016, Linguistic DNA hosted Dr Kris Heylen as an HRI Visiting Fellow, strengthening our links with KU Leuven’s Quantitative Lexicology and Variational Linguistics research group. This post outlines the scheduled public events. Next week, the Linguistic DNA project welcomes visiting scholar (and HRI Visiting European Fellow) Dr Kris Heylen of KU Leuven. About Kris:…

  • A Theoretical Background to Distributional Methods (pt. 2 of 2)

    Introduction. In the previous post, I presented the theoretical and philosophical underpinnings of distributional methods in corpus semantics. In this post, I touch on the practical background that has shaped these methods. Means of analysis. The emergence of contemporary distributional methods occurs alongside the emergence of Statistical Natural Language Processing (NLP) in the 1990s. Statistical…

  • A theoretical background to distributional methods (pt. 1 of 2)

    Introduction. When discussing proximity data and distributional methods in corpus semantics, it is common for linguists to refer to Firth’s famous “dictum”, ‘you shall know a word by the company it keeps!’ In this post, I look a bit more closely at the theoretical traditions from which this approach to semantics in contexts of use…

  • Operationalising concepts (Manifesto pt. 3 of 3)

    This blog post completes our series of three extracts from Susan Fitzmaurice’s paper on “Concepts and Conceptual Change in Linguistic DNA”. (See parts 1 and 2.) The supra-lexical approach to the process of concept recognition that I’ve described depends upon an encyclopaedic perspective on semantics (e.g. cf. Geeraerts, 2010: 222-3). This is fitting as ‘encyclopaedic…

  • Defining the content of a concept from below (Manifesto pt. 2 of 3)

    This blog post features the second of three extracts from Susan Fitzmaurice’s paper on “Concepts and Conceptual Change in Linguistic DNA”. (See previous post.) Before tackling the problem of actually defining the content of a concept ‘from below’, we need to imagine ourselves into the position of being able to recognize the emergence of material…

  • A manifesto for studying conceptual change (Manifesto pt. 1 of 3)

    As those who follow our Twitter account will know, Linguistic DNA’s principal investigator, Susan Fitzmaurice, was among the invited speakers at the recent symposium on Digital Humanities & Conceptual Change (organised by Mikko Tolonen, at the University of Helsinki). It was an opportunity to set out the distinctive approach being taken by our project and…

  • Distributional Semantics II: What does distribution tell us about semantic relations?

    In a previous post, I outlined a range of meanings that have been discussed in conjunction with distributional analysis. The Linguistic DNA team is assessing what exactly it can determine about semantics based on distributional analysis: from encyclopaedic meaning to specific semantic relations. In…

  • Naomi Tadmor: Semantic analysis of keywords in context

    On 30 October, Prof. Naomi Tadmor led a workshop at the University of Sheffield, hosted by the Sheffield Centre for Early Modern Studies. In what follows, I briefly summarise Tadmor’s presentation, and then provide some reflections related to my own work, and to Linguistic DNA. The key concluding points that Tadmor forwarded are, I think,…

  • From Data to Evidence (d2e): conference reflections

    Fraser and Iona report (November 2015): Six members of the Linguistic DNA team were present at the recent d2e conference held by the VARIENG research unit at the University of Helsinki, Finland. The focus of the conference was on tools and methodologies employed in corpus linguistics, whilst the event took for its theme ‘big data, rich data,…

  • Distributional Semantics I: What might distribution tell us about word meaning?

    In a previous post, I asked ‘What is the link between corpus data showing lexical usage, on the one hand, and lexical semantics or concepts, on the other?’ In this post, I’d like to forward that discussion by addressing one component of it: how…

  • Workshop Reflections

    A fortnight ago, our first methodology workshop was held at the University of Sussex. It was a full programme and productive for the project team with lots of opportunities for us to test out our thinking about how we move forward, and it has given us plenty to think about. We can perhaps best summarise…

  • The Historical Thesaurus of English and its Related Projects

    One of the resources which the Linguistic DNA project is drawing on is the Historical Thesaurus of English. Organising every word in the language, present and past, into a hierarchical structure based on word-meaning, the Historical Thesaurus is an invaluable tool for historical semantic research. The data from the Thesaurus will be involved in the…

  • Proximity Data II: Co-occurrence and distance measurements

    In a previous post, we addressed proximity data by defining proximity, term, and co-occurrence. In this post, we weigh specific options for measuring co-occurrence. In particular, we look at an array of distance measurements or windows for co-occurrence. In the Linguistic DNA project, we will be experimenting with many of the options and approaches below.…

  • Proximity Data

    Background. The Linguistic DNA project will be interrogating cleaned-up EEBO and ECCO data in various ways, to get at its lexical semantic and conceptual content. But how do we get semantic and conceptual information from textual data? Sticking with the original project proposal, we begin with an analysis of ‘proximity data’. What is proximity data,…

  • Liest thou, or hast a Rewme? Getting the best from VARD and EEBO

    This post from August 2015 continues the comparison of VARD and MorphAdorner, tools for tackling spelling variation in early modern English. (See earlier posts here and here.) As of 2018, data on our public interface was prepared with an updated version of MorphAdorner and some additional curation from Martin Mueller at Northwestern. This week, we’ve…

  • Illustrating the tools: first insights on VARD & MorphAdorner

    In 2015, we compared two tools developed to address spelling variation in early modern English: VARD and MorphAdorner. This post documents some of that work, outlining how the design and intent of the two tools affects their impact. The Sheffield RAs are hard at work on our audit of Early English Books Online, figuring out how best to…

  • EEBO-TCP and standard spelling

    This post from 2015 outlines the challenge posed by non-standard spelling in early modern English with particular attention to Early English Books Online. It introduces two tools developed by others in order to assist searching and other language-based research: VARD and MorphAdorner. The Linguistic DNA project relies on two very large linguistic data sources for…

  • Welcome to the Linguistic DNA blog!

    The Linguistic DNA blog is a space for those working on the project to reflect on methodology, findings, and other aspects of the project in an informal way. Fraser, Iona, and Seth (the research associates) will be taking it in turns to share what we have been working on. At present, the website is gradually taking…