Monthly Archives: July 2016

Chart showing the frequency of the stem “transl-” in ECCO OCR as a percentage of its frequency in TCP.

Experimenting with the imperfect: ECCO & OCR

When the Linguistic DNA project was first conceived, we aimed to incorporate more than 200 000 items from Eighteenth Century Collections Online (ECCO). Comparing findings for one portion of ECCO that has been digitised in different ways, this 2016 blog post details why that ambition proved impractical. The public database uses ECCO-TCP as its main eighteenth-century source.
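The caption at the head of this entry describes a simple ratio: how often a stem turns up in the ECCO OCR text, expressed as a percentage of how often it turns up in the ECCO-TCP transcription of the same material. The Python sketch below shows one way such a ratio could be computed. It is not the project's pipeline: the function names, the crude regex tokeniser and the prefix match on `transl` are assumptions made for illustration, and the toy strings are invented.

```python
import re


def stem_frequency(text, stem):
    """Count word tokens beginning with `stem` (case-insensitive).

    Assumption: a crude tokeniser that keeps only runs of ASCII letters.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 for token in tokens if token.startswith(stem))


def ocr_as_percent_of_tcp(ocr_text, tcp_text, stem="transl"):
    """Stem frequency in the OCR text as a percentage of its TCP frequency."""
    tcp_count = stem_frequency(tcp_text, stem)
    if tcp_count == 0:
        raise ValueError("stem not found in the TCP text")
    return 100 * stem_frequency(ocr_text, stem) / tcp_count


# Toy example: the OCR string mimics long s (ſ) being read as "f".
ocr = "the tranflation was tranflated; a translation"
tcp = "the translation was translated; a translation"
print(round(ocr_as_percent_of_tcp(ocr, tcp), 1))  # 33.3
```

Misreading long s as f is one familiar kind of OCR error in eighteenth-century print; in the toy strings it leaves only one of the three TCP hits visible to a simple stem search.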

Sociolinguistics Symposium 21: conference reflections

Linguistic DNA Co-Investigator Justyna Robinson attended the Sociolinguistics Symposium at the University of Murcia, Spain, on 15–18 June. This year’s conference theme was ‘attitudes and prestige’, and the event included over 1,000 presentations. Justyna represented LDNA in the general poster session with a poster entitled ‘Linguistic DNA: Modelling concepts and semantic change in English, 1500-1800’. Below, she reflects on her experience:

For the LDNA project, one of the most important panel sessions was the one organised by Terttu Nevalainen and Marijke van der Wal, entitled Historical sociolinguistics: Dispelling myths about the past. The session included papers that revisited a range of assumptions, both about the past and about how the past is studied, that historical sociolinguistic research does not support. Particularly important for LDNA were the methodological papers, which interrogated the methods of historical linguistic research. For example, in ‘People, work, values: Tracing societal change through linguistic shifts’, Minna Palander-Collin, Anni Sairio, Minna Nevala, and Brendan Humphries (University of Helsinki) explored social changes in Britain between 1750 and 1900 by analysing keywords within the conceptual domains of PEOPLE, WORK, and VALUES. One question that emerged from the discussion was whether social shifts can be identified in keywords. This quickly led to asking what concepts are and what kind of relationship exists between keywords and concepts. Although the question was not resolved, there was a unanimous desire to explore it further.

Another observation from Palander-Collin et al.’s talk was that certain concepts can linger in language even when the real-life referents they designate are long gone. Miriam Meyerhoff added that the same issue had been observed in New Zealand data: researchers looking at Maori keywords found that references to certain plants lingered in a community’s narratives well after those plants had ceased to be used. The audience also referred to the LDNA project during this discussion. It was great to hear that more and more people know about LDNA and are following our progress.

The LDNA poster was displayed in a beautiful setting, one of the cloisters of the University of Murcia, and attracted a lot of attention. In it, we presented first findings from using positive and negative PMI (pointwise mutual information) values to model discursive concepts around the word soldier within a window of ±100 words. Having set such a large proximity window, we did not know at first whether the results would be interesting or useful in our quest to determine what concepts are. One conclusion from this analysis was that large proximity windows still yield meaningful information: clear semantic domains emerge that are important for grasping the discursive concepts around soldier. Another methodological finding is that negative PMI values add to our understanding of what concepts are. For example, soldier shows a notably rare association with a semantically cohesive group of items, including religious terms such as sin and church. One may ask whether this systematic negative association marks the end of a disappearing concept or the beginning of a new one. We expect to answer these questions soon by looking at our data diachronically.
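PMI compares how often two words actually co-occur with how often they would co-occur by chance: positive values mean more often than chance, negative values less often. As a rough sketch of the windowed approach described above, the Python below scores every word found within ±`span` tokens of a node word such as soldier. It is an illustration under our own assumptions, not LDNA's implementation: it uses plain whitespace-split tokens and one common formulation, PMI = log2(P(c | window) / P(c)), where P(c | window) is c's share of all window positions and P(c) is its share of the corpus. The function name `window_pmi` is hypothetical.

```python
import math
from collections import Counter


def window_pmi(tokens, node, span=100):
    """Return {collocate: PMI} for words seen within +/-span tokens of node.

    PMI(node, c) = log2(P(c | window) / P(c)), where P(c | window) is the
    share of window positions filled by c and P(c) is c's corpus share.
    """
    corpus_freq = Counter(tokens)
    n_tokens = len(tokens)
    window_freq = Counter()
    window_slots = 0  # total number of collocate positions inspected

    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - span)
        end = min(n_tokens, i + span + 1)
        for j in range(start, end):
            if j == i:
                continue  # skip the node occurrence itself
            window_freq[tokens[j]] += 1
            window_slots += 1

    scores = {}
    for word, observed in window_freq.items():
        p_in_window = observed / window_slots
        p_in_corpus = corpus_freq[word] / n_tokens
        scores[word] = math.log2(p_in_window / p_in_corpus)
    return scores


# Toy corpus: "war" is rare overall but sits next to "soldier";
# "the" is very frequent but appears beside "soldier" only once.
tokens = "the the the the soldier war the the the the".split()
scores = window_pmi(tokens, "soldier", span=1)
print(round(scores["war"], 2), round(scores["the"], 2))  # 2.32 -0.68
```

Words that never fall inside a window get no score at all (their PMI would be minus infinity), so negative values here come only from words that do co-occur with the node, but less often than their overall frequency predicts. With a ±100-word window, overlapping windows around nearby occurrences count the same token more than once; the sketch does not correct for that.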


Abstracts from the conference are available from the conference website.