Concept Modelling Interfaces

1. Early Modernity (EEBO-TCP and ECCO-TCP)

The Linguistic DNA interface allows users to search Linguistic Concept Modelling (LCM) pair or trio data for EEBO-TCP Phase 1 and ECCO-TCP. Users can also search curated sub-collections: sermons; scientific texts; the works of Thomas Becon; the 1550s; or 20-year spans from 1473 to 1803. The data represents co-occurrence pairs or trios of noun lemmas occurring at least 5,000 times in EEBO-TCP, excluding lemmas of only one or two letters and a stoplist of very high-frequency lemmas (god, man, thing, christ, time, lord, king, word, church, day).
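The lemma selection rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the project's actual pipeline: the function name `eligible_lemmas` and the sample counts are hypothetical, and only the threshold, the length rule, and the stoplist come from the description.

```python
from collections import Counter

# Values taken from the description above.
STOPLIST = {"god", "man", "thing", "christ", "time",
            "lord", "king", "word", "church", "day"}
MIN_FREQUENCY = 5000   # minimum occurrences in EEBO-TCP
MIN_LENGTH = 3         # lemmas of one or two letters are excluded


def eligible_lemmas(lemma_counts):
    """Return the noun lemmas that pass all three selection rules.

    Hypothetical helper: `lemma_counts` maps each noun lemma to its
    corpus frequency.
    """
    return {
        lemma
        for lemma, count in lemma_counts.items()
        if count >= MIN_FREQUENCY
        and len(lemma) >= MIN_LENGTH
        and lemma not in STOPLIST
    }


# Invented counts, for illustration only (not real EEBO-TCP figures).
counts = Counter({"grace": 61000, "god": 900000, "ox": 7200,
                  "soul": 5000, "zeal": 4999})
print(sorted(eligible_lemmas(counts)))
```

In this toy input, "god" is dropped by the stoplist, "ox" by the length rule, and "zeal" by the frequency threshold.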

The interface is currently being replaced by a new version that will be faster and more intuitive, and will include accompanying guidance.

Please note: Because the interface queries several billion rows of data, some searches may take a few minutes.

2. Ways of Being in the Digital Age

The WoBDA interface presents LCM trio data for 1,900 social science journal articles published from 1968 to 2017. Users can search pair or trio data for all articles, or for curated sub-collections representing the topical categories ‘Data’, ‘Citizenship’, ‘Economy’, ‘Communication’, ‘Governance’, ‘Communities’, and ‘Health’. The data represents trios for the 200 highest-ranked pair co-occurrences in the data, containing noun lemmas occurring at least 50 times in the whole dataset.
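As a rough illustration of the selection described above, the sketch below keeps the highest-ranked pairs whose lemmas both meet the frequency threshold. The ranking measure is not specified here, so the scores are placeholders. The function name `top_pairs`, the sample data, and the choice to apply the frequency filter before ranking are all assumptions made for the example.

```python
import heapq

MIN_FREQUENCY = 50   # minimum occurrences in the whole dataset
TOP_PAIRS = 200      # number of highest-ranked pairs retained


def top_pairs(pair_scores, lemma_counts, limit=TOP_PAIRS):
    """Return up to `limit` pairs, highest score first.

    Hypothetical helper. Assumption: a pair is eligible only if both
    of its lemmas occur at least MIN_FREQUENCY times.
    """
    eligible = {
        pair: score
        for pair, score in pair_scores.items()
        if all(lemma_counts.get(lemma, 0) >= MIN_FREQUENCY for lemma in pair)
    }
    return heapq.nlargest(limit, eligible, key=eligible.get)


# Invented counts and scores, for illustration only.
lemma_counts = {"data": 900, "privacy": 120, "citizen": 300, "blockchain": 12}
pair_scores = {
    ("data", "privacy"): 8.1,
    ("citizen", "data"): 5.4,
    ("blockchain", "data"): 9.7,   # dropped: "blockchain" is below 50
}
print(top_pairs(pair_scores, lemma_counts, limit=2))
```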

The interface queries 62,000 rows of data.

3. Militarisation 2.0

This interface provides access to LCM trio data for approximately 1 billion YouTube comments, across tens of thousands of threads, relating to military video games and the arms and military service industries. The data represents trios built around a manually curated wordlist related to core research themes, including the military, gender, and nationality, together with evaluative adjectives.

The interface is currently restricted to research project members but will become publicly accessible as soon as possible.

4. OED Concept Interface

This interface is being used by lexicographers at the Oxford English Dictionary to explore divisions of senses and sub-senses in word definitions. It contains LCM trio data for EEBO-TCP. Users can search and rank trios around specific headwords under review by OED lexicographers. Trios contain all nouns, adjectives, and (non-modal) verbs occurring at least twice in EEBO-TCP.

Access to the interface is restricted to OED editors.

5. BBC Radio News Scripts

The BBC is exploring this data to inform new approaches to semantic discovery of natural language data in their archives. The data is currently restricted to research project members. It contains LCM trio data for 180,000 transcriptions of BBC radio news scripts from 1940 to 1990.