Concept Modelling Interfaces

1. Concept Modelling Demonstrator (EEBO-TCP)

On this site you will find concept models built from 1,000 of the most frequently occurring nouns, adjectives, and verbs in EEBO-TCP. These words serve as the first word of each quad in the search interface. The second, third, and fourth words of each quad are restricted to nouns, adjectives, and verbs that occur at least 5,000 times in EEBO-TCP, with some high-frequency words systematically excluded. They must also occur within 50 tokens to the left or right of the first word. We exclude quads that do not pass a Pearson’s Chi-Square test threshold of 2.706 (p<0.05).
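The windowing and significance steps described above can be sketched in Python. This is a minimal illustration only, not the project's implementation: the function names are hypothetical, the chi-square statistic is shown for a simple 2x2 table of pairwise counts, and how the test is applied to full quads is not stated here. The window size (50) and the threshold (2.706) are taken from the description above.

```python
from collections import Counter

WINDOW = 50          # tokens to the left or right, as described above
THRESHOLD = 2.706    # chi-square cut-off, as described above


def cooccurrence_counts(tokens, node, window=WINDOW):
    """Count every token found within `window` positions of each
    occurrence of `node` (hypothetical helper)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def pearson_chi_square(a, b, c, d):
    """Pearson's chi-square for the 2x2 table [[a, b], [c, d]],
    without continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def passes_threshold(a, b, c, d, threshold=THRESHOLD):
    """True if the table's statistic reaches the cut-off."""
    return pearson_chi_square(a, b, c, d) >= threshold
```

In the 2x2 table, `a` would be the number of windows containing both words, `b` and `c` the windows containing only one of them, and `d` the windows containing neither. That layout is an assumption made for this sketch.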

Please note: The interface is currently being updated, with new concept models added each day.

2. Ways of Being in the Digital Age

The WoBDA interface presents LCM trio data for 1,900 social science journal articles from 1968 to 2017. Users can search pair or trio data for all articles, or for curated sub-collections representing the topical categories ‘Data’, ‘Citizenship’, ‘Economy’, ‘Communication’, ‘Governance’, ‘Communities’, and ‘Health’. The data represents trios for the 200 highest-ranked pair co-occurrences, containing noun lemmas that occur at least 50 times in the whole dataset.

The interface queries 62,000 rows of data.
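The selection of pairs and trios described in this section can be sketched as follows. This is a hedged illustration under stated assumptions, not the WoBDA pipeline: pairs are ranked here by the number of articles in which both lemmas appear, and a trio is kept when it contains at least one top-ranked pair. Neither rule is specified in the description above, and all function names are hypothetical. Only the thresholds (50 occurrences, 200 pairs) come from the text.

```python
from collections import Counter
from itertools import combinations


def frequent_lemmas(docs, min_freq=50):
    """Lemmas occurring at least `min_freq` times across all articles.
    `docs` is a list of lists of noun lemmas, one list per article."""
    freq = Counter(lemma for doc in docs for lemma in doc)
    return {lemma for lemma, n in freq.items() if n >= min_freq}


def top_pairs(docs, keep, top_n=200):
    """Rank lemma pairs by the number of articles containing both
    (assumed ranking) and return the `top_n` highest."""
    pair_counts = Counter()
    for doc in docs:
        vocab = sorted(set(doc) & keep)
        pair_counts.update(combinations(vocab, 2))
    return [pair for pair, _ in pair_counts.most_common(top_n)]


def trios_from_pairs(docs, pairs, keep):
    """Count trios that contain at least one top-ranked pair
    (assumed rule for extending pairs to trios)."""
    pair_set = set(pairs)
    trio_counts = Counter()
    for doc in docs:
        vocab = sorted(set(doc) & keep)
        for trio in combinations(vocab, 3):
            if any(p in pair_set for p in combinations(trio, 2)):
                trio_counts[trio] += 1
    return trio_counts
```

Sorting each article's vocabulary before forming combinations keeps every pair and trio in one canonical order, so the same set of lemmas is never counted under two different tuples.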

3. Militarisation 2.0

This interface provides access to LCM trio data for approximately 1 billion YouTube comments, across tens of thousands of threads, relating to military video games and the arms and military service industries. The data represents trios around a manually curated wordlist covering core research themes (the military, gender, and nationality) as well as evaluative adjectives.

The interface is currently restricted to research project members but will become publicly accessible as soon as possible.

4. OED Concept Interface

This interface is used by lexicographers at the Oxford English Dictionary to explore divisions of senses and sub-senses in word definitions. It contains LCM trio data for EEBO-TCP. Users can search and rank trios around specific headwords under review by OED lexicographers. Trios contain all nouns, adjectives, and (non-modal) verbs occurring at least twice in EEBO-TCP.

Access to the interface is restricted to OED editors.

5. BBC Radio News Scripts

This interface contains LCM trio data for 180,000 transcriptions of BBC radio news scripts from 1940 to 1990. The BBC is exploring this data to inform new approaches to semantic discovery of natural language data in their archives. The data is currently restricted to research project members.