### 1. EEBO-TCP Trios

This data is used by the Early Modernity interface (see above). Each row represents one trio and contains 44 columns. The columns most likely to be of interest are:

- Column 1: the trio
- Column 2: lemma A
- Column 3: lemma B
- Column 4: lemma C
- Column 35: the frequency of the trio in EEBO-TCP
- Column 43: the MI score of the trio (cf. Mehl 2019)
- Column 44: the chi-square score of the trio

The other columns contain data on parts of speech and their co-occurrence, used to implement MI and chi-square with a grammatical baseline (cf. Mehl 2019).

**Note:** This dataset has several billion rows of data.

- You can download the data here (812 MB as a zip file; over 6 GB when uncompressed)
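
Because the uncompressed file is over 6 GB, it may be more practical to stream it than to load it whole. The following is a minimal sketch of how the columns of interest described above could be pulled out row by row. It assumes the file is tab-delimited, UTF-8 encoded, and has no header row, none of which is confirmed here; `TrioRow` and `iter_trios` are hypothetical names introduced for illustration.

```python
import csv
from typing import Iterable, Iterator, NamedTuple

# Column numbers in the description are 1-based; Python indexes are 0-based.
EXPECTED_COLUMNS = 44
TRIO, LEMMA_A, LEMMA_B, LEMMA_C = 0, 1, 2, 3
FREQUENCY, MI_SCORE, CHI_SQUARE = 34, 42, 43


class TrioRow(NamedTuple):
    trio: str
    lemma_a: str
    lemma_b: str
    lemma_c: str
    frequency: int
    mi: float
    chi_square: float


def iter_trios(lines: Iterable[str], delimiter: str = "\t") -> Iterator[TrioRow]:
    """Yield the columns of interest one row at a time."""
    # ASSUMPTION: tab-delimited, no header row, no quoted fields.
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if len(fields) != EXPECTED_COLUMNS:
            raise ValueError(
                f"expected {EXPECTED_COLUMNS} columns, got {len(fields)}"
            )
        yield TrioRow(
            trio=fields[TRIO],
            lemma_a=fields[LEMMA_A],
            lemma_b=fields[LEMMA_B],
            lemma_c=fields[LEMMA_C],
            frequency=int(fields[FREQUENCY]),  # ASSUMPTION: stored as an integer
            mi=float(fields[MI_SCORE]),
            chi_square=float(fields[CHI_SQUARE]),
        )
```

Any iterable of text lines works as input, so an open file handle (for example `open(path, encoding="utf-8", newline="")`) can be passed directly and the file is read lazily. If the real file uses a different delimiter or includes a header row, the `delimiter` argument and the first row would need adjusting.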

### 2. Newsbooks 1649

In this data, the first three columns are the three lemmas contained in the trio. Column 4 is the frequency of the trio in the dataset. Column 5 is a count: given that lemma B (column 2) appears within 50 tokens to the left or right of lemma A (column 1), it records how many other nouns also occur within 50 tokens to the left or right of lemma A. Column 6 is the Linguistic DNA project’s implementation of the MI score (cf. Mehl 2019).

The dataset contains 119,000 rows of trio data and 607 POS-tagged texts.

- You can download the data here (7.5 MB as a zip file)
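
As an illustration of the six-column layout described above, the sketch below reads the trio rows into small records and groups them by lemma A. It assumes a tab-delimited file with no header row and assumes that columns 4 and 5 are stored as integers; the names `NewsbookTrio`, `read_trios`, and `group_by_lemma_a` are hypothetical and not part of the dataset.

```python
import csv
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class NewsbookTrio:
    lemma_a: str      # column 1
    lemma_b: str      # column 2
    lemma_c: str      # column 3
    frequency: int    # column 4: frequency of the trio in the dataset
    other_nouns: int  # column 5: other nouns within 50 tokens of lemma A
    mi: float         # column 6: MI score


def read_trios(lines: Iterable[str], delimiter: str = "\t") -> List[NewsbookTrio]:
    """Parse every non-blank line into a NewsbookTrio record."""
    # ASSUMPTION: tab-delimited, no header row, no quoted fields.
    rows = []
    for fields in csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE):
        if not fields:
            continue  # skip blank lines
        a, b, c, freq, others, mi = fields[:6]
        rows.append(NewsbookTrio(a, b, c, int(freq), int(others), float(mi)))
    return rows


def group_by_lemma_a(rows: Iterable[NewsbookTrio]) -> Dict[str, List[NewsbookTrio]]:
    """Collect all trios that share the same lemma A."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.lemma_a].append(row)
    return dict(grouped)
```

At roughly 119,000 rows, the trio data should fit comfortably in memory, which is why this sketch builds a list rather than streaming.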

### 3. Militarisation 2.0

In this data, column 1 is the row number; column 2 is the frequency of the trio in the dataset; column 3 is lemma A; column 4 is lemma B; and column 5 is lemma C. The data is built on a curated list of lemmas related to the research themes, together with evaluative adjectives.

This data is currently unavailable for download, but should be available soon.

### 4. Ways of Being in the Digital Age

In this data, the first three columns are the three lemmas contained in the trio. Column 4 is the frequency of the trio in the dataset. Column 5 is a count: given that lemma B (column 2) appears within 50 tokens to the left or right of lemma A (column 1), it records how many other nouns also occur within 50 tokens to the left or right of lemma A. Column 6 is the Linguistic DNA project’s implementation of the MI score (cf. Mehl 2019). The data covers the 200 highest-ranked pair co-occurrences in the corpus, restricted to noun lemmas that occur at least 50 times in the whole dataset.

The dataset contains 62,000 rows of data.

- You can download the dataset here (12.1 MB as a zip file)
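
One possible way to explore this data is to rank trios by the MI score in column 6, optionally ignoring trios below a chosen frequency in column 4. The sketch below is only an illustration: it assumes a tab-delimited file with no header row, and `top_trios_by_mi` and its `min_frequency` parameter are hypothetical names, not part of the project's tooling.

```python
import csv
import heapq
from typing import Iterable, Iterator, List, Tuple

ScoredTrio = Tuple[float, str, str, str]  # (MI score, lemma A, lemma B, lemma C)


def top_trios_by_mi(
    lines: Iterable[str],
    n: int = 10,
    min_frequency: int = 1,
    delimiter: str = "\t",
) -> List[ScoredTrio]:
    """Return the n trios with the highest MI score, highest first."""

    def candidates() -> Iterator[ScoredTrio]:
        # ASSUMPTION: tab-delimited, no header row, no quoted fields.
        reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        for fields in reader:
            if not fields:
                continue  # skip blank lines
            a, b, c, freq, _others, mi = fields[:6]
            if int(freq) >= min_frequency:  # column 4: trio frequency
                yield (float(mi), a, b, c)  # column 6: MI score

    # nlargest keeps only n items in memory while scanning the rows.
    return heapq.nlargest(n, candidates(), key=lambda item: item[0])
```

Raising `min_frequency` is one simple way to keep very rare trios from dominating the ranking; the appropriate threshold depends on the research question.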