Our current project uses the LDNA (Linguistic DNA) demonstrator to investigate historical texts, aiming to uncover how ideas evolved in Early Modern English. We analyse ‘concept constellations’: clusters of quads, where a quad is a set of four frequently co-occurring lemmas that includes a central node lemma. These constellations give us insight into how groups of words represented broader themes and how concepts changed over time.
Understanding Quads and Lemmas in Concept Constellations
Two essential building blocks in our analysis are ‘quads’ and ‘lemmas.’ Lemmas are the base (dictionary) forms of words, under which inflected forms are grouped. Quads are sets of four lemmas that frequently co-occur within 100 words of one another in a text, suggesting a thematic or conceptual link. Examining the relationships between quads and lemmas enables us to map out how key ideas, centring on lemmas such as ‘church,’ ‘heaven,’ and ‘earth,’ were expressed and discussed across historical texts.
As an example, we’re using the concept constellation centred on the node lemma ‘church’:
- The constellation for ‘church’ is notably large, comprising 276 unique quads, each combining ‘church’ with three other lemmas.
- Altogether, this constellation includes 263 distinct lemmas that form various quads around the node lemma ‘church.’
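The two counts above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy quads are invented and are not LDNA output, the function names are hypothetical, and we assume a quad is an unordered set, so the same four lemmas in a different order count once. The sketch excludes the node lemma from the lemma count; whether the reported figure of 263 does the same is not stated here.

```python
from typing import FrozenSet, Iterable, Set

# A quad is modelled as an unordered set of four lemmas (an assumption).
Quad = FrozenSet[str]


def make_quad(node: str, *others: str) -> Quad:
    """Build a quad from a node lemma and three other lemmas."""
    quad = frozenset((node, *others))
    if len(quad) != 4:
        raise ValueError("a quad needs exactly four distinct lemmas")
    return quad


def constellation_size(quads: Iterable[Quad]) -> int:
    """Number of unique quads in a constellation."""
    return len(set(quads))


def distinct_lemmas(quads: Iterable[Quad], node: str) -> Set[str]:
    """All lemmas that appear in any quad, excluding the node lemma."""
    lemmas: Set[str] = set()
    for quad in quads:
        lemmas |= quad
    lemmas.discard(node)
    return lemmas


# Invented toy data for illustration only, not taken from the LDNA demonstrator.
toy_constellation = [
    make_quad("church", "bishop", "faith", "rome"),
    make_quad("church", "bishop", "faith", "england"),
    make_quad("church", "faith", "rome", "bishop"),  # same quad as the first
]

print(constellation_size(toy_constellation))             # 2
print(sorted(distinct_lemmas(toy_constellation, "church")))
# ['bishop', 'england', 'faith', 'rome']
```

Storing quads as frozen sets makes deduplication automatic, which matters when the same combination of lemmas can be reported in more than one order.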
Calculating Metrics of Quads and Lemmas
We’re employing several metrics to analyse patterns within these constellations. By measuring frequency, co-occurrence, and variability, we can identify concepts that were stable or prominent in historical discourse and others that were more complex and diverse. This analytical approach lays the groundwork for visualising these relationships and coding them against established historical linguistic frameworks.
In this study, we’ve reviewed 34 constellations of quads generated by the LDNA demonstrator. The metrics highlight:
- The size of each constellation and the number of lemmas within it.
- The density and complexity of each constellation, which indicate both its conceptual richness and how closely interconnected certain ideas were.
The metrics provide insights, such as:
- The constellation for ‘church’ is not only large but also contains the most varied set of lemmas, suggesting that this concept is particularly complex or was discussed in a wide range of contexts.
- In contrast, while the constellation for ‘king’ is substantial (consisting of 116 quads), it contains only 27 unique lemmas. This may suggest a simpler or less varied concept, with a more consistent discourse surrounding it.
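One simple way to make this contrast concrete is to divide the number of distinct lemmas by the number of quads. The sketch below does this with the figures reported above; the ratio and the name `lemmas_per_quad` are our own illustrative assumptions, not necessarily one of the metrics the project itself uses.

```python
# Figures reported in this post: node lemma -> (unique quads, distinct lemmas).
reported = {
    "church": (276, 263),
    "king": (116, 27),
}


def lemmas_per_quad(n_quads: int, n_lemmas: int) -> float:
    """Distinct lemmas per quad: a rough, hypothetical indicator of lexical variety."""
    if n_quads <= 0:
        raise ValueError("a constellation needs at least one quad")
    return n_lemmas / n_quads


for node, (n_quads, n_lemmas) in reported.items():
    print(f"{node}: {lemmas_per_quad(n_quads, n_lemmas):.2f} lemmas per quad")
# church: 0.95 lemmas per quad
# king: 0.23 lemmas per quad
```

On this rough measure, ‘church’ introduces almost one new lemma for every quad, whereas ‘king’ keeps recombining a small set of lemmas, which is consistent with the reading given above.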
In the next blog entry, we’ll move beyond these metrics to explain how visualisations and thematic coding further enhance our understanding of concept patterns. In doing so, we’ll continue building a fuller picture of these historical constellations, bringing the past to life in new ways.