Tag Archives: LDNA

Talk About Change: LDNA at Festival of the Mind

Last weekend, Linguistic DNA & friends took over the Spiegeltent in Sheffield city centre, as part of the University’s Festival of the Mind. Spiegeltents are a Belgian invention: tents decorated internally with mirrors, creating the perfect space to share myriad reflections.

Over the course of two hours, we hosted a performance of new writing that emerged from collaboration with Our Mel (a Sheffield-based social enterprise dedicated to exploring cultural identity) and novelist Désirée Reynolds. Each of the pieces performed has also been published as part of a limited-edition anthology: “Talk About Change: Writing as Resistance”.

The Researchers’ Introduction outlines a little more of the process that culminated in some extraordinary writing (excerpted from the print anthology):

Talk About Change: Writing as Resistance

Funded by the University of Sheffield’s Festival of the Mind, our collaborative workshops used examples of early modern word use (from the Linguistic DNA project and related research) as a starting point to think about language use today. How can the past speak to the present? How might the present speak to the past?

As reflected in the structure of this anthology, the workshops explored four central themes: diversity, feminism, immigration and race. These were selected by Annalisa and Désirée, who also provided the extra focus on “writing as resistance”. In each case, the Linguistic DNA researchers sought to introduce historic material that might prompt conversation about the themes, and perhaps even fuel the resistance. Some input drew on prior research (especially for the feminism and immigration sessions, which drew on Iona’s thesis and engaged also with the 500 Reformations project). Just as often, it was a basic excursion into early modern material, with a beginners’ introduction to linguistics and studying meaning (courtesy of Seth).

The most inventive work happened when we brought this material into the open sessions.

Together with all who attended the workshops, we compared the role of diversity in historic texts to its position in modern culture: what once characterised a multiplicity of opinion is now used paradoxically of something individual. We considered aspects of feminist debate before the word feminism existed, exploring how the power of virtue changed as men (mostly) discussed the role of women in sixteenth-century England. Using texts about strangers, we examined parallels between the way people wrote (and complained) about early modern outsiders and modern discourse about immigrants. We reflected on the roots of race, its links to kinship, descent, and community, and the relationship between structures of language and structures of power.

In each session, novelist and creative-writing facilitator Désirée Reynolds recommended other writings to bring out different dimensions of the themes. Wide reading was encouraged, and what you will find in the pages that follow reflects the careful crafting of a range of experience and inspiration drawing on at least five centuries of language use.

It is Writing as Resistance.

It comes from Talking About Change.

If you would like a copy of the anthology (free!), you can register interest (first come, first served) by filling out a short Google form.

(You can also read some words from the Editor, over on the 500 Reformations website.)

Linguistic DNA at SRS 2018: Abstracts

Knowledge, truth and expertise: experiments with Early English Books Online

Wondering what Linguistic DNA is bringing to the Society for Renaissance Studies? Here are the abstracts for two panels of papers, and information about our hands-on demonstration session (drop in).

United by a common interest in data-driven approaches to meaning and a focus on the transcribed portions of Early English Books Online (EEBO-TCP), this interdisciplinary panel brings together new research from the Linguistic DNA project and the Cambridge Concept Lab.

What is EEBO anyway? Contextual study of a universe in print
Iona Hine and Susan Fitzmaurice (University of Sheffield)

Since 2015, the Linguistic DNA team has been developing methods for mapping meaning and change-in-meaning in Early Modern English. Our work begins with the hypothesis that meanings are not equivalent with words, and can be invoked in many different ways. For example, when Early Modern writers discuss processes of democracy, there is no guarantee they will also employ a keyword such as democracy. We adopt a data-driven approach, using measures of frequency and proximity to track associations between words in texts over time. Strong patterns of co-occurrence between words allow us to build groups of words that collectively represent meanings-in-context (textual and historical). We term these groups “discursive concepts”.
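As a rough illustration of the frequency-and-proximity idea described above, the following Python sketch counts how often pairs of words fall within a fixed window of each other. It is not the Linguistic DNA project's actual pipeline: the window size, the toy sentence and the function name `cooccurrence_counts` are all assumptions made for this example.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=5):
    """Count unordered word pairs that occur within `window` tokens of each other."""
    counts = Counter()
    for i, left in enumerate(tokens):
        # Look ahead only, so each pair of positions is counted once.
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts


# Toy sentence invented for illustration; it is not drawn from EEBO-TCP.
toy_text = "the people chose their rulers and the people made the laws"
pair_counts = cooccurrence_counts(toy_text.split(), window=3)
print(pair_counts.most_common(3))
```

Pairs with high counts across many texts would be candidates for grouping; a real analysis would also need to weigh those counts against each word's overall frequency.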

The task of modelling discursive concepts in textual data has been absorbing and challenging, both theoretically and practically. Our main dataset, transcriptions of texts from Early English Books Online (EEBO-TCP), contains more than 50 000 texts. These include 9000 single-page broadsheets and 162 volumes that span more than 1000 pages. There are 127 items printed pre-1500, and nearly 7000 from the 1690s. The process of analysis therefore requires us to think carefully about how best to control and report on this variation in data distribution.
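One common way to report on uneven data distribution (offered here as a sketch, not as the team's own method) is to convert raw counts into relative frequencies per million words for each decade. The decade totals and hit counts below are invented purely for illustration and are not EEBO-TCP figures; `per_million` is a hypothetical helper name.

```python
def per_million(hits, total_words):
    """Relative frequency: occurrences per million words in a subcorpus."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return hits * 1_000_000 / total_words


# Invented counts for illustration only; these are not EEBO-TCP figures.
toy_decades = {
    "1490s": {"hits": 12, "total_words": 400_000},
    "1690s": {"hits": 900, "total_words": 90_000_000},
}

normalised = {
    decade: per_million(v["hits"], v["total_words"])
    for decade, v in toy_decades.items()
}
print(normalised)
```

In this toy case the later decade has far more raw hits but a lower relative frequency, which is the kind of effect that raw counts alone would hide.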

One particular question that has arisen affects all who attempt to use EEBO: what is in it? To what extent is its material from pre-1500 similar in kind (genre, immediacy, etc.) to that of the messy 1550s (as the English throne shifted speedily between Edward VI and his siblings), the 1610s (era of Shakespeare and the King James Version), or the 1640s (when Civil War raged)? This paper is a sustained reflection on attempts to find out “What’s in EEBO?”

In the beginning was the word?
EEBO-TCP and another universe of meaning
Seth Mehl (University of Sheffield)

When a new idea is conceived, how does it find expression in language? Between 1450 and 1750, the English lexicon expanded dramatically, and literary scholars, philologists, linguists, and historians have sought to document and demonstrate the paths taken by key social and cultural vocabulary, charting the history of what would become key social and cultural ideas, discourses, and concepts. In such cases, the topic and language for investigation has been intuited on the basis of extended qualitative reading, and the objects of investigation tend to be individual words. With the advent of a searchable database of early modern texts, such intuitions can be tested at scale, and the initial object of inquiry can shift from individual words to relationships between sets of words.

What happens when we invert the traditional process, taking the thousands of texts digitised in EEBO-TCP and applying computational techniques to model language change independent of human intuition? Can such techniques indicate meaningful relationships between key words that human researchers had not intuited or observed? To what extent do observations founded on over 1 billion words of early modern English correspond to and diverge from what scholarly readers have already inferred? Is it possible to identify discourses around key ideas even when the apparently related key words are absent? Combining insights from the Keywords Project with tools developed by the Linguistic DNA project, this paper will explore how concept modelling can be applied to re-examine meaning in early modern texts.

Beyond Power Steering:
re-constituting structures of knowledge in 17th-century texts
John Regan (University of Cambridge)

One of the axioms of the Cambridge Concept Lab is that digital means of enquiry should provide qualitatively new kinds of knowledge, if we are to realise their full value. This is to say, that computation should not merely provide ‘power steering for the humanities’, but allow one to discover something different in kind about how knowledge was structured in the past.

Making good on this axiom necessitates judgements on the part of the user of digital technology about how to design one’s modes of address to (for example) natural language data sets such as Early English Books Online-TCP, in order that one is not only adding ‘power steering’ to existing, familiar types of enquiry. It also necessitates making decisions about when to come to rest at results (that is, when to cease enquiry); judgements of where digital data can be said to be producing discrete and unfamiliar forms of knowledge.

This paper will present tentative first signs of what the Cambridge Concept Lab believe are historically-discrete conceptual structures, based on data from the early seventeenth-century portion of EEBO-TCP. Two such structures will be described, one entitled ‘Mutual Dependence’, the other ‘Self-Consistency’. As will be shown, familiar forms of knowledge that are held and expressed in sentences and paragraphs, organised by grammar and understood by readers largely as explicit sense, may be contrasted with this evidence of qualitatively different conceptual structures in the textual record. While this paper does not set out to debunk existing theories of the structuration of knowledge and its transmission in the seventeenth century as have become established through centuries of close reading, it does seek to enrich our understanding of these traditions by attending to conceptual, and not exclusively semantic, thematic or rhetorical, structures.

It appears uncontroversial to assert that concepts are determining with regard to features of language use such as explicit and implicit semantic fields, theme, word order, and syntactic relations at the level of the sentence. Nevertheless, recognising that concepts have lexical and semantic extension is not the same as accepting that the two are identical in kind. This paper’s claims about conceptual structure will be based upon evidence from the early decades of seventeenth-century data from EEBO-TCP.

Our afternoon panel is a little depleted (by ill-health) but features Jose M. Cree (Sheffield) on Neologisms and the English reformation, Lucas van der Deijl (Amsterdam) on The collaborative Dutch translations of Descartes by Jan Hendrik Glazemaker (1620-1682), and a little extra time for discussion.

DROP-IN SESSION

All SRS delegates are very welcome to drop in to our demo workshop, where we will be providing a 10-15-minute introduction to our tools (3:30pm, repeated at 4:30pm) and the opportunity for hands-on experimentation. This is in the Hicks Building, Floor G, room 29. (About 2 minutes’ walk from Jessop West, across the main road and a little uphill.)

Snapshot from campus map, featuring the Hicks Building.

Translation, Gender, Sexuality: a report from Genealogies of Knowledge 2017

In December 2017, Sheffield MA student Nathaniel Dziura attended part of the Genealogies of Knowledge conference in Manchester. While the LDNA team were exchanging conceptual insights with other data-driven scholars, Nathaniel participated in sessions connected to a different field of interest. He writes:

As a member of the LGBTQ+ community, I am keen to contribute to research on how social factors impact language use, particularly gender and sexuality. As a second-generation Polish immigrant, raised with influence from both Polish and English culture, I am also very interested in the effect cultural background can have on the production of linguistic features.

Next year, I hope to start a PhD focused on this interplay between social and linguistic elements. Schumann (1986) suggested that the degree of ‘acculturation’ influences use of non-standard variants in second language learners. In other words, if the speaker is more immersed in the culture of their second language, they will be more likely to acquire native speaker-like linguistic variation. However, previous studies have not considered how other social factors such as sexuality might affect which features are acquired. This is despite earlier work showing certain linguistic features to be cross-culturally associated with LGBTQ+ membership. These features include fronted-/s/ (Levon, 2006; Pharao et al., 2014) – colloquially stereotyped as the ‘gay lisp’ – and creaky-voice (Zimman, 2013: 3) – speaking with a low elongated ‘creak’, like a stereotypical ‘valley girl’. LGBTQ+ people do not inherently use these features, but they can play an important part in interaction (Barrett, 2017: 9).

I want to help fill this gap in the research by investigating how sexuality might affect the linguistic variants acquired in English by second language speakers (specifically, Polish migrants to England). I will examine whether the use of these features differs depending on two variables: the level of integration into British culture, and the level of involvement with the LGBTQ+ community.

This was the project I had in mind as I headed to Manchester for the conference. I was rewarded by an excellent thematic session on ‘Translation, Gender, Sexuality’.

I found Przemysław Uściński and Agnieszka Pantuchowicz’s presentations to be pertinent and insightful. Uściński’s talk focused on the pitfalls of approaching Queer Theory in Poland from a ‘Western perspective’. The political environments in Poland and England have differed historically, and continue to do so. Uściński argues that ‘LGBT emancipation’ has not yet occurred in Poland. Critical theorisations of gender are intentionally scarce in Polish academic discourse. The reception of Queer Theory in academia has been comparatively belated, and has sometimes discredited the LGBTQ+ movement. British society has its share of problems with LGBTQ+-phobia, yet Poland has seen much far-right and religious rejection of the LGBTQ+ community. These groups have dismissed LGBTQ+ identities as ‘Western secular propaganda’ and ‘gender ideology’. So, English translations of concepts within Queer Theory, which are gradually being introduced to Polish academic works, reflect English notions and societal progress. Even when concepts from Queer Theory enter Polish, there is no possibility of their dissemination within Polish society. Queer Theory tends to be viewed as a ‘foreign’ and subversive concept: a theoretical importation into Polish from English, and not one congruous with Polish culture.

In another paper, Pauline Henry-Tierney noted that misinterpretations in translation of Beauvoir’s ‘Mauvaise Foi’ have slowed academic progress on the subject. Taking this into account, perhaps misinterpretations of Queer Theory as a ‘foreign’ concept to Poland are hindering the normalisation of LGBTQ+ concepts and perpetuating their perception as something radical and provocative.

This thematic session highlighted that introducing concepts into a language through translation can be a step towards spreading those ideas within another culture. However, this alone might not be enough to achieve society’s understanding and acceptance of those concepts. The translation of Queer Theory between cultures was not an issue I had previously considered. This thematic session reinforced that the political and social environments in Polish and English culture exhibit stark differences. This is significant within the framework of acculturation: LGBTQ+ community membership is arguably more accepted in British culture, and consequently so are associated non-standard language features. So one might predict that LGBTQ+ Polish migrants to England who become more British-acculturated are more likely to produce non-standard features associated with LGBTQ+-community membership than those who are less British-acculturated.

Overall, I was able to interact with academics from areas such as translation studies and politics with whom I would not otherwise be able to network. I am very grateful to the Linguistic DNA team for inviting me to attend the conference. The insights it has given me will be useful in my academic pursuits!

Featured image:
Jaap Verheul (Utrecht) presents an example from ShiCo research at the Genealogies of Knowledge conference, 8 December. Photo (c) I.C. Hine.

References:

Barrett, R. (2017) From Drag Queens to Leathermen: Language, Gender, and Gay Male Subcultures (Studies in Language Gender and Sexuality) Oxford: Oxford University Press

Henry-Tierney, P. (2017) ‘Translating in ‘Bad Faith’? Articulations of Beauvoir’s ‘Mauvaise Foi’ in English’, Genealogies of Knowledge I: Translating Political and Scientific Thought across Time and Space, Manchester: University of Manchester

Levon, E. (2006) ‘Hearing “gay”: Prosody, interpretation, and the affective judgments of men’s speech’ American Speech 81 (1): 56–78

Pantuchowicz, A. (2017) ‘Translation and the Failure of Gender Mainstreaming in Poland’ Genealogies of Knowledge I: Translating Political and Scientific Thought across Time and Space, Manchester: University of Manchester

Pharao, N., M. Maegaard, J. S. Møller & T. Kristiansen (2014) ‘Indexical meanings of [s] among Copenhagen youth: Social perception of a phonetic variant in different prosodic contexts’ Language in Society 43, 1–31

Schumann, J. H. (1986) ‘Research on the acculturation model for second language acquisition’ Journal of Multilingual and Multicultural Development 7: 379–392

Uściński, P. (2017) ‘Thinking Sexuality/Translating Politics: Queerness in(to) Polish’ Genealogies of Knowledge I: Translating Political and Scientific Thought across Time and Space, Manchester: University of Manchester

Zimman, L. (2013) ‘Hegemonic masculinity and the variability of gay-sounding speech: The perceived sexuality of transgender men’ Journal of Language & Sexuality 2 (1): 1-39

Seth and Iona present a joint paper with LDNA data at Genealogies of Knowledge. Photos (c) Jaap Verheul.

Quantity and quality: lessons from an MA work placement

Sheffield MA student Nadia Filippi reflects on her experience after 100 hours with the Linguistic DNA team at DHI | Sheffield:

As part of my MA studies in English Language and Linguistics, I had the opportunity to undertake a work placement of 100 hours at the University of Sheffield’s Digital Humanities Institute. The placement offered a good overview of the typical tasks and responsibilities of a researcher and was an excellent choice for me because I am interested in doing research and I am considering going on to PhD research.

When registering for the placement module, I only had basic knowledge of corpus linguistics. I was accustomed to qualitative research but wanted to discover quantitative methodologies and the possibilities that quantitative research can offer. Starting my placement, I was at a stage in my studies in which I was still looking for definite answers to all my questions about research. Moreover, I respected everything to do with numbers, but the idea of actually ‘doing statistics’ made me nervous. I consciously chose a placement to force myself out of my qualitative comfort zone.

My concerns resolved themselves during the placement. I had to familiarise myself with and use statistical software packages like SPSS, and I lost my initial fear. I began to understand how statistics could be used effectively to discuss questions and find information that qualitative research could not provide in a timely manner, for example finding out which words frequently co-occur in a large dataset. Furthermore, I came to understand that doing research does not exclusively mean focusing narrowly on finding a clear answer to an initial research question. It is often more about refining the question, developing another one and accepting that there can be more than one right answer to it.

The power of the Digital Humanities Institute lies in quantitative analysis, engaging with statistical distribution, auditing datasets and computational methods. Yet, there is still qualitative work to do. For instance, I audited and reported on qualities of the YouTube dataset, wrote summaries of previous research and searched for suitable approaches or tools (e.g. a Part-Of-Speech tagger suited to social media data), by consulting published research from similar projects.

A YouTube Convert

It turned out that the placement as a whole, the experiences I had and the tasks I was given shaped my other studies. At the beginning of my placement, the Linguistic DNA team had just started providing support for the Militarization 2.0 project, in collaboration with the University of Leeds. I was immediately drawn in by this study of YouTube gaming discussion and it ultimately gave me an idea for my MA dissertation.

I had the chance to look through some of the 6.7m YouTube comments gathered by Nick Robinson and his team at the University of Leeds, and think through how they might be analysed for concept modelling.

Screenshot showing comments on Battlefield 1 official trailer, via YouTube (15 May 2017). https://www.youtube.com/watch?v=c7nRTF2SowQ

In exploring the comments, I had to consider the characteristics of commenters’ language and reflect on the research questions. Gaming language, for example, is filled with specialist abbreviations such as “CoD:ww2”, which stands for the game Call of Duty: WWII. Information about nationalities (“the Germans”) and militarised language (“disabled”, “destroyed”) may also be key to answering questions about how users’ remarks connect with video content. Close reading of excerpts helps to inform how the Sheffield team respond to the main interests of the mother project Militarization 2.0: whether and how social media is militarized, and what effect that has on our society and on individual citizens.

By attending meetings, I gained insights into the process and decision-making in a big research project. This included, for example:

preparing big data (should we standardise the spelling of the comments or not?)
practical obstacles, such as YouTube’s technical limitations (which prevent us from retrieving all the replies to a specific comment)
deciding which variables to include (time, author, number of likes)
time and scope (how can the resources available be matched to the aims and desired outcomes of a project?)
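To make the decisions listed above a little more concrete, here is a minimal Python sketch of how one comment and its variables might be represented, with a deliberately light clean-up step. The field names, the `Comment` class and the `normalise_text` helper are hypothetical and are not the project's actual schema; the sample comment is invented.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Comment:
    """One comment record; field names are hypothetical, not the project's schema."""
    author: str
    posted_at: datetime
    likes: int
    text: str


def normalise_text(text):
    """Lower-case the text and collapse runs of whitespace.

    Spelling is deliberately left alone, so gaming abbreviations
    are not 'corrected' away.
    """
    return " ".join(text.lower().split())


# Invented example comment, for illustration only.
example = Comment(
    author="user123",
    posted_at=datetime(2017, 5, 15, 12, 0),
    likes=4,
    text="  The   Germans destroyed   the tank ",
)
cleaned = normalise_text(example.text)
print(cleaned)
```

Keeping the original text alongside any cleaned version means the choice about standardising spelling can be revisited later without re-collecting the data.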

Knowing the kinds of challenges that such a project can face was helpful in planning my dissertation, which I will be writing over the summer. Prompted by the DHI’s YouTube work, my research will discuss the kind of language generated by exposure to military video game trailers and investigate if there is a difference between the language produced online and offline. In undertaking this research, I will work with my own corpus of YouTube comments as well as with focus groups. The qualitative aspect of my dissertation will allow me to explicitly address and discuss the violence in these game trailers within my focus groups.

Overall, the work placement has been one of the most valuable and enjoyable modules of my MA. I developed many new skills, academically as well as personally. I am more confident about quantitative approaches and numbers, as well as the importance of humanities research as a whole.

Top image shows Sheffield MA student Nadia Filippi at the Linguistic DNA and Militarization 2.0 stand at the 2017 Festival of Arts & Humanities Showcase, Sheffield. The showcase was “a fantastic opportunity to open a dialogue about humanities research and its impact with the public”.

Showcasing Linguistic DNA

On Saturday 11 March (2017), some of the LDNA team took part in a Showcase as part of the University of Sheffield’s Festival of the Arts and Humanities. The event took place at Sheffield’s Millennium Gallery, allowing members of the public to discover different aspects of humanities research presented through exhibitions, activities and short presentations. Visitors found information about literature or archaeological findings, and could try out different instruments or take an implicit bias test brought in by the Philosophy Department. We asked Sheffield postgraduates Nadia and Winnie to reflect on their experience preparing for and staffing a stall as part of their MA work placements.

Winnie writes:

I prepared a handout using data from Ways of Being in a Digital Age (WOBDA). The process—zooming from abstract trios extracted from a dataset to see the patterns they made in a small extract of text—was fascinating. At first I was worried about how well the concept would translate to a non-specialist audience, but then I realised that involves negative preconceptions about what a non-specialist audience is: somehow less interested or capable of critical engagement than a specialist one. I therefore decided not to “aim” anything “at” anyone, but instead tried to summarise trios in a way that made the most sense to me as a newcomer to Linguistic DNA’s methods. I chose a single pair (internet + craving), made up a colour-coded table of the items that formed trios with it, and then put this alongside highlighted examples of trios in a journal abstract.

Snapshot of a table showing associations with 'internet' and 'craving', with example text from a social sciences journal.

Illustrating patterns of association with “internet” + “craving” (from Winnie’s handout).

This turned out to be really useful because people were interested in the project from all kinds of angles, some of which changed how I thought about what LDNA does. Fiddling with data on the placement meant I’d got sidetracked in a sense into thinking about WOBDA as a technical exercise, but the Showcase helped me see the bigger picture. Visitors were intrigued by Linguistic DNA as a name; one person was interested in whether the project was making any claims about genetic hard-wiring. Another, an IT professional, was interested in the double helix visualisation on the website, and said it would make him think about his own designs. I particularly remember a conversation with an artist who was interested in researching discourses around disability. We talked about how to query corpora, which tools were available and easy to use, the advantages and disadvantages of the BNC versus the web as corpus, how the age of the BNC might affect the language it contained, and the difference between collocations and discourse concepts as shown in WOBDA. She was also interested in word clouds; the idea of extracting implicit relationships in language and making them visible seemed to be something that appealed strongly to both adults and children who stopped at the stall.

Nadia writes:

Beyoncé and crew salute military-style. Photo by Asterio Tecson.

Beyoncé Knowles performing in Central Park, July 2011. Image copyright (c) Asterio Tecson; used under Creative Commons license 2.0.

To prepare for this event, I mostly focused on the YouTube data. I prepared an informative and colourful poster with prominent examples, including images of Beyoncé (left) and the video game World of Tanks, to attract visitors and to suggest that we conduct contemporary research. I also searched our data for some prominently occurring words.

Individual associations at the LDNA stand, courtesy of @ShefEnglish on Twitter.

Because we do not yet have representative results for the Militarization 2.0 work, I often pointed to the Linguistic DNA research as the mother project of the YouTube project. The examples proved very useful since they complemented the information given on the posters. The audience was provided with representative examples from the Linguistic DNA project for them to look at and take away. Moreover, people had the chance to play with word cards and group them together according to their own individual word associations (right).

I observed that many people grouped together words with a similar meaning (such as ‘succeed’ and ‘win’), whereas others clustered together words according to very personal associations. An 8-year-old girl was fascinated by the cards, pairing ‘victory’ and ‘win’; we looked at how these words appear in our given examples and the advantages of having a computer that counts words as lemmas. One visitor told us about his aphasia and how it changed and affected his use of language, which made me realise that alongside the Linguistic DNA we are researching, every person has his or her own, very personal linguistic DNA. Another visitor was inspired by the YouTube project and connected language use to social issues, such as the omnipresence of on- and offline violence, providing food for thought for all participants in the conversation.

From my point of view, the event was a success. Many visitors seized the opportunity to have a chat with us, which led to various stimulating encounters and conversations. It was intriguing to see that numerous people were willing to share personal stories and views on language and its importance. The public seemed to engage and identify with our project on many different levels, which confirms how important this kind of research is—not only for the academic community but also for the public.

Also participating in the Showcase were LDNA Research Associates Seth Mehl (below left), who delivered a bitesize talk asking “What can computers teach us about meaning in early English books?”, and Iona Hine (right, during her bitesize talk about “Luther’s Language”).

(Photos courtesy of @DHIShef and D. Clark.)