Tag Archives: optical character recognition

Chart showing frequency of stem "transl-" in ECCO OCR as % of TCP.

Experimenting with the imperfect: ECCO & OCR

When the Linguistic DNA project was first conceived, we aimed to incorporate more than 200 000 items from Eighteenth Century Collections Online (ECCO). Comparing findings for one portion of ECCO that has been digitised in different ways, this 2016 blogpost details why that ambition proved impractical. The public database uses ECCO-TCP as its main eighteenth-century source.