Wikipedia and Academic Libraries

Gavin Willshaw

with Wikimedia, led to the formation of a Wikimedia Community of Interest, and resulted in the embedding of Wikimedia activity in staff work.

Keywords

Wikisource, Crowdsourcing, Scottish chapbooks, National Library of Scotland, Staff engagement, Digital skills.


Introduction

Like many cultural heritage organizations, the National Library of Scotland faces a significant challenge when digitizing texts: how to efficiently generate accurate transcriptions that meet users’ needs, not only for search and retrieval but also for computational analysis using text and data mining (Europeana Pro, 2019). The Library runs typed and printed text through OCR software to generate transcriptions automatically and makes these available online alongside digital images on its Digital Gallery (National Library of Scotland, 2020a). Unfortunately, these transcriptions often contain spelling mistakes and other errors, because the software struggles with issues such as faint text, hyphenation, and archaic letters including the long-s (ſ) (Alex, 2012). Such errors require human intervention to correct, but the Library lacks the staff resources to undertake this work.

One option the Library has been interested in exploring is whether corrections could be crowdsourced using Wikisource, Wikimedia’s online library of out-of-copyright digitized books. When a book is added to Wikisource, a community of thousands of editors works together to improve its transcription using the platform’s in-built error correction module and then publishes the book on Wikisource (“Wikisource,” 2020). Recent developments in functionality mean that completed books can be exported not only in PDF or ePUB format but also as TXT files (“Wikisource:WSexport,” 2020). The Library wanted to explore whether out-of-copyright digitized books from its collections could be uploaded to Wikisource, where their transcriptions would be improved in collaboration with the Wikisource community and then reimported into the Library’s image repository to improve the quality of search on the Digital Gallery.
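To illustrate why uncorrected OCR output undermines search, the sketch below compares a naive exact-match word index built from OCR text with one built from a corrected transcription. It is a minimal illustration under stated assumptions: the sample sentence is invented (not taken from the Library's chapbooks), the OCR errors shown (a long-s misread as "f" and a word split by a line-end hyphen) are typical examples of the problems described above, and the functions `index_terms` and `find_misses` are hypothetical helpers, not part of the Digital Gallery or any Wikisource tool.

```python
import re


def index_terms(text: str) -> set[str]:
    """Return the lowercase word tokens a naive exact-match index would store.

    Hyphens and line breaks are not word characters, so a word hyphenated
    across a line end is indexed as two separate fragments.
    """
    return set(re.findall(r"\w+", text.lower()))


def find_misses(query_terms: list[str], ocr_text: str, corrected_text: str) -> list[str]:
    """Return query terms found in the corrected text but missing from the OCR text."""
    ocr_index = index_terms(ocr_text)
    corrected_index = index_terms(corrected_text)
    return [t for t in query_terms if t in corrected_index and t not in ocr_index]


# Hypothetical example: the long-s in "sold" and "streets" was misread as "f",
# and "Edinburgh" was hyphenated across a line break.
ocr_text = "The ballad was fold in the ftreets of Edin-\nburgh."
corrected_text = "The ballad was sold in the streets of Edinburgh."

print(find_misses(["sold", "streets", "edinburgh", "ballad"], ocr_text, corrected_text))
```

Running this prints `['sold', 'streets', 'edinburgh']`: every term affected by an OCR error is invisible to exact-match search until the transcription is corrected, while the correctly recognized "ballad" is found in both versions.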