Page:Crowdsourcing and Open Access.djvu/26

616
SANTA CLARA COMPUTER & HIGH TECH. L.J.
[Vol. 26

volume as a whole—such as the title, author, publisher, year of publication, and possibly a table of contents.[1] The volume’s Index page also includes links to each individual page contained within the volume. Each page link is color-coded using a standard schema that applies site-wide and reflects, in essence, the level of confidence of the project’s users that the text reproduced at that link accurately reflects the content of the corresponding scanned page. Thus, the Index page reveals at a glance how much progress the site’s users have made towards finalizing the proofreading and correction of the work. The color codes used on the site are:

  • Red (“Not Proofread”): Signifies that the linked page contains text, but no user of the site has checked the text for accuracy. This color code is typically applied where the text included on the linked page consists entirely of the raw output of OCR software.[2]
  • Yellow (“Proofread”): Signifies that one user of the site has proofread and corrected the linked text so that it matches the content and formatting of the corresponding scanned page image.[3]
  • Green (“Validated”): Signifies that two or more users of the site have proofread and corrected the text of the linked page. This is the highest rating of page quality available on Wikisource.[4]
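
To make the three-tier scheme concrete, the following is a minimal illustrative sketch in Python; it is not Wikisource's actual software. The names `STATUS_COLORS`, `status_for`, and `progress_shares` are hypothetical and introduced only for illustration. The sketch assumes that a page's status depends solely on how many distinct users have proofread it, as the descriptions above suggest. The sample counts are the English Wikisource figures reported in note 4.

```python
# Illustrative sketch only; all names are hypothetical, not Wikisource code.

# Color shown on the Index page for each page-quality status.
STATUS_COLORS = {
    "Not Proofread": "red",
    "Proofread": "yellow",
    "Validated": "green",
}


def status_for(proofreaders: int) -> str:
    """Map the number of distinct users who proofread a page to a status.

    Assumption: zero users means raw, unchecked text; one user means
    Proofread; two or more means Validated.
    """
    if proofreaders < 0:
        raise ValueError("proofreaders must be non-negative")
    if proofreaders == 0:
        return "Not Proofread"
    if proofreaders == 1:
        return "Proofread"
    return "Validated"


def progress_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each status's share of all counted pages (0.0 to 1.0)."""
    total = sum(counts.values())
    if total == 0:
        return {status: 0.0 for status in counts}
    return {status: n / total for status, n in counts.items()}


# Page counts reported in note 4 for the English Wikisource.
note_4_counts = {
    "Not Proofread": 252_667,
    "Proofread": 36_582,
    "Validated": 15_190,
}

shares = progress_shares(note_4_counts)
for status, share in shares.items():
    print(f"{STATUS_COLORS[status]:>6}  {status:<13}  {share:.1%}")
```

Run on the note 4 figures, the sketch shows that roughly 83 percent of the counted pages were Not Proofread, about 12 percent were Proofread, and about 5 percent were Validated, which is the kind of at-a-glance progress summary the color-coded Index page is meant to convey.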

In addition, there are three further color codes used on the site that convey additional information about the status of the corresponding linked page:

  • Purple (“Problematic”): Signifies that the text on the linked page does not match the scanned original due to an error in the scanned image (such as a blurry or misaligned page), or because the content

  1. See, e.g., Index:Le Morte d’Arthur—Volume 1, http://en.wikisource.org/wiki/Index:Le_Morte_d%27Arthur_-_Volume_1.djvu (last visited Feb. 10, 2010). Where a single work is originally published in multiple separately bound volumes, it is common for each volume’s Index page to include links to the Index pages of the other volumes in the series to aid navigation. See id. The “djvu” suffix refers to a common file format optimized for storing scanned images. See DjVu, http://en.wikipedia.org/wiki/DjVu (last visited Feb. 10, 2010).
  2. See Help:Page Status, at http://en.wikisource.org/wiki/Help:Page_Status (last visited Feb. 10, 2010).
  3. See id.
  4. See id. At the time of this writing, the numbers of pages that had reached each quality tier on the English Wikisource were: Not Proofread, 252,667; Proofread, 36,582; Validated, 15,190. The author of the present Essay is partly to blame for the predominance of pages consisting entirely of raw OCR output, having personally uploaded some 70,000 such pages to the site using automated scripts. See infra notes 143–148 and accompanying text (discussing project to host the United States Statutes at Large on Wikisource).