Wikisource:Scan Lab

Scan Lab

Shortcut:
WS:LAB

A central resource for assistance with creation, downloading, uploading, processing and other operations on scans of texts.

Times have changed, but it still can be hard to put 600 pages in the right order!
Instructions

If you need help with a scan, add your request in the relevant section below as a new sub-section. If you can, include all the details someone will need to work on the request without further questioning. You can use {{ping project|Scan Lab}} to send an immediate notification to all subscribed Scan Lab members. Once you have been answered, ping only that user when you reply with {{re|Their username}} (do not ping the whole project on every comment).

If your request has been completed, you should acknowledge that your issue is resolved and close the section with {{section resolved|1=~~~~}}.

Participants

Add your name to Module:Mass notification/groups/Scan Lab to be notified via {{ping project|Scan Lab}}. Also add your name below with details of any particular tasks you can help with.

Participant Can help with Instructions
Inductiveload
  • General scan tasks: scraping/download, batch uploads, scan repair
  • Splitting/combining scan images/photos from a scanner or camera into scan file (with ScanTailor)
Xover
  • General scan tasks: scraping/download, scan repair, manipulating DjVu files (but not PDF)
Mpaa
  • General scan tasks: scraping/download, scan repair, manipulating DjVu files (but not PDF)

Requests for downloading scans

Instructions

If you would like scans that already exist online to be transferred to Wikisource, leave a message here. This includes batch transfers from the Internet or Hathi Trust for multi-volume works. Please include the necessary bibliographic information (author, country, and date of first publication) so that scans can be uploaded to Commons with proper information and license templates. A suggested file name on Commons can also be helpful.

Jane Austen Juvenilia Volume 2 and 3

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) The scans of the manuscripts of Austen's Juvenilia are available here and here. They're both in the PD, but I have absolutely no clue how to download them. The images are higher resolution than the ones on the BL website, but they're in the Zoomify Flash format. Languageseeker (talk) 02:58, 2 February 2022 (UTC)[reply]

Mooresville, Indiana High School yearbooks, 1914–1930

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) These scans exist in the form of galleries on the Mooresville High School Alumni Association's Facebook page, and extracting them by hand is tedious enough that I'm hoping someone can do it with a bot. The procedure I have in mind is:

Thanks! —CalendulaAsteraceae (talkcontribs) 01:59, 22 September 2023 (UTC)[reply]

Penny Cyclopedia volumes 1 to 27

The IA scans currently linked on the page are unusable (blank pages where there should be content), so I checked HathiTrust ([here's the search I used]). There are four complete sets of scans attached to [this record] (ignoring the supplements for now), but I'm not sure at the moment which ones would be the best to import. Arcorann (talk) 02:14, 24 December 2023 (UTC)[reply]

I've found pretty good scans of volumes 4 and 24 which are already on Commons, and I've added the links to the Penny Cyclopedia page. I don't have a Hathi Trust account, so I can't help you there. Ciridae (talk) 05:21, 27 December 2023 (UTC)[reply]

Journal of the Optical Society of America

Volumes 1-40 of this fairly esteemed journal are out of copyright. Vol. 30, issue 12 and Vol. 33, issue 7 are here already, but there are *a lot* that are not here: https://archive.org/details/pub_optical-society-of-america-journal If you upload them, I can tidy the pile up at commons and get them ready to go here. For copyright concerns: https://onlinebooks.library.upenn.edu/webbin/cinfo/jopticalsocamerica --RaboKarbakian (talk) 20:43, 8 February 2024 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) would someone with access please download the scan from HathiTrust? Thanks! —Beleg Âlt BT (talk) 17:34, 22 April 2024 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) would someone with access please download the scan from HathiTrust? Thanks! —Beleg Âlt BT (talk) 16:31, 6 May 2024 (UTC)[reply]

@Beleg Tâl: -- c:File:The Coming of Cassidy and the Others - Clarence E. Mulford.pdf -- Hrishikes (talk) 14:20, 4 July 2024 (UTC)[reply]

Per this discussion, I'd like to replace the current PDF scan with a DJVU scan. However, both IA-Upload and Any2Djvu are stalling out on me, and pdf2djvu.com gave pretty shoddy results. Could someone please upload this scan as a new version of File:Dictionary of Hymnology 1908.djvu? Thanks!

The IA-Upload failure looks to be https://phabricator.wikimedia.org/T215647, caused by the number of pages. MarkLSteadman (talk) 03:34, 4 July 2024 (UTC)[reply]
@Beleg Tâl: -- c:File:A Dictionary of Hymnology - John Julian.djvu -- Hrishikes (talk) 16:38, 5 July 2024 (UTC)[reply]
@Hrishikes you're the best!! —Beleg Tâl (talk) 17:59, 5 July 2024 (UTC)[reply]

Finding scans

Instructions

Requests for locating scans for existing works at Wikisource, or works you wish to add yourself but cannot find scans for. For general text requests, see Wikisource:Requested texts.

The Criterion Volume 2 and 3

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) Would it be possible to locate Volumes 2 and 3 of The Criterion? I'm especially trying to complete The Woman Who Rode Away that began in Volume 3. Languageseeker (talk) 18:36, 23 December 2022 (UTC)[reply]

Scan repair

Instructions

Request repair work on existing scans here.

When requesting page insertion, rearrangement or deletion, always include the page numbers (as marked on the pages) as well as the position of the page within the scan file. This makes it much easier for the repairing user to locate the defect in the file and fix it, as well as allowing a double-check against mistakes.

Please do not use this page to request repairs on works that you don’t really care about: Category:Index - File to fix is already a known backlog. If you want to help with it, you can add {{missing pages}} to those indexes if they do not already have it, along with details of the missing pages.

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) This scan is missing two pages (xxvi–xxvii). Also, it would be nice if the images for this volume and the second volume could be regenerated, as they are of quite poor quality. TE(æ)A,ea. (talk) 22:21, 3 December 2023 (UTC)[reply]

@TE(æ)A,ea.: Done (missing pages xxvi–xxvii). Existing text moved. M-le-mot-dit (talk) 15:10, 25 October 2024 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) Pages 482 and 483 of this volume were missing in the original scan; pageholders have been introduced, so all that is necessary is the replacement. That replacement can come from Index:Alumnioxonienses02univ.pdf, which exists solely for the purpose of supplying that gap. So, the missing pages from the PDF should be added in over the pageholders from the DJVU; the transclusion fixed; and the PDF deleted. TE(æ)A,ea. (talk) 23:46, 3 December 2023 (UTC)[reply]

Not sure I follow, pages 482 and 483 (djvu/99 and djvu/100) seem to be legit images and the 2 missing pages should be inserted between djvu/100 and djvu/101. Or ...? Mpaa (talk) 18:09, 4 December 2023 (UTC)[reply]
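
To make the suggestion in the reply above concrete, here is a minimal Python sketch of how scan positions would shift if two pages were inserted between djvu/100 and djvu/101. The function name and its parameters are hypothetical, and the sketch only does the arithmetic; it does not touch any file or wiki page.

```python
def new_position(old_position: int, insert_after: int = 100, inserted: int = 2) -> int:
    """Return the scan position of an existing page after new pages are inserted.

    Assumes `inserted` pages are added directly after position `insert_after`
    (here: 2 pages after djvu/100, as suggested in the reply above).
    """
    if old_position <= insert_after:
        return old_position          # pages up to the insertion point keep their position
    return old_position + inserted   # later pages shift to a higher position


# Any already-transcribed pages from djvu/101 onward would need moving accordingly.
for old in (99, 100, 101, 102):
    print(f"djvu/{old} -> djvu/{new_position(old)}")
```

If the two pages belong somewhere else in the file, only the `insert_after` argument needs to change.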

This file claims to be Volume 135 and is residing in the list of volumes as Volume 135 but it is actually Volume 136, probably (but not verified) a duplicate of Index:The Atlantic Monthly Volume 135.djvu. Can the file be replaced with https://babel.hathitrust.org/cgi/pt?id=uc1.32106019602660 ?--RaboKarbakian (talk) 15:47, 29 March 2024 (UTC)[reply]

Also, while you are at it:
  1. https://babel.hathitrust.org/cgi/pt?id=mdp.39015030146099 Vol. 139
  2. https://babel.hathitrust.org/cgi/pt?id=mdp.39015030146081 Vol. 140
  3. https://babel.hathitrust.org/cgi/pt?id=mdp.39015030145968 Vol. 141
  4. https://babel.hathitrust.org/cgi/pt?id=mdp.39015030145745 Vol. 142
--RaboKarbakian (talk)


File was renamed at Commons, and needs re-aligning.

https://en.wikisource.org/w/index.php?search=intitle%3A%2FA+dictionary+of+the+language+of+Mota.djvu%2F&title=Special:Search&profile=advanced&fulltext=1&ns0=1&ns100=1&ns102=1&ns104=1&ns106=1&ns114=1 ShakespeareFan00 (talk) 17:40, 1 May 2024 (UTC)[reply]


Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) A bit of a different one this time. This work contains several copyrighted images that need to be blanked out in the scan. The affected pages are listed here: Index talk:Sm all cc.pdf#Possible copyright violation. —Beleg Tâl (talk) 15:17, 14 May 2024 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) The pages here are offset when they are loaded. The PDF is correct, the text layer is correct, and if you call OCR the right pages are referenced; however, the wrong pages show up visually. I don’t know where this problem originates. TE(æ)A,ea. (talk) 00:16, 16 June 2024 (UTC)[reply]

@TE(æ)A,ea. it seems ok to me. Mpaa (talk) 20:15, 6 July 2024 (UTC)[reply]

This scan is in the Monthly Challenge, but is missing the images facing pages 16 and 304. Can those images be found and inserted (each with a blank verso) into the correct locations in the scan? There is a list of illustrations beginning on this page. --EncycloPetey (talk) 16:46, 14 August 2024 (UTC)[reply]

@EncycloPetey: Done. 2 images inserted (without text layout). Pages after djvu 77 should be moved or deleted.--M-le-mot-dit (talk) 12:41, 28 August 2024 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) The OCR layer is offset from the pages, but not in a consistent way.

  • Pages 27-65, OCR off by one page
  • Pages 66-123, OCR off by two pages
  • Pages 124+, OCR off by three pages

(Page numbers refer to the scan page, not the book page. OCR is shifted forwards, toward the front of the book.) —Beleg Tâl (talk) 14:43, 23 September 2024 (UTC)[reply]

@Beleg Tâl: Done M-le-mot-dit (talk) 17:34, 23 September 2024 (UTC)[reply]
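
The uneven offsets listed in this request can be written down as a small lookup table, which may help anyone checking a similar file. This is only a sketch in Python: the table is transcribed from the request, the names are hypothetical, and treating scan pages before 27 as aligned is an assumption, since the request does not mention them.

```python
from typing import Optional

# (first scan page, last scan page or None for "to the end", OCR offset in pages)
OCR_OFFSETS: list[tuple[int, Optional[int], int]] = [
    (27, 65, 1),
    (66, 123, 2),
    (124, None, 3),
]


def ocr_offset(scan_page: int) -> int:
    """Return how many pages the OCR layer is off for a given scan page."""
    for first, last, offset in OCR_OFFSETS:
        if scan_page >= first and (last is None or scan_page <= last):
            return offset
    return 0  # assumption: pages before the first listed range are aligned


for page in (26, 27, 66, 124):
    print(page, ocr_offset(page))
```

The sketch deliberately returns only the size of the offset; the direction of the shift is as described in the request.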


I have yet to create an index for this file, as I did not realise that the front page was Google's digital scan statement (the page images weren't all showing up properly on Internet Archive before I uploaded to Commons). Could you please delete the first page of the djvu?

Thanks, TeysaKarlov (talk) 23:49, 26 October 2024 (UTC)[reply]

@TeysaKarlov: Done. Content replaced by a pdf conversion (from the same source). Original djvu file has a poor definition and many images are missing. // M-le-mot-dit (talk) 10:04, 27 October 2024 (UTC)[reply]
@M-le-mot-dit Thanks for the improvements, and quick turnaround! Regards, TeysaKarlov (talk) 19:18, 27 October 2024 (UTC)[reply]

This file has a lot of pages missing, see Index:The Best continental short stories of and the yearbook of the continental short story 1924-25.pdf. There is a better scan now available at https://babel.hathitrust.org/cgi/pt?id=uc1.b3123528 but the sequenced pages 264 to 317 of the scan are duplicated and need to be removed. --Jan Kameníček (talk) 11:51, 30 October 2024 (UTC)[reply]

@Jan.Kamenicek: Done // M-le-mot-dit (talk) 16:38, 30 October 2024 (UTC)[reply]
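
For requests like this one, the set of scan positions to keep can be worked out before editing the file. The Python sketch below assumes the "sequenced pages" 264 to 317 are 1-based positions in the downloaded scan and that the whole range is to be dropped; the function name and the total page count used in the example are hypothetical.

```python
def positions_to_keep(total_pages: int, dup_first: int = 264, dup_last: int = 317) -> list[int]:
    """Return the 1-based scan positions that remain after dropping a duplicated range."""
    return [p for p in range(1, total_pages + 1) if not dup_first <= p <= dup_last]


# Example with a made-up total of 400 pages; the real total is not given in the request.
kept = positions_to_keep(400)
print(len(kept), "pages kept;", 400 - len(kept), "pages removed")
```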

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) The current scan for The Story of My Experiments with Truth/Volume 1 is missing pages (without placeholders) and duplicates others. I've uploaded a corrected file here: File:Gandhi, 1927, The Story of My Experiments With Truth, Vol 1.pdf. I need assistance in moving the current project over to the new scan while keeping the already proofread pages. Thanks! — Qx3Jw (talk) 14:40, 30 October 2024 (UTC)[reply]

See also

  • Commons:Graphic Lab at Wikimedia Commons - they can help with general image problems
  • Image extraction - guidance for extracting images from scans
  • Requested texts - general text requests. Many of these also need scans to be located.
  • Category:Index - File to fix - contains indexes that have various defects. Please do add templates like {{missing pages}} if needed to indicate what the problems are, but please do not bring the files here unless you would like them fixed to allow work in the near future.