Wikisource:Scan Lab

Shortcut:
WS:LAB

A central resource for assistance with creation, downloading, uploading, processing and other operations on scans of texts.

Times have changed, but it still can be hard to put 600 pages in the right order!
Instructions

If you need help with a scan, add your request in the relevant section below as a new sub-section. If you can, include all the details someone will need to work on the request without further questioning. You can use {{ping project|Scan Lab}} to send an immediate notification to all subscribed Scan Lab members. Once you have been answered, ping only that user when you reply with {{re|Their username}} (do not ping the whole project on every comment).

If your request has been completed, you should acknowledge that your issue is resolved and close the section with {{section resolved|1=~~~~}}.

Participants

Add your name to Module:Mass notification/groups/Scan Lab to be notified via {{ping project|Scan Lab}}. Also add your name below with details of any particular tasks you can help with.

Participant Can help with Instructions
Inductiveload
  • General scan tasks: scraping/download, batch uploads, scan repair
  • Splitting/combining scan images/photos from a scanner or camera into scan file (with ScanTailor)
Xover
  • General scan tasks: scraping/download, scan repair, manipulating DjVu files (but not PDF)
Mpaa
  • General scan tasks: scraping/download, scan repair, manipulating DjVu files (but not PDF)
Alien333
  • General scan tasks: scraping/download (including from Hathi), scan repair, manipulating DJVU & PDF

Requests for downloading scans

Instructions

If you would like scans that already exist online to be transferred to Wikisource, leave a message here. This includes batch transfers from the Internet Archive or HathiTrust for multi-volume works. Please include the necessary bibliographic information (author, country, and date of first publication) so that scans can be uploaded to Commons with proper information and license templates. A suggested file name on Commons can also be helpful.

Jane Austen Juvenilia Volume 2 and 3

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) The scans of the manuscripts of Austen's Juvenilia are available here and here. They're both in the public domain, but I have absolutely no clue as to how to download them. The images are higher resolution than the ones on the BL website, but they're in the Zoomify Flash format. Languageseeker (talk) 02:58, 2 February 2022 (UTC)

Mooresville, Indiana High School yearbooks, 1914–1930

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) These scans exist in the form of galleries on the Mooresville High School Alumni Association's Facebook page, and extracting them by hand is tedious enough that I'm hoping someone can do it with a bot. The procedure I have in mind is:

Thanks! —CalendulaAsteraceae (talkcontribs) 01:59, 22 September 2023 (UTC)[reply]

Penny Cyclopedia volumes 1 to 27

The IA scans currently linked on the page are unusable (blank pages where there should be content), so I checked HathiTrust ([here's the search I used]). There are four complete sets of scans attached to [this record] (ignoring the supplements for now), but I'm not sure at the moment which ones would be the best to import. Arcorann (talk) 02:14, 24 December 2023 (UTC)[reply]

I've found pretty good scans of volumes 4 and 24 which are already on Commons, and I've added the links to the Penny Cyclopedia page. I don't have a Hathi Trust account, so I can't help you there. Ciridae (talk) 05:21, 27 December 2023 (UTC)[reply]

Journal of the Optical Society of America

Volumes 1-40 of this fairly esteemed journal are out of copyright. Vol. 30, issue 12 and Vol. 33, issue 7 are here already, but there are *a lot* that are not here: https://archive.org/details/pub_optical-society-of-america-journal If you upload them, I can tidy the pile up at commons and get them ready to go here. For copyright concerns: https://onlinebooks.library.upenn.edu/webbin/cinfo/jopticalsocamerica --RaboKarbakian (talk) 20:43, 8 February 2024 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) would someone with access please download the scan from HathiTrust? Thanks! —Beleg Âlt BT (talk) 17:34, 22 April 2024 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) would someone with access please download the scan from HathiTrust? Thanks! —Beleg Âlt BT (talk) 16:31, 6 May 2024 (UTC)[reply]

@Beleg Tâl: -- c:File:The Coming of Cassidy and the Others - Clarence E. Mulford.pdf -- Hrishikes (talk) 14:20, 4 July 2024 (UTC)[reply]

The New York Times 1926-09-30

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) I would like a DJVU of Internet Archive identifier: sim_new-york-times_1926-09-30_75_25086 (it's too big for IA Upload).

=={{int:filedesc}}==
{{Book
 |Author           = 
 |Translator       = 
 |Editor           = Rollo Ogden
 |Illustrator      = 
 |Title            = The New York Times
 |Subtitle         = 
 |Series title     = 
 |Volume           = Volume 75, Issue 25086
 |Edition          = 
 |Publisher        = 
 |Printer          = 
 |Publication date = 1926-09-30
 |City             = New York
 |Language         = en
 |Description      = {{en|1=Issue of ''The New York Times''}}
 |Source           = {{IA|sim_new-york-times_1926-09-30_75_25086}}
 |Permission       = 
 |Image            = 
 |Image page       = 1
 |Pageoverview     =
 |Wikisource       = s:en:Index:{{PAGENAME}}
 |Homecat          = 
 |Other_versions   = 
 |ISBN             = 
 |LCCN             = 
 |OCLC             = 
 |References       = 
 |Linkback         = 
 |Wikidata         = 
 |noimage          = 
}}

=={{int:license-header}}==

{{PD-US-expired|country=US}}

[[Category:The New York Times, 1926]]

Thanks! —CalendulaAsteraceae (talkcontribs) 09:37, 9 January 2025 (UTC)[reply]

@CalendulaAsteraceae: c:File:The New York Times, 1926-09-30.djvu. --M-le-mot-dit (talk) 16:24, 9 January 2025 (UTC)
Thank you! —CalendulaAsteraceae (talkcontribs) 02:21, 10 January 2025 (UTC)[reply]
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. —CalendulaAsteraceae (talkcontribs) 02:21, 10 January 2025 (UTC)

Flight, Tropic Death and Fine Clothes to the Jew

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) Requesting DJVUs of these three books. The scans of Walter White's Flight and Eric D. Walrond's Tropic Death are located at https://dpul.princeton.edu/catalog/dcn009wc92m and https://dpul.princeton.edu/catalog/dc6d570725f respectively. I have already uploaded the latter as File:Tropic Death (collection).djvu, so I request for that file to be overwritten because the Princeton scan is better. The scan of Fine Clothes to the Jew is located at https://digital.library.yale.edu/catalog/17290813. prospectprospekt (talk) 18:15, 27 February 2025 (UTC)[reply]

File:Hughes - Fine Clothes to the Jew (1927).djvu and File:Fine Clothes to the Jew dust cover.jpg
File:White - Flight (1926).djvu
File:Walrond - Tropic Death (1926).djvu
@Prospectprospekt: Done. M-le-mot-dit (talk) 17:49, 2 March 2025 (UTC)
Thanks! prospectprospekt (talk) 21:45, 3 March 2025 (UTC)[reply]

Czecho-Slovak Student Life, Volume 18

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) Requesting scans of the Czecho-Slovak Student Life, Volume 18, from https://collections.carli.illinois.edu/digital/collection/ben_listy/id/3184/rec/2 . Unfortunately the scans seem to be available only as individual pages in .jpg format. Would it be possible to compile them into a single PDF or DjVu file? --Jan Kameníček (talk) 11:15, 12 March 2025 (UTC)

Done at File:Czecho-Slovak Student Life, Volume 18.djvu. — Alien333 21:28, 12 March 2025 (UTC)
@Alien333: Great! Thanks very much! --Jan Kameníček (talk) 21:54, 12 March 2025 (UTC)[reply]
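
When per-page images like the ones requested above are combined into one file, they first have to be put in page order, and a plain alphabetical sort puts "page_10" before "page_2". The following is only a minimal sketch of a natural sort for that step; the file names and the helper names `natural_key` and `order_pages` are hypothetical and are not taken from the CARLI site or from any tool used for this request.

```python
import re


def natural_key(filename: str):
    """Split a name into text and integer chunks so 'page_10' sorts after 'page_2'."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so the chunks at matching positions always have comparable types.
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", filename)]


def order_pages(filenames):
    """Return the file names in page order (hypothetical naming scheme)."""
    return sorted(filenames, key=natural_key)


# Hypothetical download directory listing.
downloaded = ["page_10.jpg", "page_2.jpg", "page_1.jpg"]
print(order_pages(downloaded))
```

The ordered list can then be handed to whichever tool builds the PDF or DjVu.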

Finding scans

Instructions

Requests for locating scans for existing works at Wikisource, or works you wish to add yourself but cannot find scans for. For general text requests, see Wikisource:Requested texts.

The Criterion Volume 2 and 3

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) Would it be possible to locate Volumes 2 and 3 of The Criterion? I'm especially trying to complete The Woman Who Rode Away that began in Volume 3. Languageseeker (talk) 18:36, 23 December 2022 (UTC)[reply]

Scan repair

Instructions

Request repair work on existing scans here.

When requesting page insertion, rearrangement or deletion, always include the page numbers (as marked on the pages) as well as the position of the page within the scan file. This makes it much easier for the repairing user to locate the defect in the file and fix it, as well as allowing a double-check against mistakes.

Please do not use this page to request repairs on works that you don’t really care about: the files listed at Category:Index - File to fix are a known backlog. If you want to help with those, you can add {{missing pages}} to those indexes if they do not already have it, along with details of the missing pages.

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) This scan is missing two pages (xxvi–xxvii). Also, it would be nice if the images for this volume and the second volume could be regenerated, as they are of quite poor quality. TE(æ)A,ea. (talk) 22:21, 3 December 2023 (UTC)[reply]

@TE(æ)A,ea.: Done (missing pages xxvi–xxvii). Existing text moved. M-le-mot-dit (talk) 15:10, 25 October 2024 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) Pages 482 and 483 of this volume were missing in the original scan; pageholders have been introduced, so all that is necessary is the replacement. That replacement can come from Index:Alumnioxonienses02univ.pdf, which exists solely for the purpose of supplying that gap. So, the missing pages from the PDF should be added in over the pageholders from the DJVU; the transclusion fixed; and the PDF deleted. TE(æ)A,ea. (talk) 23:46, 3 December 2023 (UTC)[reply]

Not sure I follow, pages 482 and 483 (djvu/99 and djvu/100) seem to be legit images and the 2 missing pages should be inserted between djvu/100 and djvu/101. Or ...? Mpaa (talk) 18:09, 4 December 2023 (UTC)[reply]

This file claims to be Volume 135 and is residing in the list of volumes as Volume 135 but it is actually Volume 136, probably (but not verified) a duplicate of Index:The Atlantic Monthly Volume 135.djvu. Can the file be replaced with https://babel.hathitrust.org/cgi/pt?id=uc1.32106019602660 ?--RaboKarbakian (talk) 15:47, 29 March 2024 (UTC)[reply]

Also, while you are at it:
  1. https://babel.hathitrust.org/cgi/pt?id=mdp.39015030146099 Vol. 139
  2. https://babel.hathitrust.org/cgi/pt?id=mdp.39015030146081 Vol. 140
  3. https://babel.hathitrust.org/cgi/pt?id=mdp.39015030145968 Vol. 141
  4. https://babel.hathitrust.org/cgi/pt?id=mdp.39015030145745 Vol. 142
--RaboKarbakian (talk)


File was renamed at Commons, and needs re-aligning.

https://en.wikisource.org/w/index.php?search=intitle%3A%2FA+dictionary+of+the+language+of+Mota.djvu%2F&title=Special:Search&profile=advanced&fulltext=1&ns0=1&ns100=1&ns102=1&ns104=1&ns106=1&ns114=1 ShakespeareFan00 (talk) 17:40, 1 May 2024 (UTC)[reply]


Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) A bit of a different one this time. This work contains several copyrighted images that need to be blanked out in the scan. The affected pages are listed here: Index talk:Sm all cc.pdf#Possible copyright violation. —Beleg Tâl (talk) 15:17, 14 May 2024 (UTC)[reply]

This scan is in the Monthly Challenge, but is missing the images facing pages 16 and 304. Can those images be found and inserted (with a blank verso) into the correct locations in the scan? There is a list of illustrations beginning on this page. --EncycloPetey (talk) 16:46, 14 August 2024 (UTC)

@EncycloPetey: Done. 2 images inserted (without text layout). Pages after djvu 77 should be moved or deleted.--M-le-mot-dit (talk) 12:41, 28 August 2024 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) The OCR layer is offset from the pages, but not in a consistent way.

  • Pages 27-65, OCR off by one page
  • Pages 66-123, OCR off by two pages
  • Pages 124+, OCR off by three pages

(Page numbers refer to the scan page, not the book page. OCR is shifted forwards, toward the front of the book.) —Beleg Tâl (talk) 14:43, 23 September 2024 (UTC)[reply]
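
The ranges reported above can be written down as a small lookup, which may help when checking a repair. This is a sketch only: it assumes that "shifted forwards" means the text belonging to a scan page sits that many pages earlier in the file, and the names `OFFSETS`, `ocr_shift` and `page_holding_text` are made up for illustration.

```python
# (first scan page, last scan page or None for "to the end", pages of shift),
# copied from the ranges listed in the request.
OFFSETS = [(27, 65, 1), (66, 123, 2), (124, None, 3)]


def ocr_shift(scan_page: int) -> int:
    """Return how many pages the OCR is off for a given scan page."""
    for first, last, shift in OFFSETS:
        if scan_page >= first and (last is None or scan_page <= last):
            return shift
    return 0  # pages before 27 were not reported as misaligned


def page_holding_text(scan_page: int) -> int:
    """Scan page where the text for `scan_page` currently sits.

    Assumption: the text is displaced toward the front of the book.
    """
    return scan_page - ocr_shift(scan_page)


for page in (26, 27, 66, 124):
    print(page, ocr_shift(page), page_holding_text(page))
```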

@Beleg Tâl: Done M-le-mot-dit (talk) 17:34, 23 September 2024 (UTC)[reply]


This file has a lot of pages missing, see Index:The Best continental short stories of and the yearbook of the continental short story 1924-25.pdf. There is a better scan now available at https://babel.hathitrust.org/cgi/pt?id=uc1.b3123528 but the sequenced pages 264 to 317 of the scan are duplicated and need to be removed. --Jan Kameníček (talk) 11:51, 30 October 2024 (UTC)[reply]

@Jan.Kamenicek: Done // M-le-mot-dit (talk) 16:38, 30 October 2024 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) The current scan for The Story of My Experiments with Truth/Volume 1 is missing pages (without placeholders) and duplicates others. I've uploaded a corrected file here: File:Gandhi, 1927, The Story of My Experiments With Truth, Vol 1.pdf. I need assistance in moving the current project over to the new scan while keeping the already proofread pages. Thanks! — Qx3Jw (talk) 14:40, 30 October 2024 (UTC)

Print page 115 has somehow got into the file at page position 114, when it should be at page position 121 with pages 115 to 121 moved back by one. Thanks, Beeswaxcandle (talk) 07:44, 24 November 2024 (UTC)[reply]

@Beeswaxcandle: Done M-le-mot-dit (talk) 13:26, 24 November 2024 (UTC)[reply]
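
A request like the one above (one page sitting too early, with the pages after it needing to move back by one) amounts to a single move within the page sequence. The sketch below illustrates that on a plain Python list with 1-based positions; `move_page` is a hypothetical helper and not the tool actually used for the repair.

```python
def move_page(pages, from_pos, to_pos):
    """Return a new list with the page at 1-based from_pos moved to 1-based to_pos.

    Pages between the two positions shift by one to fill the gap.
    """
    reordered = list(pages)          # leave the caller's list untouched
    page = reordered.pop(from_pos - 1)
    reordered.insert(to_pos - 1, page)
    return reordered


# Toy example: "X" sits at position 2 but belongs at position 4.
scan = ["a", "X", "b", "c", "d"]
print(move_page(scan, 2, 4))  # ['a', 'b', 'c', 'X', 'd']
```

For the request above, the equivalent call would be `move_page(pages, 114, 121)`.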

This file needs to be regenerated from the copy at HathiTrust (here). TE(æ)A,ea. (talk) 14:35, 7 December 2024 (UTC)

@TE(æ)A,ea. This scan contains incomplete pages (see views 37-40, after page 28, and views 79-80). This one contains unfolded tables (Experiments on Japanese Timber) to replace views 37-40, and several figures (views 129 to 152) but figures vi to viii are blanked. The table after page 64 is also incomplete. // M-le-mot-dit (talk) 17:56, 7 December 2024 (UTC)[reply]
  • M-le-mot-dit: Thank you for noticing, I thought I had checked the scan when I listed it on TASJ. If you can extract the pages from the edition I found (because it has the original title-page and only volume 4), I can edit the missing plates and tables in on the back end. The issue with folded tables, missing plates, &c., is one which I have run into on previous occasions; I have just asked other libraries to scan in the missing plates or tables and send them to me. I did this for the first/second volume and for the third volume (although I haven’t gotten around to uploading the table from that volume yet). TE(æ)A,ea. (talk) 18:12, 7 December 2024 (UTC)
    @TE(æ)A,ea.: Done. M-le-mot-dit (talk) 22:07, 7 December 2024 (UTC)[reply]

Remove pages with copyrighted text from File:Ah Q and Others.djvu

Please, remove print pages no. 130–183 with copyrighted text from File:Ah Q and Others.djvu (those marked at Index:Ah Q and Others.djvu as problematic). For reasons see the copyright discussion at Special:PermanentLink/14783607#Ah_Q_and_Others. Thanks, --Jan Kameníček (talk) 11:48, 12 January 2025 (UTC)[reply]

@Jan.Kamenicek: Done, pages 130-183 (DjVu 164-217) replaced by placeholders. • M-le-mot-dit (talk) 13:40, 12 January 2025 (UTC)[reply]
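
The reply above gives both the printed page numbers and the DjVu positions, which is the double-check the instructions for this section ask for. A small sketch of that arithmetic follows, using only the numbers quoted in the reply; the function names are hypothetical, and a single offset only holds within one continuously numbered run of pages.

```python
def scan_offset(print_page: int, scan_position: int) -> int:
    """Offset between a printed page number and its position in the scan file."""
    return scan_position - print_page


def print_to_scan(print_page: int, offset: int) -> int:
    """Scan position of a printed page, given the offset for that run of pages."""
    return print_page + offset


# Print page 130 is DjVu page 164, so the offset for this run is 34.
offset = scan_offset(130, 164)
print(offset, print_to_scan(183, offset))  # 34 217
```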

I found [this scan] on Hathi, which has the indicated missing pages from the magazine (pages 313 and 314 as listed in the index, corresponding with pages 321 and 322 in the URLs). There are still some pages that have been worked on that come after this, so be mindful of that. CitationsFreak (talk) 09:02, 17 January 2025 (UTC)[reply]

@CitationsFreak: Done. M-le-mot-dit (talk) 10:56, 17 January 2025 (UTC)

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) This is missing two pages between 51 and 52. I asked at the Scriptorium and got the two pages starting at [1]. There are several pages worked on after the missing section. Thank y'all! Overthrows (talk) 15:47, 11 February 2025 (UTC)[reply]

@Overthrows: Done. M-le-mot-dit (talk) 19:21, 11 February 2025 (UTC)

A Room of One's Own

The London School of Economics has a copy of the 1929 Hogarth Press edition at https://www.lse.ac.uk/library/assets/documents/rare-books/45-A-Room-of-Ones-Own.pdf. I am requesting that this be uploaded to Commons with the ex libris on page 2 redacted. This is because this copy was given to a library not located in the US, and we don't know when that transfer occurred, so it's possible that the ex libris is still copyrighted. prospectprospekt (talk) 18:13, 25 February 2025 (UTC)

Actually I'm just going to upload it per Prosfilaes's comment. prospectprospekt (talk) 13:41, 26 February 2025 (UTC)
@Prospectprospekt: Done - File:A Room Of One's Own.pdf Ciridae (talk) 13:50, 26 February 2025 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) This book is missing the following pages: Part I: 1, 2, 37, 38, 43, 44, 105, 106, 191, 192, 258, 259, 343, 344. Part II: 67, 68, 74, 75, 77, 78, 105, 106, 115, 116, 133, 134, 135, 136, 193, 194, 195, 196, 213, 214, 237, 238, 257, 258, 265, 266. As well as 4 unnumbered pages immediately preceding page 1 of Part I. A different scan of this edition seems complete so you can get replacements here: https://archive.org/details/bim_eighteenth-century_the-compleat-geographer_1723. Just adding placeholders would also be helpful so proofreading can be started. Treebitt (talk) 08:49, 28 February 2025 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) The OCR for this scan is one page off. For example, the OCR shown for page 5 actually belongs to page 6. (unsigned comment by ToxicPea (talk))

It's not the usual "corrupted layers" issue. I'm currently re-generating it from the JP2 to see if that works. — Alien333 09:38, 8 March 2025 (UTC)
Done: it worked. — Alien333 11:51, 8 March 2025 (UTC)
Thanks. Any idea why that happened? Did I do something wrong? -- Beardo (talk) 16:21, 8 March 2025 (UTC)
Misaligned OCR happens (mostly) when the part of MW or PRP that fetches the text encounters an error on one page; it leaves something like FAILED instead of the OCR, which is removed without inserting an empty placeholder, and so all further OCR is shifted one page downwards.
Usually, djvused signals where and how there is an error, but here it didn't. When it does, it can be corrected by manually removing the OCR of that page.
As for actual causes, errors are most often due to mistakes made by conversion software, which as far as I know aren't correlated to any user behaviour. — Alien333 16:32, 8 March 2025 (UTC)
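
The mechanism described above can be illustrated with a toy model. This is not MediaWiki or ProofreadPage code; `FAILED` and `build_text_layer` are hypothetical stand-ins that only show why dropping a failed page without a placeholder shifts the text of every later page.

```python
FAILED = None  # stand-in for a page whose text extraction failed


def build_text_layer(extracted, keep_placeholder):
    """Assemble a per-page text list from extraction results.

    With keep_placeholder=False the failed page is dropped entirely,
    which is the behaviour that causes the misalignment.
    """
    layer = []
    for text in extracted:
        if text is FAILED:
            if keep_placeholder:
                layer.append("")  # an empty page keeps later pages aligned
            continue
        layer.append(text)
    return layer


extracted = ["p1", "p2", FAILED, "p4", "p5"]
print(build_text_layer(extracted, keep_placeholder=False))  # ['p1', 'p2', 'p4', 'p5']
print(build_text_layer(extracted, keep_placeholder=True))   # ['p1', 'p2', '', 'p4', 'p5']
```

In the first result, the third position holds the text of page 4, and every later page is off by one in the same way.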

This work is in the current Monthly Challenge, but it is missing 16 pages, including 12 numbered pages of text and 4 unnumbered pages of illustrations. It also has blank pages in two locations where an image should appear.

The DjVu file needs to be repaired and uploaded. This can be done in two stages:

(1) replace scan pages 142 & 174 with a copy of the illustration page found here: [2] facing page 59. These two pages are blank in our copy, but should each bear this fleuron image.
(2) insert 16 pages between p.50 and p.62 (scan pages 68 & 69), shifting the contents and OCR. The pages to be inserted are consecutive and can all be found in this scan.

I can adjust the Index and move the already-transcribed pages myself after the repair, if you like. --EncycloPetey (talk) 21:33, 11 March 2025 (UTC)[reply]
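
When pages are inserted into a scan, every page after the insertion point moves to a higher position, which is what makes it necessary to move the already-transcribed pages. A minimal sketch of that renumbering, using the figures from stage (2) above (16 pages inserted after scan page 68); `position_after_insert` is a hypothetical helper, not part of any Wikisource tool.

```python
def position_after_insert(old_position: int, insert_after: int, count: int) -> int:
    """New scan position of a page once `count` pages are inserted after `insert_after`."""
    if old_position <= insert_after:
        return old_position        # pages before the gap stay where they are
    return old_position + count    # pages after the gap shift by the inserted count


# Stage (2) above: 16 pages inserted between scan pages 68 and 69.
for old in (68, 69, 100):
    print(old, "->", position_after_insert(old, insert_after=68, count=16))
```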

 Comment The scan file has been altered so that some of the corrections have been made, but some have not, and some of the page images and OCR are now misaligned. --EncycloPetey (talk) 23:30, 12 March 2025 (UTC)[reply]
 Comment The twelve numbered pages of text were inserted, but the four image pages were not. They remain to be inserted between pages 58 and 59 (scan pages 76 & 77), as in the scan identified for repair. --EncycloPetey (talk) 23:44, 12 March 2025 (UTC)[reply]
@EncycloPetey: Done. Refreshing the local cache may be necessary. • M-le-mot-dit (talk) 01:03, 13 March 2025 (UTC)[reply]
Thanks. I will check it in a few hours so that the changes can percolate. --EncycloPetey (talk) 01:15, 13 March 2025 (UTC)[reply]
All pages seem OK except for
I have tried reloading, purging and cache clearing by several methods, and these two pages do not load the correct page image, either in Page view or in the Page editor window. --EncycloPetey (talk) 02:59, 13 March 2025 (UTC)[reply]
@EncycloPetey: I do see these pages correctly. Have you purged your local cache (alt-shift-R or equivalent)? • M-le-mot-dit (talk) 10:39, 13 March 2025 (UTC)[reply]
Yes, just as I said. The 142 page is correct today, but the 141 page is still incorrect. --EncycloPetey (talk) 15:12, 13 March 2025 (UTC)[reply]
Well, it was. Now both 141 & 142 are showing the wrong content again. I've no idea what is happening, as there is no reason I can see that the displayed page should be correct, then become incorrect again. --EncycloPetey (talk) 18:52, 13 March 2025 (UTC)[reply]
The "Image" tab will display the correct page. The problem is present only in Page view and in the Editor window. --EncycloPetey (talk) 18:54, 13 March 2025 (UTC)[reply]

See also

  • Commons:Graphic Lab at Wikimedia Commons - they can help with general image problems
  • Image extraction - guidance for extracting images from scans
  • Requested texts - general text requests. Many of these also need scans to be located.
  • Category:Index - File to fix - contains indexes that have various defects. Please do add templates like {{missing pages}} if needed to indicate what the problems are, but please do not bring the files here unless you would like it fixed to allow work in the near future.