
Wikisource:Scan Lab


Shortcut:
WS:LAB

A central resource for assistance with creation, downloading, uploading, processing and other operations on scans of texts.

Times have changed, but it still can be hard to put 600 pages in the right order!
Instructions

If you need help with a scan, add your request in the relevant section below as a new sub-section. If you can, include all the details someone will need to work on the request without further questioning. You can use {{ping project|Scan Lab}} to send an immediate notification to all subscribed Scan Lab members. Once you have been answered, ping only that user when you reply with {{re|Their username}} (do not ping the whole project on every comment).

If your request has been completed, you should acknowledge that your issue is resolved and close the section with {{section resolved|1=~~~~}}.

Participants


Add your name to Module:Mass notification/groups/Scan Lab to be notified via {{ping project|Scan Lab}}. Also add your name below with details of any particular tasks you can help with.

Participant Can help with Instructions
Inductiveload
  • General scan tasks: scraping/download, batch uploads, scan repair
  • Splitting/combining scan images/photos from a scanner or camera into scan file (with ScanTailor)
Xover
  • General scan tasks: scraping/download, scan repair, manipulating DjVu files (but not PDF)
Mpaa
  • General scan tasks: scraping/download, scan repair, manipulating DjVu files (but not PDF)
Alien333
  • General scan tasks: scraping/download (including from Hathi), scan repair, manipulating DJVU & PDF

Requests for downloading scans

Instructions

If you would like scans that already exist online to be transferred to Wikisource, leave a message here. This includes batch transfers from the Internet Archive or Hathi Trust for multi-volume works. Please include the necessary bibliographic information (author, country, and date of first publication) so that scans can be uploaded to Commons with proper information and license templates. A suggested file name on Commons can also be helpful.

Jane Austen Juvenilia Volume 2 and 3


Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) The scans of the manuscripts of Austen's Juvenilia are available here and here. They're both in the PD, but I have absolutely no clue how to download them. The images are higher resolution than the ones on the BL website, but they're in the Zoomify Flash format. Languageseeker (talk) 02:58, 2 February 2022 (UTC)

Mooresville, Indiana High School yearbooks, 1914–1930


Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) These scans exist in the form of galleries on the Mooresville High School Alumni Association's Facebook page, and extracting them by hand is tedious enough that I'm hoping someone can do it with a bot. The procedure I have in mind is:

Thanks! —CalendulaAsteraceae (talkcontribs) 01:59, 22 September 2023 (UTC)[reply]

Penny Cyclopedia volumes 1 to 27


The IA scans currently linked on the page are unusable (blank pages where there should be content), so I checked HathiTrust ([here's the search I used]). There are four complete sets of scans attached to [this record] (ignoring the supplements for now), but I'm not sure at the moment which ones would be the best to import. Arcorann (talk) 02:14, 24 December 2023 (UTC)[reply]

I've found pretty good scans of volumes 4 and 24 which are already on Commons, and I've added the links to the Penny Cyclopedia page. I don't have a Hathi Trust account, so I can't help you there. Ciridae (talk) 05:21, 27 December 2023 (UTC)[reply]

Journal of the Optical Society of America


Volumes 1-40 of this fairly esteemed journal are out of copyright. Vol. 30, issue 12 and Vol. 33, issue 7 are here already, but there are *a lot* that are not here: https://archive.org/details/pub_optical-society-of-america-journal If you upload them, I can tidy the pile up at commons and get them ready to go here. For copyright concerns: https://onlinebooks.library.upenn.edu/webbin/cinfo/jopticalsocamerica --RaboKarbakian (talk) 20:43, 8 February 2024 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) would someone with access please download the scan from HathiTrust? Thanks! —Beleg Âlt BT (talk) 17:34, 22 April 2024 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) would someone with access please download the scan from HathiTrust? Thanks! —Beleg Âlt BT (talk) 16:31, 6 May 2024 (UTC)[reply]

@Beleg Tâl: -- c:File:The Coming of Cassidy and the Others - Clarence E. Mulford.pdf -- Hrishikes (talk) 14:20, 4 July 2024 (UTC)[reply]

The New York Times 1926-09-30


Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) I would like a DJVU of Internet Archive identifier: sim_new-york-times_1926-09-30_75_25086 (it's too big for IA Upload).

=={{int:filedesc}}==
{{Book
 |Author           = 
 |Translator       = 
 |Editor           = Rollo Ogden
 |Illustrator      = 
 |Title            = The New York Times
 |Subtitle         = 
 |Series title     = 
 |Volume           = Volume 75, Issue 25086
 |Edition          = 
 |Publisher        = 
 |Printer          = 
 |Publication date = 1926-09-30
 |City             = New York
 |Language         = en
 |Description      = {{en|1=Issue of ''The New York Times''}}
 |Source           = {{IA|sim_new-york-times_1926-09-30_75_25086}}
 |Permission       = 
 |Image            = 
 |Image page       = 1
 |Pageoverview     =
 |Wikisource       = s:en:Index:{{PAGENAME}}
 |Homecat          = 
 |Other_versions   = 
 |ISBN             = 
 |LCCN             = 
 |OCLC             = 
 |References       = 
 |Linkback         = 
 |Wikidata         = 
 |noimage          = 
}}

=={{int:license-header}}==

{{PD-US-expired|country=US}}

[[Category:The New York Times, 1926]]

Thanks! —CalendulaAsteraceae (talkcontribs) 09:37, 9 January 2025 (UTC)[reply]

@CalendulaAsteraceae:: c:File:The New York Times, 1926-09-30.djvu. --M-le-mot-dit (talk) 16:24, 9 January 2025 (UTC)[reply]
Thank you! —CalendulaAsteraceae (talkcontribs) 02:21, 10 January 2025 (UTC)[reply]
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. —CalendulaAsteraceae (talkcontribs) 02:21, 10 January 2025 (UTC)

Varney the Vampire


Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) The (copyrighted) 1970 edition of this influential work has been scanned here; could someone please download the main part of the book, without the new introduction &c.? TE(æ)A,ea. (talk) 04:41, 24 March 2025 (UTC)[reply]

To make sure I got it: it would be page 1, and pages 54-336. Is that right? — Alien333 07:33, 24 March 2025 (UTC)
  • Alien: Well, 14 rather than 1, but yes. I would appreciate if you clip the reprint publisher’s label from /14. In addition, the full work (one volume originally) was published as three volumes in this edition, so a full volume would need to take /15–/314 from the second volume and /15–/303 of the third volume. TE(æ)A,ea. (talk) 15:15, 24 March 2025 (UTC)[reply]
    Got it (didn't check that the page it loaded on was the first one). Regarding the publisher's label, will do. — Alien333 15:48, 24 March 2025 (UTC)
    Done at c:File:Varney the Vampire.pdf. Again, haven't done djvu conversion or OCR, feel free to ask.
    Aaaand it of course fell prey to the PDF-specific 0x0 bug. If you prefer to stick to a PDF, it'll be a few days' wait.
    Off-topic comment: Extracting the pages from a 1970 reprint sounds like a complicated way of getting an 1847 text. Makes one wonder how it happens that publishers can get their hands on a scan or an original, but there isn't one on the whole internet. Alien333 16:53, 24 March 2025 (UTC)
    • Alien: So long as the scan works (eventually) and the OCR looks fine, I don’t particularly care. The publishers either bought a copy at auction (for a one-off) or got in contact with a library (like The Orphan of the Rhine below, which is part of a ~50-reel series of microfilm). Unfortunately, schmuck-who-wants-to-make-a-digital-copy isn’t a good enough reason to get most of these sent out of Special Collections, although you might be able to get something if you go to one of these libraries in person. It would be nice to work on digitizing reels of microfilm; I’ve seen a number of valuable series of microfilm reels, which are all public-domain contents but not digitized. TE(æ)A,ea. (talk) 17:18, 24 March 2025 (UTC)

Finding scans

Instructions

Requests for locating scans for existing works at Wikisource, or works you wish to add yourself but cannot find scans for. For general text requests, see Wikisource:Requested texts.

The Criterion Volume 2 and 3


Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) Would it be possible to locate Volumes 2 and 3 of The Criterion? I'm especially trying to complete The Woman Who Rode Away that began in Volume 3. Languageseeker (talk) 18:36, 23 December 2022 (UTC)[reply]

Scan repair

Instructions

Request repair work on existing scans here.

When requesting page insertion, rearrangement or deletion, always include the page numbers (as marked on the pages) as well as the position of the page within the scan file. This makes it much easier for the repairing user to locate the defect in the file and fix it, as well as allowing a double-check against mistakes.

Please do not use this page to request repairs on works that you don’t really care about: Category:Index - File to fix is already a known backlog. If you want to help with those, you can add {{missing pages}} to those indexes if they do not already have it, along with details of the missing pages.

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) Pages 482 and 483 of this volume were missing in the original scan; pageholders have been introduced, so all that is necessary is the replacement. That replacement can come from Index:Alumnioxonienses02univ.pdf, which exists solely for the purpose of supplying that gap. So, the missing pages from the PDF should be added in over the pageholders from the DJVU; the transclusion fixed; and the PDF deleted. TE(æ)A,ea. (talk) 23:46, 3 December 2023 (UTC)[reply]

Not sure I follow, pages 482 and 483 (djvu/99 and djvu/100) seem to be legit images and the 2 missing pages should be inserted between djvu/100 and djvu/101. Or ...? Mpaa (talk) 18:09, 4 December 2023 (UTC)[reply]

This file claims to be Volume 135 and is residing in the list of volumes as Volume 135 but it is actually Volume 136, probably (but not verified) a duplicate of Index:The Atlantic Monthly Volume 135.djvu. Can the file be replaced with https://babel.hathitrust.org/cgi/pt?id=uc1.32106019602660 ?--RaboKarbakian (talk) 15:47, 29 March 2024 (UTC)[reply]

Also, while you are at it:
  1. https://babel.hathitrust.org/cgi/pt?id=mdp.39015030146099 Vol. 139
  2. https://babel.hathitrust.org/cgi/pt?id=mdp.39015030146081 Vol. 140
  3. https://babel.hathitrust.org/cgi/pt?id=mdp.39015030145968 Vol. 141
  4. https://babel.hathitrust.org/cgi/pt?id=mdp.39015030145745 Vol. 142
--RaboKarbakian (talk)


File was renamed at Commons, and needs re-aligning.

https://en.wikisource.org/w/index.php?search=intitle%3A%2FA+dictionary+of+the+language+of+Mota.djvu%2F&title=Special:Search&profile=advanced&fulltext=1&ns0=1&ns100=1&ns102=1&ns104=1&ns106=1&ns114=1 ShakespeareFan00 (talk) 17:40, 1 May 2024 (UTC)[reply]


Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) A bit of a different one this time. This work contains several copyrighted images that need to be blanked out in the scan. The affected pages are listed here: Index talk:Sm all cc.pdf#Possible copyright violation. —Beleg Tâl (talk) 15:17, 14 May 2024 (UTC)[reply]

This scan is in the Monthly Challenge, but is missing the images facing pages 16 and 304. Can those images be found and inserted (with blank versos) into the correct locations in the scan? There is a list of illustrations beginning on this page. --EncycloPetey (talk) 16:46, 14 August 2024 (UTC)

@EncycloPetey: Done. 2 images inserted (without text layout). Pages after djvu 77 should be moved or deleted.--M-le-mot-dit (talk) 12:41, 28 August 2024 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) The OCR layer is offset from the pages, but not in a consistent way.

  • Pages 27-65, OCR off by one page
  • Pages 66-123, OCR off by two pages
  • Pages 124+, OCR off by three pages

(Page numbers refer to the scan page, not the book page. OCR is shifted forwards, toward the front of the book.) —Beleg Tâl (talk) 14:43, 23 September 2024 (UTC)[reply]
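The offsets reported above can be written down as a small lookup, which may help when checking a repaired file. This is only a sketch: the ranges come from the report, while the names `OCR_SHIFTS` and `ocr_shift` are made up for illustration and are not part of any Wikisource or DjVu tool.

```python
# Reported ranges: (first scan page, last scan page or None for open-ended, shift in pages).
OCR_SHIFTS = [(27, 65, 1), (66, 123, 2), (124, None, 3)]


def ocr_shift(scan_page):
    """Return how many pages the OCR layer is reported to be off for a scan page."""
    for first, last, shift in OCR_SHIFTS:
        if scan_page >= first and (last is None or scan_page <= last):
            return shift
    # Pages before the first reported range are assumed to be aligned.
    return 0


if __name__ == "__main__":
    for page in (26, 27, 65, 66, 123, 124, 300):
        print(page, ocr_shift(page))
```

The direction of the shift (toward the front of the book) is as described in the request; the lookup only records its size.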

@Beleg Tâl: Done M-le-mot-dit (talk) 17:34, 23 September 2024 (UTC)[reply]


This file has a lot of pages missing, see Index:The Best continental short stories of and the yearbook of the continental short story 1924-25.pdf. There is a better scan now available at https://babel.hathitrust.org/cgi/pt?id=uc1.b3123528 but the sequenced pages 264 to 317 of the scan are duplicated and need to be removed. --Jan Kameníček (talk) 11:51, 30 October 2024 (UTC)[reply]

@Jan.Kamenicek: Done // M-le-mot-dit (talk) 16:38, 30 October 2024 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) The current scan for The Story of My Experiments with Truth/Volume 1 is missing pages (without placeholders) and duplicates others. I've uploaded a corrected file here: File:Gandhi, 1927, The Story of My Experiments With Truth, Vol 1.pdf. I need assistance in moving the current project over to the new scans while keeping the already proofread pages. Thanks! — Qx3Jw (talk) 14:40, 30 October 2024 (UTC)

Print page 115 has somehow got into the file at page position 114, when it should be at page position 121 with pages 115 to 121 moved back by one. Thanks, Beeswaxcandle (talk) 07:44, 24 November 2024 (UTC)[reply]
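The reordering requested above amounts to moving one item in a list. A minimal sketch, assuming the scan is modelled as a plain Python list of page labels; `move_page` is a hypothetical helper, not the tool actually used for the repair:

```python
def move_page(pages, from_pos, to_pos):
    """Return a copy of pages with the item at 1-based from_pos moved to 1-based to_pos."""
    reordered = list(pages)
    item = reordered.pop(from_pos - 1)   # everything after from_pos moves back by one
    reordered.insert(to_pos - 1, item)   # the moved item lands at to_pos
    return reordered


if __name__ == "__main__":
    # Label each item by its current position in the file.
    scan = ["pos%d" % i for i in range(1, 131)]
    fixed = move_page(scan, 114, 121)
    print(fixed[112:122])
```

With `from_pos=114` and `to_pos=121`, the items formerly at positions 115 to 121 end up at 114 to 120, which matches the description in the request.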

@Beeswaxcandle: Done M-le-mot-dit (talk) 13:26, 24 November 2024 (UTC)[reply]

This file needs to be regenerated from the copy at HathiTrust (here). TE(æ)A,ea. (talk) 14:35, 7 December 2024 (UTC)

@TE(æ)A,ea. This scan contains incomplete pages (see views 37-40, after page 28, and views 79-80). This one contains unfolded tables (Experiments on Japanese Timber) to replace views 37-40, and several figures (views 129 to 152) but figures vi to viii are blanked. The table after page 64 is also incomplete. // M-le-mot-dit (talk) 17:56, 7 December 2024 (UTC)[reply]
  • M-le-mot-dit: Thank you for noticing, I thought I had checked the scan when I listed it on TASJ. If you can extract the pages from the edition I found, (because it has the original title-page and only volume 4,) I can edit the missing plates and tables in on the back end. The issue with folded tables, missing plates, &c., is one which I have run into on previous occasions; I have just asked other libraries to scan in the missing plates or tables and send them to me. I did this for the first/second volume and for the third volume (although I haven’t gotten around to uploading the table from that volume yet). TE(æ)A,ea. (talk) 18:12, 7 December 2024 (UTC)
    @TE(æ)A,ea.: Done. M-le-mot-dit (talk) 22:07, 7 December 2024 (UTC)[reply]

Remove pages with copyrighted text from File:Ah Q and Others.djvu


Please, remove print pages no. 130–183 with copyrighted text from File:Ah Q and Others.djvu (those marked at Index:Ah Q and Others.djvu as problematic). For reasons see the copyright discussion at Special:PermanentLink/14783607#Ah_Q_and_Others. Thanks, --Jan Kameníček (talk) 11:48, 12 January 2025 (UTC)[reply]

@Jan.Kamenicek: Done, pages 130-183 (DjVu 164-217) replaced by placeholders. • M-le-mot-dit (talk) 13:40, 12 January 2025 (UTC)[reply]

I found [this scan] on Hathi, which has the indicated missing pages from the magazine (pages 313 and 314 as listed in the index, corresponding with pages 321 and 322 in the URLs). There are still some pages that have been worked on that come after this, so be mindful of that. CitationsFreak (talk) 09:02, 17 January 2025 (UTC)[reply]

@CitationsFreak: Done. M-le-mot-dit (talk) 10:56, 17 January 2025 (UTC)

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) This is missing two pages between 51 and 52. I asked at the Scriptorium and got the two pages starting at [1]. There are several pages worked on after the missing section. Thank y'all! Overthrows (talk) 15:47, 11 February 2025 (UTC)[reply]

@Overthrows: Done. M-le-mot-dit (talk) 19:21, 11 February 2025 (UTC)

A Room of One's Own


The London School of Economics has a copy of the 1929 Hogarth Press edition at https://www.lse.ac.uk/library/assets/documents/rare-books/45-A-Room-of-Ones-Own.pdf. I am requesting that this be uploaded to Commons with the ex libris on page 2 redacted. This is because this copy was given to a library not located in the US, and we don't know when that transfer occurred, so it's possible that the ex libris is still copyrighted. prospectprospekt (talk) 18:13, 25 February 2025 (UTC)

Actually I'm just going to upload it per Prosfilaes's comment. prospectprospekt (talk) 13:41, 26 February 2025 (UTC)
@Prospectprospekt: Done - File:A Room Of One's Own.pdf Ciridae (talk) 13:50, 26 February 2025 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) This book is missing the following pages: Part I: 1, 2, 37, 38, 43, 44, 105, 106, 191, 192, 258, 259, 343, 344. Part II: 67, 68, 74, 75, 77, 78, 105, 106, 115, 116, 133, 134, 135, 136, 193, 194, 195, 196, 213, 214, 237, 238, 257, 258, 265, 266. As well as 4 unnumbered pages immediately preceding page 1 of Part I. A different scan of this edition seems complete so you can get replacements here: https://archive.org/details/bim_eighteenth-century_the-compleat-geographer_1723. Just adding placeholders would also be helpful so proofreading can be started. Treebitt (talk) 08:49, 28 February 2025 (UTC)[reply]

This work is in the current Monthly Challenge, but it is missing 16 pages, including 12 numbered pages of text and 4 unnumbered pages of illustrations. It also has blank pages in two locations where an image should appear.

The DjVu file needs to be repaired and uploaded. This can be done in two stages:

(1) replace scan pages 142 & 174 with a copy of the illustration page found here: [2] facing page 59. These two pages are blank in our copy, but should each bear this fleuron image.
(2) insert 16 pages between p.50 and p.62 (scan pages 68 & 69), shifting the contents and OCR. The pages to be inserted are consecutive and can all be found in this scan.

I can adjust the Index and move the already-transcribed pages myself after the repair, if you like. --EncycloPetey (talk) 21:33, 11 March 2025 (UTC)[reply]
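For the follow-up step of moving already-transcribed pages after the insertion, the new positions can be worked out in advance. The sketch below assumes 16 pages are inserted directly after scan page 68, as described in stage (2); the function names and the example list of transcribed positions are hypothetical.

```python
def shifted_position(old_pos, insert_after, count):
    """Scan position of a page once `count` pages are inserted after `insert_after`."""
    return old_pos + count if old_pos > insert_after else old_pos


def move_plan(transcribed, insert_after, count):
    """List (old, new) moves, highest position first.

    Working from the back means no page is moved onto a position
    that is still occupied by a page waiting for its own move.
    """
    plan = []
    for pos in sorted(transcribed, reverse=True):
        new_pos = shifted_position(pos, insert_after, count)
        if new_pos != pos:
            plan.append((pos, new_pos))
    return plan


if __name__ == "__main__":
    # Hypothetical set of scan positions that already have transcribed pages.
    print(move_plan([10, 68, 69, 70], insert_after=68, count=16))
```

Pages at or before the insertion point stay where they are, so only the later ones appear in the plan.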

 Comment The scan file has been altered so that some of the corrections have been made, but some have not, and some of the page images and OCR are now misaligned. --EncycloPetey (talk) 23:30, 12 March 2025 (UTC)[reply]
 Comment The twelve numbered pages of text were inserted, but the four image pages were not. They remain to be inserted between pages 58 and 59 (scan pages 76 & 77), as in the scan identified for repair. --EncycloPetey (talk) 23:44, 12 March 2025 (UTC)[reply]
@EncycloPetey: Done. Refreshing the local cache may be necessary. • M-le-mot-dit (talk) 01:03, 13 March 2025 (UTC)[reply]
Thanks. I will check it in a few hours so that the changes can percolate. --EncycloPetey (talk) 01:15, 13 March 2025 (UTC)[reply]
All pages seem OK except for
I have tried reloading, purging and cache clearing by several methods, and these two pages do not load the correct page image, either in Page view or in the Page editor window. --EncycloPetey (talk) 02:59, 13 March 2025 (UTC)[reply]
@EncycloPetey: I do see these pages correctly. Have you purged your local cache (alt-shift-R or equivalent)? • M-le-mot-dit (talk) 10:39, 13 March 2025 (UTC)[reply]
Yes, just as I said. The 142 page is correct today, but the 141 page is still incorrect. --EncycloPetey (talk) 15:12, 13 March 2025 (UTC)[reply]
Well, it was. Now both 141 & 142 are showing the wrong content again. I've no idea what is happening, as there is no reason I can see that the displayed page should be correct, then become incorrect again. --EncycloPetey (talk) 18:52, 13 March 2025 (UTC)[reply]
The "Image" tab will display the correct page. The problem is present only in Page view and in the Editor window. --EncycloPetey (talk) 18:54, 13 March 2025 (UTC)[reply]
Sorry, I have no idea of what to do. On my side I actually see p. 119 as a facsimile of djvu 141, as in the {{raw image}} below.

(Upload an image to replace this placeholder.)

M-le-mot-dit (talk) 09:17, 15 March 2025 (UTC)[reply]

The Orphan of the Rhine


Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa, User:Alien333) I have just recently obtained the second volume, and had already obtained the other three volumes, of this Gothic novel (the last of the “horrids”). However, three of the volumes are scanned two original pages to the PDF page; would anyone be interested in dealing with it? I can upload the volumes if that is the case. TE(æ)A,ea. (talk) 20:20, 23 March 2025 (UTC)[reply]

I could, assuming that the widths of the halves are fixed (meaning that the first X columns of pixels are always the first page, and the Y others are always the second page). — Alien333 21:03, 23 March 2025 (UTC)
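The fixed-width assumption described above can be illustrated with a toy split. This is a sketch only: the image is modelled as a list of pixel rows rather than a real PDF page, `split_spread` is a hypothetical name, and an actual run would presumably use an imaging tool on the extracted page images.

```python
def split_spread(rows, split_col):
    """Split one two-page image (a list of pixel rows) at a fixed column.

    The first `split_col` columns form the first page,
    and the remaining columns form the second page.
    """
    left = [row[:split_col] for row in rows]
    right = [row[split_col:] for row in rows]
    return left, right


if __name__ == "__main__":
    spread = [
        [1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
    ]
    first_page, second_page = split_spread(spread, 2)
    print(first_page)
    print(second_page)
```

If the gutter position drifts from one spread to the next, a single fixed column would no longer be enough and each spread would need its own split point.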

See also

  • Commons:Graphic Lab at Wikimedia Commons - they can help with general image problems
  • Image extraction - guidance for extracting images from scans
  • Requested texts - general text requests. Many of these also need scans to be located.
  • Category:Index - File to fix - contains indexes that have various defects. Please do add templates like {{missing pages}} if needed to indicate what the problems are, but please do not bring the files here unless you would like it fixed to allow work in the near future.