Wikisource:Scan Lab/Archives/2021-09
Please do not post any new comments on this page.
This is a discussion archive first created in , although the comments contained were likely posted before and after this date. See current discussion or the archives index.
[Scan here.] The scan has four pages to a page; please separate them. (Also, please clean out the blank space on the bottom of the last page.) TE(æ)A,ea. (talk) 00:40, 1 September 2021 (UTC)
- @TE(æ)A,ea.: I have a split DjVu up at File:Psychology of Religion.djvu, but the resolution of the PDF is so low as to be nearly useless for OCR and pretty rough to proofread from manually. See e.g. Page:Psychology of Religion.djvu/7 as an example. I don't suppose there is any way to get a higher-resolution scan? What's the source of the scan? Xover (talk) 10:39, 3 September 2021 (UTC)
- The text of this book is available online (e.g. a raw text search). So a manual match and split (i.e. you insert the match headings manually) against that might be an effective starting point if there's nothing OCRable. Inductiveload—talk/contribs 11:08, 3 September 2021 (UTC)
- This is now done. Mpaa (talk) 22:12, 14 September 2021 (UTC)
- This section was archived on a request by: Inductiveload—talk/contribs 13:27, 15 September 2021 (UTC)
Notifying all members of Scan Lab (more info · opt out): It is two pages off. OCR first. I pasted two more pages onto page 12, if that matters. Thanks--RaboKarbakian (talk) 22:36, 13 September 2021 (UTC) And, it has been suggested to remove the last four pages. (again, thanks)--RaboKarbakian (talk) 00:09, 14 September 2021 (UTC)
- @RaboKarbakian: Done. Please check the result. And just for reference, when the OCR layer is offset the fix is usually a relatively simple matter of regenerating the DjVu and uploading over the old file. It's caused by a combination of bugs in the tools used to make the original DjVu and in the code MediaWiki uses to extract it (technical details on request for anyone interested). My custom DjVu tools have armour-plating to protect against this, so absent other factors I can just set the tools to work while I go grab a coffee. Xover (talk) 06:09, 15 September 2021 (UTC)
- Excellent! That is good news twice! I have checked the text, farther than I did when determining that it was messed up, and it seems in order, with this one reservation: maybe it is off by some later in the text.
- My method of pasting and deleting is so much like the interestingly named Hanoi Tower that I like your method better. (throws a couple of tea bags in @Xover's general direction)--RaboKarbakian (talk) 12:27, 15 September 2021 (UTC)
- This section was archived on a request by: --Xover (talk) 06:17, 17 September 2021 (UTC)
- This section was archived on a request by: --Xover (talk) 05:19, 24 September 2021 (UTC)
Low priority—page rotated, for some reason. No text. TE(æ)A,ea. (talk) 18:01, 16 September 2021 (UTC)
- @TE(æ)A,ea.: It seems to be just this one page, and the page is just a no-text protective sheet? If that's the case I don't think it's worth the effort to correct it to begin with. In addition, PDF files are not easily manipulated in this way and might require reencoding the entire file (with sub-optimal tools) instead of just the one affected page. In other words, unless it's personally important to you for whatever reason (in which case I'd be happy to see what I could do regardless), I'd be inclined to just leave it as is. Xover (talk) 06:28, 17 September 2021 (UTC)
- This section was archived on a request by: --Xover (talk) 05:20, 24 September 2021 (UTC)
- Notifying all members of Scan Lab (more info · opt out): Please add page 1. Thank you! TE(æ)A,ea. (talk) 19:14, 20 September 2021 (UTC)
Transactions N.Z. Institute Vol. 9 supplement
- This section was archived on a request by: Thanks to Mpaa Beeswaxcandle (talk) 23:21, 27 September 2021 (UTC)
Notifying all members of Scan Lab (more info · opt out): I've just discovered that a supplement was issued for Transactions and Proceedings of the New Zealand Institute/Volume 9. The only source I can find is here onwards to the end. I don't know if it's possible to scrape from here, but would be appreciative of an attempt. Ideal file name is "Transactions NZ Institute Volume 9 Supplement.djvu" Thanks, Beeswaxcandle (talk) 20:01, 27 September 2021 (UTC)
Notifying all members of Scan Lab (more info · opt out): Is it possible to re-ocr the whole thing? I re-ocr'd one page, the difference inspired me to ask this here.--RaboKarbakian (talk) 17:41, 30 September 2021 (UTC)
- Done Mpaa (talk) 20:28, 30 September 2021 (UTC)
- This section was archived on a request by: --Xover (talk) 13:52, 3 October 2021 (UTC)
Notifying all members of Scan Lab (more info · opt out): Off by 1, starting at djvu 5.--RaboKarbakian (talk) 04:29, 26 September 2021 (UTC)
- @RaboKarbakian: Done Xover (talk) 18:03, 26 September 2021 (UTC)
- This section was archived on a request by: --Xover (talk) 13:52, 3 October 2021 (UTC)
Notifying all members of Scan Lab (more info · opt out): File had a couple of missing pages, so I tried to upload an alternative version. The Commons upload failed several times due to server timeouts, so I gave up and was going to ask for page interpolations. However, random sections of pages have been replaced by the alternative version, while retaining the OCR of the first version. For example, at page positions 113 to 117. The original file is https://archive.org/details/transactionsproc23newz and the alternative is https://archive.org/details/transactionsand01unkngoog. The original version is missing Plate V at page positions 67 & 68 and print pages 56 & 57. At this point, I've junked the pagelist as it wasn't making sense anymore and have gone through the original version in the page viewer in IA and it's only those pages missing. Could someone please rescue something out of this mess? [This is the last of the volumes for the Trans. N.Z. Inst. to sort.] Beeswaxcandle (talk) 07:34, 15 September 2021 (UTC)
- @Beeswaxcandle: Doing…. Inductiveload—talk/contribs 07:37, 15 September 2021 (UTC)
- @Beeswaxcandle: Done. The page list doesn't have all the plates marked explicitly, but the pages seem complete now.
- Also some image cache is stuck, but page index 101 is actually correct if you use a different resolution.
- "random sections of pages have been replaced by the alternative version" makes me wonder if there's a wider caching issue going on (possibly related to the datacentre move yesterday?). Inductiveload—talk/contribs 08:47, 15 September 2021 (UTC)
- Ah. I've been getting very confused. I've tried purging here and at Commons and have decided that it's time I went to bed and see what it looks like in the morning. Thanks for your help. Beeswaxcandle (talk) 09:31, 15 September 2021 (UTC)
- @Inductiveload: There is ongoing work to remove the old in-core VipsScaler in favour of doing all scaling in Thumbor, which ran into trouble with invisible dependencies and was rolled back today. If you're seeing image-specific cache weirdness it is a likely cause. Searching Phab for the obvious keywords should net you the relevant tasks (T260504 is probably a good starting point). Note, incidentally, some circumstantial connections to the PageImages API if that's relevant. Xover (talk) 12:08, 15 September 2021 (UTC)
- This section was archived on a request by: --Xover (talk) 13:55, 3 October 2021 (UTC)