Index talk:Reminiscences of Earliest Canterbury 1915.pdf
General
- Use curly quotes.
- Markup for chapter headings: To do.
- Rules: none yet
Wikidata items
- Reminiscences of Earliest Canterbury (1915) (Q125590537)
- Reminiscences of Earliest Canterbury (Q125573333)
- James Hay (Q125573360)
Commons category
Category:Reminiscences_of_Earliest_Canterbury_
Text layer version
I have created a version of the PDF with a text layer (using Abbyy FineReader OCR Editor v 15). I spent a bit of time tidying up the text, so I think it should be reasonably free of errors and issues. I presume I will need to add it as a new version in Commons and purge the cache. Let me know if you want me to do this, or if it should be done differently. David Nind (talk) 03:59, 26 April 2024 (UTC)
- I decided to be bold, and uploaded the updated PDF file with the text layer.
- Things to note:
- For whatever reason, hyphens at the end of a line are missing.
- To proofread, the only things needed are to split the text into paragraphs and remove the line endings (adding the hyphen back in where required).
- Alternatively, I could revert to the original version, or try to create the PDF again and leave the hyphens in (which is what I thought I had done 8-( ).
- Two text files are available, one with line breaks and one without line breaks:
- Apologies if I have mucked up your plans Mike! David Nind (talk) 05:18, 27 April 2024 (UTC)
- No, this is great and will speed things up. I tend to use Clean Up OCR anyway. There are plenty more books to process, and I'll send you more PDFs for processing with Abbyy if you like; just cropped the pages of Tales of Banks Peninsula ready for upload. Giantflightlessbirds (talk) 09:56, 28 April 2024 (UTC)
- More than happy to process documents using Abbyy. Working through my normal process, you can get pretty clean OCR in a few hours. With Reminiscences of Earliest Canterbury I only discovered a couple of corrections required in Wikisource, apart from hyphens. Feel free to send me the documents, or I can open up a Google Drive folder for you to share with me. David Nind (talk) 23:44, 2 May 2024 (UTC)
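For anyone working from the text file with line breaks, the clean-up step mentioned in the thread (removing line endings within each paragraph) could be sketched roughly as follows. This is only an illustration, not the process used for this index: the function name `unwrap_paragraphs` and the sample text are made up, and it assumes that a blank line marks a paragraph break, which may not hold for the actual files. Because the end-of-line hyphens are missing from the text layer, a word split across two lines comes out as two fragments separated by a space and still has to be fixed by hand.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines so that each paragraph sits on one line."""
    # Assumption: a blank line (possibly containing spaces) separates paragraphs.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Replace the line endings inside each paragraph, and any other runs of
    # whitespace, with single spaces.
    unwrapped = [" ".join(paragraph.split()) for paragraph in paragraphs]
    # Keep one blank line between paragraphs.
    return "\n\n".join(unwrapped)


if __name__ == "__main__":
    # Made-up sample; "ar" / "rived" mimics a word whose hyphen was lost.
    sample = "The first settlers ar\nrived in the bay.\n\nA second paragraph\nfollows here."
    print(unwrap_paragraphs(sample))
```

Split words such as "ar rived" in the sample are left untouched on purpose, since the text alone does not say which neighbouring fragments were originally one hyphenated word.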