Wikisource talk:WikiProject Catholic Encyclopedia Upgrade
CE matching tool
There is now a Magnus Manske tool for finding WP matches for the articles: see http://toolserver.org/~magnus/dnb/map2wp.php?letter=Heinrich&set=ce13 for a sample run. Charles Matthews (talk) 08:09, 29 October 2010 (UTC)
Signature and Contributors
Hi. To cope with (1) missing contributors and (2) misspelled signatures, I was thinking of scanning pages for possible signatures and associating them with contributors, to obtain a reference list (where possible spelling mistakes can also be fixed). With it, a bot could check articles and:
- if no contributor exists -> add contributor based on detected signature and, if misspelled, fix it.
- if contributor exists -> check if signature belongs to the list of correct signatures and, if not, fix it.
Not all cases will be addressed, as it is not straightforward to detect signatures unless they are at the end of the page (one improvement could be to also search the text for the signatures in the list).
An example of reference list:
Signature | Fixed Signature | Contributor |
ELIE J. AUCLAIR | Elie J. Auclair | Elie J. Auclair |
ELIE-J. AUCLAIR | Elie J. Auclair | Elie J. Auclair |
FRANCIS AVELING | Francis Aveling | Francis Aveling |
F. J. BACCHUS | F. J. Bacchus | Francis Joseph Bacchus |
F.J. BACCHUS | F. J. Bacchus | Francis Joseph Bacchus |
FRANCIS J. BACCHUS | Francis J. Bacchus | Francis Joseph Bacchus |
Can this be of interest?--Mpaa (talk) 22:01, 22 April 2013 (UTC)
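The lookup proposed above can be sketched roughly as follows. This is only an illustration and not the bot that was actually run. The `REFERENCE` dictionary, the function names, and the rule of taking the last non-empty line as the signature are all assumptions made for the example. Spelling variants such as "F.J. BACCHUS" and "F. J. BACCHUS" are collapsed to one key by treating dots and hyphens as spaces.

```python
import re

# Hypothetical reference list, built from the sample rows above:
# normalized signature key -> (fixed signature, contributor).
REFERENCE = {
    "ELIE J AUCLAIR": ("Elie J. Auclair", "Elie J. Auclair"),
    "FRANCIS AVELING": ("Francis Aveling", "Francis Aveling"),
    "F J BACCHUS": ("F. J. Bacchus", "Francis Joseph Bacchus"),
    "FRANCIS J BACCHUS": ("Francis J. Bacchus", "Francis Joseph Bacchus"),
}


def signature_key(raw):
    """Uppercase the text and treat dots and hyphens as spaces,
    so that spelling variants collapse to a single key."""
    cleaned = re.sub(r"[.\-]", " ", raw.upper())
    return " ".join(cleaned.split())


def detect_signature(page_text):
    """Return the last non-empty line of the text.
    Assumption: the signature sits at the end of the page."""
    lines = [line.strip() for line in page_text.splitlines() if line.strip()]
    return lines[-1] if lines else None


def lookup(page_text):
    """Return (fixed signature, contributor), or None if nothing matches."""
    signature = detect_signature(page_text)
    if signature is None:
        return None
    return REFERENCE.get(signature_key(signature))
```

A bot could call `lookup` on each article and only edit when a match is returned, leaving unmatched pages for manual review.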
- Bearing down on Category:CE no contributor can certainly be of interest: over 3000 to do. There are some clearcut cases, where the contributor sig is "Paul Maria Baumgarten" or "Patrick Boyle".
- I imagine it would take multiple bot runs. There isn't yet a full contributor list: I have something in my userspace but it is selective. Also the handbook seems not quite to be a full listing of contributors.
- Currently "no contributor recorded" in the field doesn't prevent the article being in that category. That is a template tweak, and I think the DNB basic header template has the analogous thing.
- As a work-around, I used "contributor = no contributor recorded | override_author = no contributor recorded". So it does not show-up in the Category.--Mpaa (talk) 20:32, 30 April 2013 (UTC)
- So, yes, of interest but looks like the issue would need to be salami-sliced. Charles Matthews (talk) 10:05, 24 April 2013 (UTC)
- I am matching your list of authors with existing signatures in articles. Then I'll run a small trial and post updates here (it will take a while).--Mpaa (talk) 13:15, 24 April 2013 (UTC)
- Thanks. Charles Matthews (talk) 18:39, 24 April 2013 (UTC)
- Few samples. Obviously the key is the contributor list:
I posted a first list here. Could you take a look and see if this is good enough to start? Feel free to strike out/remove whatever you do not feel is appropriate.--Mpaa (talk) 21:25, 24 April 2013 (UTC)
Status?
I am a newbie and would like to help a little, but it's unclear from the Project page what has been done and what needs to be done. Is there any simple proofreading needed? Laura1822 (talk) 01:28, 31 August 2014 (UTC)
- Yes, the project page has been neglected.
- There is much that needs to be done on the actual text. Efforts have been made towards getting the articles in the correct encyclopedic order, firstly. And at titles that make up a rational system.
- Very little has so far been done in putting the text into the Page: namespace, and checking it against the scans there. It is known that the text as originally uploaded is quite deficient in some ways: the initial sentences are incomplete, the endnotes are quite often omitted (as they are commonly with most other postings online), and the Greek and Hebrew text needs to be sorted out. Charles Matthews (talk) 06:18, 31 August 2014 (UTC)
- Thanks! I had trouble finding the Index pages, but I finally got there. I will try to do some proofreading. Laura1822 (talk) 22:09, 3 September 2014 (UTC)
- An important issue IMO is to decide on a convention for section headings in the Page: ns. This makes it easier to automate work with bots if needed. It should be something not related to current article titles, as articles might need to be renamed in the future. Keep in mind that titles have been reversed. E.g. "Byron, George" in the text has become "George Byron" as title (I am not sure this is a real case ...).--Mpaa (talk) 23:09, 3 September 2014 (UTC)
Request for explanations
I have just proofread Page:Catholic Encyclopedia, volume 1.djvu/33 from the original OCR'd text here (not copying/pasting from another source). I initially came here to post a request for someone to check my work for formatting, since I am a newbie to this project (and to WS in general). Specifically, I would like to know if we are keeping inline citations or if I am supposed to mark them with ref tags, and more generally if I had formatted the text properly. Also, I wanted to know if there is a master list of contributors I can refer to with their Author namespace names so I won't have to search for each one. I found a list of them in the volume itself, which has been partially proofread with appropriate Author links. However, (1) it was missing one of the authors on the page I proofread, and (2) this list hasn't yet been fully proofread, so if there's not another master list somewhere to use, I would like to request that someone with some familiarity with the list of contributors make it a priority to work on proofreading those pages, so that I and other proofreaders can refer to it while proofreading.
Meanwhile, I searched for my missing contributor, E. A. Pace, through the regular search engine. He turned out to be Author:Edward Aloysius Pace. I discovered when I visited his Author page that the very articles I had just proofread are already present on WS (though their authors weren't linked).
So now my question is: Am I just duplicating someone else's efforts? Why were the articles already there, if they don't transclude the pages I am proofreading (which I created from scratch)?
- Authors already created are listed here Category:CE contributors or here Category:CE contributors complete, but some are still missing.
- (All, or most) articles are already there as they were imported automatically from an external site. You can see the TOC of the different volumes. But they are not guaranteed to be proofread or might be missing parts.--Mpaa (talk) 22:55, 14 September 2014 (UTC)
- One more thing. In order to transclude them, you need to mark a section begin/end. As I pointed out above, how you choose to mark it is important in case we need automated tools to run on those pages (e.g. to recreate TOCs, or to automate transclusion, etc.).--Mpaa (talk) 22:59, 14 September 2014 (UTC)
- Can you please explain to me how you want me to do this, or point me to some instructions or guidelines? Or show me by correcting the page I proofread? I have never done it before. Thanks for the link to the contributors, but why are there two separate categories? Laura1822 (talk) 18:24, 18 September 2014 (UTC)
- In terms of historical explanation: the CE articles were posted by a bot in 2007, which was before the Page: namespace and ProofReadPage were in operation. They mirrored text already available elsewhere on the Web. There were all sorts of problems with this initial posting, including a large gap under letter E; but it could be said to have covered "most of" the CE text. (I was using the text on enWP in 2008). That is why "upgrade" seemed appropriate. While I was working on the DNB project, I asked for scans of the CE to be uploaded here, so the work of transferring the text to the transclusion method, and correcting it, can now go on.
- I worked out a method of doing that which is time-efficient while working on the DNB. I can explain in detail, but it basically involves creating a long strip of text that is already marked up for transclusion, and then pasting it in page by page.
- The table of contents of each volume is available as a link through the index pages, so you can find all the text that way. There are still some serious problems, I believe. Charles Matthews (talk) 05:14, 15 September 2014 (UTC)
- Thanks for the explanation. So what do you want me to do in terms of proofreading? Just proofread and leave the transclusion work to you or someone else to do in a large operation later? Or do you want it done article by article or page by page?
- What I still don't understand is whether this upgrade is a re-do from the page up, as it were, so that what you need from me as a proofreader is to just proofread pages as I would on any other work on WS, or if there is some sort of unique method that you expect ordinary editors to learn and do. Is the project not really ready yet for basic proofreading?
- I would also appreciate answers to the questions I posted above about formatting.
- Would you rather I work on the "E" volume than the "A" volume? Thanks for all your hard work and help. Laura1822 (talk) 18:24, 18 September 2014 (UTC)
- The standard method of work here as of 2014, say for a work divided into articles, is to proofread the text in the Page: namespace, and then add the transclusion markup. Then create the article in the main namespace that pulls in the transcluded proofread text.
- The modifications in this case would be (i) you don't have to proofread from scratch: you can use the existing text here of an article to start from, and (ii) you don't need to create the article from scratch, but can replace the header plus text that there is already, by the same header plus transclusion code. So if you are basically checking and tweaking the existing text, you are copying it from the article, into the Page: namespace opposite the scan. The advantage being that anyone can later verify a particular word or date if there seems to be an issue.
- Anyone can start in on any part of a big work such as the CE wherever they wish. And they can use any pre-digitised text there is.
- On the point about inline citations, we leave them in the text.
- Some links:
- Catholic Encyclopedia (1913)/Aaron shows the basic method for transclusion. You can tell it is transcluded by the little numbers in the left margin that link into the Page: namespace where the text actually is. In the edit view of the Aaron article you can see the transclusion code in an example where the article runs over several pages. The transclusion markup needs to be on the first page and last page. You put in section begin and section end tags with name; and when they match on a page the software makes them into one heading between ## and ## at the top. See Help:Transclusion#How to transclude a portion of a page.
- Then there is this index footer:
- Please do ask about any further details. No one finds it that easy to get started here: I certainly didn't.
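As a rough illustration of the markup described above, the sketch below builds the section tags that go on the Page: pages and the `<pages>` tag that goes in the main-namespace article. The helper names are made up for this example, the page range 33 to 35 is illustrative only, and using the article title "Aaron" as the section name is an assumption (a different naming convention may be agreed on, as discussed earlier on this page).

```python
def section_markers(name):
    """Return the begin and end tags that wrap an article's text
    on a Page: page. The begin tag goes on the first page of the
    article and the end tag on the last."""
    return f'<section begin="{name}" />', f'<section end="{name}" />'


def pages_tag(index, first, last, section):
    """Build the <pages> tag that pulls one labelled section from a
    range of pages of the given index file."""
    return (
        f'<pages index="{index}" from={first} to={last} '
        f'fromsection="{section}" tosection="{section}" />'
    )


if __name__ == "__main__":
    begin, end = section_markers("Aaron")
    print(begin)
    print(end)
    # Page numbers here are illustrative, not the real range for the article.
    print(pages_tag("Catholic Encyclopedia, volume 1.djvu", 33, 35, "Aaron"))
```

The printed `<pages>` line would replace the existing text of the article, below its header, once the pages it covers have been proofread.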