User talk:Phe
Comment sought ...
I found a similar existing template to Template:TOC link and started a conversation at John's page User talk:John Vandenberg#Align similar templates. Your thoughts would be welcomed as one template is presumably preferable. billinghurst (talk) 05:54, 6 December 2009 (UTC)
Philosophical Transactions
I thought you might like to know that the Phil. Trans. is also available at http://rstl.royalsocietypublishing.org/content/by/year. The banner at the bottom and a notice at the top are present, but since you have a script to cut that out, this could be a useful resource as it doesn't need a login like JSTOR (and maybe wouldn't be geared to detect scraping ;-)). Inductiveload (talk) 04:48, 10 February 2010 (UTC)
- Useful: sometimes a few pages are missing on Commons. Unfortunately, the script I use can only remove the bottom banner, but there is probably some way to work around that. Phe (talk) 07:42, 10 February 2010 (UTC)
Geological Transactions: figures
A couple of weeks ago I made a cryptic comment on references to figures in the Geological Transactions. I had arrived at this page, on which there is a reference to "(Pl. 31. fig. 1, 2.—Pl. 31*, fig. 3.)". That is how it stands in the original text, where the implied instruction is "go to Plate 31 and there look at fig. 1 etc.". But now the figures have been moved into the main body of the text, so perhaps in the edited version we should change the reference to something like "(figures 1 and 2 above and figure 3 below)". I'm not sure if the original plate pages will appear in the final mainspace form but, if they do, there would presumably have to be a link to them as well. The reference would then be something like "(figures 1 and 2 above and figure 3 below and also on Plate 31)". That would be clumsy. I bring up this topic because it will recur throughout the volumes and an agreed policy would be useful. Peter Mercator (talk) 21:16, 28 March 2010 (UTC)
- I modified page 448 to try two ways of providing more information on the plate number: the first uses a tooltip on the plate to give the plate/figure number; the second is more intrusive and uses thumb. I prefer the first, but perhaps the more explicit way is better?
- Plates are transcluded in mainspace in Transactions of the Geological Society, 1st series, vol. 2/Plates and Maps. We can link directly to a plate with Transactions of the Geological Society, 1st series, vol. 2/Plates and Maps#638, but it's probably better to link to the plate description at the start of the page (I added the needed anchors only for plates 14 to 32 [1]); people going to the description can then use the plate link to go to the plate itself. You can test the link on page 448. Phe (talk) 07:39, 29 March 2010 (UTC)
- Thanks for your comments. My gut feeling is to add figure captions which define the plate and in the text link to the plate captions (as you have illustrated). (I'm not particularly fond of tool tips.) Unless you strenuously object I'll go ahead with this method (in this article and the rest of the volume).
- There is something very odd with the figures in this section. Plate32-fig1 is not explicitly mentioned anywhere in the text and it is clearly in an inappropriate place. Since it was also grouped with Plate31-fig2 which is used in the (same) author's next article (section on Assynt) it may belong anywhere between its current position and the position of Plate31-fig2. Perhaps it will become obvious on a closer reading of the text of the two articles. If I can find a suitable spot I'll move it. If not I will simply remove it from the page namespace.
- I would have liked to search the whole of the text of this article with a more powerful editor (emacs). Is it possible to do this? Presumably the pages get assembled in transclusion.
- On a more general level, as a newbie to WS I'm not sure where to direct my queries. They landed in your space because I presume that you put up these pages and did the first proofread. Thanks for your patience. Peter Mercator (talk) 22:44, 29 March 2010 (UTC)
- First, there was a mistake in the example I gave: the anchor name for the description was the same as the anchor for the plate itself, so I made this change. Now the anchor for a plate description is "#Plate descr plate_nr" and the plate anchor is "#plate plate_nr".
- I've nothing against a caption added with the thumb parameter, but in some cases you'll need to reduce the image width a little: in a few cases, with 440 pixels and the thumb parameter in [[Image:...]], the image will not fit in the column width when transcluded.
- I also noted the problem with Plate 32, figure 1, but the order of the plates makes sense: plates 31, 31* and the first part of plate 32 for this article, the second part of plate 32 for the other article. The description of plate 32 figure 1 is (Descr): "Plate 32. Fig. 1. Contortion of mica slate at Loch Lomond, p. 438." Phe (talk) 09:25, 30 March 2010 (UTC)
- (update) I found it in the Errata page (which was not visible in the Index:), Errata, so pl. 32 fig. 1 must really go after figure 3, perhaps immediately after rather than a little further down, where I put it. Phe (talk) 09:33, 30 March 2010 (UTC)
- Good. So the order of plates/figs seems to be p31f1,2, p31*f3 (only), p32f1 and finally p31* (complete). I shall adjust the captions and text as per the errata (unless you have already done so). I'll do all the errata if you wish. To prevent errata edits being undone, I suggest a comment in the edited text. Peter Mercator (talk) 10:21, 30 March 2010 (UTC)
- I didn't apply the Errata (or rather, I don't remember applying them to any volume of the GSL); a comment in the text for errata will be fine. Phe (talk) 10:28, 30 March 2010 (UTC)
- Wikisource:Scriptorium is a good place for general-purpose and technical questions. Perhaps someone will know how to get the text for a set of pages. It's probably not difficult to do with pywikipedia's scripts. Phe (talk) 09:33, 30 March 2010 (UTC)
Sorry. Just hacking away at the moment. Give me a chance to get on an even keel. Peter Mercator (talk) 15:20, 30 March 2010 (UTC)
- Oops, I was confused by the time of your edit. I thought it was two hours ago, but that's the offset from my local time to UTC... Phe (talk) 15:22, 30 March 2010 (UTC)
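As a minimal sketch of the kind of script suggested above, assuming today's MediaWiki Action API (the `prop=revisions` query with `rvslots=main`) rather than the 2010 pywikipedia scripts, the current wikitext of a run of Page: pages can be fetched in one request and joined into a single file for a text editor such as emacs. The index file name `Example.djvu` and all helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# English Wikisource Action API endpoint.
API = "https://en.wikisource.org/w/api.php"


def page_titles(index_file: str, first: int, last: int) -> list[str]:
    """Page: titles such as 'Page:Example.djvu/448' for an inclusive range."""
    return [f"Page:{index_file}/{n}" for n in range(first, last + 1)]


def build_query_url(titles: list[str]) -> str:
    """URL requesting current wikitext (normal accounts may ask for up to 50 titles at once)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
        "formatversion": "2",
        "titles": "|".join(titles),
    }
    return API + "?" + urllib.parse.urlencode(params)


def join_wikitext(response: dict, titles: list[str]) -> str:
    """Concatenate page texts in the requested order; missing pages become empty strings."""
    by_title = {}
    for page in response["query"]["pages"]:
        revisions = page.get("revisions")
        if revisions:  # missing pages carry no revisions
            by_title[page["title"]] = revisions[0]["slots"]["main"]["content"]
    # The API does not promise request order, so reorder by our own list.
    # Titles are matched as returned, so pass already-normalised names (spaces, not underscores).
    return "\n".join(by_title.get(t, "") for t in titles)


def fetch_pages(titles: list[str]) -> str:
    """Network call; Wikimedia asks clients to send a descriptive User-Agent."""
    request = urllib.request.Request(
        build_query_url(titles),
        headers={"User-Agent": "page-text-sketch/0.1 (hypothetical example)"},
    )
    with urllib.request.urlopen(request) as resp:
        return join_wikitext(json.load(resp), titles)
```

For example, `fetch_pages(page_titles("Example.djvu", 448, 451))` returns one string that can be written to a file and searched with emacs; longer ranges would need to be split into batches of 50 titles.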
- OK, I give in. The link on page 448 now goes to the appropriate description section but within the section the three links to p31, p31*, p32 move down the page to plates 31, 31, 31*. What's going on? The anchors on the figures seem ok. Is there any significance that the page links adjacent to many plates have two numbers superimposed? (BTW, I'm in Edinburgh UK and now on UTC.) Peter Mercator (talk) 15:59, 30 March 2010 (UTC)
- I don't see this behaviour: from the description, the three links go to separate plates. The duplicate page number shouldn't change anything, since from the description we don't link to a page number but to #Plate xxx. Perhaps you are trying the link from On the Geology of various parts of Scotland in main:, but after you modified page 448 you didn't reload the page, so you are still using the version without the corrected link. Phe (talk) 16:12, 30 March 2010 (UTC)
- I have it. My default text zoom in FF is 120%. Resetting this to 100% cured the problem. (Creeping senility caused typo: I'm now on UTC+1). Peter Mercator (talk) 16:30, 30 March 2010 (UTC)
Well that's my first WS article done and dusted. I now understand a little about WS but I feel that I've just scratched the surface. I haven't validated the four pages starting at 448. Perhaps you can check that the figs and captions all work together. (I did check the text). Peter Mercator (talk) 22:19, 30 March 2010 (UTC)
- I made two changes ([2]). First, the added comment is not useful once the page is transcluded: the referenced figures are shown just above, and the meaning of "in some of the figures" is evident enough (that sort of thing is also a matter of taste, but I preferred to keep the text as close as possible to the original). The second change follows the same rationale: I applied the errata as is, without any change (except pl. 31 --> pl. 31*), even if your wording was better. I'll leave it to you to validate these four pages. (I could advance the proofread state, but the intent of validation is that only a different editor can advance the state a second time, so that a page reaches "validated" only after it has been checked by two different people; in some cases the software enforces this policy, but not in this case.) Phe (talk) 06:58, 31 March 2010 (UTC)
Geological Transactions: titles and authors
I presume that the readers of the future will probably arrive via the page for Geological Society of London and will then head off to, say, volume 2. They may be armed with an author name, say John MacCulloch, and a (true) title, say "XVIII. Miscellaneous Remark accompanying a Catalogue of Specimens transmitted to the Geological Society." Well, where are they? Your short title for this article, "On the Geology of various parts of Scotland", is not an obvious choice and there is no sign of names. Of course one can find the article by going into the actual contents page and then jumping to the text. However, I find the clash between full titles and your short titles a little confusing, particularly when both are visible at the same time, as here. Moreover, on the author page for MacCulloch we find the invented short titles: surely one must have the actual titles listed here. Sorry to stir up problems, but I was very puzzled when I started editing MacCulloch's paper. How puzzled will the default reader be in years to come? Perhaps there should be no use of short titles? Peter Mercator (talk) 11:57, 31 March 2010 (UTC)
- I hesitated a lot over that when I started to work on GSL publications, but first a remark: I didn't invent these titles. The short titles come from the running header at the top of each page, and it's common practice to use the short title variant to refer to such papers. Compare a search on the short title vs. the long title: both are used, and when the title is very long, the short title is always used more often than the long one.
- Besides that, I agree it's problematic to show only the short title everywhere, especially in the case you pinpoint, where the short and long titles are completely different. Perhaps we should transclude the three Contents pages instead of using a short title summary in Transactions of the Geological Society, 1st series, vol. 2, but 1) we would lose the link to Plates and Maps, and 2) we would get exactly the same problem you describe: how will people figure out that "On the Geology of various parts of Scotland" is identical to "Miscellaneous Remark accompanying ..."? For the author pages, I don't know whether it's better to use the short or the long title. For the article title itself and the prev/next article links in each article, the short title is the only viable option, as article title length is limited to 255 bytes. It would be interesting to check which title variant is used inside these books themselves: the short one, the long one, or both? Phe (talk) 12:49, 31 March 2010 (UTC)
- This game gets more interesting by the day. Sorry to accuse you of invention; I should have spotted that you were using the short titles. As you point out both long and short titles are used in the literature. So may I just add one or two tentative suggestions for I know that you have already invested a great deal of thought and effort on GSL. Since a reader might enter searching for long title/ short title/ author, perhaps the main entry for Transactions of the Geological Society, 1st series, vol. 2 could be structured as follows:-
- Prelims: Title page/officers/other notes/actual contents pages --- transcluded together
- Articles (not a linked item, just an entry)
- Donations
- Index
- Plates/Maps
- followed immediately with a four column list showing
- Long title/Short title/author(s)/pagenumber link.
- I don't know how your data is structured, so I can't assess whether such a page could be easily constructed. Other points: does the prev/next banner have to use any titles at all? Simple prev/next 'buttons' could possibly suffice. Possibly the index page could have the same four-column table instead of the present list of short titles? (But does it need titles at all?) Finally, I suppose that the author pages should probably include both long and short titles! Peter Mercator (talk) 14:48, 31 March 2010 (UTC)
- I've started a test page, but I don't have a lot of time to enhance it (the first two tests are constrained to a narrow column, but I don't know if that's a good idea). Don't worry about the red links; they turn blue when the transclusion is done on the right page. It already shows a few problems: transclusion of the contents is easy and reuses existing pages, but do we need the long author description? "By J. Mac Culloch, M.D. F.L.S. Chemist to the Ordnance, and Lecturer on Chemistry at the Royal Military Academy at Woolwich, and Vice-President of the Geological Society". At the bottom of the test page I've added the first row of a table showing all the information we need; perhaps we need only this table (besides, an advantage of a table is that we can make it sortable by column (example)). For the index, I think we need to keep a summary, as is the habit on most Index: pages, but we can't use the four- or five-column table of contents as on the test page; it's probably too wide... As for the data structure we have, it actually consists only of the Page:* pages. We could also add some sort of markup tag in Page:* to be able to transclude a specific portion of a page, but the Page:* code would quickly become clumsy. Conversely, we could transclude a portion of the Main: page in the index to avoid code duplication, but that's tedious to do if we go with the table solution. If you have ideas, feel free to test them on the test page by adding a new section or modifying the existing one. Phe (talk) 16:32, 31 March 2010 (UTC)
Perhaps what we have to do is to add a rubric to the main page for volume 2. The structure of the page would then have five elements:
- A left bold heading saying simply End matter.
- Four links in column for prelims/donation/index/plates. (Prelims links to pp i-ix)
- A left bold heading saying simply Articles.
- A rubric along the following lines: "The following list of authors and titles is supplemented by a list of the short titles used in the running heads. The short titles are frequently used in references to this volume from later volumes in the series." (And from other publications?)
- The actual table. After the above rubrik it is ok to use "Short title" in the header line. Is issue required? (Perhaps later volumes do have divisions into issues?)
Once the above distinction has been made, the prev/next structure is OK. Perhaps just leave the index page as it is. There is no need for "long authors" on the main page: their qualifications and affiliations appear elsewhere. Peter Mercator (talk) 11:13, 1 April 2010 (UTC) (Update) I have added an example of the above layout to your test page. Peter Mercator (talk) 14:04, 2 April 2010 (UTC)
- I've added one more section to the test page that omits the short title when it is identical to the start of the title; I think it is obvious enough that On certain Products obtained in the Distillation of Wood refers to the same article as On certain Products obtained in the Distillation of Wood, with some account of Bituminous Substances, and Remarks on Coal. Phe (talk) 15:09, 3 April 2010 (UTC)
One or two final mods to the rubric and layout. I'm perfectly happy to go ahead with this format (perhaps with the table centred and perhaps no italics). I'm prepared to help you on these pages for the four volumes. I would have to hack away with emacs macros, but you may be able to do better with scripts. If the prelims page I constructed is OK, then it could be copied over with the appropriate name, and then we need three more. Cheers, Peter Mercator (talk) 21:56, 3 April 2010 (UTC)
- Table centered. I increased the font size to 90%; either we need a 90% font size or we should remove the italics for short titles. I can't devote time to doing the real work for at least a few days. Phe (talk) 08:04, 4 April 2010 (UTC)
Further mods. Narrowed the text (text lines were too long). Moved the author to the right of the title, as in the actual contents. Played with the markup as a learning exercise. An apology: when you originally had a column headed 'issue', I thought you implied the volume had been published in issues and then bound. I don't think this is the case. The numbers are simply serial numbers within the volume; they should be present, but I don't think the column needs a title. Let's defer major action for a few days until this page is stabilised. Peter Mercator (talk) 21:47, 4 April 2010 (UTC)
- I've duplicated the last line to try to put the short title on the same line but right-aligned; it doesn't work... Do we really need a line break between the long and short titles? Phe (talk) 09:33, 5 April 2010 (UTC)
There is only one instance of a different short title in this journal, so I think we can afford a line break so that it stands out clearly. I don't know if similar problems arise in other volumes. The link to the title etc. is now to a new page for all the prelims (here). I have added appropriate prev/next pages in the prelims and the first article. I haven't deleted the link to 'Contents' on the main page, but it should probably now go. I'm ready to edit the other titles onto the trial page, and I'll go ahead now unless you object. Peter Mercator (talk) 20:03, 5 April 2010 (UTC) Update: decided to have a go anyway. What do you think? Peter Mercator (talk) 22:43, 5 April 2010 (UTC)
- Yes, time to go ahead. I've added a final test version with sortable columns for page number and author, but I used span style="display:none" directly to specify the sort key; perhaps we should import the relevant templates from w:en:Category:Sorting templates, unless they already exist on ws: Phe (talk) 09:58, 6 April 2010 (UTC)
The sorting is a neat trick. What now? Will you put up the page or should I? As for the future I'm happy to tackle the contents of the other volumes, but not in a rush. It's time to proofread a little more. (Have just replied re 'always'). Peter Mercator (talk) 21:33, 6 April 2010 (UTC)
- I must admit I prefer to proofread too. Phe (talk) 07:18, 7 April 2010 (UTC)
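The table layout agreed above (serial number, long title with the running-head short title underneath only when it differs, author, page, with hidden sort keys) can be generated rather than typed. The following is a minimal sketch, assuming the hidden `<span style="display:none">` approach described in the thread; the function names, link targets and sample values are hypothetical.

```python
def hidden_key(key: str) -> str:
    """Invisible text placed before the visible cell content so the column sorts on it."""
    return f'<span style="display:none">{key}</span>'


def contents_row(serial: str, title: str, short_title: str, author: str,
                 author_key: str, link: str, page: int) -> str:
    """One wikitext table row: serial, title (+ short title), author, linked page number."""
    title_cell = title
    # Show the running-head short title only when the long title does not start with it.
    if short_title and not title.startswith(short_title):
        title_cell += f"<br />''{short_title}''"
    cells = [
        serial,
        title_cell,
        hidden_key(author_key) + author,
        # Zero-padding makes a text sort agree with numeric page order.
        hidden_key(f"{page:04d}") + f"[[{link}|{page}]]",
    ]
    return "|-\n| " + " || ".join(cells)


def contents_table(rows: list[str]) -> str:
    """Wrap rows in a sortable wikitable with an untitled serial-number column."""
    header = '{| class="wikitable sortable"\n! !! Title !! Author !! Page'
    return "\n".join([header, *rows, "|}"])
```

On current MediaWiki versions, the documented alternative to a hidden span is a `data-sort-value` attribute on the cell, which would replace `hidden_key` without changing the rest of the sketch.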
GSL index
I have made some minor changes to the index page for vol2, here. I have added serial numbers to the first two articles in the contents list and indicated the pages on which these articles start. My motivation is that the index page doesn't indicate the pages where the articles start. The changes help editors (I hope). Is there a better way? I shall complete these mods if you agree. Once again I'm thinking of how best to set up pages for the future volumes. Peter Mercator (talk) 21:13, 10 April 2010 (UTC)
- Fine by me; I made a minor change. Besides this change, I think the /Prelims page shouldn't use an abbreviation but should be moved to /Preliminary pages. Phe (talk) 09:28, 11 April 2010 (UTC)
- Shall make further mods along lines of above. In retrospect I think it would have been better to have separate pages for (a) Preliminary pages and (b) Contents pages. Peter Mercator (talk) 14:29, 11 April 2010 (UTC)
GLS volume 1 problem
Please look at this page. The page name is incorrect: it should be "Transactions of the Geological Society, 1st series, vol. 1/On the Geology of some parts of Hampshire and Dorsetshire". Do you have admin rights to move the page? You will also see that the title of this page is incorrect: it has been added as "On the Wrekin . . ." (hence the prev/next header bar is nonsense). I'm assuming that I can't move the page without admin rights. I could of course create a new page with the correct name, but that doesn't seem the right way. (You must have been having a bad day. Look at the first word of the article!) Peter Mercator (talk) 21:38, 10 April 2010 (UTC)
- A gross error, good catch. No admin rights are needed to move a page, but fixing the links can be a bit tricky: many of them come from Page:*, but "what links here" doesn't show them because convoluted templates are used to create these links. I found another error: On the physical Structure of Devonshire and Cornwell --> Cornwall; I'm fixing it too. Phe (talk) 09:05, 11 April 2010 (UTC)
Phil Tran
Now moving on to validate Phil Tran volume 4. Perhaps you might like to do the same for numbers 1 and 2. I shall leave number 3 for someone else to validate, since I have no wish to edit a single page which uses 'long s'. I didn't use your first-line indent style because it screws up the positioning of the drop cap (after transclusion only). Is there a way to have a drop cap for the first para (of a section) and a first-line indent for all subsequent paras? In general I have always used no first-line indent for the first para of a section and indents for subsequent paras. Peter Mercator (talk) 20:00, 13 April 2010 (UTC)
- There are two problems with the dropcap and indent through CSS. The first letter (the figure) is shifted to the right. The second letter is shifted too, creating a big gap between the first letter and the second. I only know how to fix the second problem, by using a double line feed before the dropcap; see page 57. Phe (talk) 08:54, 14 April 2010 (UTC)
- (update) Do you think we should stop indenting through CSS and use {{gap}} all over the page? Phe (talk) 16:47, 14 April 2010 (UTC)
- Priorities? It is clear that the drop cap must look OK in the final transcluded version. If the only way of achieving this along with subsequent indents is to use gap commands, then perhaps we should do this. This is a small pain at the moment, but once again it is probably good to agree on a solution before launching into the next hundred volumes. Peter Mercator (talk) 21:23, 15 April 2010 (UTC)
- I changed the dropcap template so that it never indents, so it's usable now, except that it must be preceded by a double line feed to ensure the text following the dropcap is not indented. Phe (talk) 06:48, 25 April 2010 (UTC)
GLS vol2
Phe. I've been doing a fair amount of editing, as you will realise. Most of it should be unexceptionable apart from this page. There were a few errors in the table, which I have corrected, but I moved the status back to proofread because I think the table should be rechecked. Is moving the status back an acceptable action? The other edit may upset you: removing the LaTeX fractions and using 'standard' fractions. My reasons are entirely aesthetic, for the LaTeX fractions are just too ugly. OK, they mimic the upright fractions of the original, but they are much too large, and fuzzy into the bargain. If you object strongly then I'll undo these edits. If you accept them then the rest of the table needs attention. (Of course this method breaks down for fractions that have no 'standard' symbol.) Peter Mercator (talk) 17:00, 25 April 2010 (UTC)
- If you found real errors (I mean things other than obvious misspellings, where the reader can have little doubt about the correct word), moving back the status is fine by me; moving back is fine too for a table or text that is difficult to read. I started by using LaTeX everywhere, but I tend to use Unicode fractions nowadays, so these changes are fine by me. By the way, I commented in the above section about the dropcap template. Phe (talk) 18:17, 25 April 2010 (UTC)
- Fine. Signing off for a week! Peter Mercator (talk) 21:51, 25 April 2010 (UTC)
Magic indexing
Gday Phe. With your magical touch, I was wondering whether you would be able to address the index pages
To note the n and et seq. too. Thanks. — billinghurst sDrewth 13:27, 28 May 2010 (UTC)
- Done Phe (talk) 18:30, 14 June 2010 (UTC)
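The "index linking" requested in this and the following sections turns printed page numbers in a book's index into links to the matching Page: pages, keeping suffixes such as "n" and "et seq." outside the link. Phe's actual script is not shown here; this is a hypothetical Python sketch that assumes a constant offset between printed page and DjVu page (which breaks where unnumbered plates are interleaved) and treats only numbers that follow ", " or "; " as page references.

```python
import re

# A page number preceded by ", " or "; " and followed (optionally via " n", " n."
# or " et seq.") by a comma, semicolon, full stop or the end of the line.
PAGE_REF = re.compile(r"(?<=, |; )(\d+)(?=(?: n\.?| et seq\.)?(?:[,;.]|$))")


def link_index_line(line: str, index_file: str, offset: int) -> str:
    """Replace printed page numbers with links to Page:<index_file>/<number + offset>."""
    def to_link(m: re.Match) -> str:
        printed = m.group(1)
        return f"[[Page:{index_file}/{int(printed) + offset}|{printed}]]"

    return PAGE_REF.sub(to_link, line)
```

For instance, with a hypothetical offset of 8, `link_index_line("Arundel, 12, 45 n, 67 et seq.", "Example.djvu", 8)` links 12, 45 and 67 to DjVu pages 20, 53 and 75 and leaves "n" and "et seq." as plain text. A year that happens to follow a comma would still be linked, so the output needs a human check.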
Index linking
Gday Phe. I would appreciate it if you could do your extra special index linking for the work Highways and Byways in Sussex. The indexes start at Page:Highways and Byways in Sussex.djvu/469. Thanks. — billinghurst sDrewth 12:02, 23 June 2010 (UTC)
- Done Phe (talk) 08:18, 24 June 2010 (UTC)
Index linking for Index:Essays in librarianship and bibliography.djvu
Gday. I would appreciate it if you were able to perform your index linking magic on the pages
- Page:Essays in librarianship and bibliography.djvu/361
- Page:Essays in librarianship and bibliography.djvu/362
- Page:Essays in librarianship and bibliography.djvu/363
Thanks if you are able to do so. — billinghurst sDrewth 02:20, 2 August 2010 (UTC)
- Done. Phe (talk) 14:21, 2 August 2010 (UTC)
Aboriginal welfare
Thanks for that and for cleaning up after me at Commons. How did you do that? And also, do I need to confirm the copyright thing beyond the National Library saying it's out of copyright in Australia? Misarxist (talk) 12:04, 15 August 2010 (UTC)
- For the copyright, I think the claim of the National Library is sufficient, but I'm pretty weak on copyright issues. For the djvu, I used tesseract to do the OCR, but it is tedious: I needed to treat each image to strengthen the character drawing. I have no easy how-to for it; rather, it involved using djvulibre tools to extract the images, ImageMagick tools to treat each image, tesseract to do the OCR, and an additional pass through a sort of spell checker to clean it up a bit, and even with all that the text layer is full of errors. Phe (talk) 12:42, 15 August 2010 (UTC)
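The per-page pipeline described above (djvulibre to extract each page, ImageMagick to treat the image, tesseract to OCR it) can be scripted. This is a hypothetical Python sketch, not Phe's actual procedure: the ImageMagick filters (`-threshold 60%`, then `-morphology Erode Diamond:1` to thicken dark strokes) are guessed values to tune per book, `convert` is the ImageMagick 6 command name (version 7 uses `magick`), and the spell-checking pass is left out.

```python
import subprocess
from pathlib import Path


def page_commands(djvu: str, page: int, workdir: str = ".") -> list[list[str]]:
    """Commands for one page: extract it as TIFF, clean it up, then OCR it."""
    base = Path(workdir) / f"p{page:04d}"
    raw, clean = f"{base}_raw.tif", f"{base}_clean.tif"
    return [
        # djvulibre: render a single page of the DjVu file to TIFF.
        ["ddjvu", "-format=tiff", f"-page={page}", djvu, raw],
        # ImageMagick: grayscale, binarise, then erode the white background so
        # dark character strokes get thicker. Filter values are assumptions.
        ["convert", raw, "-colorspace", "Gray", "-threshold", "60%",
         "-morphology", "Erode", "Diamond:1", clean],
        # tesseract writes the recognised text to <base>.txt.
        ["tesseract", clean, str(base), "-l", "eng"],
    ]


def ocr_pages(djvu: str, pages: range, workdir: str = ".") -> None:
    """Run the three steps for each page, stopping at the first failing command."""
    Path(workdir).mkdir(parents=True, exist_ok=True)
    for page in pages:
        for cmd in page_commands(djvu, page, workdir):
            subprocess.run(cmd, check=True)
```

Calling `ocr_pages("book.djvu", range(1, 11), "ocr")` would leave `ocr/p0001.txt` to `ocr/p0010.txt` for the clean-up pass; the file name is hypothetical.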
Sysop
Hi Phe,
You are now a sysop. If you could add any other languages you're familiar with to the table in WS:ADMINS, that'd be great.—Zhaladshar (Talk) 14:54, 17 August 2010 (UTC)
- Thanks :) Phe (talk) 15:07, 17 August 2010 (UTC)
- Congrats. — billinghurst sDrewth 09:49, 18 August 2010 (UTC)
Index pages of Index:Elizabethan People.djvu
Gday. I am here to ask for your time and skills to run your magic tool over 10 index pages of the WS:PotM … Page:Elizabethan People.djvu/535 through Page:Elizabethan People.djvu/544 if you are able to do so. Appreciate it if you are able. Thanks. — billinghurst sDrewth 15:37, 21 August 2010 (UTC)
- Done Phe (talk) 16:19, 21 August 2010 (UTC)
Looking for some magic with Author search and links
Can you think about some potential magic? I am wanting a lazy-boy approach to having a search link(s) when operating in the Author: namespace that enables me to:
- perform an enWS search (prime focus, and primarily main namespace), here looking to identify WORKS ABOUT AUTHOR
- potentially expand to an enWP search and a Commons search, primarily for existing pages where we are looking for related articles or possible images
- and, more speculatively, a search into the Author namespaces of other sister Wikisources; for example, for a German author I might want to see if they have a German page, and then I can interwiki both ways at that time.
I see the tool more relevant for those of us who construct author pages, and do the more gnomic work. Not sure whether it is a gadget or just something that I would insert into local user files. Thanks for your thought power here. — billinghurst sDrewth 02:08, 25 August 2010 (UTC)
- The first two are difficult to do reliably; lookup in plain text (as opposed to lookup in metadata) is difficult. I didn't find any library catalogue with the API needed to allow some sort of lookup to get an author's works or works about an author. I'll look at the third later. Phe (talk) 14:27, 25 August 2010 (UTC)
- If it came to a plain text, that is better than nothing. For WS/WP/Commons, even some sort of intitle: searching may be better than nothing for existing articles, even the ability to undertake the search that you take for new author pages would be useful, seeing that it is a sort of check that can be useful to run against pre-existing author pages where they may not have changed for a number of years. I wasn't thinking of a perfect tool, I am thinking of a tool that provides some ease to those who are doing the repetitive maintenance gnoming tasks through the site. — billinghurst sDrewth 02:19, 26 August 2010 (UTC)
- I've done the third, but it's limited and half broken, add to your monobook importScript('User:Phe/Interwiki.js'); and regexTool('Adding interwiki', 'add_interwiki()'); to your rmflinks() function. It check only for article in it:, pt: and fr:, see the comments at begin of User:Phe/Interwiki.js why this is done this way. Beside that, it adds iws blindly, if iws already exists some can be duplicated. The script doesn't sort the added iws. It's also of extremely limited use, there is not a lot of page with identical author name. You can test it on Author:William Stanley Jevons.
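The two limitations noted above (interwikis added blindly, so duplicates can appear, and not sorted) could be handled by a merge step that collects existing links first. This is a hypothetical Python sketch of that step, not a patch to the JavaScript gadget: it assumes interwikis sit on their own lines as `[[code:Title]]`, and its language-code pattern is a simplification.

```python
import re

# An interwiki link on its own line, e.g. [[fr:Auteur:William Stanley Jevons]].
# Lower-case codes only, so [[Category:...]] and [[File:...]] are never matched.
IW = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\]|]+)\]\]")


def add_interwikis(text: str, found: dict[str, str]) -> str:
    """Merge newly found interwikis with existing ones: no duplicates, sorted by code."""
    kept, existing = [], {}
    for line in text.splitlines():
        m = IW.fullmatch(line.strip())
        if m:
            existing.setdefault(m.group(1), m.group(2))  # first existing link wins
        else:
            kept.append(line)
    merged = {**found, **existing}  # links already on the page override new guesses
    body = "\n".join(kept).rstrip()
    if not merged:
        return body + "\n"
    links = "\n".join(f"[[{code}:{title}]]" for code, title in sorted(merged.items()))
    return f"{body}\n\n{links}\n"
```

Because existing links take precedence, running the merge twice on the same page leaves it unchanged.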
- If it came to a plain text, that is better than nothing. For WS/WP/Commons, even some sort of intitle: searching may be better than nothing for existing articles, even the ability to undertake the search that you take for new author pages would be useful, seeing that it is a sort of check that can be useful to run against pre-existing author pages where they may not have changed for a number of years. I wasn't thinking of a perfect tool, I am thinking of a tool that provides some ease to those who are doing the repetitive maintenance gnoming tasks through the site. — billinghurst sDrewth 02:19, 26 August 2010 (UTC)
- For your two first request, are you searching ala Template:Search author but with more links and added somewhere automatically ? I'm unsure how and where to add it and how to hide it for people who don't want it or ips. Phe (talk) 07:50, 26 August 2010 (UTC)
- Ermm, the first can be done partially, I implemented it, add importScript('User:Phe/Works about.js'); to your javascript and regexTool('Works about', 'works_about()'); to your rmflinks() function. This script works only if you use the preloading header gadget. Try it on Author:Johann Joachim Becher. Caveats: as you can see the section is added at start of the author: page, adding it at end it error prone because the script in this case act as if nothing occur. The script doesn't try to handle already existing works about entry, it's up to you to remove/move the code. The script handle only a few case, see the begin of the script or ask me with the links you want to be checked by the script, useful information is, a link to an existing article, the way to create a link to such works (template or direct link). Phe (talk) 13:00, 27 August 2010 (UTC)
New author page creation tool
A buglet I have found with the new creation tool occurs when creating a page whose last component is not a word, e.g. Author:Andrew Balfour (1630-1694). It takes the parenthetical as the last name. Could it ignore parenthetical words and take the preceding word instead? If that is a little tricky, no bother, as I can amend it manually. — billinghurst sDrewth 04:16, 25 August 2010 (UTC)
- Done. I also handle a few special cases (von, van, de and le), but capitalized Van or Von remains unhandled. Phe (talk) 13:04, 25 August 2010 (UTC)
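The fix described above (skip parenthetical parts, recognise the lowercase particles von, van, de and le) can be sketched as follows. This is a hypothetical illustration, not the code in User:Phe/Author fill.js; keeping the particle attached to the surname is one possible convention and an assumption here.

```javascript
// Hypothetical sketch, not the code in User:Phe/Author fill.js.
// Lowercase particles are kept with the surname; "Van"/"Von" are deliberately
// not listed, mirroring the limitation described above.
const PARTICLES = new Set(["von", "van", "de", "le"]);

function lastNameOf(pageTitle) {
  const name = pageTitle
    .replace(/^Author:/, "")
    .replace(/\([^)]*\)/g, " ") // drop parenthetical parts such as "(1630-1694)"
    .trim();
  const words = name.split(/\s+/);
  let start = words.length - 1;
  // Walk back over particles so "Ludwig van Beethoven" gives "van Beethoven".
  while (start > 0 && PARTICLES.has(words[start - 1])) start--;
  return words.slice(start).join(" ");
}
```

With this sketch, Author:Andrew Balfour (1630-1694) yields "Balfour", while Author:Anthony Van Dyck still yields only "Dyck" because the capitalized particle is not recognised.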
Chronicle of the Grey friars of London index pages
Gday Phe. At some point, would you be so kind as to do your indexing magic on Page:Chronicle of the Grey friars of London.djvu/145 and the three subsequent pages? Thanks. — billinghurst sDrewth 13:06, 26 August 2010 (UTC)
- Done Phe (talk) 16:15, 26 August 2010 (UTC)
Hi, thanks for putting the book together. (I don’t know how to do that yet.) I love the story because it is about the Cornish coast! Best regards, Another editor (talk) 11:39, 30 August 2010 (UTC)
- Answer on your talk page. Phe (talk) 11:40, 30 August 2010 (UTC)
Gday Phe. Usual pleading. Would you please apply your indexing magic to Picturesque New Zealand/Index and its Page: namespace components? Thanks. — billinghurst sDrewth 14:45, 11 September 2010 (UTC)
- Done. — Phe (talk) 13:00, 20 October 2010 (UTC)
Gday Phe, would you please run your magic script to link the page numbers to the pages? Thanks. — billinghurst sDrewth 11:38, 18 October 2010 (UTC)
- Done. — Phe (talk) 13:00, 20 October 2010 (UTC)
Bot errors
Thank you, I replied on your Italian Wikisource talk page. --Aubrey (talk) 16:01, 21 October 2010 (UTC)
The robot seems to be stuck. I have 3 jobs (actually, 3 times the same job) in the match queue that never get completed. Paolo81 (talk) 20:39, 22 September 2011 (UTC)
- Fixed — Phe 01:11, 23 September 2011 (UTC)
- Thanks, but something weird happened which I reported on my talk page and forgot to post here. --Paolo81 (talk) 18:16, 27 September 2011 (UTC)
- What happens is that when I open Extract from Captain Stormfield's Visit to Heaven/Chapter II and have a look at the page links on the left of the text, when I reach page 64 the links are overlaid one on top of the other, then from page 67 on they appear normal again. I don't know whether it's only me who can see that. --Paolo81 (talk) 20:03, 29 September 2011 (UTC)
- Tried emptying the cache but no luck. Finally opened it with Firefox instead of Safari and it shows ok. Mah! --Paolo81 (talk) 20:06, 29 September 2011 (UTC)
- OK, I see the problem with Chrome too; there's a bug somewhere ;( — Phe 12:56, 1 October 2011 (UTC)
Indexing on A Thousand-Mile Walk To The Gulf/Index
If you could do your magic here it would be appreciated. Thanks. — billinghurst sDrewth 02:40, 21 November 2010 (UTC)
- Done — Phe (talk) 10:23, 21 November 2010 (UTC)
OCRing EB1922 DjVu files
Hi Phe, You did text layers for the EB1911 DjVu files on commons, and the results are very nice. Would you be able to do the same for the three EB1922 volumes (30, 31, 32)? Thanks, Htonl (talk) 11:57, 15 December 2010 (UTC)
- Actually, never mind, as I see the files do have text layers already. Now I need to figure out why the text isn't automatically being put in the edit box for the Page:'s. - Htonl (talk) 12:03, 15 December 2010 (UTC)
- It happens from time to time: the file upload succeeds but the text layer is not properly handled, and you need to purge the File: page. There is a gadget on Commons that adds a « purge » link to the left menu (but I guess you already found this trick :). — Phe (talk) 19:44, 18 December 2010 (UTC)
- Well, I didn't find it myself, but Zhaladshar found it for me. Thanks. :) - Htonl (talk) 19:55, 18 December 2010 (UTC)
Author fill.js breaks in secure
Gday Phe. The script User:Phe/Author fill.js doesn't work from the secure server. I presume it is URL-related: https://secure.wikimedia.org/wikisource/en/wiki/... cf. https://secure.wikimedia.org/wikipedia/en/wiki/... — billinghurst sDrewth 04:25, 11 January 2011 (UTC)
- Works now, for both author_fill.js and MediaWiki:Gadget-TemplatePreloader.js. — Phe (talk) 16:58, 11 January 2011 (UTC)
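The thread does not say how the script was fixed. One common cause of this kind of breakage is a hard-coded host and path; a minimal sketch of the alternative, assuming the script builds index.php URLs itself, derives them from the wiki's own wgServer and wgScript settings (both exposed through mw.config). The helper name and the secure-server values below are illustrative assumptions.

```javascript
// Hypothetical helper: build an index.php URL from the wiki's own settings
// instead of a hard-coded "https://en.wikisource.org/w/index.php".
function indexPhpUrl(config, params) {
  // config mirrors the mw.config values wgServer and wgScript.
  const query = new URLSearchParams(params).toString();
  return config.wgServer + config.wgScript + "?" + query;
}

// Illustrative values for the old secure server (assumed, not verified).
const secure = {
  wgServer: "https://secure.wikimedia.org",
  wgScript: "/wikisource/en/w/index.php",
};
console.log(indexPhpUrl(secure, { title: "Author:Foo", action: "raw" }));
```

Inside a user script, the config object would come from mw.config.get(["wgServer", "wgScript"]), so the same code works on whichever server the page was loaded from.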
The Art of Bookbinding/Index and your index+ script
Phe, when you have a spare moment or three, I would appreciate it if you could run your special index script over The Art of Bookbinding/Index to pair the page numbers on the index pages with the chapters. Thanks. Billinghurst (talk) 11:09, 11 March 2011 (UTC)
- Did it for Page:The Art of Bookbinding, Zaehnsdorf, 1890.djvu/221, then self-reverted after seeing that most page numbers are wrong. — Phe (talk) 19:50, 11 March 2011 (UTC)
- <deskthunk> Good catch. I replaced the wrong parameter in my regex. Fixed and ready for a second try if you wouldn't mind. Thanks. Billinghurst (talk) 12:28, 12 March 2011 (UTC)
- Done — Phe (talk) 12:39, 12 March 2011 (UTC)
Indexes in need of treatment
Gday. When you have some time, would you be so kind as to run your indexing script over
Thanks. — billinghurst sDrewth 18:14, 23 July 2011 (UTC)
- Done. — Phe (talk) 11:06, 24 July 2011 (UTC)
- Thanks again. Could I also ask for Picturesque Nepal/Index? — billinghurst sDrewth 02:22, 27 July 2011 (UTC)
- Done — Phe (talk) 09:46, 27 July 2011 (UTC)
Match and split
I have raised some points on how this is being used. In summary:
- If the transcript is good, say a PG text, there is nothing to be done. Moving it to a scan is pointless.
- If the transcript is different, another or multiple editions, then it is very misleading to place it against a scanned edition. Detecting any differences, if attempted, is very difficult and time-consuming, much more bother than correcting OCR.
I don't think this bot should be automatically invoked, especially by users who are unaware of these considerations. CYGNIS INSIGNIS 00:22, 16 August 2011 (UTC)
- This user added these texts; it looks like you know better than he does which edition they come from... — Phe 00:38, 16 August 2011 (UTC)
- A general comment on how this is being used, but in the recent example I don't know anything, because PG doesn't specify which edition it is. Even investigating that takes more time, and I know from experience that PG texts do not "match". CYGNIS INSIGNIS 01:20, 16 August 2011 (UTC)
- Yes, PG failing to give edition info means we should match it with the nearest possible edition. — Phe 08:22, 16 August 2011 (UTC)
- Why should a second-hand transcript be used to replace the OCR text layer? CYGNIS INSIGNIS 09:32, 16 August 2011 (UTC)
- Because it's a good starting point. — Phe 12:44, 16 August 2011 (UTC)
Index magic
Gday Phe. When you have the time, would you be so kind as to apply your index magic to The Saxon Cathedral at Canterbury and The Saxon Saints Buried Therein/Index? Thanks. — billinghurst sDrewth 16:18, 23 August 2011 (UTC)
- Done, but you might consider asking for links only after page validation; it's much more difficult to validate the pages now that the code is full of templates. — Phe 16:54, 24 August 2011 (UTC)
PSM progress statistics
Hi. I saw your (I guess) statistics page here. My goal is to create statistics for PSM project. Before I saw your page I did something with the help of Hesperian, based on pywikipediabot. You can see some results here, User:Mpaa/Sandbox1, User:Mpaa/Sandbox2, User:Mpaa/Sandbox3. I have 2 questions:
1. Could your tool and graphs be customised/used to look only at PSM project pages so that we could use that instead? That would be much better than my newbie attempt.
2. If not, any suggestions on how I can extract not only the current status but also compute deltas in a smart way, given the API I am using? E.g. maybe using timestamps?
Thanks. --Mpaa (talk) 19:24, 24 October 2011 (UTC)
- 1. not easily.
- 2. I don't think there is any way to get past statistics; the only way is to gather statistics each day, save them somewhere and diff against previous days. I don't know which API you are actually using, but an efficient way is through api.php [3] (note how I set cllimit to twice gaplimit to simplify iteration, as it allows getting all the needed information with one query per volume). I don't think the needed code exists in pywikipedia. — Phe 15:22, 25 October 2011 (UTC)
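A minimal sketch of the approach in the reply above: one api.php query per volume (generator=allpages with prop=categories, cllimit set to twice gaplimit), a tally of the categories returned, and a diff between two daily snapshots. The exact query linked as [3] is not reproduced; namespace 104 for Page:, the default limit, and all function names are assumptions for illustration.

```javascript
// Hypothetical sketch; the query Phe linked as [3] is not reproduced here.
// Assumptions: namespace 104 is Page: on English Wikisource, and a limit of
// 250 keeps cllimit (twice gaplimit) within the 500 cap for normal accounts.
function volumeQuery(volumePrefix, limit = 250) {
  const params = new URLSearchParams({
    action: "query",
    format: "json",
    generator: "allpages",
    gapnamespace: "104",
    gapprefix: volumePrefix, // e.g. "Popular Science Monthly Volume 26.djvu/"
    gaplimit: String(limit),
    prop: "categories",
    cllimit: String(2 * limit), // twice gaplimit, as suggested above
  });
  return "https://en.wikisource.org/w/api.php?" + params.toString();
}

// Count how many returned pages sit in each category.
function tallyCategories(response) {
  const counts = {};
  const pages = (response.query && response.query.pages) || {};
  for (const page of Object.values(pages)) {
    for (const cat of page.categories || []) {
      counts[cat.title] = (counts[cat.title] || 0) + 1;
    }
  }
  return counts;
}

// Daily snapshots diffed against each other give the deltas.
function diffCounts(before, after) {
  const delta = {};
  for (const key of new Set([...Object.keys(before), ...Object.keys(after)])) {
    delta[key] = (after[key] || 0) - (before[key] || 0);
  }
  return delta;
}
```

Fetching the URL, storing each day's tally, and following API continuation for volumes larger than the limit are left out of this sketch.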
phe-bot bits
- The WebSockets stuff doesn't seem to work in FF7; this throws the error "your browser does not have websocket support; try Google Chrome or Firefox 4".
- I cannot remember the syntax for http://toolserver.org/~phe/stats_diff.txt to get it to display a longer period. I want to check the Page: ns traffic for all wikis back to when the LF error was introduced, to work out how much I would be letting myself in for if I bot-fixed it. Thanks. — billinghurst sDrewth 03:55, 29 October 2011 (UTC)
Typo
Thanks for fixing the typo! I am sure there are more, as I am still proofreading the article... slowly but surely! :) Londonjackbooks (talk) 14:48, 5 November 2011 (UTC)
Not sure if you can help ... — billinghurst sDrewth 22:45, 5 January 2012 (UTC)
How math-intensive pages are OCRd
Hi Phe,
I was intrigued to notice that Phe-bot seemed capable of OCRing pages like this one: http://en.wikisource.org/w/index.php?title=Page:Elements_of_the_Differential_and_Integral_Calculus_-_Granville_-_Revised.djvu/254
I am looking to index and OCR similarly math-intensive pages. How is it possible to convert them to MediaWiki LaTeX, as you seem to have done?
Thanks in advance, Danachandler (talk) 17:59, 13 January 2012 (UTC)
- It wasn't OCRed but split from a work already corrected in the main namespace [4] — Phe 21:17, 13 January 2012 (UTC)
At some point when you have an opportunity, it would be lovely if you could index A Desk Book on the Etiquette of Social Stationery/Index. Thanks. — billinghurst sDrewth 03:45, 21 January 2012 (UTC)
- Done. — Phe 14:07, 21 January 2012 (UTC)
OCR!!!
Thanks for repairing the OCR tool. However, the button still doesn't show up in the new toolbar. Replacing
function addOCRButton2(id,comment,source,onclick){ var tb = document.getElementById("toolbar"); if(tb){ ... tb.appendChild(image); } }
with:
function addOCRButton2(id,comment,source,onclick){ var tb = document.getElementById("wikiEditor-ui-toolbar"); if(tb){ ... tb.firstChild.lastChild.appendChild(image); } }
will make it work, but if you have a better way, please do it. Thanks! --Eliyak T·C 06:39, 24 January 2012 (UTC)
- Canduala did it. — Phe 23:06, 24 January 2012 (UTC)
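The two snippets above differ only in where the OCR button is attached. A minimal sketch that handles both toolbars, preferring the WikiEditor one when it is present; the function name is hypothetical and the firstChild.lastChild path is copied from Eliyak's workaround, so it depends on WikiEditor's markup at the time.

```javascript
// Hypothetical helper combining the two variants quoted above: prefer the
// WikiEditor toolbar when present, otherwise fall back to the classic one.
function ocrButtonParent(doc) {
  const wikiEditor = doc.getElementById("wikiEditor-ui-toolbar");
  if (wikiEditor) {
    // Same DOM path as the workaround above; it depends on WikiEditor's markup.
    return wikiEditor.firstChild.lastChild;
  }
  return doc.getElementById("toolbar"); // classic toolbar, or null if neither exists
}
```

addOCRButton2 could then call ocrButtonParent(document) and append the image only when the result is not null.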
You're invited to Wikimedia events in June and July: script, template, bot, and Gadget makers wanted
I'm sorry -- I only speak English.
I invite you to the yearly Berlin hackathon, 1-3 June. Registration is now open. If you need financial assistance or help with visa or hotel, then please register by May 1st and mention it in the registration form.
This is the premier event for the MediaWiki and Wikimedia technical community. We'll be hacking, designing, teaching, and socialising, primarily talking about ResourceLoader and Gadgets (extending functionality with JavaScript), the switch to Lua for templates, Wikidata, and Wikimedia Labs.
We want to bring 100-150 people together, including lots of people who have not attended such events before. User scripts, gadgets, API use, Toolserver, Wikimedia Labs, mobile, structured data, templates -- if you are into any of these things, we want you to come!
I also thought you might want to know about other upcoming events where you can learn more about MediaWiki customization and development, how to best use the web API for bots, and various upcoming features and changes. We'd love to have power users, bot maintainers and writers, and template makers at these events so we can all learn from each other and chat about what needs doing.
Check out the developers' days preceding Wikimania in July in Washington, DC, and our other events.
Best wishes! - Sumana Harihareswara, Wikimedia Foundation's Volunteer Development Coordinator. Please reply on my talk page at mediawiki.org. Sumanah (talk) 00:07, 9 April 2012 (UTC)
Would you be so kind as to run your indexing component trickster through this page and work? If you can, thanks. — billinghurst sDrewth 13:10, 20 April 2012 (UTC)
- Done, there is an error on Page:Horsemanship for Women.djvu/173, a reference to page 192 which doesn't exist. — Phe 15:50, 27 April 2012 (UTC)
one word displaced by _Match_ routine
I'm new to the use of Phe-bot but ran the Match (and Split) routine on the page The Book of the Thousand Nights and a Night/Volume 3 and it has put the first word of every page onto the previous page. Is this a known problem? If so, is it being fixed? Is this the right place to report errors? Chris55 (talk) 23:42, 13 July 2012 (UTC)
- Yes, it's a known problem. I took another look at it and there is little room to improve the bot. The trouble comes partly from the running header in the OCR, the notes at the bottom of some pages, and OCR errors (not many in this work). The texts never match exactly; look at the proposed text at the boundary of pages 17-18:
"Then she wept with sore weeping and waxed wroth and shuddered in my face with skin bristling [FN#1] and looked at me with" and the OCR: "TJaea she wept with sore weeping and waxed wroth and shuddered in VOL. III. A 2 Alf Laylah wa Laylah. my face with skin bristling^ and looked at me with"
The "VOL. III. A 2 Alf Laylah wa Laylah." part in the OCR makes the matching very approximate. — Phe 11:46, 14 July 2012 (UTC)
- Btw, that's why match and split is done in two steps: first the match, then a manual adjustment of the page boundaries (it's better to get the DjVu and use a viewer such as djview for that), and then the second step, splitting the text with the bot. — Phe 11:50, 14 July 2012 (UTC)
- Thanks for looking. Yes, there's a footer and header to deal with, but apart from that (which is fairly predictable) the scans are pretty good. The line starting with a number at the top is a giveaway that it's a running header and should be stored there.
- Possibly it doesn't matter (after all it will produce the right output), but since it gets so close and is relatively predictable in its mistakes I wonder whether it's worth looking at the code. Is it possible to have a look at it? Chris55 (talk) 17:13, 14 July 2012 (UTC)
- Remember it's a general-purpose tool working across many languages, book types and levels of OCR accuracy; it's not as predictable as it looks. The code is available at [5]; the match part is done in align.py, function do_match(). Perhaps a solution would be to systematically move the last word of a page to the next page, if this is a very common error. Note there is already an automatic fix-up in some cases (the part with a comment starting with # Move the end of the last page to the start of the next page ...); adding an if not match: # move the last word... after this if ... else will do the trick. — Phe 11:53, 15 July 2012 (UTC)
- Thanks, I really wanted to see how feasible it might be to add other common but optional tasks, such as converting to wiki line structure and dealing with headers/footers. But I'm really learning... Chris55 (talk) 12:27, 15 July 2012 (UTC)
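The fix-up Phe suggests (systematically moving the last word of a page to the next page) can be illustrated on already-split page texts. This is a JavaScript sketch of the idea only, not the Python code in align.py; the function name is hypothetical, and because it applies to every boundary it only suits works where each page is off by exactly one word, as reported above.

```javascript
// Hypothetical post-processing step, not the actual align.py code: move the
// last word of each page to the start of the following page.
function shiftLastWords(pages) {
  const out = pages.slice();
  for (let i = 0; i < out.length - 1; i++) {
    // Split into "text before the last word" and "last word".
    const m = out[i].match(/^([\s\S]*?)\s*(\S+)\s*$/);
    if (!m || m[1] === "") continue; // leave empty or one-word pages alone
    out[i] = m[1];
    out[i + 1] = m[2] + " " + out[i + 1].replace(/^\s+/, "");
  }
  return out;
}
```

For example, shiftLastWords(["a b c", "d e", "f"]) moves "c" and then "e" forward, giving ["a b", "c d", "e f"].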
Toolserver
Hi Phe, FYI, the toolserver froze on Tuesday 24th :-). Thanks for your work. --Aubrey (talk) 08:34, 28 September 2012 (UTC)
- Yes, replication of the DB stopped on that date; no idea when it'll recover. — Phe 17:24, 28 September 2012 (UTC)
Index links for Tracks of McKinlay and party across Australia/Index
Gday Phe. If you are still in the business of linking index pages, would you be so kind as to do your magic with the linked work? At the moment there are just a few pages left to be validated, so it should be ready to go shortly. Thanks if you can. — billinghurst sDrewth 14:52, 2 December 2012 (UTC)
- Done. — Phe 15:53, 3 December 2012 (UTC)
- Oops, my bad, I linked the TOC, not the index... — Phe 15:54, 3 December 2012 (UTC)
- I did the first, but you need to check [6]; the second page also contains a few errors. — Phe 16:01, 3 December 2012 (UTC)
Is there a replacement for ThomasBot?
Hi Phe, I know you've taken over some of the functions that ThomasBot used to perform. Has someone taken over the Dynamic Links function (to create the lists as in fr:Biographie universelle ancienne et moderne/2e éd., 1843/Tome 1)? Beeswaxcandle (talk) 04:02, 5 May 2013 (UTC)
oldwikisource:MediaWiki:Dictionary.js and EPUB
[move whole post] Hi Tpt,
How have you formatted frWS works that use the DL nomenclature of Dictionary.js so that they are exported by the EPUB tool? An example is A_Dictionary_of_Music_and_Musicians/A, which breaks. Thanks. — billinghurst sDrewth 04:03, 27 May 2013 (UTC)
- I don't know if there is any code related to EPUB export of frWS works that use the DL nomenclature. You should ask Phe (talk • contribs), who manages Dictionary.js and is also involved in WSexport. Tpt (talk) 14:44, 27 May 2013 (UTC)
- Moving the whole discussion to sit with Phe. :-) 07:24, 28 May 2013 (UTC)
- There is no support for exporting DL to EPUB; where do you see this? Besides that, I'm less and less convinced by DL, and discourage its use on fr (lack of categories in DL articles, lack of automated export of microformats; it looks like you have something like that in this example). Currently I prefer having a bot create dictionary articles, using section tags à la ## "Abegg" ## to get the article name and the page number. — Phe 22:09, 29 May 2013 (UTC)
Template:ALL TEXTS
{{ALL TEXTS}} is updated by a bot; can the same bot be used for other Wikisources like ours (bn.wikisource.org)? The statistic 1,074,104 is not the actual number of texts across all Wikisources. Please help. Jayantanth (talk) 18:02, 18 November 2013 (UTC)
- Answered on my talk page at wikisource.org. — Phe 16:07, 19 November 2013 (UTC)
Follow on from above
Forgive me if I’m missing something obvious, but what does Template:ALL TEXTS actually represent? Could you add some notes to the template or talk page? Moondyne (talk) 05:49, 23 January 2014 (UTC)
- Aha, got it from Wikisource:Administrators' noticeboard/snapshot. No. of pages in Main = 299,843. Moondyne (talk) 07:14, 23 January 2014 (UTC)
A tool to replace first page of djvu/pdf?
Heyho. Commons deletionists are now identifying that some Commons DjVu/PDF files may have the Google-added lead page, which they see as copyrighted and thus 'poisoning' the whole DjVu/PDF file, and therefore they delete it all. Is there some sort of CropBot-like tool that grabs the DjVu and/or PDF file, pulls it to Tool Labs, replaces the lead page with a blank page, and then puts the file back as an overwrite? Thanks. — billinghurst sDrewth 23:56, 21 September 2014 (UTC)
- Hmm, unlikely to be done, because accepting such removal is not compatible with CC-BY/CC-BY-SA. Some Commons users should read the licence they accept: "attribution – You must attribute the work in the manner specified by the author or licensor" ([7]), emphasis mine. I think the wording is simple enough; perhaps if they read it a dozen times they'll start to understand it. — Phe 18:22, 23 September 2014 (UTC)
- Besides that, they won't be happy with it: what about the watermark on each page? — Phe 18:43, 23 September 2014 (UTC)
- Stamping a work that is not yours with CC-BY/CC-BY-SA doesn't make it true. The watermark is a watermark; if watermarks can be stripped, then I am all for that too. — billinghurst sDrewth 00:12, 24 September 2014 (UTC)
- OK on that point, but it's also a simple matter of politeness: if the watermark is not hiding part of the image, I'm not at all in favour of removing it. — Phe 00:54, 24 September 2014 (UTC)
My recent efforts should have eliminated a few entries. Any chance of re-generating the list, or making it a dynamic labs query against the Wikisource database? ShakespeareFan00 (talk) 22:57, 24 September 2014 (UTC)
Index magic
Hi Phe. Would you mind applying your link indexing magic to Index:Divorce of Catherine of Aragon.djvu? It is all now validated. Thanks if you can. — billinghurst sDrewth 09:00, 7 November 2014 (UTC)
- Done. Ping me on IRC if I'm too slow to answer on my talk page. — Phe 14:34, 20 November 2014 (UTC)
Call me confused — Phetools statistics
Hi Phe. I am looking at a report covering 1 Nov to today (at the time of creation), and it shows me these results:
Difference between Fri Oct 31 2014 and Thu Nov 27 2014
language | Page: all pages | Page: not proofread | Page: problematic | Page: without text | Page: proofread | Page: validated | Main: all pages | Main: with scans | Main: without scans | Main: disambiguation | Main: percent |
---|---|---|---|---|---|---|---|---|---|---|---|
fr | 12126 | 3095 | -7 | 1148 | 7890 | 2018 | 1124 | 2743 | -1669 | 50 | 1.07 |
en | 10850 | 3916 | 218 | 677 | 6039 | 5680 | 1205 | 1347 | -169 | 27 | 0.31 |
…
If I interpret that report correctly, it means that enWS has moved only 359 pages to proofread-only status this month. If that is the case, I find it hard to believe, as I have done 100+ on one book alone (none validated) and Ineuw has been working on his PSM. Could you please explain this to me? — billinghurst sDrewth 01:00, 28 November 2014 (UTC)
- Hi, you get 359 from 6039 - 5680, but that has no meaning. What you have here is that during this period there were +6039 yellow pages and +5680 green pages; every number for a given state is the number of pages at the end of the period minus the number of pages at the beginning of the period. — Phe 23:36, 29 November 2014 (UTC)
- A note on the -7 problematic pages: it doesn't mean that 7 problematic pages have been fixed, but rather, for example, that if 10 pages were marked as problematic during this period, then 17 problematic pages were fixed, so overall there are 7 fewer problematic pages. The same applies to the other fields: +5680 can mean that 5780 pages were validated but 100 were downgraded or deleted. This is important for understanding the variation in yellow pages: if 1000 pages passed from yellow to green and no page moved from red to yellow, you'll get -1000 yellow and +1000 green. — Phe 23:43, 29 November 2014 (UTC)
- Then the wording on the page that says "The "proofread" column counts all the pages that have been proofread : [category q3] + [category q4]" seems incorrect, as by your logic proofread is just [category q3]. — billinghurst sDrewth 00:28, 30 November 2014 (UTC)
- OK, I see the problem now. I'll need to take a snapshot of every page's state, with its title, at 24-hour intervals, and compare the changes to the en: recent changes. — Phe 00:59, 30 November 2014 (UTC)
- After thinking about it, the way you get 359 is not correct: 359 is the number of pages proofread during this period AND not validated during the same period. Let's say today only one page was proofread and then validated; tomorrow, in the stats, the validated field will be increased by +1 and proofread by +1 too. The fact that proofread is q3+q4 doesn't mean proofread will be increased by +2. The state changed twice in a row, but what is visible in the statistics is only one change, from red to green, because statistics provide counts at fixed points in time, not the continuous evolution of page states. So if you count proofread pages with the (proofread - validated) fields you'll get zero, not the correct result. I remember Ankry had a question about that one or two years ago, and it looks like many people are confused by the statistics; even I have trouble remembering exactly how it works each time someone asks for clarification... — Phe 22:20, 30 November 2014 (UTC)
- Yes, I understand that, though I still don't believe it. As I said, I had done 100+ pages of a work, which would equate to over a quarter of that change (none are validated). I don't find it credible that the dynamics of enWS change by that amount because of a PotM, especially when historically we proofread ~250 pages a day and validate approximately half of that. That is to say, I don't think that we drop from 30 × 250 to 30 × 12. I can run those basic statistics in my head, and analysis of data is part of my work; even with a concerted effort on validation, there are plenty of people who don't participate in PotM and continue with their own projects, and that is predominantly proofreading. Just looking at Special:Contributions/Ineuw, Index:Popular Science Monthly Volume 26.djvu and Special:RecentChangesLinked/Index:Popular Science Monthly Volume_26.djvu, plus my additions to my own work, would seem to undermine the "proofread additions alone" theory. — billinghurst sDrewth 09:55, 2 December 2014 (UTC)
- It's not the first time this sort of thing has occurred: see [8], and look at [9] for example, a lot of validated pages. Anyway, I'm downloading a dump of en.ws and I'll check whether the validated tag in the text is consistent with the contents of the validated category. — Phe 12:59, 2 December 2014 (UTC)
Here is the table:
Difference between Sun Nov 2 2014 and Tue Dec 2 2014
language | Page: all pages | Page: not proofread | Page: problematic | Page: without text | Page: proofread | Page: validated | Main: all pages | Main: with scans | Main: without scans | Main: disambiguation | Main: percent |
---|---|---|---|---|---|---|---|---|---|---|---|
fr | 15366 | 5049 | 0 | 1194 | 9123 | 2106 | 1185 | 2969 | -1839 | 55 | 1.17 |
en | 12291 | 4210 | 342 | 665 | 7074 | 5943 | 1277 | 1615 | -366 | 28 | 0.39 |
A higher number than 359, around 1100, but the same pattern persists: an unusually low number compared to other 30-day periods. Here are Validated page and Proofread page counts using all recent changes during the last 30 days; from them I get 1300 pages proofread, versus 1130 pages proofread from the stats. The 170 difference is today's changes (stats taken around 4:40), so yes, the result you showed looks unusual, but it's not conclusive. The recent changes table covers only the last 30 days, so I can't fully check whether the problem is real. The result from the last dump, taken on 26 November: q3 = 273636, q4 = 186234, q3+q4 = 459870 (q3+q4 today from the stats: 461347), which doesn't look buggy either. I guess I'll need to take a daily snapshot of the whole content of the validated/proofread/problematic/without-text categories and, if we see an unusual pattern again, check for Page: entries disappearing from the categories (for the last day and last week there is nothing unusual, as far as I can see). — Phe 16:57, 2 December 2014 (UTC)
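The arithmetic in this thread can be checked with a small sketch. It assumes snapshot counts per ProofreadPage quality level (0 without text, 1 not proofread, 2 problematic, 3 proofread, 4 validated) and uses the column definition quoted from the stats page, proofread = q3 + q4; the object shapes and function names are hypothetical. It shows why a page that goes red to yellow to green inside one period adds +1 to both proofread and validated, so their difference is 0.

```javascript
// Sketch of how the statistics columns relate to ProofreadPage quality levels
// (0 without text, 1 not proofread, 2 problematic, 3 proofread, 4 validated).
// Snapshot shapes and function names are hypothetical.
function columns(s) {
  return {
    withoutText: s.q0,
    notProofread: s.q1,
    problematic: s.q2,
    proofread: s.q3 + s.q4, // per the stats page: [category q3] + [category q4]
    validated: s.q4,
  };
}

// Every column in the "Difference between" table is end-of-period count
// minus start-of-period count.
function periodDelta(start, end) {
  const a = columns(start);
  const b = columns(end);
  const delta = {};
  for (const key of Object.keys(a)) delta[key] = b[key] - a[key];
  return delta;
}

// One page goes red -> yellow -> green inside the period: only the net
// red -> green change is visible at the two snapshot times.
const start = { q0: 0, q1: 1, q2: 0, q3: 0, q4: 0 };
const end = { q0: 0, q1: 0, q2: 0, q3: 0, q4: 1 };
const d = periodDelta(start, end);
console.log(d.proofread, d.validated, d.proofread - d.validated); // 1 1 0
```

Applied to the 26 November dump figures, the same definition gives 273636 + 186234 = 459870 for the proofread column.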
Regeneration of this list appreciated
User:Phe/Check index ShakespeareFan00 (talk) 15:49, 6 January 2015 (UTC)
New Proposal Notification - Replacement of common main-space header template
Announcing the listing of a new formal proposal recently added to the Scriptorium community-discussion page, Proposals section, titled:
The proposal entails replacing the current Header template familiar to most with a structurally redesigned new Header template. Replacement is a necessary first step in a series of steps needed to properly address the long-standing deficiencies behind several issues, as well as to enhance our mobile device presence.
There should be no significant operational or visual differences between the existing and proposed Header templates under normal usage (i.e. desktop view). The change is entirely structural: it moves away from the existing all-table HTML make-up to an all-Div[ision] based one.
Please examine the testcases where the current template is compared to the proposed replacement. Don't forget to also check Mobile Mode from the testcases page -- which is where the differences between current header template & proposed header template will be hard to miss.
For those concerned about the possible impact the replacement might have on specific works, you can test it on your own by entering edit mode, substituting the header tag {{header with {{header/sandbox, and then previewing the work with the change in place. Saving the page with the change should not be needed, but if you opt to save the page instead of just previewing it, please remember to revert the change soon after you're done inspecting the results.
Your questions or comments are welcome. At the same time, I personally urge participants to support this proposed change. -- George Orwell III (talk) 02:04, 13 January 2015 (UTC)
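For anyone preparing many previews, the manual substitution described above can also be done on raw wikitext. This is a hypothetical helper, not part of any existing tool. It assumes the work calls the template as {{header followed by "|", whitespace, or "}}". It only transforms text and saves nothing, and it leaves similarly named templates such as {{header2 untouched.

```python
import re

# "{{header" (either case of the first letter, optional leading spaces)
# only when followed by "|" or "}}", so {{header2 etc. are not matched.
HEADER_OPEN = re.compile(r"\{\{\s*([Hh])eader(?=\s*(?:\||\}\}))")
SANDBOX_OPEN = re.compile(r"\{\{\s*([Hh])eader/sandbox")

def to_sandbox_header(wikitext):
    """Point the first {{header}} call at {{header/sandbox}} for a preview."""
    return HEADER_OPEN.sub(r"{{\1eader/sandbox", wikitext, count=1)

def from_sandbox_header(wikitext):
    """Revert the change made by to_sandbox_header."""
    return SANDBOX_OPEN.sub(r"{{\1eader", wikitext, count=1)
```

Keeping the reverse function alongside makes the "remember to revert" step mechanical if a page was saved by mistake.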
Updated scripts
Hi Phe. I edited your Running header.js to update it to the latest version of TemplateScript. You were using a much older version called regex menu framework, so the main difference you'll see is improved compatibility and cleaner custom scripts. I also updated the code to reflect the latest changes and conventions (like using ajax instead of <script> to fetch data from the API), and made the script self-contained so users don't need to enable anything separately. I also updated, to a lesser extent, the scripts in your common.js, Works about.js, and Interwiki.js. Let me know if anything breaks. :) —Pathoschild 23:06, 16 August 2015 (UTC)
phe-bot
Please update our data at bn.wikisource.org. Jayantanth (talk) 04:31, 20 September 2015 (UTC)
- My bad, I completely forgot about that: the bot stopped saving these pages on bn while waiting for a bug in pywikibot to be fixed, and then I forgot to enable it again... It's up now. If you have any trouble, it's better to ping me on fr.ws (fr:User talk:Phe), as I'm more active on fr than on en. — Phe 12:18, 21 September 2015 (UTC)
A Brief History of Modern Philosophy/Index index magic
Hi Phe. When you have a spare moment, would you mind doing your index-link magic to the pages A Brief History of Modern Philosophy/Index. Thanks. — billinghurst sDrewth 10:26, 20 March 2016 (UTC)
- @Billinghurst: Done, but as explained in a comment in the pagelist of this index, all the page numbers in the index are wrong, so they need to be adjusted or my change reverted. — Phe 23:39, 10 April 2016 (UTC)
Chapter jump
Good evening Phe,
Would you have time to look at this page: how should I rename it so that it works? Is it the length of the title, the ellipsis, or some other characteristic that trips up the browser? The other chapters of the book transclude without any problem. Thanks for your help! --Zyephyrus (talk) 23:54, 12 December 2016 (UTC)
- By pointing to Two Little Pilgrims' Progress/13, the browser is back to normal. I then redirected the page that was blocking. This way we can keep the full title of the page instead of a chapter number. --Zyephyrus (talk) 01:11, 17 December 2016 (UTC)
OCR and pywikibot
Hi. I took the liberty of using your tool via pywikibot. See https://gerrit.wikimedia.org/r/#/c/360575/.
I hope you do not mind, and I hope there are no copyright or similar issues.
I would also appreciate it if you could take a look at the patch to see if I missed something, or if you have any suggestions.
Bye— Mpaa (talk) 22:17, 20 June 2017 (UTC)
- @Billinghurst: you might be interested; I would be glad if someone could test it. — Mpaa (talk) 22:21, 20 June 2017 (UTC)
Hi Phe. If you have time and inclination, I am wondering whether you would be able to run your scripts over this page to link the index to the chapters. Thanks if you can. — billinghurst sDrewth 13:50, 19 March 2018 (UTC)
administrator rights removed
Hi Phe,
Consensus at your annual confirmation was to remove your administrator rights due to long inactivity. I have actioned that now; you should lose the rights shortly.
Thanks for your contributions and your admin work. I hope you're enjoying whatever you're up to now, and hopefully we see you around here again in future.
Hesperian 02:03, 1 May 2019 (UTC)
Statistics for Assamese Wikisource
Hello Phe, greetings from Assamese Wikisource. We are interested in updating our statistics-related templates automatically. I see that you run a bot (Phe-bot) to update the {{ALL PAGES}}, {{ALL TEXTS}}, {{PR PERCENT}} and {{PR TEXTS}} templates. Would it be possible to use your bot to do the same on Assamese Wikisource as well? Please let me know if you need any more information. --SlowPhoton (talk) 09:42, 23 May 2019 (UTC)
- @SlowPhoton: Phe is not currently active on-wiki, but if this is still of interest to Assamese Wikisource I would be happy to try to help. Xover (talk) 08:19, 25 November 2021 (UTC)
Match and split
Hi.
match-and-split.py adds an extra </div> tag that is no longer needed after this. It causes a Special:LintErrors/stripped-tag lint error, which is then flagged and accumulated.
Could you please fix it? As far as I can see, it should be here in match_and_split.py:
else:
    header = u'<noinclude><pagequality level="1" user="Phe-bot" />\n\n\n</noinclude>'
    footer = u'<noinclude>\n<references/></div></noinclude>'
    content = header + content + footer
Also, the "\n\n\n" is no longer needed, even though it raises no error.
Thanks. Mpaa (talk) 13:18, 16 August 2019 (UTC)
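For reference, here is a sketch of the quoted branch with both suggested changes applied: no stray `</div>` in the footer, which is what triggers the stripped-tag lint error, and no extra blank lines in the header. This is not the actual patch. The function name `wrap_page_content` and its `user` parameter are hypothetical, and the Python 2 `u''` prefixes are dropped because they are redundant in Python 3.

```python
def wrap_page_content(content, user="Phe-bot"):
    """Wrap split text in the Page: namespace noinclude header and footer.

    Hypothetical stand-alone version of the match_and_split.py branch:
    the footer no longer closes a <div> that is never opened, and the
    header no longer carries the unneeded "\n\n\n".
    """
    header = f'<noinclude><pagequality level="1" user="{user}" /></noinclude>'
    footer = '<noinclude>\n<references/></noinclude>'
    return header + content + footer
```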
Match & Split bot is down
Hi Phe, as you may be aware, the match & split bot is down. I've recently found a really useful way to put it to use... will you be able to get it up and running any time in the near future? Thanks -- it's a great tool! -Pete (talk) 00:26, 16 December 2019 (UTC)
Need help to update
Dear @Phe:,
Kindly help us update Template:ALL TEXTS on Punjabi Wikisource. It has been pending for a long time. We need your help to update this page automatically.
- Satpal Dandiwal (talk) 05:41, 12 May 2020 (UTC)
- @Satpal Dandiwal: Phe has not been active on-wiki for a long time now, so it is unlikely they will be able to help any time soon. If you do not have any local users with the technical skills needed, you might try asking for assistance at the multilingual Wikisource. If your local Wikipedia has a technical community you might also have some luck asking for assistance there. --Xover (talk) 16:55, 12 May 2020 (UTC)
- Thanks @Xover:! I will follow your suggestions. - Satpal Dandiwal (talk) 06:28, 13 May 2020 (UTC)
Your bot
Don't take this as a complaint, it's most definitely not; I'm just making you aware of something. When you imported Index:1920 - Engelsch-Nederlandsch Woordenboek DP.pdf, the bot produced very "linty" HTML (low priority, but the linter complains about bold and italics missing end tags every place there is a line break between them).
I'm not proofreading it (I don't speak Dutch, so I'd probably add more errors than I fixed), but I have been going through doing this (which I don't mind doing; it's a 'while watching TV' task that's pretty trivial, just a lot of pages). I just thought I'd let you know, in case you are planning on importing anything else with so much inline formatting... I don't know if a bot can really detect that to address it, but you might want to take a look. Either way, good work. Jarnsax (talk) 16:21, 23 August 2021 (UTC)
- @Jarnsax: Phe just operates the Match & Split bot. The actual import was done by Languageseeker (cf. the edit summary for the edits). Xover (talk) 08:04, 25 November 2021 (UTC)
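On the question of whether a bot could detect this: MediaWiki parses bold and italic apostrophes one line at a time, so any line that ends with bold or italic still open produces a missing-end-tag error. Below is a simplified detector sketch based on that observation. The function names are hypothetical, and it deliberately ignores MediaWiki's extra rule for lines with both an odd number of bold and an odd number of italic markers.

```python
import re

# Runs of two or more apostrophes (a single ' is plain text)
QUOTE_RUN = re.compile(r"'{2,}")

def open_formatting(line):
    """Return (bold_open, italic_open) at the end of one line of wikitext."""
    bold = italic = False
    for match in QUOTE_RUN.finditer(line):
        n = len(match.group())
        if n == 2:
            italic = not italic
        elif n in (3, 4):
            # 4 apostrophes: one literal ' followed by ''' (bold)
            bold = not bold
        else:
            # 5 or more: extra apostrophes are literal, last 5 toggle both
            bold = not bold
            italic = not italic
    return bold, italic

def lint_lines(wikitext):
    """Yield (line_number, kind) for lines that end with bold/italic open."""
    for number, line in enumerate(wikitext.splitlines(), start=1):
        bold, italic = open_formatting(line)
        if bold:
            yield number, "bold"
        if italic:
            yield number, "italic"
```

Note that a bold span split across a line break is reported twice. The closing `'''` on the second line opens a new bold span there, which matches how the parser sees it.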
Match & Split down
Hello, it seems that Split has been down since the day before yesterday (Match&Split).--Cunegonde1 (talk) 04:44, 25 November 2021 (UTC)
- @Cunegonde1: Your split job is in progress now. The bot crashed due to changes in the pywikibot framework for which it had not been updated (specifically, PWB has changed its definition and structure for named exceptions, so once one was triggered anywhere, in this case by an edit conflict on enWS, the bot would crash as a result of trying to install a handler for a non-existent exception type). Phe isn't currently active, but I have access to the tool and am slowly updating it to work more reliably with the infrastructure changes that have happened over the last few years. Feel free to ping me if you run into trouble with any of the phetools. My turnaround time isn't particularly good, but I'm happy to help when I can. Xover (talk) 07:39, 25 November 2021 (UTC)
- @Xover Thank you very much for your help. Cunegonde1 (talk) 08:11, 25 November 2021 (UTC)
Match & Split down
@Xover: Hello, "bot tamer". I think that Match and Split is down. The queue has been stuck at page Page:Chrysostome - Oeuvres complètes, trad Jeannin, Tome 8, 1865.djvu/346 since yesterday. My apologies for my poor English, and thank you for your attention.--Cunegonde1 (talk) 15:04, 30 April 2022 (UTC)
- @Cunegonde1: Thanks for the heads-up. I've restarted the bot. Please retry your job. Xover (talk) 18:40, 30 April 2022 (UTC)
- @Xover Thank you very much. Cunegonde1 (talk) 04:31, 1 May 2022 (UTC)
- Hi @Xover the split seems down again. Can you do something? Thank you. CyrMatt (talk) 12:04, 1 May 2022 (UTC)
- @CyrMatt: Hmm. Ok, the bot is throwing an exception and crashing. I have a hunch what might be the cause, but this will take more effort than simply restarting it so I can't make any promises about when it'll be fixed. I'll give you a ping when I have any news. @Cunegonde1: FYI. Xover (talk) 13:45, 1 May 2022 (UTC)
- @CyrMatt, @Cunegonde1: Ok, I think I found the culprit (a long-deprecated parameter name in the Pywikibot framework that was finally removed in the latest release). Please retry your Match & Split jobs and let me know how it goes. Xover (talk) 07:19, 2 May 2022 (UTC)
- @Xover Thank you very much. I just tried it with a little book with two-column pages here and it worked perfectly. I hope it also works for @CyrMatt's book. Cunegonde1 (talk) 07:45, 2 May 2022 (UTC)
- It's OK for me! Thanks! CyrMatt (talk) 09:46, 4 May 2022 (UTC)
Match & Split not running
@Xover: Hi, excuse me for bothering you, and also for my poor English. I tried to use M&S today and the response to the Match command is "match_and_split robot is not running. Please try again later." Could you run the bot, please? Thank you in advance.--Cunegonde1 (talk) 05:17, 2 June 2022 (UTC)
@Cunegonde1: I’m travelling at the moment, but I’ll try to take a look as soon as I can. There have been some big infrastructure changes at Toolforge recently, so this may be a bigger job to fix than simply restarting the bot. —Xover (talk) 11:43, 2 June 2022 (UTC)
- @Xover: OK, and thank you very much for your response. I wonder whether this robot could be improved on two points: 1) less fragility; 2) running several instances to avoid congestion when big works are in the queue. As Phe is no longer active on Wikimedia, can we ask the developers for this? Cunegonde1 (talk) 11:57, 2 June 2022 (UTC)
- @Xover: The bot was restarted by @Tpt. Bye Cunegonde1 (talk) 14:32, 3 June 2022 (UTC)
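Cunegonde1's two suggestions, less fragility and several instances, can be illustrated with a small worker pool. This is a hypothetical sketch, not how the phetools bot is actually structured. `split_job` is a stand-in for one real job, and each job is wrapped so that a failure is reported instead of stopping the worker.

```python
from concurrent.futures import ThreadPoolExecutor

def split_job(title):
    """Hypothetical stand-in for one Match & Split job."""
    if not title:
        raise ValueError("empty title")
    return f"split done: {title}"

def safe_split_job(title):
    """Run one job; a failing job is reported instead of killing the worker."""
    try:
        return split_job(title)
    except Exception as exc:
        return f"failed: {title!r}: {exc}"

def run_jobs(titles, workers=3):
    """Process queued jobs on several workers so one big book does not
    block the rest of the queue. Results come back in submission order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(safe_split_job, titles))
```

Threads suit this kind of I/O-bound work (waiting on the wiki API). Whether several concurrent writers are acceptable under the bot's edit-rate limits is a separate question the sketch does not address.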
@Xover: Hello, Match and Split seems to be down again. I'm sorry, I tried several times and my book is now in the queue 5 times... split queue —unsigned comment by CyrMatt (talk) 16:23, 28 June 2022 (UTC).
- @CyrMatt: I couldn't figure out what caused it to get stuck (too little time available), but I've restarted it, so it should be working again. You and @Alex brollo (who also had split jobs in the queue) will have to resubmit them, since the function for persisting the job queue across restarts is currently broken. PS: For notifications (and other things) to work, it's important that you remember to sign your messages on talk pages. I recommend trying out the new "reply" tool (you'll find it in your preferences, probably under Beta features), since it automatically signs your posts (it's not perfect, but for most things it's a lot more convenient). Xover (talk) 10:46, 3 July 2022 (UTC)
- @Xover Never mind my pending splits. If needed, I'll use an offline version of Match, slightly enhanced to work with PDF files too. --Alex brollo (talk) 14:45, 3 July 2022 (UTC)
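Persisting a job queue across restarts, the part Xover notes is currently broken, can be as simple as writing the pending jobs to a JSON file. This is a minimal sketch under assumptions. The file location and the job fields (`command`, `title`) are hypothetical, and it is not the phetools implementation. The file is written to a temporary name and then renamed, so a crash mid-write cannot leave a truncated queue behind.

```python
import json
import os
import tempfile

def save_queue(jobs, path):
    """Write pending jobs to disk, replacing the old file in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(jobs, handle)
    # os.replace is atomic on POSIX when both paths are on the same filesystem
    os.replace(tmp_path, path)

def load_queue(path):
    """Reload pending jobs after a restart; no file means an empty queue."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return []
```

On startup the bot would call `load_queue` before accepting new requests and call `save_queue` whenever a job is added or finished.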
Match & Split down
Hi @Xover, Match & Split is down: it crashes while splitting a few pages of a big book (900 pages). I think the bot dislikes big books. Thanks in advance to whoever restarts it, and please excuse my poor English. Cunegonde1 (talk) 16:51, 26 July 2022 (UTC)
- @Cunegonde1: I've restarted it. Xover (talk) 17:37, 26 July 2022 (UTC)
- Thank you very much @Xover Cunegonde1 (talk) 04:21, 27 July 2022 (UTC)