Latest comment: 4 years ago · 2 comments · 2 people in discussion
I am proofreading and validating pages from the scanned file of The Great Gatsby (the page images). Should the punctuation and spelling be altered, based on the scanned images? Windywendi (talk) 00:33, 2 January 2021 (UTC)
Latest comment: 4 years ago · 1 comment · 1 person in discussion
I downloaded this skin called DarkVector: User:PseudoSkull/vector.css. It's pretty cool, and I implemented it to help keep my eyesight from deteriorating over 20 years of proofreading every day for Wikisource. It mostly works, but there are a few things it leaves unchanged, such as the "User page", "Discussion", "Read", "Edit" and "History" tabs. Also, the search bar is still white.
Latest comment: 4 years ago · 2 comments · 2 people in discussion
If you verify here (per usual process) that it was indeed you who joined the Discord server just now, I can make you an admin. PseudoSkull (talk) 21:26, 7 January 2021 (UTC)
Latest comment: 4 years ago · 6 comments · 2 people in discussion
Getting lots of hits on your markup filter. Would you please consider doing a Watchlist message to alert people, so they are able to modify their editing behaviour rather than just carrying on with edits unchanged. Noting that I got a throttle message from the system due to that filter. Thanks. — billinghurst sDrewth 12:40, 10 January 2021 (UTC)
AbuseFilter is telling me through Notifications that "Abuse filter 41, which you recently edited, was throttled", so presumably too many edits were being checked and it was sucking resources (noting that I didn't check its stats at the time; I probably should have). This is partly why I put in some of the new top parts, to lessen what it checks, and maybe I should exclude bots too. — billinghurst sDrewth 21:49, 10 January 2021 (UTC)
Weird, I have never seen such a message. When does that message show itself?
First time that I have seen it, probably due to my being trained (beaten!) WAY BACK to write tighter filters, eliminating components at the beginning of a filter. To note that I have now excluded bots from the filter, and I'm hazarding a guess that it will have been a bot editing somewhere that was the trigger; noting though that there is no exact time nor greater precision on where the issue occurred. Apologies for not consulting about the term change from "bad" to "deprecated"; I could just see the confusion starting, and it is my experience that complaints will continue with connotative usage. Whitelists and blacklists will be renamed due to the negative connotations of the colours. You may wish to note the anecdotal feedback in User talk:廣九直通車#Template:Center about other WSes, let alone other wikis. — billinghurst sDrewth 01:44, 11 January 2021 (UTC)
Ah, I see. The TOC was originally designed in a different way, but then somebody started redesigning it using this template and left it half-finished. (I personally prefer the previous version anyway.) --Jan Kameníček (talk) 23:22, 10 January 2021 (UTC)
@Jan.Kamenicek: Sure! So a quick check produces a few (small) issues:
Put page breaks after images to prevent the following content starting halfway down the page. I know we had trouble before where there were widows before a title, but for an image this is unlikely, and it's more likely you'll drag half of a title onto the previous page. For example: Special:Diff/10831082
I think the smaller images in Acts I and II are OK as they are, since the size works well with surrounding text.
The {{block center}} should not be specified in px. On an e-reader, the display might be over 1000 pixels across (mine is 1072), but the font size is very large in terms of pixels (and a visually impaired user may have a screen only 10 or 15em across!), so you restrict things unnecessarily. Better is to specify the layout in terms of em. Browsers usually have an em size around 16px (unless the user configures it differently, e.g. for accessibility), so your 420px here is about 26em. That looks about the same in the browser, but looks better on the e-reader: since the e-reader screen is under 30em across, it is effectively 100%.
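In CSS terms, the difference is roughly this (a sketch with a hypothetical class name; {{block center}} generates its own markup):

```css
/* Pixel width: fixed regardless of the reader's font size, so a
   large-print or e-reader user gets a box that is too narrow in
   terms of text. */
.block-center { width: 420px; margin: 0 auto; }

/* Text-relative width: about the same in a desktop browser, but it
   scales with the font and never exceeds a narrow screen. */
.block-center { max-width: 26em; margin: 0 auto; }
```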
Thanks very much, I will try to keep this advice in my mind for other works too. I can see that you have corrected the issues mentioned above, or is there anything left? --Jan Kameníček (talk) 14:29, 12 January 2021 (UTC)
This worked, but I'd feel happier with a templated solution moving forward, so that if the HTML/CSS required ever changed I wouldn't have to change a vast number of Page:s. ShakespeareFan00 (talk) 15:43, 21 January 2021 (UTC)
@ShakespeareFan00: OK, so there were some nasty left-overs from the previous {{FI}} implementation. Now it should be fine to use divs in there. The problem was that it was trying to put a <div> (the {{center}}) inside a <p>, and that's not cool. Now the caption is just a div. Not sure what that was supposed to be a workaround for, but it looks long fixed (and the markup is way simpler now anyway). Inductiveload—talk/contribs 16:42, 21 January 2021 (UTC)
Reasoning
Latest comment: 4 years ago · 4 comments · 2 people in discussion
@EncycloPetey: I was reviewing it for export. Firstly, align="center" is obsolete HTML and is currently (due to a bug, as it turns out) exporting incorrectly (but this needs to be fixed "eventually" anyway), and while I was there I changed it over to some templates which apply CSS that was previously omitted. For example, the cells were not correctly vertically aligned for small screens (should be left two columns top, right column bottom), and the last column can wrap (sometimes even in-between the 1 and 3 of 135!) if you don't set white-space:nowrap;.
Furthermore, using constructs such as {{gap}} at the end of a cell line to attempt to force padding is also misguided, as it will not work when the line wraps (and the longest line always wraps first). The better thing to do is to set a size, using text-relative units like "em" (never "px"), that caps the width at a point giving a suitable gap (in this case, 25em looks "OK" to me, though perhaps it feels a little wide?), and allow the built-in {{TOC begin}} style to apply a default max-width:100% to prevent overspill on narrow screens, while still allowing the gap to close and not waste space where needed (remembering that a mobile or vision-impaired user's device may only have 10–15em of width in total).
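A rough sketch of the resulting CSS (selector names hypothetical; the real rules come from the {{TOC begin}} template styles):

```css
/* Cap the TOC at a comfortable width, but never wider than the screen. */
.toc { width: 25em; max-width: 100%; }

/* Keep page numbers such as "135" on one line rather than letting
   them wrap mid-number. */
.toc td:last-child { white-space: nowrap; }
```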
The dots are just arrant frippery of course, but since I've fixed (most of?) the templates to not export the dots, they aren't causing the exporting havoc that they used to (though the markup is still "not ideal", it's not actively harmful anymore). If you're against them, s/2out/2/g will sort it out for you. Inductiveload—talk/contribs 00:51, 22 January 2021 (UTC)
Thanks for the explanation. I may ask more questions later, since I hope (when I may have time in a few months) to go back through a lot of my older contributions, and Featured Texts, to ensure they are fully formatted for downloads. --EncycloPetey (talk) 19:21, 23 January 2021 (UTC)
I've written some guidance at Help:Preparing for export with some dos, don'ts and known issues. It's not yet complete (notably for TOCs), because I've been trying to address some of the issues at source rather than advising workarounds (for example, dotted TOCs are much less of an issue than they used to be). Please let me know if 1) I break anything somehow, or 2) Help:Preparing for export is too vague on something or 3) you would like an opinion on something (there's also {{export to check}} you can use to ask for a once-over). Inductiveload—talk/contribs 20:38, 23 January 2021 (UTC)
Context is King
Latest comment: 4 years ago · 1 comment · 1 person in discussion
T272704's task description needs a few "when exporting" and similar phrases sprinkled into it. The people working on ws-export will infer the context, but everyone else will be left scratching their heads. :) --Xover (talk) 12:53, 22 January 2021 (UTC)
NopInserter
Latest comment: 4 years ago · 2 comments · 2 people in discussion
I think you can just add <noinclude>*</noinclude> to the first line to fake a new list item. You don't get the "mid" indent, but I suppose it might be possible to fix that in CSS if it's critical: {{plainlist/m}} can be used instead of {{plainlist/s}} to suppress the hanging indent on the first page (it won't make any difference in mainspace). Also note that apparently leaving a blank line splits the list up into n lists of a single item each. @PseudoSkull: has been doing something similar recently, BTW. Inductiveload—talk/contribs 22:11, 28 January 2021 (UTC)
{{plainlist/m}} doesn't work, but don't fret about it. I usually move the dangling paragraph end to the previous page, but here I am faced with "dangling" paragraphs 2–3 pages long. I don't think that merging them onto one page is the right thing to do.
Many thanks. Everything is working well; I checked the results in the main namespace. There is one exception to {{plainlist/m}}: it cannot be used for a paragraph which is joined by {{hws}} & {{hwe}}. That breaks the paragraph in the main namespace, so there I use {{plainlist/s}}. Also, I do not enclose the list item marker (*); it does not affect the main namespace display. — Ineuw (talk) 13:54, 29 January 2021 (UTC)
Thanks again. I tested hyphenations as well without the template, and it works. I noticed that when using {{plainlist/m}}, it no longer indents in the Page namespace, but it displays correctly in the main namespace. — Ineuw (talk) 18:42, 31 January 2021 (UTC)
And why do you think that I put it in front of someone competent, rather than having a go at it myself? I have to go back and look at a whole lot of stuff I have done over so many years. I think there is some clear POTM stuff that we did in the early-to-mid 2010s where we looked at works with images and set them all to 500/600px widths. <shrug> — billinghurst sDrewth 22:33, 10 February 2021 (UTC)
500/600px can be OK, but they might poke out of a Layout 2. There is CSS wrangling with max-width:100%; done to avoid really bad things happening on export (and in the mobile view). Things are very slowly beginning to work by default. :-) Inductiveload—talk/contribs22:40, 10 February 2021 (UTC)
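That wrangling amounts to something like the following (a sketch, not the exact site CSS):

```css
/* However wide the image was set (500px, 600px, ...), never let it
   spill out of a narrow layout, the mobile view, or an exported page. */
.prose img { max-width: 100%; height: auto; }
```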
adding enWS page link to the query
Latest comment: 4 years ago · 2 comments · 2 people in discussion
Latest comment: 4 years ago · 2 comments · 2 people in discussion
!important should be reserved for user stylesheets. Do we really need (need) it in site styles? It makes it effectively impossible for a user to override, both in on-wiki user styles and in UA styles. --Xover (talk) 14:09, 10 February 2021 (UTC)
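The problem in miniature (hypothetical selectors): once site CSS uses !important, specificity no longer helps a user override it:

```css
/* Site stylesheet */
.pagenum { color: #666 !important; }

/* User stylesheet: loses despite the more specific selector, because
   only another !important declaration can beat an !important one. */
body #content .pagenum { color: #000; }
```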
Latest comment: 4 years ago · 4 comments · 2 people in discussion
Thanks for the welcome message. I have two early questions. The first is a simple one, on the page headers. The book that I have started transcribing has a header text that alternates format between the odd and even-numbered pages, so
The Voyage of Italy. Part I.
and
Part I. The Voyage of Italy.
Can I use the first form throughout, or should I alternate them as the original does?
Second, much more interesting, is on footnotes. I've read the guidance, but I'm not sure if footnotes are only to duplicate footnote texts that are already in the source text, or whether they can include new explanatory notes by the transcriber. In researching some of the words/places/persons mentioned, I would be able to add a footnote to give the current name, or a clarification, or to note the correct spelling of an old form or an original typo. E.g. the text mentions "goistre" (for "goitre") and a "Monsieur Esselin" (actually fr:Louis Hesselin, intendant des plaisirs du roi, 1600–1662). Adding the correct/modern text in a footnote at least makes the text searchable. Is this something that can be done? Scarabocchio (talk) 15:05, 11 February 2021 (UTC)
@Scarabocchio: re the running headers, we do put them the right way around. There is a gadget to help you: Help:Gadget-RunningHeader, or there are templates that auto-flip the side based on the page number. In either case, you need to take care of the section name change. @PseudoSkull: has a bot for this, perhaps they can help (then you can just leave them out and bot them in later - honestly, doing it manually is a bit of a waste of human life IMO).
Re the footnotes: these are considered WS:Annotations, and generally speaking, we don't put them in. However, it is allowed to link to authors and works at Wikisource, so the names are easy, and a very small number of links to Wiktionary for really odd words (like "goistre") are also fine, though often the word isn't at Wiktionary. Again, PseudoSkull can help you there, he's a Wiktionary person too. We do have {{User annotation}}, which you can use to mark your own, more detailed, annotations in footnotes. I don't find the concept objectionable at all (it's a decent value-add to me), but it would be worth asking for clarification on the WS:Scriptorium, since that's probably not a universal opinion. Inductiveload—talk/contribs 15:29, 11 February 2021 (UTC)
@InductiveLoad: Many thanks. On the running header, the text requires a three part header: <pageno>+"Voyage of Italy"+"Part I", or the alternate "Part I"+"Voyage of Italy"+<pageno>. The examples at Help:Gadget-RunningHeader show it working with a two-part header. Can it work with three parts?
The guidance in WS:Annotations implies that an original, unannotated version should exist before any annotations are added. I'll carry on transcribing just the base text for a while, keeping my own annotations separate, to confirm that I am going to carry on working here. Scarabocchio (talk) 16:22, 11 February 2021 (UTC)
* {{lsp||Text to use default spaci|ng}}
* {{lsp|1=|2=Text to use default spaci|3=ng}}
* {{lsp|0.15em|Text to use default spaci|ng}}
Text to use default spacing
Text to use default spacing
Text to use default spacing
Looks the same to me. Parameter 1 doesn't care about param 3 being present or not? Smart logic to allow {{lsp|Text to use default spaci|ng}} and notice that parameter 1 is not a valid CSS size is technically possible, but probably more confusing than just allowing a blank parameter.
My plan is to work to remove the custom spacing from {{sp}} and short-cut that to be {{lsp||{{{1}}}|{{{2|}}}}}, so it can be used for the default case and get the "un-spaced tail" capability. Inductiveload—talk/contribs 22:21, 13 February 2021 (UTC)
As an aside, there's not an easy mechanism to trap for px-based vs em-based values, is there? I seem to recall you stating elsewhere that px values in templates were deprecated in favour of em-based ones, which scale better for mobile devices. ShakespeareFan00 (talk) 22:31, 13 February 2021 (UTC)
@Jan.Kamenicek: It was already right-aligned in the Site CSS (look for .gen_header_forelink). The only place it was not right aligned was on mobile (which didn't matter much because previously it was crushed into 20% width, which is very narrow on a mobile screen). Inductiveload—talk/contribs 15:34, 17 February 2021 (UTC)
Maybe it is caused by something different, but up to now the headers of author pages were centered, while today the names of authors and their dates appear much more to the right; see e.g. František Lützow or any other author page. --Jan Kameníček (talk) 16:01, 17 February 2021 (UTC)
@Jan.Kamenicek: oh, I see. That actually was a different thing, not the alignment. I've gone back to hardcoding the widths on wide screens to 20:60:20, and only do "free width" on small screens after wrapping. Thanks for the heads up :-) Inductiveload—talk/contribs 16:11, 17 February 2021 (UTC)
Footer (slightly) borked
Latest comment: 4 years ago · 8 comments · 2 people in discussion
I'm seeing the footer missing the back/forward arrows, and its sizing is a hair off (too wide on the right). It probably needs adjusting after your changes to the markup of the {{header}}. Doesn't look like it's broken enough that most people will notice, so you can probably just stash it on the todo until your header modifications are stable.
@Xover: I've added the arrows back in, they're no longer in the #headerprevious/next element so they weren't being picked up.
I don't think the width has been affected by what I did. The header uses a fixed 20:40:20 width ratio (it always did, the new changes should only kick in on small screens). The footer uses a float method, much like {{rh}}, which means that the central "cell" might well not be centred if the two sides aren't the same width.
I might try a flexbox approach for the footer too at some point.
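A flexbox version might look something like this (a sketch with hypothetical class names, not the current footer markup):

```css
.footer { display: flex; align-items: center; }
.footer-prev, .footer-next { flex: 1; }  /* equal-width side cells */
.footer-next { text-align: right; }
.footer-centre { flex: 2; text-align: center; }  /* stays centred even
                                                    with uneven sides */
```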
Hmm. Looking closer I see the various horizontal boxes all have "ragged" right edges. Probably worth looking into. Eventually. --Xover (talk) 18:41, 17 February 2021 (UTC)
Not sure. Never noticed before, but I'm inclined to think that's just because I haven't really looked that closely until the recent changes. I may be being fooled by the sister project links in the notes field (I'll have to throw up some grid lines to be sure). The footer is off by half an em or something relative to the categories box. And eyeballing it on one page the header looked like it was also off by a similar amount, but now that you ask I'm no longer sure. --Xover (talk) 19:11, 17 February 2021 (UTC)
Hmm, I have noticed that before, but never investigated. Looks like a stray 100% width in Mediawiki:Gadget-Site.css for .footertemplate. I think this'll fix it.
The header looks in-line to me, but the plain sister is inset a bit within the notes field (because the plain sister has 0.5ex of margin on all sides). The shortcuts box doesn't, so I think just removing the right-side margin will line things up better. Inductiveload—talk/contribs 19:21, 17 February 2021 (UTC)
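If that diagnosis is right, the fix is a one-line change in MediaWiki:Gadget-Site.css (sketched here with the other declarations omitted):

```css
.footertemplate {
    /* was: width: 100%; presumably pushing the right edge out
       past the categories box */
    width: auto;
}
```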
Latest comment: 4 years ago · 7 comments · 2 people in discussion
Hi. Added mw.loader.load('//en.wikisource.org/w/index.php?title=User:Inductiveload/index_preview.js&action=raw&ctype=text/javascript');. I see the thumbnail of a single page, but the controls you mentioned do not show anywhere in the frame. — Ineuw (talk) 02:40, 19 February 2021 (UTC)
@Ineuw: For the moment, the controls only appear when the page does not already exist. Adding them requires some more UI logic to ask the user if they're sure and if they want to replace/append to existing content. Inductiveload—talk/contribs 09:44, 19 February 2021 (UTC)
I hope you don't mind my testing the script and uploading screenshots. I clicked twice on the same page, with a different control. Could not escape from my error and the tab froze, but not the browser (Firefox). — Ineuw (talk) 17:02, 19 February 2021 (UTC)
Here the section tag is used for the Issue and for the Article name. This works, but leads to a long section name if the line-feed is removed to resolve a lint error.
Latest comment: 4 years ago · 3 comments · 2 people in discussion
cf. this comment. If we have to nerf taint checking in {{header}} because {{versions}} does something funky, my immediate thought is that we need to split the code further to keep the funkiness from complicating the code for {{header}}. Does it look like there's potential for separating presentation from parameter handling from metadata from structure, or some other sensible ontology? I mean, even if we keep all the special-casing and such, it'll be better by the mere fact of being in Lua, but it'd be nice to excise these warts while we have the opportunity. :)
@Xover: my thinking is to port {{header}}'s logic to Lua essentially as-is first (maybe with minor tweaks), then worry about tidying up once I have it as code that can be reasoned about.
Since {{versions}} calls {{header}} and manually inserts formatting like {{Font-weight-normal|Versions of}}<br />''{{{title|{{PAGENAME}}}}}'' into the header field, there's not much we can do right now other than just exclude them. Once the header module is looking tidier, we might consider another parameter to allow {{versions}} to do this in a saner way. We might also consider that {{versions}} shouldn't call {{header}} itself, but that everything calls {{header/base}}, a separate invocation of Module:Header, or something else. Exactly how it shakes out is a task for after moving the logic out of the "title cell" of the header, which is my primary goal right now (so we can fix junk like double spacing and missing commas).
Re separation of structure, that's what {{header/main block}} is aiming for (rather than cramming it all into Module:Header), so it can be harmonised with other headers.
Yeah, getting it into a state that is amenable to… well, anything, really… is the critical thing (you have no idea how grateful I am that I don't have to touch that mess!). The comment just tripped a red flag for me, so I wanted to give you a poke while you have your hands deep in its guts. enWP has developed quite a bit of Module infrastructure that makes Lua a lot less primitive to work with: Arguments, Yesno, No globals, and utilities for working with categories, etc. And a lot of the modules (unlike the templates) are not too tied to enwp-specific logic. They're not really built to be easy to fork and keep in sync, but a surprisingly large portion can be used as-is. --Xover (talk) 14:20, 19 February 2021 (UTC)
Making hidden layouts visible...
Latest comment: 4 years ago · 4 comments · 2 people in discussion
Check out what I added to my common.css. That, with your highlighter styles, is starting to look powerful.
What would be nice is a way to add some javascript to 'toggle' this on or off.
Known issues:
Currently I have no easy mechanism for limiting it to 'Content' pages; Talk pages look very weird when viewed with the current ruleset.
I should really set a rule for DL usage on talk pages, as that usage is so widespread as to be de facto.
No detection for tables currently.
Symbol set used probably needs to look more like something LibreOffice uses to show non-printing characters.
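Scoping the ruleset to a toggleable class would address both the on/off wish and the talk-page problem; a minimal sketch (class name and markers are my own invention, not the actual common.css):

```css
/* Only active once some JS (or a gadget) adds this class to <body>. */
body.show-layout div { outline: 1px dashed #c8c; }
/* Mark paragraph ends with a pilcrow, LibreOffice-style. */
body.show-layout p::after { content: "¶"; color: #bbb; }
```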
Latest comment: 4 years ago · 3 comments · 2 people in discussion
For the project Septuagint (Brenton 1879), is it OK to leave it in its present form, which is a little easier to navigate, and to import data from elsewhere?
@Bobdole2021: it really should be transcribed against the original so it can be proofread side by side with the book. However, "continuation" table rows are a bit of an issue that I haven't thought of a tidy solution for, and you will need that. There are work-arounds, but they are rather messy.
In general, we are trying to reduce the amount of non-scan backed works wherever possible. They don't need to be retranscribed, just have the existing content moved to the Page namespace.
OK, that's fine. I'll both set things up in the mainspace from existing data and work on figuring out how to put it into a page-by-page setup so I can double-check against the original. Sounds good. Bobdole2021 (talk) 22:43, 24 February 2021 (UTC)
Quantum categories
Latest comment: 4 years ago · 5 comments · 2 people in discussion
Just now, while the file is deleted at Commons (undel request opened), Index:Login USENIX Newsletter feb1983.djvu shows that it is in Category:Pages with missing files. Naturally enough. However, looking at the category page the index doesn't actually show up there. I'm flummoxed. You got any ideas?
The only things I can think of are either that PRP is doing something non-standard, or that MW has a configuration that excludes certain namespaces from showing up in a category. But neither one seems obviously plausible in light of the fact that the category does show on the Index: page. --Xover (talk) 10:38, 24 February 2021 (UTC)
No, what you're seeing is presumably because the file was undeleted at Commons. I purged both the Index: and the Category: and still the category was listed on the Index: page but the page was not listed in the Category:. I guess I'll have to set up some test pages and see if it's reproducible. --Xover (talk) 21:49, 25 February 2021 (UTC)
I did a hard purge (with the purge gadget) yesterday and it seemed to work. I was on my phone and forgot to finish the message to tell you because I got distracted by real life annoyingly happening. I often see categories not filling; generally it resolves in the end, but I've never timed it. Inductiveload—talk/contribs 22:33, 25 February 2021 (UTC)
If one set of purges didn't loosen it I kinda doubt the next ones did. But some indirect categories (transclusion, MW set cats, etc.) are updated on a periodic basis, either literally or figuratively by a cron job, so that is certainly one possibility. I'm just not quite buying it since the file was deleted last November and it still hadn't updated yesterday. It could be purge + time, of course, rather than just time, but that's not an effect I've noticed in MW before. Incidentally, the file was restored just over two hours after I posted here so that's a pretty narrow window. --Xover (talk) 23:32, 25 February 2021 (UTC)
charles albert buck is full of mistakes/lubek's third chess castle
Latest comment: 4 years ago · 4 comments · 1 person in discussion
The following discussion is closed:
End of discussion. Have a nice life, but do it elsewhere. :-)
dude, you are reverting correct info, year of birth is incorrect!
also he foretold the third chess rotation: correct year of birth 1868; third chess castle:
so, instead of vandalizing, where to post that info, on what chess talk page if not on his? —unsigned comment by 124.171.129.129 (talk).
@124.171.129.129: None of this is evidence of his date of birth. This is also not a place for original research, or promotion of a chess move. If you have an original, published, public domain document related to the "third chess rotation" (whatever that means), feel free to present it. Otherwise, there is no place at Wikisource for this material. If you continue to spam this stuff, you will continue to be blocked. Inductiveload—talk/contribs 17:49, 25 February 2021 (UTC)
go on findagrave for proof,i gave it and was deleted SO IM NOT GIVING IT AGAIN, type his name and you will see im correct, give me link where i can present third chess castle! —unsigned comment by 175.34.229.70 (talk) .
No, wikibooks does not care and lubek's third chess castle is played in some countries; also be careful of unsavory characters; they are reported on: wikipediasux /forum/viewtopic.php?f=10&t=1333&p=19413&sid=d276c4732d977b481e173f1a7b4258c6#p19413 (this circus is already live across www and you or anybody else across wmf wont be able to alter it) who will try to remove our conversation, if it does, dont allow others to play with your page, that will show your high level of low self-esteem: i will create a user account here and under my space i will write about the castle. What happened to public domain, it used to be up to january 1 1923, now it's 1926? MY STUFF IS NOT SPAM, BUT HIGHLY EDUCATIONAL MATERIAL AND THEN SOME... —unsigned comment by 118.210.49.156 (talk).
Public domain in the US is set at 95 years ago. Thus it moves forward one year, every year. 2021 - 95 = 1926.
Please do not add your "educational material" here. This is not a chess strategy forum. If your contributions are not good-faith efforts at transcribing public domain texts, then, given your history, they will continue to be blocked on sight. You had your chance for the benefit of the doubt, and the fact you have used three IPs for this conversation hardly fills me with confidence. Inductiveload—talk/contribs 18:53, 25 February 2021 (UTC)
no, you never gave me nor do you give anybody benefit of the doubt like rest of wikipedoians, my castle statement is here anyway from long ago you wont find it and i dont need to post it again and it is across other wikis and it goes to show how ignorant and stupid you are when it comes to ip: there are dynamic and static IPs, DUH!!! can you post full story to: www.scribd.com/document/337601613/Spinrad-Charles-Buck; also there is plenty of evidence his book was published on january 1: yet your pals erased that date, thus making wikisource articles inacurrate again, again and again: en.wikisource.org/w/index.php?title=Paul_Morphy:_His_Later_Life&diff=10604774&oldid=10414686
If I'm so ignorant and stupid that I think dynamic IPs moving from Sydney to Canberra to Perth randomly is suspicious, then I must be far too stupid to help you. Sorry. Good luck bringing your chess move to the world, but it's not going to happen at Wikisource. Inductiveload—talk/contribs 19:22, 25 February 2021 (UTC)
and my ips are australian and they change and they could be out of australia as professionaly defined and you just sait it right how stupid you are and all wikipedoians: whatismyipaddress.com/dynamic-static
Source tab missing in document
Latest comment: 3 years ago · 11 comments · 2 people in discussion
Hi Inductiveload, I've noticed that the source tab and the side page links that point to the source djvu pages are missing on the chapter pages for the translation I've been working with (and which you so graciously helped me to get started with): Translation:Writings of Novalis. The problem is with all the chapters, so I figure it's in the Index page code or TOC. I'm wondering if I may have inadvertently disrupted the code in some way. Could you guide me on how to address this? Thank you for your help! Wtfiv (talk) 06:22, 10 February 2021 (UTC)
Thanks Inductiveload! The JS sticking plaster works! It gets the reader to the index page, which is invaluable for verifying the source. When researching the problem I saw the 2013 discussion and actually looked at the example it provided, and saw it was broken. I had some memory, at some point in the process, of having access to the source tab, and as your note agrees, 2013 was a long time ago! So I figured it had been fixed. It particularly helps readers with questions get to the original/translated pages. Thanks for taking care of that!
I have one more tentative request, realizing this may be a bigger issue than sticking plaster can handle: would it be possible to get back the page links on the left that point to the individual djvu pages? If not, I appreciate what you have already done! Wtfiv (talk) 03:14, 11 February 2021 (UTC)
@Inductiveload: I saw that you have indeed been able to apply more plaster and bring the page links back to translations as well! Thank you... (I wonder how much work that was?) I may be sounding like the Fisherman's Wife here (or maybe you are still thinking it through), but would it be possible to shift the text to the right a couple of ems, so that the page numeration doesn't overlap the text? Regardless, I am still grateful that you have been able to address a problem that has been hanging out there since 2013! And I am also appreciative of how responsive you've been regarding this issue. Also, I suspect I'm not the only one to feel this way, but it's nice to know you are there to support and guide us through the esoterica of this scriptorium. I like its solitude, but it's good to know there is someone out there to make sure the arcane works remain accessible! Wtfiv (talk) 19:45, 13 February 2021 (UTC)
Glad you like it here. I'm trying to kick things into shape a bit with the JS, I'm glad it's working. Getting the indent might or might not be easy, I'll give it a poke, but it might not be an overnight fix. Or maybe it will be!
Esoterica is certainly one way to put it. I'm trying to whip some docs into shape, but there's a long way to go yet!
Remember, I'm always happy to give a hand if I'm around, so feel free to ask. And pointing out places where documentation doesn't make sense is a great help to streamlining the on-boarding process, so feel free to drop notes if anything doesn't make sense, or if it does but is unclear at first, or can't be found easily. Inductiveload—talk/contribs 20:05, 13 February 2021 (UTC)
@Inductiveload: I just wanted to say that your adding the page/djvu links has already been helpful in terms of my work with the translation. More than once, I have started seeing a systematic pattern that requires me to go back and change a recurring word's translation throughout the document, and the page numbers really help me to get to it quickly rather than having to click through each page in the index. And, in the unlikely chance somebody wants to verify a translation, the page links make that available to them as well. So for me at least, your repair of most of the 2013 damage to the translation pages is really appreciated!
Also, I can definitely see the work that has been done on the help pages. If you've been doing a lot of the contributions to them, I again thank you! Because of its more complex nature, Wikisource definitely has a steeper learning curve to navigate and edit than some of the other wikis. At some point, I may switch gears a bit and reflect on the kind of help that would have made things easier. (Though I think your friendly, encouraging response to my initial query, the creation of the initial djvus, and providing some models to guide me by setting up the TOC and index pages was probably the greatest help.) Thinking about your offer on the help pages, I think an entire scriptorium discussion on how to decrease the slope of the initial learning curve may be worthwhile, though it'd be useful to get members at the relative beginner and intermediate stages involved, as their memories of navigating are still recent.
One final observation while I've been working here. As I delve deeper into Wikisource, I'm starting to realize the service it can provide to the larger community is not as great as it could be. It's already a great repository for classic and historically relevant proofread texts, and it is one of the best tools on the web for accessing these texts in various formats that actually work properly (e.g., mobi). But on the whole, much of what it has to offer still seems hidden from view. Outside of the Wikisource universe, the works seem hard to find, and there are not nearly as many links to these works in Wikipedia as it seems there should be. But that too may be something for a scriptorium conversation another day. As you can already tell, I am definitely enjoying the place. Wtfiv (talk) 19:06, 15 February 2021 (UTC)
I am really glad you're enjoying yourself!
One of the major, major issues we have is organisation and discoverability, since "big pile o' pages" is our default state. Portals are always suggested, but they're rarely set up into a really nice state and they often "peter out" after a couple of levels. If you have a special interest, working on a portal can be a nice thing (and tying it up to authors, works, scouting for works to add, etc.). For example, Portal:German literature is a fairly bland and disordered list, and a more "exhibition style" page with a curated section for the Big Authors, a section for "The Classics (TM)", by subject, era, school (Romantic, Reformation, ...), etc., would probably be more engaging. And it's allowed to have some description in the Portals; it doesn't have to be just lists. For example, I tried to add some background to Portal:History of China (and then got sidetracked).
Exploring how to work out the portals may be a great way to go! I took a look at yours and it does give me ideas. Thanks! Wtfiv (talk) 04:59, 17 February 2021 (UTC)
@Wtfiv: btw, I think I just fixed the issue with the overlapping page numbers. As far as I know (which is not very far), Translation space and mainspace should work the same with transclusions now. Inductiveload—talk/contribs15:01, 1 March 2021 (UTC)
It looks great! And, it is downright inspiring- time to do some more work! Not only are the pages functional for editing and verification as per your last fix, but now it looks really nice for public viewing too. Thank you so much! Wtfiv (talk) 16:16, 1 March 2021 (UTC)
I also note {{playscript}}, which is table-based (which isn't necessarily ideal in a paged environment).
None of these are widely used outside of a few specific works, and thus I am wondering if the efforts made already should be combined into a single Module, based on what you had already achieved with ppoem to work around some defects in the POEM extension.
ShakespeareFan00 (talk) 11:02, 28 February 2021 (UTC)
(Aside: I've generally noted that printed poetry (and hymnals) tends to move the end of a stanza that won't fully fit on an output page to the next page; this may be a consideration for export. I'm not sure how this could be done in CSS, though.)
Alignment of translation with original
Latest comment: 3 years ago · 23 comments · 2 people in discussion
I've completed the transclusion from the original Italian text to English by populating the left-hand pages. I've started the proofreading process and want to tidy up any loose ends, e.g. tweak the translation if needed, correct the English, etc.
One bothersome aspect is aligning the English translation as closely as possible with the Italian original. At the end of a page there is often the issue that the sentence structure of English and Italian may not align, and so there needs to be a decision about what to include on the current page and what on the next. Also, as in the Italian original, the last sentence on a page is often not finished and continues on the next page. Question: is it problematic for the completion of the translation product to its final phase if sentences are split between the bottom of one page and the top of the next? An example is pages 14 and 15, but the situation occurs on many consecutive pages. MvRwiki1944 (talk) 19:22, 25 January 2021 (UTC)
Your guidance will be appreciated.
@MvRwiki1944: good work! That's OK, it's just how translations are - you can't ensure the sentences split neatly across pages every time. As long as it's vaguely sensible, just choose whatever is easier (probably put the entire sentence/clause/etc. on the page where most of it is in the original). Inductiveload—talk/contribs 20:05, 25 January 2021 (UTC)
@MvRwiki1944: Wow, wonderful! The next step is to transclude it to mainspace and add it to {{new text}}s to strut its stuff. Validation will eventually happen when a suitably motivated Italian-English bilinguist comes along: it is no impediment to presenting the work in the mainspace. Inductiveload—talk/contribs07:26, 18 February 2021 (UTC)
@Inductiveload: In the transclusion to mainspace do we just manipulate the English translation or do we also include the Italian original text for the benefit of the validators? The latter in the form of the manuscript or a transclusion to modern orthography? And where do I find the workspace to start the transclusion process?MvRwiki1944 (talk) 20:48, 27 February 2021 (UTC)
@MvRwiki1944: We don't have to include the Italian. What we can do is to link both the enWS and itWS mainspace page from the same Wikidata item (d:Q3563423). Then a link to itWS will appear in the mainspace front page for enWS, and for enWS at itWS.
In theory, we can do parallel it/en text using inter-wiki transclusion, but normally you'd only do that after there's a "clean" English translation. In particular, parallel texts do not (currently) export well at all.
@Inductiveload: So, I've started the transclusion with the Title page and the Contents (not in the original manuscript). How do I move on to a new section, presumably with a fresh create, without losing the first 2 pages? The next section is Imprimatur (pages 3-6). MvRwiki1944 (talk) 23:29, 28 February 2021 (UTC)
Content that does not appear in the original generally does not go in the Page namespace, even if there is a handy blank page to put it in. Rather, we put it in the main (or translation!) space and use something like {{AuxTOC}} to mark it as "added value". Inductiveload—talk/contribs23:40, 28 February 2021 (UTC)
@Inductiveload: Thanks for your assistance. Now, how do I move back and forth between the various subsections? I can get to each of them based on the previous notification, but I don't see how I can move back to the title page and Contents from Imprimatur. MvRwiki1944 (talk) 00:05, 1 March 2021 (UTC)
@MvRwiki1944: generally, there's a link to the next and previous sections in the relevant header fields and a link to the title page in the title field on each page. All three should use relative linking. Inductiveload—talk/contribs00:11, 1 March 2021 (UTC)
@Inductiveload: Thanks for your help. I finished the transclusion of the document to mainspace, moving from one section to the next, starting with your upload of the Imprimatur. Navigating backwards in the document via links doesn't seem to work, since the backwards links aren't there. The document in mainspace needs major cleanup to move on to the next stage. In order to transclude by section, I moved material from adjacent pages in the proofread source document, so as not to have to deal with splitting pages. Any recommendations for cleanup are appreciated. MvRwiki1944 (talk) 02:06, 1 March 2021 (UTC)
@MvRwiki1944: I've moved them to our standard naming of Rank N, but I left the display names in the header as you had them.
@Inductiveload: I had moved all text to mainspace, but now all text after Prologue is gone in mainspace. Also, I still don't see functional backward links. How do I get back to the full text without having to recreate it? MvRwiki1944 (talk) 12:02, 1 March 2021 (UTC)
@Inductiveload: I can see the backlinks now. The next task is to clean up the text in mainspace, starting with the running-together of letters in the paragraphs, e.g. in Imprimatur the words 'consequence' and 'suffice'. Can this be edited out in mainspace, or do I need to go back to the Page namespace? MvRwiki1944 (talk) 12:29, 1 March 2021 (UTC)
@Inductiveload: Sent a screenshot of Imprimatur, showing the words 'consequence' and 'suffice'. This running together of letters occurs all over the text. Sent an email with the PDF attachment to wiki@wikimedia.org, in reply to your most recent email to me. MvRwiki1944 (talk) 12:54, 1 March 2021 (UTC)
@MvRwiki1944: I don't think you can reply to email "pings". Maybe just upload the screenshot to imgur.com (or wherever) and post the link. Just the ID part of the URL will do if it sets off the spam filter. Inductiveload—talk/contribs13:07, 1 March 2021 (UTC)
@Inductiveload: Not familiar with imgur.com (or whatever), and link to what? The PDF is a screenshot of Imprimatur as it appears on my computer. MvRwiki1944 (talk) 13:18, 1 March 2021 (UTC)
@Inductiveload: Looks great now. Problem solved. Is the transclusion document equivalent to mainspace, and is it the document from which the validators work, checking the Page-namespace text they can access by clicking on the numbers in the left margin of the page? Has the text been reclassified as 'to be validated'? And what should I do next? Also, what is the shortest pathway to the document on the main page? MvRwiki1944 (talk) 17:03, 2 March 2021 (UTC)
I was using this: Template:Plainlist/single.css (via a template style) to suppress the marker on a continued list item.
However, a very helpful user in the Wikimedia Discord's Technical channel expressed concerns that this might not be an ideal way of doing this.
Input (very very!) welcome. Feel free to edit directly. Won't be offended if you think any part of it is idiotic (but might of course push back on it since everything I write is obviously the work of genius), nor if you think it's a waste of time (see previous parenthetical).
I'm thinking it can become a guideline-guideline eventually, and something we actively work (long term) on migrating existing stuff to.
And it's meant to be a pretty technical guideline. The target audience is me, you, and anybody else with their fingers in the technical guts. So it's fair game to talk about p-wrapping, margin collapsing, semantic-ish templates, etc. But it should be sufficiently readable to the community at large that they can look and see that it is good (nodding sagely is advised). --Xover (talk) 10:46, 2 March 2021 (UTC)
Blanking as 'no text'
Latest comment: 3 years ago · 2 comments · 2 people in discussion
@AnotherEditor144: As I mentioned in my edit summary, this is an ex libris sticker, and, since it is not part of the work in question, we do not reproduce it. We also don't reproduce barcode stickers, call number stickers, the card pocket in the back of some library books, library stamps, scribbling by the book owners/borrowers (unless that's a historically useful work in its own right, Fermat's famous marginalia comes to mind), or library or digitization watermarks or registration markers. Basically, if it's not part of the book when published, we don't usually reproduce it. Inductiveload—talk/contribs12:25, 2 March 2021 (UTC)
Thank you. I thought you might be interested in this.
Machine translated
Latest comment: 3 years ago · 6 comments · 2 people in discussion
@Inductiveload: Hello Inductiveload, we had a discussion about machine translation. You wrote to me that I first had to insert a license template. I have done that now, but it was still not translated. I do not understand why it was not translated during upload. Thanks from Germany
@Riquix: Sorry, I don't follow. What's the problem here? The file at Commons has a license template. The problem before was that it did not have a license template. What translation were you expecting to happen? Inductiveload—talk/contribs09:26, 9 March 2021 (UTC)
Oh I see, this file does not have a text layer. It's also missing several pages. Digital Library of India really do produce some rubbish scans. I'll try to source a better scan from here, but it'll take some time to download and convert. Inductiveload—talk/contribs10:27, 9 March 2021 (UTC)
You're sending end users into Lua code, and completely changing the workflow for new texts. There are technical differences, a completely new and alien syntax, and there's no guarantee those who interact with this workflow are capable or comfortable with the new version. This can't just drop in without warning. I actually also have some technical concerns on my own account, but those are a secondary point. --Xover (talk) 19:33, 10 March 2021 (UTC)
Mainly that, as a user-facing interface ("config file"), Lua data-structure syntax is fragile: forget a comma, or any one of a billion other obscure technical details, and you start throwing big red error messages and break the whole system (an error in one leaf breaks the whole tree). Compare the old version: each entry is an individual template, where a mistake is usually immediately obvious and, crucially, cannot break other entries. Just for contrast, when I've previously toyed with ways to improve that (admittedly rather baroque) process, it's been in the direction of a default-Gadget-provided JS UI to manage the list, and possibly JSON storage for the actual data (i.e. explicitly not user-editable). Not as an actual proposed "better" way to do it (I've "toyed with the idea", not actually thought about it), but just to make clear my frame of reference and thinking related to this. --Xover (talk) 19:50, 10 March 2021 (UTC)
@Xover: I thought of that, but the MW Lua editor will not let you save a syntactically invalid table. The worst you can do is screw up the data itself so badly that it can't even be loaded, which takes some doing (it is possible, e.g. by referencing a global). But nearly all practical ways to do that end up with the same result as now: {{new texts/item}} chokes on it. For example, the most likely syntactically-valid thing a user will do is forget quotes:
{
    title = George,
    author = Algernon,
    year = 1875
},
In which case it breaks "normally". You can't write title = George Chapman; that won't save, and nor will forgetting a comma. About the only thing I've managed to find that does actually cause a load failure and is still allowed to be saved (the editor doesn't check for it before saving) is forgetting both a comma and the quotes and not putting the entry in {}:
foobar
{
....
I've also considered a tool like you say, but I would bet half a bitcoin that it would become unmaintained, like PageNumbers.js or Match and Split (depending on whether it's a local JS tool or a Toolforge tool), the day that the inventor gets hit by a bus. Plus, if dodgy data sneaks in, it'll be an even bigger PITA if the tool doesn't also make it really obvious how to revert it.
Also, arguably, there's not much practical difference between one failed entry because the user missed a { in the current system, and the user exploding the whole data table. The main page is still busted, probably with a red error in either case, and will be fixed or reverted ASAP by someone. If we were super-serious about avoiding that, we'd detect the error text (or blanking, or whatever) with a bot watching the page and have it insta-revert (or allow only edits to a canary page, and have it mirrored to the real page if it is suitably not on fire). But that applies right now too. Inductiveload—talk/contribs 20:31, 10 March 2021 (UTC)
Oh, interesting. I didn't even consider that the editor might prevent this. Input validation FTW! But I still think the syntax is more complicated and fragile from a user perspective. The template version is far from great, but we do sort of have to presuppose users to be minimally comfortable with templates to contribute. A complex data structure in Lua is a step too far IMO. Even without the breaks-everything vs. breaks-this-entry issue, the myriad ways this can "not work" feel fragile and obtuse to the user, especially since just hitting the "Preview" button won't actually work. But to be clear: I'm not saying we can't go that way, no way no how not never. I'm saying it's a big enough change that we need to let the community decide if they're comfortable with it before implementing it. For my own part I'd much prefer editing the Lua config to MW templates (did I mention I hate templates?), so it's no skin off my own back. Looking the same on the main page, or ugly error messages, aren't really main concerns for me either: error messages are good so we discover and can fix issues, and the main page is way overdue for some tweaks. It's the user experience that concerns me. But it's entirely possible my concerns are overblown. --Xover (talk) 16:50, 14 March 2021 (UTC)
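The fragility argument above can be illustrated in miniature with the JSON storage Xover floats earlier in the thread. This is an illustrative sketch only, not any actual on-wiki code, and all names are hypothetical: parsing the whole list as one structure means a single missing comma takes every entry down, whereas per-entry parsing confines the damage to the broken entry.

```javascript
// Two hypothetical "new texts" entries; the second is missing a comma.
const entries = [
  '{"title": "George", "author": "Algernon", "year": 1875}',
  '{"title": "Chapman" "author": "Homer"}', // invalid: missing comma
];

// Per-entry parsing: an error in one entry cannot break the others.
function parseEntry(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: String(err) };
  }
}

// Whole-list parsing: one bad entry poisons the entire structure.
function parseAll(texts) {
  return JSON.parse('[' + texts.join(',') + ']');
}

const perEntry = entries.map(parseEntry);
// perEntry[0].ok is true and still usable; perEntry[1].ok is false.
// parseAll(entries) throws, losing the good entry along with the bad one.
```

The same trade-off applies whatever the storage format: one-template-per-entry is per-entry parsing, a single Lua table is whole-list parsing.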
T277451
Latest comment: 3 years ago · 6 comments · 2 people in discussion
But that's just the pagenum span. While you were digging around, you didn't happen to find code to suppress the transclusion of the actual page content? Cf. e.g. this sandbox. If the page content is transcluded, it's arguable that the pagenum should be too, for consistency. There may be legitimate uses for pulling such pages in through PRP (vs. just normal MW transclusion like you'd do for the ToC on an Index:), but in my experience these always end up as exclude=pp. in the <pages … /> tag. --Xover (talk) 08:29, 16 March 2021 (UTC)
@Xover: I know what you mean, but I was more focused on fixing the issue in the code as written, since that 1) was easy, 2) fixes our issue where the W/T number merks the one you want, and 3) won't really change on-wiki behaviour (I have no idea if people are using W/T pages for weird and nefarious purposes on other Wikisourcen). To do what you suggest would be to stuff all the transclusion code into the if statement:
if ( $qualityLevel !== PageLevel::WITHOUT_TEXT ) {
	$pagenum = $pageNumber->getRawPageNumber( $language );
	$formattedNum = $pageNumber->getFormattedPageNumber( $language );
	$out .= '<span>{{:MediaWiki:Proofreadpage_pagenum_template|page=' . $text . "|num=$pagenum|formatted=$formattedNum}}</span>";
	if ( $from_page !== null && $page->equals( $from_page ) && $fromsection !== null ) {
		$ts = '';
		// Check if it is single page transclusion
		if ( $to_page !== null && $page->equals( $to_page ) && $tosection !== null ) {
			$ts = $tosection;
		}
		$out .= '{{#lst:' . $text . '|' . $fromsection . '|' . $ts . '}}';
	} elseif ( $to_page !== null && $page->equals( $to_page ) && $tosection !== null ) {
		$out .= '{{#lst:' . $text . '||' . $tosection . '}}';
	} elseif ( $onlysection !== null ) {
		$out .= '{{#lst:' . $text . '|' . $onlysection . '}}';
	} else {
		$out .= '{{:' . $text . '}}';
	}
	$out .= $placeholder;
}
Yeah, that's why I wondered if there was existing code for it that was just broken like the pagenum stuff. A bug would presumably be easy to fix, but a change that affects on-wiki behaviour would at the very least require research, and worst case also community consultation (which I don't think any of the usual suspects have the capacity for these days). --Xover (talk) 08:51, 16 March 2021 (UTC)
AFAICT, there isn't broken code for it; it's "supposed" to be like that. But I would say that the logic of suppressing the page number and inter-page separator, but not the content, is flawed. It's just that on wikis (all of them?) those pages are usually completely empty, so there's no visible difference, and no one has cared (or they have assumed it is intended and excluded them manually when needed).
I'll file an issue and patch (after a conflicting patch has gone in) and we'll see if anyone screams. If nothing else, there'll be a task+patch in the system for future reference. Inductiveload—talk/contribs08:56, 16 March 2021 (UTC)
Oh, that reminds me… We should have a cleaned up version of User:Xover/notext.js as a default gadget. I think maybe I saw you had something similar sitting somewhere? In any case, it should empty the text fields when "Without text" is chosen, and restore it when choosing something else (in case the user just misclicked). I'll get around to fixing it up eventually, but feel free to jump ahead if you want. --Xover (talk) 09:36, 16 March 2021 (UTC)
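The behaviour described above could be sketched roughly like this. A hedged, DOM-free sketch only: all names are hypothetical, and the real gadget would wire this to the page-quality radio buttons and the actual edit box. The point is just the stash-and-restore logic that makes a misclick non-destructive.

```javascript
// Hypothetical core of a "Without text" gadget: when that quality is
// chosen, stash and clear the page text; if the user switches back
// (e.g. after a misclick), restore the stashed text.
function makeWithoutTextHandler() {
  let stash = null;
  return function onQualityChange(textbox, quality) {
    if (quality === 'without-text') {
      if (stash === null) {
        stash = textbox.value; // remember what was there
      }
      textbox.value = '';
    } else if (stash !== null) {
      textbox.value = stash; // undo the clearing
      stash = null;
    }
  };
}

// Example with a stand-in object for the edit box:
const box = { value: 'Some proofread text' };
const onQualityChange = makeWithoutTextHandler();
onQualityChange(box, 'without-text'); // box.value is now ''
onQualityChange(box, 'proofread');    // box.value is restored
```

As noted below, programmatically rewriting the textbox can kill the browser's undo buffer, which is exactly why the gadget would need to keep its own stash rather than rely on Ctrl-Z.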
@Xover: I have the Index preview grid, developed for a driven-off contributor, which provides "set as empty" in the alt-click pop-up: User:Inductiveload/index_preview, but nothing like that in edit mode.
Messing with the textbox is frustrating with JS because it kills the undo buffer (at least it does for me in both Firefox and Chrome), so restoring the text after a JS intervention isn't trivial (this is incredibly annoying sometimes). So it'll need special handling other than just Ctrl-Z on the text-box. Inductiveload—talk/contribs10:09, 16 March 2021 (UTC)
Lua table length
Latest comment: 3 years ago · 4 comments · 2 people in discussion
Incidentally, since I saw a comment of yours somewhere… Lua table length can be counted with # iff its keys are monotonically increasing integers starting at 1. It's one of the quirks of Lua's design that makes absolutely no sense to anyone. --Xover (talk) 08:33, 18 March 2021 (UTC)
Latest comment: 3 years ago · 2 comments · 2 people in discussion
Found this situation when lint-error hunting. Given the large number of templates here, I am wondering if what's needed is a module-based engine where each pair is placed on its own line?
ShakespeareFan00 (talk) 10:21, 18 March 2021 (UTC)
So it looks like EM SPACE (&emsp;) is not actually 1em wide, which makes trying to align other bits with :-indented lines inside {{ppoem}} rather a challenge. See the example (the drop-initial makes everything complicated, and here it has to be manually shifted due to indented lines).
I would suggest mimicking {{gap}}'s approach to get a predictable width, but possibly using EM SPACE (&emsp;) instead of WORD JOINER inside the span (so the semantics match the context).
PS. I think possibly the drop-initial would have been easier here if I could have disabled the hanging indent on the first line, since both rely on text-indent/margin for their effect. Might be worth considering as magic ppoem-syntax at some point. --Xover (talk) 09:54, 20 March 2021 (UTC)
Re the dropinitials: these have been a "bit" of a pain. I think you actually generally want to keep the hanging indent even after a DI if possible, because you still need it to show the next line is a continuation of the first, not a new line (normally, printed matter still keeps the hanging indent for the first line). However, CSS seems awfully reluctant to allow that without prematurely wrapping the first line. Setting the first line's width to > 100% works but you need to know the extra to add, which you obviously do not know in general. Cogitation continues.
We might also want to consider a syntax to allow a ppoem-line to acquire a class (and, maybe, style), somewhat like tables do for cells. Probably not for common use, but would add flexibility. Inductiveload—talk/contribs11:32, 20 March 2021 (UTC)
Dropcaps are… yeah. Thanks to having to fake them with floated content, we're always going to be in a world of pain there. I have wondered, though, if we could fix the premature line wrapping by adding Yet Another fake span, z-indexed to the back and turned functionally invisible, but positioned such that it lets the browser calculate the actual width of the box. I think this behaviour might actually be a hole (not a bug, per se, but an unintended corner case) in the CSS box model. Possibly it's that white-space needs, in addition to normal and pre-wrap, a please don't ever wrap this line unless you really physically have to, and calculate its width accordingly. In any case, yeah, preferably preserve the hanging indent, but I'd probably trade it for being able to achieve other things if forced to choose. I've also loosely wondered if—since you're already parsing wikicode here—we could let templates like {{di}} signal their presence, to obviate the need for manually adding constructs like {{di|A}} << HOY!. Anything from stripcodes to HTML classes could conceivably work there, and could be emitted either always or by explicit request along the lines of {{di|A|mode=ppoem}}. A class on lines is probably a good idea, but may want to wait for a clear need (per-work CSS may generate that) to avoid having too much syntax with too obscure use cases. Inline style is an emergency solution, so I'd urge a very clear need before implementing that. Could predefined styles cover 80% of the need reasonably? Is there an acceptable manual fallback for the remaining 20%? If so, I would argue we could avoid inline styles altogether. --Xover (talk) 18:09, 20 March 2021 (UTC)
"since you're already parsing wikicode here": I'm not really, though. I could inspect the content for something like {{(di|dropinitial)\|, but that's a rather uncomfortable coupling between the templates. I'd almost rather go for a DI syntax within ppoem.
"we could avoid inline styles altogether": it's fairly likely that per-work CSS will permit this, as long as you can apply classes to lines.
DI syntax: I wasn't really thinking about looking for the template invocation (regex parsers suck), although if anything would merit such a hack it'd be {{di}}. I was thinking more along the lines of making any template that needed it emit a control code (strip markers spring to mind, but I'm sure there are other ways that would work). But I haven't really looked at how you're doing the magic syntax here, so I don't know what'd be a non-yucky way to do it. The amount of magic syntax so far seems to sit pretty solidly in the Goldilocks zone (I'm amazed at the sheer number of pages I've been able to do with essentially only ppoem!), but adding too much more is, IMO, rapidly taking it into the "unwieldy monster" end of the scale. --Xover (talk) 15:02, 21 March 2021 (UTC)
@Xover: glad to hear about the magic syntax. I'm also cautious about taking it too far into "black magic" territory (also I don't want to end up badly re-writing a LALR parser or something as a module to parse a mini-language and/or ending up with {{TOCstyle}}).
Because this is a template, it only sees incoming {{di}}s as the literal text "{{di|...}}", and the module spits the wikicode out verbatim, as it comes. They expand later in the parser, not in the module. In theory, you could capture them and expand them in the module with frame.expandTemplate, but that would be a very last resort, IMO. Inductiveload—talk/contribs 16:47, 21 March 2021 (UTC)
Crossing pages is the cross we bear
Latest comment: 3 years ago · 7 comments · 2 people in discussion
Any thoughts on handling lines that cross pages inside a ppoem run?
The case is actually a play where everything is spoken in verse, but stage directions show up as a long unwrapped "line" that can span across page boundaries. I can only come up with moving text between pages (which pains my obsessive streak) or splitting it out of the ppoem block (which has the same alignment problems old poem has). I could also abuse hws/hwe I suppose, but for about a short paragraph's worth of text that feels… icky.
Any clever solutions I haven't thought of? I don't think this can be fully solved without a new start/end model and making ppoem an extension (either part of or at least tightly integrated with PRP), can it? --Xover (talk) 11:12, 21 March 2021 (UTC)
@Xover: I think we can get away with just a new start/end "no-line-break" model and just omit the span close and open on transclusion. The pagenum span will occur half-way through the line, but that's fine, I think (or even preferable).
Oh, you're right and I'm just being dumb; I wasn't thinking across namespaces at all. Yeah, that would presumably work (unless the parser steps in and p-wraps it into oblivion of course). And I'm in no hurry for extensionification either; I'm by no means certain the benefits (which may only be the ability to do a /s+/e model) will outweigh the costs. My playing with it so far suggests it works astonishingly well. I'll try to cobble together something resembling structured feedback when I'm done playing. --Xover (talk) 14:54, 21 March 2021 (UTC)
The page joining indeed seems to work. In re the value, the keyword "continue" pops up in my head, but I'm not sure it makes any kind of sense. --Xover (talk) 18:06, 22 March 2021 (UTC)
My immediate guess is that nesting ppoems will generally break in similar ways without special params to handle it. But it's possible this particular formatting would be better done with either an embedded {{center block}}, or possibly explicit magic syntax for the stanza (e.g. "the following stanza is a centered block within the overall width of the poem block").
But on the plus side, this is the first case that seems to break in about a hundred pages of ppoem use (variability low to medium, so not a real torture test; but fairly representative). --Xover (talk) 09:06, 23 March 2021 (UTC)
Yeah. The following poem also has what looks like a stanza block-aligned to the right that I predict similar problems will accrue to. --Xover (talk) 09:17, 23 March 2021 (UTC)
@Xover: sorry, I didn't mean to break it while I was faffing! Anyway, I have something like stanza styles working now (plus a tidier module with...tests!).
Awesome! Page:War, the Liberator (1918).djvu/134 is probably a good in-the-wild example of the margins not quite lining up (well, either that or my eyes are going crosswise). No worries about the breakage: this is experimental use. Oh, hmm. Why is the example above wrapping now? --Xover (talk) 14:24, 23 March 2021 (UTC)
@Xover: where is it wrapping? Bombers and Grenade are both unwrapped on my screen.
In general, DIs may well cause a spurious wrap specifically if the lines next to them are within 4em of the longest in the whole poem, or if the DI itself is bigger than 4em (cf. last example in the docs). Inductiveload—talk/contribs14:50, 23 March 2021 (UTC)
@Xover: Ah, that's because A is not floated left like a DI would be. The lines are display:block (for the moment) to allow them to have a hanging indent each (without being paragraphs) Inductiveload—talk/contribs15:00, 23 March 2021 (UTC)
Oh, excellent! I just had occasion to test with a single em and it worked great. PS. Also ran across some possible pre-defined classes for poem (and possibly stanza) here. --Xover (talk) 19:14, 23 March 2021 (UTC)
@Xover: Hmm, how do you think this should work? Provide a handful of built-in class names via Template:Ppoem/styles.css (say one for text-align:center/right, smaller, and fine)? I think the only one that makes sense to be magic is text-align center/right, but then what's the syntax? Inductiveload—talk/contribs19:22, 23 March 2021 (UTC)
For this case I was thinking pre-defined classes in styles.css. I don't immediately see any need for magic syntax for this. Based on the use case I ran into I was thinking maybe the standard bunch of size classes, and left/center/right/justify, and a couple of line-height values to cover common cases. And I was thinking mostly in terms of stuff that could be cleanly implemented as classes (i.e. nothing like {{di}} here). But maybe I haven't thought it through enough? --Xover (talk) 19:37, 23 March 2021 (UTC)
The other initial has dropped!
Ppoem's docs probably need a separate caution about block (div/p) based templates inside stanzas. And depending on how many of those crop up regularly in poems, we may want to maintain a list of compatible alternatives. The one I ran into was {{
@Xover: Hmm, so I think there are a few options here, in no particular order (and with minimal thought given so far), which can be mixed and matched:
Provide wholesale alternative templates e.g. {{*** inline}} (but we have enough templates)
Give {{***}} and friends an inline parameter (but we don't really want to add more gotchas to ppoem)
Convert {{***}} and friends to <span style="display:block;"> (semantically this might even not be the worst idea ever - are they really divs (division/section) anyway?)
Adding a magic syntax to ppoem to make a line a div isn't an option because stanzas are (rightfully) p's, so you can't have divs or other p's in there. Inductiveload—talk/contribs09:38, 24 March 2021 (UTC)
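To illustrate why (a hand-written fragment, not actual template output, and the class names are made up): HTML parsers close an open p as soon as a block-level element such as div begins, so a block template inside a stanza paragraph silently splits it:

```html
<!-- What the wikitext would ask for: -->
<p class="stanza">
  First line
  <div>* * *</div>
  Second line
</p>

<!-- What the parser actually builds: the div auto-closes the p,
     so the stanza is split and the trailing </p> becomes an empty p. -->
<p class="stanza">First line</p>
<div>* * *</div>
Second line
<p></p>
```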
I'm not sold on the argument that much of anything is actually properly a p, but be that as it may… I think implementation should be a bit mix and match, unless something points itself out as the One True Way. The key point there is to document the common gotchas, and anywhere we can reduce the need for docs (for reading them, not so much the writing of them) by making the relevant templates intuitive or "just work", the better.

I don't generally think converting div to span with display:block has much point (it's a distinction without difference in HTML5; we mostly only notice due to p-wrapping and other parser messes). Better would be a move away from 1= templates and towards /s+/e templates for blocks, to make clear what the model is. For users it would also be more intuitive (less cognitive load) even with more templates, because the inline and block versions would (if done right, and in most-but-not-all cases) appear to be one template, with how to call it given by the context.

And I'm generally on the wait-and-see train on this, until we see how many of these there are. The only reason I'm pointing a gun at {{
}} is that internally it uses {{loop}} to do its thing, and that one seriously needs to be killed with fire. Since it needs to be gutted anyway we might as well find some way to make it work well in ppoem too. --Xover (talk) 10:50, 24 March 2021 (UTC)
@Xover: Re "I'm not sold on the argument that much of anything is actually properly a p": how do you mean? Make stanzas a div? That would definitely work and allow block children. But a stanza certainly feels like a p (and as we all know, feels > reals), though since they have a class anyway it makes little practical difference.
Re "I don't generally think converting div to span with display:block has much point": indeed, it's almost sophistry, but, at least for {{***}} (and a few allied things), it's not really a div. In fact it's semantically possibly more like an HR element. Which might be possible with per-work CSS (it needs :after selectors and won't copy-paste), but won't be practical with templates.
Re "some way to make it work well in ppoem too": some way to use span instead of div would be easy even as it is, but I haven't thought about it enough to say if that's a good idea.
p is a brainfart from a bunch of old farts back in the early nineties (yes, TimBL and DanC, I'm talking about you ;)), inherited by the work on GML back in… I actually don't remember when IBM started work on that, and can't be arsed to go look it up… but it was a model of what HTML would be used for that was very coloured by the fact that the web did not exist. It reflects the idea that HTML is primarily a way to mark up documents—and I mean double-spaced, typewritten, stacks of dead trees here—and needs to be able to express all the customary parts of such artefacts. It's from a conception of HTML as "a simpler SGML with active hyperlinks", rather than the foundation of the modern web (the majority of which much more resembles an application than a document). When div and span were introduced in HTML 4.01 (well, really in HTML 3.2+ iirc, but that's a different story) it reflected amassed experience that a paragraph tag is way too narrow and limiting for the kinds of things the web needs, and brings along too many assumptions from the stacks of dead trees. In that old conception of the web, p would actually be incorrect here, because a stanza should have its own tag. In the new conception of the web a stanza, like a paragraph (in most contexts), is just another kind of logical grouping of a page (i.e. a div) to which more specific semantics can be added with a microformat. On enWS the only real proper use for p would be marking up the paragraphs in the works we reproduce, but that would only make sense if we could control it directly (the parser's p-wrapping has way too many problems).

In any case… I can't think of a single use case here where p would be appropriate and where it wouldn't cause far more problems than it was worth. We inherit some default styling in both UAs and MW for div too, but overall it has far fewer problems. Not that, you know, it's a pet peeve or anything… :) --Xover (talk) 12:02, 24 March 2021 (UTC)
@Xover: Well it sounds like you know this in more detail than I do! Perhaps I've overestimated the usefulness of p based on how pervasive it is. So should we make stanza a div (it's a trivial thing to do)? None of the CSS should be affected, at least. Inductiveload—talk/contribs12:09, 24 March 2021 (UTC)
Bah. I'm mostly just curmudgeoning, and mostly due to T253072. If it's trivial I'd argue lamely in favour of switching, otherwise I'm just venting. The main reason for switching is that div inherits fewer pre-defined behaviours and styles, but on the other hand it works just fine with p so switching might necessitate explicitly specifying some of that stuff. --Xover (talk) 14:13, 24 March 2021 (UTC)
Ppoem, the Liberator
Ok, I've now finished War, the Liberator, and Other Pieces. I did half of it using {{sbs}} and then switched to {{ppoem}}. And finally I went back and converted the {{sbs}} uses to {{ppoem}}, swapped out {{em}} and {{gap}} with : and ::, and other features and fixes you added as I was working.
The experience so far is that, modulo the issues you address along the way, ppoem works great, is rock solid, is intuitive to use (modulo learning curve, caveat custom syntax and model), solved all the needed problems for this work, and despite the oddball and kinda fragile start/stop hinting there were essentially no problems on transclusion (which is more than I can say for any of the other ways we deal with page-crossing!). Converting sbs to ppoem was straightforward (which I think means they actually share far more in philosophy than is immediately obvious), and resulted in immediate improvements: simpler (and less) markup, far fewer visual glitches, and just overall better in all ways. Compared to all other ways to format poems (and similar constructs) ppoem is much better in every way.
The drawbacks and annoyances are: the custom syntax, which is very different from everything else we use and lets you reuse very little existing acquired knowledge; keeping track of what kind of start and end model to use, which is a bit taxing and will be moderately difficult for many contributors; and the need for special syntax (<<) after a dropcap, which is not at all intuitive and so resists establishing muscle memory (I kept forgetting, even after I figured out why it was needed).
I think the first issue probably argues that we should be completely uncompromising in keeping the custom syntax small and internally consistent (no special-casing, no solving every need that pops up using new magic). The second will need some noodling, but can probably be alleviated somewhat by using good keywords that people intuitively understand, possibly encouraging putting |end= at the actual end of the template call, and maybe even some fancy GUI tool to manage that aspect (or possibly just something EasyLST-ish). The last issue we've discussed elsewhere so I won't go into it here, but it's sufficiently annoying that it's worth a couple more cycles coming up with possible DWIM solutions.
All in all I'd say ppoem as it stands is already a massive success, and but for the need for much more testing on real world data and some caution around the custom syntax / model, I'd say it could with great benefit have been pushed to the community already. I'm going to go back over some of my existing works that use {{sbs}} and convert them to see if there are any more gotchas to be found, and because I'm already convinced that ppoem means sbs can no longer justify its existence (pre-wrap/poem formatting was its raison d'être; ppoem does that far better, and even though sbs has other functions they probably don't justify its existence alone). --Xover (talk) 11:27, 24 March 2021 (UTC)
Thank you for the vote of confidence. Except for the CSS finagling and the DIs, I'm fairly happy with the outcome so far, too. :-)
I'm still thinking about the DI stuff. I think there might be better ways to handle it, but I haven't gotten it just yet. Ideally, the "<<" could just die one day.
Re the start end model, the only thing I can really think of other than your suggestions is "moar magic" with another magic syntax on the first/last lines like "=> same-line" at the start and "stanza-break =>" at the end, but I'm far from convinced that that's actually a reduction in cognitive load. At least parameters are familiar and explicit (and can be picked out by existing tools like mwparserfromhell).
On a completely unrelated note… Does the 4em right margin need to be on the stanza (vs. the lines)? It seems to be the main culprit in making centered stuff fail to align sensibly (by pushing all the contained lines too far left relative to centered stuff outside the ppoem). The left margin for the hanging indent that the right margin is, I think, compensating for is attached to the lines, so my immediate assumption would be that the right margin should be too. --Xover (talk) 18:50, 24 March 2021 (UTC)
@Xover: this is indeed part of what is messing with the alignment. The right margin is actually there to make sure that the lines, which are 100% wide (of the stanza) don't lose their right-hand 4em on a small screen. Although they are 100% wide, they're also 4em to the right, so they "stick out" of small containers.
And why the merry Felicity is there a width:100% anyway, I hear you ask? That's because if you don't force the line to be as wide as the stanza, a drop initial will definitely wrap its line (whereas with 100%, it will only wrap if the line is within 4em of the longest in the whole poem).
No, because you actually want the 4em to stop the wrapping with (smaller) drop initials. But I am not sure the cure isn't actually worse than the ailment here. Inductiveload—talk/contribs21:16, 24 March 2021 (UTC)
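A minimal sketch of the layout being described (the selectors and exact declarations are my reconstruction for illustration, not the template's actual styles.css):

```css
/* Hypothetical reconstruction of the ppoem layout under discussion. */
.ppoem-stanza {
  /* 4em right margin: compensates for the full-width lines below so they
     don't "stick out" of small containers. */
  margin-right: 4em;
}
.ppoem-line {
  display: block;      /* each line is a block so it can carry a hanging indent */
  width: 100%;         /* force lines as wide as the stanza: a floated drop
                          initial then only wraps lines that come within ~4em
                          of the longest line in the poem */
  padding-left: 4em;   /* hanging indent for wrapped text... */
  text-indent: -4em;   /* ...while the first visual line stays flush left */
}
.ppoem-di {
  float: left;         /* drop initial floats so subsequent lines flow past it */
}
```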
Dropped initial
The bot sets the alt parameter inside the {{di}} template, e.g. {{di|G|alt=G}}, instead of in the usual image call, see e. g. here. Is it intended? If so, is there any advantage? Imo it makes it less comprehensible. --Jan Kameníček (talk) 18:20, 25 March 2021 (UTC)
Darn, I thought I'd caught all those. That was a trip-up in mwparserfromhell because the alt parameter was being set in the {{di}} template, not in the image call. {{di}} doesn't have an alt parameter at all (it's simply parameter 1), so setting the alt parameter there did nothing at all. Inductiveload—talk/contribs18:35, 25 March 2021 (UTC)
@PeterR2: This should not use a table at all. The (current) easiest way to deal with this is to use {{block center/s}} and <poem>:
Page 1
{{block center/s}}
<poem>
Line 3
Line 4
</poem>
----------------------↑ Body/footer ↓
{{block center/e}}
Page 2:
{{block center/s}}
----------------------↑ Header/body ↓
<poem>
Line 3
Line 4
</poem>
{{block center/e}}
Using tables for poems is bad for all sorts of reasons. There is work underway to improve poems, but for now, something like the above works fine. See H:POEM for more information. Inductiveload—talk/contribs 10:37, 31 March 2021 (UTC)
Is this going to be a manual process and reliant on you? If so, that doesn't seem sustainable nor reliable. I would have thought that we could be getting user:Wikisource-bot to do that more reliably. I would have also thought that we could be setting something to be a JSON page via Special:ChangeContentModel somewhere. I could be dreaming, though, if we could have something that will scrape an archived line in "new texts", poke it into a JSON file and then remove the line. If we set up the schema, I am happy to go back and convert all the previous years into JSON pages. The one thing that we will need to allow is updates for disambiguation and moves. — billinghurstsDrewth 00:17, 18 March 2021 (UTC)
While on that, I am guessing that the module: ns is not the long-term living habitat for the data. Plus if we are recording JSON data like this, I would love it if we could record the WD item number. I think that there is long-term value, and we should be able to run queries and bots, and maybe help twittify stuff more readily. — billinghurstsDrewth 00:22, 18 March 2021 (UTC)
@Billinghurst: the idea was to transition entirely to the module, since {{#invoke:New texts|new_texts|limit=7}} provides exactly what you need. Then, there's no archiving needed at all, except for at the end of a year, when you just copy the whole durn thing to the year's archive page. Template:New texts/testcases provides a comparison.
Re "The one thing that we will need to allow is updates for disambiguation and moves": this is the same as currently: update the link target (i.e. the title value).
Re Wikidata, it's certainly possible to link to WD. However, due to the number of items, you will not be able to use it to construct pages like Wikisource:Works/2021 if they have more than about 400 entries (and we're on track to break that limit), because that's the limit on how many WD items a single page can load. So we can't fall back entirely onto a list of only Q-numbers, awesome though that would be. With PWB, at least, it's trivial to get the Q-number for a page given the title value, so you can work backwards. Inductiveload—talk/contribs 10:26, 18 March 2021 (UTC)
Ignorant question. Does the tabular JSON have any benefit for us (mw:Help:Tabular Data), as I see it is an available content type? I don't trust users to edit JSON files, though I wonder at their ability to edit a table. — billinghurstsDrewth 05:01, 13 April 2021 (UTC)
No idea. I was prodding people in irc #mediawiki though no one took the bait. I will prod some other avenues, though might be a "meh". — billinghurstsDrewth23:09, 13 April 2021 (UTC)
In relation to your periodicals issue: there are a number of library cataloguers at Wikidata; it is just a matter of finding one to help. I saw someone in the past couple of weeks but for the life of me cannot remember where. You may find someone through Wikidata:Status updates, or you could try emailing someone like Ruth and asking for help, or ask to be pointed to someone, as I am sure that they are a tight community. Let me prod someone on Twitter to see if they can point someone to answer the question. — billinghurstsDrewth 23:31, 13 April 2021 (UTC)
January starts at 2., then February resets and starts at 3., March resets and starts at 4. Novel, though I am not sure that it is the objective. Can we get the count to first start at 1, and then continue in the next month? Thanks. — billinghurstsDrewth 13:03, 2 April 2021 (UTC)
Just tried your awesome new Refill Index Page link and I got a bit of an oddity. It says [[Author:H. G. Wells|Author:H. G. Wells]]. Shouldn't it be [[Author:H. G. Wells|H. G. Wells]]?
Also the Volume is coming out as [[/14|14]], leading to Index:The Works of H G Wells Volume 14.pdf/Volume 14. Shouldn't it be [[%series title%/14|14]]? This is probably a good way of thinking about title vs series title: %title% should produce [[%title%]], Volume %volume%, while %series title% should produce [[%series title%/%volume%|%series title% (Volume %volume%)]].
Finally, the edit text says
"You are editing in the Index namespace, see Editing help and Index pages
This page includes a form for entering details about a work. There is a gadget to auto-populate fields from the File: at Commons:."
Shouldn't it say
"You are editing in the Index namespace, see Editing help and Index pages
This page includes a form for entering details about a work. There is a gadget to auto-populate fields from the File:%link to file at commons% at Commons:." Languageseeker (talk) 13:03, 4 April 2021 (UTC)
@Languageseeker: For me, it fills the author as [[Author:H. G. Wells|]], which comes out correct (it's called the Pipe trick).
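For anyone unfamiliar with it, the pipe trick derives the link label from the link target at save time. A toy model in Python (illustrative only; the real rules live in the MediaWiki parser and cover more cases):

```python
def pipe_trick(target: str) -> str:
    """Toy model of MediaWiki's pipe trick: [[Target|]] gets a label
    by dropping any namespace/interwiki prefix and a trailing
    parenthetical disambiguator. Illustrative, not exhaustive."""
    label = target.split(":", 1)[-1]  # "Author:H. G. Wells" -> "H. G. Wells"
    if label.endswith(")") and "(" in label:
        # "Title (disambiguator)" -> "Title"
        label = label[: label.rindex("(")].rstrip()
    return label

print(pipe_trick("Author:H. G. Wells"))  # prints "H. G. Wells"
```

So [[Author:H. G. Wells|]] is expanded on save to [[Author:H. G. Wells|H. G. Wells]], which is why the field comes out correct.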
As I have said already, you should set the title at the Commons page, not the series. The series does not import to the Index page and you have not set any title at all at Commons. So it imports nothing. We do not always place works in a series as subpages of the series, because "series" can be quite a nebulous concept and works within a "series" may well be full-on works in their own right (as opposed to volumes of a single work). For example, The Garden of Eden is part of the New-Church Popular Series but it is a top-level book in its own right.
Index:The_Works_of_H_G_Wells_Volume_14.pdf looks better, but I'm still getting the problem with the Author. Also, right now, we have two different links for the transcluded text: ''[[The works of H. G. Wells]]'' and [[The works of H. G. Wells/Volume 14|Volume 14]]. Shouldn't we just keep the second?
I've noticed that there are a lot of pages, such as The Atlantic Monthly where there are {{ext scan link}} suggesting that users might find it troublesome to manually import dozens of volumes even if they are able to find the links. Would it be possible to batch import them and set up the template for the volumes? Languageseeker (talk) 13:50, 4 April 2021 (UTC)
It's possible, but gathering all the requisite metadata makes it a little more labour-intensive than you might think. At the least, for TAM, you would need for each volume:
The date range: e.g. November 1857 – May 1858
Publication year (e.g. 1858)
IA or HT ID (ideally IA because then you don't need to mess with reconstructing a Hathi scan, which takes a long time)
None of it is hard, just a bit of a faff. I use a spreadsheet to generate commons files and Index pages. Then it's just a matter of battling the Commons uploader API which is having a bit of a sulk at the moment. Inductiveload—talk/contribs14:20, 4 April 2021 (UTC)
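That workflow can be sketched roughly like this (the column names, volume data, and Index-page fields here are invented for illustration; the real script, metadata, and templates will differ):

```python
import csv
import io

# Hypothetical per-volume metadata, as it might sit in the spreadsheet.
SHEET = """\
volume,ia_id,year,dates
1,atlanticmonthly01,1858,November 1857 - May 1858
2,atlanticmonthly02,1858,June 1858 - November 1858
"""

def index_wikitext(row: dict) -> str:
    """Build the wikitext for one Index: page from a spreadsheet row."""
    return (
        "{{:MediaWiki:Proofreadpage_index_template\n"
        f"|Title=The Atlantic Monthly, Volume {row['volume']} ({row['dates']})\n"
        f"|Year={row['year']}\n"
        f"|Source={row['ia_id']}\n"
        "}}"
    )

for row in csv.DictReader(io.StringIO(SHEET)):
    print(f"== Index:The Atlantic Monthly Volume {row['volume']}.pdf ==")
    print(index_wikitext(row))
```

Each generated chunk could then be pushed with a bot framework such as Pywikibot rather than pasted by hand.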
Perhaps, we can do something like {{ext scan link|url|desired name}} and pass that off to the IA upload tool? If the file appears on Common, then change to {{small scan link|desired name}}. Then we wouldn't have to worry about guessing the file name, creating the Index Page, or bad index pages. Then this tool would become a sort of auto-filler for the IA tool. It would also be useful in cases when the IA tool failed so that the uploads could be retried automatically. What do you think? Languageseeker (talk)
That would need support from the IA-Upload tool which is in various degrees of broken-down-itude at any given time. A way to prefill with a URL like "https://ia-upload.toolforge.org?id=XXXX&filename=YYYY&ext=pdf" would be handy, for sure, and I have wished for it before (but not hard enough to figure out how to install and hack on the tool....yet). Also with a little bit of care, you can do better than blindly upload from the IA using their terrible metadata, like, for example, putting the dates covered in the description. Inductiveload—talk/contribs14:41, 4 April 2021 (UTC)
It might be possible to just borrow code from the IA tool, because lots of it wouldn't be needed. Basically: get the URL, build the link to the PDF, Url2Commons, import metadata. A new, simpler IA upload tool with no GUI and no user interaction. Languageseeker (talk) 14:53, 4 April 2021 (UTC)
@Languageseeker: It would be a nice trick, and probably not that hard. But it'll still suck up the IA's rubbish metadata and dump the files in a generic category, whereas I think you could ideally do a bit better than that. Not least, with a bit more care, you can create the Index page too, which you cannot do with Url2Commons.
I know that you're swamped and there is way too much to do. Would it be possible to set up a vote on feature requests? Maybe one month for proposals and then a ranked vote?
I know that metadata is not the greatest on IA, but it often requires manual editing. No tool will ever fix incorrect or absent metadata. However, trying to upload 95 volumes manually takes hours of just staring while tasks run in the background. It would take less time to batch upload and then fix metadata manually. Maybe it would be possible to just borrow code from Fæbot? Languageseeker (talk) 15:06, 4 April 2021 (UTC)
IME, it's much quicker to set all the metadata in a big ugly spreadsheet and then upload it all in one go (staring at the terminal is strictly optional). Opening and editing 95 file pages and 95 index pages (or fiddling with a bot to make the changes retroactively) is not very exciting.
I plan to post my uploader script at some point (a point where it isn't full of API keys and imports of random modules from various places). "One day" it might morph into a full-on toolforge tool, but that needs UI and all sorts of bulletproofing.
Voting on feature requests is all very well, but you still need someone to do them. Comm Tech is already being nice to us with the exporter and OCR projects, and there are not many people "into" the JS/Tools side of things. Most of the tools have been rotting for years: half the existing gadgets don't work as promised. Inductiveload—talk/contribs15:20, 4 April 2021 (UTC)
Uhm. Uploading and adding an index for 95 volumes is one thing. What takes effort is verifying that all pages of all 95 volumes are present and legible. Just mirroring IA is something a bot can do at any time. --Xover (talk) 12:46, 5 April 2021 (UTC)
@Xover: indeed, which is why I'm not on a multi-thousand-volume mass-upload spree, because locating the good scans and filling in metadata is more work than actually pressing the button on a list of 95 IA IDs and using their awful metadata without even checking.
On the other hand, the IA (or HT) pagelists (e.g. https://pagelister.toolforge.org/, or a local equivalent) help a lot because if you see this:
then you know that volume is very likely complete, because the numbering is continuous right to the end (actually, you normally know that earlier, as soon as you see a non-BW-Google scan). So that helps a bit. Inductiveload—talk/contribs 13:30, 5 April 2021 (UTC)
I'm not advocating going on an uploading spree, but making uploading faster and easier is something that can benefit users. On one hand, IA and Hathi Trust have many duplicates that we don't need. Also, nobody needs to proofread a reprint that has no illustrations, author input, or scholarly value. We're a curated collection, not donations in a box. Curation takes time. However, every user has a limited amount of time. Do we want to spend their time doing something that a bot can do, or on something that a bot cannot? A bot can upload 95 volumes, but it cannot verify the completeness of a work. With such a bot, I can go to IA, find the best scan, set up the links on the Author page, press save, and then do something else. In a day, the files will be there and I can proceed to set up the index pages. Languageseeker (talk) 14:55, 5 April 2021 (UTC)
BTW, I’ve seen plenty of incomplete non-Google scans especially from the LOC. Hathi Trust has a feedback button that we can use to report missing pages. Languageseeker (talk) 14:57, 5 April 2021 (UTC)
Media matters
cf. this. It occurs to me that we may well still have files referenced using the Media: prefix. I run into them now and again, so the odds someone stuffed one into {{di}} somewhere are at least non-zero. --Xover (talk) 12:41, 5 April 2021 (UTC)
Hi. The {{FIS}} introduced paragraph breaks before and after the image. I used this template hundreds of times to offset images where the text is supposed to flow around without a break.
— Ineuw (talk) 13:58, 5 April 2021 (UTC)
This is a very belated thank you. I didn't forget; it's just that I was embarrassed about the number of stupid problems I was requesting help for around that time. Thanks again for all your help. — Ineuw (talk) 01:47, 11 April 2021 (UTC)
Script Request... Recent Activity warning...
Hi.
Would it be possible to have a script that adds a bar to the top of pages, indicating that a page has been edited recently, or that there is a 'frequency' of edits to related pages?
The thinking here is that, as it can take some time to set up or edit certain pages (like Index pagelists), if there are other 'fast' editors there is a high potential for edit conflicts, which for long-term contributors can become a frustration. A warning about recent activity could potentially avoid these.
I am not sure how plausible it is to have a mechanism during the loading of the Edit page which warns about potential edit conflicts before you even start editing a page, though. Not sure if edit requests are tracked in an accessible way that would make something like an "X is already editing this page..." notice possible, the way Discord has an "X is typing..." indicator in near real time.
Problem with Small Scan Link and multi volumes that do not start with 1
I'm having a problem with {{small scan link}} where if I don't set the first volume to 1 it throws Lua error: At least an Index page is required. For multi volume series, such as the Complete Works of John Ruskin at Author:John_Ruskin, there are cases where the first volume is not actually 1. Languageseeker (talk) 00:50, 7 April 2021 (UTC)
I noticed that when transcribing footnotes, there is currently no way to preserve the original reference. Instead, Wikisource creates a new numbering scheme for footnotes that has no basis in the original text. See Page:The_New_Monthly_Magazine_-_Volume_101.djvu/74. Is there any way to create a template to override these? Perhaps something like this:
{{footnote|ref|character}}
For example,
To Godstowe's glade;{{footnote|<ref>See ''Reginald Dalton.'' Book iii. chap, v.</ref> and hallows all the scene<br />|*}}
which would transclude to
To Godstowe's glade* and hallows all the scene
Page 62, *: See Reginald Dalton. Book iii. chap, v.
It seems like a fairly controversial topic that comes up fairly regularly. User:EncycloPetey and User:billinghurst are against it, while others keep requesting it. Their main concern appears to be that footnotes lose their distinctiveness when transcluded. However, can we not preserve them when transcluding by including the page number? With the community fairly divided on this, would it not be better to put this up for a vote? Languageseeker (talk) 02:06, 7 April 2021 (UTC)
Because we are converting from footnotes to endnotes, and the works we do typically have footnotes that restart each page, many schemes do not scale; I have seen endnotes with over 100 sources. PLUS we have a house style, and numbers would appear to be the consensus (and it pre-dates me too). Otherwise, I don't see that it is anything of particular concern: a citation is a citation, and a house style is a house style. — billinghurstsDrewth 07:58, 7 April 2021 (UTC)
@Billinghurst: The issue is that it is hugely inaccurate and a fairly large modification of the source text. I can understand converting footnotes to endnotes, but renumbering footnotes makes them impossible to reference. If a book had Page 15, †, then it's incorrect to say that this is footnote 27. It seems that the consensus occurred many eons ago before Proofread Page existed and now we can easily format them correctly. Maybe it's time to revisit this if it's become technically possible and easy to do this. Languageseeker (talk) 12:47, 7 April 2021 (UTC)
If someone truly needs the number from the original source, they have access to the scan page by simply clicking on the page number in the margin. If the original source is using symbols, where the same symbol is used repeatedly for the first footnote on every page, then reproducing that will mean many, many footnotes all marked identically, which is useless in an electronic format. --EncycloPetey (talk) 13:06, 7 April 2021 (UTC)
(ec) The conversation doesn't particularly belong on a user talk page. But do tell me which of the 22 *, 13 †, 8 ‡ or the seven 22 1s, 13 2s, 8 3s you think should stay the same when they become endnotes? Tell me that you would like to see the split references remain split as they come from different pages. There is nothing wikt:inaccurate, let alone hugely inaccurate, so please stop the rhetorical flourishes; the refs are automatically generated and they accurately reflect the text and position of the reference that is in the work. Show me one _inaccuracy_ on a properly formatted and proofread page. — billinghurstsDrewth13:08, 7 April 2021 (UTC)
IA Scan Link Template
It turns out that there's a {{Internet Archive small link}}. Could it be improved so that if the file exists on Commons, it acts as a small scan link, and if not, it redirects to IA?
For example,
{{Internet Archive small link|newvoyageroundwo01damp}}
would check if {{Internet Archive link|newvoyageroundwo01damp}} exists on Commons, if so and filetype is PDF or DJVU, it would act as {{ssl|A new voyage round the world. - Describing particularly, the isthmus of America, several coasts and islands in the West Indies, the Isles of Cape Verd, the passage by Terra del Fuego (IA newvoyageroundwo01damp).pdf}}
if not, then
it would go to the current code
<span title="Copy of this work at the Internet Archive" style="font-size: 83%; white-space:nowrap;">([https://archive.org/details/{{{1}}} IA])</span>
This should help users avoid having to try to upload the file through the IA tool only to find out that the file already exists because someone already uploaded it, which is a colossal waste of time. Languageseeker (talk) 13:03, 7 April 2021 (UTC)
@Languageseeker: AFAIK there is no way via Lua to get the filename of a PDF or DjVu that has a given IA ID in its file info, and that means you can't do what you want. At best you could write a gadget to flag up "stale" Commons links in JS (e.g. make them red or whatever) and suggest a replacement. There are only 60 transclusions of {{Commons link}} (it's not a very common template), so I think this is not a very exciting prospect. If you're going to go around tagging with {{Commons link}}, you might as well use {{ssl}} and set up the index while you are at it. Lua is limited to mw:Extension:Scribunto/Lua_reference_manual. Inductiveload—talk/contribs 13:22, 7 April 2021 (UTC)
Pretty much, yep. You can do it in JS (presumably aping however the IA-Upload tool detects matching IDs), but not, AFAIK, in Lua (remember that gets pre-rendered by the server on save - if it does a search as part of that, how will the server know when to re-render if, say, you upload a matching file?) Inductiveload—talk/contribs14:06, 7 April 2021 (UTC)
"We're all mad here."
Latest comment: 3 years ago · 2 comments · 2 people in discussion
Ok, so one issue we will have to solve once and for all with ppoem is separator / elision lines.
Starting somewhere completely different, I've fallen into what seems to be an endless rabbit hole where we're using {{loop}}, {{
}}, {{…}}, {{separator}}, and probably a few more I've not had the misfortune to meet yet, plus manually spacing out asterisks, dots, and middots with non-breaking spaces.
So far I haven't really thought much about the solution; but I've concluded this is definitely a problem, and one that rises to "ppoem must deal well with this", through the combination of it being difficult to solve inside ext:Poem and the relatively high frequency of such lines in poems in general.
I'm currently mulling over ways to fix and merge (or replace) various subsets of the above templates for this purpose, but the thought has also occurred to me that "it sure would be nice" to support it directly in ppoem. That way probably lies madness due to the number of knobs people can, and do, tweak with the existing templates, but I'm throwing it out there in case you see a potential for a clean solution where I can't. But however we approach it, I want to say the goal should be that once ppoem deploys generally there is One True Way™ to do such lines inside a ppoem, that is well documented, and all other ways should be avoided. --Xover (talk) 13:27, 7 April 2021 (UTC)
@Languageseeker: I don't think this really counts as "fixing" something. The metadata is now substantially worse. For a start, the title is wrong, there's no author and you've trashed the categorisation. That template wasn't used for no reason. The title of the work is The Works of Charles Dickens, the title of the volume is American Notes and Pictures from Italy. I think you need to be a bit more circumspect when storming into "fixing" "problems". In this case, filling in the Index page metadata manually would have likely been the easier way forward. Inductiveload—talk/contribs02:28, 9 April 2021 (UTC)
@Languageseeker: Marginally, but still not as good as it was. Why did this even need changing at all? Just so you could do a one-off import with a helper gadget at Wikisource? Now that's done, what's the point of leaving the page worse off than before you changed it? Inductiveload—talk/contribs06:49, 9 April 2021 (UTC)
I cleaned it up a bit more. My overall goal is to replace the needless non-standard template with a standard book template. It's not like the metadata was that great in the first place. Languageseeker (talk) 14:33, 9 April 2021 (UTC)
@Languageseeker: It was better than it is now, and it's not exactly hard to see where. The editor is still missing, you have mixed up the subtitle, the volume title and description, you have mixed up "location" and "city" and you have not replaced the category that the template added and removed the language tagging. Please either put it back how it was, or if you want to use the book template, make sure there is at least the same metadata there was before and in the right fields. Inductiveload—talk/contribs16:22, 9 April 2021 (UTC)
Um, the incorrect publisher on Vol 28 was not actually the result of my edit. I looked over your template on Commons Template:Works of Charles Dickens volume and I noticed some issues. The major one is that the documentation for the Book template states that Volume should be a number, while you have a text field (Volume={{{volume}}}: {{{title}}}). So, I believe that it might require fixing. I’m also not sure why you made City={{{location|}}} instead of City={{{city|}}}. Seems a bit confusing, and this is causing an issue when changing it to the book template. In my edit, I put “With Introduction, Notes and General Essay by Andrew Lang” into the Description field so that the subtitle could be the name of the work. I don’t see a volume title in Book. I’m happy to revert if you fix your template to follow Commons guidelines, make sure that it can work on importing, and apply it to all the volumes. Otherwise, it seems to me that reverting the changes would just return us to a broken state.
"Um", you have done the import already, so that's specious. Also there is zero exception that the Commons info needs to conform itself to whatever rudimentary heuristic the Wikisource helper script uses. If the script doesn't work, just deal with it and enter it manually. It only needs doing once anyway.
I don't know why city → location, but it's not that confusing if you just check the page before you save it. If you want to use Book, whatever, I have no issues with that, but breaking stuff to put pressure on fixing things you don't like is not constructive. Just compare what it was and what it is. There is still missing data. If you care so much about the volume title not having a dedicated field, you should raise the issue at commons:Template talk:Book (and get involved in fixing the issue, not just drive by with "um, this is not how I like it kthxbai") and not just abuse the description field (which is for "out of band" information like physical condition or whatever) and move on, leaving messed-up stuff in your wake that someone else has to deal with. Thousands of books have a volume title there. If you want to go on a mission to add a volume title field and tidy them up into a more semantic field, fine, I'd even say that's a good idea. But until then, leave things strictly better than you found them, and I think we can all agree that how you intended to leave it is not better than how it was. Inductiveload—talk/contribs12:58, 10 April 2021 (UTC)
Let’s bury the hatchet on this one. I never meant to make things worse. I might have been a bit hasty and didn’t realize that was a custom template and not just a user error. I tried to fix it, but it didn’t seem to work out. I respect you and what you do for this site too much to want to create bad blood or hurt feelings. Languageseeker (talk) 05:36, 12 April 2021 (UTC)
(or socially-distanced equivalent). I have raised a query on commons:Template talk:Book about adding a volume title, because you are right in that forcing the volume number and title into one field is not ideal.
I'm sorry if I came over as sharp. I have no intrinsic desire to keep a work-specific template over Book, BTW; I just would like to make sure that there is strictly no less metadata. Ideally, of course, all the bibliographic info (as opposed to file-specific) should be pulled from some structured data "somewhere" (Wikidata/Commons SDC/something else?) but I have no idea what that should be, and I don't have enough clones of myself to really dig into it right now, let alone embark on some mission to improve all the literal millions of books at Commons. Inductiveload—talk/contribs07:26, 12 April 2021 (UTC)
Help with Splitting Images and Creating DJVU
Latest comment: 3 years ago · 5 comments · 2 people in discussion
I'm not very good with batch jobs. This guy uploaded 32 rare books without splitting the pages, and it would be great to do them all [2], but I lack the technical skill. Languageseeker (talk) 05:25, 9 April 2021 (UTC)
@Languageseeker: It's not a batch job as such; ScanTailor is for processing scan images (e.g. splitting, etc.). Give it a try before saying you can't do it.
I'll make 32 DjVus from 32 zips of images because I know that's quite hard and until I can get round to publishing the code and/or making a web-based tool others may not be able to do it easily. But doing 32 end-to-end extract/splits and making the DjVus and doing all the metadata for upload "on spec" is more than I have time for, sorry. Inductiveload—talk/contribs07:03, 9 April 2021 (UTC)
Latest comment: 3 years ago · 3 comments · 2 people in discussion
I'm trying to create {{IAu}} to simplify uploading to Commons from IA. It basically takes the three parameters that the IA Upload tool needs and constructs a URL from them:
{{IAu|Internet Archive ID|Common File Name|pdf or djvu}}. However, I'm running into an issue where
{{IAu|jesuitrelations169jesugoog|59|pdf}} works, but
{{IAu|cu31924092218191|The Jesuit relations and allied documents (Volume 27)|pdf}} doesn't.
@Languageseeker: The problem is Parameter 2 has a space in it. This means the link is: [https://ia-upload.toolforge.org/commons/fill?iaId=cu31924092218191&commonsName=The Jesuit relations and allied documents (Volume 27)&format=pdf Upload ...], which turns into a link with the URL "https://ia-upload.toolforge.org/commons/fill?iaId=cu31924092218191&commonsName=The" and the text "Jesuit relations and allied documents (Volume 27)&format=pdf Upload ...".
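For illustration (this is not the template's actual code), the fix is to percent-encode the parameter before building the link; in wikitext that's what the `{{urlencode:}}` parser function is for. The same idea in JavaScript:

```javascript
// Spaces in a query parameter must be percent-encoded, or the external-link
// wiki syntax treats everything after the first space as the link text.
const params = new URLSearchParams({
  iaId: 'cu31924092218191',
  commonsName: 'The Jesuit relations and allied documents (Volume 27)',
  format: 'pdf',
});
const url = 'https://ia-upload.toolforge.org/commons/fill?' + params.toString();

console.assert(!url.includes(' '));                              // no raw spaces left
console.assert(url.includes('commonsName=The+Jesuit+relations')); // spaces encoded
```

`URLSearchParams` encodes spaces as `+` (form encoding), which the tool accepts; `encodeURIComponent` (which uses `%20`) would work equally well here.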
If we can style the last (right-most) column, what are your thoughts on a global class to align the last column right? It is common enough to make global, and people can set all the other formatting as needed around it through a work's CSS. — billinghurstsDrewth11:11, 14 April 2021 (UTC)
So global table classes are something I have thought about for a very long time. {{table class}} (not my work) attempts to deal with some of the common cases. The issue is a multi-way balancing act between simplicity of expression, simplicity of markup, simplicity of implementation/maintenance, and robustness.
I have some concerns with how {{tc}} is implemented; e.g. classes like _bt are basically just an indirect way to say {{ts|bt}}. CSS shines where we can leverage selectors like descendants and :nth-child. For example, __grid is something that cannot be done without very verbose inline styling.
There's also a risk of de-semanticising (ironic use of a non-word intended!) things, as well as providing a wide surface for fragility. For example, a table styled by composing quasi-global stylistic classes could render identically to this one:
{| class="somework_somedata"
|... index CSS provides the grid and margin:auto rules here, which apply to the "_somedata" type of tables.
|}
However, the former is a stylistic intent (basically used as a shorthand for hundreds of {{ts}} calls), while the example above is a semantic statement of the form "this table is a somedata table", and the styling is a natural consequence of that, with the class identity→styling mapping performed by the index CSS on a work-local basis. Which is perhaps a slightly sophistic argument, and out of touch with how we currently do things (mostly out of technical necessity), but I think it's certainly one to bear in mind before we storm ahead and scatter-gun thousands of quasi-global classes into tables all over the place.
To take the Templars index as a more concrete example, the question is: do we prefer having a complete semantic→styling mapping in the index CSS (as we do now, more or less), or writing something like class="last-col-ar heading-larger-centered margin-auto" and composing the styling out of many pre-made blocks (and still, if anything can't be an off-the-shelf class, we'll need to use index CSS to fill in the gaps)?
Until someone changes the current value to something greater than 1. And before you ask, I don't really think this should be automated, because even if the index is validated, it may not be transcluded properly. With great care you could make an attempt at using the index page info like {{index transcluded}}, but since I'm sure there'll be ways for that to fall over in hilariously unforeseen ways and go unnoticed, it's probably easier to just change current manually. Inductiveload—talk/contribs17:18, 15 April 2021 (UTC)
You know me too well. Seems my mind was confused. Hope good progress will be made on one of the greatest poetic works of the Harlem Renaissance. Languageseeker (talk) 19:28, 15 April 2021 (UTC)
Help Scraping Books from BL
Latest comment: 3 years ago · 3 comments · 2 people in discussion
I was wondering if you knew of any easy way to scrape all 127 quartos off the BL. The link is [3]. Then it goes to https://www.bl.uk/Treasures/SiqDiscovery/UI/PageMax.aspx?strResize=yes&strCopy={{id}}&page=1 Languageseeker (talk) 03:53, 17 April 2021 (UTC)
You can scrape the images via the URL like https://www.bl.uk/TreasuresImages/shakespeare/mid/ham-1603-22275x-bli-c01/ham-1603-22275x-bli-c01-009.jpg. Only the last number needs to be changed. With a bit of experimentation, you may be able to figure out the maximum page count, or just iterate until you hit a 404. For determining the ham-1603-22275x-bli-c01 "slug" for each work, you can load the PageMax.aspx page and pick out the #uiPageImage element.
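A minimal sketch of the URL construction described above. The "shakespeare" path segment and the 3-digit zero padding are read off the single example URL, so they are assumptions that may not hold for every work; a scraper would increment the page number until the server returns 404:

```javascript
// Hedged sketch: build the image URL for a given slug and page number.
function pageUrl(slug, page) {
  const n = String(page).padStart(3, '0'); // pages appear to be zero-padded to 3 digits
  return `https://www.bl.uk/TreasuresImages/shakespeare/mid/${slug}/${slug}-${n}.jpg`;
}

console.assert(
  pageUrl('ham-1603-22275x-bli-c01', 9) ===
  'https://www.bl.uk/TreasuresImages/shakespeare/mid/ham-1603-22275x-bli-c01/ham-1603-22275x-bli-c01-009.jpg'
);
```

A fetch loop would then request `pageUrl(slug, 1)`, `pageUrl(slug, 2)`, … and stop at the first non-200 response.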
Thanks for the advice, but that sounds a bit too technical for me. I don't know what a slug is. I tried using ScanTailor, but it produced a mess. I tried to automatically select the images and set uniform margins, and produced a horribly degraded result. Languageseeker (talk) 20:14, 19 April 2021 (UTC)
Review Activity of Billinghurst
Latest comment: 3 years ago · 2 comments · 1 person in discussion
Latest comment: 3 years ago · 11 comments · 2 people in discussion
The first line of the first stanza.
This line belongs to no stanza.
Neither does this.

But this gets a shiny new stanza.
So long as we don't tickle the Jabberwookie.
It is much worse than the Jabberwock,
since it will wrap you up in peas.
I don't think, offhand, that this is fixable. So workarounds that come to mind are either magic syntax or a fake hr specifically for ppoem that uses markup that will not anger the Jabberwookie. Having also looked (very slightly) at {{…}} and {{
}} in ppoem lately, I'm inclined to think we may need a small suite of the most common stuff specifically for use in ppoem, as an alternative to magic syntax.
@Xover: making the stanza a div helped (a bit), but HR inside SPAN is not quite right. Perhaps we should have a magic syntax to drop a line right out of a stanza and start a new stanza after it? Inductiveload—talk/contribs09:43, 19 April 2021 (UTC)
When the rule is being used as a separator, sure; but here it conceptually is a line. I'm thinking what we need is a line span that just happens to display something that is visually indistinguishable from an actual horizontal rule.
The first line of the first stanza.
—————
This is line three of the stanza.
“Hi there, I'm Stanza 1.4.”
Along those lines. All these types of rules might also need a smaller line-height to look balanced with the lines containing full-height characters. --Xover (talk) 10:03, 19 April 2021 (UTC)
Yeah, that's what I'm doing, but bar has… other issues. Random thought: a special kind of line, a "separator line", that has a smaller line-height, but is most likely going to be populated by a specific template (like bar) rather than by magic syntax? Not at all sure it's worth the effort and complexity, but… --Xover (talk) 13:17, 19 April 2021 (UTC)
And speaking of, what's your thought on the canonical way to tweak or disable the hanging indent? 4em is a bit aggressive for this use, and some bits should have none. --Xover (talk) 13:52, 19 April 2021 (UTC)
I guess we could do it as a class on the whole ppoem (and/or per stanza). Since it's an effect on the line, the selector .ws-poem-no_indent .ws-poem-line would work in either case. The DI special-casing would need handling as well.
Oh, I figured you had an idea of how that should be done. I'll try futzing with pre-defined classes and see how that feels. No indent is probably a shoo-in, but a differently hanging indent quickly runs into the {{ts}} trap. {{hin}}, if it doesn't interact badly, may be it; otherwise the threat of magic syntax starts looming. Yeah, traditional typesetting and layout have some definite advantages. And it's not helped particularly by us sitting squarely in the "sweet" spot where we get all the indeterminism and quirks without very much of the dynamism and flexibility. But ppoem raises the bar there, so there's certainly hope for the future. --Xover (talk) 15:04, 19 April 2021 (UTC)
@Xover: A built-in for no indent sounds like a good idea, and maybe 2em might make sense. After that, it probably makes sense for users wanting unconventional indents to supply their own classes, or Template:Ppoem/styles.css will look like a kitchen sink warehouse on stock-take day. Inductiveload—talk/contribs15:25, 19 April 2021 (UTC)
broken index page <= config
Latest comment: 3 years ago · 7 comments · 4 people in discussion
Latest comment: 3 years ago · 6 comments · 3 people in discussion
I'm seeing instances where there is a transclusion template with "yes", and the template is replaced with the status marked; but other situations where there is a template that is not removed and the information is not transferred.
In particular, situations where there is a duplicate half-title page (or something similar) that is deliberately tagged and categorized as "not transcluded". Does the bot make a strict check that does not allow for these situations? --EncycloPetey (talk) 03:28, 21 April 2021 (UTC)
Oh, I take it you mean Index:Shakespeare's Sonnets (1923) Yale.djvu? I didn't think there were any using "X" and also the templates, since that was a short-lived "temporary" state. I'll run through and check that (small) batch. Duh, sorry, I've thought of that: any "duplicate" templates will be hoovered up as the two old templates get converted. For a short period, a very small number of pages (~10 I estimate, out of ~13k) may have more than one category. Inductiveload—talk/contribs03:50, 21 April 2021 (UTC)
All respective pages have text updated to reflect dropdown field in Index: page. Templates removed. I will let you tell the community of your great progress. — billinghurstsDrewth05:40, 22 April 2021 (UTC)
Could you review a proposal?
Latest comment: 3 years ago · 4 comments · 2 people in discussion
@Languageseeker: I understand where you are coming from. I don't really have the bandwidth to deal with this in detail, but your proposal probably needs to focus a bit more on how you plan to implement this rather than just what the problem is. Immediate technical queries I have with it as written:
"The new system would use css popups": What is a CSS popup? CSS has no innate concept of a popup, since that's structural data, not stylistic. Probably a fully-fledged system would use some kind of rich UI element like OOUI when possible (or maybe it can just be done with mw:Reference_Tooltips right now).
E-readers: not many solutions will work on e-readers since the baseline is a very (very) basic environment and you may not even have a touchscreen, let alone a mouse. Probably the best you can realistically hope for is on-the-fly conversion to footnotes by the exporter. Footnotes generally are supported OK (and use the right epub hinting). Inductiveload—talk/contribs20:28, 21 April 2021 (UTC)
I was thinking about using something like these two examples [4][5]. So, it would use CSS to control the display of the popup and how the text is styled. In this way, these options can be overridden on a per-text basis. As for e-readers, it would require additional engineering to convert these popups, which I don't plan on doing. Thoughts? Languageseeker (talk) 20:51, 21 April 2021 (UTC)
Latest comment: 3 years ago · 6 comments · 2 people in discussion
The form for the index page prefills in information from the book template. When I move the information to wikidata, the form shows up empty.
So, a nice upgrade is to have it pull from wikidata if it exists.
Everything I know about pulling information from wikidata, I learned here, at the author template. It is so easy.
If I had a computer, I would make a citation tool that would make the publication listings at the author pages.
On a different matter, I have a question about a template at Commons. The template gave me errors, but it seemed to work. I was wondering if you could look at it and tell me if it is rendering nicely for you as well. It is in use for volumes of things there; see commons:Category:Bentley's Miscellany, Vol. 1 for one.
It can be done, but it'll be a little bit of work to get it done. There aren't many files like that currently, so for now it's not a high priority to me. I'll get to it one day, unless someone else does first.
The navbar template appears functional? I'm not seeing any errors.
I don't know what about that conversion could make you "very unhappy". Regardless, the IA-Upload bot has been modified to only warn if the same identifier has been uploaded before (phab:T269518). As for "I don't like software that does this": I don't know what you mean by that. If you don't like IA-Upload, don't use it? Inductiveload—talk/contribs20:27, 22 April 2021 (UTC)
In the exchange on that talk page, I read your welcome to me when you were "ready to export" texts here and something I said after that during the DP f2 days. These words were delivered back to me and way out of context in an automated sort of way. I don't like that kind of software. Liza is so late 90s....
I like IAUpload, enough to have learned some of its ways and foibles and I am both sad and happy it is being fixed. I kind of want to cuss or append everything that I say on a talk page with something about not liking chat mining now.--RaboKarbakian (talk) 21:40, 22 April 2021 (UTC)
I'm sorry, I genuinely have no idea what you are talking about, as far as I can tell LanguageSeeker was being helpful and I'm fairly sure they're not a chat bot. Inductiveload—talk/contribs22:02, 22 April 2021 (UTC)
Monthly Challenge Work
Latest comment: 3 years ago · 1 comment · 1 person in discussion
I would really like to get the Monthly Challenge running in May. Since I'm not sure that the Bookworm Bot can get started by then, I've created an alternative page for the project: Wikisource:Community collaboration/Monthly Challenge (2021)/May 2021. There's a nomination project on WS:S#Call for Nomination of Texts that has enough texts already. There are two major tasks remaining. The first involves halving the New Texts box and creating a text box for the Monthly Challenge above it, similar to the one on French Wikisource. This requires an administrator. Can you do that? The second is writing the FAQ, which I plan to get done over the next few days. I'll also create a discussion on the FAQ at WS:S once it's done. Languageseeker (talk) 13:18, 24 April 2021 (UTC)
Wanna take it for a spin?
Latest comment: 3 years ago · 7 comments · 2 people in discussion
Whenever you have a moment, could you take this for a spin to check that I haven't massively borked something before we replace MediaWiki:Gadget-ocr.js? Other thoughts and feedback welcome too, of course, should you feel inclined. Xover (talk) 15:11, 23 April 2021 (UTC)
@Xover: not flush with time right now, but I'll try. First impression: not sure it's working for hOCR - I get the "hOCR complete" popup but nothing happens in the text box. I think $('#wpTextbox1').value should be $('#wpTextbox1').val() (or cut out jQuery and do document.getElementById("wpTextbox1").value = ...).
Also probably need to steal from be inspired by the Google OCR JS and add $( "a[rel='wsOcr1']" ).css("width", "45px"); around line 95 to fit the wider button icon. Unless the Wikieditor config hook has a CSS field I don't know about. Inductiveload—talk/contribs15:54, 23 April 2021 (UTC)
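To illustrate that bug with a minimal stand-in object (real jQuery isn't loaded here, so this fake wrapper only mimics the relevant behaviour): a jQuery wrapper has no `.value` property, so assigning to it just sets a property on the wrapper and never reaches the DOM node, whereas `.val()` goes through to the element:

```javascript
// Minimal stand-in for a jQuery wrapper, just to show the failure mode.
function $fake(el) {
  return {
    0: el, // jQuery keeps the wrapped element at index 0
    val(v) {
      if (v === undefined) return el.value; // getter
      el.value = v;                         // setter reaches the element
      return this;
    },
  };
}

const textbox = { value: 'old text' };

// Bug: this sets a "value" property on the wrapper object, not the element.
$fake(textbox).value = 'new text';
console.assert(textbox.value === 'old text'); // element unchanged

// Fix: use .val() (or unwrap the node first with [0].value = ...).
$fake(textbox).val('new text');
console.assert(textbox.value === 'new text');
```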
@Xover: As an aside since this is adding buttons, with the "2017" editor now 4 years overdue, do you have any idea what's going on there? Is there anything we should be doing to get ready for an eventual change-over? It doesn't seem to share the configuration hook with the 2010 editor. As usual, there's naff-all in the way of documentation for the new shiny. I don't personally use it and found it incredibly annoying when I tried it, but there's a few tagged edits around, so some people must like it (or they don't know they can turn it off!). Inductiveload—talk/contribs16:14, 23 April 2021 (UTC)
You mean the wikitext mode of the Visual Editor, I presume? There's actually been some movement on that just recently, triggered by CommTech's use of Parsoid for ws-export, in that the VE team has apparently now given this enough thought that they've concluded they need a Parsoid-native version of ProofreadPage to make this work. I think that opens the possibility that they'll give it some attention eventually. But the bad news is that I don't think our use case is even on their radar, so all the docs and exposed functionality assume you're a core MW dev employed by the WMF. It is definitely possible to hook into VE, both visual mode and wikitext mode, in all sorts of ways, but I've found jack-all that's usable for Gadget or user script developers. Which reminds me… no, on second thought, I'll have to dig up some links for that rant. Later. --Xover (talk) 16:41, 23 April 2021 (UTC)
PRP and Parsoid came up in phab:T274654#6946964 and resulted in phab:T278481.
The other links I wanted to dump were mw:ResourceLoader/ES6 and T237688, T178356, and T75714. The long and short of which is… MW and the WMF are now feeling the pain of not using ES6 so much that they're willing to drop Grade A support for IE11 and start requiring it for new core features, but have not as yet given any priority or allocated resources to developing the necessary validator for end-user accessible code (like gadgets) to be able to use ES6. This is kinda concerning since Krinkle, who as they mention in one of the comments is the most likely person to do the work, jokingly estimates it won't get done until 2030. I'm not sure how to approach that because, as they say, it isn't a trivial job, and it's hard to point at anything ES3 makes actually "impossible" or that ES6 suddenly makes possible; it's more the same kind of pain that leads the WMF itself to want to move the bar.
And speaking of the utter neglect of Gadget coders… Are you familiar with Vue at all? I've never really looked at it, but from a quick peek it looks to exist for / appeal to 1) people who are caught up in the now-fashionable "let's all dump on jQuery" fad, 2) people who are religiously convinced React is the light and the truth, solves all problems, brings world peace, and is therefore the one true way, and 3) people with a genuine need to build full-blown applications. What I am not seeing is a library of pre-made UI widgets ala jQuery UI—for which OOUI was already a hideously complex replacement—or anything else that would help us make robust, functional, consistent gadgets with modern and user-friendly UI. Sigh. --Xover (talk) 11:42, 24 April 2021 (UTC)
Oh good, we're on the same page. The total neglect of Gadget coders is really quite frustrating. There are, AFAIK, a grand total of zero documents explaining best practices for gadgets (specifically: configuration, deployment and code sharing), and you get short shrift in chat channels for asking. The OOUI help pages are deserts for answers (and still have the wrong IRC link on them, where you get an earful for being in the wrong channel). The OOUI "manual" is hilariously short on use cases, and I'm still short of a tab-completing selection box that actually reduces keystrokes for finding something (for User:Inductiveload/quick access.js, I had to roll my own).
I did ask for a way to at least ask ResourceLoader to load specific deps locally (phab:T278304) because otherwise you can't really test "JS libraries" without finding and disabling all other clients of the library. If the core site JS uses such a library, that's not very practical. "Test it on a local wiki" was the answer given. I'm working on a workaround but it's fugly (think web server which regexes JS on the fly ugly).
RE Vue, I have no idea WTH is happening, or if there's anything useable for Gadgeteers, if there ever will be, or even if that's planned. I am vaguely planning to use Vue for a future Toolforge thing just so I'm not totally blindsided when it comes along (but Vue + Bootstrap, since I have no earthly clue where WVUI is at, or even if one can use it for anything right now). I think the idea for Vue is that it's good for "incremental" use, so you can drop in a few Vue widgets without having to drink the whole cup of SPA koolaid. As you say, all one needs is a small library of widgets to play with. And at least it can't be much worse than OOUI in terms of verbose boilerplate needed, eh?
Latest comment: 3 years ago · 3 comments · 2 people in discussion
cf. Hans Andersen's Fairy Tales/The Top and Ball and probably a smattering of other pages, and this change. Code like this needs to be not only defensive but downright paranoid: you're dereferencing a complex datastructure that is directly end-user editable with zero input validation. Any and all levels in that datastructure may be missing or contain garbage. As I recall, the last time I ran into this issue I had to walk the tree level by level and do a nil check for every one. Xover (talk) 13:01, 24 April 2021 (UTC)
nil-checking datavalue is probably enough here, yeah (cf. below).
You're right that the whole structure isn't user-editable, but you can easily run into logical inconsistencies like a qualifier value for a property with unknown value. My main point is that in dealing with Wikidata we need to armorplate and program defensively as a rule, much more than with any other semi-structured data source. Pulling from MW's DB tables, for example, we can assume a certain level of consistency because it's enforced on input by the software in a high-level UI. Wikidata barely validates syntax, much less any kind of real consistency, and at the same time lets end users change data in what is essentially the "database" layer. It's kinda scary. (Remind me to rant about semantics and information modelling some day; some day when you have lots and lots of free time…)
In any case, I dug up the previous instance I was thinking of: Armoring against bad data on Wikidata. It's been a while, but superficially it looks like a very similar type of issue, which probably means nil-checking the datavalue is the pattern to extract from this.
Oh, and that doc was useful. I've been looking for that kind of thing and have failed to find it. I think we need to start thinking about what we can do in terms of gadgets to make our WD integration better and more user friendly, without crossloading code from some dude's user space on a different language project… --Xover (talk) 08:58, 27 April 2021 (UTC)
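The "nil-check every level" pattern can be sketched like this for the gadget (JS) side; in a Lua module the equivalent is an explicit nil check at each step. The entity shape below follows the standard Wikidata JSON claim structure, but the helper name and example data are illustrative only:

```javascript
// Hedged sketch: defensively dereference a Wikidata claim structure.
// Any level may be absent - e.g. "unknown value" (somevalue) snaks have
// no datavalue at all - so every step is nil-checked via optional chaining.
function claimValue(entity, prop) {
  return entity?.claims?.[prop]?.[0]?.mainsnak?.datavalue?.value ?? null;
}

// An "unknown value" statement: snaktype is 'somevalue', no datavalue key.
const entity = { claims: { P50: [{ mainsnak: { snaktype: 'somevalue' } }] } };

console.assert(claimValue(entity, 'P50') === null); // survives the hole
console.assert(claimValue(entity, 'P31') === null); // missing property too
console.assert(claimValue(null, 'P50') === null);   // even a missing entity
```

The same dereference written without the checks (`entity.claims.P31[0]...`) throws on the first missing level, which is exactly the class of error seen on pages like the one linked above.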
Latest comment: 3 years ago · 6 comments · 3 people in discussion
Guessing, not presuming, that you have had a look at what frWS is now doing with their Index: page template. Either way, waving it under your eyes. — billinghurstsDrewth08:30, 26 April 2021 (UTC)
We could probably persuade Tpt to give us a rundown of what the module does, how it fits into their larger architecture, and what the drivers were. I'm thinking the coding part of this stuff isn't so hard, it's more an issue of figuring out how it should work, what are the externally imposed limitations, etc. But then, the way WD is set up is fundamentally incompatible with the way my brain is wired, so maybe that's just me. :) --Xover (talk) 09:02, 27 April 2021 (UTC)
@Xover: I can probably figure most of it out; it's mostly a question of "what we actually want it to do" (other than bling-bling, which is of course a noble aspiration). The biggest issue for me is that WD is actually not always as good as you would think/hope at the kind of bibliographic metadata we actually would want on an index page - particularly for things like volumes (and that's before we get to periodicals).
Furthermore, even if I could figure out representing sub-edition data (like volumes), there's another level to it: Index pages exist in a kind of unhappy no-man's-land between WD edition items and Commons SDC data - while the file represents the edition, the instantiation of that representation has its own properties (e.g. pagination, missing pages, scan quality, scan provenance, etc.) that can and do vary even within the same edition. This is, I think, the "Item" level of FRBR (Group 1), and it is where one of the primary disconnects between what WD promises it could be and what it actually is to Wikisource comes about. Inductiveload—talk/contribs09:19, 27 April 2021 (UTC)
Now I could be wrong; however, it still relies on data population at WD, prior to the work being done here. We have struggled to get good WD compliance here at a work/article's creation, and I know that I generally do it just the once when I have transcluded. The prime issue for me is that the push from Commons to WD has no easy semi-automated tool to either push or check data. Similarly, the push from our Index pages to WD is not connected. — billinghurstsDrewth11:37, 27 April 2021 (UTC)
Well, certainly without the data being present somewhere, it's all a disaster. However, we currently have no one place we can put all the data, or even a solid idea of how to split data between Commons, WD and WS index pages in the general case. Which has led to "some people" (i.e. me) saying "sod it" and just not bothering too much. I try to get the file metadata for scans to be reasonable, but since I don't even know what best practices are, I leave it at that for files.
For "easy" works like a single volume novel, WD can hold pretty much all of it in the work/edition structure, and only things like the pagelist need to be done at the file (or Index) level. For things in the mainspace, it's also easier, as that generally can traverse an edition or translation of (P629) statement and suck data out of the work-level item (where things like "topic" likely reside). The "item" level of the FRBR system kind of falls away at that point, because we kind of isolate that behind the Index/Page:Mainspace division.
It's possible there's a good way to do this for a general book (e.g. a volume of a set, or a periodical issue), but I haven't been able to work it out yet. And sadly, the periodical thing especially is probably where WD can most help us deal with the enormous amount of bibliographic data represented by the contents of a periodical.
I am working on a WS -> WD item creator (working title: Wikidata Creata :-D), but it's not done (or half done) yet because writing gadgets is such a huge PITA with the tools we are given. So I'm considering doing it all in a separate web-app on Toolforge. Inductiveload—talk/contribs11:52, 27 April 2021 (UTC)
the setting up of a journal
Latest comment: 3 years ago4 comments3 people in discussion
I checked the tables of contents. Ingoldsby goes to VOLUME 17! Honestly, I was just twiddling thumbs with Rackham. Bentley's is more like interrupting thumb twiddling by playing with a hangnail. I don't like my computer being hacked. It is (a simile is about to happen, different from string literal) like having people jump in your car and go with you, expressing opinions, making rules and interrupting -- just with their presence. When my computer didn't boot, (and when my home directory was being mounted via nfs or ntfs, they mounted it "Non Executable") it was like (simile) they decided to park my car in their garage and I don't even get to know where or who.
I am trying not to spin this one way or another. I am trying this for years! Closer to 5 years since the non-executable fiasco than 1. So, I was going to use my compromised computer and work on something that I like but don't have great cares for. Arthur Rackham. Very wonderful illustrator; his creepy is cute.
Apologies for the rant. My inkscape friend does perl. I do python. WS is lua?
I'm sorry, I do not understand what you are saying. The first half makes no sense to me. The second half seems like you'd like to set up a periodical at WS for Bentley's?
The first step towards that is probably gathering a list of all the volumes and scans into Bentley's Miscellany. If you're going to use the Internet Archive SIM scans (which look like they are split into 3 each, TOC, content and index), that might make life a bit harder for you. I don't really have a good suggestion for a better option other than scraping the Hathi scans or checking over the Google scans at the IA (which look pretty poor). Inductiveload—talk/contribs15:31, 27 April 2021 (UTC)
Jumping into this conversation. I also had plans to create a page for Bentley's Miscellany. In addition to the Princeton scans, Toronto also has full color scans of some of the volumes. For the volumes on HathiTrust, is there any way that you can batch download the Princeton set and then upload them to Commons without compression, so that the images can easily be extracted? Languageseeker (talk) 18:31, 27 April 2021 (UTC)
The first rant here is about why I choose different things than what I really want to work on. If my words are being mined, then always including a displeasure about being hacked works for me.
I played with pulling in information from wikidata for about 20 mins, it has been a couple of years since I did this.... You can see what I got at Bentley's Miscellany. The upper portion is paste. The data calls (also edited paste) are under "pulled".
Expanding use of preload to something on a per work basis
Latest comment: 3 years ago4 comments2 people in discussion
For some of our compilation works I would like to better utilise MediaWiki:Gadget-TemplatePreloader.js somehow for people working on these compilations.
A Biographical Dictionary of Modern Rationalists will be using the header adaptation Template:BDMR and it (now) has a /preload. I would like to have scope to put in this template rather than "header" utilising the path of the work. I would also like to see if we can look to do something similar with a range of other compilation works. I would hope that we wouldn't have to do it by editing the preloader.js itself but somehow leverage it by having a "by work" configuration file, json, something!
Nothing urgent urgent as I can do a templatescript that I can set up to do a replacement, but to make all these compilation works easier, better, and uniform, I see it as our next evolution. If we can have it configurable outside of the javascript it gives great flexibility and ownership. Thanks for your consideration. — billinghurstsDrewth02:54, 19 April 2021 (UTC)
@Billinghurst: hmm, yeah, so the idea is a (very) good one, but I'll need to think a bit about the implementation. JSON is probably a good call, as long as it actually loads as JSON when AJAX'd. I'll give it a poke at some point. Inductiveload—talk/contribs09:08, 19 April 2021 (UTC)
Latest comment: 3 years ago1 comment1 person in discussion
You may like to be aware of phab:T41510, both for yourself and if anyone else runs into it. There are workarounds through the API, asking a dev to clear it, etc. should that be needed. Xover (talk) 18:47, 29 April 2021 (UTC)
Dezoomify Question
Latest comment: 3 years ago3 comments2 people in discussion
I'm trying to use dezoomify to get the images off The Jane Austen Manuscript website that are in the PD. However, I get the following error:
ERROR: Could not open ImageProperties.xml (<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)>).
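Not part of the original thread, but for anyone hitting the same error: assuming dezoomify is running under Python and the failure is urllib's certificate check (as the `_ssl.c` traceback suggests), a sketch of the usual last-resort workaround is below. The variable names are mine, not dezoomify's.

```python
import ssl
import urllib.request

# Build an SSL context that skips certificate verification. This is
# insecure (no protection against man-in-the-middle), so only use it
# for a one-off grab from a host you trust; the better long-term fix
# is installing/updating your system's CA certificate bundle.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# The context can then be passed to urlopen, e.g.:
# urllib.request.urlopen("https://example.org/ImageProperties.xml", context=ctx)
print(ctx.verify_mode == ssl.CERT_NONE)  # True
```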
Latest comment: 3 years ago4 comments2 people in discussion
One thing that I noticed on frWS that would be helpful here was links to the original source on the Index page. Perhaps we could import the links from Commons with the other metadata? This makes it much easier to fetch the full resolution images from IA. Languageseeker (talk) 04:39, 2 May 2021 (UTC)
We could do this, but it's one more thing to keep in line. Some kind of Commons SDC or Wikidata statement (probably via Wikisource index page URL (P1957)) might be more sensible, as might building out an Index page SDC/WD infrastructure. Though I am personally hazy on when to use SDC and when to use WD. There's also User:Inductiveload/jump to file, which will link you directly to the source file or hi-res JPG at the IA (or Hathi, or some others), and also give you the option of loading a high-res image into the ProofreadPage pane for when the scan thumbnail is a bit rubbish (PDFs are especially bad at this). Inductiveload—talk/contribs23:08, 2 May 2021 (UTC)
I see. The user script is great. Would there be any way to get it to upload the high-resolution image to Commons? I think that high-resolution sources really come into play when trying to upload better quality images. Languageseeker (talk) 00:33, 3 May 2021 (UTC)
@Languageseeker: A WS-optimised uploader is on my middle-term list, but it won't (directly) be part of this script (though the script might direct you to a prefilled upload form). Uploading raw originals will be part of that.
In any case, the upstream original isn't that useful at WS in the general case, because it usually needs non-trivial processing to remove the paper colour and tidy up defects. It's nice to have the original at Commons, but it's not something we in general will present to readers. Inductiveload—talk/contribs18:59, 3 May 2021 (UTC)
Stats color for 100 to 200 range
Latest comment: 3 years ago2 comments2 people in discussion
My thinking is that highlighting isn't highlighting if everything is highlighted. The idea is only to call out things that have special meaning: falling short of a "minimum" level (bad) or exceeding some "maximum expected" level (good). Obviously, we don't yet have a good handle on our expected levels so 100/200 are made-up thresholds.
On that note, if we start hitting 200 per day on a regular basis, we should bump the limits so that we 1) keep the hounds snapping at heels by raising the "lower" bar and 2) raise the upper bar so we have a meaningful "this day was exceptionally good" signal rather than allow the whole table to turn green. Inductiveload—talk/contribs19:05, 3 May 2021 (UTC)
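To make the highlighting rule above concrete, here is a sketch in Python; the function and constant names are invented for illustration, not the actual module code, and 100/200 are the admittedly made-up thresholds mentioned.

```python
MIN_EXPECTED = 100   # falling short of this is "bad"
MAX_EXPECTED = 200   # exceeding this is "exceptionally good"

def stats_colour(pages_per_day):
    """Highlight colour for a day's page count, or None for no highlight."""
    if pages_per_day < MIN_EXPECTED:
        return "red"     # short of the minimum level
    if pages_per_day > MAX_EXPECTED:
        return "green"   # exceptionally good day
    return None          # 100-200: an ordinary day, leave unhighlighted

print(stats_colour(50), stats_colour(150), stats_colour(250))  # red None green
```

Bumping the limits later, as described above, is then just a matter of raising the two constants.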
Monthly Challenge - volume's image
Latest comment: 3 years ago3 comments2 people in discussion
@Ratte: Sure: like this (at Module:Monthly Challenge/data/2021). It's slightly weird, but it's the only way I can make it so that the MC grid can update automatically without needing a bot to run constantly to keep it up to date. I will be writing some central docs on how to work the MC infrastructure once I'm sure it does actually work (it's looking "OK" so far)! Inductiveload—talk/contribs08:31, 4 May 2021 (UTC)
Latest comment: 3 years ago3 comments2 people in discussion
Hi. As is possible with a.mw-disambig I would like to get some css class(es) on these links so I can visually track the use of these links in works (per User:Billinghurst/common.css). And you know that extended css is not my strength so would you mind doing that? Thanks. — billinghurstsDrewth01:57, 11 May 2021 (UTC)
@Billinghurst: They now have ws-authlink and ws-authlkpl classes respectively. I haven't done per-field classes like {{article link}} because it's quite a bit of faffing in template mode (vs module mode) and doesn't have an immediate use. At least now, you can do something like a.ws-authlink { background-color: #e6ffe6; } in your common.css to pick them out (styling to taste).
Latest comment: 3 years ago2 comments2 people in discussion
Add a pinch of caution on this one. This may be just NIH, but there are several red flags here for me. For one, we're only just now hearing about this, when the project is essentially finished (cf. the grant proposal: it ends in May). The project was to involve lots of community consultation with "the Wikisource community", but it looks like they've only talked to the Punjabi Wikisource (the home wiki for at least one of those involved). There are significant differences between the language projects that may make a single-project setup untenable for other language projects. And while better Wikidata integration is awesome, it needs very careful design and management: e.g., what happens when an arWP editor hops over to Wikidata and changes the fields that are used on our Index pages? Or changes it locally on arWP and some set of tools makes the change also happen on Wikidata? Or a bot does a new monster import of Worldcat, with zero verification of the data? Or… All the projects have different policies (as enWP found to their detriment) for things like verifiability, conduct, conflict resolution, etc. And different priorities, which is a perfect recipe for conflict. Technologically crufty as our current indexes are, we own that data and can make policies for them, and patrol changes to them. Once we outsource them we have zero control. And what happens once this grant is exhausted? "Wikidata integration isn't really supported; it's really buggy and nobody is likely to fix it any time soon."
So maybe a more succinct summary is: if you find good stuff there then we should certainly crib what we can, but on our terms and we can't just assume that because someone got a grant to make something it is necessarily good for us. And a lot of the coolest potential stuff for WD<->WS integration needs community buy-in and policy changes, not just technical plumbing. Bah, humbug! :) Xover (talk) 20:32, 11 May 2021 (UTC)
@Xover: I agree 100%. Mostly they were vague on exactly which Wikisourcen they were even talking about. I can turn grumpy and lawn-territorial later on, as needed ^_^. Besides, unless they have a global interface admin and/or GS on side and willing to egregiously abuse those rights, they'll need someone with the bits locally anyway.
As it is, eventually I'm long-term hoping for a frWS-kind of affair for Index page WD, perhaps with more twiddly bits, perhaps not. But until the Datazoids 1) figure out how we're supposed to record, in particular, per-volume data for multi-volume works, and 2) write that down somewhere incredibly clearly in words of one syllable for my dumb ass (or someone figures out what the Commons Structured Data is actually for and tells me that's what I should be using and then tells me how), I find myself suffering from a Vitamin Care deficiency. Inductiveload—talk/contribs20:42, 11 May 2021 (UTC)
autopatroller
Latest comment: 3 years ago10 comments2 people in discussion
and fwiw when using the +er-blocks they do need to be separated by blank lines rather than closed up with BR, as the text is not suitably separated; it overlaps (in Firefox). — billinghurstsDrewth22:33, 11 May 2021 (UTC)
I don't see anything wrong with the -er-blocks in Firefox. If the line heights are wrong on some platform or something, we need to address that, as line wrapping doesn't only happen with manual BR tags. Does the lorem ipsum in the docs at {{xxxx-larger}} do it as well? Inductiveload—talk/contribs22:38, 11 May 2021 (UTC)
I have no idea whether it is platform specific or not, just saying that I am seeing it that way in my browser. With the (larger)+er block usage (usually above x-) it is typically a display set of text like title pages rather than standard body text; as can happen with the (smaller)-er blocks, we need that line spacing to contract neatly like it does. So in the display pages we can just throw formatting at it and have it display nicely. So if shoving in blank lines or BR doesn't matter as the look is okay, do it to suit the required display. [Changes now are going to impact a lot of pre-existing pages where it is display for display's sake.] — billinghurstsDrewth22:49, 11 May 2021 (UTC)
Right, but it's not good if one user thinks something displays properly and another sees it broken. For example, with BR, this is what I see: https://i.ibb.co/p2zTM9B/2021-05-11-235139-523x531-screenshot.png. Which is (pedantically) more like the original, not that it really matters. I have used BR-in-largers on title pages before, so it's not good if it turns out they are coming out busticated for some.
@Billinghurst:, OK I have harmonised the larger-block templates to set the line height to the same as the smaller ones (1.4), which isn't needed for me, but apparently is for you, despite us both being Firefoxers. It should now be functional to have a line break in these templates (forced with BR or natural). Thanks for the screenshots, very helpful. Inductiveload—talk/contribs13:54, 12 May 2021 (UTC)
@Languageseeker: ...and now the listing is automagic - all you have to do is keep it topped up with works, and it'll take care of putting them in the right sections (as long as we have under about 500 active—i.e. immune or < 3 months old—works, which I think won't be an issue!). Still no way to have a page counter without a bot or similar, but at least this avoids a daily need to check statuses and risk works stagnating in the wrong sections. Inductiveload—talk/contribs20:32, 27 April 2021 (UTC)
The page looks so beautiful. It’s more than I could have hoped for. Once we reach 500 books a month, we can make it the Weekly Challenge (hee-hee). Is there any way to see texts that have only a few pages to validate or are mostly proofread? I also want to use this as a means of finishing texts. Simply outstanding work. Languageseeker (talk) 00:43, 28 April 2021 (UTC)
Re "Is there any way to see texts that have only a few pages to validate or are mostly proofread": not really, this is what we need a bot or phab:T281195 for. Once the proofread percentage data is either sitting in a module or directly available via some magic tag, we can work it into the tiles for a quick overview. Daily stats like frWS will almost certainly need a bot - I don't see that being built into the core any time soon (though it would be pretty nifty if it could).
I think that stats are only part of what drives users to contribute to frWS. If you look at Distributed Proofreaders, users contribute even when they only see the number of books completed every month. In my opinion, most users simply don't know what to do when they arrive on the site. Right now, we basically tell users to pick a project, any project. However, most volunteers just want to be given a specific task. They also want to contribute to something important. So, I hope that by putting attractive texts on the Monthly Challenge, the site will grow its user base. That is why I'm selecting texts that have broad recognition or look cool. My other hope is that by creating scan-backed copies of important works, we will attract more users to our site. Our key benefit is that we can combine scan-backed proofreading with the generation of ebooks in a way that allows for future improvement of formatting by relying heavily on templates. That's the key selling point of this site.
I also hope that by creating a Monthly Challenge, it will become easier to reach out to GLAM institutions. Most GLAM institutions simply don't have the money to proofread texts. Scanning is cheaper, but they don't want to contribute scans if they'll just grow moldy in the backpages of WS. With the Monthly Challenge, we can tell GLAM institutions that if they provide us with the scans, then we'll feature them on the Monthly Challenge and get them proofread. Right now is a great time to reach out to GLAM institutions because the demise of flash has had a devastating impact on GLAM websites. Combined with the pandemic, most GLAM institutions are choosing to either take them down or leave them broken. enWS can reach out as a new home for these scans.
On a separate technical note, I've noticed that the box containing the PoTM vanishes if the site window gets too small. Maybe it would make sense to make sure that the PoTM and Monthly Challenge box does not vanish on small displays? Also, would it be possible to create a way to easily monitor the talk pages of all the Indexes in the Monthly Challenge to see if a user asks a question there? Languageseeker (talk) 02:52, 30 April 2021 (UTC)
Re "key selling point": that's my view too.
Re monitoring all talk pages, this should probably be a script to add all MC index talk to your watchlist, and probably is not that hard. I'll look into it.
Re small screens, I think the reasoning is that the proofreading UI is so unwieldy on a mobile screen that there's not much point directing mobile users to working spaces anyway. I think we could at least advertise a bit, once we have some progress to advertise, but it'll need some thought to make it useful/interesting to mobile users. Inductiveload—talk/contribs22:52, 30 April 2021 (UTC)
For the small screens, I meant when you resize your desktop internet window, the PoTM box vanishes. On the frWS, it gets moved to the bottom in a long column, while on the enWS it simply vanishes. I can understand why it's omitted on mobile, but, on desktops, you can resize the window. Languageseeker (talk) 02:59, 1 May 2021 (UTC)
Take your time. I was also thinking that it might be good to add some stats to the front page template like the French do. It can go between the line about the number of works and the sprint listings.
Column 1: Mission 2000
Column 2: Results of 2021:
May 2021: (total pages validated or proofread) (percentage of 2000) ((daily change) pages)
This should be possible once we have a few stats to use. Year-to-date stats will obviously only make sense from next month, and total-to-date only from next January. The whole stats system still needs a bit more tweaking before it becomes a hands-off automatic thing (for a start, moving to a Toolforge backend).
I would be interested in stats on click-throughs to the sub-pages of works that are expected to be read-through. Or clicks from wikipedias to their little source sisters. CYGNIS INSIGNIS15:16, 1 May 2021 (UTC)
We can get pageview data for pages via {{annual readership}}, but figuring out how users get to our pages and how they traverse them would possibly start running into issues with data collection fairly quickly (at least, you'd have to be very careful the data is fully anonymous, and even then I'm not sure of the rules on WMF sites). Inductiveload—talk/contribs20:39, 1 May 2021 (UTC)
New WMF toy in testing
Latest comment: 3 years ago7 comments2 people in discussion
Special:Preferences#mw-prefsection-betafeatures, enable "Discussion tools". Beside being really convenient both for replies and new threads, it eliminates a whole lot of "woops, forgot to sign" issues. I mention this for no particular reason. At all. Just completely randomly. :) Xover (talk) 08:35, 5 May 2021 (UTC)
Aurp. You may also want to go into the Appearance section and uncheck "Use Legacy Vector". Not due to its awesomeness, but to be aware of and be prepared for what's (apparently) coming. Lots of good ideas, some less good, and an at times really janky implementation. I'm suspecting a lot of the impedance comes from two main factors: the team's prioritising readers over contributors, and a basic assumption that all content on the wiki is user generated (true for almost all the other projects) and uniform across pages (also true for most other projects). I have good experiences with some of the people involved (receptive to feedback etc.), but it's probably going to take some effort on our part to affect anything here. Xover (talk) 07:57, 14 May 2021 (UTC)
@Xover: hmm, yeah I've tried that before and it doesn't look great. However, "fixing" the centred column formatting, if we wanted to do so, is probably "only" a matter of futzing with CSS here and there, so it's "probably" OK. I guess it's a matter of how much the current state is WIP and how much is actually considered "done".
Then again, some limiting of content width is also probably not a totally insane idea, since a 120em page (1920/16 = 120) is pretty excessive in most cases. But the Page: NS editor will almost certainly need special casing.
Anyway, since from the timeline on the project page it looks like this is some time away for us, I'm not sure there's a whole lot to do here right now. If we are sulkily disinclined to make any real effort, we can also let some other Wikisourcen take the early-adopter hit and only opt in once they've ironed out the wrinkles. I suppose collecting a CSS "fixes" file as a CSS gadget (targeting .skin-vector:not(.skin-vector-legacy) or similar) might be helpful when petitioning the reskin project for mercy?
For example, this removes the most egregious mis-feature, IMO, the 9em margin-left on #content: something like .skin-vector:not(.skin-vector-legacy) #content { margin-left: 0; } (selector as above; exact rule untested).
A lot of stuff is effectively getting locked in now, so if we want to affect anything we need to start talking to them sooner rather than later. And that will require familiarity with the current state of it and their roadmap ahead, and, perhaps even harder, having some idea of what we want. It is also an opportunity to get things done if we have pain points in this area. Right now there is attention and assigned resources, which, well, you know how things usually go when that's not the case. Xover (talk) 12:44, 14 May 2021 (UTC)
Latest comment: 3 years ago7 comments3 people in discussion
You mentioned these recently and I wanted to blather on about them, I think they are a crucial addition to the site. I try to show by example, but don't remember where they are: this is what I would have shown, The year's at the spring, which I still think works; the other being an original (ie printed, analog) index conveniently linked by page numbers (not hundreds of anchors). Hope this was of interest. CYGNIS INSIGNIS21:38, 9 May 2021 (UTC)
@Cygnis insignis: thanks for the note. I do wonder about linking only page numbers because it seems counter-intuitive to me - it "feels" like they're going to take me to the Page NS, though I imagine a user unfamiliar with the site wouldn't have that feeling.
More importantly (to me): on export this means the TOC text is not clickable (and the user agent may not make it clear where the link is):
Furthermore, the hint to the export tooling of the name for the entry in the "EPUB TOC" (the one you get when you "view book structure", as opposed to the one in the "content", which replicates the original) is the link text, so using the page number makes it somewhat unclear what is going on:
I hadn't given that much consideration, and although I also think that is important, I hadn't expected the problems you outline. I suppose there are many potential hazards. I'll go off and learn about what's going on with EPUB, etc., and maybe pick this thread up again when I better understand the export side of things. Thanks for the info, and for the couple of times you helped solve a problem recently (kept forgetting to say "that worked a treat, cheers"). Have a good one. CYGNIS INSIGNIS07:23, 11 May 2021 (UTC)
@Billinghurst: no, that works fine - the #XXX links work just fine in the EPUB (and they make sense to me as the clickables, since one index entry can have n pages). It's the TOC where the exporter is actually using the link text to construct its own idea of what the entries in the document's structure are called. Inductiveload—talk/contribs12:37, 11 May 2021 (UTC)
I'd forgotten about something, and having reread that just now I recognise it was inappropriate for me to post here. Cheers anyway for the replies. CYGNIS INSIGNIS00:05, 15 May 2021 (UTC)
Latest comment: 3 years ago4 comments2 people in discussion
Okay, where have the Index page, the file information, and the actual scans got out of step? Because the hi-res scans this is looking for bear no resemblance to the ones in the PDF/DJVU for the nominal page numbering. Suggestions are welcome because I don't like playing hunt-the-glitch. ShakespeareFan00 (talk) 23:49, 14 May 2021 (UTC)
Also, the hi-res versions weren't available when I uploaded the volumes. I left a message for them and they seemed to fix them. If someone wants to replace the low-res version with a higher-res version, feel free. Languageseeker (talk) 00:43, 15 May 2021 (UTC)
Thanks... The script seems to be appending an additional 0 to the page numbering for some reason. Is something doing a string concatenation when it should be doing an addition? ShakespeareFan00 (talk) 07:23, 15 May 2021 (UTC)
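For illustration, the suspected class of bug (the gadget itself is JavaScript, where + on a string behaves the same way; the variable here is made up):

```python
page = "10"              # a page number still held as a string

# Concatenation: "adding" a zero as text appends a character...
print(page + "0")        # -> "100" (the spurious extra 0)

# ...whereas converting to a number first gives real arithmetic:
print(int(page) + 0)     # -> 10
```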
Index date validated not working
Latest comment: 3 years ago2 comments2 people in discussion
Latest comment: 3 years ago7 comments2 people in discussion
I dislike having DHR in works that I do, especially when I can just as well space them out with hard returns and not get code bloat or impact readability. So I am not sure why you are replacing them with {{padded page break}}. I especially see DHR used way more than necessary when we can just be adding clean clear space. Am I missing something? — billinghurstsDrewth05:54, 17 May 2021 (UTC)
There are a couple of reasons I would put for why I think {{ppb}} is better (and why I'm not just randomly screwing about):
Multiple hard returns in the code actually result in "stacked" P tags that contain a single BR each in the output. This is 1) semantically wrong as there are no structural paragraphs there at all and 2) this relies on the rather inconsistent way MW throws out P tags, BR tags, and how the inter-paragraph margins stack up. For example:
3 blank lines:

foo



bar

gives:

<p>foo</p><p><br></p><p><br>
bar
</p>

4 blank lines:

foo




bar

gives:

<p>foo</p><p><br></p><p><br></p><p>bar</p>
This means that the actual gap you get doesn't scale linearly in number of lines, because every odd number of lines collapses one BR into the last P tag (or at least it does right now, but because P-wrapping is a mess, who knows if this behaviour is a safe long-term bet):
(rendered demonstration omitted: "bar" shown after 1, 2, 3, 4 and 5 blank lines, illustrating that the gap does not grow linearly)
Secondly, multiple blank lines in Wikitext are fragile not only because MW makes no hard guarantees about how it's going to handle them, but also because the editor intention behind them is not always clear. How important is it that there are 2 blank lines here? Does the editor mean they want exactly one P tag containing exactly one BR tag (for a 0.5em + (1.6 * 1em) total effective gap)? Or did they actually just mean they want "some" gap? Whereas {{ppb}} is explicit: This is a page break with padding around it. Actually, I think all visible page breaks should get padding and then we wouldn't need {{ppb}} at all, because you very rarely want two pages jammed right up against one another, IMO
For that matter, {{dhr}} has some small advantage as well, especially on title pages: there is (hackish) CSS in the epub export which limits the height of a DHR to 100%, because massive stacks of multiple P tags usually result in a random division, with some P's on one page and the rest on the next, depending on how many P's there are and how big the reader screen/font-size is. Furthermore, DHR is an explicit "this div makes blank space" structural element with a class that allows targeting of CSS (which is how the EPUB exporter does that). Bare P tags have no such intention marker.
Because the P tags and the page-break DIV (which is what actually causes a page break on export) are disconnected siblings, you end up with free-floating P tags on each side of the break on export. This is currently true of {{ppb}} too, but...
The implementation of {{ppb}} certainly was lacking, and DHR-PB-DHR is the wrong solution for it (even if it is brutally functional). What needs to happen is that {{page break}} should have some parameter that allows adjusting this. This is on my "make exports moar bettar" list, but I haven't gotten to it yet (well, I have now; it wasn't as hard as I thought). Inductiveload—talk/contribs08:32, 17 May 2021 (UTC)
Also, similar to the (new) {{ppb}} without DHR escorts, {{section end rule}} provides a similar thing for a rule. By adding a dedicated class, default padding and width (and any other CSS) can be applied through index-level CSS, so you don't need the {{dhr}}{{rule}}{{dhr}} idiom (or anti-pattern, depending on how grumpy one feels); you just need {{ser}}.
The idea is similar: allow the user to express what they mean, as well as avoiding spraying out 3 separate HTML elements that might or might not even end up on the same page. Inductiveload—talk/contribs13:07, 17 May 2021 (UTC)
Hah! I hate terminating section rules too—unneeded book artefact like hyphens, etc. For me they fall between sections and should not be transcluded. :-) — billinghurstsDrewth06:58, 18 May 2021 (UTC)
@Billinghurst: First, not all {{ser}}s have to be at the end of a transcluded page.
Second: remember, a single template which adds a class + per-index CSS can do this: .ns-0.wst-section-end-rule{display:none;}. No dummy sections needed. Inductiveload—talk/contribs08:17, 18 May 2021 (UTC)
Sure, or just not get wedded down in thinking that 19th-century book compositing needs to be 21st-century computer presentation. The word is king. — billinghurstsDrewth10:52, 18 May 2021 (UTC)
@Billinghurst: I don't follow. The point is this allows it to not be part of the presentation, without having to faff about carefully arranging sections or futzing with no-includes. Slap a {{ser}} down, add suitable CSS (once only) and it's done - no section-end rules in mainspace. Inductiveload—talk/contribs10:56, 18 May 2021 (UTC)
Characters required for Old English keyboard
Latest comment: 3 years ago9 comments2 people in discussion
Hi Inductiveload, thanks for offering to make this; it would be really great to enable quick proofreading of these texts by specialists and non-specialists alike. So this would require the following characters:
1) all 26 characters from the Modern English alphabet
2) the following characters from the O.E. alphabet:
eth: Ð,ð
thorn: Þ,þ
wynn: Ƿ,ƿ
3) A,a E,e I,i O,o U,u and Y,y with macrons, as this is used editorially to distinguish long from short vowels
@Rho9998: It should now be available in your Wikieditor "Special Character" palette: screenshot. I didn't add the 26 normal letters because they're all on a keyboard and it'll just add clutter I think.
@Inductiveload: Nice one. Well done on including the Tironian note (⁊) and the shorthand for þæt (ꝥ) as I forgot to mention them. That reminds me that on the punctuation front there's also the interpunct (·) which was used in the original manuscripts (although most editors modernise the punctuation). And checking Wikibooks I'm told there's also g with macron to abbreviate the prefix ge-. This further reminds me that there's g and c with dots on the top (ċ,ġ) used by some editors to mark when they're soft. Sorry for forgetting these but I hope that should be all of them now.
If you're feeling so inclined, you could also add runes. These are used sometimes in O.E. but I'd imagine they'd be useful for other languages and could merit their own keyboard.
@Rho9998: You mean, in terms of the OCR? It looks like the Google OCR tool recognises this as Old English (even though the IA OCR clearly did not). Instructions: Help:Gadget-ocr.
and þæt ic mage geearnian þæt ic sī wurðe þæt dū më
for dĩnre mildheortnesse ālyse and gefrēolsige. Ic clypie to
þë, Drihten, Þū be æall geworhtest, þæt þe æalles ge-
weorðan ne mihte, në æac wunian ne mihte būtan þë.
If all OCR fails, your only options are to proofread by hand from first principles, or find a matching text to copy-paste from.
BTW, it might be clearer to name the file something like "King Alfred's Old English version of St. Augustine's Soliloquies - Hargrove - 1920.djvu"? I can move the file if you like? There's nothing technically wrong with it as it is, it's just a bit of an eyeful. Inductiveload—talk/contribs15:44, 17 May 2021 (UTC)
@Inductiveload: re the file name, sorry about that; I didn't know how to change it. I think the OCR has read it as modern English because the first part is, or because I've put en as the language? I'll familiarise myself with how the OCR works. Thanks for the link.
Re the OCR, this is nothing to do with you. The OCR text is "baked in" to the file at the source (the Internet Archive). The Google OCR button sends an image to some Google cloud thing and regenerates the text from scratch. Whatever Google does seems to notice when it is fed Old English, whereas whatever the IA did failed to notice (or was set to English only). The "normal" black OCR button just returns whatever OCR is in the file if it can, so that's what you get.
@Inductiveload: Yeah I see what you mean now; I didn't see the big black and then multicoloured OCR buttons: doh! I agree Google OCR comes out much better. Unfortunately it looks like it's read a lot of the macrons as either diaereses or circumflexes. Would I be able to do a 'replace all' for the whole OCR-ed text? Thanks for the file name change. Rho9998 (talk) 16:07, 17 May 2021 (UTC)
@Rho9998: You can add the following Javascript (just copy-paste) to Special:MyPage/common.js. Then you should see a button marked "OE OCR fixes" in the sidebar on the left:
$.ajax('//tools-static.wmflabs.org/meta/scripts/pathoschild.templatescript.js', { dataType: 'script', cache: true }).then(function () {
    pathoschild.TemplateScript.add([
        {
            name: 'OE OCR fixes',
            position: 'replace',
            script: function (editor) {
                // replace most diacritics with macrons
                var replacements = [
                    [/[àáäâã]/g, 'ā'],
                    [/[èéëêẽ]/g, 'ē'],
                    [/[ìíïîĩ]/g, 'ī'],
                    [/[ùúüûũ]/g, 'ū'],
                    [/[òóöôõ]/g, 'ō'],
                    [/[ỳýÿŷỹ]/g, 'ȳ']
                ];
                for (var i = 0; i < replacements.length; ++i) {
                    editor.replace(replacements[i][0], replacements[i][1]);
                }
            }
        }
    ], { category: 'main', forNamespaces: ['Page'] });
});
The code like /[ỳýÿŷỹ]/g is a w:regular expression (see also here); the one after it is what it is replaced with. Basically any character inside the square brackets will be replaced with that one. The g means it will do it as many times as it can in the text. Run it after you use the Google OCR button.
You can add more replacements as you need, or come and ask me to make a regex to solve something (I'll need example input and output text, as well as any examples you can think of where the replacement should not be made).
Don't be shy about editing your own JS: you can't damage anything and if it stops working, you can just go back in the edit history to an old version :-). Inductiveload—talk/contribs16:26, 17 May 2021 (UTC)
@Inductiveload: That is amazing, thank you! I've also copied and pasted the code and then edited the duplicate so that there's the option of replacing all wynns (ƿ) with thorns (þ) for texts where wynn is changed to 'w' anyway - you can see why the OCR would get them confused. As this text has the Latin source in the footnotes, I won't add one to replace 'p's with thorns, as that would cause more problems than it solved! I've also added ash and its accented variants by copying and adapting one of your lines of code. This makes me realise that sometimes there are macron ashes too, so if you feel like adding it to the keyboard please do, though it's not a major hurdle for the moment as most ash-macrons are automatically inserted by the fixes you wrote. Rho9998 (talk) 17:44, 17 May 2021 (UTC)
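Additions of the kind described could look like this (a hypothetical sketch in the same style as the gadget above; the exact character sets and sample text are assumptions, not the actual edits made):

```javascript
// Hypothetical extra rules in the same [regex, replacement] shape as the
// gadget above: accented ash variants to macron ash, and wynn to thorn.
var extraReplacements = [
    [/ǽ/g, 'ǣ'],   // ash with acute -> ash with macron
    [/Ǽ/g, 'Ǣ'],   // uppercase equivalent
    [/ƿ/g, 'þ'],   // wynn -> thorn
    [/Ƿ/g, 'Þ']    // uppercase wynn -> thorn
];

// apply every rule in order, exactly as the gadget's loop does
function applyAll(text, rules) {
    for (var i = 0; i < rules.length; ++i) {
        text = text.replace(rules[i][0], rules[i][1]);
    }
    return text;
}

console.log(applyAll('Ƿæs hit ǽr sƿā', extraReplacements)); // → Þæs hit ǣr sþā
```

Each rule is global (the g flag), so a single pass over the OCR output converts every occurrence.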
We don't need the volumes in this situation, they are just physical artefacts and make navigation difficult. Having as WORKS / SUBWORKS is perfectly fine; remainder of the work is done that way, and we should continue that way. Moved the subpages of the second work (after effing it up the first time). It seems that we will need AuxTOC in various places to properly generate the download output. Not certain whether it should be at the root page, or in subpages like Works of Jules Verne/The Mysterious Island. — billinghurstsDrewth08:02, 18 May 2021 (UTC)
My comment was more about the structure of the entire work. Not enough of sampling to make a recommendation, so put out the reminder. — billinghurstsDrewth10:48, 18 May 2021 (UTC)
Latest comment: 3 years ago11 comments2 people in discussion
In olden times we used to direct people to put author = Unknown in the header. From my look at the instructions for the template, that is now not mentioned. It still seems to work with the display logic; however, we are getting the equivalent of links to Author:Unknown. Not sure when that occurred, and the when isn't really important. Is it an easy fix in your code? If not, then I will run my bot through to convert these to | override_author = Anonymous and the corresponding categorisation. There are about 600.
We also had a case where someone was using the override and still setting the author = parameter as well, so there were a load of links pointing there, plus the categorisation. Those I have fixed, and I mentioned to the contributor how to handle it. — billinghurstsDrewth15:34, 19 May 2021 (UTC)
Latest comment: 3 years ago2 comments1 person in discussion
10962243 That made me laugh, last February, because "Replay Gain" gives really good info for remastering and when dreaming/thinking sounds like a rant.
I was trying to find a way to make better conversion of color to monochrome before my computer broke, but there was no reliable way yet. Some conversions were great for some but not for others and I could not figure out the why yet. Like it could have been darker needs this not that, or more reds are better with that, not this. It surely was not as consistent as clipping in sound files.
Any rant would have been directed at myself. The decision making that follows the simple monochrome or color toggle was my personal boggle.
Then, the weird fact I learned a while back about the delivery of TV images when we used antenna. That both color and b&w were sent on the same wave. People are so clever! Such a thing is terrible though, for the digital realm, making downloads and files bigger -- but I really thought back then that the conversion was in the device! But, the filming or the processing of it, if you care about quality, is very different for the two.
Latest comment: 3 years ago8 comments3 people in discussion
If I look at that page in Firefox (W10, 88.0.1) it is a fail, though it looks okay in Chrome. I will leave you to work out the CSS issues as they are beyond me. — billinghurstsDrewth01:02, 23 May 2021 (UTC)
Yes, resolved. Though the fact that the same browser behaves differently depending only on the underlying operating system is an issue, especially where one of those is a reasonably common operating system. — billinghurstsDrewth12:37, 24 May 2021 (UTC)
Seems like Windows + Firefox is sensitive to line-height for larger fonts in a way no other renderer is. I'm not sure if that's a browser bug or undefined or what (@Xover: do you know?). But the "solution" is apparently to set line-height where needed. Annoyingly, there appears to be no way for a non-Windows Firefoxoid to know if this is happening. Even at [6], I can't see this on Windows 10 + FF 88, so it might just be you have something weird on your system? Inductiveload—talk/contribs12:45, 24 May 2021 (UTC)
Billinghurst is using monobook, and there the skin sets line-height as a length value (1.5em). Length values are inherited as the computed value of the parent's line-height. So for a block with font-size 127% and line-height 1.5em, with a base font size of 10px, the child will inherit a line-height of 1.5 × 12.7px ≈ 19px. If you then set font-size to 300% or 800% you'll have a glyph of roughly 38px or 102px sitting inside a 19px line-height. When the line in question is forcibly broken you end up with two overlapping line boxes. This is only an issue in monobook (Vector sets line-height as a unitless number so it inherits properly), and it's the kind of difference between monobook and Vector that was among the driving factors for developing the new skin (the visual bling-bling too, of course, but monobook is very outdated from a technical perspective too). Xover (talk) 14:40, 24 May 2021 (UTC)
OK, that explains why I couldn't see it on a browser test site - the default for un-logged-in is Vector. At least we don't have a horrible platform thing going on like I thought. Inductiveload—talk/contribs15:08, 24 May 2021 (UTC)
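The two inheritance models can be sketched numerically (assuming the 10px base font size and 127% block font-size from the explanation above; this is an illustration of the CSS behaviour, not code from any skin):

```javascript
// Length line-height (monobook): the computed pixel value is inherited.
// Unitless line-height (Vector): the ratio is inherited and recomputed
// against each child's own font-size.
var base = 10;                            // base font size in px
var parentSize = base * 1.27;             // font-size: 127% -> 12.7px
var lengthLineHeight = parentSize * 1.5;  // 1.5em computes once to ~19.05px

var childSize = parentSize * 3;           // child font-size: 300% -> ~38.1px
var unitlessLineHeight = childSize * 1.5; // unitless 1.5 -> ~57.15px

// With the inherited length value, a ~38px glyph sits inside a ~19px
// line box, so forcibly broken lines overlap.
console.log(lengthLineHeight < childSize); // true
```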
A little birdie…
Latest comment: 3 years ago5 comments2 people in discussion
@Xover: I think it was a mis-aligned text layer (phab:T219376 perhaps?), so this was a rebuild from JP2. I always forget the chunkedUpload thing uploads the moment you choose the file, rather than allowing to add a comment and submit. Sorry! Inductiveload—talk/contribs12:02, 24 May 2021 (UTC)
Was it due to a request, or randomly grabbed from a backlog, or…? No rejigging of pages, deleting extraneous pages, etc. beyond just rebuilding the DjVu? Not really important, but I'm looking at some other maintenance around this work and wanted to check for any surrounding factors or context; and the reupload without finding a WhatLinksHere trigger was a loose straw is all. (also, been there done that, etc.; BCU is very much v. 0.0.1 in that sense) Xover (talk) 12:10, 24 May 2021 (UTC)
I think I patrolled something tangentially related to that page (probably via some discussion with RK) and noticed the "needs fixing" tag and the summary of diff. Which I now notice I didn't change to "to be proofread" -_-. Inductiveload—talk/contribs12:14, 24 May 2021 (UTC)
Completed Texts not Removed from Front Page Template for MC
Latest comment: 3 years ago3 comments2 people in discussion
I just noticed that Sense and Sensibility Volume 1 is completed, but is still on the front page. Is there any way to remove completed texts from the sprint section of the MC front page template? Languageseeker (talk) 04:11, 22 May 2021 (UTC)
Latest comment: 3 years ago3 comments2 people in discussion
Hi,
just to let you know... It looks like there has been a small change in the working of the RH template. For instance, on this page: when I get the browser window below a certain width, the middle section is displayed over two lines, while in the meantime there is still a lot of space left on both sides of the text "THE LIFE OF W.M.". Does this have anything to do with the changes you recently made in the template? --Dick Bos (talk) 16:28, 25 May 2021 (UTC)
Latest comment: 3 years ago2 comments2 people in discussion
I know that it's possible to manually set the cover of a work for export. In the Index ns, the cover is already set. Would it be possible to use this information to automatically set the cover for export? I think the French do it that way. Languageseeker (talk) 20:53, 25 May 2021 (UTC)
@Languageseeker: Only if you use the header=1 parameter of the page tag, which can be a little bit fragile and so is rarely done at enWS. Also often the index "cover" is the title page, even if there's a decent actual cover. And you may wish to set a different cover to the one in the file, for example if there's a sticker or something on the DjVu. Inductiveload—talk/contribs23:08, 31 May 2021 (UTC)
May 2021 Monthly Challenge
Latest comment: 3 years ago7 comments3 people in discussion
@Languageseeker: yep, that's handled now - basically on the first of the month, it "forgot" to update the previous month's data one last time. I'll adjust the script/cronjobs as needed and hopefully it will Just Work (TM) next time round.
@Xover: since I ran the script on the toolforge before seeing the Main Page, I never saw an error. I fixed a couple of div-by-zero bugs for the first of the month though. Do you recall what the error (roughly) was so I can make sure it doesn't happen again if the month data doesn't get updated on time again in future? It's fine if you don't, I'll try to recreate the environment in a sandbox somehow and see if I can make it go bang. Inductiveload—talk/contribs11:09, 1 June 2021 (UTC)
Lua error in Module:Monthly_Challenge_statistics at line 178: attempt to index field '?' (a nil value). Trace:
1. Module:Monthly_Challenge_statistics:178: ?
2. [C]: in function "gsub"
3. Module:Monthly_Challenge_statistics:166: in function "chunk"
4. mw.lua:525: ?
5. [C]: ?
Oh, and main page stats should probably show last month's stats as well as current month, so all the zeroes on the first of the month don't look too pathetic. :) Xover (talk) 11:37, 1 June 2021 (UTC)
Thanks for the info, that's helpful. Not exactly where I thought it was going to be, TBH.
Latest comment: 3 years ago3 comments1 person in discussion
Following on from the former post... This is my attempt at a specification of what I was trying to do with CSSline; in writing it I found a flaw in my own logic concerning the current version.
"The CSSline template generates a CSS attribute and value pair, if and only if an actual user supplied value is present and not the same as default value (which will have been specified in a Templatestyle elsewhere.)
The inputs to the CSSline template are :-
The CSS attribute to use.
The user supplied value for the attribute.
The default value for the attribute concerned.
The output of the CSSline template shall be :-
A correctly separated and terminated CSS attribute and value pair, if and only if the user supplied value is present, and is neither
the nil or empty string, nor
the value '@std' ,nor
identical to the provided 'default' value
'empty' if no user supplied value is provided.
'empty' if a 'nil' or empty string is provided for the user supplied value.
'empty' if a value of @std is given for the user supplied value ( implying use of the standard or default value).
'empty' if the user supplied value, is the same (or expands to the same as) the default value. (NB It seems I had not yet fully coded this..)
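The specification above amounts to something like the following (a JavaScript model of the intended logic, not the actual wiki template code):

```javascript
// Sketch of the CSSline contract: emit "attribute:value;" only when a
// real, non-default user value is supplied; otherwise emit nothing.
function cssLine(attribute, userValue, defaultValue) {
    if (userValue === undefined || userValue === null || userValue === '') {
        return '';  // nil or empty string -> empty output
    }
    if (userValue === '@std') {
        return '';  // sentinel meaning "use the standard/default value"
    }
    if (userValue === defaultValue) {
        return '';  // identical to the provided default -> omit
    }
    return attribute + ':' + userValue + ';';
}

console.log(cssLine('text-align', 'right', 'left')); // "text-align:right;"
console.log(cssLine('text-align', '@std', 'left'));  // ""
```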
I then reimplemented the sandboxed sidenote templates to use @std (making sure I was calling the updated ones; very big note to self here) and made a test page: Page:Sandbox.djvu/257
As far as I can tell, in the Page: namespace the sandbox is functional, with the same functionality as the equivalent live versions, but the sandboxed version doesn't generate inline styles unless needed, with the defaults in a suitable CSS stylesheet. I also reimplemented the "special/_special" handling; it's how I generate the color formatting, by setting up a contrived Indexstyle approach!
Not that I am in any position to ask, but a review of the sandboxed versions would be appreciated. (Comments at the Scriptorium noted.) What's being done here doesn't affect the outer wrapper currently, so the updated version should be compatible with Dynamic Layouts in Mainspace.
I've not re-sandboxed the Outside L, Outside R etc family because those only work in Layout 2, and a fuller spec would be needed to get them working in all layouts.
ShakespeareFan00 (talk) 17:04, 1 June 2021 (UTC)
@ShakespeareFan00: Well, I'm still not entirely sure what your exact goal is here (not necessarily a description of what the sandboxes currently do, but what the outcome you are after is: beware the w:XY problem). Sidenotes have lots of issues (and lots of implementations, each with their own quirks) and as was mentioned to you before, it may (or may not) be impossible to hit every possible outcome 100%. Progress towards improving them is slow, but it's happening (it's bound up tightly with the pagenumbering stuff as you well know, so general improvements there are helping too).
As usual, you haven't documented any expectation about what your templates are supposed to do, so I can't really tell what that template does. Moreover, I don't even know that you know that it's doing what you think it is. E.g. parameter 3 isn't even used.
On the technical level, I am unsure what {{Right sidenote/sandbox/CSSline|text-align|{{{align|@std}}}|left}} achieves that something "conventional" like {{#if:{{{align|}}}|text-align:{{{align}}};}} does not.
Noting also that it's nearly always better to put the default (i.e. left) in a TemplateStyles sheet if possible, because if any style ends up inline on an element directly, it'll always win a specificity battle with all other CSS (unless that CSS used !important, which is nearly always a code smell). Sometimes this is what you want, but usually not.
It might be that if you genuinely are running into Mediawiki template limitations (and I am not really sure you are) that a module will allow you to do what you need to. Inductiveload—talk/contribs17:38, 1 June 2021 (UTC)
(Sigh) It seems pointless to attempt further explanations, or waste my time, until someone has actually written a specification. Good luck with that. ShakespeareFan00 (talk) 19:19, 1 June 2021 (UTC)
Latest comment: 3 years ago1 comment1 person in discussion
I'd like to apologise.
My rantings over what inevitably turn out to be typing errors or logic errors on my part are not the sort of calm professional attitude expected here.
Thank you for having the patience to respond in an entirely calm and helpful manner despite this, and I hope that you will feel able to respond to more intelligently posed technical queries in the future.
ShakespeareFan00 (talk) 18:53, 6 June 2021 (UTC)
Chapter headings
Latest comment: 3 years ago9 comments3 people in discussion
Prompted by a text I was doing minor maintenance on that got away from me, I've started thinking a little about our previous discussions about per-work CSS and chapter headings as a good starting point. Braindump:
We currently have {{h}}/{{heading}} (html hn tags) and {{ch}}/{{chapter heading}} (div with inline styles), neither of which seem easily adaptable to a modern implementation in place and will effectively have to be replaced.
So maybe we create {{styleable chapter heading}} ({{sch}}) to start. It'll wrap its argument in a span with a standard ".ws-chapterheading" class, and be explicitly scoped down to single-line simple headings (hence the span) to avoid amassing of cruft over time. Chapter headings get basic default styling in global CSS, equivalent to display:block, centered, and xl. Per-work CSS and TemplateStyles should have greater specificity and so should be able to override this with no special magic. To provide some flexibility the template accepts a class argument that lets you specify an extra CSS class to add to the span, but enforces that the class name must start with two underscores. We then explicitly reserve such class names for use in per-work CSS.
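The global defaults sketched above might be only a few lines of site CSS (the property values here are illustrative assumptions, not a settled design):

```css
/* Hypothetical sitewide defaults for the proposed chapter-heading class;
   per-work Index CSS and TemplateStyles override these with normal
   specificity, since nothing is inline on the element. */
.ws-chapterheading {
    display: block;     /* the span renders as a block by default */
    text-align: center;
    font-size: 1.5em;   /* roughly "xl" */
}
```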
I suspect that on examination of existing uses of {{h}} and {{ch}} we will find a large proportion of the uses are actually "used on principle" or "used in ignorance" where they're used for the default styling of the respective template. In my experience with other such templates I also hold it likely that a large proportion of the rest will fall into a relatively small number of categories. I'm thinking these may be worthwhile to have predefined global classes for, but that needs careful consideration. If global preset classes are provided, these should be specified in the class parameter in place of local per-work classes.
I'm thinking we keep the template lean and mean, and do all we can to discourage scope creep. And be clear that the template is intended to be semantic first and foremost, and must be used with either a global preset or a per-work CSS.
In any case, if we're happy about {{sch}} we then actively migrate existing uses of the old headers to the new one (which global presets will make much simpler compared to creating bespoke per-work CSS for existing works), and then redirect them. At least as a first approximation we shouldn't have more than one template for this particular purpose. More complex headers can exist, of course, but they should either wrap {{sch}} or roll their own; and all our docs and guidance to new users should nudge them toward this template.
Experience from this process should be a good basis for starting to think about what else we can do in terms of more semantic templates and leveraging per-work CSS.
I'm already thinking that for big brutes like EB1911, creating as-fancy-as-you-please per-work CSS is a reasonable and commensurate burden. But for most works here, and for most contributors, the complexity, skillset required, effort, etc.… do not match up. We need to find ways to provide stuff "for free" to be able to leverage it routinely. Either predefined stylesheets with very common varieties, or some kind of friendly visual editor to add snippets of CSS under the hood but presenting a GUI-ified interface to overriding certain properties ("All headings should be green instead of black"). At a minimum we need to provide boilerplate CSS and guidance. And even then I am presuming that using per-work CSS will be the exception rather than the rule.
In any case, I may throw together {{sch}} to try it out on the poor text that's become my lab rat for several such experiments. Thoughts apt to adjust or confirm course would be very welcome before I squat on another nice short and mnemonic template name. :) Xover (talk) 08:01, 3 June 2021 (UTC)
@Beeswaxcandle: Yeah, I just flubbed the template link above ({{header}} has a, uhm, slightly different function :)). Part of the goal here is to get away from things like {{heading}} and {{ts}} that put all the formatting inline in the template invocation, and move to a template that just expresses the semantics ("This is a chapter heading") and leaves the formatting to the newly supported per-work style sheets (you may have noticed a new "Style" tab on index pages). Provided we can make it work well and be user friendly it's a much cleaner solution from a technical perspective. It'll also, potentially, let us have much more simple and consistent templates for many features that are common across works but formatted slightly differently. Xover (talk) 08:41, 3 June 2021 (UTC)
@Xover: check out {{plain heading}} (yes, it's a rubbish name, I need to think of a better way to describe these "classy" templates in a quasi-standardised way).
However, I am pretty sure the semantics are "suboptimal" in that it's currently
Using hn is always going to be a fight with core, skins, and any number of Gadgets etc. (for example, {{plain heading}} gets random appearances of "[ link | permalink ]" from w:User:Xover/EasyLinks.js). And the semantics aren't noticeably a good match for chapter titles in a book either (it'd work fine for our more web page-like content, such as docs and policies). That's why I specified span for this above. And also for the second problem with {{plain heading}}: it's trying to deal with multi-level headings and all the attendant complexity. We don't need to put everything in one template: if we need multiple heading lines, use multiple templates. There's a similar case with the common heading followed by a decorative {{custom rule}}. Or what about that pithy quote following the chapter heading? Or the list-of-subjects-covered-in-this-chapter? Orley Farm is an outlier in that the second line can actually be argued to be a part of one full chapter heading, but most such constructs aren't or are only dubiously so. UNIX philosophy applies: do one thing, do it well, and make sure you play well with all the other doing-one-thing-well templates. PS. Did I mention… I can guarantee the annoying(ly) part, but whether also right is a completely orthogonal issue. :) Xover (talk) 18:48, 3 June 2021 (UTC)
{{Plain heading}} isn't seeking to be the way to do everything and anything with headings, but the subtitle line seems a common enough pattern to merit a second parameter to me? If a work doesn't suit it, then don't use it and use something more suitable. That's pretty much why I specifically didn't add style, to stop people piling crap into it. As before, I can be convinced to drop the use of hN tags.
Options I can see (for the Orley Farm type, not for a generically complex case) for the calling API (i.e. what an editor writes in Wikitext: the generated tags are roughly the same in each case, mostly modulo where the CSS lives)
Status quo: direct formatting: {{center}} etc. - CSS inline/maybe TemplateStyles
Something like {{plain heading}} (takes heading and maybe subtitle iff suitable) - CSS in index styles
A heading template and a subtitle template - ditto
A heading template and a generic classed template like {{classed div|subtitle|The Content}} - ditto
Two generic classed templates (i.e. {{classed div|chapter_heading|Chapter 42}} and {{classed div|subtitle|The Content}}) - ditto
Work-specific templates (overkill for most works) - CSS inline/probably TemplateStyles, maybe index styles
The point I was failing to make is that the world (our works) is arbitrarily complex, so I don't think Orley Farm as a type specimen tells us very much. The pattern of a heading followed by some other text preceding the start of the body text of a chapter is common, yes, but what the contents of those two "lines" is varies remarkably. Compare Repulsing the Eater of the Ass, Propertius Confesses, and Page:Life of Edmond Malone.djvu/57 (all of which I need to finish up, I am reminded). If the template does two of the chapter heading lines, why shouldn't it do three? My thinking here is that we should go to the smallest atomic unit that we can reasonably do, and leave the combination of them to users for each use case. A chapter heading is then a completely generic construct common to every published book that has chapters, or as near as makes no statistical difference. The same goes for a "chapter subheading". Both can be display:block by default, but an index style can choose to make them inline-block instead if that makes sense. That block of stuff beneath the chapter heading and possibly a subheading can then be a div based template with a suitable name and aliases (chapter toc? chapter quotation? chapter summary?), and can fit all sorts of stuff that you don't want to try to handle in the heading qua heading. That gives us three templates (possibly all backed by the same Lua module, of course), all of which are optional, and that can be combined in infinite ways (including non-css-based stuff interspersed between the heading and subheading at need), but will fit almost all works and is easy to teach (as well as can be expected) and thus can get ingrained in muscle memory. I don't like "classed div", for the same reasons you've previously expressed misgivings about similar, and ditto why I abandoned {{sbs}}: it goes one step too far towards just writing raw HTML.
I think we'll probably need both div-, span-, and p-with-a-class templates to deal with special cases, but I don't think these should be recommended and certainly not a model to follow for other templates. Xover (talk) 13:14, 5 June 2021 (UTC)
I understand the sentiment, but I'm not quite clear what your ideal wikitext for, say, Page:Life of Edmond Malone.djvu/57 would be? Could you just brain dump what you think the wikitext should look like and we can fight about something concrete? ^_^
For example the examples above:
Status quo:
{{center|{{x-larger|CHAPTER III.}}}}
{{center|{{smaller|1769–1777.}}}}
{{hi|{{smaller|Law Studies—Irish Duels...}}}}
His return...
Plain heading (with Index CSS) (BTW, {{plain heading}} as it stands can do this)
{{plain heading|l=2
|CHAPTER III.
|1769–1777.
|Law Studies—Irish Duels...
}}
His return...
"Semantic" templates (with Index CSS)
{{??? heading|CHAPTER III.}}
{{subtitle|1769–1777.}}
{{section content|Law Studies—Irish Duels...}}
His return...
Out of these (or something in-between, or something else entirely) how do you envision this in an ideal case? I am cognisant of more complex cases existing, but on the assumption that there is always a more complex case, there will always be a point at which it'll be easier to just go back to direct formatting, so my thought is to make the 95% cases easy and worry about the 5% cases later. If the 5% can be coerced into a wider framework, then great; if they can't, then don't make the 95% case impractical to accommodate. Inductiveload—talk/contribs10:00, 7 June 2021 (UTC)
Re We need to find ways to provide stuff "for free" to be able to leverage it routinely. Either predefined stylesheets with very common varieties, or some kind of friendly visual editor to add snippets of CSS under the hood but presenting a GUI-ified interface to overriding certain properties ("All headings should be green instead of black"). At a minimum we need to provide boilerplate CSS and guidance. And even then I am presuming that using per-work CSS will be the exception rather than the rule. This is roughly my feeling. I'm not going to go hard or fast on the CSS in general because we need to see how it shakes out when there is better template support. A "GUI-ified" interface would be pretty sweet, but technically rather tricky to keep in line if someone edits the CSS as code. A "wizard" to create snippets would be easy enough, and there is already some CSS help in the WikiEditor which can be built on (wouldn't it be nice if Vue.js could be a thing in time for that...).
I also need to figure out if I can get the "preview page with this CSS" function working on the PHP backend, because editing the CSS blind is really a bummer. Inductiveload—talk/contribs09:31, 3 June 2021 (UTC)
Per Index styles to be available per Index: page
Latest comment: 3 years ago6 comments2 people in discussion
Hi. If you look at the inserted TOC on Index:The case for women's suffrage.djvu you will see that the right hand column has not right-aligned (per index style). Seems to not be picking up the style. While it is less consequential on this page, some ToCs will have more complicated and reliant formatting, so I was thinking that we need to apply the styles to the MediaWiki: index page template. Am I missing a potential downside? — billinghurstsDrewth04:46, 6 June 2021 (UTC)
Due to implementation details (see my comment at 18 March 10:47pm and Tpt's reply here), you only get auto-styles with the <pagelist> tag (or in page NS). But this tag doesn't work in the Index NS.
Duh. Do we want to have it set to only display if Remarks is present too? Hmm, don't suppose it matters as if it fks up we need to know wherever it is. — billinghurstsDrewth13:20, 7 June 2021 (UTC)
I think that's actually more confusing, as the CSS will apply to the whole page (TS isn't scoped to the HTML element it is declared in, which is actually what allows it to work for us at all), so gating it on remarks could be slightly surprising if you add a TOC and suddenly some CSS comes along and makes your pagelist Comic Sans. Also phab:T284449. Inductiveload—talk/contribs13:28, 7 June 2021 (UTC)
Re: Main namespace files for download or export
Latest comment: 3 years ago7 comments2 people in discussion
Hi. Is this the root page layout needed for download? Please check the lower part of the page. The History of Slavery and the Slave Trade. The TOC links in the header point to the pages in the context of the book layout. Does this have to be done to every main-namespace book with a TOC?— Ineuw (talk) 14:05, 26 May 2021 (UTC)
@Ineuw: pretty much, yes. There are a handful of ways to get around it in a pinch, but the simplest, most maintainable and most consistent way to achieve this is to put it on the root page. Also, IMO, this is far more user-friendly for reading, since you can see the TOC from the landing page, without having to notice the contents link and follow it.
No link in the header is used for export generation, because headers are explicitly excluded from the content for export.
Also, you should never use pixel widths for things with text content. See H:PXWIDTH, but tl;dr you should probably use a text relative size like em. In this case ~30em will look roughly the same in the usual case. Inductiveload—talk/contribs14:15, 26 May 2021 (UTC)
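For illustration, the difference looks something like this (the class name is hypothetical):

```css
/* Avoid: a pixel width is fixed regardless of the reader's font size */
.toc-container {
  width: 480px;
}

/* Prefer: an em width scales with the text. At the common 16px default
   font size, 30em is roughly 480px, so it looks about the same in the
   usual case but adapts when the reader's font is larger or smaller. */
.toc-container {
  width: 30em;
}
```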
I understood the importance of using "em" instead of pixels. Are there exceptions? My future specifications will be in "em", but what about the thousands of past uses? Should templates be modified? I am also aware that my personal settings for the browser, and a web page, have no bearing on what others see. My interest is in how to get a casual reader's attention to notice the 4 possible layout options of a main-namespace page.
As for placing the table of contents on a "root" page: that was not clear to me, because it could have meant a Page: ns. I needed an example of a work which had a Page: ns table of contents that was transcluded to the Main ns, not one that was added to the Main ns by an editor. — Ineuw (talk) 20:03, 8 June 2021 (UTC)
@Ineuw: The main exception is if the container contains something sized in pixels, for example an image that you want something else to be the same width as. Generally you don't really want to do this, because pixel sizes can be surprisingly large or small depending on the DPI of the device, and the text might not be "suitable". For example, say you want a caption to be the same width as an image at 400px. That might make sense at 1em = 16px, but if the font is 48px on a device and the image is displayed 1:1 on a 1000px screen, the text will be compressed into the middle 40% of the screen while the font is 3× the relative size you expect, compared to the image. So use with care.
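A sketch of that exception (class names hypothetical): when the image itself is rendered at a fixed pixel width, the caption has to match in pixels too, or the two will drift apart as the font size changes.

```css
/* The image is rendered at a fixed pixel width... */
.plate img {
  width: 400px;
}

/* ...so its caption must use the same px width to stay aligned
   with it. An em width here would grow or shrink with the font
   while the image stayed put. */
.plate .caption {
  max-width: 400px;
  margin: 0 auto;
}
```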
Using {{FI}}, my image widths never exceed 500px. This width is set in my vector.css for the Page: namespace so I can have margins surrounding the text, similar to the original. Since the images are my uploads as well, I see them from birth (.jp2) and am aware of the limitations of display. Besides, I found Commons' image viewing tools to be much improved (impressively so).
On re-read, you missed the gist of my Main namespace-related TOC question. I always transclude the book's TOC to the Main namespace as it appears in the book's original layout. My question was: do I leave the transclusion of my original layout as well, and duplicate the transclusions on the root page? Or should my original TOC page be deleted when its contents are transferred to the main page? — Ineuw (talk) 20:35, 12 June 2021 (UTC)
Images from When We Were Very Young
Latest comment: 3 years ago18 comments3 people in discussion
So, the images from this book are getting hit as copyvio on Commons. I opened an undelete request there, since PawełMM has done a great job on many of them. Is it possible to batch-localize the image files here? Languageseeker (talk) 01:05, 2 June 2021 (UTC)
A Commons admin restored the files so that they could be transferred. However, Xover said that the pybot script is broken, so the transfer has to be done manually. There are around 121 files to transfer. Any suggestions, or is it just "get lots of coffee"? Ideally, I would like to preserve the entire file history as well. Languageseeker (talk) 20:12, 2 June 2021 (UTC)
@Languageseeker: The batch import is running right now. The tools I know of do not allow for uploading the full file history, it only grabs the most recent revision. If you really care, you could manually upload the original pages over the top and then undo the change. Don't worry about the messy metadata and missing templates, I'll rip through it all with the bot once it's all imported. Inductiveload—talk/contribs22:20, 2 June 2021 (UTC)
Thanks. You're amazing as always. I'll let you know if PawełMM decides to finish processing the rest. Really glad that these files are not lost. Languageseeker (talk) 23:13, 2 June 2021 (UTC)
You're welcome. Note that the files wouldn't ever be "lost". Commons might be strict with the copyright hammer, but the admins will lend a hand when we need (and we have a few Commons admins here too). Technically, files are never actually removed from the Commons DB, so we can always get them back, even if PawełMM had deleted them already from his computer!
Importing file history as actual revisions isn't technically possible (you could do the file description page, but not the file itself). That's the main reason why FileImporter/FileExporter exists: you need to be operating inside MW and talking to the revision table at a pretty low level to do it.

PS. Languageseeker, the metadata on these uploads is really quite shockingly bad. If you can't do better than this you should not be doing batch uploading at all, neither here nor at Commons. Please rein in your ambitions to be commensurate with your actual abilities, or ask for assistance. Xover (talk) 05:25, 3 June 2021 (UTC)
@Xover: I appreciate the explanation. Sorry about the poor metadata. I see that I mixed up the Author and the Title field in Patytpan. Besides that, how do you suggest that I improve the metadata? Languageseeker (talk) 13:51, 3 June 2021 (UTC)
@Languageseeker: You can see an example of a basic file description page for a plate I extracted recently at File:Birdcraft (1903), plate 37.jpg. The absolute minimum to get right are author, date, source, and licensing. But almost all files should also have a description field filled out with something sensible, and appropriate categories added.

For the source field, for images extracted from a book, you'll usually want to use c:Template:extracted from (or at least a link to the book's file description page). For licensing, on Commons, you need to make sure you are documenting the correct copyright status for both the work's source country and the US (unless the work was first published in the US). Their (and our too, for that matter) licensing templates leave a bit to be desired on the user-friendliness side, but it's still important to get it right. And as you've just learned the hard way, copyright for a book can get really complicated, with various people contributing and holding separate copyrights.

By the way, it looks like PawełMM is still uploading processed versions of these images at Commons after Inductiveload copied them here. See eg. c:File:Whenwewereveryyo0000unse i2b7 orig 0065.png vs. File:Whenwewereveryyo0000unse i2b7 orig 0065.png. Those are going to get deleted at Commons soon, so you may want to coordinate that effort.

Oh, and the book scan from which these images were taken will also need to be either copied locally or redacted at Commons (the images are no less copyvio there just because they're part of the scan rather than in separate files). Xover (talk) 16:41, 3 June 2021 (UTC)
PawełMM is absolutely amazing and finished processing all the images. Would you mind running the bot job again? I made a list of all the files to be localized here. There is also new metadata for the images. Languageseeker (talk) 14:09, 4 June 2021 (UTC)
@Inductiveload: Sorry to be a bit rude, but I think the clock is ticking on these. Languageseeker (talk) 20:24, 4 June 2021 (UTC)
@Languageseeker: The existing PWB script doesn't actually work for overwriting when the file already exists on Wikisource, and I don't have time to write a new script for this tonight, so I suggest just uploading the newer images locally at enWS manually if it has to be done very soon. Inductiveload—talk/contribs22:22, 4 June 2021 (UTC)
File list updated over the weekend. All files added into the book. So this should be the final list. BTW, this is the first appearance of a certain bear who is always looking for honey. Languageseeker (talk) 12:33, 7 June 2021 (UTC)
@Ineuw: You used a width attribute, not a width style. AFAIK, attributes only do px. But you should not use them anyway because they're obsolete and replaced with CSS. Inductiveload—talk/contribs04:17, 18 June 2021 (UTC)
Latest comment: 3 years ago5 comments2 people in discussion
Not sure whether this is related to our implementation, sanitised-CSS or something with the css editor. Both of these syntax entries fail
@import url('https://en.wikisource.org/w/index.php?title=Index:My_Life_in_Two_Hemispheres,_volume_2.djvu/styles.css&action=raw');
@import 'Index:My Life in Two Hemispheres, volume 1.djvu/styles.css';
Unrecognized or unsupported rule at line 2 character 1.
I tried to change the page type to CSS, and that just threw up a VIEW page, so I was running into a permissions thing; that is not the solution.
Found that it is not allowed through sanitised-CSS. For certain works where we have volumes, I would think that is going to be horribly burdensome and lead to errors. Now, you may be able to set some code to make this happen (that an Index uses another file); however, it seems that a local @import of another sanitised-CSS file in the same ns should not be a significant risk, so I have created the tracked task. — billinghurstsDrewth03:48, 20 June 2021 (UTC)
Yep, that will be super handy. The index CSS should be able to be a redirect, but since changing content model is an admin-only thing, it's very clunky and unfriendly. Inductiveload—talk/contribs12:42, 21 June 2021 (UTC)
Subsidiary question. I have not set the work to max-width: 38em; as that would make for a very long page on screens for no reason. It will still self-size to screen width, so will it be problematic? Feel free to set it in the css if you think that it is generally more beneficial to do so. — billinghurstsDrewth04:04, 20 June 2021 (UTC)
Addendum: How in or out is <blockquote>, per My Life in Two Hemispheres/Chapter 31? If it is problematic in its natural form, what code have you been applying for export, etc.? I am still dwelling on whether to flick code at it anyway now that I am progressing towards CSS 102. Even a 2em indent is irksome when I do an add-on check of the mobile form. — billinghurstsDrewth04:40, 20 June 2021 (UTC)
@Billinghurst: There's no need to restrict width where it impacts readability. It's more of an issue when you have a very short chapter name and a number, and 70em between the two. For prose (including your TOC's summary blocks), widths should be strictly left up to the layout. It will "compress" horizontally as needed on export: phab:F34518109. The concern with widths is twofold:
On very wide layouts (Layout 1 + fullscreen monitor + a skin like Monobook): is there a massive amount of whitespace that ruins readability? (Many TOCs fall into this)
On a very narrow layout (Mobile @ ~360px generally, Layout 2/4 @ 36em, some e-readers, visually-impaired users with large fonts): does the content spill off the right margin or generally fall apart? Mostly images are monkey-patched in CSS to avoid this in Mobile and export at least. Generally the places this falls apart are:
Tables: often there's naff all you can do: they're just wide things that made more sense on a printed page in 1873 than they do on a portrait screen in 2021.
Multiple columns with something like {{multicol}}, which abuses a table for the purpose. Often this can be done better with {{div col}} or {{flex wrap centre}}, which will degrade to a single-column layout, especially for side-by-side images. For side-by-side text (parallel texts such as treaties and Loebs), I do not yet have a good solution that will export in any sane way.
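A minimal sketch of the degradable approach, presumably roughly what a flex-based template like {{flex wrap centre}} does under the hood (the class name is hypothetical):

```css
/* Side-by-side items that wrap: on a wide screen the children sit
   in a row; on a narrow screen they wrap into a single column
   instead of spilling off the right margin. */
.side-by-side {
  display: flex;
  flex-wrap: wrap;
  justify-content: center;
}
.side-by-side > * {
  margin: 0 0.5em;
}
```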
Blockquote is exported just fine as HTML tags. The default MW skins add those annoying grey bars (they don't go to export). The {{quote}} template uses blockquotes but provides a saner default (2em on left and right), and can be overridden with index CSS via the class wst-quote and also custom classes if you need multiple variants within a single Index:. Inductiveload—talk/contribs13:11, 21 June 2021 (UTC)
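For example, an Index:…/styles.css could override the default like this (values are illustrative; wst-quote is the class mentioned above, and the extra variant class is hypothetical):

```css
/* Override {{quote}}'s default 2em side margins for this work only */
.wst-quote {
  margin-left: 1em;
  margin-right: 0;
}

/* A second, custom quote variant within the same Index: */
.wst-quote.letter-quote {
  font-style: italic;
  margin-left: 3em;
}
```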
Latest comment: 3 years ago3 comments2 people in discussion
This is now (mostly) validated and transcluded. Could you create the other two images, please? Then the work can be fully proofread, and the other work can be removed to this one. TE(æ)A,ea. (talk) 16:25, 27 June 2021 (UTC)
@TE(æ)A,ea.: I'll try to get it done soon. It does take a little time to clean the images up since they're quite faint and coloured even more faintly, so you can't just smash them with background erase. Inductiveload—talk/contribs06:17, 28 June 2021 (UTC)
@Languageseeker: me too, but I'll sort something out. Poke me if I have forgotten something, and tweak the source list as you want when it exists. The month change-over point is still manual, so expect a small bump on the 1st of the month again. Inductiveload—talk/contribs06:19, 28 June 2021 (UTC)
OK, the basics are set up (main page, cat and data table). I haven't added much, feel free to chuck in a couple more (but I think not too many, since a couple of series will complete volumes soon) Inductiveload—talk/contribs08:33, 28 June 2021 (UTC)
Latest comment: 3 years ago2 comments2 people in discussion
This was flagged as misnested.
I've marked it as 'problematic' (and others in the same Index also flagged) because I can't see a clean solution to getting even an approximation of the layout without some considerable hassle.
@ShakespeareFan00: I don't really, no. This formatting is extremely hard to replicate in HTML/CSS and I have never thought of a good way to robustly implement it. All our templates like {{overfloat image}} and so on are thoroughly broken when it comes to mobile and exporting. Inductiveload—talk/contribs15:58, 4 July 2021 (UTC)
OCR visibility on changes feedback
Latest comment: 3 years ago3 comments2 people in discussion
Hello, would you be open to doing a user session to give feedback on OCR changes? It could entail a phone or video call with some questions on my end. Thanks for your proactive communications. Take care! NRodriguez (WMF) (talk) 21:51, 12 July 2021 (UTC)
Let me know if Thursday or Friday afternoon would work, or if Monday may be an option. Thank you! Feel free to write to me at nrodriguez@wikimedia dot org NRodriguez (WMF) (talk) 18:19, 14 July 2021 (UTC)
Web development kinda sucks…
Latest comment: 3 years ago8 comments2 people in discussion
And a little bonus weirdness: c:Special:Diff/576257724. Before the change MediaWiki:Gadget-Fill Index.js failed silently; afterwards it fills as normal. I didn't trace it to the root cause, but I'm guessing the regex ends up grabbing "Creator:Kate Douglas Wiggin}}{{Creator:Nora Archibald Smith" and then later bails thinking it's gotten garbage data. A non-greedy match might plaster over it short term. --Xover (talk) 22:25, 21 July 2021 (UTC)
Oh, I see. Or rather, I don't see, which amounts to the same thing. Yeah, that's pretty yucky; but unavoidable so long as we don't have a proper structured data store and a smart GUI for entering bibliographic metadata. Sigh, one day… I don't suppose you've run across any decent JS lib to access Wikidata yet?

Incidentally, throwing something (anything) into the console whenever you have code that bails out otherwise silently makes these issues much easier to track down. With all the minification and loaders and stuff (every script shows up as "load.php" in the debugger) I'm also getting inclined towards religiously including the script name and other identifying stuff in a comment at the top. Maybe even a convention to start dumping debug logs if there's "debug" in the URL? Because this particular issue aside (custom parsers are always going to be dense), most issues run into in the wild are pretty easily traced in the script logic itself, so most of the effort actually tends to be in peeling away the MW-specific stuff. Hmm. In fact, wouldn't it be neat if the URL param made MW set a wgDebug variable or something… Xover (talk) 08:10, 22 July 2021 (UTC)
Oh, and another thing you might want to do the next time you fiddle with Fill index.js: Commons uses c:Template:City to wrap locations for localization purposes. Not a lot of people use it so it doesn't show up much in the wild, but in theory it ought to be used. And when it is, Fill index ends up putting "{{City|New York City}}" in the Index. I'm the only one I've run across that uses it, so definitely not a high-priority issue. Xover (talk) 08:16, 22 July 2021 (UTC)
It actually does spew if it chokes on not finding a Book template. The issue is that the borked template parser (it only happened if you had }}{{ in the parameter) reported that the template ended after the Translator parameter. So it looked valid, just really empty.
A better suite of "gadget utils" code would be handy (e.g. modular, levelled debug and consistent "registration"). One for the Infinite List of Infinite Infinities.
I've cut out {{city}}. That said most people don't even move the city to the city field, since the IA dumps it in publisher. Cough.
Re WD, I can't even get them to agree on the schema for their bibliographic data! Best advice so far: do it without asking and if you get far enough it becomes the schema.
A BIG thanks for your help with the Bodleian Library scans!
Latest comment: 3 years ago2 comments2 people in discussion
A really BIG thanks for your help with the scans. The project is developing quickly, thanks to your quick, positive input! Llywelyn2000 (talk) 09:47, 23 July 2021 (UTC)
Latest comment: 3 years ago3 comments2 people in discussion
I've been approached by an editor who would like to work on old Welsh ballads (pre-1900). There are c. 1,700 "Welsh ballads" here on IA. You recently mentioned the tool, but I think that it's for a single djvu file, rather than batch transfer? If there is a batch transfer tool, please let me know, and if it automatically does OCR, then so much the better! Or maybe I wish too much! Thanks again! Llywelyn2000 (talk) 09:54, 23 July 2021 (UTC)
@Llywelyn2000: Indeed, IA-Upload is really for single files on a manual basis.
Latest comment: 3 years ago3 comments2 people in discussion
Hi Inductiveload. You may remember chatting to me about Old English on Wikisource c. 2 months ago. I've been proofreading this text, Index:King Alfred's Old English version of St. Augustine's Soliloquies - Hargrove - 1902.djvu, and have done the whole main text. I might need some help to continue - I certainly will for validation. Do you know any active users with an interest in Old and Middle English? You mentioned on the Scriptorium that I might be able to suggest this for Monthly Challenge. What do you think? Regards Rho9998 (talk) 22:03, 17 July 2021 (UTC)
@Rho9998: I'm afraid I don't know of anyone off the top of my head.
This work is particularly tricky due to the parallel Latin translation on each page, so good work getting it to the current state. It should be possible to transclude at least the Old English now. You can nominate at Wikisource:Community collaboration/Monthly Challenge/Nominations and see what others think. It's borderline for me due to the difficulty of Old English for most people, but since it's already mostly proofread, you may find interest. Inductiveload—talk/contribs21:02, 21 July 2021 (UTC)
@Inductiveload: I'm not sure that it would be appropriate for the biography month? Would it include autobiography? Augustine's Soliloquies is sometimes considered the predecessor of the Confessions, which is often considered the first autobiography (in Western tradition anyway). I might argue my case. In the meantime, by the way, I'm proofreading an OE text which uses acute accents on the vowels a and o. Would you be able to add these to the OME special character board SVP? Rho9998 (talk) 11:41, 26 July 2021 (UTC)
Hmm.
Latest comment: 3 years ago6 comments2 people in discussion
Also… Much of the actual flakiness of pagenumbers comes from those inserted elements, and hoisting entire blocks around (usually after .ready has already fired). If you come up with any brilliant ideas for how we could get the proper HTML structure in place before pagenumbers.js goes to work I'm all ears. I'm even seriously mulling over whether the community would go for needing to put {{foo start}} and {{foo end}} on every single page (well, or the end one; the start we could jam into {{header}}); or, equally iffy but tempting, getting MW to output what we need directly (can PRP manipulate the whole page when it's being invoked?). After all, we just need a couple of empty containers inside #mw-content-text at parse/render time, and then pagenumbers.js could mostly be reduced to a simple stylesheet-switcher. And if we could rely on the containers being there, I think we could even split the page numbers stuff from the dynamic layouts stuff. Xover (talk) 16:50, 30 July 2021 (UTC)
@Xover: it’s possible (possible) that this could be done by the Wikisource extension (not PRP, as we really want control of all pages even if they don't use <pages/>) which could construct all that crap server-side. It is indeed a longer term goal of mine to move the layout stuff into that extension for wider reuse and better "performance" (in particular not having the flicker as the JS comes online). But first, I’m trying to generally sort all the junk out so that I can even visualise what is needed for that to happen. Inductiveload—talk/contribs17:01, 30 July 2021 (UTC)
We'll get some flicker no matter what for users that have something other than the default layout set. But if we just apply a different stylesheet and don't force the browser to modify the DOM we'll get the benefit of all the browser's built-in optimizations for this kind of thing. Hmm. Actually… this gives me an idea. Maybe we don't actually need all those #fooContainers? The page numbers are absolutely positioned anyway so maybe we could just stuff them over in the gutters. That'd (and moving the CSS out of JS) make this a whole lot cleaner. Maybe. Xover (talk) 18:25, 30 July 2021 (UTC)
@Xover:CSS out of JS: FYI as of this morning, the CSS has indeed been moved out of the JS for exactly that reason. Now, the gadget just sets a class dynlayout-{id} on a top element and leaves the rest to the browser.
And, in theory, if the Wikisource extension is administering the layouts rather than gadgets, it's possible for layouts (and the CSS) to be served in-place according to a user option. Not sure it's worth the effort, but it's not unthinkable. Inductiveload—talk/contribs18:32, 30 July 2021 (UTC)
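In that scheme, a layout stylesheet reduces to rules keyed off the single class the gadget sets. A sketch (the layout id and container selector are illustrative):

```css
/* Everything hangs off the one class the gadget toggles on a top
   element, so switching layouts is a single class swap, and the
   browser's own style recalculation does the rest (no DOM surgery). */
.dynlayout-layout2 .prp-pages-output {
  max-width: 36em;
  margin: 0 auto;
}
.dynlayout-layout1 .prp-pages-output {
  max-width: none;
}
```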
Porcine lipstick
Latest comment: 3 years ago4 comments2 people in discussion
That's a clear example of where TemplateStyles provides a major win.
If I had to do that table, I’d use {{TOC begin}} and co, since they delegate nearly all of their formatting to TemplateStyles via classes on each row of the table. Alternatively, an index- or page-local CSS can be set up and that "raw" table targeted with a class like ._mayan_toc.
Repeated use of either {{ts}} or style= is a code smell now we have TemplateStyles. {{ts}} in particular has gotten way out of hand since it was written way back when. Inductiveload—talk/contribs17:07, 30 July 2021 (UTC)
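A sketch of the second approach above: index- or page-local CSS targeting the "raw" table through a single class, instead of repeating {{ts}}/style= on every row. The rules are illustrative; ._mayan_toc is the example class from the discussion.

```css
/* One class on the table replaces per-row inline styles */
._mayan_toc {
  margin: 0 auto;
}
._mayan_toc td:first-child {
  padding-right: 1em;
}
._mayan_toc td:last-child {
  text-align: right; /* e.g. a page-number column */
}
```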
Oh, yes, it’s definitely lipstick on the pig. This is just a stop-gap until we come up with a plausible alternative and to make existing pathological uses not blow up. Xover (talk) 18:05, 30 July 2021 (UTC)
Oh, and here are some numbers illustrating the point:
Variable                          Template        Lua             Limit
CPU time usage                    7.862 seconds   3.486 seconds   N/A
Real time usage                   7.898 seconds   3.549 seconds   N/A
Preprocessor visited node count   1,001,301       32,601          1,000,000
Post-expand include size          684,391         384,556         2,097,152 bytes
Template argument size            103,184         24,298          2,097,152 bytes
That's on one of the pathological pages that currently blow up (due to exceeding the node count limit), just by calling the sandbox version instead. --Xover (talk) 21:23, 30 July 2021 (UTC)
MC stats borked again
Latest comment: 3 years ago5 comments2 people in discussion
@Xover: Darn, seems it always finds a way to fall over on me, and it's always at one of the weekends when I'm not allowed to be up at 1am to watch it! The stats seem to have been updated at 00:00 by the cron job, not sure why it didn't propagate to the main page - perhaps a purge would have been sufficient? Maybe I'll add a pre-emptive purge to the update script.
All that said, the future PRP Lua stuff may (may!) make some of this tedious bottery obsolete. I hope!
The midnight page creation created an empty data structure (the actual data was added this afternoon), and the code consuming it assumed it'd contain at least one entry (it tried to dereference it). The check I put in is a brute force bail early, so it's possible you can get more nuanced results by moving the check later in the flow. e.g. I'm not sure what the "Current sprint" is supposed to contain, but there might be a more graceful failure than "No sprint found". Xover (talk) 11:54, 1 August 2021 (UTC)
The sprint is supposed to be a "sub-month" focus, but I'm pretty sure no one cares about it, we don't really have the traffic to be able to direct energy like that. Inductiveload—talk/contribs21:30, 1 August 2021 (UTC)
Gilding the lily
Latest comment: 3 years ago6 comments2 people in discussion
Not that it's really needed, but since I happened to be poking about in the MC stuff today…
@Xover: there's (kind of) a reason for that. The images are actually served at a fixed height (x300px), so they don't always have consistent widths. This is done for more consistency with neighbours in a row. It's not perfect, but I think it looks "OK". Adding a bit of padding hides the visual effect of some images hitting the edge and some not.
Uhm. They may be requested with fixed height, but they're rendered with fixed width and variable height. But, in any case, it was just a quick "it'd look prettier that way" that struck me when I was really looking at the MC for the first time. The MC stuff is really rather uncommonly slick for enWS to begin with, so further tweaks aren't exactly pressing. But if you ever go back tweaking it you might want to keep the option in mind. Xover (talk) 08:14, 2 August 2021 (UTC)
Latest comment: 3 years ago6 comments2 people in discussion
The progress bars for the 3-month works don’t show up, because the system doesn’t recognize works that old (I think). Anyway, they should be removed; could you comment them out from the module? I don’t want to comment out anything important on accident. TE(æ)A,ea. (talk) 19:33, 1 August 2021 (UTC)
@TE(æ)A,ea.: it's actually because they're not categorised as Category:Monthly Challenge (August 2021). I had to rush it off last night and this morning (I had thought it was the 30th!), so I didn't want to trash it without looking. I'll take a look at it now.
We're also a bit short on new works this month - feel free to chuck some more in quickly. 21:18, 1 August 2021 (UTC)
That makes sense, although it is a strange limitation. Perhaps some short works from here? Some of the listings up now look good, but it’s better to work on items with more general support. (Only the three listed with “(transcription project)” instead of “(external scan)” were chosen as works for that project.) TE(æ)A,ea. (talk) 21:23, 1 August 2021 (UTC)
It's just how the bot works: it uses the category members as a data source. Progress is being made towards being able to render progress bars without a bot, but for now, it is what it is.
I didn't have time to do a decent selection. I'll go back through the noms pages and your list, but many of them don't have indexes and I didn't have time to set them up. I kind of dropped the ball this month on being ready for change-over. Sorry!
Thanks for creating the Letters. However, the text layer is offset. Could you fix that, please? (They are set one behind: the text for the title page is found on the page facing the title page, and so on.) TE(æ)A,ea. (talk) 01:56, 2 August 2021 (UTC)
Latest comment: 3 years ago4 comments2 people in discussion
Ambox has WP-specific stuff along for the ride, so I’ve been deliberately migrating things to ombox. Xover (talk) 22:13, 3 August 2021 (UTC)
@Xover: Oh right. I just noticed that ambox gets excluded from the dynamic layouts and ombox does not (makes sense, since {{missing image}} is an ombox). Probably the docs for ombox need an update, since they currently have a prohibition on use in article space. And the notice templates either need an interposer template or the various classes applied (noexport, layout exempt and probably noprint). Or maybe just make ambox that interposer? Inductiveload—talk/contribs22:22, 3 August 2021 (UTC)
It's been a while and I'm insufficiently caffeinated so caveat brainfog… As I recall it's the same module but ambox is very specifically for "Wikipedia Article-space", and not for "ns:0". Mainly, IIRC, it's got some WP-specific maint. cats. There's some changes that I was hoping to get done upstream, but it ran aground on complete lack of interest from the enWP folks. And since it required grokking metatables and custom methods, sandboxing a patch for it ended up getting dumped on my todo list. In any case, all of the box types are basically interchangeable, except that ambox has enwp baggage and the multibox (whose name I can't recall ottomh) has a bit too much logic that we probably don't want most of the time (I can't recall if the logic was enWP-specific or not). Xover (talk) 06:11, 4 August 2021 (UTC)
Overeager layouts
Latest comment: 3 years ago3 comments2 people in discussion
I think we need to back this out, because now dynamic layouts are active on redirects, versions and translations, and disambiguation pages. I don't really see any workable alternative to having it trigger off the presence of PRP: we could have a suppress flag emitted by {{versions}} and friends, but that would require a whole infrastructure around tagging redirects and otherwise invite playing whack-a-mole with edge cases and exceptions. Xover (talk) 06:44, 6 August 2021 (UTC)
I actually had most of that ready locally in my experimental Vue-based layouts code, so I ported it back. Redirects are easy (they have a flag in mw.config). And the main "special non-content headers" can have classes to disable layouts. I was debating using .subNote, but that's a pretty non-obvious heuristic so I went with an explicit class on header after all.
There might be a few edge cases in content where the layouts look bad but:
That actually is a red flag that the page will be a hot mess on mobile (exhibit A) — if it looks bad in Layout 2 at 36em, it'll look even worse in a 320px mobile screen.
I think it's more important to allow layouts that enable more comfortable reading for the millions of words of non-scan-backed texts than to artificially limit layouts based on the presence of scans (which is almost entirely orthogonal to layouts).
It's pretty easy to either set a default layout or just let the user choose in the rare edge case. I can't actually think of a valid example that's not actually a symptom of non-responsive layout, though.
Also note that the layouts are now smart enough to not reserve space for page numbers where there aren't any (previously they would reserve 3em on each margin). Inductiveload—talk/contribs07:19, 6 August 2021 (UTC)
I am convinced. At least until non-hypothetical moles start sticking their head up on a regular basis. :) Xover (talk) 08:10, 6 August 2021 (UTC)
Float left....
1. for “prefix_l the section prefix for the right side” do you mean prefix_r? 2. Unfortunately, ASC has six side-by-side sections, spread out over two pages. TE(æ)A,ea. (talk) 19:42, 12 August 2021 (UTC)
Thanks for the compliment. I should probably create a one-work template to help transclude the sections, so it doesn’t take up so much space. (Something like <pages index="The Anglo-Saxon Chronicle according to the Several Original Authorities Vol 1 (Original Texts).djvu" include={{{1}}} onlysection={{{2}}} /> for the individual parts.) TE(æ)A,ea. (talk) 20:10, 12 August 2021 (UTC)
Well, it was more "got carried away" fixing: what actually broke that is now fixed was just an extra span in the markup that necessitated a selector tweak. But the waters are muddied by the backend on Toolforge sometimes returning 500 errors (which is not fixed) so by the time I figured out what was going on I'd already rewritten the lot. The backend proxies the requests to Phabricator's API because Phab doesn't support JSONP/CORS and requires a manually assigned per-user API token (so we can't talk to Phab directly from web browser JS), and the maintainer of the tool hasn't edited since 2019 (ex-WMF employee that stopped editing when they switched jobs or something). I may try to set up a replacement eventually, if the bitrot gets sufficiently bad, but not right now. Xover (talk) 08:54, 13 August 2021 (UTC)
officious links
I made a request, you pasted a bunch of officious links but there is no person involved between here and there.
You pointed me to a deadend wall.
How was it that they started to do structured data on all the images and didn't think of what the purpose was or asking the wikis that use them? Oh, probably they made a bunch of officious crap links that they could point real people at.
Indexed images do not compare with rgb. You either knew that and wanted to use jargon to "win" some war you are having or you didn't know that and should not be arguing for this pixel image or another. I am not at war with you; we work together.
Now, this is a serious question: Are you the same person who uploaded those beautiful SVG long ago? If you are not, sure, that is okay. If you are pretending to be that person, probably that is "okay" in what is allowed and not allowed, but not "okay" in the moral sense of right and wrong.
But I am done here. If you know of a person who can author an uploader for the commons, do point that person in my direction. Better to say NO! than to aim a person at a wall of officious crap.
Wiki is an FOD. If in all of these years, you do not know that, I am sorry for that.
@RaboKarbakian: You didn't actually make any request, you were just talking obliquely about generalities. And I didn't paste any links to anything resembling guidelines or rules, so I literally do not know what you are talking about with "officious links".
I am the same person I was 10 years ago. I am not impersonating myself; if you suspect me of usurping someone else's account, please always feel free to consult a Checkuser to allay any concerns.
I think there is not much I can do to explain more about images since you clearly have some deeply-held misconceptions about image data and how the common formats work. Inductiveload—talk/contribs12:33, 19 August 2021 (UTC)
histogram request
Let's look at the histogram for this image: if you want to treat illustrations as photographs, then really do that.
Good example. As you see, there is quite a lot of information content in this image that is not black or white: phab:F34605970. Obviously, the histogram is bimodal due to the large amount of pure white and black, but there are actually 285 unique colours in this image (or at least in the 320px thumbnail of it), so it cannot be losslessly encoded as a GIF. The "non-black, non-white" content (i.e. the histogram from values 1 to 254) is either the red digit or the antialiasing around the digits (which is why they look nice and smooth). Inductiveload—talk/contribs13:39, 19 August 2021 (UTC)
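The colour-counting test described above is easy to check mechanically. A minimal, pure-Python sketch (the pixel data is synthetic, standing in for the scanned digit) of "does this image fit losslessly in a GIF's 256-slot palette?":

```python
def fits_in_gif_palette(pixels, max_slots=256):
    """pixels: iterable of (R, G, B) tuples.
    Returns (unique_colour_count, losslessly_gif_encodable)."""
    unique = set(pixels)
    return len(unique), len(unique) <= max_slots

# Synthetic stand-in for the scanned digit: a bimodal bulk of pure black
# and white, a red digit, and a ramp of antialiasing greys.
pixels = (
    [(0, 0, 0)] * 500 + [(255, 255, 255)] * 500   # bimodal histogram bulk
    + [(200, 30, 30)] * 20                        # the red digit
    + [(g, g, g) for g in range(1, 255)]          # antialiasing greys
)
n, ok = fits_in_gif_palette(pixels)
print(n, ok)  # 257 False: one colour over the 256-slot budget
```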
NOTHING can be losslessly encoded as gif! That is just jargony crap! The choice of format should be based on the purpose of the image. What job is the image going to perform. Where does it display and why. And having an ebook maker that turns hogs into light weight niceties for small devices is really a good thing because image format is not a politic, even though, apparently it is being used that way. PNG, SVG, JPG are not team sports. They are image formats. FGS!--RaboKarbakian (talk) 13:49, 19 August 2021 (UTC)
A bitonal image certainly can be losslessly encoded as a GIF (inefficiently: CCITT in a TIFF would be an even better choice, depending on purpose), as can any image that uses fewer than 256 colors, plus, optionally, 1-bit transparency.
In this case, PNG is the format of choice for rasterising an SVG, since the ability to have 8-bit transparency makes it look much better. This is one reason why all SVGs rasterised by MediaWiki are PNGs, not GIFs (the other reason is that a 256-slot index is far too small to look good for most images that aren't greyscale).
⇈ The request ⇈
Background, or what caused the request/need
I had a bot tagging my uploads with structured data. "Inception date" is so meaningless for a publication, but I was annoyed and mostly just justifying my annoyance until I uploaded a photograph that was reprinted in a book. The photo was from the late 1800s, the book from the early 1900s, and I thought "THERE! Publication date and Inception date!" An example that made it justified annoyance.
And really, if the structured data on the images is ever going to be used, publication date really does have meaning for the images whose display destination is here.
The outcome of the negotiations regarding the date in the structured data was that I use the book template. The bot would not put an inception date on the images that were using the book template. At commons, I then installed the AC/DC gadget, which will add structured data to all of the images in a category. And voilà! Beautiful categories, informative structured data, etc.
Benefits
The book template has two nice things for me. First, I put the scan on wikidata, so simply by putting the Wikidata Q-number into |Wikidata= I get all of the other fields filled in, with the exception of description, permission and image page. It provides an image of the scan and a link to the Index, etc. The second thing is the Image page, which will open the scan to the page that the image was found on. See File:Complete Course in Dressmaking-008-a.gif
Of course, I still get screwed if I forget and put the wikidata number on the book template before getting the Index page filled out here....
Other things the uploader could do
It would also help if the uploader could put the images into a sensible category. I think that people are too shy to make sensible categories, or too busy with their book, or too unknowing of the ways of commons, or too filled with recent consensus, but a category structure that matches the main space structure here is simple and sensible, with some additional (crap) identification in the upper cat due to the fact that commons might have much more than en.wikisource does.
So, summary (leaving out "Why I don't want to communicate with you", which was also left out of the communiqué):
first summary of this request
An uploader for sourcerers that uses {{Book}} and not {{Information}} at commons
Structured data befitting source images
Sensible categories
more words summary of this request
Can you write an uploader for sourcerers that uses {{Book}} and not {{Information}} at commons
Can it add Structured data befitting source images
Can it add Sensible categories
The old admins there, by old I mean active there for years, seem to like my cats and subcats (useful for books containing subject matter that could go elsewhere, like the fairy tales and the short story books and the mags). So, I recommend that, because I haven't had problems there with it and it does have some sense.
<rant>Also, I read your tutorial for images here. Do you hate wikisource? Do you hate wikisourcers? Have you ever tried the Decompose plug-in? Do you think that they used grey ink when they published? And, sorry, usually I am nicer, but right now, these are my real thoughts, so really sorry. Also, where is the SVG tutorial, which you should probably be really good at writing? </rant>--RaboKarbakian (talk) 14:42, 18 August 2021 (UTC)
@RaboKarbakian: Well there's a bit to unpack here. Let me start by saying that I really am not sure what you are trying to ask here, or even if this is a question at all, or just another rant about...something. But Why I don't want to communicate with you and Do you hate wikisource? Do you hate wikisourcers? leads me to wonder if a constructive dialogue would emerge even if I could understand the above, but let's see.
I have made a tool for "semi-batch" uploading of images for Wikisource: https://ws-image-uploader.toolforge.org/ It currently uses {{information}} rather than {{book}}. It could really do with more WD integration, which will eventually come along.
If you are talking about SDC, then I am not the person to ask, as I have no idea what is going on there, and no one cared at VPT or on the SDC Modelling talk page when I asked about what WSIU should set. When they decide what SDC is for and how to use it, then maybe I'll bother. WSIU has a category field, and one day it may be able to grab it from the WD "commons category" field for an edition.
If you have a better way to do the images, then feel free to write it up yourself, but I maintain crashing the black point is not very respectful of the images, even if they do then perceptually "pop" more. The books were of course printed in "black" (obviously not perfectly black itself) but there are still variations in the printing - even black and white printing has shades of grey due to the ink, paper and plate texture. Compare the left and right: you have deleted all the small variations in the body and head of the woman:
"Crashed" black point
Gif: palette 256-slot colormap + transparent (max 9 bit/pixel). Colormap slots 216-255 empty, which reduces it further.
As you can see from the histogram, in the right image there is "information" throughout the spectrum from white to black. Now, you can argue that a lot of that variation is actually JPG/JP2 noise in the image, and you'd be right. However, I think there's still some amount of original information there, and I have tried to preserve it within reason, instead of going for a quasi-bitonal output like yours. There's certainly a spectrum of choices to make here, progressing from "minimum adjustment", which preserves most of the information at the expense of contrast and inclusion of compression noise, right up to (and past) your version, where there is only black and white (and pixels are reduced from their original 8 bits to 1 bit of information content). There is no right answer, but my personal preference is for the least destructive method, at least when I do not bother to make a "master" and a "display-optimised" variant.
BTW, I do not think the animated version belongs at Wikisource, unless (maybe) as an annotated version.
Also, where is the SVG tutorial which you should probably be really good for writing? Probably somewhere at Commons, until WS starts having any works that have SVG images in them. Inductiveload—talk/contribs16:14, 18 August 2021 (UTC)
The animation is fun. I was careful to avoid rude, as fun has more staying power to it. Some of the images I have worked on make great coloring book images also. And I looked at wikibooks, so, please don't go there with your suggestion. A software solution would be nice, to allow a choice of fun vs. actual (I did not type curmudgeonly). I saw a gif contest once, and someone had made a gif of a line-drawing of a crab walking out of the page it was on, some scientific book. It was so beautiful and cool and it stuck in my mind. <div>s can be used for image changes? I can style, but in current company, I am only 'good' at it, and not masterful of it.
There is good reason to maintain "dusty and musty" works as "dusty and musty", but what you are calling clipping I am thinking it is more like ink-bleeding. Where in the printing process, sharp lines become smudges. So, that is our difference there.
SVG: a recent image that really should be SVG. If this publication had had more images, I might have taken it to the SVGers at commons. As it is, I don't have root access to this computer, which is fine, it is clearly not my OS and a warning for me not to share things with just anyone, and Inkscape is not installed here as the intended user just needed GIMP for a facebook portrait. If on my own computer, a tutorial very well might have been used for this image, which would very clearly be nicer in SVG.
Who is the person who is representing en.wikisource at the commons then? The inception date meeting was in 2019. The bot owner told me that pasting inception date on all of the images brought out a lot of different dates that would be more useful, publication date being one of them. So, who is the person here, in all officiousness possible, who is representing the needs of en.wikisource?
@Inductiveload: If my earlier request was unclear, please review this enumerated list. What I am typing now is actually the mishmash you claimed the above to be.
Also, if you think that I am ever using my tools or skills to in any way humiliate you, compromise you, or offend you, do let me know. I will do anything within my small realm to fix it or make it so it is somehow less offensive to you.
Banter between artists is okay by me. The idea of having a "layout" based solution for fun vs. curmudgeonly is something that is not in my skillset, but wow, wouldn't that be great....
@RaboKarbakian: Nothing in that list is a question, so please be explicit with what you are asking. If you would like me to invent such a tool or system, for the first I will point you to https://ws-image-uploader.toolforge.org/, and for the rest, I'll just defer to Commons' guidelines, if and when they figure them out.
No "one" represents enWS at Commons. The "community" represents the needs of Wikisource, usually via the Scriptorium. If you are after a Wikisource "data union" to bargain collectively for something to be implemented at Commons or Wikidata, you should ask there. Re "inception date meeting was in 2019", "the bot owner", and "old admin there": obviously I am out of a loop or three, because I have no idea who or what you refer to.
You have misunderstood what I mean by "clipping": clipping is where you adjust the levels so aggressively that pixels that used to have values between 1 and 254 are instead "clipped" to either 0 (black) or 255 (white). Generally speaking, this represents a loss of information and is often undesirable. Some clipping is intentional (for example, I will often "clip" out the brightest 5% or so of the histogram, because that's usually JPG noise around edges on a white background and it's a quick and easy way to remove it), but I do try to avoid clipping that substantially alters the image content.
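A sketch of what that levels operation does, for the curious. This is generic image-processing arithmetic rather than anyone's actual workflow, operating on a flat list of 8-bit grey values:

```python
def clip_levels(pixels, black_point, white_point):
    """Linearly remap [black_point, white_point] onto [0, 255].
    Values at or below black_point clip to 0; values at or above
    white_point clip to 255. The clipped detail is unrecoverable."""
    scale = 255.0 / (white_point - black_point)
    return [min(255, max(0, round((p - black_point) * scale)))
            for p in pixels]

pixels = [0, 10, 60, 128, 200, 245, 255]
# "Crashing" the black point to 60: the three darkest values all
# collapse to pure black, losing their distinctions.
print(clip_levels(pixels, 60, 255))  # [0, 0, 0, 89, 183, 242, 255]
```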
For such a simple image, if you wanted to vectorise it, there are many, many tutorials out there, e.g. https://inkscape.org/doc/tutorials/tracing/tutorial-tracing.html and probably hundreds of YouTube videos. tl;dr trace the bitmap then tidy up with the node editor tool.
The animation might be "fun", but IMO it's not a faithful copy of the original (or even the original intent), it's entirely synthetic.
It's very fair to compare the histograms, because it gives a good representation of the information loss you have paid for your smaller file size (via quantisation into an indexed GIF). Yes, the GIF will look pretty rough in the histogram, and there's a good reason for that: the format has introduced compression artifacts.
I have undone your incorrect edits to my comment above. The fact that the GIF format is incapable of representing more than 256 colors + transparent is not a defense of it. In fact, it is the reason the format is usually not suitable: it can represent at most 257 values, the 256 colormap entries plus "transparent". Compared to an 8-bit GREYA PNG, it throws away 65279 values out of a gamut of 8 bits grey × 8 bits transparency = 65536 values. It's even less suitable compared to an 8-bit RGBA PNG, where the PNG has up to 32 bits per pixel and can therefore store roughly 4.3 billion unique values (at a cost of a filesize roughly 4 times larger, ignoring differences in compression algorithms: 32 bits/pixel vs 9 bits/pixel). This is why you rarely see GIFs these days: the image quality is so bad that they are only used when the animation capability is more important. Inductiveload—talk/contribs13:23, 19 August 2021 (UTC)
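The per-pixel value counts above are just powers of two:

```python
gif_values = 256 + 1            # 256 colormap slots + "transparent"
greya_values = 2 ** 8 * 2 ** 8  # 8-bit grey x 8-bit alpha PNG

print(gif_values)                 # 257
print(greya_values - gif_values)  # 65279 values a GIF cannot represent
```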
Reply to updated request
Can you write an uploader for sourcerers that uses the {{Book and not {{Information at commons
It uses {{information}}, which is annoyingly general as a template, but it seems to me that {{book}} is also not necessarily the "correct" template, since it still doesn't accurately capture the concept of "image from page X of book", and many of the metadata items copied from the book are not necessarily correct. For example, the illustrator is the author of many images; some anonymous artist and/or engraver did the title page logo and fleurons, and so they should be the "author" of those images, even if they are not the "author" of the book.
"Ideally" such a template might be able to refer to the "containing" work via Wikidata, then add concepts such as "page number", illustrator of this image, "what this image is" (e.g. a drop cap) as well as things like captions.
Can it add Structured data befitting source images
Sure, if I can ever figure out what SDC is actually correct to add. The ball's firmly in Commons' court on that one. I've asked, twice, and received no reply. As you say, it does rather appear that SDC is a solution that no one has actually tried to apply to a problem yet.
I am also unclear if SDC can or should replace the information template. For example, if the caption was in SDC, does it need to be in the info template too? And the page number?
Can it add Sensible categories
Depending on what you mean by sensible, it already attempts to add sane categories based on what the image is (e.g. it will categorise a drop cap according to the letter).
I smoothed some of it. It had a major formatting problem, but reading the whole thing was not painful, more of a clash with my other experiences. I had to quit, so my SR was not very much.
It belongs here. One of the reasons I did not want to dig it out is because it has a bazillion images: line drawings mostly, if not entirely, of leaves and stems. Very likely better as SVG than re-doing from the gutenberg project. The gutenberg images could be downloaded, but they are going to be small and disappointing; as placeholders for an SVG project with many contributors, though, that is so much easier to think about.
I would like to hand it to you and then contribute. I don't know how to get it here though and would like to know before I go through the effort to dig it out.
One day I will sort out a streaming HTML parser to deal with the final HTML results from PG which should be even better than the PGDP text. But I'm not there yet.
For the images, your best bet is probably the IA JPGs (or JP2s, even better), because the PG images are small and all have a light grey background for some reason. The images at the IA are in fairly good condition with a fairly even and light paper colour, though there is some bleed-through from the other side of the page.
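As a sketch of the streaming idea mentioned above (not the actual planned implementation): Python's stdlib html.parser walks the document event by event without building a full DOM, which is the general shape such a PG-HTML converter would take. The PG-specific handling (page anchors, poetry blocks, footnotes) is of course the hard part and is omitted here.

```python
from html.parser import HTMLParser

class PGTextExtractor(HTMLParser):
    """Minimal streaming sketch: collect paragraph text from a
    PG-style HTML file without building a DOM. Real PG HTML needs
    far more handling (page anchors, poetry divs, footnotes, ...)."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self.in_p, self._buf = True, []

    def handle_endtag(self, tag):
        if tag == 'p' and self.in_p:
            self.in_p = False
            # Normalise internal whitespace (line wraps in the source)
            self.paragraphs.append(' '.join(''.join(self._buf).split()))

    def handle_data(self, data):
        if self.in_p:
            self._buf.append(data)

parser = PGTextExtractor()
parser.feed('<p>It was the best\nof times,</p><p>it was…</p>')
print(parser.paragraphs)  # ['It was the best of times,', 'it was…']
```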
I have the F2. I mentioned this before, months ago. You were uploading F2 here. If you pissed them off and don't want to do this anymore, then okay. But to repeat, I have the F2. So the only real answer (no advice, no suggestions) I need is either that you will put it here or that you won't put it here. Simple enough for even a bot to answer.--RaboKarbakian (talk) 17:08, 20 August 2021 (UTC)
@TE(æ)A,ea.: Well, they do say there are only two hard problems in computing: naming, cache invalidation, and off-by-one errors. There is definitely a tendency for something in Lua to be sensitive to stale caches. It's not something I've been able to definitively nail down, but certainly updating a module can sometimes necessitate a purge to get template docs refreshed. If I ever get as far as a reproducible example, I'll file a bug on Scribunto. Inductiveload—talk/contribs18:53, 17 August 2021 (UTC)
With caches, it’s nearly impossible to have consistent examples. I realized that this also happened when they reintroduced the Score extension: some files didn’t work, or caused problems (especially on transclusion), until the cache was refreshed. (Although, on that topic, there are still some old scores that don’t work with the new extension, like this one; but Score has always hated \header.) TE(æ)A,ea. (talk) 19:05, 17 August 2021 (UTC)
@TE(æ)A,ea.: Thanks, confirmed and fixed. It's a funny one because editing the page and saving will fill in the Source field, but somehow at least some of the NIOSH pages are missing it. I have added a default. I think the other fields will be less finicky about being passed nothing.
Please revert the change you made to Polytonic template
Can you please revert the change you made to the Polytonic template. The whole point of that template is to render accented Ancient Greek text properly. The change you have made has stopped Ancient Greek text appearing as it should. Was there any consensus about this change? There was a whole discussion on the Talk page of Polytonic ensuring that the right fonts were included (is this now lost?)
For a11y reasons, if not for my blood pressure. It's only active in read mode (vs. edit mode) so having to hit the back button is no big deal even in browsers that do not restore form controls. Xover (talk) 14:02, 27 August 2021 (UTC)
@Xover: Done, sure, if you like. FYI, layouts are available from "submit" mode (i.e. when previewing). Hopefully your browser will save your work in that case (unless you close the help page rather than going back, then it's goneski, at least in Firefox).
Yeah, I've noticed the changes flowing by on my watchlist, but haven't had time to look closely yet. I'll give a holler if I can provide any useful input, but meanwhile I'm just ecstatic at any progress there (my own ambitions in that regard have been thwarted by a combination of IRL and an inability to come up with a more perfect way to handle the three-into-one step and other stuff that causes a rerendering). The _blank thing is just a personal hobby-horse.
I was particularly happy to see the end of the ws_msg hack, which IMO should be killed with fire anywhere it still exists on the site. Xover (talk) 14:37, 27 August 2021 (UTC)
Greek vs Polytonic
The Greek template was set up for Modern Greek (el) and Polytonic for Ancient Greek (grc). Has this distinction been removed? --EncycloPetey (talk) 17:02, 28 August 2021 (UTC)
That is odd, as I always understood the distinction had been made for the two. But I can see, looking at the early history, that it was not the original intent. The history on Wiktionary is so convoluted with moves and changes, I can't tell whether that distinction existed there. There may be some pages here where incorrect advice is given. --EncycloPetey (talk) 22:55, 28 August 2021 (UTC)
Common Sense (Monthly Challenge version) proofread!
The proofreading of this just finished. I noticed you had section markers, so I would like you to transclude along those lines. (I’m also a little busy now, so I can’t work too much on heavier-work stuff like transclusions.) TE(æ)A,ea. (talk) 02:03, 20 August 2021 (UTC)
Could you add kos = "Kosraean", nb = "Norwegian Bokmål", sa = "Sanskrit", please? These are the major two missing (plus a new one for an index I am creating now). TE(æ)A,ea. (talk) 01:35, 30 August 2021 (UTC)
For some reason a (block)right-aligned stanza is getting a computed margin-left:0 and is showing up centered on this page. I'm not sure I can recall this particular thing ever working, so it may have always been broken. Possibly the outer poem block is rendering the inner stanza block alignment moot? Maybe this is a case where the technical guts should rather be exposed to the user in /doc as "Don't do that, align the whole poem instead"?
I'll dig further at some point (head's not in the right space for it just now), and it's a problem that'll keep just fine (no hurry), but figured I'd drop a note for tracking / aid to recall. Xover (talk) 10:08, 29 August 2021 (UTC)
Incidentally, having finally made the effort to put {{sbs}} and friends out of use, I was curious, so I checked the transclusion counts for {{ppoem}}. It's now sitting at just over 1k transclusions in Page: and 400 in mainspace. All the ones I've done have been intuitive, done what I expected, and with very few weird interactions with other necessary templates. I've dropped notes here for almost all issues encountered, and I think most of those are either unfixable or should not be fixed. In other words, I think for the stage it's in it is essentially stable. It probably needs more baking to look for weird edge cases before going mainstream without caveats, but we can probably tone down the big dire warnings in the docs.

The biggest outstanding hurdle before full production, as I see it, is making a call on whether going full Extension is worth it at some point. If that's a possibility, it should probably happen before flogging it to the variety of contributors (conversion may be slightly painful, and unlearning the habits will be tough for some users); but contrariwise, if we're sure we won't be going that route, I think the current version is in excellent shape.

And my experience using it so far suggests it'll be a great boon both to users and technically. For actual poetry, both new uses and replacements for {{sbs}} have been smooth sailing. Xover (talk) 07:30, 2 September 2021 (UTC)
@Xover: Well, I'm glad you're enjoying it. I think pretty much all the worst pinch-points are handled now.
In terms of making it into an extension, I really don't know. The deathly slowness of submitting code to Gerrit in general makes me really not want to get involved if we can avoid it, plus it would take weeks to deploy fixes. The thing about the template as it stands is that it is naturally machine-readable (or the module couldn't parse it). This means that there must always be a direct mapping to any equivalent implementation later. So if we do move to an extension one day, it "should be" a straight script to move things over.
The last thing I think we have an issue with is that the drop initials don't export well to KOReader when there are hanging indents (which means they will probably go sideways on other e-readers too). Inductiveload—talk/contribs16:48, 2 September 2021 (UTC)
@Xover: You could try something along these lines:
Who seeks for heaven alone to save his soul,
May keep the path, but will not reach the goal;
While he who walks in love may wander far,
Yet God will bring him where the blessed are.
@Londonjackbooks: Thanks for the tip! However, in this particular instance I was just noting an issue with an experimental template that Inductiveload has been working on. It's very good at poem formatting, and I'm hoping it'll end up making life much easier for everyone once it's done, so while I'm trying it out I'm giving them feedback on issues encountered (even things that maybe shouldn't work, but that my head thought made sense at the time). Xover (talk) 14:34, 29 August 2021 (UTC)
Hello. I have noticed that you added information about a "cover" parameter to the documentation page {{Header/doc}} about 5 months ago. However, I failed to find that parameter in the {{Header}} itself. May I ask about your intentions with this? Just curious. Thanks! --Jan Kameníček (talk) 10:19, 5 September 2021 (UTC)
I am asking because I was experimenting with the new parameter to see what it does, and did not see anything. Following the advice in the documentation page I added cover=Czechoslovak stories.pdf/7 to the header template of Czechoslovak Stories just to see what happens, saved the page, but did not see any change. --Jan Kameníček (talk) 19:08, 5 September 2021 (UTC)
@Jan.Kamenicek: There is no visible output on a normal page, the image is only used for setting the cover of exports like EPUB (and probably MOBI), so when you view it in a "shelf" view on an e-reader it shows the cover rather than just a textual one.
If you look at the HTML source, you'll see something like
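a hidden microformat element along these lines (the exact markup below is illustrative and may differ in detail; the ws-cover id is the hook the WS Export tool looks for when building the EPUB):

```html
<span id="ws-cover" style="display: none;">Czechoslovak stories.pdf/7</span>
```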
I was going to try going off and figure this out myself, but I think I've convinced myself there's no hope. Please read and see if you agree...
I had noted your template/module {{betacode}} and got around to trying it out, e.g. {{betacode|ai/nigma}}. But since I'm chiefly interested in permanent insertion, I thought I'd try using {{subst: ... }} to generate and insert the converted Greek chars one time. And... boom!
{{betacode|ai/nigma}}
αίνιγμα
{{subst:betacode|ai/nigma}}
If I do save the page anyway, I see "{{#invoke:betacode|decode}}" has been generated and inserted into the page. But then displaying _that_ page will blow up.
If I then change that to "{{#invoke:betacode|decode|ai/nigma}}" I get good text displayed: αίνιγμα. But there are still problems.
The first 4 variations all produce good display output, and the fifth blows up.
The third variation does substitute generated output into a saved page. However, the output is the source contents of Template:Greek with parameters expanded inline, 1=αίνιγμα and no param 2. (View this same talk page and find "wst-lang")
So if there's a template->module->template
we can't use subst:template at all, and
subst:#invoke:module substitutes the expanded template source and not just the desired output for display?
(minor: hmm, if there's a redirect from Template:Polytonic to Template:Greek, perhaps 'Greek' should be used in the module's .expandTemplate{title = 'polytonic',} call?)
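For what it's worth, the standard MediaWiki idiom for making a module-backed template substitutable is the safesubst: trick; I believe (untested against this particular module) the template source would look something like:

```wikitext
<!-- {{{|safesubst:}}} is an unnamed parameter that can never be set, so
     it expands to its default, "safesubst:", which behaves like a normal
     transclusion when the page is merely viewed, but substitutes the
     module's output when the template itself is substituted: -->
{{{{{|safesubst:}}}#invoke:betacode|decode|{{{1|}}}}}
```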
@Shenme: honestly, I'm not 100% sure how or if subst can work with a module in the mix. I also do not really understand how it works.
However, my conclusion from {{betacode}} was that a template and/or module is a pretty rubbish implementation in this case, and what I am now actually planning (and have started poking half-heartedly at) is a "real" IME using the ULS input framework. So then you could access a Betacode Greek IME from the keyboard symbol next to the editor field (F34633878)
Ah, um, that's my interest in Betacode, as IME, figuring subst: was taking advantage of your work until I got around to finishing my attempt.
I was going to simply (presumably) do it as a JS widget in user space. I've got the conversion working standalone with no UI. I wanted to output the multi-accented chars as either composed or non-composed, according to inline configuration. A couple of other usability variations (strict Betacode vs. loose vs. looser) are also planned.
Your mentioning ULS is scary sounding. Mebbe we try advancing in parallel, my lash-up homebrew vs. your internals oriented? ;-) Where is the ULS IME hook documentation? Shenme (talk) 07:06, 6 September 2021 (UTC)
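For the curious, the core of such a standalone converter is just a letter table plus combining diacritics, with the composed/non-composed choice handled by Unicode normalization. A minimal Python sketch of the idea, not the actual {{betacode}} module or Shenme's widget (the mapping is deliberately partial: no capitals, breathings, iota subscript, or final sigma, and the function name is made up):

```python
import unicodedata

# Lowercase Betacode letters only; the real scheme covers much more.
BETA_LETTERS = {
    'a': 'α', 'b': 'β', 'g': 'γ', 'd': 'δ', 'e': 'ε', 'z': 'ζ',
    'h': 'η', 'q': 'θ', 'i': 'ι', 'k': 'κ', 'l': 'λ', 'm': 'μ',
    'n': 'ν', 'c': 'ξ', 'o': 'ο', 'p': 'π', 'r': 'ρ', 's': 'σ',
    't': 'τ', 'u': 'υ', 'f': 'φ', 'x': 'χ', 'y': 'ψ', 'w': 'ω',
}
# Betacode accent marks -> Unicode combining characters.
BETA_DIACRITICS = {'/': '\u0301', '\\': '\u0300', '=': '\u0342'}

def betacode_to_greek(text, composed=True):
    """Convert a Betacode string to Greek. `composed=True` yields NFC
    (precomposed accented letters); False yields NFD (base letter plus
    combining marks)."""
    out = []
    for ch in text:
        if ch in BETA_LETTERS:
            out.append(BETA_LETTERS[ch])
        elif ch in BETA_DIACRITICS:
            # In Betacode the accent follows its vowel, which matches
            # how combining characters attach, so just append it.
            out.append(BETA_DIACRITICS[ch])
        else:
            out.append(ch)
    form = 'NFC' if composed else 'NFD'
    return unicodedata.normalize(form, ''.join(out))
```

So `betacode_to_greek('ai/nigma')` gives the precomposed αίνιγμα, while `composed=False` keeps ι and the acute as two code points, which is the inline-configurable behaviour described above.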
(BTW: why no edit links on section titles past a certain leg-wagging section?)
@Shenme: sure, you're welcome to give it a go in wiki-side JS. I just would hate for you to feel put out if a built-in IME came along later.
Latest comment: 3 years ago · 7 comments · 3 people in discussion
I ran across a loose article from The Texas Medical Journal that needed scan-backing and situating in context, and found that the journal was blessedly brief and published over a reasonable timespan, for which there were actually good scans available (i.e. it is actually possible to get it completely proofread if it is to end up as an example to point folks at). So I got inspired to set up Portal:The Texas Medical Journal as a sort of way of thinking out loud and experiment about how to deal with periodicals and what our guidance should really be in the area.
Feel free to opine or tweak. My own thoughts are pretty unformed and hand-wavy.
Got any thoughts on how we set up the (sub)page structure, and content of each level, for this specific case? Is the portal pointless, and should it be made into a WikiProject-type page off in the workbench namespaces somewhere?
And, considering we already have one lonely article from this periodical, how would we deal with it now while almost no other content is extant? I proofread one entire (chapter-sized) issue for demonstration purposes, so we have that much that could be transcluded. But it's not much more than that one article we already had (Ligation of the Dorsal Vein of the Penis as a Cure for Atonic Impotence). I would not have been necessarily opposed to a proposal to migrate the extant text to the Index: and delete the mainspace page had one been made. But let's say it was a really seminal article by a famous author, widely cited and widely linked, that we really wanted to keep. How much of the structure do we think this particular case would need to justify having it in mainspace?
I'm currently thinking I'll create the page for the journal with the content from the portal minus the scan links, and a per-volume page for Vol. 18 that transcludes its volume index and has an AuxTOC listing the issues, and a page where I transclude the contents of Issue 8. But I'm ambivalent enough I may change my mind several times before pulling the trigger.
Oh, also, no hurry. I was fixing up that lone article and saw it as a good case to explore the issue, but I haven't been giving it a lot of thought since last it came up and my head is really elsewhere currently so this'll be a "keeping it warm" type thing. Xover (talk) 16:00, 6 September 2021 (UTC)
@Xover: Ha, well, this all sounds very, very familiar! This is essentially the same question that is still open at Wikisource:Scriptorium#Policy_on_substantially_empty_works, which attracted a lot of words and very little in the way of actionable consensus.
Personally, I disagree with shunting away into Portal: or WikiProject space, because that's a major barrier to entry for setting up any article. We should make it possible for people to "slot in" articles as they please and see them presented and findable in mainspace immediately. Or we might as well make periodicals functionally out of scope, because I do not think there is a practical hope of any given periodical ever being complete, other than freak occurrences like the PSM. Living in the real world as we do, it's substantially more likely that someone will create an article or two than plough through an entire volume or even issue. Especially as we get more periodicals that aren't just text and images in simple layouts. If you told me as a new user I had to proofread an entire issue to get one single article transcluded, I'd get 2 pages into the rest of the volume, lose heart and leave. And I also do not think a "Cantor's dust" structure of an arbitrary number of articles floating around in mainspace untethered to parent pages is a good idea.
I think we should just put them into mainspace right from the off, even if it's literally just a list of volumes and IA/Hathi links (which in itself represents a substantial amount of editor effort, since locating and collating series is quite involved) plus a Wikidata item to catch the authority control. Any other method will immediately result in duplicated content once the list is copied to mainspace (the point at which this is supposed to happen is unclear to me, somewhere between "one article" and "every article ever"), and the end result will be that one of the pages, likely the Portal, rots. We do not have the editor interest to maintain 2 separate lists of volumes per periodical and keep them in sync. A Portal is a supplement to a work, in my opinion, not a substitute (i.e. it should come after).
The Portal can be very valuable as a thematic index, list of contributors, referencing works, historical contextual works, etc, and, in fact, that's somewhere WS can add mind-blowing amounts of value-add (especially with decent WD support). But it should not, IMO, be the only, or even main, entry to the work. And there should certainly not be a redlink to the work in mainspace, as the number of people who know they should also check the Portal namespace upon seeing a redlink is probably only in the thousands in the world. Furthermore, since WikiProjects are not even in the default search set, I'd say that if you were proposing to set up periodicals as WikiProjects until "some" future completion threshold, you might as well just leave the periodical at the Internet Archive for all the good the effort, typing and time would do you.
Populate it with as many volumes + scan/source links as you know of (ideally all, but could even just be the one)
Ideally, set up all the volumes as a batch import. This would be wildly improved by WD support, but WD do not appear to actually care about using any of their data, so it's on us to find, implement and maintain a suitable schema. So I am not in the mood to do the work and perform batch imports myself at this time, knowing that I'll have to come back and re-do half of it at some point. One day. Maybe. Plus I CBA to do the pagelists.
Create The Texas Medical Journal/Volume 18 right now and place an AuxTOC, or the actual TOC if you can be bothered to proofread it (which you have done).
Depending on the work structure, create The Texas Medical Journal/Volume 18/Number 1 if you want to split by number (which I think is good, since it's "as published" and also sidesteps having to deal with "Publisher’s Notes" in every issue by some kind of suffix), but I see that that is not a universal opinion and I just don't care enough to die on that hill.
Create the article itself and link from the TOC, make a WD item, etc
I also did what you have done and set up a "minimal" example periodical as show-and-tell for this discussion: Journal of Classical and Sacred Philology (I didn't create the page, merely co-opted an empty periodical's page). No-one has ever responded to my question on the Scriptorium about what exactly should happen to that work.
Interestingly, I'm right in the middle of setting up Transactions and Proceedings of the New Zealand Institute. I've created the base page and am (slowly) bringing in the Indexes and doing their pagelists. I'll also do the TOC for each issue—so that they are consistent. At the same time I've created Portal:Royal Society of New Zealand to be the holder for the thematic lists. The reason for using the publisher as the Portal rather than the work is that the Transactions is not their only journal. Depending on the size of some themes and what it ends up looking like, may well need sub-portals. I'm seeing the Portal as an effort co-ordination point, but only as links from articles we already host. So, I'll be doing a smattering of articles from the five domains (Zoology, Botany, Geology, Chemistry, Miscellaneous) and some of the Proceedings. The idea being, "I wonder what else is here on that topic; let me click this link; Oh my, they need some help in my area of interest." I'll also populate the author pages with the redlinks out of doing the TOCs.
In terms of structure, I'm fortunate in that the Articles in each volume are numbered. So, Transactions and Proceedings of the New Zealand Institute/Volume 6/Article 2 will lead to the article On Observed Irregularities in the Action of the Compass in Iron Steam Vessels (to pick one at random). However, the Proceedings are not numbered, so various gymnastics are being followed to deal with repeats of names between the sections.
wrt the concept of splitting by number, I agree with doing this. In general, an Index should have a single mainspace place it goes to (the exception being collections of unrelated works).
Yeah, I think one of the essential tensions here is between enabling people to dip in and do a single article versus keeping mainspace a pure presentation namespace with none of the dust and mess (scan/index links) of the back rooms (WS:, Index:, Page:). The other being between the desire to have only fully finished works in mainspace versus the reality that most periodicals are far too massive and complicated to make such a requirement viable. I waffle back and forth on these constantly, so this Texas Med. J. stuff is an attempt to fumble my way to some kind of enlightenment, if only in baby steps. Xover (talk) 18:39, 6 September 2021 (UTC)
Wall of text? Dude, you know who you're talking to here! :) But, yeah, the Scriptorium discussion that rather predictably (sadly) didn't go anywhere is what I am trying to keep warm. And because my brain hurts when I try to think about the big ball of spaghetti, I'm approaching it one little bit at a time. Periodicals. Of a finite and manageable size. Since we already had an article that I was trying to clean up, I have few compunctions about slapping up otherwise-empty structure around it. It's an improvement however one looks at it. But at the back of my mind is the voice howling that I wouldn't want to keep a single small entry from a dictionary or something like that, and I'd be very annoyed if, say, a large number of such popped up over a relatively short time span (complete with bot-created Not proofread raw OCR pages). There has to be some way to square this circle. Maybe a separate kind of mainspace page, with separate, visually distinct, {{header}} template, separate policy and style guides? If we carve out periodicals from the dictionaries and encyclopaedias, maybe the problem becomes more tractable? Software support for periodicals, with good Wikidata (or structured data) integration, that makes otherwise-empty structure less offensive to those that swing that way? Xover (talk) 19:09, 6 September 2021 (UTC)
Is there something that could be leveraged off the Type field on the Index page? Most just default to Book, but these should all be set as Journal (if we want to change the name to Periodical, that's fine). So, if the Type is Journal, could there be some automagic that does {{Periodical header}} etc.? Beeswaxcandle (talk) 19:41, 6 September 2021 (UTC)
"Is there something that could be leveraged off the Type field on the Index page?" Technically, yes, perhaps (10× easier with the PRP Lua patch at Gerrit). However, IMO, it makes more sense to drive this kind of thing via Wikidata. Then automagic is more than possible (and, in fact, I would say the only scalable solution). Modelling such things at Wikidata is pretty much up to us:
So we probably should (as done here by all three of us) start small and totally hammer out a very small handful of exemplar periodicals. Then document the absolute hell out of it and start rolling out to other periodicals.
"none of the dust and mess (scan/index links) of the back rooms (WS:, Index:, Page:)." I have to say that I find a discreet {{small scan link}} after entries in a volume list to be singularly inoffensive, and if I had to choose between that and maintaining two completely separate venues for the same list, separated by a Portal link that a casual reader will not know about, and differing only in the presence of that little link, I'd choose a single list. We will likely never (barring a general AI with an interest in proofreading) have complete coverage, and at least providing some link to scans is a very useful service, since we are the only library that provides that list and allows it to be expanded and corrected. Over time we should work on importing scans so at least it goes {{ext scan link}} → {{Commons link}}, but that will be "easy" if we can hash out WD support.
"I'd be very annoyed if, say, a large number of such popped up over a relatively short time span (complete with bot-created Not proofread raw OCR pages)." I wouldn't mind if proofread articles popped up all over the place, and OCR dumping is pointless and slightly annoying, but ultimately it is contained to the Page: NS in most cases (though the lack of an actual rule that says you can't transclude red pages causes bad feeling when it inevitably leads to a WS:PD showdown). However, if we're going to allow a standalone article in mainspace (and I absolutely think we should, because articles are independent units of work and valuable (or not) in their own right, rather than as part of the whole) we should also allow the parent pages to join it all up and provide a central anchoring point in mainspace.
"visually distinct, {{header}} template, separate policy and style guides": yes * 2.5 (not sure the header needs to be distinct as such: maybe just a banner on the top level saying "periodical incomplete")
"If we carve out periodicals from the dictionaries and encyclopaedias, maybe the problem becomes more tractable." I think this is sensible. Encyclopedias and dictionaries are their own things and mostly seem happy as they are. Binding them up together will just cause a logjam.
Latest comment: 3 years ago · 1 comment · 1 person in discussion
I just couldn't resolve the image well enough so I couldn't be sure, and your fix was appreciated. Unfortunately, it looks like it's headed for the tip. I hate copyright.
Interestingly, while one can play with the scan image at 1024px as seen during proofreading, you *can't* get commons to display the really large 7462px image from the page "Image" link:
"Error: 500, Internal Server Error at Tue, 07 Sep 2021 10:02:45 GMT"
I noted your use of {{wsp}} as I had just come across it in searching templates. (Because I was looking for {{word spacing}}) There is *so* *much* out there. Found {{Rbstagedir}} and don't even remember where that'd been useful. Templates aren't really categorized well, are they? Shenme (talk) 10:14, 7 September 2021 (UTC)
Merging Two Indexes for Sherlock Holmes
Latest comment: 3 years ago · 1 comment · 1 person in discussion
Latest comment: 3 years ago · 2 comments · 2 people in discussion
Hi. If it's still possible, please change "top_caption" to "top-caption". The other options are using the hyphen, not the underscore. — Ineuw (talk) 09:39, 9 September 2021 (UTC)
Latest comment: 3 years ago · 2 comments · 2 people in discussion
Notice this page. Works with non-numeric year (e.g. “1220s”) claim to use a template, but I can’t make heads nor tails of Lua. Could you fix this? TE(æ)A,ea. (talk) 22:42, 12 September 2021 (UTC)
@TE(æ)A,ea.: this is not related to the module. It was a 6-year-old typo in {{header/year}} (introduced here). This is a rather complex template that IMO is a perfect example of when a module would be clearer, since the huge nested if statements are a readability disaster area. Inductiveload—talk/contribs 07:12, 13 September 2021 (UTC)
Gonna claim "Great minds…" on this one
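As an aside, what {{header/year}} is doing with its nested {{#if:}} branches essentially boils down to extracting a sortable number from a free-form year value like "1220s". A Python sketch of the equivalent logic (the function name is made up; the real template is wikitext, not code):

```python
import re

def sortable_year(value):
    """Pull a numeric sort key out of a free-form year value such as
    '1220s', 'c. 1850', or plain '1885'. Returns None when no 3- or
    4-digit run is present."""
    m = re.search(r'\d{3,4}', str(value))
    return int(m.group(0)) if m else None
```

Three lines of module code versus a pile of nested conditionals is the readability point being made above.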
Latest comment: 3 years ago · 3 comments · 2 people in discussion
And while I am making unconnected comments on discussions flying by my watchlist: Module:Header really should be consolidated into doing everything in-Module instead of leaving the main loop in template code. For example, this is currently non-functional and (needlessly) hard to implement right when the template controls the entry points.
It is a shortcoming of MW that Scribunto doesn't, ironically enough, have a (html) template facility, but the resulting tradeoff is pretty much a no-brainer as far as I'm concerned. Xover (talk) 11:39, 13 September 2021 (UTC)
Certainly there is more work to do with the header template/modules, but now that the bulk logic is moved out, I'm hoping to gently iterate towards a modular Lego-kit of utility templates and modules (and template-modules where you can invoke/transclude/require as appropriate with the same APIs), each with simpler APIs and expectations.
RE: in-module templating: you can farm out to sub-templates with expandTemplate, which I think is a pretty handy pattern sometimes, e.g. {{header/main block}} seems more readable that way than as pure Lua-driven mw.html nodes. The problem (not really a problem, just boilerplate) there is that you have to pass the frame around internally or use mw.getCurrentFrame(). Inductiveload—talk/contribs 12:03, 13 September 2021 (UTC)
I disagree on mw.html vs. {{header/main block}}. You certainly can find cases where an expandTemplate'd Template: is cleaner than mw.html, but in most cases where you need an html-template type of template a Template: is usually going to be messy (it makes all the wrong tradeoffs). Deeply nested, complex, markup structures are certainly going to look relatively unreadable in mw.html, but I'd say the alternative is a hypothetical future real (html-)template solution for/in Scribunto and not a Template:-based pseudo-html-template (which they aren't really designed for; they're more akin to primitive macros than templates, though having a bit of both in them). Xover (talk) 17:21, 13 September 2021 (UTC)
Batch Upload for The Strand
Latest comment: 3 years ago · 5 comments · 2 people in discussion
Thank you for running this. You're probably right. It's best to make all the post-1900 files local. I'm not sure what the license field should be like. If it's not the template, then is it the code of the template?
My basic idea is to create a batch file for the complete run of a periodical once a week in the hope that this will encourage users to use them to create scan-backed texts. Otherwise, it's a lot of work to find and import a 600+ page file for only a few pages. I'm also going to try to merge any periodical fragments that I find into full volumes. Would that be too much for you? I'm not sure how much work it is on your end and I don't want to overwhelm you. Once again, thanks for doing this and sorry about the confusion on Phab. Languageseeker (talk) 23:43, 12 September 2021 (UTC)
@Languageseeker: I don't mind you stacking up tasks, but I probably won't be able to sustain one a week of this size (specifically, with this many Hathi volumes) as they are incredibly slow to download - since I started yesterday, I have 8 volumes downloaded. IA is probably easier, especially if the DJVU already exists.
Don't worry about the Phab confusion. I'm not sure if one is "allowed" to use Phab for on-wiki tasks like this, but it's fine by me (and even preferable, since, as well as making it easy to host the data files, a task tracker is, y'know, good at tracking tasks). If it's not allowed, I guess someone will tell me at some point.
If there is going to be a backlog, Phab is certainly easier for me, since otherwise it'll just get lost on a talk page or something.
BTW, if you are using spreadsheet formulae, I'd rather have the XLSX file if possible, since then I can adjust the formulae if I need. ODS is OK, but XLSX is better as the script has an XLSX ingest function, so I'd have to convert an ODS anyway.
For the license field, it should be what goes inside Commons:Template:PD-scan. So, for example, PD-old-assumed.
Wow, I did not realize that this would be this complicated. I'm definitely going to slow down and wait until one periodical finishes before I request another. Ironically, this makes me even more convinced that this needs to be done preemptively. After all, if it takes a user, an interface administrator, and a WMF team member several days to upload one volume, how is an ordinary user supposed to do it? Hopefully, seeing the number of volumes that need to be imported server side will convince somebody at the WMF to actually fix uploading. But, who knows? As always, a huge thank you. Languageseeker (talk) 01:34, 14 September 2021 (UTC)
mark-proofread deps fix
Latest comment: 3 years ago · 2 comments · 2 people in discussion
Some weird timing thing or upstream change makes MediaWiki:Gadget-mark-proofread.js bomb on mw.api being undefined. Since it doesn't internally armour-plate this, could you set its deps to mediawiki.util,mediawiki.api in MediaWiki:Gadgets-definition?
It's been working fine without it for yonks, so there's an external trigger for why it's started dumping, but I don't have the cycles to actually debug what's happened there just now. Xover (talk) 07:24, 14 September 2021 (UTC)
Latest comment: 3 years ago · 7 comments · 2 people in discussion
I’m planning to run “The Smart Set” next, but I’m running into a bit of a conundrum. Some of the volumes are available on HT and HT2, scanned by Google, but all the issues are available on IA from microfilm. I see two possibilities. One, we can upload the complete volumes from HT and then combine the individual issues from IA into volumes as well. This will be slower, probably run into the cache bug, but have better image quality. Or, we can just upload all 354 issues. This will be faster, but have worse image quality, and have a gigantic volume listing. The second option would also probably require writing a script that would replicate the way in which the IA creates identifiers. Thoughts? Languageseeker (talk) 20:02, 15 September 2021 (UTC)
Alright, that makes sense. Do you think it is possible to modify the batch upload script to support combining multiple issues into one volume? Something like sim_smart-set_1930-03_86_1;sim_smart-set_1930-03_86_2;..;sim_smart-set_1930-03_86_6 would download the six issues and combine them into one volume? Languageseeker (talk) 00:57, 16 September 2021 (UTC)
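Mechanically, a semicolon-joined field like that is easy to take apart before handing each identifier off to a downloader. A sketch, with made-up function names, assuming only the field format shown above (the details URL follows the IA's standard archive.org/details/<identifier> pattern):

```python
def split_batch_field(field):
    """Split a semicolon-joined identifier field into individual
    IA identifiers, preserving order and dropping empty entries."""
    return [part.strip() for part in field.split(';') if part.strip()]

def ia_details_url(identifier):
    """Build the Internet Archive details-page URL for an identifier."""
    return 'https://archive.org/details/' + identifier
```

The actual merging of the downloaded issues into one DjVu/PDF volume is the hard part and is not sketched here.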
There are 28 volumes missing from the HT. I feel like this won't be the only case because IA is digitizing the microfilms of entire print runs which can fill in gaps from other volume sets. Languageseeker (talk) 12:46, 16 September 2021 (UTC)
Darn, that's quite a few. I'll see if I can make the changes, but it might take a while, so I guess don't count on those volumes being ready imminently! It's obviously going to be better in general to avoid the IA SIM collection, just because the quality is pretty bad (not the IA's fault, just a fact of the medium). That said, the project is pretty cool. Inductiveload—talk/contribs 12:55, 16 September 2021 (UTC)
No worries at all. I know that you have a lot on your plate already. I'll just wait until you get a chance and then I'll create the batch upload request. I agree about the IA SIM collection, but it's a good last resort. Languageseeker (talk) 22:30, 16 September 2021 (UTC)
New Maintenance Category
Latest comment: 3 years ago · 3 comments · 2 people in discussion
With the advent of the ProofreadPage Lua library, do you think it's possible to create a page that lists the top 10 indexes with the fewest unproofread pages remaining (but greater than 0) and which were last worked on more than one month ago? This could be a good maintenance category for almost-completed works that got abandoned. Languageseeker (talk) 22:30, 16 September 2021 (UTC)
@Languageseeker: This isn't something the Lua library can do for us, really. Right now, it's probably something that would need to be done as a bot, and it might even need backend API support, since AFAIK the proofreading stats of an index are not presented on the API (yet); Lua is actually ahead of the curve here. It's possible the best way would be to somehow adjust Special:IndexPages to be more useful. For example, Special:IndexPages could learn an excludeZero=1 parameter or similar. Inductiveload—talk/contribs 07:25, 17 September 2021 (UTC)
Yes, I think it would be good to exclude completed indexes from the To Be Proofread and To Be Validated sections. Would it be possible to rank them according to % of pages remaining? Languageseeker (talk) 16:46, 17 September 2021 (UTC)
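For reference, once per-index stats are available from some source (bot, API, or Special:IndexPages), the ranking itself is trivial. A sketch over a hypothetical list of per-index records; no such data feed exists yet, as noted above, so the record shape here is an assumption:

```python
from datetime import datetime, timedelta

def almost_done_indexes(indexes, now=None, limit=10):
    """Return up to `limit` indexes with the fewest unproofread pages
    remaining (but more than 0) that have not been edited in over a
    month. `indexes` is assumed to be a list of dicts like
    {'title': str, 'unproofread': int, 'last_edit': datetime} --
    a made-up shape for illustration."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=30)
    stale = [i for i in indexes
             if i['unproofread'] > 0 and i['last_edit'] < cutoff]
    return sorted(stale, key=lambda i: i['unproofread'])[:limit]
```

Ranking by percentage of pages remaining, as suggested, would just swap the sort key for `unproofread / total`.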
Latin not italic? :-)
Latest comment: 3 years ago · 3 comments · 2 people in discussion
No shame, just second thoughts, and more than likely correct, as I'm working on a text with "quem Deus vult perdere prius dementat" _not_ italicized. Better not to mix formatting and separate function of classification. 's cool. Shenme (talk) 18:23, 17 September 2021 (UTC)
Periodical Merge
Latest comment: 3 years ago · 3 comments · 2 people in discussion
Latest comment: 3 years ago · 4 comments · 2 people in discussion
At one time, awhile ago, when the engineers needed to retrieve a mainframe cable they knew was unused under the computer center's raised floors, they'd point to a floor square and say "we think one loose end is there" and I'd disappear under the floor and untangle and haul out the very expensive, very heavy cable - 100s of wire pairs - 50/100 or more feet. Dirty and ripped clothes - good reason to stop dressing up for work, yes?
So I'm likening that to templates. ;-) Tell me when you'd like to re-re-re-peek at the {{TOC begin}} family. While {{TOC row 1-1-1}} has vertical-align:bottom on the last "page number" cell, {{TOC row 1-c-1}} doesn't. If it is assumed that 'all' last cells are page number cells, then 'all'...?
@Shenme: quite right. I have fixed the template, but you will have to modify your page so all that white-space isn't "sucked into" the last cell of each row. See phab:T232477 and the linked conversation for gory details of why it works this way.
Thanks. I'm intrigued by {{optional style}}. I've been thinking that many templates ought to allow a 'style' parameter escape mechanism for unthought-of usages. Mebbe {{optional style}} could be used for that. I'll look at the phab ticket another day, when stomach stronger. Shenme (talk) 04:27, 21 September 2021 (UTC)
Lots of templates do provide a style override ({{optional style}} is just syntactic sugar for a complicated {{#if:}} construction). However, often class is better. In this case, a style parameter won't help much, because the padding has to go on the <TD>, not the <TR>. So you can use Page styles like this too: Special:Diff/11703295 and Special:Diff/11703293. In general, if you find yourself piling custom CSS into a style parameter more than a handful of times, you should consider whether a template or a class would suit better. Inductiveload—talk/contribs 09:12, 21 September 2021 (UTC)
Page Range for MC
Latest comment: 3 years ago · 6 comments · 3 people in discussion
It occurs to me that if we begin to run sections of works in the MC, it would make sense to have an option to set the page range. For example, "The Red-Headed League" is part of Index:The Strand Magazine (Volume 2).djvu but only pages 190–203. So the size of the work is only 14 pages, not 666. This way, we won't have an erroneous page count when running parts of periodicals or multiple excerpts from a work. Do you think this is doable? Languageseeker (talk) 06:02, 13 September 2021 (UTC)
OK, having reflected on this a bit more, it's actually quite hard, because the statistics currently work on a per-index, not per-page level. So it'll need quite a bit of back-end faffery to only record page status changes for the page range of interest. Otherwise, if you just limit the page count of the index, you end up with the pages outside the range of interest contributing to the month's stats. So this will probably need to wait for the DB-driven change querying that's slowly chugging along (see phab:T172408 and the little constellation of issues around that). Inductiveload—talk/contribs 12:39, 13 September 2021 (UTC)
Aww, ok. That makes sense. I think there are two things that still would be nice and might be possible. 1) To have some way for these selections to appear in the "Under 50" section. Maybe we can have a boolean for Under_50? This way a user will be able to see which texts are the short ones. 2) It's no longer possible to assume that an Index name will be unique, so it might make sense to have a function that would check for and remove any duplicates. This should prevent errors when calculating the total number of pages each month and the total number of pages proofread. I know that there will not be a way to distinguish whether a user proofread P220 or P440, but it would probably be best not to count P440 more than once. Languageseeker (talk) 01:30, 14 September 2021 (UTC)
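The de-duplication in point 2 amounts to counting each (index, page) pair at most once. A sketch with a made-up event format; the real stats would come out of whatever change feed the MC ends up using:

```python
def unique_page_count(events):
    """Count each (index_title, page_number) pair at most once, so a
    page recorded twice in the month, or present in two split uploads
    of the same scan, is not double-counted. `events` is a hypothetical
    list of (index_title, page_number) tuples."""
    return len({(index, page) for index, page in events})
```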
There is such a thing: set short to true to force it to appear as a short work.
@TE(æ)A,ea.: Thank you for creating this. Ultimately, I hope to reduce the number of split indexes in favor of entire volumes so I'm not in favor of this approach. My goal right now is to try and figure out how to begin proofreading periodicals. Based on past experience, users are not attracted to Periodical Volume X because nobody knows what's inside and there is quite bit of less interesting material. Right now, I'm testing out featuring individual articles/stories. So after Sherwood Anderson, The Man's Story, I plan to change the title and cover to another article from that issue of The Dial which has contributions from authors such as Thomas Mann, T.S. Eliot, W.B. Yeats, etc. This will also help to create scan-backed copies of short stories and serialized novels.
This month, the focus has mainly been on just how to get the scans on WS. It took almost the entire month to get The Strand uploaded due to various cache bugs.
Once we figure out the logistics, I plan to create a suggestion section for periodicals. There will probably be an open-ended section for anything from any periodical and a more restricted one for "what do you want from this volume of periodical X". Any thoughts or ideas would be more than welcome. Languageseeker (talk) 02:42, 1 October 2021 (UTC)
Save load actions
Latest comment: 3 years ago · 2 comments · 2 people in discussion
Note that you should check each edit carefully, be prepared to fix mistakes, and also respect the recent changes feed by not slamming it with rapid-fire changes without a bot flag. Inductiveload—talk/contribs 18:23, 30 September 2021 (UTC)
Would you mind doing this with a bot flag? The printer did not insert spaces after em-dashes and it seems to be a by-product of some computer doing it automatically. I'm happy to check over the work, but I really don't think that there will be any actual em-dashes with spaces in the original text. Mary Shelley used hundreds of em-dashes, so it's quite a tedious task to do manually. Also, I doubt that she had a modem in her voiturier. Languageseeker (talk) 18:32, 30 September 2021 (UTC)
Checked over all of them and there was not a single instance of an actual space before or after an em-dash. Interestingly enough, there was one page where there were spaces before and after a {{bar|2}}. This was a nice way to wipe out several hundred mistakes in one go. Some of the pages were even marked as validated. This might be a good script to run on Index pages that are proofread/validated, because I suspect that this will not be the only case. It might make a good clean-up project. What do you think? Easy to run (hopefully), easy to check, and has a high impact. Languageseeker (talk) 20:28, 30 September 2021 (UTC)
@Languageseeker: Well, the script to run is right there ↑.
For less script-y people who also don't want to use AWB, a web front end to PWB that shows you replacements is possible, but it's basically just w:User:Joeytje50/JWB:
AFAIK, that tool works on Page pages now (it didn't used to, but I moaned at the maintainer). I don't really use it because I use User:Inductiveload/quick_pwb.
Setting up a WikiProject Typos (or whatever it's called) is possible, but you'll have to find someone else to run it, as I do not have bandwidth.
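The em-dash cleanup discussed in this thread boils down to a single substitution; a minimal Python sketch (not the actual script referenced above), assuming the work never legitimately puts spaces around its em-dashes:

```python
import re

def fix_emdash_spacing(text):
    """Remove spurious spaces around em-dashes (an OCR artifact).

    Only strips spaces and tabs, so line breaks are left alone.
    """
    return re.sub(r"[ \t]*\u2014[ \t]*", "\u2014", text)

print(fix_emdash_spacing("I thought\u2014 or rather hoped \u2014that it would"))
```

With pywikibot, the same substitution would be fed to replace.py over the Page: namespace rather than run on a loose string.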
Latest comment: 3 years ago4 comments3 people in discussion
I was wondering if it would be possible to have a bot that would reOCR texts that will be featured in the MC. Most of them have OCR that is more than 10 years old. It would make it much easier to proofread with better OCR. Or would it be possible to do it offline and reupload them? Languageseeker (talk) 22:19, 6 October 2021 (UTC)
Either is possible. Probably updating the file is slightly easier, but it also depends where the OCR came from and if there are red pages in the way. Also the new OCR tool will use a new Tesseract, same as if someone did it offline. It might be more scalable to have a gadget to load the OCR tool output on page create. Inductiveload—talk/contribs22:38, 6 October 2021 (UTC)
Is writing a gadget feasible? I feel that a lot of users are doing this already and it might save people time to automate this. It would also ease the task of proofreading. Languageseeker (talk) 22:44, 6 October 2021 (UTC)
Latest comment: 3 years ago6 comments2 people in discussion
Open Page:Treasure Island (1909).djvu/35; enter edit mode; hit "Show changes" without touching the edit field. Do you get a clean (empty) diff, or do the header/footer and pagequality stuff show up in the diff? What happens when you repeat while logged out? Xover (talk) 14:03, 11 October 2021 (UTC)
Latest comment: 3 years ago1 comment1 person in discussion
When looking through the Recent Changes, I see that you're doing some work on Copyright renewal. You probably already know this, but the NYPL is working on a database for copyright renewals with initial release here. Languageseeker (talk) 22:11, 14 October 2021 (UTC)
toc to toc conversion
Latest comment: 3 years ago4 comments2 people in discussion
Did you do this manually? I ask (mostly) because I was going to suggest that mpaabot maybe learn how to do this, but I don't want to be rude if you have plans for your bot to do this.
I have a TemplateScript script which mostly uses regexes to do it. It needs the {{TOC begin}} and {{TOC end}} adding manually and otherwise just makes a "best-effort" that needs manual tidying. The primary help is that {{dtpl}} and the {{TOC row ...}} templates have basically the same argument orders.
{
    name: 'Dtpl',
    position: 'replace',
    script: function (editor) {
        let text = editor.get();
        text = text
            .replace(/\{\{(dtpl|dotted TOC (page )?(line|listing))\|\s*\|\s*\{\{gap\}\}/gi, '{{TOC row 1-dot-1||')
            .replace(/\{\{(dtpl|dotted TOC (page )?(line|listing))\|\s*\|/gi, '{{TOC row 2dot-1|')
            .replace(/\{\{(dtpl|dotted TOC (page )?(line|listing))\|/gi, '{{TOC row 1-dot-1|')
            .replace(/\{\{(TOC page line)\|/i, '{{TOC row 2-1|');
        editor.set(text);
    },
    editSummary: 'Convert to {{TOC begin}}: as a single table, it\'s more likely to export cleanly'
}
Maybe one day I'll work out a way to do it more automagically, but today is not that day.
So let's just settle on ^2? But yeah, in general, {{dtpl}} is a siren that lures you onto the rocks of broken exports. {{TOC begin}} and friends are not universally loved by all, but they do map unambiguously onto tables so if we can think of something better, they're easy to swap out later. Inductiveload—talk/contribs20:52, 16 October 2021 (UTC)
You got it ^2. And about the siren (Circe was a good friend to ole Rabo), dtpl is very person friendly, so, to be able to layout with that and have it converted to something more technically sensible has a very real appeal. You have that script in your RegExp editor gadget thing? Every time I look at that thing, the examples are (or were) "vote for me" templates and I cannot disable it quickly enough. I saw that between two of the toc pages you edited was only about 4 minutes. But the first was, like, an hour and 20 mins. Was that first writing the regexp instructions? If so, that didn't take long and I am impressed with that. That 4 minutes to make the changes work is its own kind of sirening....
My next script was going to benchmark png vs tiff saving. I thought it was GIMP but after using the scanner, I am pretty sure the problem is libpng, as even the scanner was confused with how long it was taking png to save and started its "scanning" progress bar again. Not that I am any great lover of libtiff.... --RaboKarbakian (talk) 02:26, 17 October 2021 (UTC)
The 1hr 20 was realising we needed {{TOC row 2dot-1-1}}, then having dinner then coming back and creating it. I have had the dtpl-killing script for a while.
I do recommend just never using {{dtpl}}, because it's just pretty awful for various reasons. {{TOC row 2dot-1}} and {{TOC row 1-dot-1}} are no more complex to use (except adding a {{TOC begin}}). The arguments are basically the same: 1, 2 and maybe 3.
WRT TIFF, it will strongly depend on the compression, if any, you are using in the TIFF (TIFF is a wrapper format, not an image format per se, rather like PDF and DJVU), as well as the compression level used in the PNG. It also depends on whether you care more about speed or filesize. Both LZW and ZIP TIFF modes are pretty speedy, especially as LZW is explicitly designed for high speeds. But since they're not image-specific compressors, you will probably pay for that to some extent in filesize.
For example, File:Goblin_Market_029.tif compresses from TIF=23MB to PNG=16MB with a simple convert in.tif out.png command, but it takes quite a bit of time to do so (~11 seconds for me). pngcrush -brute on that PNG would save some more bytes, but it has not even finished yet. Of course, all options (TIFF/ZIP, TIFF/LZW and all PNG compression levels) are lossless, so there's no difference in the image: it's just a matter of how much you value your CPU time, Wikimedia's disk space (don't worry, they have a 9-figure budget, they'll be OK) and your own Internet bandwidth. Inductiveload—talk/contribs20:09, 17 October 2021 (UTC)
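To make the speed-vs-size trade-off concrete: PNG's DEFLATE (zlib) compressor is tunable, and the same data compresses smaller but slower at higher levels. A toy Python sketch, illustrative only and not a real image benchmark:

```python
import time
import zlib

# Stand-in for pixel data: highly repetitive, like a bitonal scan.
data = b"scanline " * 200_000

# Compare a fast setting against a thorough one, roughly the
# TIFF/LZW-vs-pngcrush end of the spectrum.
for level in (1, 9):
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    print(f"level {level}: {len(packed)} bytes in {elapsed:.3f}s")
```

Both settings are lossless; decompressing either output reproduces the input byte-for-byte, which is the same reason TIFF/ZIP, TIFF/LZW and all PNG levels yield identical images.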
Does pdf export work?
Latest comment: 3 years ago2 comments2 people in discussion
Hello. I would like to ask you as a person who knows a lot about exporting: I have just tried to export The Shoemaker's Apron into .pdf using the Download button in the top right corner, but I received only the main page with the title, note and contents. None of the subpages was exported, although I used the TOC begin and TOC row templates which are recommended for exporting books. Is there anything else wrong? --Jan Kameníček (talk) 18:28, 18 October 2021 (UTC)
Weird, I can't see anything clearly wrong, though I seem to recall an issue with a page with a quote in the title before but can't remember if that was the cause in the end. Punted to phab as phab:T293708. :-s Inductiveload—talk/contribs21:56, 18 October 2021 (UTC)
E_TOO_MUCH_LATIN
Latest comment: 3 years ago3 comments2 people in discussion
If you've any idea what's going on here I'm all ears. Shirley it's not the impressive page count it's choking on. I note Commons is currently doing its usual stellar job choking on anything bigger than a 2MP JPEG, barely cranking out the file description page thumbnails, so I guess it may be giving PRP some crap data. But that seems like a weird way for PRP to fall down if so. Xover (talk) 18:42, 20 October 2021 (UTC)
@Xover hmm, the WS file page says 0x0px, but it's right at Commons. Smells like some kind of file storage/cache shenanigans rather than ProofreadPage. PRP just pulls the page count from file metadata, so it looks like a garbage-in, garbage-out situation. I think punting to phab is needed here unless there's a control surface we can wiggle to unstick something. There was a stuck image cache for a DJVU that was updated weeks ago, since magically self-resolved, so something weird is going on with files; it could be related, or it could be just more mystery. Inductiveload—talk/contribs21:23, 20 October 2021 (UTC)
@Shenme a better error message would make sense here (and I'll do that), but the gist is that author should have been just Horace James. For multiple authors, you can use override_author (and then you add the [[Author:xxxxx]] links yourself). Inductiveload—talk/contribs06:42, 21 October 2021 (UTC)
Latest comment: 3 years ago5 comments2 people in discussion
The Sprint idea for the MC has largely died because it always felt a bit artificial. I was wondering if it would be possible to reuse the code to add a label to a work instead; for example, Easy; Old English; Formatting, etc. In the database for the MC, there would be a parameter "label" which would control the text shown on the book cover. This label would be persistent. Adding a label can make it easier for users to know what they're getting into. Languageseeker (talk) 15:52, 23 October 2021 (UTC)
I think it would make the most sense to have a separate data field called "label" that would control what is displayed in the ribbon. My intent is to use the ribbon to give users the sense of the difficulty or what needs to be done. For example, Easy, Long-S; Transclusion; Formatting; Images; Challenge. This is a request from a number of users who come from PGDP who say that this helps users on PGDP to select texts. Languageseeker (talk) 19:59, 25 October 2021 (UTC)
Latest comment: 3 years ago17 comments2 people in discussion
Hello. May I ask for help with some upload? I have downloaded two volumes of the journal The New Europe from HathiTrust and wanted to upload them to WS as they are not eligible for Commons. However, it seems that the Wikisource uploader does not support chunked uploads and so only files under 100MB can be uploaded. I have stored the two volumes at https://drive.google.com/drive/folders/1z-kkizbun9ItpgAw7-rhxdFLqXRdVBW1?usp=sharing . If you want, you can also convert them to djvu, but that is not necessary. It would really help! --Jan Kameníček (talk) 11:11, 25 October 2021 (UTC)
@Jan.Kamenicek in progress. The PDFs apparently will just not upload to Wikisource, though I mistakenly somehow got one up to Commons >_< So I have gone for the DJVUs which come out much smaller anyway due to being bitonal. Inductiveload—talk/contribs13:41, 25 October 2021 (UTC)
I do apologize, but only now I have noticed that volume 3 has a missing map there. I did not notice it before because the pages of the map are not numbered. I have extracted the map from another file of the same volume (which has some other pages missing and so I did not choose it for upload) and it is available at https://drive.google.com/file/d/1Ig26VPcY34E5zCJBhrwEvHo4S4yRT25A/view?usp=sharing . After page no. 256 there are two empty pages and the map should go either between them or instead of them. Do you think you could add it there? --Jan Kameníček (talk) 13:54, 25 October 2021 (UTC)
This is for v3. I will correct the pagelist then. I have not noticed anything wrong in volume 4, I hope I did not overlook anything. --Jan Kameníček (talk) 13:59, 25 October 2021 (UTC)
I will certainly use this offer. I only have to find some time to choose the best copies. It is quite difficult as they are not directly accessible outside the US, so I have to download all of them using a Hathi download helper, which is veeeery sloooow, and only then I can go through them and choose. --Jan Kameníček (talk) 15:59, 25 October 2021 (UTC)
Now I can see that you have uploaded a different copy of volume 3. Unfortunately, this copy is missing a lot of various pages, including title pages of individual issues and the appendix at the end of the volume. The map is missing there too. May I ask to upload the copy from my GDrive, only adding there the map after page 256, please? I am sorry for bothering you again. --Jan Kameníček (talk) 17:53, 25 October 2021 (UTC)
@Jan.Kamenicek oh right, sorry. The PDF can't be uploaded due to the server upload issue, and I can't really convert it from PDF to DjVu easily. Can you just let me know which Hathi ID it is? It's much easier to just start from scratch in that case. Inductiveload—talk/contribs18:57, 25 October 2021 (UTC)
I am really afraid to write another problem here… Unfortunately, there are 2 extra pages in the copy. I prepared the .pdf copy on my computer a very long time ago and did not remember that I had removed the pages, but as you downloaded the copy anew, they are there again. After page 225 there is an extra page with some blue piece of paper and then page 225 again. Can you remove them, please? I do apologize for this neverending story, but it should really be the last thing… I am very sorry. --Jan Kameníček (talk) 06:49, 26 October 2021 (UTC)
Latest comment: 3 years ago5 comments2 people in discussion
I'm working on importing The Elizabethan stage (Volume 4).pdf from PGDP. The Index is split and it runs for 116 pages. Do you know if there's some easy way to combine the pages? So 1, 2 - Page 1; 3, 4 - Page 2; etc. Maybe a checkbox to just ignore odd or even pages? Languageseeker (talk) 20:29, 25 October 2021 (UTC)
I've been testing it and it works wonderfully. Thank you. I've been using dp_reformat to import some of the more challenging/non-novel texts so that they can be run through the MC to teach users about formatting and to save lots of time proofreading these challenging/long texts. Languageseeker (talk) 16:17, 26 October 2021 (UTC)
Question about correcting errata and printer errors
Latest comment: 3 years ago6 comments2 people in discussion
It seems that the French are doing a better job at handling printer errors and errata than we are by striking a balance between silently correcting as PG and not correcting at all as enWS does. They have templates set up that allow users to enter corrections in PP which will not be transcluded, see [7]. Do you think it might make sense to create a discussion about importing this to enWS?
The implementation appears to be functionally exactly the same as {{SIC}}: <span class="coquille" title="{{{1}}}">{{{2}}}</span>. They just use some CSS to make it green in page-space: body.ns-104 .coquille { .... }Inductiveload—talk/contribs15:34, 27 October 2021 (UTC)
I think there are differences in that the Page ns will indicate that a printer error has been corrected and that in the transclusion, there is an option under “Options d’affichage” to show the printer errors/errata corrected. Instead of displaying the correction with a tooltip, they directly transclude the corrected text. [8]. Also the French seem to have a sic template and an errata template to distinguish corrections made by the printer and those made by Wikisource. It seems like it might make sense to distinguish the two. Languageseeker (talk) 15:44, 27 October 2021 (UTC)
Looks like that's JS, something like Mediawiki:Gadget-Visibility. We can put it on the "would like to get working one day" list along with improving the whole visibility system. As usual, I kind of feel that this should be aiming to upstream into the Wikisource extension so all WSes can benefit.
In terms of the template here, {{SIC}} already does what it needs to to enable this (and more, actually). An {{erratum}} template to allow the proofreader to inline the erratum into the relevant location would be a good idea. All it has to do is set the class. Inductiveload—talk/contribs15:50, 27 October 2021 (UTC)
Turns out there already is a {{errata}}. I agree that upstreaming it eventually would be a good idea. However, the current tooltip system is completely broken on mobile. Do you think it might be a good idea just to import the code for now and then worry about creating a more elegant/universal solution later? Also, I feel that {{sic}} and {{errata}} should be used for displaying and exporting like the French do because it creates a far better experience for users. Languageseeker (talk) 21:06, 27 October 2021 (UTC)
I do agree with {{sic}}, but it's been like that for ages and I don't have the energy for it. I would say fix SIC and then transition {{sic}} once it can be made toggle-able so people can choose.
{{erratum}} seems a good implementation using footnotes already, short of a full-on JS solution. Tooltips are broken indeed. A better thing would be some kind of popup. But....time and effort. I do not have bandwidth for that at the moment, but you can try to play with it as a user script. Inductiveload—talk/contribs22:12, 27 October 2021 (UTC)
I completely understand. There is always so much to do and only so much that one user can do. I dare say that you do more than your fair share on this site. The good thing about templates is that they don't have to be perfect, only functional. Some day, someone may come along and write some brilliant code that will do everything perfectly. Until then, I'll have users mark printer errors with SIC and errata with erratum. Hopefully, more users will join soon and lighten everyone's burden. Languageseeker (talk) 21:57, 28 October 2021 (UTC)
Problem with MC
Latest comment: 3 years ago5 comments2 people in discussion
Sorry to be spamming you. Feel bad, but ... I'm trying to add two texts for the November MC that only require transclusion (for new users who want to practice transclusion and to help clear the backlog). However, despite being marked as "Not Proofread," they are still being sent into the Completed Texts section. Is there any way to fix this? Languageseeker (talk) 21:38, 27 October 2021 (UTC)
That seems to have worked: partially. The not-transcluded text are indeed in the right spot, but texts that have initial = proofread now show up in the To Proofread section instead of the To Validate section. Probably a case of the programmer's conundrum: squash a bug, make a bug? Languageseeker (talk) 23:53, 27 October 2021 (UTC)
The Dotted cell template makes it possible to add more space between the dots and also to replace the dots with a different symbol. I have been experimenting with this for the TOC row dotragged too, but have not succeeded. What do you think, would it be possible to add such a feature too? --Jan Kameníček (talk) 17:17, 28 October 2021 (UTC)
@Languageseeker ha, I was just coming to say I tried the DjVu from the IA but the OCR is still not ideal, so I'm going to regenerate from JP2 and see how that goes. I do not know why the OCR is like that but I think it's probably a historical issue with the IA when that file was generated. Inductiveload—talk/contribs21:08, 29 October 2021 (UTC)
Latest comment: 3 years ago3 comments2 people in discussion
It seems that users are spending quite a bit of time removing “” ‘’ from this work. Is there a way to use a bot to change “” to " and ‘’ to '?Languageseeker (talk) 12:03, 30 October 2021 (UTC)
-prefixindex:Tarzan and the Golden Lion - McClurg1923.pdf
-namespace:Page
-summary:Convert curly-quotes to straight quotes for consistency in this work
-regex
[“”]
"
[‘’]
'
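For anyone reading along who doesn't use PWB: the pattern/replacement lines above amount to two ordinary regex substitutions. A stand-alone Python sketch of just the text transform (replace.py handles the page fetching and saving):

```python
import re

def straighten_quotes(text):
    """Convert curly quotation marks to their straight ASCII equivalents."""
    text = re.sub(r"[\u201c\u201d]", '"', text)  # double quotes
    text = re.sub(r"[\u2018\u2019]", "'", text)  # single quotes and apostrophes
    return text

print(straighten_quotes("\u201cWhere?\u201d \u2018Here.\u2019"))
```

Note that the second substitution also straightens apostrophes, which is what this cleanup wants, since the goal is per-work consistency on straight quotes.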
Thank you! One of these days I'll get into PWB and then I'll probably regret not getting into it early, but, for now, I have a bit too much on my plate already. Much appreciated as always. Languageseeker (talk) 13:58, 30 October 2021 (UTC)
Latest comment: 3 years ago10 comments4 people in discussion
@Xover: I updated the File on Commons because of several heavily damaged pages in the original version. However, now the proofread pages need to be shifted by +1 starting from Page 2. I'm pinging Xover in case you're not available. Languageseeker (talk) 14:16, 30 October 2021 (UTC)
move text from Elizabeth Fry (Pitman 1884).djvu/2 -> Elizabeth Fry (Pitman 1884).djvu/3, etc. Replace Elizabeth Fry (Pitman 1884).djvu/8 and Elizabeth Fry (Pitman 1884).djvu/9 with the images from [10]
Righto: Done.
FYI, in future if you could also say exactly which pages to move in terms of a page range (or several page ranges) that helps be sure what I am about to do aligns with what you had in mind. For example: you could say "pages 2-216, offset +1". It's OK if the range contains pages that don't exist. Otherwise, I have to figure out for myself that, yes, indeed the page range goes all the way to 216 and the whole range is an offset of +1. Inductiveload—talk/contribs18:41, 30 October 2021 (UTC)
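For what it's worth, the offset operation is mechanical enough to script. A hypothetical sketch of just the move planning (the actual page moves would go through pywikibot or an admin tool; the names and ranges below are illustrative), with the detail that a positive offset must be applied from the highest page number downwards so no page is moved onto one that hasn't moved yet:

```python
def plan_offset_moves(base, first, last, offset):
    """Return (source, destination) pairs for shifting Page: subpages by offset.

    Positive offsets are planned highest-page-first; negative offsets
    lowest-page-first, so destinations are always already vacated.
    """
    pages = range(last, first - 1, -1) if offset > 0 else range(first, last + 1)
    return [(f"{base}/{n}", f"{base}/{n + offset}") for n in pages]

for src, dst in plan_offset_moves("Page:Elizabeth Fry (Pitman 1884).djvu", 2, 5, 1):
    print(src, "->", dst)
```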
@Kathleen.wright5 I'm also trying to include some maintenance projects in the MC. Right now, there are over 700 Indexes that have either been fully proofread or validated, but not transcluded. Even if enWS transcluded one a day, it would take over two years to clear the backlog. I'm hoping that featuring some of these in the MC will help to clear this backlog. Languageseeker (talk) 15:18, 1 November 2021 (UTC)
@Languageseeker I can do it, but it would have been ~900 times easier if you had done that before splitting. As usual, my recommendation is to slooooow dowwwwwn and think things over before rushing into the first action you think of. Inductiveload—talk/contribs07:29, 4 November 2021 (UTC)
Actually I do not think this is correct. There are lots of multiple-spaces and not all of them are new lines:
Abandonnemént. ''at randome, dissolutely, licenciously, profusely,with libertie.''
Abandonner: ''to abandon, quit, forsake, forgoe, waiue or give ouer, shake or cast off, lay open, leaue at randome, prostitute vnto, make common for, others; also, to outlaw.'' Abadonner la vie de tel au premier qui le pourra tuer. ''to proscribe a man; (is ever to be vnderstood of a Soveraigne, or such a one as, next vnder God, hath absolute and vncontrowlable power ouer his life.'' s'Abandonner à plaisirs. ''sensually to yeeld, or become a slave, vnto pleasure; wholy to captiuat, or deuote, his thoughts to delights.'' Fille qui donne s'abandonne: Pro. ''A maid that giveth yeeldeth.'' Il commence bien à mourir qui abandonne son desir; Pro. ''he truly begins to die that quits his chiefe desires.''
Ideally we want to find a transform that will allow us to leverage the Mediawiki definition list markup like this:
; Abandonnemént.
: ''at randome, dissolutely, licenciously, profusely,with libertie.''
; Abandonner:
: ''to abandon, quit, forsake, forgoe, waiue or give ouer, shake or cast off, lay open, leaue at randome, prostitute vnto, make common for, others; also, to outlaw.''
:; Abadonner la vie de tel au premier qui le pourra tuer.
:: ''to proscribe a man; (is ever to be vnderstood of a Soveraigne, or such a one as, next vnder God, hath absolute and vncontrowlable power ouer his life.''
:; s'Abandonner à plaisirs.
::''sensually to yeeld, or become a slave, vnto pleasure; wholy to captiuat, or deuote, his thoughts to delights.''
:; Fille qui donne s'abandonne: Pro.
:: ''A maid that giveth yeeldeth.''
:; Il commence bien à mourir qui abandonne son desir; Pro.
:: ''he truly begins to die that quits his chiefe desires.''
Latest comment: 3 years ago2 comments2 people in discussion
The one that says, maybe I don't want to open the lid on that mystery container at the back of the fridge because who the heck knows what will come crawling out. I've been having that feeling for a while regarding the magical mystery black box that is phetools. But since the PWB thing forced my hand I've had to start opening lids. Let me illustrate by the pseudocode version of the algorithm that makes the Phe OCR gadget so fast:
titles = SELECT page_title FROM <Index: namespace on enws>;
for title in titles
    if not exists ocr_cache[title] then
        generate_ocr(title)
Because the flip side of the fridge horror above is the feeling you get after fixing a bot that's been dead for a while and discover it's decided to download every single PDF and DjVu file on commons to warm its OCR cache. Having to do emergency database surgery to excise the ~70k jobs already queued up in its internal grid engine manager before the Toolforge admins come `round to have a wee bit of a chat is… Well, I don't recommend it as a habit.
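In other words, the fix is to flip the loop from eager cache-warming over every index to on-demand generation when a page is actually requested. Schematically, with hypothetical names:

```python
def get_ocr(title, cache, generate):
    """On-demand OCR: run the expensive generator only on a cache miss."""
    if title not in cache:
        cache[title] = generate(title)
    return cache[title]

# Demonstration with a stub generator that records each invocation.
calls = []
cache = {}
stub = lambda t: calls.append(t) or f"ocr({t})"
get_ocr("Page:X.djvu/1", cache, stub)
get_ocr("Page:X.djvu/1", cache, stub)
print(len(calls))  # the generator ran only once
```

The eager version instead iterates every Index page title up front, which is exactly how a freshly revived bot ends up queuing ~70k OCR jobs.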
This thing is so clearly an eldritch horror poking its icy cold tentacles through a weak spot in the skein between dimensions. Maybe not Cthulhu itself, but surely Th'rygh, The God-Beast or Sho-Gath, The God in the Box. Xover (talk) 18:32, 8 November 2021 (UTC)
Latest comment: 3 years ago3 comments3 people in discussion
I just saw a note you left with another user. Just what? How are we to know when changes like this happen? I used to get changes on the Scriptorium page on my Watchlist which I check but it doesn’t seem to be showing up anymore. It seems a major change to me, as a proofreader and quite distressing to be oblivious of it happening. I am working on a Beginners’ proofreading guide. Can you tell me of any other changes that I may be unaware of? I’ve noticed you seem to have your finger on the pulse. I’d appreciate the support. Cheers, Zoeannl (talk) 23:30, 8 November 2021 (UTC)
Zoeannl, it doesn't mean that you need to stop using HWE and the community has not deprecated its use and it is still supported, just there is now an alternative. There are still situations where HWE has to be used. To note that we did have a conversation more recently that we do need to get better with our announcements with regard to changes taking place. — billinghurstsDrewth23:00, 9 November 2021 (UTC)
Pop goes the… Extension?
Latest comment: 3 years ago4 comments2 people in discussion
Yeah, I haven't looked at it; I just ran across the link and figured it might be relevant due to Reloaded (which I haven't looked at either). Incidentally, I'm cross-loading the enwp upstream of Popups instead of our locally-ported copy, and it is much nicer. "Good enough" rather than "Great", but everything is relative. Xover (talk) 09:58, 10 November 2021 (UTC)
@Xover reloaded is far from done, but even now 1) it's got lots of fun WS'y features (page image on hover anyone?) and 2) it's designed to allow pluggable extra modules (though the API for that isn't baked yet, so caveat implementor). Inductiveload—talk/contribs10:03, 10 November 2021 (UTC)
@Languageseeker Hmm, that would need an extra <span>, since the [[File:...]] markup doesn't accept a style parameter. Looks like the imgstyle parameter was an attempt to do that and I failed at it.
@Languageseeker: Unless there was a change in templates, we are talking about two templates. {{FI}} is a <div></div> based template and {{FIS}} is <span></span> template. The difference was to allow text to flow around the frame unbroken. Which one are you referring to? — Ineuw (talk) 07:45, 13 November 2021 (UTC)
The templates surround both the image and the caption with a border. The desire is to have an option that would just surround the image. Languageseeker (talk) 03:32, 15 November 2021 (UTC)
Option to Export Text Layer of an Index
Latest comment: 3 years ago23 comments3 people in discussion
I was wondering if there's an option to export the entire text layer of an Index, similar to how PGDP can export the concatenated text file. The output would be something like this:
====Page:1====
Status: N (Not Proofread) B (No Text) P (Proofread) V (Validated)
<header>
header text
<body text>
body text
<footer>
footer text
====Page:2====
This would be a great way to be able to search for common problems in the Index and also to be able to have a copy of the raw wikicode for an Index. Languageseeker (talk) 17:33, 8 November 2021 (UTC)
@Languageseeker hmm, interesting. It's probably pretty easy to do with Python.
Well, once the Python exists, it can be deployed on Toolforge. 100x easier than getting it into the extension (for one, it would need a formal format to be defined). Inductiveload—talk/contribs17:58, 8 November 2021 (UTC)
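For the "quick hack-up" flavour, the dump format sketched above is trivial to render once you have each page's wikitext and quality level. A hedged Python sketch of just the formatting; the actual fetching (e.g. via the MediaWiki API) is left out, and the mapping of ProofreadPage quality levels to the N/B/P/V codes is an assumption:

```python
# Assumed mapping of ProofreadPage quality levels to the status codes above.
QUALITY_CODES = {0: "B", 1: "N", 3: "P", 4: "V"}

def format_page(number, quality, header, body, footer):
    """Render one Page: entry in the concatenated-dump format sketched above."""
    return (
        f"====Page:{number}====\n"
        f"Status: {QUALITY_CODES.get(quality, '?')}\n"
        f"<header>\n{header}\n"
        f"<body text>\n{body}\n"
        f"<footer>\n{footer}\n"
    )

print(format_page(1, 3, "{{rh||TITLE|}}", "Some proofread text.", ""))
```

Concatenating format_page over every page listed in the Index (in order) gives the PGDP-style text file; deployed on Toolforge, the fetch loop would be the only extra piece.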
I was thinking that maybe ProofreadPage might actually be the best place to add this code. Ideally, there should be an option to export/import a project. Right now, there's no easy way to export the data from a project or recreate it in another space. Ideally, this would generate a zip file containing a text file that starts with information about the work and its metadata, followed by the text from all the Pages, as well as the original scan and any media files. Languageseeker (talk) 21:50, 8 November 2021 (UTC)
That could still be done on Toolforge. Building it into the extension is a good idea, but 1) you need a very well-defined format to fix and 2) it takes an enormous amount of effort and an even larger amount of time to get non-trivial patches accepted into the extensions. Easily an order of magnitude lower velocity than a Toolforge tool. Inductiveload—talk/contribs22:34, 8 November 2021 (UTC)
I see. It's a practicality issue. Could you add it to your already too long list of things to do? This will be important to users who wish to proofread and also to those who wish to have a complete archive of an Index, either to use on a different wiki instance or in whatever capacity they wish. It's also key to fulfilling the open-access philosophy/promise of Wikisource. Languageseeker (talk) 00:42, 9 November 2021 (UTC)
Ideally, there would be two levels of backup:
A pure textual one consisting of a concatenated text file.
A full backup that could be imported into a clean wiki-setup. This would include
Forgot to ask the all-important question. Do you think that this is something that you can do, or do you not have the bandwidth/time for it? Languageseeker (talk) 15:24, 9 November 2021 (UTC)
@Languageseeker Honestly, I do not think there's a lot of benefit to building this into the PHP. Any format would be extremely specific and not generally useful. All the information you need is explicitly available on the API. I think you should really be thinking about what you want to achieve here. You use the word backup, so this makes me think that you're thinking of some kind of archival purpose rather than any proofreading-related purpose. Database dumps of the whole of Wikisource are made every month or so, so if you're after an archive, maybe you can check them out.
In short, I do not really have time or inclination for any involved tool without a pretty solid "business case". On the other hand, a quick hack-up of "get the wikitext of every page in an index" is not very hard. Inductiveload—talk/contribs15:32, 9 November 2021 (UTC)
For me, there is both a use for proofreading and for backup.
I think the business case for backup would be to make it possible for users/institutions to get their work out of this project. I can imagine many cases where institutions or users may want/be willing to use the Wikisource platform to proofread as long as they are able to get the work out in an easy way. This weekend, on LinguaLibre, there was a similar case where a user was willing to contribute because they were expecting that they could download their pronunciations easily. As it turned out, there is no such way, which caused some embarrassment and led to the team downloading every pronunciation manually to avoid losing the user. I think that WS faces the same issue. Say the NLS would like to download their chapbooks. How would this be possible? Think about how many Indexes on enWS have images or repaired files. I've imported quite a few works from PGDP, and one of the constant challenges I face is that the text file does not correspond to the actual file on IA or HT. Keeping the text file with the image files/scan will make it possible to actually back up the work.
For proofreading, a system to import/export an Index will have several benefits. First, it will enable users with slower internet connection to contribute without having to worry about long load times or losing data. Second, it will also enable users to proofread an entire text or search for common errors. Finally, it will also make it easier to locate a specific error.
I think that "a quick hack-up of "get the wikitext of every page in an index"" is a great start and would be a wonderful thing to have. Would at least that be possible? Languageseeker (talk) 15:48, 9 November 2021 (UTC)
@Languageseeker Right, but what's a pile of wikitext, Lua and images going to achieve? You'd have to import it into a near-perfect clone of Wikisource as it was at the time of export. For what purpose? In case Wikisource gets nuked? WS Export already provides HTML export, as well as PDF, ePub, MOBI, text and RTF. Wikitext export is already completely possible via the API or DB dumps and can easily be done, but the format it ends up in will be wikitext and essentially completely useless except for feeding back into a wiki and more suitable for some kind of offline match-and-split-like workflow that feeds back to Wikisource itself.
Anything more than a straight wikitext dump of the pages in an index is weeks of work and an ongoing maintenance burden, so you really need to explain what it's for, other than "man, wouldn't it be fun if".
And if you do want to feed back into a wiki locally, then we already have Special:Export (probably with Special:PrefixIndex assuming people have done the Right Thing and used subpages properly), as well as the aforementioned DB dumps, API access and the Wikisource-dedicated export tooling. Inductiveload—talk/contribs16:03, 9 November 2021 (UTC)
I don't think that it's going to be anywhere near a perfect or easy process to import the files into another system. However, it would be possible. Creating an export for an index will enable users to do what they wish with the data in an easy and convenient manner. That is why the three most important aspects to export are the text layer, scan, and images. The other features are nice to have (especially transclusion ranges), but are not strictly necessary. For me, this is a central pillar of a commitment to maintaining open access to the information produced. Anyone should be free to take the raw data produced on enWS and do with it as they please. Languageseeker (talk) 16:29, 9 November 2021 (UTC)
But it's possible now. Getting the relevant data from the API and/or a DB dump is no harder than getting the data out of some special-sauce WS-specific package format. In fact, it's probably easier, because there's probably already tooling for handling DB dumps in whatever language the user wants (certainly Python and PHP).
There's still no concrete use case beyond "sounds fun". You need to find a client for this feature and make sure what you're proposing actually works for them. Bulk archiving is already provided by the software. Yes, Special:Export is missing the images, but that's a defect in the core (phab:T15827, 13 years old), and should "just" be fixed (ahahahaha, I crack myself up) there rather than getting me to do more of the WMF's homework and piling on more external tools to paper over lack of upstream interest. Tl;dr go and complain at them.
A way to generate a Special:Export package for all the pages in an index without having to use Special:PrefixIndex may also make sense (i.e. leverage the tools we already have)
A dump of wikitext in one big file I can understand, because then you can use a text editor to do various fixes without needing to bot them in "live" (though you'll still need a bot to upload at the end). Inductiveload—talk/contribs16:46, 9 November 2021 (UTC)
For such an edge case, what is wrong with Special:Export, letting the users work out how they manage it? I sometimes wonder why we are trying to replicate rarely utilised functionality when we have needed improvements. If it is something important, stick it into phabricator: with all the other TO DOs. — billinghurstsDrewth22:34, 9 November 2021 (UTC)
I don't think these are edge cases. I've been thinking about this more, and here are what I think are some real scenarios in which this can help.
Checking the formatting of an entire Index. For example, say that an Index has plates that should all be 500px. Right now, if you want to verify that all the images are in fact 500px, you would need to open every page, then click on edit, and then check. (Also, hope that the pages with images are actually marked.) This can be quite time-consuming. If you had all the pages as a single text file, then you could just use find to check them all. This case can be generalized.
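That kind of whole-index check can be sketched in a few lines of Python. The wikitext and the 500px rule here are invented for illustration, and the regex only covers plain <code>[[File:...|NNNpx|...]]</code> links, not every way an image can be embedded:

```python
import re

# Hypothetical dump of an index's wikitext, concatenated into one string.
wikitext = """
[[File:Plate 1.png|400px|Frontispiece]]
[[File:Plate 2.png|500px|The storm]]
"""

# Flag any [[File:...]] link whose pixel width is not the expected 500px.
wrong_size = [
    m.group(0)
    for m in re.finditer(r"\[\[File:[^]]*?(\d+)px[^]]*\]\]", wikitext)
    if m.group(1) != "500"
]
print(wrong_size)  # only the 400px plate is flagged
```

The same pattern (one regex over one big text file) generalizes to any formatting convention you want to audit.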
Finding an error in a book. When I read a WS text on my Kobo, sometimes I notice obvious scannos like "1t." On the Kobo, I can highlight the text, which saves it to an annotation file. However, Kobo will save this as "Chapter 10: LETTER VII." So, if I want to find this error, I need to go to the transcluded work, find the right chapter, search in the chapter for the text, and then click on the page. It's a huge time waste. It gets worse when there are multiple Letter VIIs.
The ability to import the text would also make it easier to correct common errors such as "— " or curly quotes.
The ability to export/import images would make it much easier to replace poor quality images. Recently, I worked on replacing all 174 images in Index:The Adventures of Huckleberry Finn (1884).pdf because the existing images were cropped from the DJVU. The ability to export them with an accompanying XLS file would save a ton of time when it comes to reuploading them. As long as there is a reason column, there should be no technical barrier to using a script to overwrite all 174. That is far faster than manually reuploading 174 files.
It could also become possible to generate metadata for missing images: the tool would write the metadata for all the missing images to an XLS file. Once the images are added to the folder, the script could upload them without the user having to create the metadata manually. This would greatly speed up the adding of images.
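As a rough sketch of such a metadata sheet (CSV rather than XLS for simplicity; the column names and the example row are made up, not any existing tool's format):

```python
import csv
import io

# Hypothetical metadata rows for images still missing from an index:
# (filename to upload, target Page: on Wikisource, reason for the upload).
missing = [
    ("Huck Finn p012.png",
     "Page:The Adventures of Huckleberry Finn (1884).pdf/12",
     "replace image cropped from the DjVu"),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["filename", "page", "reason"])
writer.writerows(missing)
sheet = buf.getvalue()
print(sheet)
```

A batch-upload script could then read rows back with `csv.reader` and act on each one.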
In the long run, a proper system for importing/exporting text would enable the creation of an offline proofreading interface similar to AWB. There are many cases in which users might have a slow connection or just loading images from PDF/DJVU is simply a slow process. Creating a way to download/upload Indexes and individual pages would greatly speed up the work. Languageseeker (talk) 01:54, 10 November 2021 (UTC)
@Languageseeker: Re I need to go to the transcluded work, find the right chapter, search in the chapter for the text, and then click on the page. It's a huge time waste. It gets worse when there are multiple Letter VII. As a semi-tangent, you may appreciate the replace tool in User:Inductiveload/maintain, which allows you to highlight the text 1t and replace it directly in the Page namespace, if possible (usually it is).
correct common errors such as "— " or curly quotes. functionally, AWB, JWB or PWB (perhaps with User:Inductiveload/quick pwb) are existing tools that can do this already.
Most of what you're asking is already possible, and if you're already using a custom script, the normal API is much more reliable, available, tested and stable than any Toolforge tool would ever be. It still sounds to me like you are coming up with solutions to problems before you've actually worked out a workflow that has the problems. Inductiveload—talk/contribs21:58, 10 November 2021 (UTC)
Wow, I did not realize how amazing the API was. However, when I try to get the raw content for Mansfield Park or frWS, it seems that it does not show the content for all the pages and the pages are out of order. Is there any way to show the content for all the pages in order? Languageseeker (talk) 01:48, 11 November 2021 (UTC)
At some point you're going to need to process the data anyway. Sorting that array is a one-liner in Python: pages.sort(key=lambda page: int(page['title'].split('/')[-1])).
@Languageseeker the generator is the one that will be implemented in phab:T291490. I'm halfway through doing it. Deployment will be when it will be. I have to finish it, and then shepherd it through code review.
For the data there, that's because you are not logged in, so you have lower API limits (50 vs 500). You will either need to make the API query from some logged-in session, or handle the continue.rvcontinue field correctly. Note that since some books are over 500 pages long, you need to handle the continue data anyway. Inductiveload—talk/contribs13:06, 11 November 2021 (UTC)
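The continuation loop itself is small. Below is a sketch of the general pattern: each response may carry a `continue` object whose fields (such as `rvcontinue`) must be copied back into the next request verbatim. Here `api_call` is a stand-in for an HTTP request (e.g. something like `requests.get(API_URL, params=params).json()`), and the two-batch fake API exists only to exercise the loop:

```python
def query_all(api_call, params):
    """Yield every batch of a MediaWiki-style continued query.

    Keeps re-issuing the query, folding the returned 'continue'
    fields back into the parameters, until no 'continue' remains.
    """
    params = dict(params)  # don't mutate the caller's dict
    while True:
        result = api_call(params)
        yield result
        if "continue" not in result:
            return
        params.update(result["continue"])


# Fake two-batch API to demonstrate the loop (hypothetical data).
batches = [
    {"pages": ["/1", "/2"], "continue": {"rvcontinue": "batch2"}},
    {"pages": ["/3"]},
]

def fake_api(params):
    return batches[1] if params.get("rvcontinue") == "batch2" else batches[0]

pages = [p for batch in query_all(fake_api, {}) for p in batch["pages"]]
print(pages)  # ['/1', '/2', '/3']
```

Combined with the sort one-liner above, this gets you every page of an index, in order, regardless of the per-request limit.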
Thank you for your wonderful and detailed explanation. I'm looking forwards to seeing the generator when it is done. It sounds very cool. Languageseeker (talk) 01:04, 12 November 2021 (UTC)
Transcluding
Latest comment: 3 years ago · 5 comments · 2 people in discussion
I am trying to transition from using {{page}} to using <pages/>. With your help, this has gone well, but I face a new problem where the book I am working on is missing two pages (discovered late in the game unfortunately). I handled this in my usual fashion which is to import (via JPEGs) the two pages needed from another copy of the book. The pages in question are pp. 412-413 (see Index:The Reminiscences of Carl Schurz (Volume Two).djvu). My problem is I don't know how to transclude the patch pages except by using {{page}}. I have done this for The Reminiscences of Carl Schurz (book)/Volume Two/Chapter 8, but the transition between pp. 411 and 412 is poor (the one between pp. 413 and 414 worked fine since p. 413 ends with a complete paragraph). How do I make this work smoothly without using {{page}}? Thanks for any suggestions. Bob Burkhardt (talk) 17:08, 15 November 2021 (UTC)
@Bob Burkhardt the best thing to do here is to repair the scan by inserting the missing pages, then you can keep it normal. I did this using those two files and then moved the pages into their new homes and adjusted the transclusion (obviously this is easier when it's the last chapter of a book!). Wikisource:Scan Lab exists for this kind of repair - if you notice that a book has a defect, you can get it fixed there and hopefully it'll be done before you get to the pages in question. Inductiveload—talk/contribs17:44, 15 November 2021 (UTC)
Latest comment: 3 years ago · 1 comment · 1 person in discussion
With the stash bug fixed, is it possible to do batch upload of periodical again? I think that The Dial, Volume 75 is a good example of how having scans enables users to scan-back works published in periodicals. Languageseeker (talk) 14:54, 16 November 2021 (UTC)
Unpurgeable stale thumbnails
Latest comment: 3 years ago · 2 comments · 2 people in discussion
Latest comment: 3 years ago · 3 comments · 2 people in discussion
Hi. Do you have any suggestion on how to manage the indentation e.g. in the first 2 lines of Page:Dictionary_of_National_Biography._Errata_(1904).djvu/296? The first line could be managed with {{hi}} but what about the second? If there are no existing templates that can be simply combined without being too hacky, do you have any suggestions for a custom template? Also considering that it would be used all over the place. Thanks Mpaa (talk) 22:11, 19 November 2021 (UTC)
@Mpaa hanging indents are fundamentally a hack of a negative text-indent and a padding or margin to give the first line a space to "hang" into. Usually the two have the same magnitude (but one is negative). So what it looks like you need there is a padding greater in magnitude than the negative text-indent:
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Thanks. A change in {{hi}} to set "text-indent" independently would do, but I can't see a way of keeping a nice interface and compatibility. Maybe a dedicated template would be better then. Mpaa (talk) 20:32, 21 November 2021 (UTC)
Switching from px to em
Latest comment: 3 years ago · 2 comments · 2 people in discussion
I didn't dismiss your advice about using "em" instead of "px". At first when I implemented it, the images were smaller than what I was used to. Then you reminded me about adding "!important" which made a difference, but still the image is 8 pixels smaller when converting pixels to em and measured with a pixel ruler. Could you please look at this page with the two images and tell me what I am doing wrong? Same images two sizes.— Ineuw (talk) 04:30, 21 November 2021 (UTC)
It looks like the font size is actually set to 14px because there's a global CSS rule: .vector-body { font-size: calc(1em * 0.875); }. 14px * 32 = 448px, which is indeed the width of the box.
Exact sizing on the order of 10% is not incredibly important anyway, because it strongly depends on the user-agent. You might see it as 14px = 1em in this exact case, but that presupposes a "base" ratio of 1rem = 16px, which is only a common default on desktop browsers and may be wildly different elsewhere (even on desktop, if someone has changed their font size). Inductiveload—talk/contribs18:36, 21 November 2021 (UTC)
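The arithmetic in the reply above can be checked directly. The 16px root size is the common desktop default mentioned there, not a guarantee:

```python
# Checking the 448px box: with the common desktop default of 1rem = 16px,
# the .vector-body rule gives an effective em of 14px.
root_px = 16            # typical desktop browser default for 1rem
body_scale = 0.875      # from .vector-body { font-size: calc(1em * 0.875); }
em_px = root_px * body_scale
box_px = em_px * 32     # a 32em-wide box
print(em_px, box_px)  # 14.0 448.0
```

Change `root_px` (as a user with a larger font setting effectively does) and the pixel width of the same 32em box changes with it, which is the point being made about user-agent dependence.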
ppoem and left overfloat
Latest comment: 3 years ago · 4 comments · 2 people in discussion
Looks like you quickly run into trouble with the 2em left/right gutters when you stuff arbitrary strings (i.e. wider than 2em) into these. Not sure whether that's "Don't do that then.", adding template params to control the gutter widths, or pointing the issue to index CSS. Possibly we could guesstimate the needed width based on the string length and handle it automagically, but that sounds… hacky. The one I ran into is a once-off that can squeak by as is, and I haven't seen that in any of the other works I tested ppoem with, so I'm going to leave it to simmer for a bit. Xover (talk) 20:36, 21 November 2021 (UTC)
Note before adding template params: you can also control gutters with CSS using ws-poem-left-gutter and ws-poem-right-gutter. Maybe params are better, but like you say, let's see what boils over! Inductiveload—talk/contribs22:24, 21 November 2021 (UTC)
IME and ULS and Beta Code (and interested in testing?)
Latest comment: 3 years ago · 3 comments · 2 people in discussion
A while ago (5 Sept) I mentioned I was writing a user script to do easy keyboard entry of Ancient Greek polytonic script, using w:Beta Code, which was also the basis for your template experiment {{betacode}}.
And it was <!*wonderful*!> with multiple options for visual feedback and all the easiness of Beta Code. I used it a lot at EL on a bit of ambition.
But it felt a bit hacky and was a lot of work, working around Wikimedia and doing all my own UI visual displays. And I remembered your comments about ULS and jquery.ime.
After several days of discovery and coding I've convinced jquery.ime to do Beta Code, with both strict rules and deliciously loose rules. It isn't as beautiful, but still wonderful to use, and the Wikimedia people will not actually hate it.
But it *is* rather hard to demonstrate 'live'. I've worked out a way to bootstrap the development copy of the new rules into online wikisource, but it requires a localhost HTTP server and many (>10) steps to force it into the live wikisource IME for use and testing.
Thing is, a Beta Code implementation using jquery.ime's very basic tools was complicated. I had to write programs to generate all the rules. Even the strict rules set is 200 rules of magic (~275 total). The loose rules - very kind to users - is 1200 rules of magic (~1275 total). The previous largest jquery.ime rules set was 179 rules. The wikimedia people might have heart attacks?
Soon I'll be ready to submit a pull request at Github, but I understand they are kind of slow(?) merging pulls into that project, and then merging jquery.ime updates into Wikimedia. So figuring out how to excite people is a goal?
@Shenme this looks amazing. I don't have much advice for getting things into the code base other than "be very, very patient" and then "be more patient" and then "don't kick up a fuss and just take the beating when you have followed existing code and documentation but still get told it's wrong and go round the review process dozens of times" because the process can take months and months and can also be incredibly frustrating to the point that I sometimes seriously consider kicking a dustbin across the room and giving up on anything that needs a merge. "Fortunately" the deployment pattern for userscripts and gadgets is so broken that I keep going back to merges as a way to get any code deployed anywhere near sanely. You have probably also noticed this by trying to deploy something locally.
I will try to dig into it, but my initial feeling is that using combining diacritics will allow a substantial saving in rule lists.
Even if we cannot get it into the upstream, we can first deploy here as a gadget, and if the ULS people still can't be persuaded of its utility, we might also be able to squeeze it into the Wikisource extension. Inductiveload—talk/contribs07:50, 15 November 2021 (UTC)
Latest comment: 3 years ago · 7 comments · 2 people in discussion
I've frequently been getting this message "The time allocated for running scripts has expired." on the main MC challenge page and none of the books are showing. Could you please take a look? Languageseeker (talk) 23:39, 18 November 2021 (UTC)
Latest comment: 3 years ago · 14 comments · 2 people in discussion
I found a book that did not have a printed copyright, that is probably from 1931 but maybe from 1915 and I scanned it. I scanned it to jp2 and duped that into png and uploaded them here.
If necessary, I will enter it into a process to get approval or no, although, that didn't go so well here, so maybe at the Commons but some steering me into the right direction would be appreciated.
As I see it, worst case, the scans are not approved and go with the files to be released in 2029. That is not so bad (it's not great though). So, the files are at the Commons. Included with them is an advertisement for this book from another book, which is here. There is an ad for the other book also in that cat; I really thought I saw the image that is on that ad in this book, but have failed to find it.
A couple of other things. If you prefer jp2, I can manage that (I found a place to upload). Also, if you would like a set of dups that is really good for Tesseract, I can make those (I just need to dust my script off) -- I don't have tess on this computer, so I cannot provide the text files.
Also, thanks for cleaning up that toc! I should have looked at that also because all of the other multi-page things needed tweaking also.--RaboKarbakian (talk) 03:44, 18 November 2021 (UTC)
((also, I scanned the blank pages because I scanned all of the odd pages first and the even pages second and I can juggle, but maybe not so well. The next book is a 2 volume set from 1896!! They are some of the most beautiful books I have ever had my hands on!!))
OK, well I can make a DjVu out of that easily enough (is that the question?) Some hints for the next scan:
For the images, do try to get into the spine a bit more, because the scanner has a very low depth-of-field and loses focus in the gutter: phab:F34753564. This is hard on a flatbed, but if you're hoping to scan a lot, you may consider a book scanner with an "edge bed" like an OpticBook (I don't know if that's actually any good in terms of image, I just know the 3600 model is cheap on eBay, I don't have one myself). Alternatively, the time-honoured DIY method is a (good) camera on a tripod and a sheet of glass to flatten the page. This is slow and fiddly, but gets excellent results, and if your camera, lens and lighting are good, it is probably better in terms of colour reproduction than whatever manky electronics they shove into a consumer-grade scanner. I wouldn't want to scan a whole book like that, but maybe it's practical for only the images. The next step up is a v-cradle scanner like the IA themselves use, but that's some serious DIY unless you're really scanning a lot of books. The actual optical setup is still a real camera + glass sheet, it's just a question of throughput at that stage.
Do try to rotate the files before uploading. If you're already batch-converting JP2→PNG with ImageMagick, it's in the same command: mogrify -format png -rotate 90 *.jp2
You do not need special versions for Tesseract - it has a built-in binarisation step that will handle these images perfectly well. It's when the image has poor contrast between text and background (e.g. dark paper, light print, bleed-though, bad scanning or something like that) that you might consider a pre-OCR processing step.
I have a great camera, (I worry sometimes that it is worth more than me!) and I saw the howto at IA but maybe I should get my brother onto the hardware. My inter-library loans will end soon though.
Just for your growing ability to look at images and figure out what happened between inception and delivery: I blocked off an area of the scanner bed (using the dividers from a box of tea bags, actually) because the edges of the glass don't scan, even if their rules make it look like they do. I had the scanner software rotate the even numbered scans as I used the same area for both sides of the open book. My conversion script is just a format conversion, nothing more. So, half of them were scanner rotated only. The covers were apparently on the scanner bed in the right direction. To clarify: the only rotation done by me was via the scanner, rotating the even numbered scans.
My script for preparing the scans for tess was the Gutenberg recipe. I was reminded of it by the scans at Hathi, which have been very clearly posterized in a truly harmful way to what had been beautiful little line drawings (some sniffles, a couple of tears shed, the suppression of a hatred for how this world works, etc.)
There is an interesting thing about this book. I have two copies: one is a ninth edition; the other, less old and less with blown reds...., from Lippincott. Both from libraries. The unnumbered pages -- the pages in each copy are in a different order. All the worse because it is a poem I kind of know. So, </whine> and thanks for all the information and analysis!--RaboKarbakian (talk) 14:25, 18 November 2021 (UTC)
Yes, I can see you haven't rotated them, because they're (nearly) all sideways. My point is you can fix that in a few seconds with ImageMagick: mogrify -rotate 90 *.png, or even do it in the same command as the format-shift to PNG.
Generally, I'd say don't use any processing software that comes with a scanner, because it's pretty universally junk. Scanners are for getting the images onto a computer where you can handle them with real tools.
You do not need to process these particular images at all to feed them into Tesseract - that's only needed if the default binarisation fails. tesseract image.png - -l eng works just fine. I'd say you could try it for yourself with the OCR tool like this, but since the image is sideways, it won't work. Inductiveload—talk/contribs14:40, 18 November 2021 (UTC)
I mentioned that the two books did not match and that there are no page numbers. I compared the books and the book I scanned was clearly out of order. So, I reuploaded into the namespace and fixed it -- however, for whatever reason, the correctly ordered book was two pages less than the book I scanned. The whole process was very disturbing. The files can be renumbered if necessary (I started again from the end when I got a little mess up....) I think that the one black and white plate is actually an end page, and if I were to organize them, I would have put the last color image before the last text page, but this is the one book matching the other.--RaboKarbakian (talk) 20:35, 18 November 2021 (UTC)
To the best of my knowledge, Yes. Also, thank you so much for your time and patience and your sharing of knowledge throughout this endeavor of mine.--RaboKarbakian (talk) 00:33, 19 November 2021 (UTC)
I completely missed this!! I had to get slapped around at the commons before I saw it, even. I hate it when I am the one who sucks. So, thank you so much! I got uploading to do, if they would stop slapping me around at commons.--RaboKarbakian (talk) 17:03, 29 November 2021 (UTC)
out of order books
I don't know where to take this so, please allow me to just spew here. I had those two Night Before Christmas books, and the one I scanned was out of order by the words. I think it was in order, however, for the pictures. It is early in my day and I am not together enough to look at it.
The pictures in The Night Before Christmas (Rackham) don't match the words. I am thinking about redoing it, in my User space, not scan backed, so the pictures can go with the right words. It is a little jarring to my sensibility as it is.
Also, post transcribing, I think the 1915 version must have been beautiful, where this old (probably 1931) scan I used looks like a reassembled thing. This scan I made, is just another fuzzy proof of the earlier edition.</spew>--RaboKarbakian (talk) 13:46, 30 November 2021 (UTC)
I don't know the reason I needed to type words about this, but I did. This was the best place to type them. I should have typed "thanks for the djvu", etc. The good djvu got me thinking. No action from you required. Thanks for the consideration.--RaboKarbakian (talk) 13:53, 30 November 2021 (UTC)
@Languageseeker I can do it, but could you please do the page list first? DLI books are often pretty poor scans, so I'd rather not convert it and only then find that there are pages missing or something. Also the easiest thing to do for these is to re-import as a DJVU with https://ia-upload.wmcloud.org, which will also do Tesseract OCR on the way through, so the result will be the same as if I did it. Inductiveload—talk/contribs17:06, 29 November 2021 (UTC)
Hmm, OK so actually it does not look like that import is going well at all! The DLI books are such a mess: they import the PDF and then the IA extracts to JP2, but because they're going "backwards" from PDF to JP2, the files end up encoding a ton of compression noise resulting in a >2GB tarball. I did think the IA import would work though, if it doesn't, that should be fixed. I'm downloading the PDF now (taking a while at 125kB/s): I'll convert/OCR it and upload if the IA-Upload does indeed fall over. But I'd still like a pagelist if you could :-) Inductiveload—talk/contribs17:56, 29 November 2021 (UTC)
I decided to scrap the idea. It seems like too much work for a poor quality scan. HT also has scans of the same edition, but they are behind a protection wall. I've requested that they remove the protection. Let's see how that goes. Sorry for the bother. Languageseeker (talk) 21:40, 29 November 2021 (UTC)
@User:Languageseeker it's not especially hard to convert, just takes time to download. I don't think HT usually respond to such requests (oddly enough, Google are good about that) but do let me know if they do. In the meantime, I'll happily upload a DjVu from one of the DLI scans if you can do the pagelist and let me know if it's complete. Inductiveload—talk/contribs21:48, 29 November 2021 (UTC)
It turns out it was the opposite problem: four duplicate pages. I corrected the page list. I haven't flipped all 1,500+ pages, but a sampling indicates that the scan is complete. Languageseeker (talk) 05:30, 30 November 2021 (UTC)
You do not usually need to flip every page - you can generally tell with good confidence that the scan is complete if the page numbering is correct at the end of the file. If pages are missing or duplicated, the page numbers will be out of step. In this case, you can tell the pages are probably correct because Page:An American dilemma the Negro problem and modern democracy (First Edition).djvu/1541 is correctly numbered as 1483. Of course, pages could still be jumbled or duplicates balance out missing pages, but it's very very rare for that to happen "perfectly" so that the pages still line up after the defects. Inductiveload—talk/contribs10:14, 30 November 2021 (UTC)
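That rule of thumb (scan position minus printed page number is a constant offset once the numbered pages begin) is easy to express as a quick check. The first sample below is the real pair from the comment above (djvu/1541 carrying printed page 1483, an offset of 58); the second sample is hypothetical:

```python
# Rule-of-thumb completeness check: in a complete scan, the difference
# (scan position - printed page number) stays constant through the
# numbered pages; a missing or duplicated page shifts it.
def constant_offset(pairs):
    """pairs: (scan_position, printed_page_number) samples."""
    return len({scan - printed for scan, printed in pairs}) == 1

samples = [(1541, 1483), (1000, 942)]  # both give an offset of 58
print(constant_offset(samples))  # True
```

As noted, this can be fooled if defects happen to cancel out exactly, but sampling a few points through the book makes that very unlikely.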
Thank you! I might run it for Jan 2022 because December is quite crowded already. Thank you for the information about the pages. I'll keep it in mind for the future. Languageseeker (talk) 22:20, 30 November 2021 (UTC)
We need some stats, stat!
Latest comment: 3 years ago · 7 comments · 3 people in discussion
Well, or not so "stat". But somewhere (MC summary? Some diff I saw somewhere in any case) you referenced the phetools page stats as a point of reference for the MC stats. So before the whole matter drops from my frazzled mind, I thought I'd mention that I have on my plan doing some work on the stats code in phetools at some point in the not too distant future (maybe). Prime mover is improving the graphs in various ways, and secondary is cleaning up the way the stats are persisted (it's currently dumping a stringified Python datastructure to a text file, and reading and exec()ing it on next run). But once I go digging there may be opportunities for other improvements, such as anything the MC might need. If you give me a wishlist I can try to keep it in mind whenever I get around to that project (over the Christmas hols at the earliest, and absolutely no promises on anything). Doing all the MC stats in phetools would probably require more "on-wiki knowledge" than is sane to implement there, but anything generic / cross-project-applicable that would help or remove friction is fair game. Xover (talk) 06:41, 2 December 2021 (UTC)
@User:Xover that's a kind offer. I don't think the MC actually needs much in the way of stats support from Phetools, the bot is chugging along happily enough.*
What I do actively miss is the ability to get the "uplifts" for the whole wiki on a year-by-month and month-by-day basis. For example, if I want to check the figures for November, I have to check on 1st Dec (and even then, that's only actually accurate for 30-day months).
* The change-tag-based progress history API will make it much easier in future to get change histories for sets of pages, but that's stuck in review/deployment hell, so who knows.
As for getting the sets of pages, the API for getting all pages in an index and using them as a generator exists now (docs here), but hasn't been deployed this week as expected because the RelEng folks are "distracted" and nothing is being deployed. Inductiveload—talk/contribs08:59, 2 December 2021 (UTC)
Ok, more work then! Backward compatibility, especially for the first one, will be needed to support old wikis. Mpaa (talk) 23:07, 2 December 2021 (UTC)
@Mpaa Out of interest, who is using ProofreadPage outside of the WMF deployment zone (and therefore are on older versions)? Also, if you have smart ideas about useful API stuff, there's a whole column on Phab for it, and I'll be happy to try to make a dream come true if I can. Inductiveload—talk/contribs23:12, 2 December 2021 (UTC)
In practice I guess no one, but in my experience I always got comments about compatibility when adding stuff to PWB, as PWB supports a certain range of wmf-versions. Sure, I will keep the API stuff in mind. Happy to see the extension has some more people to help Tpt. Mpaa (talk) 23:25, 2 December 2021 (UTC)
Latest comment: 3 years ago · 2 comments · 2 people in discussion
Hello,
Thank you for the tip; it is not always easy to know the best way to make the text look good with all those templates lying around. I am actually pretty proud of myself for remembering the existence of {{fraktur}}! ^_^ Ælfgar (talk) 21:11, 2 December 2021 (UTC)
@Ælfgar the learning curve is pretty vertical, isn't it? Just thought I'd let you know sooner rather than later.
BTW, normally you should reply to messages where they are left, otherwise it's just confusing. In this case, just reply on your talk page and I'll see it in my watchlist. Or you can ping me with @[[User:Inductiveload]] and I'll get a notification (just like you will get when I save this, because I pinged you at the start of the reply). Inductiveload—talk/contribs21:17, 2 December 2021 (UTC)
Indexes as atomic units
Latest comment: 3 years ago · 11 comments · 2 people in discussion
Another random drive-by drop of a thought unchewed…
That MC display of works with the progress bar under a pretty cover image and some metadata would be useful in several contexts. Think Featured Texts type things, or users' personal bragging rights on their user page. But since live querying for that is kinda gnarly… And conceptually in the same vein as the pagination API… If we view the Index more as an atomic unit, of a "collection" type, of which the Page: pages are members (rather than loosely-coupled references to external resources)… Perhaps it would make sense to track aggregate status in the Index, analogously to how a File page will tell you (and expose through the API) how many pages it has?
It wouldn't really be worth the effort for just a pretty display, but this "treating the index as a unit" thing has been nagging at me for a long time now and shapes other stuff. Like the Pagination API and what it enables.
I'm not sure it's a good idea, what it would actually be, its consequences, or its feasibility of implementing. But since you have your fingers deep down in the guts right now I figured I'd sorta seed the idea that might at some point sprout into… something. Xover (talk) 12:12, 4 December 2021 (UTC)
@Xover: if I understand correctly, we do already have Lua access to "aggregate status" (i.e. the counts of pages of each status) via the Index object (this drives {{progress bar}}). We also have access to the index fields via Lua, which can be used for site-local stuff like transclusion status.
As for "real" API, we don't yet have index-level stats but it's on the list (and not too hard since the internals exist already). We do have index field access (in JSON, even) via Index data API.
We will also need more formal access for both Lua and API access to the Index data JSON, since that controls what index fields mean what. This can already be done with both Lua and API, but not in a dedicated implementation-independent way.
[ itym {{index progress bar}} :) ]

Modulo caching, mw.ext.proofreadPage triggers a DB query on the associated Page: pages every time :pagesWithLevel() is called, doesn't it? I was thinking more along the lines of a page property on the Index: page, that generally only triggers work when the Index itself is rerendered (edited/purged). Imagine the performance if, say, Billinghurst decided they wanted an MC-like display of the bragging list on their user page (have you seen that monster?!?), or any other context where it could legitimately get fed a "MC times ten"-sized number of indexes on a single wikipage.

The idea was that if the Index is conceptually an atomic unit that "owns" the Page: pages, it would make sense for it to actually track those counts (conceivably by edits to the Page: namespace updating the count when the level changes). That way it's a straight fetch of ~4 props from the Index, or a batch fetch of same for several Indexes. It may be waaay over-engineered for its utility, and the performance characteristics and scalability of the status quo may be far better than I imagine it to be, but, well, I rarely let insignificant little things like reality get in the way of a good technological philosophising. Xover (talk) 21:05, 4 December 2021 (UTC)
@Xover (phone posting so it'll be brief) actually it's only a single DB lookup for the stats of an index, since the pr_index table already maintains a running total of its pages. Also it's cached by the Lua interface too, so once you ask for a stat in a single page render, the others come for free.
The perf hot-spot is actually handling the tens of thousands of "dependency" pages (preview an MC page to see this in action), but since I fixed it last week we should be able to at least handle a hundred or so indexes on a single page render. In theory, we could add an "approximate" mode where the dependencies are skipped, at the cost of needing manual purging, since the page will no longer be sensitive to individual page status changes. Inductiveload—talk/contribs21:29, 4 December 2021 (UTC)
Ah. So you're way ahead of me, as per usual, in that the index already tracks the status in the pr_index table. But creating a dep on, currently, 19592 pages just to display a pretty gallery of 52 works is pretty insane. Trying to edit that page is downright painful!

A "lazy" mode that doesn't update automatically seems pretty necessary, yes, and should probably be the default / strongly encouraged in the short term. If every Wikisource set up a MC patterned on ours we'd be talking a performance hit that would show up on the infrastructure graphs.

But, without any familiarity with the code, it seems to me that this is a problem that screams for a "push" style solution. Is it feasible to update the pr_index counts when a Page: page is saved? A if ($page_is_new) { update_pr_count(); } if ($status_changed) { $old_level_total--; $new_level_total++; } kind of thing, in the vicinity of the code that handles the page quality tags etc.? It'd create contention on pr_index, but only per index, and that'd be, what, at most 10 people editing concurrently even for 1000+ page works (EB1911 and similar), so even a direct lock ought to be reasonable, and there has to be some kind of "one at a time" async queue mechanism in MW somewhere that could be used for this in a pinch. If async, the risk would be multiple status-changing edits to the same Page: page that get dispatched out of order, so there'd have to be some purge-time magic to update all in the brute-force way, but I have trouble imagining that'd happen very often.

Hmm. In fact… this problem seems really similar to updating large categories; this case just has a much steeper curve (categories grow linearly, while we multiply by number of pages). Could we reuse some of the machinery, or just approaches, from the category code? Maybe not. Based on what little I know about how those are handled, it seems plausible there are literal cron jobs running out-of-band maintenance scripts to update those. But I haven't seen the good old well-known update problems with cats in half a decade or so, so they may have actually fixed that in a way that could be copied or co-opted for this purpose.

There are edge cases galore with such an approach, but nothing a brute-force-on-purge couldn't fix, I don't think.

Anyways… Enough whiteboard quarterbacking from me. I have some possible use cases for this, but no pressing need (and no time even had there been a need), so I'm just bouncing stuff off you to see if there's anything at all sticky down in this spaghetti bowl. I'm not expecting you to go rearchitect the world based on my half-arsed musings, is what I'm saying. Xover (talk) 08:22, 5 December 2021 (UTC)
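The push-style bookkeeping proposed above can be illustrated in a few lines (Python used for clarity; the real extension does its equivalent against the pr_index table in PHP, and the class and method names here are invented for the example):

```python
# Keep per-index counts of each proofreading level and adjust them in O(1)
# on each save, instead of re-scanning every Page: page of the index.
from collections import Counter

class IndexStats:
    def __init__(self):
        self.levels = Counter()   # proofreading level -> number of pages

    def on_page_save(self, old_level, new_level):
        """Called when a Page: page is created (old_level is None) or its
        status changes: decrement the old bucket, increment the new one."""
        if old_level is not None:
            self.levels[old_level] -= 1
        self.levels[new_level] += 1

stats = IndexStats()
stats.on_page_save(None, 1)   # new page, "Not proofread"
stats.on_page_save(1, 3)      # proofread it
stats.on_page_save(None, 3)   # another page arrives already proofread
```

The point of the sketch is the amortisation: each edit does constant work, so reading the totals later is a single cheap fetch rather than a scan.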
@Xover: "But creating a dep on, currently, 19592 pages just to display a pretty gallery of 52 works is pretty insane." You're not wrong, but those deps are actually required if you want the progress bars to be "live". Otherwise a change to one of the 19592 (!) pages will not propagate to the progress bars. But for "finished" works, it barely matters whether the progress bar is live. Actually, for a finished work, the progress bar is pretty pointless anyway: looking at the index work status field for Proofread/Validated status is more useful.
The pr_index counts are indeed updated when a Page: page is saved: that is why it is a single DB lookup to get the counts - the many updates are amortised across the many saves. However, there's no (current) way to "sensitise" a page like an MC gallery to changes on each of the possible pages in the indexes, existing or not, without adding a template dependency on the page. It would be better if there were a way, as you describe, to allow the index page to "push" to interested pages that the counts have changed, rather than having the individual pages do so. Something is updating pr_index on save; however, I do not know whether MediaWiki has a mechanism for marking an update like that so it invalidates renders, or whether it's something an extension can do, or what. Inductiveload—talk/contribs 14:16, 5 December 2021 (UTC)
Oh, hmm, I see. This is the inverse of the "widely used template" problem: instead of 100k pages depending on a single template, it's a single page depending on 100k "templates" (which the parser would bail on because it can't do it performantly and hence has hard limits for). There must be some facility for it, because those 100k pages using the single template do get updated eventually. But it is not always reliable, so null edits are sometimes needed (or were at some point needed, anyway).

So long as it's just the MC, and it doesn't grow significantly, you can maybe just about squeak by with this approach. But then it's a pretty specialised solution, and in any case the limitation ("do not use for more than ~50 works comprising more than ~10k pages") ought to be glaringly documented. But that just makes me more convinced that your proposed "lazy" mode ought to be the default, and if the "live" mode is even available it needs explicit parser limits built in. Bot-purging the MC pages a couple of times a day would seem to be a reasonable tradeoff (how fast do you really need those to update?), and the interactive performance on those pages is… well, the Performance Team would quite literally send the ninja squad in black helicopters to give you a talking-to if it ever hit their radar.

But this brings to mind the task flying around Phab somewhere (Tech Decision Forum maybe?), triggered by the latest insanity from the WMF: Wikipedia of Functions. The grand idea seems to be outright algorithms stored in a dedicated wiki that are reusable on other projects, so that, say, Wikipedia can refer to "that Wikidata query for demographics", "fed into this Wikipedia-of-Functions function to calculate a ten-year average", and spit out the result (but global/cross-wiki Modules/Templates are apparently an impossibility). Due to the obvious scalability issues they're discussing implementing this asynchronously and having the actual MW parser just spit out a placeholder that gets filled in by client-side JS (how they expect to handle the new obvious performance issues with that approach is beyond me).

That's years away from deployment as yet, but it might be one approach to explore. For a lot of the use cases for index status progress bars you don't actually need the pre-rendered cached page to have up-to-the-minute data; so maybe fire JS onLoad that asynchronously, but serially, updates the counts. You could even build in TTLs and caching to speed it up and avoid stampedes. Not as elegant as having it all handled by the parser, but right now it turns out the parser isn't exactly doing a graceful ballet with this either (the coughing and asthmatic wheezing is kinda spoiling the performance). Xover (talk) 15:01, 5 December 2021 (UTC)
Certainly it's a topic that needs more work, but the initial implementation seems "somewhat functional". Anyway, if they send the black helicopters for me, they can first consider that the only reason a complete noob who learned PHP a couple of months ago is even writing this shi...uh...stuff on their own, and probably getting it hilariously and egregiously wrong, is because "The" WMF is not interested in doing its own homework. So, yah boo sucks.
I have considered client-side loading for the MC stats in particular, but I'd really like to avoid that for various reasons, including the moving-part count, bus factors, laziness, the dread prospect of getting that reviewed if we wanted all WSes to have it, etc. The current "bot updates a Lua table, which triggers a re-render" approach appears to be working tolerably well. And that, indeed, is an "async" update: the stats can be a bit stale, currently by up to 2 hours, since that's the bot run frequency. We do not have any direct API to generate the data for that, which is why it needs a bot anyway.
So, if we had a "lazy" mode, it might still make sense to keep the current MC in "keen" mode, and then drop to lazy mode once it becomes archival (or even just drop the progress bars entirely). Inductiveload—talk/contribs08:27, 6 December 2021 (UTC)
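The "bot updates a Lua table" arrangement described above might look roughly like this on the bot side. This is a sketch only: the shape of the Lua table and the target module name are assumptions, not the Monthly Challenge's actual data module.

```python
# Serialise fetched index statistics into a Scribunto data-module page.
# Saving the generated text (e.g. via pywikibot) to something like
# Module:Monthly Challenge/data is what triggers the re-render; the
# module/table layout here is hypothetical.
def to_lua_data_module(stats):
    """Render {index_title: {level: count}} as a Lua 'return {...}' page."""
    lines = ["return {"]
    for title, levels in sorted(stats.items()):
        cells = ", ".join(f"[{lvl}] = {n}" for lvl, n in sorted(levels.items()))
        lines.append(f'    ["{title}"] = {{ {cells} }},')
    lines.append("}")
    return "\n".join(lines)

text = to_lua_data_module({
    "Index:Example.djvu": {1: 10, 3: 40, 4: 2},
})
```

Because the stats only change when the bot saves the module, the page render stays cheap, at the cost of the (up to) two-hour staleness mentioned above.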
That would require me to construct a coherent thought and do some actual research. Which sounds… ambitious. :) Xover (talk) 08:10, 7 December 2021 (UTC)
Of Periodicals and Pagelist
Latest comment: 3 years ago · 1 comment · 1 person in discussion
I saw your comment on Phab, but I thought that this might be a better place for a discussion. I understand your concerns for the gnomes. They work tirelessly to keep this place running. However, when it comes to pagelists, I think that the benefit of having the periodical scans on WS outweighs the downside of a backlog. Realistically, if we are to upload all the volumes of the general-interest periodicals that are in the PD, they would amount to several thousand volumes. No one user can or should have to create the pagelists for all of them. Indeed, some of them will also require the finding and insertion of missing pages. That being said, having these scans will enable users to proofread articles from them. The sheer difficulty of uploading periodicals causes users to either skip them or proofread them outside of WS and upload them as non-scan-backed versions. I also don't think that the WS model is that a user has to do all the work by themselves. It takes time to find the best scans for a work, more time to create the metadata, even more time to upload them, even more time to create a pagelist, and a huge amount of time to proofread the volume. Each one of these steps can be done by a separate user. Batch-uploading the volumes, without placing all the burden of creating the pagelists on either yourself or myself, will enable more users to help out. Together, it will be done faster than either of us can do it alone. Languageseeker (talk) 02:10, 7 December 2021 (UTC)
John Curtis
Latest comment: 3 years ago · 3 comments · 2 people in discussion
Latest comment: 3 years ago · 14 comments · 3 people in discussion
Hi! Whenever I open any page of Index:The Origin of the Bengali Script.djvu, four transclusion tabs are displayed at the top (I have this feature installed). One tab is for the actual transclusion page, and the other three are Main Page, Main Page/sandbox and Main Page/sandbox2. And in those three pages, a Source tab is displayed at the top, pointing to the POTM index. It seems to me that Module:PotM/data behaves like a transclusion and as it is invoked on the Main Page and its sandboxes, our software treats the matter as transclusion. A user recently raised the issue in the global wikisource Telegram forum. Can you do something about this? Hrishikes (talk) 04:00, 19 December 2021 (UTC)
This is actually caused by the progress meter (I think), since that has a dependency on the pages, which counts as a "transclusion" even if there is no actual content presented. AFAIK this is a limitation of the MW core handling of parser dependencies: there is no way to declare a non-transcluded dependency. Or maybe there is, but I don't know of it.
I can work around it with a heuristic like "no sandboxes or Main page", which should cover 99% of cases (since there's nowhere else I can think of in mainspace a page could be "para-transcluded" to).
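That heuristic could be as simple as the following sketch (the sandbox naming pattern is an assumption based on the pages named above, and the real fix would live in the gadget's JS or the extension, not in Python):

```python
# Drop the Main Page and its sandboxes from the list of "transcluding"
# pages before building the transclusion tabs, so para-transclusions via
# the progress meter don't show up as real transclusions.
def filter_para_transclusions(titles, main_page="Main Page"):
    keep = []
    for t in titles:
        if t == main_page or t.startswith(main_page + "/sandbox"):
            continue   # heuristic: these only "transclude" via the meter
        keep.append(t)
    return keep

shown = filter_para_transclusions([
    "The Origin of the Bengali Script",
    "Main Page", "Main Page/sandbox", "Main Page/sandbox2",
])
```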
No, I got it via MediaWiki:TranscludedIn.js. And the Telegram group for Wikisource Global Community is quite old; Wikidata, Commons etc. also have their own global groups in Telegram, and all these are very lively groups. You can answer the matter in Telegram, by going to 1. Hrishikes (talk) 13:33, 19 December 2021 (UTC)
This particular matter has been solved in Bengali Wikisource, by keeping the Main Page in Wikisource namespace, instead of NS0. That's why bnWS is 100% scan-backed (see at 2), but enWS cannot be, even theoretically, because of the presence of the Main Page in NS0. Hrishikes (talk) 13:45, 19 December 2021 (UTC)
@Hrishikes: Not that it has any particular relevance to the issue at hand, but enWS currently has 206 966 mainspace pages that are not scan-backed. Since 2019 we also increased our number of Page-namespace pages that are "Not Proofread" from about 500k to over one million (for comparison we have 1 434 979 that are "Proofread" and 524 976 that are "Validated"). In fact, over a week or two in March or April this year we increased this backlog by something like 150k pages (meanwhile, the Monthly Challenge is averaging something in the range of 2000–5000 pages per month processed; not only will it take more than a decade to scan-back the current backlog, but the backlog is growing way faster than we can reduce it). Just sayin´… Xover (talk) 14:32, 19 December 2021 (UTC)
@Xover: -- I don't know whether you are aware, but I don't work only in bnWS, I also work here, at least from time to time. So I am aware of the general picture, if not the exact statistics, of the situation described by you. But my point was theoretical, as clearly mentioned, that it is not theoretically possible to make this site 100% scan-backed as long as the Main Page is in the transclusion zone. Anyway, that was a secondary point only. The primary point was the "Source" tab displayed at the top of the Main Page, which came up for discussion in a global Wikisource forum. I am not techno-savvy, so I could not respond there. That's why I asked IL here. Regards. Hrishikes (talk) 15:52, 19 December 2021 (UTC)
@Hrishikes: I should apologise for butting in: it was an unrelated rant; you just happened to mention it just when I was looking at the latest depressing numbers. :) I am indeed aware of your work here, and it is very much appreciated! Xover (talk) 17:02, 19 December 2021 (UTC)
@Hrishikes: You're cross-loading that script from mul:MediaWiki:TranscludedIn.js, so nobody here on enWS has edit rights to the script. But perhaps Candalua can help? It was written in 2012 and last updated in 2015 so it's probably ripe for some modernisation in any case (most scripts do, as a result of technological changes in the surrounding environment). Xover (talk) 07:53, 20 December 2021 (UTC)
@Xover: -- No need to edit that script. If you go to the Main Page, you will see the "Source" tab at the top, which is not dependent on user-script. That was the item that came up for discussion; and that source tab can be seen by anyone, without any script. The script cited by me just gives the reverse scenario from the Page: namespace, no need to do anything about that script. But can anything be done about the Main Page issue? Hrishikes (talk) 08:03, 20 December 2021 (UTC)
@Hrishikes: Ah. Yes, that is indeed a different issue; and that one would need to be fixed in the Proofread Page extension itself, as that's where the script that adds the "Source" tab comes from. I'm not sure, though, whether that kind of special-casing would make sense; especially since the problem is currently enWS-specific (so far as I know).

@Inductiveload: Trying to generalise… Would it make sense to say that the Main Page (and its subpages) should never get a "Source" link, since it is by definition not a PRP-managed page (cf. the arguments usually advanced for moving it to projectspace)? There are various other special-case rules for the Main Page littered around already, so it certainly wouldn't be unique to PRP. I'm not aware of any Wikisourcen that use PRP for their Main Page, and would hence by design want the "Source" tab to show up there, but I guess one can never know. Or is this another camel-tongue on the side of soft-deps/lazy mode? That probably wouldn't work very well for this use case, I don't think, since for that particular progress bar you'd presumably want relatively "live" updates. Xover (talk) 09:16, 20 December 2021 (UTC)
Indeed, the main page is special at quite a deep level, as there is Title::isMainPage() provided in the core PHP. So it should be a simple fix to check that when doing the source page on the server side in TransclusionPagesModifier.php.
As for the script, I'm not sure what the completely cross-subdomain way to detect a link to the Main Page is from the client side only. From the Main Page itself, you can check mw.config.get( 'wgIsMainPage' ), but if you're on some random page and merely holding a string that says "Hauptseite" on deWS, I'm not 100% sure what you do then. You can query MediaWiki:Mainpage for the title, but that's a bit of a hack (though trivially cached). I wonder if the best option might be as drastic as adding eiexclude=mainpage to API:Embeddedin to filter it out on the server side and provide the functionality for all API users with a similar corner case. Inductiveload—talk/contribs 09:35, 20 December 2021 (UTC)
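One API-assisted answer to the "holding a string that says Hauptseite" problem: any wiki will report its main page title via action=query&meta=siteinfo&siprop=general (the "mainpage" field), so a client could fetch that once per wiki and compare. A sketch of the comparison half (the normaliser is a deliberate simplification; real MediaWiki title normalisation also handles namespaces, case rules, etc.):

```python
# Compare an arbitrary title string against the wiki's configured main
# page title, as obtained from siteinfo. Simplified normalisation only.
def normalise(title):
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:] if title else title

def is_main_page(title, siteinfo_mainpage):
    return normalise(title) == normalise(siteinfo_mainpage)

# e.g. deWS siteinfo reports "Hauptseite":
on_dews = is_main_page("hauptseite", "Hauptseite")
```

This stays cross-subdomain because the only per-wiki input is one cached siteinfo query.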
Hmm. So an extra condition for the test in ProofreadPage::onOutputPageParserOutput(), along the lines of if ( $outputPage->getTitle()->inNamespace( NS_MAIN ) && !$outputPage->getTitle()->isMainPage() ) { … }? I thought that happened in ext.proofreadpage.article.js, but I see now it just unconditionally adds a "Source" tab whenever it is loaded and there is a #ca-nstab-main present.

For JS, if one was to start futzing with the API code, wouldn't the more consistent approach be to expose Title::isMainPage() in mw.Title? Not that lists/generators couldn't use richer filters to push more of the stuff server-side, but… Xover (talk) 11:08, 20 December 2021 (UTC)
I think it's one of those things where the actual solution is to do both (actually: all three, there's the Lua mw.title API too). There are times when you want to be able to do it all client-side (e.g. when slinging strings about which don't come from an API query), there are times when you just want the server to hand you the Right Thing (TM) in the first place and there are times when you're in a template or module and you need it to be done at render time. Inductiveload—talk/contribs11:38, 20 December 2021 (UTC)
I have absolutely no interest in joining another proprietary chat platform, especially one which requires me to register with my phone number and doesn't allow a secondary account like Telegram. As far as I am concerned, if people have anything to say publicly they can do it in such a channel: on wiki or in IRC (or in some channel bridged to one of those).
Latest comment: 3 years ago · 3 comments · 2 people in discussion
Hi, I have problems with Google OCR. It often doesn't work (my bot has made more than 200 attempts to save some pages with Google OCR). Right now it isn't working on Spanish Wikisource or here. I have to make very many attempts to OCR works. I don't know if the problem is mine or the MediaWiki system's, do you know? Shooke (talk) 15:26, 18 December 2021 (UTC)
I don't know, it's something between the mediawiki thumbnail service and the Google OCR service. The bug is tracked at phab:T296912, but I don't have any further clues to what's going on beyond what I wrote there (and I don't have access to the logs to figure it out for sure). In the meantime, you could try the Tesseract OCR instead? Inductiveload—talk/contribs 15:53, 18 December 2021 (UTC)
Thanks for answering. Regarding Google's OCR, it is very good with Spanish; Tesseract has quite a few shortcomings for this language. On December 19 it was perfect, no errors, but today it isn't. It seems that the error is intermittent. Shooke (talk) 02:38, 21 December 2021 (UTC)
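For an intermittent failure like this, a bot is usually better off capping its retries with exponential backoff than hammering the endpoint hundreds of times. A generic sketch (the `do_ocr` function below is a stand-in for whatever call the bot actually makes, not a real OCR API):

```python
import time

def with_backoff(fn, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry an intermittently-failing call, doubling the wait each time."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise                       # give up after the last attempt
            sleep(base_delay * (2 ** i))    # 1s, 2s, 4s, ...

# Fake OCR call that fails twice, then succeeds:
calls = {"n": 0}
def do_ocr():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("503 from OCR backend")
    return "recognised text"

result = with_backoff(do_ocr, sleep=lambda s: None)  # skip real sleeping in the demo
```

Capping attempts (here at 5) also avoids the 200-attempt pile-ups mentioned above while the backend is down for good.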
Index:The Poems of John DonneVolume 2 - 1906.djvu
Latest comment: 3 years ago · 2 comments · 2 people in discussion
@Jan.Kamenicek the issue is not related to that. The problem is that if you use un-named parameters, whitespace is not removed. Lines that begin with whitespace are made into "pre" code-like blocks. You can fix it by removing the space before (shudder) the {{Dotted TOC page listing}}, or by using the parameter name (1=).
However, I personally think that an even better solution would be to use the wst-toc-aux class like this, since then you don't have to break the table up or use templates that don't export properly:
{{TOC begin|width=100%}}
|+ {{larger|CONTENTS}}
{{TOC row 2dot-1|class=wst-toc-aux|Note (not in original TOC)|vii}}
{{TOC row 2dot-1|Foobar|18}}
{{TOC end}}
That solution looks good. Thanks, I think I will use it. However, the extra space did not cause any problems until recently, so hopefully there are no more pages like this where this problem has suddenly arisen. --Jan Kameníček (talk) 12:54, 22 December 2021 (UTC)
I don't know. It was on my watchlist; it was about nuthatches (a lovely little bird with a terrible nasal song). It was deleted, so I don't know if I created it or added to it or moved it, and I don't know what was there. So, I eliminated everything I don't care about and arrived at the one thing I do care about, which is a wikidata link.
Reasons to make a page for a sitelink include: biological descriptions of species, genera, or families, which, even if they are not the type species, still get cited often. Mathematical formulas, especially in original proofs or as manipulated from the original for use. Recipes. I am sure there are more, but this is off the top of my head right now. But anything that could be isolated from the text and stand alone as a "something". Type species, type genera and type families are first and foremost on that list. Biologists have been referencing type species since the 1600s, and now that technology can really make the reference (not just an "L. 1753"), rules made by "English literature majors" with deletion tools (and other implements of destruction) prevent it from happening.--RaboKarbakian (talk) 18:08, 22 December 2021 (UTC)
@RaboKarbakian Most of those things are things that you make a reference for, not an item. A paragraph of an edition is not a standalone concept that would be modelled by a Wikidata item. It is able to be represented in a triple store, of course, just like anything can be, but, for the same reason that paragraph 67 of Spot the Dog Goes to the Supermarket, 2nd ed. does not have its own item, it is not worthy of a Wikidata Q-id.
Even if you did make a Wikidata item for your snippet (perhaps it's very famous like a verse of the Bible*), you should not make an orphaned, duplicated Wikisource page simply so you can have a sitelink for that Wikidata item. Wikisource's layout is not driven by any perceived need to fill in sitelink boxes at Wikidata. If you need to reference a specific text location at Wikisource for a WD reference, then you can use the reference URL (P854) and an anchor.
See commons:Category:Aporocactus flagelliformis at the top, it's TOL. The last item is the species name and behind that is a little L. That L. links to commons:Carl von Linné. They have been making that "link" since the 1600s, they being the biologists. It is there so the original can be found. There is another author behind that, as that species has had an updated and improved description. It could just as easily link to the text of the first (and updated) description of that if it were here. A text I worked on here that had a bunch of those "firsts" had them so embedded in the text that it was just easier to make a stand alone page for the species. And also, the first genus which needs the species, in that same book. It was an important book, a not so interesting portion of a series of actually interesting books, and the book as a whole is good to have.
So, I recommend that you spend some years with tree-of-life information and how it is set up, the different trees and the authorities, and how the information fits into the several trees and branches out, and then express your opinions about what qualifies as a good stand-alone or not. Or, skip that couple of years and trust someone who has done that. TOL is kind of interesting in the database-management sense. Also, how much even science is politicized. I often had to decide to go with the name being used by the group closest to where the species was found. Like, New Zealand for Antarctica, etc. It is difficult for me to call anything I need to make a decision about a "science", but there you have it. Things need to have a name if they are to be discussed. But I am getting away from the point.... It would not be orphaned in the greater sense of the word. It is more likely that the complete book would be orphaned, due to not being able to link the important/specific parts at wikidata. If Wikidata accepted anchored web links, this discussion would not be happening.--RaboKarbakian (talk) 19:06, 22 December 2021 (UTC)
@RaboKarbakian I don't understand your point. I understand Linnaean classification and biological authorities perfectly well, thank you. Commons having a category for Disocactus flagelliformis is not a surprise. The appropriate Wikisource sitelink for Disocactus flagelliformis (Q310976), if any, would be Portal:Disocactus flagelliformis, if it existed, and not some random paragraph in an 1881 book where it was mentioned. What you might do is link or reference some claim (maybe a reference on taxon name (P225) or described by source (P1343)) to the Wikidata item for a WS edition and qualify that with a page number, and a direct URL, perhaps with an anchor.
Your assertion about not accepting "anchored links" is incorrect: Wikidata can and does store full URLs:
Both of these will accept fragments (i.e. #...) as well as query parameters.
If you mean that sitelinks cannot have fragments, then you are correct, but, despite however many years of whatever it is, you may have misapprehended what a sitelink represents: it is not a shorthand URL, it is a statement that the sitelinked pages all relate to the same concept. The only item that could reasonably have a sitelink to a page Early Spring in Massachusetts (1881)/Nuthatches-1854-02-24 would have been an item about that exact paragraph. That hypothetical item would probably have, amongst others, main subject (P921) → Sitta (Q858577) (or maybe a specific nuthatch species if you knew which).
If you did actually have such a narrowly-defined item to fill a structural need (and you probably do not have such a need, but let's say you did), it still wouldn't justify either artificially splitting up WS works or creating a "shadow realm" of redundant transclusions at WS just because that would allow a sitelink at Wikidata. Inductiveload—talk/contribs19:50, 22 December 2021 (UTC)
Batch Downloading from Modern Journal Project
Latest comment: 3 years ago · 3 comments · 2 people in discussion
That said, I can (and will) add a "mjp" source to shortcut it. But it'll probably still just use the PDFs; otherwise it's an order of magnitude slower, and the MJP PDFs seem OK for quality and OCR.
Latest comment: 3 years ago · 3 comments · 2 people in discussion
I've had my eye on All The Year Round for a while. From the experience with previous magazines, I'm not entirely sure all the faffing about that results from trying to combine HT with IA is worth it. The IA has a complete SIM set of All the Year Round. However, it consists of 2,048 files. Major ouch. I was wondering if it would be possible to batch-scrape the links. For every id, there is also an entry_meta.xml where <volume></volume> maps to volume; <issue></issue> maps to issue, or can be "CONTENTS" for a Table of Contents; <date></date> is a bit trickier because it can either be year-month-day or a text year (e.g. Christmas 1859). Do you think this is possible? Languageseeker (talk) 16:32, 26 December 2021 (UTC)
@Languageseeker you mean can you scrape the contents of that IA collection and construct the information for the upload (ie the same data as presented in the spreadsheets) from that? If so, it should be possible. The real question is are the SIM sets going to give us what we want, or would we rather prefer other sources and backfill from the SIM data when needed? Inductiveload—talk/contribs23:04, 26 December 2021 (UTC)
Yes, in essence, it would transform the 2,048 files into one Excel file that can then be manually verified and filled out. I've thought about it quite a bit, and we're basically dealing with SIM vs Google scans in many cases. The Smart Set SIM scans show that the OCR produces excellent results, which is all that matters. The SIM scans have more background noise, but the Google scans have more aggressive processing that can obscure or remove parts of the images. The SIM scans are also more likely to be complete. I think that in almost all cases, the SIM set might actually produce better results with fewer scan repairs needed. Honestly, SIM will work and save an enormous amount of labor from having to track down 2,048 issues.
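Turning one entry_meta.xml into a spreadsheet row could look like the following sketch, using the tags as described above (the sample XML's outer element name is an assumption; "CONTENTS" markers and free-text dates like "Christmas 1859" are passed through unchanged for manual review):

```python
# Parse the fields described above (<volume>, <issue>, <date>) from one
# entry_meta.xml into a dict suitable for a spreadsheet row.
import xml.etree.ElementTree as ET

def parse_entry_meta(xml_text):
    root = ET.fromstring(xml_text)
    def field(tag):
        el = root.find(tag)
        return el.text.strip() if el is not None and el.text else ""
    return {
        "volume": field("volume"),
        "issue": field("issue"),   # issue number, or "CONTENTS" for a ToC
        "date": field("date"),     # "1859-12-25" or free text ("Christmas 1859")
    }

row = parse_entry_meta(
    "<entry><volume>2</volume><issue>CONTENTS</issue>"
    "<date>Christmas 1859</date></entry>"
)
```

Looping that over all 2,048 downloaded entry_meta.xml files and writing the rows out with the csv module would give the one verifiable sheet discussed above.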
Latest comment: 3 years ago · 8 comments · 2 people in discussion
Sorry about the mix-up with the volumes. Must have been a product of tiredness. What should we do about the volumes that only exist as SIM sets? Should I make a batch file to upload them individually, or is there a way to combine the issues into volumes? Languageseeker (talk) 11:54, 23 December 2021 (UTC)
I've looked and can't find them anywhere. I'll make a batch file to upload the individual issues in the next few days. As always, thanks for your help. Languageseeker (talk) 12:58, 23 December 2021 (UTC)
Okay, sounds good. You're welcome :-) Lippincott's is coming next. I'm getting my money's worth from the ISP this month!
In general, we can upload single issues perfectly well. Just set an issue heading and I'll make it work somehow. The bulk of the grind is downloading and converting the images in the first place, and uploading the file with some kind of sane-enough metadata. Combining multiple indexes can happen "later", if and when complete volumes deign to appear.
It's not really a huge issue if we have a bit of a mishmash of volumes and issues - it's just easier not to if vaguely possible. Manually recombining issues and faking volumes up is more work than just living with the mishmash. After all, after transclusion, it doesn't even show at all! Inductiveload—talk/contribs13:08, 23 December 2021 (UTC)
So, it seems that the MJP pdfs are b&w, while the images are full-color. Do you think it's possible to download the images, make them into DJVU, and upload them? Languageseeker (talk) 23:35, 30 December 2021 (UTC)
I technically could, but it'll be a pretty huge amount of downloading and processing, plus I'll need to write a backend for the download script to scrape the IIIF manifest. The high-res loader works and as far as I can tell the magazine was printed in black and white anyway, so it's not really losing any detail. Images should never be cropped from the PDF (or DJVU) anyway. What's the use case here? Inductiveload—talk/contribs00:17, 31 December 2021 (UTC)
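As an aside, walking a IIIF manifest for the page-image URLs is only a few lines; a sketch assuming the standard IIIF Presentation 2.x sequences/canvases/images layout (the example.org URLs are placeholders, and the MJP's actual manifests may differ in detail):

```python
def image_urls_from_manifest(manifest):
    """Collect image URLs from a IIIF Presentation 2.x manifest dict.

    Follows the sequences -> canvases -> images -> resource nesting
    defined by the IIIF 2.x spec; unknown/missing keys are skipped.
    """
    urls = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                resource = image.get("resource", {})
                if "@id" in resource:
                    urls.append(resource["@id"])
    return urls

# Hypothetical two-page manifest; real manifests come from the
# institution's IIIF endpoint as JSON.
sample = {
    "sequences": [{
        "canvases": [
            {"images": [{"resource": {
                "@id": "https://example.org/iiif/p1/full/full/0/default.jpg"}}]},
            {"images": [{"resource": {
                "@id": "https://example.org/iiif/p2/full/full/0/default.jpg"}}]},
        ]
    }]
}
urls = image_urls_from_manifest(sample)
```

The bulk of the work would still be the actual downloading and DjVu conversion, as noted above.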
Latest comment: 3 years ago · 7 comments · 2 people in discussion
Sigh. P-wrapping is the bane of every attempt at sanity. An all-one-stanza (p)poem spanning three pages, which needs LST due to intermixed textual notes, had its middle page wrapped in a p tag (with attendant vertical spacing) unless I manually fudged the LST end tag onto the same line as the end of the ppoem. Might be worth keeping in mind if mysterious "stanza breaks" start popping up. Xover (talk) 09:27, 25 December 2021 (UTC)
So the first day of Christmas is a paragraph in a parse tree is it? I'll keep my eyes half open for bad interactions but at some point I'm just going to grumpily shift the blame onto "Mediawiki" in general and gesture vaguely at "the parser". Merry everything! Inductiveload—talk/contribs23:09, 26 December 2021 (UTC)
Heh heh. Merry merry to you too. And in this case it is most definitely MW's fault: there's nothing we can do on the content side to affect this, except pray someone will tackle T253072 (cf. T134469) eventually. PS. If you have any suggestions on how to handle /164 and /165 I'm all ears. I really don't want to do them as dumb images-of-text, but I can't think of any way for us to even approximate that layout without full-on arbitrary webfont support. Xover (talk) 07:17, 28 December 2021 (UTC)
@Xover whoops, sorry, I forgot to reply here. I really don't think there's much more we can do here. Even if we were to ship the modern equivalent fonts, it's still not quite the same as the original, and the exact form of the font is the content in this case. Even if you could channel the spirit of William Caslon through FontForge and generate a perfect reproduction font, we can't actually ship it. I'd say a nice clear image is as good as you can reasonably get.
Yeah, no, I wasn't concerned with perfect fidelity: I'm not that geeky about fonts. But all those do have rough equivalents in modern computer fonts (most if not all available in free beer-ish variants) so in a perfect world… *sigh* But, in any case, I meant the technical approach with the overlain transparent text to give cut&paste'ers and TTS systems something sensible to work with. Do $(".wst-iot-text").css("color","red"); in your console to see what's going on. I may move this to {{iot}} ("Image of Text") if I decide it has sufficiently general applicability to these cases. Xover (talk) 12:34, 30 December 2021 (UTC)