User talk:Inductiveload/Archives/2020
Please do not post any new comments on this page.
This is a discussion archive first created in , although the comments contained were likely posted before and after this date. See current discussion or the archives index.
Img float
Hi. Do you think it would be possible to fix {{Img float}} to avoid the div-span-flip lint errors? There are quite a few related to this template, occurring when divs are passed as parameters, e.g. see Page:Geological_Evidences_of_the_Antiquity_of_Man.djvu/151. Thanks. Mpaa (talk) 21:09, 4 January 2020 (UTC)
- It would be possible, but it would break all the cases where the image floats within a paragraph (e.g. Page:Coin's Financial School.djvu/93), which is what it was designed for. It originally didn't do centre-aligned images, specifically because they have to be div-based in order to centre the image in the container. That was added later, and appears not to work very well. There is no need for a template for most centre-aligned paragraph-breaking images, you can just use [[File:foo.png|centre]] or there is also the much more complex {{FreedImg}} (though I'm not 100% sure what that template really is for in the common case, it certainly is popular).
- I'd say perhaps we should consider stripping all uses of {{Img float}} that centre the image, converting them to FreedImg or something and then removing the option. Inductiveload—talk/contribs 14:53, 5 January 2020 (UTC)
large image not centering?
Hi, I’m using {{large image}} on Page:ONCE A WEEK JUL TO DEC 1860.pdf/527 at less than full width and for some reason the image isn’t centered; can you find what’s wrong? Thanks … Levana Taylor (talk) 21:50, 24 January 2020 (UTC)
- @Levana Taylor: (I am not Inductiveload) Please consider removing "|style=margin: 20px 0;". You will find it is overriding the centring. 114.78.171.144 22:50, 24 January 2020 (UTC)
- Indeed. The image container has a margin "0 auto" set on it by its own CSS, which is what produces the centring. Setting "20px 0" overrides this. You have two choices: use "20px auto" to override, but retain the left/right auto, or "margin-top: 20px; margin-bottom:20px" to override only the top/bottom margins. They have identical effects. Inductiveload—talk/contribs 23:04, 24 January 2020 (UTC)
- Agreed. I missed the subtlety of the top/bottom margins. Listen to Inductiveload! 114.78.171.144 23:31, 24 January 2020 (UTC)
- Thanks! Levana Taylor (talk) 23:47, 24 January 2020 (UTC)
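The two options described in this thread can be sketched roughly as follows. This is only an illustration: the class names and the width are made up for the example, and the real container rule used by {{large image}} may well look different.

```css
/* Hypothetical stand-in for the template's own container rule */
.example-image-container {
    width: 60%;      /* assumed: a block narrower than its parent */
    margin: 0 auto;  /* auto left/right margins centre the block */
}

/* Option 1: shorthand override that keeps the left/right auto */
.example-image-container.option-one {
    margin: 20px auto;
}

/* Option 2: longhand overrides that only touch top and bottom */
.example-image-container.option-two {
    margin-top: 20px;
    margin-bottom: 20px;
}

/* The problem case: the shorthand resets left/right to 0, so centring is lost */
.example-image-container.not-centred {
    margin: 20px 0;
}
```

In the template call itself, this would presumably correspond to passing "margin: 20px auto;" in the style parameter instead of "margin: 20px 0;".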
css help needed
On Page:ONCE A WEEK JUL TO DEC 1860.pdf/636 I have got the images looking the way they should, on my desktop browser at different window widths, including the flex box apparently working. But I think the coding is too complicated and may have unwanted side effects. Can you untangle it? Sorry to bother you Levana Taylor (talk) 02:15, 30 January 2020 (UTC)
- Sorry for the delay, I could have sworn I had replied to this. The only real question I have is why not put all three into a single flexbox, rather than 1 then 2? In the original, they all seem to have equal "weight". I also made a little tweak to avoid needing a block center when what you really needed was just alignment. Other than that, it looks about as simple as it's likely to get to me. Inductiveload—talk/contribs 08:53, 5 February 2020 (UTC)
A second opinion : Page:Chronological Table and Index of the Statutes.djvu/363
Taking a hint from elsewhere I reformatted this into the format currently given.. I'd like a second view, mostly on whether I should further split up the See and See Also type entries. Thanks ShakespeareFan00 (talk) 15:48, 3 February 2020 (UTC)
- I am not quite sure what you mean by "split up" or "elsewhere". If you mean should you use CSS classes for styling rather than inline Wikitext, an HTML/CSS purist would say "yes, never mix semantics and styling", the pragmatist might say "no, that's a lot of effort and the wikitext will be a mess". I would tend to fall on the latter side of the fence. Using ; for its intended purpose of the <dt> is good semantics, but the dl/dt/dd system in HTML is limited, and you have to inject further meaning through your own classes, which quickly gets unwieldy when it's entered manually and not, say, generated from a database. E.g.:
; Heading
: Statute title
:: <span class="_stat_ref">Reference</span>
::: <span class="_stat_seealso">See also: <span class="_stat_page">2</span> <span class="_stat_para">(b)</span> <span class="_stat_index_link">Charities</span></span>
- Possibly the "wiki way" is to use a template for the see also lines, and if calling formatting templates from that template is too many transclusions, then use CSS instead within that template. But you might still surpass the transclusion limit, in which case, splitting the index might be the next remedy:
; Heading
: Statute title
:: {{statute_table/ref|Reference}}
::: {{statute_table/see also|2|b|Charities|....}}
- Hope that helps, but I realise that might not have been your question. Inductiveload—talk/contribs 09:08, 5 February 2020 (UTC)
- That definitely answers my question. I was also trying to figure out how to reformat things so I don't need to use the colon syntax and a {{left}} to get the various indented entries. Elsewhere someone said that wasn't a good idea. ShakespeareFan00 (talk) 12:42, 7 February 2020 (UTC)
Some entries are like this
;Heading (e.g. Borough Municipal.)
:Subheading (e.g. Officers.)
::Sub heading (e.g. Mayor.)
::: Topic (e.g. Term of office not to extend more than 5 years.)
:::: Statute citation
:::: Statute citation
::: Topic (e.g. Disqualification of, grounds.)
::::Statute citation
::::Statute citation
Which my current nesting (and styles) doesn't take into account. As I said elsewhere, I might need to rethink the nesting a bit. ShakespeareFan00 (talk) 12:42, 7 February 2020 (UTC)
- Nesting of dl's is possible to handle in CSS:
/* top-level headings */
._stat_list > dl > dt {
    color: blue;
    font-variant: small-caps;
}
/* first level defs */
._stat_list > dl > dd {
    color: red;
}
/* second-level defs */
._stat_list > dl > dd > dl > dd {
    color: orange;
}
- Using lists is basically the same, just change dl to ul and dd to li. You'll probably still need a template for intra-line formatting (eg right-floating of refs if you do that, linking of refs, plus the italics, SCs and bolding in the see-also lines) so consider per line templates too. Inductiveload—talk/contribs 13:07, 7 February 2020 (UTC)
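For illustration, the list-based variant mentioned above might look roughly like this. It is a sketch only: it reuses the _stat_list wrapper class from the example above and assumes the nested lists come from ordinary * and ** wikitext, which nests each inner ul inside its parent li. The colours are placeholders.

```css
/* first-level items (the equivalent of the first-level dd) */
._stat_list > ul > li {
    color: red;
}

/* second-level items: a ul nested inside a first-level li */
._stat_list > ul > li > ul > li {
    color: orange;
}
```

A plain ul has no counterpart to dt, so the heading line would need its own element or class in this variant.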
- And I am back to writing a series of article-specific templates, albeit with a common stylesheet (sigh). I was told quite passionately previously that that approach (using templates like {{st-ie}} and {{st-it}}) was the WRONG way to do this. I was also passionately told that the {{TOCstyle}} approach was too convoluted. I am now finding that the definition list method is also unable to reliably cope with the use-case. I'm sorely tempted to revert my existing efforts back to RAW OCR, and format the entire section as PRE text manually. I appreciate Mediawiki isn't perfect, but if it's not up to the task, then I am regretfully having to say it's time to do something else. ShakespeareFan00 (talk) 00:08, 9 February 2020 (UTC)
A possible implementation
https://jsfiddle.net/q8kh9rwm/ - How that could be implemented in TemplateStyles and per-line templated code I'll suggest someone else figures out, because it's certainly NOT straightforward with all the nesting. ShakespeareFan00 (talk) 19:50, 9 February 2020 (UTC)
- This is over-complex if you're going to put a class on the leaf element: dl > dd.__topic > dl > dd.__statute. Just use, in this example, .__statute and don't use that class for anything else. You can have multiple classes per element too. Inductiveload—talk/contribs 19:32, 10 February 2020 (UTC)
- Hmmm.. I did it that explicitly because of the nesting needing margin adjustments. If it can be simplified feel free, if you have the time. ShakespeareFan00 (talk) 20:43, 10 February 2020 (UTC)
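The simplification suggested above could be sketched like this. The margin values and the second class name (__repealed) are invented purely for illustration and do not come from the linked fiddle.

```css
/* Over-specific: the selector repeats nesting that the class already implies */
dl > dd.__topic > dl > dd.__statute {
    margin-left: 2em;
}

/* Simpler: a class on the leaf element is enough on its own */
.__statute {
    margin-left: 2em;
}

/* Several classes on one element, e.g. class="__statute __repealed" */
.__statute.__repealed {
    text-decoration: line-through;
}
```

If the nesting depth really does change the margin needed, a second class per depth is one option that keeps the selectors flat.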
"Flat" content - (followup to Page:Hill's manual of social and business forms.djvu/108 discussion.
In the trail of the Scriptorium discussion you mentioned that MediaWiki converts line-feeds (in what it sees as mid-paragraph) into whitespace (the "space" after an em dash in my attempted style using a definition list). It apparently does this even if all the content on either side is inside tags.
I was considering, in respect of this use case (and others), whether it was worth asking for a <flat> tag to tell the parser to interpret the line-feeds in a different way (once it had converted the wikitext aliases for certain HTML tags), and to remove the spaces between adjacent tags (such as, in the use case given, between consecutive DD and DT tags).
Different interpretation of line-feeds is already implemented for the POEM extension.
Example syntax:
<FLAT>
;Term 1
:Definition A
:Definition B
</FLAT>
Would produce
<DL><DT>Term 1</DT><DD>Definition A</DD><DD>Definition B</DD></DL>
... Leaving the eventual user's browser client to apply its own line-feed rules determined by its internal styles, as well as any CSS supplied from the Mediawiki side (like the TemplateStyle I was trying to apply). Having this flattened model available might prove useful in other situations as well (provided that it was ProofreadPage-aware, of course).
Can you think of other use-cases, or situations where FLAT might be useful or make it easier to do things? (generating certain types of other list or table came to my mind.) ShakespeareFan00 (talk) 09:02, 7 February 2020 (UTC)
- I can't immediately think of a situation where this would really help, other than this exact case. And even then, you're mostly just almost-lucky that the dl/dt/dd is semantically useful and has special shortcut Wikitext markup, but fell short of providing a magic solution for this particular page. I'd say it's just too much of a corner case to merit a whole extension just because you want to use ;/: for a handful of glossaries formatted on one line per entry. Feel free to ask at Phabricator, but I don't think the maintainers will feel it's worth the upfront cost and ongoing overhead to design, build, audit, deploy, document and maintain such an extension. Also, it would just become confusing, I think.
- As I already demonstrated, this can already be done easily enough in your use-case with a pretty minimal template, without bolting on domain-specific extensions. Perhaps you could write a TemplateScript tool to assist you, or use AutoHotKey or a text editor with macros, or whatever. Once it's done, it's done, and it doesn't need ongoing back-end software support.
- The "problem" is that Mediawiki's special dt/dd markup doesn't suit this exact page. It wasn't really designed for this, so it's kind of fair enough. In the same way, say, ---- markup doesn't really suit WS works. I'd stop getting so hung up on this kind of thing, find a workable solution (e.g. a simple work-specific template) and just move on. A template would be more flexible than definition lists too, as you can adjust in a single place. Inductiveload—talk/contribs 09:44, 7 February 2020 (UTC)
- See {{synonym list/s}}{{synonym list/e}}{{synonym}} and the associated stylesheets. (At some point I wrote {{p-collapse/s}} to handle this situation for other types of non-list list.) If you wanted you could perhaps generalise p-collapse to also be viable for other block type tags? ShakespeareFan00 (talk) 09:49, 7 February 2020 (UTC)
- Synonym and friends are totally un-documented and their scope and API is a mystery to me. They also don't look like generic templates, they look like they apply specifically to this work, so they're named overly broadly, IMO. I'm not sure what you're really asking for here. What "other block type tags"? In what context? What are you trying to achieve? What's wrong with a hypothetical {{Hill forms def|Word|definition}} template and a single CSS file for that work? Why do you need so many templates and stylesheets? Why make it so hard for yourself? Inductiveload—talk/contribs 09:58, 7 February 2020 (UTC)
As I am apparently incapable of communicating in a way that contributors here can clearly understand, I'm walking away. ShakespeareFan00 (talk) 10:13, 7 February 2020 (UTC)
- Well, my thought is to rename the above templates (which I've done)
- If CSS had a ::first-word rule for P elements... then this would be about two lines of code to implement. (A pseudo CSS selector for ::first-word can be implemented with some additional code: https://github.com/FWeinb/nthEverything does it, but again this would definitely need a Phabricator ticket for that kind of support to be added.) ShakespeareFan00 (talk) 11:47, 7 February 2020 (UTC)
- I continue to stand by my position, stated at the Scriptorium and above, that a simple work-specific template would do all you want, be flexible, be maintainable, will not exceed the template limits, and will not result in some over-generic template that is just a work-specific template in a wig crufting up Category:Formatting templates. If in the end it turns out it can be subsumed by a generic template, it's a simple subst/replace operation.
- Re first-word, that uses Javascript: implementing a JS gadget and loading it site-wide to save a few one-off keystrokes on one specific work is just not a practical solution. Plus it'll only work for this exact situation (where there is no space in the headword), you've introduced an interdependency between content and style (which is antithetical to the philosophy of CSS in the first place), it won't work if a user's JS is off, someone has to maintain it, and it will not work on exported files. Again, this is gravitating towards overly complex remedies for very specific issues. You do not need to bolt on random greebling to the core WS infrastructure for every work-specific formatting. Inductiveload—talk/contribs 11:56, 7 February 2020 (UTC)
- My thoughts exactly.. I've put the first word in my templates as a Span anyway, so it's identified if the CSS spec itself changes to support a first-word selector in the future (I added a comment to that effect) - The reason for importing the styles in the wrapper template rather than the entries is so there is LESS effort for the de-duplicator, but I don't think based on what you've said previously that it would be a big issue.
P-collapse (more generic discussion)
Currently {{P-collapse/s}} works by collapsing the top and bottom margins for a P, but instead applies a 1em margin for the wrapper DIV. However there is not an inconsiderable amount of formatting on English-Wikisource that uses DIV based blocks to wrap content for formatting. Currently this will not be collapsed (and the template concerned wasn't designed for that). By "genericising" I meant extending the stylesheet, so that the margin collapse between content blocks also applied to those "formatting" blocks as well (assuming they didn't set their own margin arrangements). ShakespeareFan00 (talk) 12:30, 7 February 2020 (UTC)
- (Aside: You could reasonably query why I'm not using a POEM tag. This is because its modification is applied on a per-line basis and not on a per-paragraph basis, which is less desirable in certain use cases, like index entries with multiple lines and indents which should reasonably re-flow when the page width changes on different devices.) ShakespeareFan00 (talk) 12:30, 7 February 2020 (UTC)
- And I am aware, documentation of my own template creations is also lacking in some respects. ShakespeareFan00 (talk) 12:30, 7 February 2020 (UTC)
- Can you present an example of what you are trying to achieve here? Inductiveload—talk/contribs 13:09, 7 February 2020 (UTC)
- Examples of P-collapse in use.
- Page:The_Pilgrim's_Progress,_the_Holy_War,_Grace_Abounding_Chunk1.djvu/325
- Page:The_Prince_(translated_by_William_K._Marriott).djvu/320
- Page:Treasure Island (1909).djvu/81
- Page:A Dissertation on Reading the Classics and Forming a Just Style.djvu/272
- Page:Concepts for detection of extraterrestrial life.djvu/6
- As I've not encountered a good use-case for block collapses yet (possibly {{center}}ed headings within pages that contain index entries for a given work, as in the example pages I linked), I think this idea is over-thinking again, and we can end the discussion.
ShakespeareFan00 (talk) 13:29, 7 February 2020 (UTC)
TOCstyle - and the "leader" code.
In response to an issue with the D.P model in TOCstyle, I'd forked the code into an /experimental version Module:TOCstyle/experimental which I'd then used to implement a sandbox version of the main template Template:TOCstyle/sandbox, which attempted to use a templatestyle stylesheet Template:TOCstyle/TOCstyle.css to cut down on the sheer amount of inline code generated.
It's limited to a . leader, and fails currently to allow for the other types of leader that might be encountered, or indeed leaders which would be more than a single character (a limitation of the original template.). The commented out rule in the stylesheet, was based on a proposed specification for adding leader support in HTML/CSS which was still draft and thus not implemented in major browsers yet.
I don't feel confident in working with this further (given certain recent events), but before deleting it figured I would ask someone technically competent for a second view.
ShakespeareFan00 (talk) 18:54, 7 February 2020 (UTC)
- Sorry, I don't think I have the know-how needed to deal with this thing, it's too mysterious to me. Generally, I find dot leaders to just be too much hassle to be worth the effort. Perhaps if CSS leader properties become standardised it would be nice, but all the workarounds I know of just don't feel productive to me.
- Note, you will not be able to have parameterised leader characters with TemplateStyle CSS, so that would still need to be inline (how D?P is right now). I would not hold your breath for parameterisable TS CSS, it sounds like a technical minefield to me. Inductiveload—talk/contribs 19:29, 10 February 2020 (UTC)
- I have my doubts about D?P as well, given the 'bloat' that occurs. ShakespeareFan00 (talk) 20:32, 10 February 2020 (UTC)
United States Statutes at Large/Volume 33/Fifty-Eighth Congress/Treaties and Conventions
Hi Inductiveload. I've just noticed your edits on the Treaties & Conventions page of US Statutes at Large Vol. 33. I may be missing something, but the List of Treaties & Conventions is here in Part 1 of Volume 33, or the full index of both parts of Vol. 33 is here at the end of Part 2. No list of treaties as per your edit exists within the body of the text where you have placed it, so my inclination would be to delete it. Please let me have your thoughts. Thanks. CharlesSpencer (talk) 12:39, 3 March 2020 (UTC)
- So clearly I missed that the "original" list of treaties was in a separate volume! D'oh!
- However, it's quite tricky to lay that out hierarchically without either the "fake" aux TOC, or a repeated transclusion of the Volume 1 TOC at United States Statutes at Large/Volume 33/Fifty-Eighth Congress/Treaties and Conventions. Looks like the proclamations are all on one page, but I'm not sure piling over 200 pages comprising nearly 30 separate sections onto a single page is going to work (and if you did do that, it would still need the TOC to be accessible, as well as anchors and redirects for incoming links to individual treaties). Inductiveload—talk/contribs 13:01, 3 March 2020 (UTC)
Hey,
You may (or may not) be interested that I found a much better scan of File:Transactions of the Royal Asiatic Society - Volume 1.djvu (and 2 and 3), so I built new DjVus (with OCR) and set up basic indexes for them.
- Index:Transactions of the Royal Asiatic Society - Volume 1.djvu
- Index:Transactions of the Royal Asiatic Society - Volume 2.djvu
- Index:Transactions of the Royal Asiatic Society - Volume 3.djvu
Since you set up the first volume I thought you might want to know (and if not, I apologise for the noise). --Xover (talk) 19:40, 9 May 2020 (UTC)
- Nice! That is much better! Cheers, Inductiveload—talk/contribs 12:24, 23 May 2020 (UTC)
PyWikiBot
Original message revision for continuity of discussion: https://en.wikisource.org/w/index.php?title=User_talk:James500&oldid=10202776#Botting_page_creation
- Thank you for your message. I looked at the installation page and what you wrote. It is so complicated that I do not think there is any chance I would be able to use it. I would be more likely to break my device. I am so uninformed that I was under the impression that bots ran from the WMF's toolserver computers. I apologise for wasting your time. James500 (talk) 19:17, 27 May 2020 (UTC)
- What you're comfortable with is of course up to you. You are right in that many bots do indeed run from the Toolserver as then they can run continuously and/or provide Web frontends like ia-upload. But many "ad hoc" bots can just run on your own computer.
- Another thought occurs: you could ask at bot requests if a bot operator is willing to help. But it probably would need approval too, since adding the text layers that already exist in the DjVu is not, as far as I know, a previously approved bot activity. Inductiveload—talk/contribs 20:31, 27 May 2020 (UTC)
Bits and such…
Any particular reason you haven't asked for your old mop back? It's been gathering dust in a dark corner of the broom closet, but the tag with your name on it is still attached. --Xover (talk) 14:01, 3 June 2020 (UTC)
- Thank you for the thought! I have only recently been editing here as a deflection from things I should really be doing "in real life" (and before that due to health issues), though I have to admit, I have recently got a bit more into it than I had planned, because I keep finding interesting tangents to go off on. I'm not really expecting to be able to stick around on a continual basis, but I'm certainly not against picking up the mop, but I would say I'm likely to spend considerable periods not using it. I know this doesn't preclude, I just hadn't gotten to the stage where I think I could substantially benefit from it (although I did find myself needing a page-shift yesterday...)! Inductiveload—talk/contribs 09:32, 4 June 2020 (UTC)
- I think the community could substantially benefit from it, even if participation is intermittent. We are all volunteers and nobody can be expected to firmly commit to participation at any given level over arbitrary spans of time. As you note, we have several admins that only use the tools for the occasional page move on their own projects. I think that's perfectly fine: if someone can generally be trusted with the tools and having them can make them that tiny little bit more efficient, that's sufficient benefit in itself. I reserve the right to periodically dole out gentle pokes to help with backlogs and such, of course, but I do that with or without the bit. :) But, in other words, I can put you down for <s>two Lemon-Ups and four Thin-Mints</s> nomination at AN? --Xover (talk) 12:33, 4 June 2020 (UTC)
- Sure thing, I hope I can make some useful contributions if it goes ahead :-) Inductiveload—talk/contribs 14:51, 4 June 2020 (UTC)
Pronouns?
What pronouns should I use in referring to you in a discussion? Skiasaurus (skē’ ə sôr’ əs) 22:23, 18 June 2020 (UTC)
- I really have no preference. Continue using "they", since what gender I say I am is (hopefully) irrelevant to anything that goes on here? :-) Inductiveload—talk/contribs 10:46, 19 June 2020 (UTC)
John Wesleys
I truly hadn’t noticed—I found the two from separate searches through maintenance categories (Abba, Father, Hear Thy Child in Category:Works with non-numeric dates and Targum Onkelos in Category:Incomplete texts without a source). I hadn’t even noticed the translator’s name. TE(æ)A,ea. (talk) 21:04, 24 June 2020 (UTC).
Admin bit
Howdy,
Welcome back to adminning. I've granted you the bit, and restored your old entry in Wikisource:Administrators#Current_administrators, which lists your languages as "French (basic), German (intermediate)".
Cheers, Hesperian 00:34, 2 July 2020 (UTC)
- @Hesperian: thank you very much! Inductiveload—talk/contribs 10:56, 2 July 2020 (UTC)
- Hi can you massage this a bit to get a consistent styling with css?
- Would you be willing to discuss some kind of convention for how to name per work and per page specific CSS?
I am also considering a 'root' template so that I can store individual work styles using the pageid of the Index page or Page:. (I had previously been doing this with "Table class" styles (example: Template:Table class/3184136.css) to avoid having to use either {{TOC}} or repeated {{ts}} calls.)
ShakespeareFan00 (talk) 16:32, 9 July 2020 (UTC)
- @ShakespeareFan00: I can have a look.
- I don't have any strong feeling about naming other than collisions with existing classes would be rare but annoying, tricky to diagnose and could happen silently at any time if the MediaWiki software introduced a class name we were already using locally. Since all MediaWiki classes appear to start with a letter, my thought has been to use an underscore to denote "this class is not part of the MediaWiki software, it's part of the proofread work". Furthermore, a double underscore could denote "this is a shared class that applies to multiple works", whereas a single underscore implies a work-local class (ideally with some kind of quasi-unique prefix). This way, collisions are less likely. I don't think any more stringent requirements are needed, as long as the class names are unique on a given page and indicate fairly clearly what they are trying to achieve. If it's too much to fit in the class name, at least you have /* CSS comments */. Ideally minimise reuse of class names in different CSS, if nothing else, so it's easier to search for.
- In terms of creating CSS in Index space, I don't think you can, for now, as we talked about here. We need to disable the ProofreadPage edit form and add Index to $wgTemplateStylesNamespaces (or allow users to change the content model). I have defaulted to using Template namespace. This isn't so terrible, as any CSS that works on multiple scans (say CSS that applies to all volumes of a work) belongs in Template anyway. And one day, we can easily move the work-specific CSS into Index space without huge drama.
- I think using page IDs is a rather opaque way of doing it and ties the CSS to a single page, when it's not uncommon for it to apply on multiple pages (as in your example). I just use the templatestyles tag.
- Also, Template:Table class/3184136.css is a very verbose way to say it. You could simply define a class:
._cce_1950_sc_row td:nth-of-type(2) {
    font-variant:small-caps;
}
._cce_1950_sc_row td:nth-of-type(3) {
    text-align:right;
    vertical-align:bottom;
}
- Then add it to the table row:
|- class="_cce_1950_sc_row" <!-- or even just _sc_row -->
| Your || row || content
- Now you don't need to customise loads of CSS for each page (which is against how CSS works anyway - the CSS shouldn't have such a dependency on the row contents, the link between a row and a certain style is done with a class, not hardcoding row numbers in the CSS). Inductiveload—talk/contribs 19:56, 9 July 2020 (UTC)
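To make the underscore convention described in this reply concrete, a stylesheet following it might look something like the sketch below. The shared class name (__sc_cell) is hypothetical and only there to show the double-underscore form; nothing here is an established site convention.

```css
/* Work-local class: single underscore plus a quasi-unique work prefix */
._cce_1950_sc_row td:nth-of-type(2) {
    font-variant: small-caps;
}

/* Shared class intended for reuse across several works: double underscore */
.__sc_cell {
    font-variant: small-caps;
}
```

Both forms are valid CSS class names, and neither starts with a letter, which is what keeps them apart from the class names MediaWiki itself appears to use.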
- WRT to Blue Beard, I added a CSS rule to the cols which does most of it. CSS doesn't help a huge deal with setting starting values, so you might need to do this, if you want both to be ordered lists (this is how Mediawiki recommends to do it).
<ol start=81>
<li>The Juvenile Prayer Boole.</li>
<li>The Private Prayer Book.</li>
</ol>
- There are such things as CSS counters, but I'm pretty sure it would be more fiddly and fragile. Using raw OL elements is a bit ugly, but it's simple and well-defined.
- Also I suggest to add a width to the div-cols so they can collapse neatly on small screens. Inductiveload—talk/contribs 20:33, 9 July 2020 (UTC)
- @ShakespeareFan00: regarding your edit summary "well it would be nice if the rule you set up in CSS actually worked...", what is it that doesn't work? I see a black rule between the columns on desktop and mobile.
- counter-increment isn't as simple as just using what you have used. You also probably need a :before pseudo-element selector and a content property. There's a reason I didn't use it in the end. In my opinion, it's just easier to use straight OL tags with the start attribute. Shoehorning everything into templatestyles CSS is not always the answer. Inductiveload—talk/contribs 22:28, 9 July 2020 (UTC)
- It wasn't a comment on your code, it was a comment on mine whilst debugging..ShakespeareFan00 (talk) 22:31, 9 July 2020 (UTC)
- O, I see, I thought maybe I had done something that wasn't cross-browser compatible and the column rule wasn't working for you. Inductiveload—talk/contribs 22:42, 9 July 2020 (UTC)
- Well it should technically be a ::marker rule (which isn't supported on older browsers anyway). The closest I got based on the rule was https://jsfiddle.net/bjtmy2kL/1/ but it wasn't clear how to get the padding, as setting padding-right on the ::marker element had no effect. ShakespeareFan00 (talk) 08:29, 10 July 2020 (UTC)
- @ShakespeareFan00: ::marker is not a published CSS standard, only a Working Draft, and it's got under 20% browser support. So we can't really use it with any confidence that it'll work for readers, even if it would be better when it has full support. You can still achieve what you want with CSS, it's just quite fiddly: https://jsfiddle.net/bo02mg83/. Note, you need to set the :before element to be inline-block so you can set a width on it, and also a negative margin-left. The exact values for these depend on the content, so a fully generic solution is tricky.
- Note, none of this is MediaWiki's problem - the CSS is still just CSS and MW doesn't interfere with it. The only way it could be MW's problem is if the TemplateStyles extension's sanitiser doesn't handle a CSS property (c.f. phab:T162379). The thing you can "blame" MW for is the inability to add styles or classes directly to an OL, but still use the #-syntax without it generating a nested OL:
<ol style="counter-increment: start 80;">
# Foo
# Bar
</ol>
- This is what means you have to go to TemplateStyles in the first place, so you can write a CSS rule with a descendant selector. But, that's not a bug as such, rather it's an enhancement request for the parser for a very specific use case. You could open an issue if it really matters to you and see if it's possible to change without causing issues for existing pages.
- I see you found a shorthand for setting the list start value in the Wikicode, which I suggest is the better way forward for this page than this fairly esoteric CSS. It's certainly less typing, doesn't need a separate CSS page and is more obvious to other editors. Inductiveload—talk/contribs 09:24, 10 July 2020 (UTC)
Congrats on picking up rights again. Your coding is way better than my scrambling hacks. I am looking to progress this automated linking based on leveraging wikilinks at WD. Could I entice you to look at that and see whether it is possible to move. — billinghurst sDrewth 08:54, 11 July 2020 (UTC)
- Thanks ^_^, though be warned my hacks are also scrambling! I will take a look, but I am slightly unclear, reading that thread, what the current status and/or blocking issue are. I can see the doubled category, but are there other blockers? Inductiveload—talk/contribs 14:04, 13 July 2020 (UTC)
MediaWiki:Common.css cleanup
This rule…
.mw-abusefilter-editor { width: 600px } /* widens abuse filter editor */
…is being overridden by a higher-specificity rule in Vector (sets it to 65%), so it can just be removed outright. If for some reason anybody wants to tweak that we can try to find something suitable for user CSS or make a gadget for it. Only admins can use that editor anyway, so even the number of theoretically affected users is pretty low.
And MediaWiki:Dynimg.css should be safe to just copy to Template:FreedImg/styles.css and then be deleted. I don't see any rule here that should cause trouble by being duplicated while the switchover happens.
Now if you'll excuse me I need to go procure a bottle of expensive Champagne so I'm ready for when the then-empty MediaWiki:Common.css is finally deleted! :) --Xover (talk) 13:21, 13 July 2020 (UTC)
- @Xover: MediaWiki:Common.css is now empty and nothing appears to be on fire right now. I think I'll wait a bit before totally removing it from the MediaWiki NS (I'm not going to delete it right now; I'm collecting "deleted" CSS under User:Inductiveload/site-css-js for reference). The good thing about the change-over is that the TS CSS gets a .mw-parser-output added to it, which increases specificity, so identical TS and global CSS will always result in the TS CSS winning. Which is what we want. Inductiveload—talk/contribs 15:39, 13 July 2020 (UTC)
- https://www.instagram.com/p/BnRI09-Fbr9/ 🍾 --Xover (talk) 15:59, 13 July 2020 (UTC)
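For anyone wanting to check the specificity point above, here is a minimal sketch of a specificity calculator. It is a deliberate simplification (it does not handle :not(), :is(), :where() or escaped characters), the function name is hypothetical, and the ".plainlist ul" selector is only an illustrative example rather than a rule quoted from the site CSS.

```python
import re


def specificity(selector):
    """Return (ids, classes/attributes/pseudo-classes, types/pseudo-elements).

    Simplified: assumes a plain compound/descendant selector with no
    functional pseudo-classes such as :not() or :is().
    """
    rest = selector
    counts = {}
    # Order matters: strip attribute selectors first so that "#" or "."
    # inside [attr="..."] is not miscounted.
    for name, pattern in [
        ("attr", r"\[[^\]]*\]"),
        ("id", r"#[\w-]+"),
        ("pseudo_element", r"::[\w-]+"),
        ("pseudo_class", r":[\w-]+"),
        ("class", r"\.[\w-]+"),
    ]:
        counts[name] = len(re.findall(pattern, rest))
        rest = re.sub(pattern, " ", rest)
    # Whatever identifiers remain are type (element) selectors.
    types = len(re.findall(r"[A-Za-z][\w-]*", rest))
    return (
        counts["id"],
        counts["class"] + counts["attr"] + counts["pseudo_class"],
        counts["pseudo_element"] + types,
    )


global_rule = specificity(".plainlist ul")
templatestyles_rule = specificity(".mw-parser-output .plainlist ul")
print(global_rule, templatestyles_rule)
```

Tuples compare element by element, so `templatestyles_rule > global_rule` holds whenever the only difference is the extra class prefix, which is why the TemplateStyles copy wins while both copies coexist.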
CSS Cleanup (plainlist)..
I decided to check what was calling the plainlist class directly...
Answer: Not much according to a search:-
If the pages of the work concerned were given their own 'templatestyles' class, then the plainlist class itself could be moved to a Templatestyle for the relevant template, and the style removed from MediaWiki:Gadget-enwp-lists.css.
I wasn't sure how far you wanted to take your removal of CSS styles from the core. ShakespeareFan00 (talk) 09:21, 14 July 2020 (UTC)
- This is part of the work to be done. Those handful of pages need addressing, the CSS goes to Template:Plainlist/styles.css and then removed from global CSS (they can co-exist temporarily - the TemplateStyles CSS would "win"). Ideally, the only global CSS eventually will be what cannot be applied by templates. Inductiveload—talk/contribs 10:10, 14 July 2020 (UTC)
- Do we have a root {{CSS}} template under which Index- and page-specific CSS classing can be placed, so it can be imported using TemplateStyles, irrespective of having to create a specific new template for each work/page? Also, other than typing multiple TemplateStyles lines, is there a quick way of importing multiple stylesheets? (For {{table class/import}} I set it up to allow importing at least 8, but I personally think it might be better if TemplateStyles allowed for having more than one src import.) ShakespeareFan00 (talk) 08:06, 15 July 2020 (UTC)
- @ShakespeareFan00: I am not aware of such a template, but I made {{page styles}}. Let's see if it works. I gave it 4 parameters; I doubt you'd see more than two in most practical cases. Generic CSS is probably better applied through dedicated templates. There is no way to have multiple identical attributes in valid XML, so TemplateStyles won't get multiple src attributes.
- One day in the dim and distant future, we might see phab:T226275 done and we can find all the CSS so used and specify any CSS that applies to a whole work through the Index page form, rather than adding to the header of each page/the body of the first transcluded page.
- Separately, we might also wish to make it possible to put CSS in the Index namespace as a subpage of the main Index, but we need to enable CSS in Index NS and also disable the Index namespace form (I seem to remember there is a Phabricator issue for that).
- I am undecided on the naming of CSS pages - I have previously placed it per-work under like Template:Os Lusiadas (Burton, 1880)/errata.css. Some CSS will probably always stay in Template space because it applies to multiple Indexes (e.g. a multi-volume work with repeated formatting). I don't think it matters too much, moves are easy enough. Notably, you can categorise CSS too. Inductiveload—talk/contribs 12:07, 15 July 2020 (UTC)
Failed Commons upload
Hello. I have been unable to put this scan of volume 1 of the Maryland Law Reporter on the commons for reasons that I do not understand. The Upload Wizard keeps saying that something has gone wrong. I do not know what to do about this. I would be grateful if someone could either explain what to do or upload the scan. James500 (talk) 07:53, 23 July 2020 (UTC)
- @James500: I also failed to upload the scan. Since it is over 300MB, it has to be uploaded in chunks and re-assembled. I don't know why this would fail, but something is not happy. I tried to work around it by transferring the file to the Internet Archive with the BUB2 tool (resulting in IA item bub_gb_2m5CAQAAMAAJ), and then doing a "direct" upload from the PDF URL there at commons:Special:Upload, which also failed.
- It's possible it's a transient Commons server issue, and will be resolved eventually. And if the IA "derive" process fails, which it probably will, due to the size, the IA upload tool will also fail. Other than that, I have no really bright ideas. Inductiveload—talk/contribs 11:21, 23 July 2020 (UTC)
- I see your upload at File:Maryland Law Reporter - Volume 1.pdf; and my own test through a different route also succeeded at File:Maryland Law Reporter, Volume 1 (1872).pdf. For James's purposes, cleaning up one of these (and deleting the other) should be sufficient. But I'm also seeing reports of FileImporter failing from multiple wikis, so there may be some broader problem going on. You may want to subscribe to phab:T255981 that, somewhat randomly, is where it ended up. --Xover (talk) 14:26, 23 July 2020 (UTC)
- Huh, weird, so it uploaded but threw an error. I fixed up some of the metadata at File:Maryland Law Reporter - Volume 1.pdf (and fixed the title). I see you used bigChunkedUpload - did it just work without any errors? Inductiveload—talk/contribs 15:03, 23 July 2020 (UTC)
- Yeah. Was a little slow (a bit over 10 minutes total), though not exceptionally so. I've generally had a feeling the API is a bit slow lately, but nothing that can really be pinned down, and not enough to obviously explain upload failures. --Xover (talk) 15:17, 23 July 2020 (UTC)
- Is either of these files better than the other? Does it matter which one I use? James500 (talk) 15:50, 24 July 2020 (UTC)
- @James500: the files are identical. Use whichever one you prefer. Inductiveload—talk/contribs 16:09, 24 July 2020 (UTC)
Segmentation fault, core dumped…
Thoughts? --Xover (talk) 11:27, 26 July 2020 (UTC)
- @Xover: Pretty cool! Maybe have an IGD subpage for "tasks" and "potential tasks/thoughts" (those that need more thought before open season). For example, global JS/CSS is probably a task, but nuking dotted TOC probably not (sadly). Then after potential tasks get a bit of discussion, they can migrate to being tasks.
- Re pagelist widget: 1) that's awesome. 2) meta:Wikisource Pagelist Widget and phab:T172953. Inductiveload—talk/contribs 12:53, 3 August 2020 (UTC)
- Yeah, that's roughly in line with what (vague, hazy) thoughts I had for it. PS. mw:Extension:Echo triggers off 1) a link to the target user page, and 2) something recognised as a user signature, and both have to be present in the same edit. IOW, if you need to fix a ping you also have to re-sign the post or it won't work. (took me ages to figure out, and even advanced users have trouble with it). --Xover (talk) 14:51, 3 August 2020 (UTC)
/* Problematic */ Overlapping sidenotes
What do you suggest to avoid the overlap? Page:The_Laws_of_the_Stannaries_of_Cornwall.djvu/122 ShakespeareFan00 (talk) 20:44, 15 August 2020 (UTC)
Scan
Hello. If commons:File:The Building News and Engineering Journal, Volume 22, 1872.djvu has a text layer (and I do not know whether it does), it does not load in the editing window in the page namespace (which I accessed by previewing the index page). I also get a message saying "ws_ocr_daemon robot is not running" when I try to use the OCR button. I do not know what to do about this. I would be grateful if someone could assist. James500 (talk) 06:09, 6 September 2020 (UTC)
- @James500: It is likely that what you observe is due to a bug in MediaWiki's DjVu handling that is triggered by certain kinds of invalid or pathological text data in a DjVu file. I have regenerated the file in question from the source scans and uploaded the new file over the old one. Try again and see if you get the OCR text loaded now. Regarding the OCR button in the wikitext editor, this is a known issue and is unlikely to be resolved soon. If you often use that function, I recommend turning on the Google OCR gadget in your preferences: it works the same, only using Google's internal OCR software, but is generally robust and available. --Xover (talk) 15:20, 6 September 2020 (UTC)
- Thank you. The OCR is loading now. James500 (talk) 21:09, 6 September 2020 (UTC)
- Thank you @Xover: for the help! Do you have a script to repair borked DjVu files, perhaps by nuking the text layer on invalid pages? It's something I have never done myself. For reference, the DjVu text layer bug is (I think) phab:T219376 (reported by Xover).
- Even when the OCR tool is working, it can still be useful to have the Google OCR on hand, as sometime one of them works better than the other, especially for text in columns. Inductiveload—talk/contribs 11:09, 7 September 2020 (UTC)
- I grab the original scan .jp2 files and manually convert them to JPEG with GraphicsMagick, and then have a custom script that puts them in the right order, runs tesseract on each page to generate hOCR structured text, converts the page JPEG to DjVu, converts the hOCR to sexpr, adds the sexpr to the page DjVu, and then compiles the page DjVus into a new DjVu book. I also have some related utilities to redact individual whole pages of an existing .djvu file, and some premade images and .djvu components to aid in manually inserting placeholder pages or redacting parts of pages. None of this is very user friendly or documented (you basically need to be me to easily use it), but I'm happy to share the code on request. I have a long-term todo to set up an interactive web frontend for manipulating DjVu files, but my todo is already waaaay too long. (I also want to figure out a way to take full advantage of the DjVu format's features to optimize file size as well, but…) PS. Just for reference, I didn't go for nuking the existing text layer for a couple of reasons. One is that some manipulations would then work on multiply-encoded image data, and would compound the problem when reencoding it afterwards (and most DjVus from IA etc. are already very aggressively compressed). By starting from "pristine" sources the end result, both image and OCR, will be better. The other is that the text layer is fragile and MediaWiki's extraction even more so, so it's safer to generate it from scratch. My script that converts from hOCR to sexpr is reinforced against certain classes of bug in tesseract and tries to guarantee the resulting sexpr data is valid. phab:T219376 is just one bug, which manifests as OCR being offset relative to the scan images; the main culprit here is phab:T240562, where MW fails to extract the text layer at all. --Xover (talk) 13:05, 7 September 2020 (UTC)
- @Xover: ah right, I thought maybe you were just hot-fixing the file rather than regenerating from scratch. In that case I understand that the script would be pretty complex. Thanks for the reference to the correct issue. Inductiveload—talk/contribs 07:54, 8 September 2020 (UTC)
- It's not really that complex; it's just not very polished and lacks documentation. It's not user friendly, even for technical people, is what I'm saying. But it's not rocket science by any means. --Xover (talk) 08:08, 8 September 2020 (UTC)
- @Xover: I'd be grateful if I could take a look, as I have a set of JPGs I could do with turning into a DjVu and I don't currently have a handy script to do that (plus I haven't got a battle-hardened OCR layer mechanism). Inductiveload—talk/contribs 14:57, 30 September 2020 (UTC)
- I'll find somewhere to dump it and write up some kind of guidance. Might not be until this weekend though. --Xover (talk) 16:33, 30 September 2020 (UTC)
- Thanks! It doesn't have to work or even be very well documented, so don't feel it needs to be too detailed. I have half a system; I just need to figure out the hOCR->DjVu OCR, and I don't know the gotchas like you do. Inductiveload—talk/contribs 17:37, 30 September 2020 (UTC)
- Also, do you do the fancy IA text-background separation? (I think that's how they get their massive compression: a bitonal text layer over a lower-res page background.) Just a straight C44 isn't particularly impressive in terms of text quality vs. file size. Inductiveload—talk/contribs 10:33, 1 October 2020 (UTC)
┌──────────────────────────┘
Code is now in my sandbox.
I convert .jp2 to either JPEG or PBM manually using GraphicsMagick (gm mogrify -format jpeg '*.jp2') (PBM for scans without images that Google has already crushed, because the loss of fidelity doesn't matter and the size savings are worth it; but not all inputs will produce usable PBM outputs, so always check these). Then run this script as "hocr2sexpr *.jpeg", which spits out a file called "output.djvu". I haven't bothered with any real command line argument handling, so I hardcode the output filename and unconditionally leave behind all the temporary files for easier debugging. Changing the text language also requires modifying the code just now. All this will get command line switches to control once I get around to it.
If you want to run this directly you'll probably have to install some dependencies (all four of the modules up top are non-core I think, and they have additional deps in turn). On macOS all of them are available through Homebrew, and they should be available in most package managers on Linux. I have no idea of the state of things on Windows.
I've commented the code so you should be able to navigate it reasonably well if you just want to grab the core hOCR/sexpr logic. It presupposes that you're using a push-parser for the HTML, and reusability will be much lower if you're using a pull or pseudo-DOM parser. Feel free to ping me if I can help with anything.
Regarding the separated DjVuDocument files, that is indeed what IA does (well, did). I've looked a bit at it but there are no finished command line tools for working with these, so you'd need to partially implement support for the component file formats. Given the relatively low resolution of current scans it is also hard to automatically extract the text image without making it unreadable for manual analysis (several cases of completely unreadable text here seem to be the result of a non-optimal DjVu compression from IA). However, with all those caveats in mind, supporting this would do wonders for our file size and interactive performance, and I see no inherent reason we shouldn't be able to come up with settings that address both concerns (possibly by manual per-scan hinting; there's usually a high degree of consistency between pages within a single scan).
Other fancy stuff I want to look at "some day" is a tool to automatically straighten crooked pages, and intelligently crop them. Possibly even to split double-page spreads automatically. The algorithms for these should be pretty straightforward (if we ignore pathological cases), except that it'll need to have a lot of knobs to adjust due to differences between scans.
Oh, and PS., I have a toy webservice set up (on WMCS) to interactively run Tesseract on a page here like the Phe and Google OCR gadgets. It's pretty hacky just now, and it'll break regularly as I mess with it, but if you want to play with that just let me know (or poke through my user .js etc. on noWS where I've been testing it for multi-lingual support). I'm hoping to get it good enough to replace Phe's OCR gadget, and add a few nice-to-have features like automatically preserving paragraphs and unwrapping hard-wrapped text. It can also conceivably be a good vehicle for attaching various OCR fixup scripts, but I haven't gotten to the point of looking at those yet. Mainly I'm stalled on integrating the fancy OOUI-based user interface stuff with the purely OCR-related code, which will balloon the number of lines of code disgustingly but probably won't really be difficult so much as fiddly.
Mentioning it in case anything catches your interest. Most of my capacity for attention is already oversubscribed IRL so I probably won't be able to give any of this any sustained attention any time soon. Happy to share any ideas and code though. --Xover (talk) 12:21, 3 October 2020 (UTC)
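For readers following the workflow described above, the per-page steps can be sketched as a list of external commands. This is only an illustration under assumptions, not the sandbox script itself: the function names are hypothetical, the hOCR-to-sexpr conversion is left out (it is assumed to write a .sexpr file next to each image), and plain lexicographic sorting is assumed to give the right page order (i.e. zero-padded file names without spaces). The tools named (tesseract, c44, cjb2, djvused, djvm) are the standard Tesseract and DjVuLibre command line programs.

```python
from pathlib import Path


def page_commands(image, lang="eng"):
    """Commands to OCR one page image and wrap it as a single-page DjVu."""
    image = Path(image)
    base = image.with_suffix("")        # e.g. p001.jpeg -> p001
    djvu = image.with_suffix(".djvu")
    # Bitonal PBM input goes through cjb2; everything else through c44.
    encoder = "cjb2" if image.suffix.lower() == ".pbm" else "c44"
    return [
        # Writes <base>.hocr next to the image.
        ["tesseract", str(image), str(base), "-l", lang, "hocr"],
        [encoder, str(image), str(djvu)],
        # Assumes <base>.sexpr was produced from <base>.hocr in between.
        ["djvused", str(djvu), "-e", f"select 1; set-txt {base}.sexpr", "-s"],
    ]


def book_commands(images, output="output.djvu", lang="eng"):
    """All commands for a book: per-page steps, then bundle with djvm."""
    ordered = sorted(Path(i) for i in images)
    commands = [cmd for img in ordered for cmd in page_commands(img, lang)]
    pages = [str(p.with_suffix(".djvu")) for p in ordered]
    commands.append(["djvm", "-c", output] + pages)
    return commands


for cmd in book_commands(["p002.jpeg", "p001.jpeg"]):
    print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once the missing conversion step is in place; building the commands separately from running them makes it easy to inspect or log them first.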
- @Xover: Thank you very much! I will have a look and see what I can learn. My attempt so far is User:Inductiveload/make_djvu.py, which seems to have worked "OK" in regenerating the scans below and a few other British Library/Hathi works for which I only have image scans. The main issue is that it tends to produce quite large files; it has a command line parameter to set a max file size, which is effective enough but produces mediocre image quality due to no text-layer separation. The hocr->sexp step is a bit of a hack but appears to work, believe it or not. The biggest problem I encountered was detecting empty cols/para/lines, which djvused rejects.
- One more thing I think one could be able to detect and mitigate is the "bad pages" mentioned below, which should stand out like a sore thumb to any kind of OpenCV kind of algorithm due to a heavy dark border around the page on 3 sides. Inductiveload—talk/contribs 17:35, 3 October 2020 (UTC)
- Apart from being written in Python (*hack*, *spit*), it looks good to me. Only thing is that using XPath queries to pull out what you want is fragile in the face of changing input (hOCR is a "living standard" in constant evolution, and Tesseract's implementation has changed several times since 4.0 was released). With a push parser you'll get fed everything (including classes and elements we haven't seen before) and can devise a sensible strategy for dealing with it (debug logging unexpected input, say). --Xover (talk) 19:56, 3 October 2020 (UTC)
- @Xover: heretic :-p. It's definitely the weakest part of the chain; I might consider a more robust version if it were on a webserver as opposed to on demand with a -v flag spewing debug. Looks like I'm also short of ocr_textfloat and ocr_caption too. But it does use every thread on hand, so it'll keep the room warm in winter.
- It's a shame the IA's derivation script isn't available, I'd like to see it (or maybe it is and I haven't found it). Inductiveload—talk/contribs 20:13, 3 October 2020 (UTC)
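To make the push-parser idea from this thread concrete, here is a minimal sketch using Python's standard html.parser. It is neither of the scripts discussed above: the class and function names are hypothetical, treating ocr_textfloat, ocr_caption and ocr_header as line-level zones is an assumption, and the page height is taken from the ocr_page bbox. It flips the y axis (hOCR counts from the top, the DjVu text layer from the bottom), drops empty words and containers, and records classes it does not recognise instead of silently ignoring them.

```python
import re
from html.parser import HTMLParser

# hOCR class -> DjVu zone type (the line-level variants are an assumption).
ZONES = {
    "ocr_page": "page", "ocr_carea": "column", "ocr_par": "para",
    "ocr_line": "line", "ocr_textfloat": "line", "ocr_caption": "line",
    "ocr_header": "line", "ocrx_word": "word",
}
VOID = {"meta", "link", "br", "hr", "img", "input"}
BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrToSexpr(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []       # (tag, zone or None) for every open element
        self.pages = []       # finished page zones
        self.unexpected = []  # ocr* classes that were unknown or had no bbox

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        box = BBOX.search(attrs.get("title") or "")
        zone = None
        for cls in (attrs.get("class") or "").split():
            if cls in ZONES and box:
                zone = {"kind": ZONES[cls], "children": [], "text": "",
                        "bbox": tuple(int(n) for n in box.groups())}
                break
            if cls.startswith("ocr"):
                self.unexpected.append(cls)
        self.stack.append((tag, zone))

    def handle_data(self, data):
        # Text only counts when the nearest enclosing zone is a word.
        for _, zone in reversed(self.stack):
            if zone is not None:
                if zone["kind"] == "word":
                    zone["text"] += data
                return

    def handle_endtag(self, tag):
        if not any(open_tag == tag for open_tag, _ in self.stack):
            return  # stray or void end tag
        while self.stack:
            open_tag, zone = self.stack.pop()
            if zone is not None:
                self._close(zone)
            if open_tag == tag:
                break

    def _close(self, zone):
        if zone["kind"] == "word":
            zone["text"] = zone["text"].strip()
            empty = not zone["text"]
        else:
            empty = not zone["children"]
        parent = next((z for _, z in reversed(self.stack) if z), None)
        if zone["kind"] == "page":
            self.pages.append(zone)  # keep pages even when they are empty
        elif parent and parent["kind"] != "word" and not empty:
            parent["children"].append(zone)


def to_sexpr(zone, height, depth=0):
    x0, y0, x1, y1 = zone["bbox"]
    head = f'{" " * depth}({zone["kind"]} {x0} {height - y1} {x1} {height - y0}'
    if not zone["children"]:
        text = zone["text"].replace("\\", "\\\\").replace('"', '\\"')
        return f'{head} "{text}")'
    body = "\n".join(to_sexpr(c, height, depth + 1) for c in zone["children"])
    return f"{head}\n{body})"


def hocr_to_sexpr(hocr):
    parser = HocrToSexpr()
    parser.feed(hocr)
    parser.close()
    return [to_sexpr(p, p["bbox"][3]) for p in parser.pages], parser.unexpected
```

`hocr_to_sexpr(text)` returns one s-expression string per page plus the list of unrecognised classes, so a caller can log the latter when run with a verbose flag. Whether djvused accepts every edge case of this output has not been checked here.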
Other scans
I have had similar problems with commons:File:The Register of Pennsylvania, Volume 1.djvu, commons:File:Hazard, The Register of Pennsylvania, Volume 3.djvu, commons:File:Hazard, The Register of Pennsylvania, Volume 1.djvu, and commons:File:Hazard's United States Commercial and Statistical Register, Volume 5, 1841.djvu. James500 (talk) 08:24, 24 September 2020 (UTC)
- @Xover: hmm, do you think this could be related to the IA item having junk pages like this one in the "processed" JP2 archive? Not sure there's much one can do about it, other than regenerate the file offline. Inductiveload—talk/contribs 10:19, 1 October 2020 (UTC)
- The junk scan images are one major triggering factor for this, yes. IA seems to be maintaining per-image info somewhere else that lets it ignore these images (possibly that JSON file you found), but not in the XML file that ia-upload uses (iirc). --Xover (talk) 12:23, 3 October 2020 (UTC)
- @James500: files regenerated from JP2s. Drop any more you need here. Inductiveload—talk/contribs 15:11, 1 October 2020 (UTC)
Fancified borders...
I did up Page:Little Ellie and Other Tales (1850).djvu/168 mostly as an experiment to figure out the techy bits for myself. When you have a moment I'd appreciate it if you could take a quick look and comment on what you think in terms of the technical approaches, suitability for e-readers, and so forth. I mainly just cribbed your code for the fancy border so there shouldn't be anything particularly new or innovative lurking in there. --Xover (talk) 14:07, 11 September 2020 (UTC)
- It looks perfectly serviceable to me. Probably the only nitpicky graphical "defect" I can see is the edge width is a few pixels too narrow so it clips the inner leaves slightly. Where it repeats (halfway down the edge) is a small alignment defect, but it's a tricky one to get exactly right without very careful manipulation, and you wouldn't notice it unless you were looking for it. Newes of the Dead was easier as it was a repeating element to start with.
- WRT e-readers, it will not work, because the CSS refers to an online resource by URL and e-readers generally do not fetch them. The solutions I can think of are:
- Use a "normal" border as a simple fallback and live with that (Newes does this)
- Modify ws-export to rewrite TemplateStyles URLs, either as base64 data or by bundling the resource like any other image and changing the URL to a local file: phab:T256780
- The other issue I can see with this technique is that the CSS usage of the files doesn't show up at Commons, so it's a little vulnerable to silent breakage if the files are changed or deleted. At the least, I think making a note on the file's description page is a good idea. Inductiveload—talk/contribs 17:04, 11 September 2020 (UTC)
- Thanks! Border width fixed, file tagged, and Phab subscribed to. Regarding the fallback: I'm using effectively the same TS stylesheet as Newes, including the non-fancified border there. Did I miss something? --Xover (talk) 17:44, 11 September 2020 (UTC)
- The fallback is the border colour: a thick grey border in Newes:
/* need to set this width first to provide space for image */
/* ereaders (and other offline devices) will see only this, as they don't have access to the CSS url source */
border: 50px solid LightGrey;
- If you'd like a thin border like 1px solid black, you probably need to add a padding to make up the shortfall between "1px" and the border image size. The "nominal border" runs through the centre of the image slices, so if the padding + half the border width is less than the actual image border width, the content can overlap the image. Inductiveload—talk/contribs 18:33, 11 September 2020 (UTC)
Amplify
Why wouldn't it be ok to make a soft redirect on Amplify to w:Samplify? It makes sense, see Q: Are We Not Men? A: We Are Devo! Some users could type in the correct title (s:amplify) and end up here. The redirect would help them find their way back. Gioguch (talk) 21:37, 27 September 2020 (UTC)
- @Gioguch: because it is not the purpose of Wikisource to provide soft redirects to Wikipedia in its main namespace. There are numerous cases of those sorts of issues, and these are the quirks that we face. What you should be doing is lodging a phabricator: ticket to fix the issue that if a page exists that it does not follow an interwiki map link. — billinghurst sDrewth 22:22, 27 September 2020 (UTC)
Match & Split
…now that the Match & Split bot is down (hopefully just until someone gives it a kick) it reminds me that we need to start thinking about systematically replacing all of Phe's tools. I've made a start at the OCR gadget, but that's frankly just because I had some code sitting around and it was easy; for most folks the Google OCR is plenty good enough here (and Community Tech may replace both with something better anyway).
But Match & Split is critical for ever making any appreciable dent in our ever-growing non-scan-backed backlog, and it desperately needs a few quality of life and user-friendliness improvements. I've cast only fleeting glances at the code and not really understood it (not just because it's in Python: Phe has structured the code around some kind of internal pseudo-SOA architecture that I've not cracked yet), but maybe you will have better luck there.
Incidentally—and I don't think you have the free time or necessarily inclination for it—but Phe's code is public and can be forked, and the existing phetools.toolforge.org can be usurped since Phe is non-responsive. Iff you should be so inclined the option is there. I'd be happy to help there, but my Python-fu being what it is I can't in good conscience take the lead on that.
In any case… I think we need to work systematically towards making sure enWS (and, ideally, all the Wikisourcen) have access to the set of functionality Phe's tools provide today (when they're not broken). My first iteration on such a list would be:
- Interactive per-page OCR
- Match & Split
- Per-project statistics (so we can see what we're doing)
- Cross-project / comparable statistics (so we can see how we're doing relative to others)
And of these Match & Split is the short/medium-term highest priority in my mind. --Xover (talk) 08:02, 18 October 2020 (UTC)
- @Xover: Agreed that we should try to get some of these maintained/able, and M&S is maybe a priority.
- With the new shiny Toolforge/Kubernetes stuff, is there any advantage to all this being one huge phetools tool, or would it be better to have much more granular tools, e.g. match_and_split and ocr? Inductiveload—talk/contribs 19:57, 24 October 2020 (UTC)
- I think the current phetools is one tool because it shares a lot of the plumbing between functionalities, but that may of course have been a design shaped by the infrastructure available at the time. Going forward I would default to having separate tools (on Toolforge) or even separate services (on WMCS) for each distinct functional bit. There are better ways to share code and "separation of concerns" is IMO a strong principle. If we go the WMCS route (may be overkill / have drawbacks) it will also permit separate resource allocation for each tool. --Xover (talk) 20:07, 24 October 2020 (UTC)
First steps
Hi Inductiveload, thanks for the welcome. If you have a minute, could you have a look and see if I'm doing things correctly before I continue? Thanks! --Andreas (talk) 13:13, 9 November 2020 (UTC)
- @Andreas: great to hear from you! Your work on Lowland Scots looks good. Some notes:
- You can remove the line breaks between lines in the same paragraph
- This is not correct: {{uc|ii.—{{sp|in decadence}}}} as that will copy-paste as all lowercase because the text is actually lowercase, though it is styled as uppercase. It should either be all uppercase (II.—{{sp|IN DECADENCE}}) or in title case (II.—{{uc|{{sp|In Decadence}}}}). Then, when copy-pasted, it will be correct.
- The poem is not actually "centred", it is "block-centred" (so it is left-aligned within a centred block). It just happens that the line lengths are such that centring the whole thing is a very similar outcome in this case. {{fqm}} can help with the hanging quote in this case.
- Other than that I think it's reasonable to mark those pages "Proofread" (yellow). Then the next person can check them and mark them "Validated" (green). Otherwise it takes three readings as one person cannot directly mark as "Validated" from "Not proofread".
- Inductiveload—talk/contribs 13:32, 9 November 2020 (UTC)
- Thanks so much for taking the time to look at it and give me some useful hints.
- I left the line breaks in because I thought they'd help me find things more easily. Although sometimes the last line behaves as if there were two line breaks (i.e. a new paragraph) but it only happens either on the Page page or in the transcluded text. I guess I'll remove them.
- I just copied these headings from the person who started the book. I'll fix them.
- {{block-center}} and {{fqm}} are exactly what I was looking for!
- And yep, I'll give it another read when I'm done with the chapter and mark the pages as "Proofread". --Andreas (talk) 14:00, 9 November 2020 (UTC)
- By the way, do I have to recreate the indentation of the first line of a paragraph? --Andreas (talk) 14:02, 9 November 2020 (UTC)
- @Andreas: Yep, that's why removing the line-breaks in the end is recommended, things sometimes break for some reason to do with the MediaWiki parser injecting a paragraph break somewhere. It's OK to remove them last, after proofreading.
- Re. indentation: no, we don't try to recreate that as it can be done more effectively with CSS if desired. Hard-coding it removes the flexibility to do that.
- Poems with indented lines are different - you can use {{gap}} and <br/>, or you can use the <poem></poem> tag and then use a ":" to indent by 1em per colon. See H:POEM for a bit more on that. Inductiveload—talk/contribs 14:17, 9 November 2020 (UTC)
Hi, me again :) So I've just "finished" chapter 2 of Lowland Scots—only need to give it one last read, remove the line breaks and mark the pages proofread. Just one thing I noticed in the pages you fixed for me is you added {{fine block}} to the quotes (e.g. here). If I understand the template correctly, it reduces font-size to 92%. The font of the quotes in the scan doesn't look smaller than the prose (to me), so I just wanted to double check if I should add {{fine block}} to them. --Andreas (talk) 03:21, 12 November 2020 (UTC)
- Well done! The poetry is actually slightly smaller than the body text, see this comparison (main text on left, poetry on right). There's not a lot in it, certainly. Inductiveload—talk/contribs 10:47, 12 November 2020 (UTC)
I have proofread and transcluded this work, excepting the image on p. 3. If you could, can you please strip the last two pages of the document? They are not part of the work. TE(æ)A,ea. (talk) 01:53, 12 November 2020 (UTC).
- Great, thanks! Trimmed the DjVu, and did the headpiece. In theory the fancy title page border can be done too, but I'm not feeling it today! Inductiveload—talk/contribs 11:49, 12 November 2020 (UTC)
You recently wrote or adapted this.
Its approach looks remarkably like the one I was considering for the {{uksi}} family.
Can you take a look at and simplify down the related {{uksi}} family, which is a mess?
If you also wanted to take on the currently unused {{cl-act-p}} and related as well, much appreciated, even though it's potentially obsoleted by the changes you made to the {{left sidenote}}, {{right sidenote}} etc.
Thanks :) ShakespeareFan00 (talk) 10:52, 16 November 2020 (UTC)
- @ShakespeareFan00: The first thing you need to decide here (for each of them) is the specification for the templates. It's hard to tell if your approach is what you actually want or just what you have so far.
- What is the "input" (i.e. wikicode API that the editor inputs) you want. E.g.:
- Do you want a single "god template" or a suite?
- How do you want the end templates to be used - parameters and so on? Split /s/e templates or wrappers?
- What are the outcomes that you want? This includes:
- Examples of each level's formatting
- Anchors (ties to the API above - it appears you wish the anchors to be automatically generated)
- Anything else you want to see
- Once you have the endpoints tied down, we can try to produce something that connects them.
- It's possible that we can work it down to a single, global, core "legislation" template ± module ± CSS that everything (UKSI, CL-act, CoR65, etc, etc) can call in the background, but it really depends if that's more complexity than it's worth. Inductiveload—talk/contribs 12:27, 16 November 2020 (UTC)
- The current approach is a template suite with each "level" being a sub-template of the /1 /2 /3 design IIRC (with some variants)
- The current approach for {{uksi/paragraph}} is not /s /e based IIRC.
- The outcome is formatting like that used on legislation.gov.uk for Statutory Instruments, within the limitations of MediaWiki. (N.B. I am not worried about replicating sidenotes exactly, provided there is a credible alternative; I think the legislation.gov.uk format is to put them in smaller bold above the relevant section.)
- Generating the sectional numbering automatically, both anchors and in text, is what was desired.
- {{cl-act-p}} and related only existed because of trying to avoid overlapping sidenotes. If that's now soluble by tweaking existing templates (clear:left; float:left; display:block for example) then the whole cl-act module is obsolete.
- {{uksi/paragraph}} is in use. The format desired was (subject to the limitations of MediaWiki) that in the source documents as far as possible.
ShakespeareFan00 (talk) 13:33, 16 November 2020 (UTC)
- @ShakespeareFan00: "The current approach is a template suite" and "The current approach is for {{uksi}} is not /s /e based": just checking: is that how you actually want it to be? Inductiveload—talk/contribs 13:40, 16 November 2020 (UTC)
- Well it depends on what the 'standard' way of doing things here on Wikisource now is. The template documentation notes a split param; previously I've been told that doing 'splits' that way is non-standard, and thus there should be /s /e variants.
- I'd started to try and implement some CSS (Template:Uksi/styles.css) so that what was generated from the higher templates was essentially a wrapper around a core template, and style calls. Template syntax is clunky when it comes to varargs-style invocations, hence the /1 /2 /3 variants. (Aside: The higher level /1 /2 /3 sub-templates are essentially wrappers or parameter sets to the core template. In places I think I've called the core template directly, but told it to invoke a specific CSS class to get specific niche behaviour, such as for definitions that aren't numbered in the same way.) ShakespeareFan00 (talk) 13:49, 16 November 2020 (UTC)
- I've also put some additional documentation in a comment at the start of Template:Uksi/paragraph and in the relevant CSS templatestyles. ShakespeareFan00 (talk) 13:50, 16 November 2020 (UTC)
- @ShakespeareFan00:
Sidenotes...
Page:A Collection of Charters and Statutes relating to the East India Company.pdf/141
Not quite an overlap, but here the formatting is such that 2 sidenotes run very close together.
Is there a sandbox for checking how something will look with dynamic layouts in mainspace? ShakespeareFan00 (talk) 16:28, 16 November 2020 (UTC)
- @ShakespeareFan00: annoyingly not (yet), because Sandbox redirects to Wikisource:Sandbox, and the layouts gubbins doesn't work in Wikisource: namespace. Maybe when it's gadgetised it can be turned on for Sandbox pages. For now, I'd just pick a quiet corner of mainspace and fiddle there and it can be speedied when it's no longer needed. Inductiveload—talk/contribs 17:18, 16 November 2020 (UTC)
- Was there a localised version of the Community Wishlist? Gadgetisation of the layouts code would be something I would support :) ShakespeareFan00 (talk) 17:45, 16 November 2020 (UTC)
- A Collection of Charters and Statutes relating to the East India Company/53Geo3_c155, Layout 2 works but only because there's sufficient vertical space in the layout.
Layout 3 breaks, because it puts one reference over the top of another because they are too close. (33 G. 3. c. 47.) in the paragraph marked (III.) is placed over the side-title for that section.
Layout 4 breaks, because the marginals aren't seemingly far enough over.
The title of the linked page is where in the structure that transclusion would be anyway.
The thinking is that there might need to be a specific "layout" for legislation?
I note you did attempt to move the sidenotes styling into CSS? (at least for Page Namespace) ShakespeareFan00 (talk) 18:02, 16 November 2020 (UTC)
- Gadgetisation of Dynamic Layouts will be a local effort. In the meantime you can experiment with new layouts yourself by adding to your personal JS: Help:Layout#How_to_write_dynamic_layouts.
- Remember that dynamic layouts are aesthetic suggestions: users with no JS, most e-readers and users who opt out of the dynamic layouts or just the default layouts may not see the layout you expect. Thus, if the formatting doesn't work in any layout except a specific one, it's a defect. It's the responsibility of Wikisource to ensure the content renders acceptably regardless of layout (and also when there is no layout). This can be very tricky.
- Sorting sidenotes in the page namespace is distantly related, but more of an effort to also start unpicking the disaster area of the sidenote templates in general. Inductiveload—talk/contribs 08:52, 17 November 2020 (UTC)
- I've moved the experimental code as you suggested.. What I was attempting to do was collapse the sidenotes on a narrow screen/viewport in CSS using a media query.
- The addition of classp was so that a user could roll their own "class" if needed. (The intention was to eventually ask for implementation of classm to set up a custom main page class as well if needed, which could be defined as part of Dynamic Layouts or as a templatestyle!) In Page:The Laws of the Stannaries of Cornwall.djvu/122 I'm using a changed color and 'floated' behaviour.
was used to ensure the sidenotes for Item 19. weren't pushed down too far; however, it's an ugly, ugly kludge because of the amount of additional whitespace it generates (and of course it only works in Page namespace currently). If you know of a way of tweaking it so the same effect would happen in main-space without breaking anything, I'd be glad to hear it.
- If the approaches I am taking here pan out, then as I said earlier cl-act might be replaceable...
ShakespeareFan00 (talk) 10:18, 17 November 2020 (UTC)
- @Inductiveload: BTW When did Dynamic Layouts get enabled in Page: namespace? because things look odd for some templates currently compared to previously.ShakespeareFan00 (talk) 13:28, 17 November 2020 (UTC)
- Well after a LOT of examining what had gone wrong, I've got a sandboxed version of Left and Right sidenote that should be more amenable to being tweaked. ShakespeareFan00 (talk) 14:51, 17 November 2020 (UTC)
Curious - Page:The_Laws_of_the_Stannaries_of_Cornwall.djvu/122 This gives options for dynamic layouts and then fails to apply them, or a user-supplied one. Back to 'forcing' a defined layout (which is bad design) because MediaWiki is seemingly being pedantic about how to do things again :( ShakespeareFan00 (talk) 15:44, 17 November 2020 (UTC)
- @Inductiveload:
- Applied the load-fix you suggested, and things worked. (The Stannaries page I linked earlier looks very nice in the layout I worked out.) Looking at the approach I used in {{UKSI}}, cl-act and the dynamic layouts stuff, the cl-act module is overly complex. Much of its function can be replaced with a 'custom' layout and some TemplateStyles (like the approach I had attempted on uksi); this should make it easier to maintain.
- {{Outside}} and {{Outside2}}: I seem to have implemented a clearfix option, but I don't think I propagated this to the templates actually called because of concerns about the approach used. You are welcome to review this, but I don't think the relevant parameter is in use currently.
- Currently {{Outside L}} and {{Outside R}} use the Page namespace specific class created for {{left sidenote}} and for left and right sidenotes. Now that Dynamic Layouts seem to be active for Page: namespace, perhaps the handling for all these should be adjusted so that what's currently a Page-namespace specific TemplateStyle can be moved into the parameter set for the relevant Dynamic Layouts? (Aside 1: Strictly speaking Page namespace has 2 different presentation layouts: one for the Page: itself, a narrow display (with the Page scan), and the full width used for the Preview when editing. I'm unsure if the dynamic layouts code is able to accept a full CSS selector as the first parameter of the paired items used to define a layout, or if there is a way to check for narrow vs wide display.) (Aside 2: When (and if) the Dynamic Layouts get turned into a Gadget, I would very strongly suggest that the code which applies them be modified so that the layouts can be supplied as a TemplateStyles-like stylesheet, which is separate from the core code. This would make defining 'work-specific' layouts easier (currently this is sometimes done with {{page layout}} and {{margin note}}/{{margin block}}, which aren't compatible with the dynamic layouts approach as they force things), and potentially allow for the use of CSS media selectors to accommodate even more differences of display device/screen width.)
ShakespeareFan00 (talk) 10:33, 18 November 2020 (UTC)
Dynamic layouts (1)
User:ShakespeareFan00/Sandbox/Layouts
These have a defined class on them, that should ONLY affect the styles in the given test layout. Currently in User space, when I change the layout, the styling does not appear to change, as might be expected. Have I overlooked something (load order, cache lags etc.)? ShakespeareFan00 (talk) 11:30, 19 November 2020 (UTC)
- This is more curious, if I load the page manually (with a different layout from my test one) it loads and displays as expected, but when I move to my custom layout, it changes to my custom layout, but then doesn't change back when I move to a different layout. ShakespeareFan00 (talk) 11:34, 19 November 2020 (UTC)
- I am not using the Gadget-ised version yet, as that's still limited to Mainspace... I'm using the current 'live' version of Dynamic Layouts because those are active in User: space (albeit currently with what seems to be a glitch).
- I also find I am having to refresh the page sometimes to get Dynamic Layouts at all, outside of Main namespace. ShakespeareFan00 (talk) 11:35, 19 November 2020 (UTC)
@Inductiveload: I think what's happening is that the code is dutifully adding the new style class per the current code (and for some reason a full CSS selector appears to work for setting up a style). Crucially, my custom classes ("_sidenote_container" and "_smash_left") are set on entering my custom layout, but they are NOT removed when the layout changes to an existing one, because the current code just runs through the pairs in the data set for the new layout. It doesn't currently have a means to look for and remove styles that are only defined for a specific layout. This would not have been noticed previously because all the existing layouts that have been defined have the same number/layout of parameters defined. ShakespeareFan00 (talk) 13:34, 19 November 2020 (UTC)
- Looking at the Gadget code, it's using an Attr to set these? Should it also be doing some kind of remove class before it leaves one layout for the next one in the toggle? ShakespeareFan00 (talk) 20:36, 19 November 2020 (UTC)
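For illustration, the cleanup step asked about above could look roughly like the sketch below. This is not the actual Dynamic Layouts or gadget code: the LAYOUTS table, the layout names, the selectors and the switchLayout function are all hypothetical, and the sketch assumes each layout can be described as a map from CSS selector to the class names it adds. The idea is simply to undo everything the outgoing layout added before applying the incoming one, so a class such as "_smash_left" cannot linger after a switch.

```javascript
// Hypothetical layout table (names and selectors are illustrative only):
// each layout maps a CSS selector to the class names it wants applied.
const LAYOUTS = {
  standard: {
    "#text-container": ["layout-standard"],
  },
  custom: {
    "#text-container": ["layout-custom", "_sidenote_container"],
    ".sidenote": ["_smash_left"],
  },
};

// Switch from one layout to another.
// First remove every class the outgoing layout added, then add the
// classes of the incoming layout. `root` is anything that offers
// querySelectorAll (in a browser, normally `document`).
function switchLayout(root, previous, next) {
  for (const [selector, classes] of Object.entries(previous || {})) {
    for (const el of root.querySelectorAll(selector)) {
      el.classList.remove(...classes);
    }
  }
  for (const [selector, classes] of Object.entries(next)) {
    for (const el of root.querySelectorAll(selector)) {
      el.classList.add(...classes);
    }
  }
}
```

In a browser this would be called as switchLayout(document, LAYOUTS.custom, LAYOUTS.standard) when the toggle moves on. Tracking the previous layout explicitly, rather than only iterating over the new layout's pairs, is what lets layouts with different sets of selectors coexist.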
An apology..
I'm sorry. It seems that earlier this week I was more than a little frustrated as a result of my own inability to debug code, and this may have come out on your talk page amongst other places.
This is not the sort of thing that's reasonable on a project like this.
If you are still willing to respond to queries let me know, but I will understand if you don't help someone (me) that gets stroppy and grumpy when they encounter, mostly through the (non) limits of their own technical ineptitude, things that fail to work as they would have done if properly implemented with a calmer attitude. ShakespeareFan00 (talk) 21:50, 19 November 2020 (UTC)
- It's OK, but I think you should probably lay off the layouts for a little bit. enWS will get there, but it's a slow process. There's always plenty to do that doesn't need digging through crufty templates and JS.
- If you really want to do "UK legislation-y stuff" but, quite reasonably, find the sidenotes frustrating, there's always House of Commons Parliamentary Papers or the United Nations Treaty Series. Maybe avoid the tables in the HCPP if you value your sanity! Inductiveload—talk/contribs 00:57, 20 November 2020 (UTC)
- (Aside: Not that we have Visual Editor on Wikisource, but wasn't there a proposal to have a visual editor for tables, that gave a spreadsheet view, for data entry? Given that on Wikisource we also have {{ts}}, providing what's effectively a mini-spreadsheet entry interface wouldn't be infeasible.)
The second aside would be a Visual Editor for Wikisource, given that the template set here is very different from the English Wikipedia one.
ShakespeareFan00 (talk) 08:34, 20 November 2020 (UTC)
Interesting and thank you
Interesting and thank you, I am trying to do some proofreading but suspect I will tire of it soon after having exerted all my possible energy into the FIRST creation. Also is there a way to put the "Modern English" as I have in my article into that new article? Seems pointless to spend my time publishing something that 50% of people who open will immediately close as unreadable. Peace.salam.shalom (talk) 05:24, 20 November 2020 (UTC)
- @Peace.salam.shalom: What you are talking about is what we call "annotation". It is explicitly allowed (encouraged, even) to make such annotated versions (e.g. a re-spelled version). However, there should first be a "clean" version of the original text. Because of this, we rarely have any decent and complete annotated versions, because people usually run out of energy after the first chapter. There's more about expectations at WS:ANN. I personally feel that annotated versions are a sadly under-used resource at Wikisource, so I hope your project works out!
- I'll be happy to help with any technical aspects if you like. Fair warning: support for such annotations is not very extensive. We do have, for example, the {{Side by side}} template which you may find useful. Inductiveload—talk/contribs 05:34, 20 November 2020 (UTC)
- Well if you know a way to put the chapter headings on the index page so I can see whether I should be clicking 171, 145 or 199 to find "The Wys Virgyns" would be great - don't know if possible. Oh and how to mark pages with images. Peace.salam.shalom (talk) 05:36, 20 November 2020 (UTC)
- Figuring out the right page for a chapter is tricky without a TOC. Perhaps the easiest way is to download the PDF and scrub through it on your computer.
- Pages with images can be marked "Problematic" (and place {{missing image}} on the page) if you don't want to extract the image yourself. Inductiveload—talk/contribs 05:41, 20 November 2020 (UTC)
- Pages 8,9,10 I think are the TOC
- Oh, I think there are 14,000 pages at Category:Pages with missing images so it's not likely they'll ever be added by some bot that extracts, uploads, inserts, etc...seems like it's really just investing my time to such little purpose. Very frustrating, appreciate your help but I dunno. Peace.salam.shalom (talk) 05:51, 20 November 2020 (UTC)
- The TOC doesn't have page numbers, so it's not very helpful in directly taking you to the page. I'm happy to do the image extraction and upload tomorrow if you like. Inductiveload—talk/contribs 05:54, 20 November 2020 (UTC)
Interesting works
(Moved from other page just so we're not off-topic) Probably 14th-17th century "Advice to Women/Daughters/Wives" works, whether feminist like Author:Christine de Pizan's "The Book of the City of Ladies" or the ones that are...not, as in The Book of the Knight of the Tower. But the 18th/19th/20th century are "too modern" for my tastes, Mayflower Pilgrims/Puritans would probably be the outside limit of 'acceptable'. Maybe interested in non-contemporary accounts of women in ancient cultures lesser-known than Greek/Roman/Egyptian as well. I'll browse through the periodicals and see if anything pops up, or if you wouldn't mind adding the City of Ladies, or anything else matching the general "How a woman should live"/Etiquette/Advice funny stuff...I'll promise to spend at least as much time proofreading/tagging/categorizing it as you must've spent finding/uploading/prepping it :) So at the very least, I'm matching your donations! Peace.salam.shalom (talk) 01:55, 21 November 2020 (UTC)
- Edit: Lol, apparently they are called w:Conduct books or w:Courtesy books and "the Spanish Castigas y Doctrinas que in Savior Dava a Sus Hijas (Admonitions and Doctrines from a Wise Parent to His Daughters, 1406)" might be fitting since I can translate it (if it's short) from Spanish if necessary :) My real bane is just the large books, because my attention span is not long enough to spend a month on a single book. Peace.salam.shalom (talk) 02:01, 21 November 2020 (UTC)
- https://archive.org/details/youngladysparent00gregiala is a little past my usual interest, but it has three books in one - all on the same theory, so might be worth an upload...not sure I'll ever get through 170 pages myself though...but I might get the shortest of the three done myself hopefully https://archive.org/details/instructionslady00unknuoft/page/2/mode/2up is the same three-book series in a different binding if one of them is clearer for OCR ease :)
https://archive.org/details/poorrobinstruech00lond/page/4/mode/2up is only 9 pages and 17th century, I could definitely commit to doing all of it - much though I'd love to promise to finish https://archive.org/details/ldpd_14974920_003 it seems less likely :)
- @Peace.salam.shalom: Sorry for the delay! I can't see a PD source for City of Ladies - most translations seem to be since 1999, so they'll probably be in copyright for the rest of our lives. If you know a good one, let me know. Likewise, I don't see Castigos y Doctrinas obviously available.
- Indexes done so far:
- A Way to get Wealth is 6 volumes, so I'll do that separately in order to do a decent job, but I can't do it right now.
- If you run into a periodical that you'd like but the issue isn't there, let me know and I'll see if I can sort it out. Most periodicals are "sparse", but often the scans exist somewhere online. Inductiveload—talk/contribs 00:31, 22 November 2020 (UTC)
- @Peace.salam.shalom: Some more single indexes:
- Index:A father's legacy to his daughters - Gregory - 1808.djvu
- Index:A practical directory for young Christian females - Newcomb - 1833.pdf
- Index:The young lady's guide to the harmonious development of Christian character - Newcomb - 1841.djvu
- Index:Advice to young ladies on their duties and conduct in life - Arthur - 1849.djvu
- Index:The young woman's guide to excellence - Alcott - 1840.pdf
- Index:Letters to young ladies - Sigourney - 1837.djvu
- A few others are still processing from Hathi, which has better scans. Inductiveload—talk/contribs 22:35, 22 November 2020 (UTC)
- Wow, thank you...I was so annoyed at having somebody lecture me about something or other on my talkpage I was ready to quit the whole project...but I did commit to doing at least that one work if you uploaded it and I see you did - so that draws me back. Am I correct that [[1]] should be marked as "blank" since it is not the work itself? I saw you did similar with a digitizing watermark, right? Hit a couple small snags, for example Page:Poor Robin's True Character of a Schold - 1678.djvu/10 - is there some template to use when I just have no idea wtf the author is talking about? I feel like the cutting of a vein beneath the dogstar should be wikilinked to wiktionary or wikipedia except I'm confused whether she's using astronomical analogy for anatomy or anatomical analogy for astronomy and the whole thing makes no sense to me. Some form of {{reconstruct}} or {{sic}} or similar? Peace.salam.shalom (talk) 06:52, 23 November 2020 (UTC)
- @Peace.salam.shalom: Correct, we don't reproduce library "extras", ex libris stickers, digitization watermarks, pencil notes that someone has made in their library book and so on. Just mark the page "without text" and leave the text boxes blank.
- I'm glad I've suckered you back in. Wikisource has a rather steep learning curve, and sometimes it feels a bit daunting, but I promise it gets easier! The first thing that happened to me was I got my attempt at a book punted to German Wikisource, where it was summarily deleted because it didn't have complete scans! Let me know if you have any queries. If you're not sure, you can just do what feels best, and ask someone to take a look (or someone might look in and make friendly adjustments).
- Generally we don't clarify such things, but the very occasional link is OK under the annotation guidelines. I also have no idea what that means. You can always just leave it. Inductiveload—talk/contribs 07:29, 23 November 2020 (UTC)
I've been inspired by Queen Victoria; if you upload https://archive.org/details/rukaatialamgirio00aurarich as an Index I can proofread individual letters written between royalty, etc - instead of one long work that feels like I've accomplished nothing until the whole thing is complete, I can have a sense of accomplishment after each "chapter" since they're standalone works :) Does that make sense, or am I misunderstanding something? Also, is there a way to quickly see a "thumbnail gallery" of PAGE pages in an Index? It would make it much faster for me to quickly highlight problematic/image/blank pages, etc. Peace.salam.shalom (talk) 03:48, 24 November 2020 (UTC)
- @Peace.salam.shalom: Here you go: Index:Letters of Aurungzebe - tr. Bilimoriya - 1908.djvu. I did a couple of pages as a kind of guide. It makes perfect sense to me - I also kind of prefer "bitty" works where you can get little chunks done. I've been messing with something similar in the background. At the end of the day, you're here by choice, so do something that interests you!
- About the image grid, not as far as I know, but I do plan to make such a tool at some point because it will be useful and also it's annoying to have to wait for the server thumbnail cache to warm up every time. Inductiveload—talk/contribs 05:50, 24 November 2020 (UTC)
- Yeah, image grid even if you cannot edit inside it, just to give a quick overview of pages, would be nice. Btw, Page:Letters of Aurungzebe - tr. Bilimoriya - 1908.djvu/100 introduces two new problems with footnotes - on the page they are labelled as 4,5,6 but they automatically appear on Wiki as 1,2,3 - issue? (Although Letter to Shaista Khan (1659) shows them correctly, but other uses will not?) Also how do I handle a footnote that is spread between two pages?
- @Peace.salam.shalom: it's not a problem - we don't expect the footnotes to be numbered exactly the same.
- For a split footnote, the first page should look like <ref name="some_name">Text</ref> and the next page should be <ref follow="some_name">More text</ref> where some_name is the same. Pages 3 and 4 do this too. Inductiveload—talk/contribs 16:10, 24 November 2020 (UTC)
- And just putting the "follow" ref up at the top of the page since there's no actual text to which to link it? As I've done Page:Letters of Aurungzebe - tr. Bilimoriya - 1908.djvu/101 it looks right on the page, is it correct if you view the "Edit page" code? Sorry to keep bothering - though the good news is I think after this, Letter to Shaista Khan (1659) is complete and the other letters should be easier Peace.salam.shalom (talk) 16:24, 24 November 2020 (UTC)
- @Peace.salam.shalom: correct. Actually, I don't think it matters where it goes, but top-of-page is conventional.
- Good luck with the others! Looking good so far: don't worry about bothering, I'm happy to help and you're doing really well! Inductiveload—talk/contribs 16:41, 24 November 2020 (UTC)
- Actually there is one more thing here: when works are scan-backed like this, we transclude them into a single mainspace work. In this case, something like:
- Normally, it goes "/Chapter 1", "/Chapter 2", but this work doesn't have numbered chapters. Inductiveload—talk/contribs 16:53, 24 November 2020 (UTC)
Something like Letters of Aurungzebe/Letter to A'akel Khán, Fort-keeper and Governor of the Capital of Sháh Jehán Abád does not get automatically added to Category:1662 works though seems to be a problem, yes? Also it seems odd that the main "Letters" page says "Letters of Aurangzeb (1908)" without any explanation of their original date two centuries earlier. Even the translation says on the title page it was copyrighted 1867, not 1908. It would be a bit like saying "Romeo and Juliet written in 1926" just because that was the publication year. Can we list all three, or JUST the original timeframe and the copyright date nixing the particular edition date? And if you get REALLY bored - I was trying to look at Page:The Oriental Biographical Dictionary.djvu/45 which looks like nobody has done any work to transclude it unless I am missing something about the dictionary, but I assume it would one day be at Dictionary/Alamgir I (Alamgir I is the other name of Aurangzebe, same emperor)...and I don't know how to proofread and put on a page just that one PORTION of a PAGE page. :) Think I got two short letters done today though. Peace.salam.shalom (talk) 22:44, 24 November 2020 (UTC)
- To be quite honest, I'm not 100% on the best way to show the dates here. You could maybe add an override_year and make it clearer and categorise manually? Or maybe punt to the Scriptorium for more input?
- Re the OBD, that's one of several works that had the text later created for some reason. The way to deal with this is called "Labelled Section Transclusion", which sounds complicated but really you just write ## alamgir_i ## in the page text and then in the main space, use <pages index="Your index.djvu" include=45 onlysection="alamgir_i" />. There are more details at H:TRANS.
- Some very minor things I've noticed, by the way:
- We do replicate áccénts. There's a toolbox called "special characters" in the edit toolbar to help if you can't enter it using some keyboard shortcut (depends on your OS). Letters seems mostly to use áū, OBD seems much more accent-happy!
- No spaces before ;:!? and inside quotes or around —. The OCR process very often leaves these in.
- Don't use {{c/s}}, use {{center}}.
- Yes, there are a lot of "rules" and "tricks", but you're almost a master! Inductiveload—talk/contribs 00:34, 25 November 2020 (UTC)
- Accents...I'll be honest, not sure I can figure them out.
- Lack of spaces, aha, I was wondering whether or not to include them - because the image of the PAGE page often shows them with a blankspace beforehand which I assumed was just old typography. I'll try to remember to get rid of them in the future...can our OCR or a bot not automatically replace " ; " or " : " with "; " or ": " ? It would seem to never go the other way.
- When I used the {s for center I thought it screwed up my efforts to put things on separate lines, so I've tried to use it for single-line centering and <s for multi-line centering, no?
- Oh, and Page:Travels in the Mogul Empire, A.D. 1656-1668.djvu/535 is there something other than <poem> to use to just easily keep the list of characters in list-form without needing to manually add a dozen hard-paragraph-breaks? Peace.salam.shalom (talk) 00:45, 25 November 2020 (UTC)
- There are cleanup scripts you can use. I'm working on one personally (slowly) but it's a bit temperamental at the moment and needs a bit of coaxing to work. I think Wikisource:Tools_and_scripts#PageCleanUp is often used.
- {{center}} should always work even for multi-lines. E.g. the title page uses it. But if the content contains "=", you need to write it {{center|1=Thing to center with an = in it}} otherwise the template goes wrong (this is a universal wiki thing). Using the {{c/s}} tag is technically a deprecated HTML element.
- For that Travels page, a table will probably be best, due to that braced section. Inductiveload—talk/contribs 00:58, 25 November 2020 (UTC)
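On the question above of automatically removing the stray spaces the OCR leaves before punctuation, a minimal Python sketch of that kind of cleanup might look like the following. This is only an assumption about what such a script could do, not the code of PageCleanUp or any other existing tool, and the function name tidy_spacing is invented for the example. It covers the spaces before ;:!? and around the em dash, and leaves quotation marks alone because opening and closing quotes cannot be told apart this simply.

```python
import re


def tidy_spacing(text):
    """Remove OCR-style spaces before ; : ! ? and around the em dash (U+2014)."""
    # "word ;" -> "word;" (spaces and tabs only, so line breaks are kept)
    text = re.sub(r"[ \t]+([;:!?])", r"\1", text)
    # "word \u2014 word" -> "word\u2014word"
    text = re.sub(r"[ \t]*\u2014[ \t]*", "\u2014", text)
    return text


if __name__ == "__main__":
    print(tidy_spacing("Stay ; go : now ! why ? yes \u2014 no"))
```

A blind replacement like this can also touch wiki markup that legitimately has a space before a colon or semicolon, so the result still needs a human look before saving.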
- Hm, okay - will try that. Also should Index:Babur-nama Vol 1.djvu and Index:The Memoirs of Babur.djvu be merged so that people don't waste time proofreading both where they seem to be the same edition, etc?
- Merged and the latter index deleted as empty. I have made the table for you but it still needs proofreading. Inductiveload—talk/contribs 01:17, 25 November 2020 (UTC)
- @Peace.salam.shalom: hint: {{sc}} is for Small Caps Like This Inductiveload—talk/contribs 02:00, 25 November 2020 (UTC)
- Page:Travels in the Mogul Empire, A.D. 1656-1668.djvu/535, I just changed it to SC and it's still not working for his name at the top (obnoxiously only for the non-capital letters, lol...this is a good example of "good enough" in my mind) Peace.salam.shalom (talk) 02:06, 25 November 2020 (UTC)
- @Peace.salam.shalom: you need {{sc|Aureng-Zebe}} → Aureng-Zebe. Only the lower case letters will be "small". Upper case is the same. Inductiveload—talk/contribs 02:08, 25 November 2020 (UTC)
Weird thought, but looking at Index:A father's legacy to his daughters - Gregory - 1808.djvu - couldn't there be a button/bot that automatically reads each page to red/unproofed status, so that I could then transclude it into A Father's Legacy to his Daughters...and then see all 100 pages at once, clicking the little links to the left for individual pages to fix/markup/etc? It would be much more efficient than randomly clicking through a work looking for the TOC, or the chapter of most interest, etc. Peace.salam.shalom (talk) 07:05, 25 November 2020 (UTC)
- @Peace.salam.shalom: perhaps, but the practical upshot of that would be that for many works, we'd have dozens of "red" pages presented in the main namespace, and we don't really want that - only proofread text belongs in the main namespace.
- Your image grid idea seems much better to me. Inductiveload—talk/contribs 07:17, 25 November 2020 (UTC)
Novalis Project
@Inductiveload: I want to thank you again for helping me to get started on the Novalis translation project for Faith and Love and the King or the Queen. Your uploading the books and pointing me toward how to get started has made that side of the project simple. Though I'm not totally finished with the translation, I think I'm far enough along to ask about the next step.
The project has been much more difficult than I anticipated (what's new?), in part because Novalis intentionally writes in a prose that is a kind of symbolic code (as you may remember reading when you translated the first "fragment"). I have completed a rough draft first translation for the entire work. Pages that have been translated are the pink ones in Index:Novalis_Schriften_-_Volume_2.djvu. I am now working my way through the pages again, cleaning up the translations. As I work through them in this second round, I have been marking them as proofread. This is still time consuming, but much faster than laying down the rough draft translation! Depending on how much time I have available, it could be "proofread" by me in a couple of days. I'm feeling pretty good about the translation because my final step before pushing the button is to check the translation against the ones out there, and though my language and structure are significantly different, they've agreed with me on the gist. (There are some major places where we disagree. I sometimes wonder if the author wasn't using a different version of the work. However, some of it could certainly be on my side.)
So, here's my question. What do you suggest I do next? I need to keep "proofreading", I've only "proofread" eight of the eighteen translated pages. Should I wait until I'm done all of them? My thought- hence why I'm emailing now- is to reach out to people who would be willing to validate the pages. If they can start now on the pages I've done, that would probably make the completion quicker. If so, is there someone you can recommend who could look at the text, validate its integrity and fix errors? This validation may be difficult in its own right as the work is very poetic, and thus more open to subjective interpretation than most works. Is there anybody you could recommend or a page for me to go to put out a request?
Again, I want to say that I so much appreciate the great work you did to step in when I put out my first help request. When I'm sitting here working on the project, I just think of what a rough ride this would've been without your help. I'm not even sure it would've gotten started as I was having difficulty just trying to wend my way through the help pages to figure how to do a translation in proper Wikisource style. Thank you again! Wtfiv (talk) 08:01, 20 November 2020 (UTC)
- @Wtfiv: for this work, because it's fundamentally separate works in a collection, I think you'd be OK to transclude it as it stands to Translation:Writings of Novalis and Translation:Writings of Novalis/Faith and Love and the King or the Queen now.
- I don't have anyone I can really think of to help off the top of my head, but you might be able to find someone interested at s:de:Wikisource:Skriptorium or even a German enWP or deWP editor who hangs around w:Novalis or w:de:Novalis? Inductiveload—talk/contribs 00:33, 21 November 2020 (UTC)
@Inductiveload: Thank you so much for your help! I think I can work it out. I just need to keep slogging at the translation for now. Once more, your last note provided valuable guidance! Appreciatively, Wtfiv (talk) 00:39, 21 November 2020 (UTC)
- @Wtfiv: The transclusion looks good to me. Good work! Inductiveload—talk/contribs 14:24, 21 November 2020 (UTC)
- @Inductiveload: Thank you! I'll look into transcluding the book too to finish up the great work you've done. And who knows, maybe this translation project will continue... Thanks again!Wtfiv (talk) 22:00, 21 November 2020 (UTC)
@Inductiveload: I'm currently figuring out the secrets of transclusion and book creation. It looks to me like Wikisource policy is to keep the formatting and order as close to the original as possible, which I've been trying to do with the translation. However, if an index is at the back of the book, would it be okay to move it to the front, that is, to the book page itself, so it can serve as links to the various works within?
- @Wtfiv: I think putting that TOC up front is more than sensible (and, in fact, is required to make ebook exporting work). Inductiveload—talk/contribs 01:50, 25 November 2020 (UTC)
Another scan
Hello. Regarding the discussion archived at User talk:Inductiveload/Archives/2020#Scan: I have had similar problems with commons:File:The Typographical Journal, Volume 6, 1 August 1894 to 1 August 1895.djvu. James500 (talk) 00:21, 21 November 2020 (UTC)
- @James500: I will regenerate the scan, but I can't do it right now. Again, there are "dodgy" images in the IA image sequence (first and last, at least). Looks like an interesting "booky" periodical! Inductiveload—talk/contribs 00:56, 21 November 2020 (UTC)
- @James500: Scan regenerated and index created. The OCR looks "OK", considering such light scanning (I had to use a threshold of 80%!). What do you think? Inductiveload—talk/contribs 14:22, 21 November 2020 (UTC)
- It seems okay to me. Thanks. James500 (talk) 01:11, 22 November 2020 (UTC)
Beale
Dictionary of Indian Biography/Beale, Thomas William — billinghurst sDrewth 04:24, 25 November 2020 (UTC)
Index preview script
Please check your script.
I said this was Raw img: Page:The Traffic Signs Regulations and General Directions 2002 (UKSI 2002-3113 qp).pdf/47 and it marked it as missing table.
Thanks. ShakespeareFan00 (talk) 16:21, 27 November 2020 (UTC)
- Specifically, check around line 348 in the relevant script, it's warning about a duplicate function name? ShakespeareFan00 (talk) 16:28, 27 November 2020 (UTC)
a) Oh dear, Page:Advice to young ladies on their duties and conduct in life - Arthur - 1849.djvu/8 and elsewhere...I had the same problem and didn't notice; I'll pause using it right now so I don't create a backlog of miscats. b) I don't suppose it's possible to allow the tool to reclassify red pages as RawImg/Table? There are a lot of works where all 300 pages are red...but it's difficult to then go sort through them :) Peace.salam.shalom (talk) 18:23, 27 November 2020 (UTC)
- Sorry about that. Should be fixed now! I must have messed up my deployment (aka, copy-paste).
- I'll think about allowing overwriting pages, but it'll need a little bit more thought (the API is different, and probably needs a prompt). This is why blind "dumping" of the OCR is not always helpful. Inductiveload—talk/contribs 19:49, 27 November 2020 (UTC)
- Should I be using "Table" when it's only a half-page table, or like images should it only be for full-page ones? Or should it add {{table missing}} but still leave the other OCR text efforts? Peace.salam.shalom (talk) 20:35, 27 November 2020 (UTC)
- Possible for the rewrite tool to distinguish between Red pages that can be overwritten (ideally just to ADD the template, without deleting the OCR-d text already non-proofread), and Green/Yellow pages that cannot be overwritten? Or not how the system works? I'll leave it with ya. Peace.salam.shalom (talk) 22:44, 3 December 2020 (UTC)
- IMO, don't use it for half-page tables, because it prevents the OCR from the text layer in the file from loading for the next editor. There's not really a lot of benefit in pre-emptively doing table templates for an otherwise empty work as no-one is going around doing only isolated tables. {{missing table}} is more for when an editor isn't able to format the table in an otherwise complete work and needs to mark it as such, at which point someone will (hopefully) step up and do it for them. Inductiveload—talk/contribs 09:05, 3 December 2020 (UTC)
- Okay, noted - although I guess I was of the opinion noting 500 missing/unformatted tables now might prove useful if at some point somebody develops a tool to magically transpose them...better that we already have a backlog list created from the previous years that the tool can run against. Save a couple years of newly finding/tagging them. On related note, would still love if the Show Page Grid allowed simply Missing Image for when it's not a full-page but it's still making it easier for interested editors who resolve to upload all images of the Mughal Empire (or old coins, or whatever their interest is) to quickly skim only the pages with images that still need uploading, etc. And different issue, also related, would love if I could do the same one-click edit that didn't mess up the OCR but marked specific pages as "Index" for example (because often between 2-50 pages at the end of the work are just an index which is useful to see at a glance how much actual work there is to transcribe just the book without the index) Peace.salam.shalom (talk) 16:18, 3 December 2020 (UTC)
- @Peace.salam.shalom: I just wrote the tool, I don't make any rules, it's just my opinion. Again, if the tables are in an otherwise-proofread work, they should already be marked properly. Even if we add a magical table-OCR-and-format-tool, filling in isolated tables shotgunned throughout otherwise-blank works is of questionable value to me.
- I'll work on allowing a page overwrite option, but it might not be for a little while since it needs an "are you sure, you're about to blow away work someone pressed save on" dialog.
- Re marking pages as "index", that's to do with the pagelist and not really in the remit of this tool as it stands. Interposing new values into the pagelist without destroying it in some corner case is non-trivial. Maybe one day. Inductiveload—talk/contribs 19:44, 3 December 2020 (UTC)
By no means something that needs looking at in the near future, but I was wondering if there was a way of extending the CSS, so that if you've got a 2a type paragraph following a definition, it auto indents them.
The relevant stylesheets are : Template:Uksi/styles.css , but I also don't think I updated the /2a template variant to use the CSS styles at all yet :(.. If you want to take a look and cleanup the somewhat convoluted approach feel free. ShakespeareFan00 (talk) 22:14, 4 December 2020 (UTC)
- I'll take a look at some point, but probably won't be any time really soon. Inductiveload—talk/contribs 21:39, 5 December 2020 (UTC)
Special thanks
Thank you for all the work you've been doing for me to find good-quality scans and properly index them, etc. so I can proofread entire books. The ProofreadPage extension is officially how I just read books now; this is a lot of fun! As I'm not affiliated with any of the universities accepted by HathiTrust I can't download entire books from them, so really it's a big help to me. PseudoSkull (talk) 16:38, 5 December 2020 (UTC)
- You're very welcome and I'm glad a small effort on my part can multiply up to a fully proofread work! I don't have direct access either - I have to rip them page by page and re-assemble as a DjVu! Bit of a pain, but c'est la vie, and the computer keeps the room warm while it's chugging along!
- Please always feel free to drop scan requests for me! Inductiveload—talk/contribs 21:39, 5 December 2020 (UTC)
Newbie Part 2
So I'm looking at Help:Match_and_split#MATCH and what you did on Symposium, so I went and tried this - but Step 5 says "5) Click the __MATCH__ and the job will start" and the word "__Match__" is not clickable, and clicking the PAGE redlink also doesn't seem to do anything. Peace.salam.shalom (talk) 18:09, 25 November 2020 (UTC)
- Never mind, I needed to activate a gadget called Phe-Bot, done that, now it's clickable. Peace.salam.shalom (talk) 18:12, 25 November 2020 (UTC)
- This may be the record for the quickest graduation to M&S ever! Inductiveload—talk/contribs 18:13, 25 November 2020 (UTC)
- Wait and see how badly I screw it up...but I was so distressed at the idea of 150 copy/pastes last night just to satisfy y'all that my one long copy/paste was valid...and then I saw your Match/Split miracle and thought "That's something I need", so went and read the page (apparently not quite closely enough). So then it was to {{migrate}} or {{match}} or something and it had a category showing works that need M/Sing so I picked the one that had a name I recognized and figured I'd mess around with it. So can Doctor Faustus (1604) have some of those templates removed from the top now? Peace.salam.shalom (talk) 18:16, 25 November 2020 (UTC)
- I transcluded the rest of the work (front matter and the first page) and removed {{migrate to}}. It's still a mess in terms of formatting, so {{standardize}} should stay for now. Inductiveload—talk/contribs 20:39, 25 November 2020 (UTC)
Victoria
Ah, but how would I handle it with Queen Victoria where Index:Queen Victoria (Strachey).djvu has many already-proofed pages? I wouldn't want to overwrite those pages by just putting the __match in at page 92...unless I could tell it to stop at page 131, start again at 137, etc. UPDATE: Lived dangerously, tried anyways - it hits nomatch__ when it reaches an image...is there a way to tell it to do PAGE17-21,45-92,98,109-144 for example of an index and not do the others? Oh and do I need to leave tab open while it says "Splitting" for ten minutes, or I can close it? Peace.salam.shalom (talk) 18:23, 25 November 2020 (UTC)
- Quick confirmation that I goofed up somewhere, because Queen Victoria now ENDS where the bot hit an image, and it just deleted the 300k of data afterwards :\ Peace.salam.shalom (talk) 18:49, 25 November 2020 (UTC)
- It's possible, the bot is far from foolproof. You can always restore the text from the old revision if it trips up. Sometimes it can take some experimentation, sometimes it's just too dumb to figure it out. You can always adjust the match headings manually around images if you need before you press "split".
- There's a reason {{migrate to}} has a backlog: matching and splitting is a pain in the neck. And it's the same reason we prefer to scan back from the start, because otherwise we get works that sap time to fix up. Inductiveload—talk/contribs 20:39, 25 November 2020 (UTC)
- @Peace.salam.shalom: ok, so I think Victoria is pretty much matched and split, but there are still some defects like missing footnotes. Inductiveload—talk/contribs 22:32, 25 November 2020 (UTC)
- Thanks, I might try to play around with this tool and once I figure it out a bit better, clear up some of the backlog at Match/Split - it seems like this is within my realm to learn...hopefully :) Putting multiple match headings in is a good idea, I'm just concerned - does the bot auto-stop when it reaches a page that exists and/or is yellow not red? Or does it rewrite over it? I'd rather not rewrite somebody else's proofread/validate, etc for my plaintext bot dump. Peace.salam.shalom (talk) 00:00, 26 November 2020 (UTC)
- @Peace.salam.shalom: In the Split step, the bot overwrites all the pages it finds references to in the headings. If some of the target pages should not be overwritten you will have to either remove those headings (and the contents of that section) before running Split, or you will have to revert the bot's changes to the affected pages after Split finishes. All pages on the wiki retain all old versions, so while messing something up may create a need for cleanup work, the worst case is lots of manual cleanup and not loss of data. Adding scans to existing texts (using Match&Split) is a great way to help the project, if that way of working appeals to you. Just be aware that there are some subtle pitfalls there. For example that a non-scan-backed text may not actually match the edition of a work it claims to be sourced from, and thus may have completely different pagination, punctuation, or even textual changes. There's nothing dumber than a computer, so Match&Split will never notice such issues and will merrily carry on regardless. I recommend a "measure twice, cut once" approach and being quick to ask the community for help or advice if you're unsure about something. --Xover (talk) 07:06, 26 November 2020 (UTC)
- @Xover: I don't think the split phase will actually overwrite existing pages. It's part of the reason why bulk-dumping raw OCR is suboptimal in many cases, because it inhibits M&S. Moreover "not proofread because it's just OCR" and "not proofread because it's M&S output" are really hard to tell apart other than manually (which you of course know very well!)
- @Peace.salam.shalom: assisting with the M&S backlog would be highly appreciated. It's one of the most frustrating backlogs we have, because every drive-by dump of Project Gutenberg grows it!
- Another very useful thing to do is to patrol {{no source}} and try to find the source documents and change to {{scans available}}. If the scans are also uploaded and the Index page exists, then it can be {{migrate to}}.
- The one thing (other than the caveats Xover mentioned) is that it would be good if after match and split, even if the pages are "proofread", the mainspace text is at least in the same or better condition as it was before. This unfortunately does sometimes require a little bit of time and fiddling to ensure (as with Victoria). Of course, if you run into a situation that you're unsure about, someone will gladly help. Inductiveload—talk/contribs 13:16, 26 November 2020 (UTC)
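For the suggestion above of removing the headings (and the contents of their sections) for pages that should not be touched before running Split, a minimal Python sketch could look like this. It assumes the matched text uses one ==[[Page:Index name/N]]== heading per page, as in the heading quoted elsewhere in this thread. The function name drop_sections and the sample index name are invented for the example, and it only edits a string locally rather than talking to the bot or the wiki.

```python
import re

# One heading per page, e.g. ==[[Page:Some index.djvu/123]]==
HEADING = re.compile(r"^==\[\[Page:(.+?)/(\d+)\]\]==[ \t]*$", re.MULTILINE)


def drop_sections(matched_text, protected):
    """Return matched_text without the heading and body of protected page numbers."""
    # With two capture groups, split() yields:
    # [preamble, index, number, body, index, number, body, ...]
    parts = HEADING.split(matched_text)
    kept = [parts[0]]
    for i in range(1, len(parts), 3):
        index, number, body = parts[i], parts[i + 1], parts[i + 2]
        if int(number) in protected:
            continue  # skip both the heading and the text under it
        kept.append("==[[Page:%s/%s]]==%s" % (index, number, body))
    return "".join(kept)


if __name__ == "__main__":
    sample = (
        "==[[Page:Book.djvu/10]]==\nten\n"
        "==[[Page:Book.djvu/11]]==\neleven\n"
        "==[[Page:Book.djvu/12]]==\ntwelve\n"
    )
    print(drop_sections(sample, {11}))
```

Whether Split actually overwrites existing pages is disputed in the replies above, so this is best treated as a precaution. The dropped text also disappears from the mainspace page and would need to be replaced by a transclusion of the existing pages.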
Hmm, Chapter 4 and 5 seemed to go well enough once I put in multiple anchor points for __Match - but First Footsteps in East Africa/Chapter 6 seems to have hit a weird snag where it insists that the texts do not match from where it ended the match...and I manually set a few more anchors since they do match (though we have the footnotes as endnotes right now, but it did Chapter 4, 5 and half of 6 without complaining)...so I'm confused what stopped it. I think I handled Chapter 4 and 5 correctly in then going back and removing the text that matched already-proofread/validated pages and just changing the number of pages transcluded with the <s Peace.salam.shalom (talk) 14:33, 26 November 2020 (UTC)
- @Peace.salam.shalom: It can be a bit hit and miss - it could be those enormous footnotes are just too much for it. If there isn't enough "body" text to latch onto, or the OCR is just different enough, it might just declare it a "no match". I don't specifically know the technique it uses to find matches, but I can imagine that might upset it, and it's a rather tricky computing problem in general to solve. You can insert a manual ==[[Page:First Footsteps in East Africa, 1894 - Volume 1.djvu/XXX]]== heading when needed and restart the match from the next page. In this case I imagine re-starting from /189 would work OK. (edit: yep, it worked OK) Inductiveload—talk/contribs 14:53, 26 November 2020 (UTC)
- Proofing myself, I screwed up somehow because Page:First Footsteps in East Africa, 1894 - Volume 2.djvu/246 not only assigned far too much to the page...but all subsequent pages as well, and I'm not sure how to "unedit" the pages :\ Peace.salam.shalom (talk) 17:33, 26 November 2020 (UTC)
- @Peace.salam.shalom: Right, so there is clearly some weird issue with the match here, since it's glomming the first two pages into one and getting out of step. I have actually seen this before in Lord of the World, so it must be an issue with the bot. @Xover: any ideas - the wrong match is diff.
- In general, it's best to check that the match is correct before you split, because if you split wrongly, the incorrect Page: pages need to be deleted (I just did this).
- If you match only and do not split, all you have to do is "undo" the bot's "match" edit in mainspace and try again with a different match point (e.g. like this, which still didn't work) and maybe fill some in manually (which is what I finally did, since it's only a handful of pages: here). Basically, don't click "split" unless you're fairly sure the pages are right, because that's going to require an admin to delete the pages so you can try again (or you do it all manually, which is a waste of your time if a bot can do it). Inductiveload—talk/contribs 18:16, 26 November 2020 (UTC)
- Just guessing without really digging into what the problem is… Once the bot finds a match it keeps adding for a few pages (in case it's hit a plate or similar), and it only looks for a match at the start of a page. If it's glomming two pages together then it most likely matched the start of the first page, started adding the text, failed to find a match on the second page, and kept on adding that text to the page it was on. It only stopped when it found a match for the third page. If it was subsequently off for the remaining pages it was presumably some other issue; either a bug in the bot or user error (it looks like there were manually added __MATCH__ tags in play here?). --Xover (talk) 21:16, 26 November 2020 (UTC)
- I don't think it was the manual __MATCH__es, I've seen this before and I had it happen with a single __MATCH__ (/248 should be at "the guide...", not "which I was", which is /249). My guess is the same as yours, but I guess somewhere something's doing page + 1, not page + num_pages_skipped + 1.
- One more thing to look out for if/when I dig into the code. Inductiveload—talk/contribs 21:25, 26 November 2020 (UTC)
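To make the guess above concrete, here is a minimal Python sketch of the suspected off-by-one. It is a toy model, not the bot's actual code: the function label_sections, its parameters and the page numbers in the usage example are all hypothetical. Each flag says whether the start of the next scan page was found in the text. A page whose start is not found gets its text glommed onto the current section, and the question is which number the next matched section receives.

```python
def label_sections(match_flags, first_page, buggy=False):
    """Return the page number given to each matched section.

    match_flags[i] is True when the start of scan page first_page + i was
    found; the first flag is assumed to be True (the starting match point).
    """
    labels = [first_page]
    page = first_page
    skipped = 0
    for found in match_flags[1:]:
        if not found:
            # Text of this page is appended to the current section.
            skipped += 1
            continue
        if buggy:
            page = page + 1            # suspected behaviour: skipped pages ignored
        else:
            page = page + skipped + 1  # expected behaviour: skipped pages counted
        skipped = 0
        labels.append(page)
    return labels


if __name__ == "__main__":
    flags = [True, False, True, True]  # second page's start was not found
    print(label_sections(flags, 10))              # [10, 12, 13]
    print(label_sections(flags, 10, buggy=True))  # [10, 11, 12]
```

In the buggy variant every section after the skipped page is labelled one page too low, which is the "getting out of step" symptom described in this thread.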
Page:Ancient Accounts of India and China.djvu/7 looked like a fascinating book except every page seems to have weird formatting with sideboxes which seem beyond my ability...I only remember/process/understand about half the template you show me, and generally just those that are {{word}} at that. Is there an easy {{quotebox}} type way to handle them? Also, unrelated to this work, but do you really put entire works in Blackletter and not offer an easier translation? Ouch. Peace.salam.shalom (talk) 03:58, 29 November 2020 (UTC)
- @Peace.salam.shalom: sorry for the delay.
- Re. sidenotes - these are a constant thorn in the side. We have templates like {{left sidenote}}, but they're all a bit unsatisfactory in terms of experience, for editors, and readers on the web and (especially) ebooks. Some work will hopefully be done to improve it as the PageNumbers and Layouts gadget gets work done, but there may be limits to that.
- Re. blackletter, we don't replicate the entire work in blackletter if the original was. We can replicate it when it's a word or two for emphasis or decoration, usually on title pages, and mostly because it looks nice. Which work did you have in mind that's all in blackletter? Inductiveload—talk/contribs 09:02, 3 December 2020 (UTC)
- Egad, it took forever to do a single page (the ʃ doesn't help), but I think Page:Ancient Accounts of India and China.djvu/44 uses the sidenotes properly? Might be wise to create a redirect if redirected templates work just called LSidenote and RSidenote to save people a little bit of time - and the begins/ends templates are insane as well...and I'm guessing you're going to tell me there's no magic tool that I can Page Grid View, find all the pages that have sidenotes and click it to insert {{sidebar begin}} at the top of each page and {{sidebar end}} at the bottom of each page, right? But the genie still recognizes two wishes remaining? Peace.salam.shalom (talk) 22:56, 3 December 2020 (UTC)
- ...also, Index:The booke of thenseygnementes and techynge that the Knyght of the Towre made to his doughters - 1902.pdf - is it okay that I used the pagelist to just make notes to myself for proper proofreading? The "X. Curteously" on Pg40 there is just that it's the tenth illuminated dropinitial/chapterheading with a keyword, to make it easier to place/find other "chapters". Or should it not be used for that? I assume "Index" is not an "official" page? Peace.salam.shalom (talk) 17:51, 5 December 2020 (UTC)
- @Peace.salam.shalom: Re sidenotes. They are a right royal pain, I did warn you :-p. You can add things to the header and footer fields on the index page to insert them automatically for all new Page pages and delete them when the page doesn't need them.
- I imagine it's fine to "adulterate" the pagelist, though probably best to remove the hints later because they'll overlap the page content in mainspace if they're too long. Inductiveload—talk/contribs 22:58, 5 December 2020 (UTC)
I don't think I'll get through the work myself without serious help, but I just wanted to show off that I think I got The_Book_of_the_Knight_of_the_Tower#HOW_GOOD_WYMMEN_OUGHT_TO_MAYNTENE_THEMSELF_CURTOYSLY. specifically working so that the left side is transmigrating a PAGE pages, and the right side is my freehand translation as best as able. I don't mind doing the translations, it's the proofreading of Ye Olde language PAGEs and the weird <code> stuff that takes me so darned long. Peace.salam.shalom (talk) 04:22, 6 December 2020 (UTC)
- More work today just trying to figure out how the heck to display it, not even getting to do work proofreading/translating...very frustrating. In the meantime, do you know how to make the "Contents" template of The Book of the Knight of the Tower maybe make two or three columns instead of one? Because it's insane (it's at 50 now, but there are 150 total) - but there is no natural desire/way to group them into subpages since the author didn't, and each one shouldn't be its own subpage where there are 150 and each one is basically just one paragraph. Peace.salam.shalom (talk) 04:27, 8 December 2020 (UTC)
- And since you know all the accents, is there a symbol for the ye/the that the author uses in Page:The booke of thenseygnementes and techynge that the Knyght of the Towre made to his doughters - 1902.pdf/11 that I have currently marked as a {{sic}}? The author uses w:yogh inconsistently through the work, occasional long-s, etc...but this is the first time I've come across a "ye" (meaning the, not you - that is the original meaning of Ye Olde) as a single character while proofreading on Wikipedia.
- AND The_Book_of_the_Knight_of_the_Tower#How_a_woman_sprange_vpon_the_table - how to adjust for the fact its transmigration over to the work leads to uneven distribution of columns? User:ta(e)Ta[e] or whatever said it was likely because the system only compared the length of the first paragraphs, but I assume there's a manual "split it evenly" code? Peace.salam.shalom (talk) 19:42, 11 December 2020 (UTC)
- For ye, I use {{ruby|y|e}}, which produces y; an individual template designed for this may be better. TE(æ)A,ea. (talk) 20:38, 11 December 2020 (UTC).
Wizardry
Hello again, long time no speak! Hope you are keeping well? I was eyeing up some of your javascript stuff with interest, but I am a complete novice. I was most interested in your running header one, because I feel lazy sometimes when proofreading (mostly on this). The instructions say 'to use this script, add the following to your .js...', but my question is... where is my .js? Do I have to create it? Help a noob out, please! AndrewOfWyntoun (talk) 20:47, 11 December 2020 (UTC)
- @AndrewOfWyntoun: Your user JS is located at User:AndrewOfWyntoun/common.js. "Simply" add this to that page (creating it in the process) and it should work:
importScript('User:Inductiveload/Running header.js');
- One day it will become a gadget but for now it's still magic! Inductiveload—talk/contribs 22:28, 12 December 2020 (UTC)
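For anyone who prefers not to rely on the older importScript helper, a roughly equivalent approach is to pass the raw script URL to mw.loader.load. The sketch below is an illustration only: rawScriptUrl is a hypothetical helper, and the host is assumed to be English Wikisource.

```javascript
// Build the raw-JavaScript URL for a user script page.
// `rawScriptUrl` is a hypothetical helper; the default host is an assumption.
function rawScriptUrl(title, host) {
  var base = 'https://' + (host || 'en.wikisource.org') + '/w/index.php';
  return base + '?title=' + encodeURIComponent(title) +
    '&action=raw&ctype=text/javascript';
}

var url = rawScriptUrl('User:Inductiveload/Running header.js');

// Only attempt to load inside MediaWiki, where the global `mw` is defined.
if (typeof mw !== 'undefined' && mw.loader) {
  mw.loader.load(url);
}
```

Placed in a user's common.js, this should behave much like the single importScript line above.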
- Thank you kindly, as always, your help much appreciated! Let's hope I did that right... AndrewOfWyntoun (talk) 10:30, 14 December 2020 (UTC)
- @AndrewOfWyntoun: "Let's hope I did that right..." If it's working, you did it right :-p Inductiveload—talk/contribs 10:44, 14 December 2020 (UTC)
- I maybe haven't done it right... I added as per the instructions, and have cleared the cache on my browser (firefox). Unless I'm misunderstanding the use of the tool? My thinking is that on this page the header box should be automatically filled in? It's just showing as blank for me... I've probably managed to do something weird ¯\_(ツ)_/¯ AndrewOfWyntoun (talk) 11:05, 14 December 2020 (UTC)
- You have to click "Running header" in the left sidebar to trigger it.
- A future improvement will be to auto-trigger on page creation (but not edit of an existing page, even if the header is empty as that's often valid, like on a title or blank page). Inductiveload—talk/contribs 11:19, 14 December 2020 (UTC)
- Fancy! I pay so little attention to the left sidebar (this is probably not a good idea) that I did not notice its appearance in the TemplateScript part. And indeed, it does work! Thank you again AndrewOfWyntoun (talk) 12:50, 14 December 2020 (UTC)
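The "auto-trigger on creation, but not on edit of an existing page" rule mentioned above could be expressed as a small predicate. This is a sketch under assumptions, not the script's actual logic: shouldAutoTrigger and fakeConfig are hypothetical names, and the check assumes the standard mw.config keys wgCanonicalNamespace, wgAction and wgArticleId (which is 0 for a page that does not exist yet).

```javascript
// Decide whether to auto-fill the running header.
// `shouldAutoTrigger` is hypothetical; it takes a getter shaped like mw.config.get.
function shouldAutoTrigger(getConfig) {
  var inPageNs = getConfig('wgCanonicalNamespace') === 'Page';
  var editing = getConfig('wgAction') === 'edit';
  var isNewPage = getConfig('wgArticleId') === 0; // 0: page not created yet
  return inPageNs && editing && isNewPage;
}

// Stand-in for mw.config.get so the rule can be tried outside a browser.
function fakeConfig(values) {
  return function (key) { return values[key]; };
}

var onCreate = shouldAutoTrigger(fakeConfig({
  wgCanonicalNamespace: 'Page', wgAction: 'edit', wgArticleId: 0
}));
var onExistingEdit = shouldAutoTrigger(fakeConfig({
  wgCanonicalNamespace: 'Page', wgAction: 'edit', wgArticleId: 12345
}));
console.log(onCreate, onExistingEdit); // true false
```

Checking the article ID rather than whether the header is empty keeps deliberately empty headers (title pages, blank pages) untouched.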
Gizmos and gadgets
Low priority: but whenever time permits, could you set up a Gadget, clearly labelled "DO NOT USE: FOR DEVELOPMENT ONLY", that only pulls in User:Xover/Gadget-PageNumbers.js and User:Xover/Gadget-PageNumbers.css? It's becoming clear that the difference in execution environment for a Gadget and a user script (or site script for that matter) are highly relevant for the layout/pagenumbers script, and that the softly softly approach is getting bit by these. I want to test out just mercilessly refactoring it and see whether that solves anything or just breaks in different ways. No rush, my own available time is unpredictable for the foreseeable future so it may take a while before I get dug in. --Xover (talk) 09:03, 15 December 2020 (UTC)
- Done (there was already a sandbox gadget for exactly this purpose). I have pointed it at your JS + CSS. Inductiveload—talk/contribs 12:49, 15 December 2020 (UTC)
The Fauna of British India, including Ceylon and Burma (Moths Vol 5)
This scan if you wanted to upload I think - https://archive.org/details/moths05hamp
Thanks for checking back on old 'can't host' notices :) 18:58, 16 December 2020 (UTC)
- Thanks - I was about to convert a no-DJVU version so that saved me time! Is that the complete set now? :-D Inductiveload—talk/contribs 19:21, 16 December 2020 (UTC)
Invisible page break
Hello. I have noticed that you tried to solve a problem of the act’s title at the bottom of an exported page by adding {{Invisible page break}}. However, now when I export the work into pdf, the result is that the previous act is finished with only one line on its last page, the rest of the page being absolutely empty, and the new act begins on a new page, which is not what the authors of the edition intended. Of course it is not a big issue, but would it be possible not to force dividing the acts on separate pages under all circumstances and bind the title of the act with the following text instead? That is the way how e. g. Microsoft Word solves such problems. --Jan Kameníček (talk) 00:30, 21 December 2020 (UTC)
- @Jan.Kamenicek: I have tried to solve this with {{no break after}} which has the behaviour we really want (no break after, or between the title and the DHR). It works in EPUB in Koreader and the Kobo native reader, but not in Calibre desktop reader, PDF export, MoonReader+ on Android or the Firefox EpubReader extension. I think the problem probably is that anything that uses a "browser-like" HTML engine doesn't work because they don't support "avoid" for page-break styles.
- So, I'm not quite sure what's the best option. Inductiveload—talk/contribs 08:27, 21 December 2020 (UTC)
- Now I see. Maybe we should not give up better solutions to technically low-level browsers, but I am not sure either… --Jan Kameníček (talk) 16:17, 21 December 2020 (UTC)
"ready to export"
I downloaded a couple of mine. I have a 5M restriction on my email, so I was happy that the 2 tomes together added up to less than that and could be mailed to my device together. And they looked great! Books with images, however...
Many months ago I was watching a book here being uploaded at commons. The artist was following the commons guidelines as I remember them making the non-photographic images as png. Additionally, removing the background and making them transparent png. My ereader is 8G I think and even if it is 16G that book is probably not going to fit in the mail and so much space for what?
I don't know how the gutenberg post processor does it, but if you set a "cover" attribute in the head, it will display that image in the reader. That would be nice here. The first image was displayed in my reader -- so maybe that is already happening?
Also, pipedreams, I have some without the pipe even: a limited SVG font to include for blackletter and other whimsical text. I need one that has only 6 scalable characters, for instance.
The reason I started here with all this is your excellent and long time uploads at commons.
Summary:
- Guidelines for images to be used for export, posted here and at the commons.
- A cover hack for mobi and epub
- Limited character set SVG fonts
- (first mention) can I put books into the export cat?
@Inductiveload: Thanks for your time.--RaboKarbakian (talk) 21:48, 22 December 2020 (UTC)
- @RaboKarbakian: Welcome to the export party! :-D
- Re. image sizes, the most critical thing is not to use {{FI}} which uses the largest image possible and can be 100x (seriously) the size of a scaled image. PNGs can be slightly bigger, but rarely that big after rescaling.
- Not sure about covers, but there's an open request for that: phab:T270423. It's not as simple as the first image because there's a cover page that the exporter adds.
- Font support: I have created phab:T270743 to ask for it to be picked up automatically (if possible). I doubt reducing the font glyphs to just what is used would be practical, but it would be nice if we could only pull in fonts when they are actually used.
- Of course you can. Have a check of Help:Preparing for export to see what you might want to look out for. Currently, I am avoiding marking works with a few issues: dotted tables (really just a minor gripe), braces or formulae. You can use {{export to check}} to ask for a check.
- Good luck and feel free to ask anything here (as always!) Inductiveload—talk/contribs 22:31, 22 December 2020 (UTC)
- Transparency makes a huge difference in file size. It is like cubing (^3) a non-transparent image.
- And you were right about dotted indices, but it failed gracefully:
- [Gallery, images not reproduced: dotted index; Kindle 5; Kindle 4]
- Screenshot doesn't function here. I blame FB and a shot I got of Viggo in a gucci bandanna, but have no proof of this so a long apology about photographic quality.
- I think SVG fonts would be so pretty....--RaboKarbakian (talk) 01:21, 23 December 2020 (UTC)
- @RaboKarbakian: huh, interesting about the image. What device is that? None of the software I've used managed that, it just shows the scribe illustration if anything.
- The issue isn't transparency, per se. That might add about 25% (RGB to RGBA). The problem is that PNG is lossless and can't crush the image like JPG. Which book are you concerned about the size of? Inductive l.p.load—talk/contribs 01:41, 23 December 2020 (UTC)
- Kindle Paperwhite ( " Papaer weight " ) version 5.13.2.--RaboKarbakian (talk) 01:45, 23 December 2020 (UTC)
- Kindle 4.1.4
- The Autocrat of the Breakfast-Table (Holmes, 1858) is the one I watched being uploaded.--RaboKarbakian (talk) 01:58, 23 December 2020 (UTC)
- File:Autocrat of the Breakfast-Table pg 39.svg <--look at that! Putting me out of business....--RaboKarbakian (talk) 02:04, 23 December 2020 (UTC)
- @RaboKarbakian: Using SVG is somewhat questionable for an illustration like that, but because SVG compresses very well, it is small in the EPUB. The whole file is under 1.3MB. Not slim by ebook standards but not obese either.
- The first image is bigger, looks like a MB or so, and it won't zip down. It's sad because JPG isn't really ideal for engravings, due to high-frequency image content and the desire for transparency (not very important on eink devices as they have white backgrounds anyway). But huge files are bad too. Inductiveload—talk/contribs 02:34, 23 December 2020 (UTC)
- The svg is kbs. The rendered png is Ms. Had it been indexed and then converted back to rgb, a seeable size loss would happen, with mostly no detail (or color) loss for these line drawings and engravings. JPG is simply easier for all this. ereaders displaying SVG -- they are just dedicated browsers, so probably the future? And I am not so out of business because those lines need to be dug out of aged paper....
- Another matter, indented paragraphs. They work well in the pictured ereader and they just don't work, nicely, in my older model. I like them. They make reading easier.--RaboKarbakian (talk) 15:32, 23 December 2020 (UTC)
- @RaboKarbakian: The title-page PNG at 400px width is actually 321kB, which is large for a 400px greyscale image, but not obscene. The same PNG converted to JPG at 95 quality is 157kB and 127kB at 90 quality. However, there's probably not much benefit to these images being PNG in the ebook, so perhaps the exporter could transcode them on the fly, for roughly 50% size reduction. Substantial, but not mind-blowing.
- Indented paragraphs are a bit of an issue. I can set them on my device (which runs Koreader) by selecting a CSS master style (found in "epub.css"). However, setting them blindly can cause some unexpected results like indenting things in tables and lists and headings. Lots of things at Wikisource are semantically-ill-formed and use P-tags when they should not. Which means that lots of things get an indent that you wouldn't expect. Inductiveload—talk/contribs 17:49, 23 December 2020 (UTC)
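The "roughly 50%" figure for transcoding can be checked from the sizes quoted above. The snippet is just that arithmetic; percentSaved is a hypothetical helper name.

```javascript
// Sizes quoted above for the 400px title-page image, in kB.
var pngKb = 321;
var jpgKb = { q95: 157, q90: 127 };

// Percentage saved relative to the PNG, rounded to one decimal place.
function percentSaved(originalKb, newKb) {
  return Math.round((1 - newKb / originalKb) * 1000) / 10;
}

console.log(percentSaved(pngKb, jpgKb.q95)); // 51.1
console.log(percentSaved(pngKb, jpgKb.q90)); // 60.4
```

So quality 95 saves about half and quality 90 about three fifths, for this one image.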
- Grayscale, as a mode (like indexed or rgb) makes larger file sizes than rgb. I only ever use one software, so maybe that is an issue there, but beware! I stick to my claim that png rgba is a cube of rgb (colorspaces are as complicated as the eye cones that see them). It was explained to me by one of the software authors. Also, did you know that two layers of transparency one 25% and the other 75% do not total 100% opacity when joined? Please don't ask me to explain it 10+ years after I learned this, but it is true.... Happy whatever to you and your ironic chum there!--RaboKarbakian (talk) 04:33, 24 December 2020 (UTC)
- Assuming we're still talking about PNG only, and we assume 8 bits per pixel per channel (true for most images here), then GREY = 8 bits per pixel, GREYA = 16 bpp, RGB = 24 bpp and RGBA = 32 bpp. PNG compression may fuzz the numbers a little, but in general, adding a channel at most doubles the image size (from GREY→GREYA: 8 to 16), and it's only a 33% increase for RGB→RGBA. Cubing an image would be enormous: 100kB cubed is 1 petabyte, or roughly 4 times more than all media data at Commons combined! Indexed probably won't save much even for greyscale because most images here have pixels of nearly every value, so you're nearly needing a byte for the index anyway.
- The transparency thing is because alpha is blended multiplicatively. Imagine looking through two panes of tinted glass of 50% transmittance: the overall transmittance is 0.5 * 0.5 = 0.25 (25%).
- Merry Everything (or Everything Eve) to you too! :-) Inductiveload—talk/contribs 08:24, 24 December 2020 (UTC)
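The numbers in this reply can be reproduced in a few lines. This is a sketch of the arithmetic only, assuming 8-bit samples and taking 1 kB as 1000 bytes; all names in it are made up for illustration.

```javascript
// Bits per pixel for 8-bit samples: one byte per channel.
var BITS_PER_SAMPLE = 8;
var channels = { GREY: 1, GREYA: 2, RGB: 3, RGBA: 4 };

function bitsPerPixel(mode) {
  return channels[mode] * BITS_PER_SAMPLE;
}

// Relative growth from adding an alpha channel.
var greyGrowth = bitsPerPixel('GREYA') / bitsPerPixel('GREY'); // 2 (doubles)
var rgbGrowth = bitsPerPixel('RGBA') / bitsPerPixel('RGB');    // 4/3 (+33%)

// "Cubing" a 100 kB file: 1e15 bytes, i.e. 1 petabyte.
var cubedBytes = Math.pow(100 * 1000, 3);

// Stacked transparent layers multiply their transmittances.
function combinedTransmittance(layers) {
  return layers.reduce(function (acc, t) { return acc * t; }, 1);
}

console.log(greyGrowth, rgbGrowth.toFixed(3), cubedBytes);
console.log(combinedTransmittance([0.5, 0.5])); // 0.25
// Layers of 25% and 75% opacity: overall opacity is 0.8125, not 1.
console.log(1 - combinedTransmittance([1 - 0.25, 1 - 0.75]));
```

The last line is the 25% plus 75% case raised earlier: the two layers together block about 81% of the light, not 100%.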
- Okay, but the pictures of the colorspace are of a cube. And it is not like adding a color (r,b,g, and a) more multiplicative (ra, ga, ba) wherein the a starts to look like a scalar and as such, adds a dimension to what jpeg considers to be simple colors. So, is it a multiple or is it a cross product? I don't know any more, but I do know that I only have 1M left on the oldest device....--RaboKarbakian (talk) 14:04, 24 December 2020 (UTC)
- The colour-space of an RGBA image is indeed a 4-D hypercube. However, to index such a space requires only 4 numbers, in the same way that a 2D space is the "square" of a linear space, but only requires 2 numbers (x and y) to index into it. In general, an n-space requires n numbers to reference a point within it. Thus you need 4 * pixels * sample size (usually 8, sometimes 16 bits) for an uncompressed RGBA image.
- We probably should worry about image sizes, but the primary sinner is {{FI}}, which produces monstrosities like the 89 MB (!) EPUB for The Evolution of Worlds, not RGB vs RGBA. Inductiveload—talk/contribs 15:33, 24 December 2020 (UTC)
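The "channels * pixels * sample size" rule from this reply, written out as a function. The 400 x 600 dimensions are hypothetical, picked only to give round numbers, and uncompressedBytes is an illustrative name.

```javascript
// Uncompressed size in bytes: channels * pixels * bits per sample / 8.
// The 400 x 600 dimensions used below are hypothetical.
function uncompressedBytes(width, height, numChannels, bitsPerSample) {
  return (numChannels * width * height * bitsPerSample) / 8;
}

var rgb = uncompressedBytes(400, 600, 3, 8);  // 720000
var rgba = uncompressedBytes(400, 600, 4, 8); // 960000
console.log(rgb, rgba, (rgba / rgb).toFixed(3)); // 720000 960000 1.333
```

The size grows linearly with the number of channels, which is why the extra alpha channel costs a third more rather than a power of the original size.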
- 39.94M for the mobi.--RaboKarbakian (talk) 16:35, 24 December 2020 (UTC)
Irony
The irony when the code in one's signature trips one's filter for code.
Do you want to exclude some namespaces, or target some namespaces, or are you comfortable having it log? — billinghurst sDrewth 23:45, 23 December 2020 (UTC)
- Not any more, I noticed that was about to happen :-D. Maybe one day I'll bot the old usages out, but they're all in Project/Talk NS's so...meh. Inductiveload—talk/contribs 07:38, 24 December 2020 (UTC)
- Not worth the electronic cycles. Content namespaces only IMNSHO. — billinghurst sDrewth 09:22, 24 December 2020 (UTC)
- Interestingly, <big> is deprecated, but <small> is not...go figure. And now watch this edit set off the filter! :-D Inductiveload—talk/contribs 09:24, 24 December 2020 (UTC)
Bigger
Are there plans to replace <big> with {{larger}} globally with a bot, or should I look through my edits, and do them one at a time? --Richard Arthur Norton (1958- ) (talk) 18:58, 24 December 2020 (UTC)
- @Richard Arthur Norton (1958- ): Don't worry about any existing pages: it's not a critical issue and they can easily be fixed by bot if needed. --Xover (talk) 22:29, 24 December 2020 (UTC)