Wikisource:Scriptorium/Archives/2021-11

Please do not post any new comments on this page.

This is a discussion archive first created in November 2021, although the comments contained were likely posted before and after this date.


October Monthly Challenge

Latest comment: 3 years ago, 1 comment, 1 person in discussion

The Monthly Challenge has had another excellent showing, with 3672 pages proofread, validated, or marked "no text", smashing the target of 2000, and claiming a new monthly record by a margin of over 500 pages! This represents 12% of all the ~30000 pages processed at enWS in October.

Works completed include:

Index:The Chaldean Account of Genesis (1876).djvu (a version of the Epic of Gilgamesh)
Index:King Alfred's Old English version of St. Augustine's Soliloquies - Hargrove - 1902.djvu: as far as I know, our first complete, scan-backed Old English work
Index:The Book of the Duke of True Lovers - 1908.djvu: as far as I know, our first complete work by a medieval woman: Christine de Pizan
Index:Shirley (1849 Volume 1).djvu by Charlotte Brontë
A Gallery of Children by A. A. Milne (also validated)
Index:Ernest Hemingway - In Our Time (1925).pdf by Ernest Hemingway
Index:Letters from a farmer in Pennsylvania - Dickinson - 1768.djvu (also validated)
Index:Traditions of Palestine (microform) (IA traditionsofpale00martrich).pdf
Index:A father's legacy to his daughters - Gregory - 1808.djvu (also validated)
The Red-Headed League
A Case of Identity

Validated works:

Index:Oliver Twist (1838) vol. 3.djvu, giving us a complete set of validated volumes for this edition
Index:Left-Wing Communism.djvu by Vladimir Lenin

Only one work expired without being completed:

Index:The World's Most Famous Court Trial - 1925.djvu: a transcript of the w:Scopes Monkey Trial

The November Monthly Challenge is in full swing and includes continuing works from last month, as well as:

The Posthumous Papers of the Pickwick Club.djvu by Charles Dickens
Index:The common reader.djvu by Virginia Woolf
Index:Middlemarch (Second Edition).djvu by George Eliot
Index:Tess of the D'Urbervilles (1891 Volume 1).pdf by Thomas Hardy
Index:The collected works of Henrik Ibsen (Heinemann Volume 1).pdf
Index:The origin of continents and oceans - Wegener, tr. Skerl - 1924.djvu (which introduced the idea of continental drift)

...as well as many more. This month is testing the hypothesis that larger Monthly Challenges result in greater participation due to more works of interest being available even at the end of the month. If you agree with this idea, there's only one way to show it: make it come true!

There's also a new idea to add a few un-transcluded works to the MC to drive down the proofread-but-not-transcluded backlog.

Thanks especially to Languageseeker for the leg-work in setting this month's data up and shepherding October's challenge to record heights!

Come on in, the weather is getting cold for some of us, but the Monthly Challenge water is nice and warm. Inductiveload—talk/contribs 15:47, 1 November 2021 (UTC)

software adding line breaking hyphens in Page namespace

Latest comment: 3 years ago, 1 comment, 1 person in discussion

Reality check please: somebody has introduced line breaking hyphens into the justified text? If the intent was yet another way to keep people busy—such as the blind man in a dark room looking for the black cat … that isn't there!—it is ingenious. Cygnis insignis (talk) 14:39, 1 November 2021 (UTC)

Tech News: 2021-44

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Recent changes

There is a limit on the amount of emails a user can send each day. This limit is now global instead of per-wiki. This change is to prevent abuse. [1]

Changes later this week

The new version of MediaWiki will be on test wikis and MediaWiki.org from 2 November. It will be on non-Wikipedia wikis and some Wikipedias from 3 November. It will be on all wikis from 4 November (calendar).

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe.

20:28, 1 November 2021 (UTC)

Translation redirects

Latest comment: 3 years ago, 9 comments, 4 people in discussion

I notice that {{translation redirect}} has been deleted. What's the current consensus on linking mainspace to Translation space? Is it an exception to WS:CSD M3: Cross-namespace redirects - if so can we update the deletion policy accordingly? Or do we prefer to leave the work title redlinked in mainspace? —Beleg Tâl (talk) 13:26, 1 November 2021 (UTC)

IMO, cross-namespace redirects from Main→Translation, as well as Author→Portal when the person has been "portallifed" and Main→Portal for periodicals and similar without any mainspace content (yet) should all be allowed.

Cross-namespace redirects only increase the ability of a reader (who may well not even know about portals, translation namespaces) to find the text. Even as a Wikisource editor, if I see a redlink, I will assume there is no content for that work/author, and I will continue on my way, blissfully unaware of a translation or portal page. Inductiveload—talk/contribs 14:02, 1 November 2021 (UTC)

Gospel? Cygnis insignis (talk) 14:25, 1 November 2021 (UTC)

Agreed, this seems to be the wisest course of action, but is that the consensus that was established when the soft redirects were removed? —Beleg Tâl (talk) 15:26, 1 November 2021 (UTC)

We don't need them. The search functionality treats Main, Author, Portal, and Translation as content namespaces so they all show up in searches. Why would you see a redlink? Any redlinks should be fixed if someone is doing the job properly. These links should only exist for limited periods per {{dated soft redirect}}. — billinghurst sDrewth 10:19, 2 November 2021 (UTC)

I still do not understand the need to delete, say Printing Times and Lithographer instead of simply redirecting to Portal:Printing Times and Lithographer pending mainspace content. When the mainspace page Printing Times and Lithographer is eventually made, many incoming links to Portal:Printing Times and Lithographer should be changed back. What's the point? It seems to me to be make-work, and designed specifically to trip people up who want to link the text "Printing Times and Lithographer" in some other work (example: this page), but do not expect there to be a portal and just settle for the red link. Inductiveload—talk/contribs 12:10, 2 November 2021 (UTC)

@Billinghurst: The High Mountains is a red link. If I have a work and I want to link it to "The High Mountains", I will just put The High Mountains and assume that no one has added this work yet. If I upload a work called "The High Mountains", I will put it at The High Mountains and not bother to disambiguate. Why would I check to see if Translation:The High Mountains exists? Why would anyone? (Inductiveload's point on periodical portals is essentially the same, but the caveats about portal-space pages not being "Works" does not apply when we are talking about WS:T) —Beleg Tâl (talk) 13:34, 2 November 2021 (UTC)

What is the purpose of main namespace? Can we stick to it? What is the purpose of a redlink ... CONTENT! People will come here via external search, via internal search, or via links from other wikis. I very much doubt that a significant portion of our traffic arrives by someone typing the title name component into a url. When we have a redlink in a work, it is meant to be pointing to the published work, not our listing of various scans in the portal namespace. While I appreciate volunteers' translations of works in the Translation: ns, we should not be creating direct links from published works in main ns to Translation ns. That was never part of the reasoning for setting up that namespace, whose works are user-focused and unchecked. Think through what that is explicitly saying about a foreign language work and what you are saying.

Every time someone thinks that it is a finding aid and you flick people all over the site you are diluting main ns's purpose. By your logic, we should create redirects for every author so you find those. We would need to create redirects for each version of the name of a publication. That is just a maintenance nightmare and linkrot waiting to happen. If you don't like redlinks to a journal name here, then create the content. If you think that we have an issue with how our landing pages are working then create a better solution of what it could look like beside disambiguation pages, version pages, translation pages, not go the lazy option of cross namespace redirects. Please take a step back and maintain the quality and integrity of the system. How many times have we had situations of main ns works coming in to match the name of works in the Translation ns and causing conflict? Have we captured and resolved those? Is there a means by which we can report those and resolve those? Please don't give hypothetical edge cases where we have a system that looks at such additions and resolves, or where we can fix those by alternate processes through reports. — billinghurst sDrewth 22:10, 2 November 2021 (UTC)

If that's the consensus that replaced the use of {{translation redirect}}, then that's fine with me - Translation space is a wild west anyway —Beleg Tâl (talk) 23:22, 2 November 2021 (UTC)

Exhibition of the Impressionists

Latest comment: 3 years ago, 2 comments, 2 people in discussion

This seems to be a translation from French, but there is no translator information. Huhu9001 (talk) 04:36, 2 November 2021 (UTC)

This looks to be from 1946 by Rewald Google Books , which may be a copyright violation (Renewal: R570616) MarkLSteadman (talk) 04:55, 2 November 2021 (UTC)

Meet the new Movement Charter Drafting Committee members

Latest comment: 3 years ago, 7 comments, 3 people in discussion

The Movement Charter Drafting Committee election and selection processes are complete.

The election results have been published. 1018 participants voted to elect seven members to the committee: Richard Knipel (Pharos), Anne Clin (Risker), Alice Wiegand (Lyzzy), Michał Buczyński (Aegis Maelstrom), Richard (Nosebagbear), Ravan J Al-Taie (Ravan), Ciell (Ciell).
The affiliate process has selected six members: Anass Sedrati (Anass Sedrati), Érica Azzellini (EricaAzzellini), Jamie Li-Yun Lin (Li-Yun Lin), Georges Fodouop (Geugeor), Manavpreet Kaur (Manavpreet Kaur), Pepe Flores (Padaguan).
The Wikimedia Foundation has appointed two members: Runa Bhattacharjee (Runab WMF), Jorge Vargas (JVargas (WMF)).

The committee will convene soon to start its work. The committee can appoint up to three more members to bridge diversity and expertise gaps.

If you are interested in engaging with Movement Charter drafting process, follow the updates on Meta and join the Telegram group.

With thanks from the Movement Strategy and Governance team. --Civvi (WMF) (talk) 15:13, 1 November 2021 (UTC)

@Civvi (WMF) Why is the WMF using a closed, commercial, app-based platform like Telegram (which requires my phone number and doesn't allow a dedicated account) for this? Inductiveload—talk/contribs 11:58, 2 November 2021 (UTC)

because the other media channels have adult supervision, unlike wiki-talk. they are merely acknowledging where the conversation is. this is a perennial subject, but the community does not acknowledge its own role to driving conversation elsewhere. --Slowking4 亞 Farmbrough's revenge 16:17, 2 November 2021 (UTC)

@Inductiveload: Hi, thanks for asking and sorry for my bad english. Yes, Slowking is right, the discussion is perennial and AFAIK taking place in a lot of different places. I can just add that every language community (but sometimes even single projects) has different preferences so I guess that the choice was to stay where most people are, and this seems to be Telegram. Every alternative has pros and cons. (Personally I would love to go back to IRC...) --Civvi (WMF) (talk) 17:29, 2 November 2021 (UTC)

@Civvi (WMF) would it not make sense for the WMF to undertake to "open" these disparate channels by formally bridging them - especially in the context of a channel being advertised in a message from a WMF staffer? Please note, this isn't aimed at you personally, it's just that the (WMF) suffix brings with it some level of organisational approval, so I'm talking to the "(WMF)", not the "Civvi" part of the username. A Wikimedia Matrix homeserver springs to mind (used by, e.g. projects like KDE and Mozilla delegate to EMS, or it can be self-hosted) as a flexible set-up, but "just" IRC would work too. Inductiveload—talk/contribs 08:37, 3 November 2021 (UTC)

@User:Inductiveload Thanks :-) Indeed the WMF suffix unfortunately is not linked to "WMF-Omniscience" (does this word exist?) I don't have answers to this specific and quite technical question but I am happy to try to find them, perhaps @User:Xeno (WMF) might know more about this and can help. --Civvi (WMF) (talk) 14:36, 3 November 2021 (UTC)

i kind of sympathize for WMF in that they tried meta:Flow and Liquid Threads, and were roundly criticized, but then started and abandoned Wikimedia Space. we are in thrash mode, and need some sustained messaging leadership. until then we will just patch together ad hoc channels; and the open failure to deliver newbie friendly comms will lead people to try everything. --Slowking4 亞 Farmbrough's revenge 22:10, 3 November 2021 (UTC)

Unstrip size limit exceeded (5,000,000)

Latest comment: 3 years ago, 9 comments, 4 people in discussion

I'm getting this error on A Dictionarie of the French and English Tongues. Does anyone know what it means? Languageseeker (talk) 01:03, 3 November 2021 (UTC)

@Languageseeker Basically, you asked the server to process more text than it's configured to (5MB). You should split the work up into smaller chunks. Inductiveload—talk/contribs 08:15, 3 November 2021 (UTC)
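The advice to split the work into smaller chunks can be sketched as follows. This is only an illustration: the index name "Example work.djvu" and the chunk size of 100 are placeholders, and each generated tag is assumed to go on its own subpage so that no single page carries the whole transclusion.

```python
def page_ranges(first, last, chunk_size):
    """Yield inclusive (start, end) page ranges covering first..last."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    start = first
    while start <= last:
        end = min(start + chunk_size - 1, last)
        yield start, end
        start = end + 1


def pages_tags(index_name, first, last, chunk_size):
    """Build one <pages> tag per chunk, each meant for a separate subpage."""
    return [
        f'<pages index="{index_name}" from={start} to={end} />'
        for start, end in page_ranges(first, last, chunk_size)
    ]


if __name__ == "__main__":
    # "Example work.djvu" is a placeholder index name, not a real file.
    for tag in pages_tags("Example work.djvu", 1, 950, 100):
        print(tag)
```

How large a chunk can safely be depends on the content of the pages, so the chunk size would need to be checked against the parser report of each resulting subpage.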

That's true, but there's something more going on too. If it was simply too big the error would just reference the "Post-expand include size" (which is the 5MB limit). In this case, something is causing unstripping of strip markers to exceed that limit after transclusion. It may be a simple potato vs. potato thing, but it could also mean that there's a suboptimally behaving template, or module, or extension tag, or invocation of either in there somewhere. For example, if large chunks of text are stuffed into a template param, and especially if that text itself contains templates, you might get this effect. To quote T189416: unstrip-size-warning is shown when the maximum expansion size for nested parser extension tags is exceeded. "Unstrip" refers to the internal function of the parser, called 'unstrip', which recursively puts the output of parser functions in the place of the parser function call. If that's the case it might be worthwhile tracking it down and fixing it (or at least understand it) to avoid problems down the road. Xover (talk) 08:54, 3 November 2021 (UTC)

@Xover I'm pretty sure that this is caused by the massive use of <poem> (or maybe even <pages>), which is, in total, being fed megabytes of text here (I'm not sure if all the poems share the same StripState). It's a "fun" edge case, but it looks to me like just another way that you can fall off the edge of the world when transcluding excessively large amounts of text. Inductiveload—talk/contribs 14:59, 3 November 2021 (UTC)

@Languageseeker: Who thought that it was a good idea to transclude 900+ not-proofread pages to a single page at enWS? How is that considered a reasonable presentation of a work? Do we truly have to even ask that question? — billinghurst sDrewth 11:49, 3 November 2021 (UTC)

Transcluding pages 1 to 100 gives me

NewPP limit report
Parsed by mw1371
Cached time: 20211103115326
Cache expiry: 1814400
Reduced expiry: false
Complications: [vary‐page‐id]
CPU time usage: 1.190 seconds
Real time usage: 1.625 seconds
Preprocessor visited node count: 4865/1000000
Post‐expand include size: 145366/2097152 bytes
Template argument size: 68010/2097152 bytes
Highest expansion depth: 15/40
Expensive parser function count: 0/500
Unstrip recursion depth: 2/20
Unstrip post‐expand size: 847175/5000000 bytes
Lua time usage: 0.084/10.000 seconds
Lua memory usage: 1524608/52428800 bytes
Number of Wikibase entities loaded: 0/400

— billinghurst sDrewth 11:54, 3 November 2021 (UTC)
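To read a report like the one above at a glance, the "used/limit" lines can be turned into fractions of each limit. The sketch below is a minimal example that assumes the report has been pasted in as plain text; the function names are made up for illustration and are not part of MediaWiki. For the 100-page sample, the unstrip post-expand size comes out at roughly 17% of its 5,000,000-byte limit.

```python
import re

# Matches lines of the form "Name: used/limit [unit]"; other lines are ignored.
LIMIT_LINE = re.compile(
    r"^(?P<name>[^:]+):\s*(?P<used>\d+(?:\.\d+)?)/(?P<limit>\d+(?:\.\d+)?)"
)

# A few lines copied from the report above (hyphens written as ASCII here).
REPORT = """\
CPU time usage: 1.190 seconds
Preprocessor visited node count: 4865/1000000
Post-expand include size: 145366/2097152 bytes
Highest expansion depth: 15/40
Unstrip recursion depth: 2/20
Unstrip post-expand size: 847175/5000000 bytes
Lua time usage: 0.084/10.000 seconds
"""


def parse_limits(report):
    """Return {name: (used, limit)} for every 'used/limit' line."""
    usage = {}
    for line in report.splitlines():
        match = LIMIT_LINE.match(line.strip())
        if match:
            # Reports copied from a page may use U+2010 hyphens; normalise them.
            name = match.group("name").replace("\u2010", "-")
            usage[name] = (float(match.group("used")), float(match.group("limit")))
    return usage


def fraction_used(usage):
    """Return {name: used / limit} for each parsed entry."""
    return {name: used / limit for name, (used, limit) in usage.items()}


if __name__ == "__main__":
    for name, share in sorted(fraction_used(parse_limits(REPORT)).items()):
        print(f"{name}: {share:.1%}")
```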

I agree, it was a bad idea. But, I tried to first transclude in my user ns to see which pages needed the most help. However, in user NS, there are no page numbers. The text has been fully proofread and mostly formatted by Distributed Proofreaders. So, I was hoping to look through, see what pages needed help, and then split up the work. It seems that I created a royal mess instead. My apologies to everyone. Languageseeker (talk) 14:56, 3 November 2021 (UTC)

@Languageseeker: Important context that would have been valuable in the beginning. Are you aware that if you look at the page source of any wiki work, it includes a mw:NewPP parser report? Always good reading to help look at pages that are anything outside of vanilla production values. You will also see following that a "Transclusion expansion time report". Also to reference mw:Strip marker which explains that MediaWiki software adds elements that look and act like XML tags. — billinghurst sDrewth 00:47, 4 November 2021 (UTC)

@Languageseeker: page numbers should now be enabled in any Wikisource or User namespace page with "Sandbox" in the title. Inductiveload—talk/contribs 11:27, 4 November 2021 (UTC)

Academic journal articles with a "Digital object identifier"

Latest comment: 3 years ago, 2 comments, 1 person in discussion

What is the status for journal articles with a "Digital object identifier" (DOI)? The index page does not appear to support this field. --2db (talk) 16:29, 5 November 2021 (UTC)

@2db it does now ^_^. In the longer term, this should come from Wikidata. Inductiveload—talk/contribs 17:04, 5 November 2021 (UTC)

M&S (Phe-Bot) is Stuck

Latest comment: 3 years ago, 8 comments, 3 people in discussion

It seems that the poor match-and-split bot got stuck. Could someone give it a nudge? Languageseeker (talk) 06:23, 3 November 2021 (UTC)

@Languageseeker (CC MarkLSteadman): There was an unplanned network outage on WMCS and Toolforge yesterday, so multiple tools hosted there may need a kick to come back alive (Toolforge tools run on a job grid with NFS-mounted home directories; when the network goes, everything breaks and often can't recover without being restarted). In any case, the match & split tool has had the requisite Russian Space Station Fix™ applied and should now be back in working order. You and Mark were the only ones with jobs queued, but those jobs will need to be resubmitted. Xover (talk) 08:35, 3 November 2021 (UTC)

Yep, it took out the Discord-Matrix/IRC bridge too (though that did eventually recover itself and the Kubernetes pod appeared magically). Inductiveload—talk/contribs 09:17, 3 November 2021 (UTC)

It seems as if the bot is running, but not actually creating pages. Languageseeker (talk) 14:48, 3 November 2021 (UTC)

@Languageseeker: Yeah, it looks like code changes made since the previous restart have broken it now that it has been restarted. I'm looking at it, but it might be a while (it's a big, complex, and completely undocumented codebase that's mainly written in Python 2 and in the process of being updated to Python 3 piecemeal). Xover (talk) 18:55, 3 November 2021 (UTC)

Ayup. The bitrot has finally caught up with Phetools. Match & Split and several other bits of it are currently either non-functional or will be unstable, and it's going to take major surgery to fix it. In other words, you should plan to be without M&S for a bit. The flip side is that this "surgery" was badly needed anyway, and the prognosis is good. It'll be a bit of a pain short term, but it might end up getting us a more maintainable tool once done. Or, you know, I might mess it up completely and you'll all hate me forever. One of those two, probably. :) Xover (talk) 22:15, 3 November 2021 (UTC)

Such is life. In the worst case, Phetools shall lie down with the iron and the lamp in silicon heaven. Languageseeker (talk) 02:53, 4 November 2021 (UTC)

@Languageseeker (CC Beleg Tâl and MarkLSteadman): Ok, I think I have it back up and running at least to test. With the type of changes that were needed the most likely type of bug to occur is various forms of breakage related to non-ascii characters (including accents, greek, fancy quote marks, etc.) and primarily in either the source wikipage name or the target PDF/DjVu filename. Non-ascii characters in page content may also be affected, but these are more likely to be subtle or trivial. Xover (talk) 11:28, 6 November 2021 (UTC)
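The kind of non-ASCII breakage described above typically comes from Python 3 refusing to mix byte strings and text silently, which Python 2 allowed. The sketch below is not Phetools code. It is a generic illustration with hypothetical helper names and a made-up filename, showing one common way to decode explicitly at the boundary before building a page title.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding bytes explicitly instead of implicitly."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def page_title(filename, page):
    """Build a Page: namespace title from a filename given as str or bytes."""
    return f"Page:{to_text(filename)}/{page}"


if __name__ == "__main__":
    # Hypothetical filename with an accented character, as raw UTF-8 bytes.
    raw = "Brontë - Shirley.djvu".encode("utf-8")
    print(page_title(raw, 12))

    # In Python 3, concatenating str and bytes raises instead of guessing.
    try:
        "Page:" + raw
    except TypeError as exc:
        print("mixing str and bytes fails:", exc)
```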

Tech News: 2021-45

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Recent changes

Mobile IP editors are now able to receive warning notices indicating they have a talk page message on the mobile website (similar to the orange banners available on desktop). These notices will be displayed on every page outside of the main namespace and every time the user attempts to edit. The notice on desktop now has a slightly different colour. [2][3]

Changes later this week

Wikidata will be read-only for a few minutes on 11 November. This will happen around 06:00 UTC. This is for database maintenance. [4]
There is no new MediaWiki version this week.

Future changes

In the future, unregistered editors will be given an identity that is not their IP address. This is for legal reasons. A new user right will let editors who need to know the IPs of unregistered accounts to fight vandalism, spam, and harassment, see the IP. You can read the suggestions for how that identity could work and discuss on the talk page.


20:36, 8 November 2021 (UTC)

PAGE:subpages' parent

Latest comment: 3 years ago, 4 comments, 2 people in discussion

This page does not exist:

Page:The production of the Gospel of Mark – An essay on intertextuality.pdf

But does have multiple subpages, for example:

Page:The production of the Gospel of Mark – An essay on intertextuality.pdf/12

Is there ever any need to create the parent page of PAGE:subpages?

How do you automatically watchlist all the subpages, or does each subpage have to be manually added to a watchlist? --2db (talk) 16:28, 9 November 2021 (UTC)

@2db There is no need to create the parent page (sometimes it can be done when moving pages, but that's an admin-only trick for if you don't use a bot).

There's currently no way to watch all pages in an index in a one-click way, but I am literally currently working on adding that to the server (phab:T289466) and it will then get some kind of UI. Inductiveload—talk/contribs 16:51, 9 November 2021 (UTC)

@Inductiveload I suggest making each empty/nonexistent parent in the PAGE namespace into a "police log" that displays a watchlist style report of all the subpages. 2db (talk) 17:03, 9 November 2021 (UTC)

@2db The underlying task will allow many things like that over the API, but on-wiki, you can already see that particular information via "Related changes" of an index page: https://en.wikisource.org/wiki/Special:RecentChangesLinked/Index:Tarzan_and_the_Golden_Lion_-_McClurg1923.pdf Inductiveload—talk/contribs 17:11, 9 November 2021 (UTC)
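The "Related changes" link for any index can be built from its title in the same pattern as the URL above. This is a small sketch; the function name is made up, and the only assumption is the usual convention of writing spaces in page titles as underscores.

```python
from urllib.parse import quote


def related_changes_url(index_title, host="en.wikisource.org"):
    """Build the Special:RecentChangesLinked URL for an index page title."""
    # Spaces in page titles are conventionally written as underscores in URLs.
    title = index_title.replace(" ", "_")
    # Keep ":" and "/" readable; percent-encode anything else that needs it.
    return f"https://{host}/wiki/Special:RecentChangesLinked/{quote(title, safe=':/')}"


if __name__ == "__main__":
    print(related_changes_url("Index:Tarzan and the Golden Lion - McClurg1923.pdf"))
```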

A new way to transclude multi-page scores

Latest comment: 3 years ago, 1 comment, 1 person in discussion

I've written an initial attempt (in the form of Module:Mscorewithpipes) to transclude a multi-page score without some of the restrictions previously imposed by Template:Tscore. Namely, it is possible to use pipes and unseparated brackets (and perhaps templates; haven't checked this myself) within the <score> tag on a page. On the other hand, the module expects certain aspects of the overall score structure (such as the ordering of staves and layout preferences) to be marked with specific comments, and at the moment expects notation to be presented in separate contexts after specifying the staff order (these contexts also marked with specific comments). My hope is that the net changes needed compared to the previous attempt at score transclusion lead to a less intrusive experience proofreading scores.

An example of this module's use is at Asleep in the Deep (1898). Comments and criticisms welcome. @CalendulaAsteraceae: as someone who might be interested in this. Mahir256 (talk) 00:15, 11 November 2021 (UTC)

Captains Courageous source txt origin

Latest comment: 3 years ago, 8 comments, 5 people in discussion

Where did the source txt for Captains Courageous come from? --2db (talk) 04:34, 7 November 2021 (UTC)

2db: Obviously, it’s not stated, but the text is most likely from Project Gutenberg, which is available here. TE(æ)A,ea. (talk) 04:39, 7 November 2021 (UTC)

Should this be in the category ready to split and match? --2db (talk) 04:45, 7 November 2021 (UTC)

see previous followup question 2db (talk) 04:49, 7 November 2021 (UTC)

2db: No. The process for match-and-split is described here. It requires a scan to be in existence. To request a scan, you may place a request at the Scan Lab. TE(æ)A,ea. (talk) 04:51, 7 November 2021 (UTC)

Also, because the source edition for this copy is unknown, it is not appropriate for match-and-split and a new scan should be proofread through the normal process. Beeswaxcandle (talk) 05:09, 7 November 2021 (UTC)

you have three scans to choose from c:File:Kipling - Captains courageous, 1899.djvu; c:File:"Captains courageous" (IA cu31924013493246).pdf; c:File:"Captains courageous", a story of the Grand Banks (IA captainscourageo00kipl).pdf. --Slowking4 亞 Farmbrough's revenge 18:21, 7 November 2021 (UTC)

Please do not match-and-split. The correct text to proofread is Index:Captains Courageous (1897 London).djvu. Languageseeker (talk) 06:18, 11 November 2021 (UTC)

Index:Over the Sliprails - 1900.djvu

Latest comment: 3 years ago, 1 comment, 1 person in discussion

An easy proofread for whoever wants one. —Beleg Tâl (talk) 19:49, 11 November 2021 (UTC)

Question about TOC

Latest comment: 3 years ago, 8 comments, 3 people in discussion

I'm working on transcluding The Wealth of Nations, but the TOC is a style that I do not recognize. Can somebody take a look at Page:The wealth of nations, volume 1.djvu/15? Languageseeker (talk) 02:56, 11 November 2021 (UTC)

Languageseeker: Is there any problem with the listing? Otherwise, what is wrong? TE(æ)A,ea. (talk) 03:08, 11 November 2021 (UTC)

I'm wondering about if this will export cleanly and whether or not automatic headers will work with this style of TOC. Languageseeker (talk) 03:11, 11 November 2021 (UTC)

Languageseeker: Yes, this will export cleanly; it was designed with export in mind (and made quite recently, too). Automatic headers aren’t dependent on templates, so that won’t be a problem. TE(æ)A,ea. (talk) 03:15, 11 November 2021 (UTC)

TE(æ)A,ea. Hmm, automatic headers don't seem to work at The Wealth of Nations/Introduction and plan of the work. Languageseeker (talk) 04:06, 11 November 2021 (UTC)

@TE(æ)A,ea. Actually {{TOCstyle}} is pretty questionable on export, because every line is a <li> element, which contains a whole separate <table>.

It looks like this in Koreader, for example: phab:F34741733 (the last line has gone wrong). This is one of the better exports for TOCstyle; some other models are a bit less successful: phab:F34568887. Inductiveload—talk/contribs 07:58, 11 November 2021 (UTC)

Languageseeker: The table of contents needs to be added to the “Table of contents” field in the Index: to work for automatic headers. The reason it’s not fully working is because the Wiki-links have not yet been fully added (the chapters don’t have any, for example). Once that is fixed, there won’t be a problem. TE(æ)A,ea. (talk) 13:17, 11 November 2021 (UTC)

Thanks! Languageseeker (talk) 15:34, 12 November 2021 (UTC)

Source digitized by Google

Latest comment: 3 years ago, 4 comments, 4 people in discussion

The following source has been digitized by Google:

Bruno Bauer (1852). Die theologische Erklärung der Evangelien. Hempel. OCLC 246014297.

Every page of the downloaded PDF is watermarked. Is it OK to upload this PDF to Commons after stripping the top page even though the other pages are watermarked? --2db (talk) 14:43, 12 November 2021 (UTC)

@2db: You'd have to ask at Commons to get a definitive answer, but merely having the text "Digitized by Google" at the bottom of each page should not be a problem as there is nothing copyrightable in that text. The first page is different as that contains a lot more text that could theoretically be construed to be copyrightable, and the logo which definitely is. Xover (talk) 15:03, 12 November 2021 (UTC)

As a wordmark, the Google logo is held to be not copyrightable by Commons (it's still a trademark, obviously). Hence the various denizens of commons:Category:Google logos. I think the watermark is safely void of creative input, though the first pages should indeed be stripped. That said, commons:Book scans with Google Books cover sheets (to remove) has ~400 members and countless others floating about the place I should think. Inductiveload—talk/contribs 15:08, 12 November 2021 (UTC)

Three years ago, the google covers were being removed "because they are ugly". My suggestion was to place a |zz]] at the end of the category assignment, but that only puts the "ugly covers" last if there are images. It is weird to see the "ugly covers" reason being fluffed up with trademark/copyright concerns now. I mention it now because if this is just an evolution of a personal preference, it shouldn't have to cause consternation and excessive work for people who are more worried about the end product, ie, the transcription and transclusion of the "ugly scan". I mean, there is already quite a bit to care about....--RaboKarbakian (talk) 15:38, 12 November 2021 (UTC)

Doctor Dolittle's Post Office (1923)

Latest comment: 3 years ago, 15 comments, 7 people in discussion

Are we allowed to use PDFs converted from the Gutenberg epubs for Hugh Lofting's Dolittle books? I know there was talk of getting scans of Doctor Dolittle's Circus (1924) earlier in the year, but nothing seems to have come of it. I ask this because I found a file on Gutenberg that I could use for Doctor Dolittle's Post Office here: https://www.gutenberg.org/cache/epub/58947/pg58947-images.html I have not found a file for Doctor Dolittle's Circus on my own. I am a bit unhappy that all the talk of getting scans earlier this year amounted to nothing. SurprisedMewtwoFace (talk) 23:48, 7 November 2021 (UTC)

I don't think there's much point using a Gutenberg ebook for proofreading - if you're going to use a Gutenberg edition, then you may as well skip the proofreading process until an actual scan is found. By the way, there is a very bad scan of the book here if anyone wants to take the time to split and reassemble the page images. —Beleg Tâl (talk) 00:48, 8 November 2021 (UTC)

There's a scan at Hathi (https://babel.hathitrust.org/cgi/pt?id=uva.x002562566) but it needs splitting. Would that work for you? Inductiveload—talk/contribs 10:02, 8 November 2021 (UTC)

I think I'll upload the Hathi that you linked to, @Inductiveload:. Thanks for the link! SurprisedMewtwoFace (talk) 13:16, 8 November 2021 (UTC)

@SurprisedMewtwoFace Can you split the pages, or would you like me to do it? Inductiveload—talk/contribs 13:52, 8 November 2021 (UTC)

@Inductiveload I've uploaded it on Wikimedia Commons. Could you do the page splitting, please? Here is the link: https://commons.wikimedia.org/wiki/File:Doctor_Dolittle%27s_Post_Office_(1923).pdf I think you have a much better knowledge of how to page split than I do! Don't feel you have to rush, I'm working on finishing up the Agatha Christie novel for now. SurprisedMewtwoFace (talk) 14:34, 8 November 2021 (UTC)

@SurprisedMewtwoFace Here ya go: Index:Doctor Dolittle's Post Office - 1923 - Lofting.djvu Inductiveload—talk/contribs 15:07, 8 November 2021 (UTC)

@Inductiveload Thank you so much! This is a great help. This will be my next project after the Agatha Christie novel now. SurprisedMewtwoFace (talk) 15:13, 8 November 2021 (UTC)

Currently there are 2 more Gutenberg books by Agatha Christie marked for speedy deletion: File:MurderOnTheLinks.pdf and File:TheSecretofChimneys.pdf. I am unsure about them. I also do not like the idea of transcribing Gutenberg books, but I also failed to find these two anywhere else. --Jan Kameníček (talk) 16:04, 8 November 2021 (UTC)

@Jan.Kamenicek I was the one who uploaded those. I was looking for copies of them on Hathitrust and Internet Archive that would have been actual book scans, but I could not find any other than the Gutenberg copies. I'm not sure what people will decide on them. SurprisedMewtwoFace (talk) 20:07, 8 November 2021 (UTC)

I think if you have to use a PG edition, the PDF is not adding anything except complexity. The edition is born digital, so the text version is as good a source as any and more easily Wikified.

Still, my general opinion stands that copying PG works without scans is not very useful, since PG is very much still a thing, and Wikisource can do better than being a very limited and incomplete backup of PG. We can just link out to an existing PG work if we really cannot find a scan. Inductiveload—talk/contribs 20:28, 8 November 2021 (UTC)

yeah, we need to go find the first edition scans. and provenance seems not to be a priority at PG. it will be a long term quality improvement task. PG is a good placeholder until the scans get done and uploaded. --Slowking4 亞 Farmbrough's revenge 22:46, 8 November 2021 (UTC)

I have a physical copy of the Secret of Chimneys but (a) I won't be able to scan it until at least the first of the year and (b) it's an abridged version from slightly later.--Prosfilaes (talk) 01:11, 9 November 2021 (UTC)

We already host the PG copy of The Murder on the Links and File:Agatha Christie-The Murder on the Links.djvu. This last was kept as a result of a Copyvio discussion in 2019. We don't need any further copies. Beeswaxcandle (talk) 04:20, 9 November 2021 (UTC)

So I have deleted MurderOnTheLinks.pdf as redundant to File:Agatha Christie-The Murder on the Links.djvu. What about the File:TheSecretofChimneys.pdf which does not seem to have any other copy available? Delete as well or keep? I am really hesitant. --Jan Kameníček (talk) 08:37, 15 November 2021 (UTC)

Jungle Tales of Tarzan (1919) issue with chapter split

I noticed that some of the chapters of "Jungle Tales of Tarzan" (1919) that I have been working on have been replaced with the versions I have worked on. However, there seems to be a slight issue with the division between Chapters IX and X. Some of the pages from Chapter IX are in the Chapter X space. Chapter IX ends at https://en.wikisource.org/wiki/Page%3AJungle_Tales_of_Tarzan.djvu/245 The ending of Chapter X is correct and there do not appear to be any other issues with chapter splits. Thanks for all your help, and I'm glad we're using the updated version! SurprisedMewtwoFace (talk) 00:09, 15 November 2021 (UTC)

@SurprisedMewtwoFace: Thank you for working on this book. Sorry about that. Adjusted and fixed. In the future, you can fix errors like this by changing the from and to values in <pages index="Jungle Tales of Tarzan.djvu" from=246 to=276 header=1/>, where index is the index name, from is the first page of the chapter, and to is the last page of the chapter. Languageseeker (talk) 00:28, 15 November 2021 (UTC)
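
For anyone adjusting chapter boundaries by hand, here is a minimal Python sketch of how the tag above is put together. The helper name pages_tag is hypothetical and is not part of MediaWiki or ProofreadPage; only the index, from, to and header attributes come from the comment above.

```python
def pages_tag(index: str, start: int, end: int, header: bool = True) -> str:
    """Build a <pages> transclusion tag for one chapter.

    index  -- name of the Index file (without the "Index:" prefix)
    start  -- scan page on which the chapter starts (the "from" attribute)
    end    -- scan page on which the chapter ends (the "to" attribute)
    header -- whether to add header=1, as in the example above
    """
    if start > end:
        raise ValueError("the chapter cannot end before it starts")
    attributes = [f'index="{index}"', f"from={start}", f"to={end}"]
    if header:
        attributes.append("header=1")
    return "<pages " + " ".join(attributes) + " />"


# Chapter X of Jungle Tales of Tarzan, using the page numbers quoted above.
print(pages_tag("Jungle Tales of Tarzan.djvu", 246, 276))
```

If pages from one chapter show up in the next, the fix is normally just to move the boundary, so that one chapter's to value and the following chapter's from value sit on consecutive pages.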

@Languageseeker Thanks so much for your help! It is much appreciated. I plan on doing some proofreading on "Tarzan and the Golden Lion", which is a monthly challenge book, next. SurprisedMewtwoFace (talk) 00:32, 15 November 2021 (UTC)

@SurprisedMewtwoFace: Looking forwards to it! Always great to have more people helping out in the Monthly Challenge. Languageseeker (talk) 03:33, 15 November 2021 (UTC)

Invisible change

Can anybody explain what has been changed at [5]? --Jan Kameníček (talk) 17:33, 15 November 2021 (UTC)

I think that's a change in how the data is stored internally in the database. Years ago, it was stored as an actual complete chunk of Wikitext. It is now serialised differently and only appears to be Wikitext because the server constructs the Wikitext live on demand (you can also have it served to you as JSON). There is a small amount of more technical detail here: mw:Extension:ProofreadPage/Index_data_API. Inductiveload—talk/contribs 17:54, 15 November 2021 (UTC)

@Inductiveload: Well, if it is only such a technical issue, why was the edit attributed to a novice user who had created their account only two minutes before? --Jan Kameníček (talk) 18:01, 15 November 2021 (UTC)

@Jan.Kamenicek It's probably just that they saved the page. Normally that would be a null edit that doesn't actually result in a revision, but in some very old ProofreadPage content model pages, it can result in a revision actually being saved. Inductiveload—talk/contribs 18:05, 15 November 2021 (UTC)

@Inductiveload: Should we null-edit all index pages by a bot? Ankry (talk) 20:58, 15 November 2021 (UTC)

There's no need: this kind of update is specifically designed to be able to happen transparently on next save. If there was a technical need to migrate storage formats, it would be done directly with a database upgrade script, and should be done for all Wikisources. Inductiveload—talk/contribs 21:03, 15 November 2021 (UTC)

Tech News: 2021-46

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Recent changes

Most large file upload errors that had messages like "stashfailed" or "DBQueryError" have now been fixed. An incident report is available.

Problems

Sometimes, edits made on iOS using the visual editor save groups of numbers as telephone number links, because of a feature in the operating system. This problem is under investigation. [6]
There was a problem with search last week. Many search requests did not work for 2 hours because of a configuration error. [7]

Changes later this week

The new version of MediaWiki will be on test wikis and MediaWiki.org from 16 November. It will be on non-Wikipedia wikis and some Wikipedias from 17 November. It will be on all wikis from 18 November (calendar).

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe.

22:06, 15 November 2021 (UTC)

Relevant changes

There are two major changes of note for Wikisource. The first is mentioned above: large file uploads are hopefully fixed, thanks to the hard work of User:Legoktm and the eagle eyes of User:Xover.

The second one is one that's finally pushed through review after a few months' hiatus: OpenSeadragon (OSD) image zooming in the Page namespace editor, thanks to the hard work of Yash9265 (via a Google Summer of Code, AFAIK no enWS account) and our mate User:Samwilson. This should be much more robust and functional for image zooming and also, due to the OSD plugin system, allows a lot of extra features.

For example, it will soon be possible to OCR only portions of images. The patch for this may not make the cut this week, but you can see a sneak preview at this PatchDemo (note: only PDF files work there). There are more cool features in the pipeline for OSD, like a marker line widget to keep your place, but regional OCR's probably the most requested feature.

The interface to the zoom controls is basically the same, but hopefully the finicky click-to-enter zoom mode will be a thing of the past (not that it wasn't a brave attempt when written)! Welcome to the future! Inductiveload—talk/contribs 23:13, 15 November 2021 (UTC)

@Inductiveload, @Yash4357, @Sohom data: Yes, thank you all for your hard work! It's exciting to see this happening. — Sam Wilson 23:20, 15 November 2021 (UTC)

I'm always in awe of the technical users making these changes on the backend that enable or make efficient thousands and millions of edits later. Great work, comrades! —Justin (koavf)❤T☮C☺M☯ 23:26, 15 November 2021 (UTC)

Illustrations are not displayed

@Beleg Tâl: What's wrong with my photos?

@Виктор Пинчук: I put {{image missing}} where the images in the original publication are; if you (or anyone else) wants to upload a copy of those images and insert them into the text, that would be great. —Beleg Tâl (talk) 00:04, 5 November 2021 (UTC)

@Виктор Пинчук: I have added the missing images to Translation:God is One… in 200 Persons. Please note that Wikisource is not a platform for publishing new versions of these works with new images - we expect works on Wikisource to be faithful to the scans of the original publication. —Beleg Tâl (talk) 02:56, 9 November 2021 (UTC)

@Beleg Tâl: What about this?

Multimedia content added to texts can greatly improve the quality and presentation. Such content includes not only published illustrations or photographs from or about the book itself which are out of copyright, but also original contributions of audio recordings, diagrams, or other content. https://en.wikisource.org/wiki/Wikisource:What_Wikisource_includes#Multimedia —Виктор Пинчук (talk) 06:12, 9 November 2021 (UTC)

@Виктор Пинчук: This policy also notes that "Multimedia contributions are subject to Wikisource:Image use guidelines", which indicates that "Inappropriate images are those which are not part of the original document, those indirectly related to the work, and should not be included on the page". Perhaps you could put extra images on the Talk page, or on your User page, and put a link to it in the work header? (Not in the work itself) —Beleg Tâl (talk) 01:52, 11 November 2021 (UTC)

@Beleg Tâl: The Wikisource administrator is not a bot: he should not blindly follow the rules, but should consider each case individually. In this case, the author has not died yet. For example, in the Russian version of Wikisource, I illustrate my articles with author's videos. Of course, they were not in the paper version: video can't be printed in a newspaper. — Виктор Пинчук (talk) 06:14, 11 November 2021 (UTC)

@Beleg Tâl: Illustrations and videos not published in the newspaper belong to the author of the works, and not to the participant (user) of the project. The user is a temporary concept; the author of literary texts is forever. If required, I can request VRTS permission confirming that the illustrations belong to the author of the text posted on Wikisource. — Виктор Пинчук (talk) 06:38, 11 November 2021 (UTC)

@Виктор Пинчук: If you want to have an edition of your articles that has extra images and videos, or which omits the "grunge text", there are other websites on which you can do that. On English Wikisource, however, we only welcome editions that are faithful to the original publication. If another publisher agrees to publish your articles with the new images, you can provide a scan of the new publication with the images included in the new publication. Otherwise, as I have suggested, you will need to put your alterations elsewhere, and we can link to them from Wikisource as appropriate. —Beleg Tâl (talk) 15:40, 11 November 2021 (UTC)

@Beleg Tâl: …you will need to put your alterations elsewhere, and we can link to them from Wikisource as appropriate. Can you show me what it looks like and how it's done? — Виктор Пинчук (talk) 14:58, 13 November 2021 (UTC)

@Виктор Пинчук: Something like this, perhaps —Beleg Tâl (talk) 02:39, 15 November 2021 (UTC)

@Beleg Tâl: (An expanded version of this article can be viewed on the author's [https://www.viktor-pinchuk.example.org/god-is-one-in-200-persons website]) But this link doesn’t work, because I don’t have an "author's website". Maybe the "Discussion" page could be used for this purpose? —Виктор Пинчук (talk) 05:59, 16 November 2021 (UTC)

P.S. For example, here: https://en.wikisource.org/w/index.php?title=Translation_talk:Notes_of_an_international_tramp&action=edit&redlink=1 —Виктор Пинчук (talk) 10:29, 16 November 2021 (UTC)

New image viewer (aka OpenSeadragon)

As mentioned in the last Tech News, there is a new image viewer, based on the very capable OpenSeadragon viewer, which was deployed yesterday (should have been Wednesday but there was some unrelated blockage). This has been in review for several months. Only the basic implementation has made it in so far; the following features are complete but still to come, pending review and merge:

Rotation control (merged, but missed the cut-off: will land next week)
Marker lines (you can preview this here, along with rotation control): 737973
Region-based OCR to allow OCRing of a part of a page/column/etc (also in the above preview): phab:T294903
Image zoom and position persistence on reload so you don't need to re-zoom when previewing: 737788
Configuration of zoom speed and smoothness: 740133

Something else that it enables is future touchscreen and mobile work, since it natively supports pinch-to-zoom (though it is not yet enabled in the mobile skin).

The new viewer does not (yet) implement the old click-to-toggle between scroll and zoom. There is a ticket for this phab:T296079, but the main question there is: was the old behaviour actually optimal in the first place, or are there more fluid methods to do this?

The very old 2006 toolbar is not working with it: this has now been fixed, but again is pending review and deployment: phab:T296033.

The Jump to file (aka hi-res loader) script and the page_carousel scripts have been updated, and the former even loads "IIIF" tiled zoomable images from the IA. I am not personally aware of any other gadgets or scripts that are likely to explode, but if there are any, let me know. Inductiveload—talk/contribs 14:01, 19 November 2021 (UTC)

Tracked in Phabricator: Task T293950

Update: Due to unrelated breakage (a memory leak) the entire wmf.9 release has been rolled back. OSD will be unavailable until that release is re-attempted, maybe on Monday. If that does not go ahead, the next opportunity for an OSD release will be wmf.11 in a little under 2 weeks' time.

In the meantime, a demo site with (some) unmerged features is here and Beta Wikisource always shows the latest merged patches. People interested in improving the OSD "experience" before then are welcome to give it a go there and reply here or on Phabricator with bugs and suggestions. Inductiveload—talk/contribs 20:35, 19 November 2021 (UTC)

Getting completed Wikisource transcriptions into local library catalogues?

Hi all. User:Giantflightlessbirds, User:Eothan and I have been throwing around ideas about how to show completed Wikisource works in local library catalogues. We were wondering if any related work has been done in this space? We have also made an initial approach to OCLC who are engaging with how open content might be made available on Worldcat. We'd love to hear if anyone else is interested in pursuing this, or if the community already has ideas. --99of9 (talk) 22:57, 15 November 2021 (UTC)

@99of9: Do you have in mind printed books on dead trees (I hope bamboo) sitting in a brick-and-mortar library or do you mean some kind of digital access to Wikisource editions that online users of the library could find? If it's the former, PediaPress is tangential to this and if it's the latter, maybe Kiwix is kinda/sorta related. I don't know of any examples of actual libraries either hosting books made of Wikimedia movement editions, nor do I know of any libraries whose digital card catalogue also includes entries for accessing our works digitally. —Justin (koavf)❤T☮C☺M☯ 23:29, 15 November 2021 (UTC)

@Koavf: the latter. A regular library customer should search their catalogue, and see that WS items can be read/"borrowed" digitally. Perhaps only those selected by the librarians as locally-relevant. I'll look into Kiwix. --99of9 (talk) 23:33, 15 November 2021 (UTC)

If you're already doing outreach for a scheme like this, I would be very surprised if Internet Archive hadn't done something like this in the past three decades. —Justin (koavf)❤T☮C☺M☯ 23:37, 15 November 2021 (UTC)

Here's an example. We proofread Old Westland and uploaded it manually as an EPUB to Overdrive in my library, so it shows up for loan (7 copies on loan at the moment): [8]. We've also listed it in our online catalogue next to the physical book, with a link to Wikisource. This is a bit tedious to do one at a time, so any way a library could streamline the process, perhaps via WorldCat, would be nice. —Giantflightlessbirds (talk) 00:16, 16 November 2021 (UTC)

@Koavf: Yes, it appears that the Internet Archive may have achieved part of this already. This example on Worldcat links directly to the Internet Archive scanned version. It would be nice if it also linked to our transcribed version. --99of9 (talk) 00:29, 16 November 2021 (UTC)

@99of9, @Giantflightlessbirds, @Eothan: This is a topic that I’ve thought intensely about and that I care deeply about. My basic response would be that “yes, this is possible, but it would require libraries to hire paid staff for it to work.” Wikisource has numerous advantages and several drawbacks that I would enumerate as follows:

On a broad scale, Wikisource features two types of texts: scan-backed and non-scan-backed. Scan-backed texts have an Index in the Proofread Page extension, where the digital text can be produced in direct comparison with a digital, photographic reproduction of the original text. These are the gold standard for digital text because they allow for an easy comparison between the original and digital versions of a work, which can satisfy scholarly standards. Non-scan-backed texts are drawn from various sources across the internet. Some of them are quite good, but most are either incomplete or have dubious textual quality. As a broad category, non-scan-backed works must be considered as junk and no library will want to incorporate them into their catalog. Therefore, any library importer will need to separate scan-backed works from non-scan-backed works.
Wikisource is committed to open-data access. Therefore, it’s possible to download the raw wikitext of a given Index. This will enable institutions to preserve a copy for themselves. Sites, such as Project Distributed Proofreaders or Literature Online (LION), either remove access from the original text or never provide it. By contrast, once a text is proofread on Wikisource, it will never need to be proofread again. Therefore, Wikisource is the best platform for long-term preservation.
Wikisource is free and does not impose access restrictions. This makes it much easier and cheaper to provide access to works. To be blunt, digital content providers are bleeding institutions dry. They have figured out how to transform the public domain into a goldmine. Across the globe, institutions are paying billions of dollars annually to gain access to texts that are in the public domain. Access to Wikisource is free.
Wikisource is modular which makes it easy to make rapid adjustments. A user can update any one part and all the rest of the pieces will be automatically updated. If a user wants to correct one scanno, the system will automatically retransclude the text and reexport the text.
Wikisource is open-source. This makes it easy to add new features or deploy it in another setting.

These are the positives. Here is the other side:

Integration with libraries will require the development of code to transform Wikidata into MARC records, automatically retrieve the data from Wikisource, and link it in local catalogs. This is something that libraries will need to pay for.
While Wikisource does offer the possibility of replacing expensive paid databases, volunteers have limitations. The catalog of Wikisource is quite small and it’s impossible to tell volunteers that they must work on something. There are vast chasms in the offerings of Wikisource.
Wikisource needs more accessible features. Once again this will require development.
Librarians don’t necessarily make good Wikisource users. While the librarians of the National Library of Scotland have done an absolutely fantastic job of importing and mostly proofreading the Scottish Chapbooks, too many of them remain untranscluded or without images.

In the end, I believe that Wikisource has a lot to offer to libraries, but libraries also need to give back.

Provide development funding. This does not necessarily mean hiring a full-time developer, but for certain projects, it would make sense to provide a bounty.
Importing texts into libraries will require lots of grunt-work to fix metadata. This should be paid for.
Pay proofreaders. It’s impossible to expect volunteers to digitize the entire world library in any reasonable amount of time or to have the same interests as libraries. For some rough economics, it takes about 15-20 hours to proofread a novel of around 300 pages. Therefore, the cost of proofreading a book will be this amount of time multiplied by the hourly wage. To make any economic sense, the hourly rate would have to be quite low. This leaves two possibilities: workers in a country where English is spoken but wages are low, or undergraduates. I believe that hiring workers from a low-salary English-speaking country would just lead to a ton of badly proofread texts that would wind up burdening the Wikisource community. Therefore, undergraduates stand out as the best possible labor source. In the US, undergraduates earn between $10 and $15 an hour. Therefore, to digitize one book will cost between $150 and $300 USD. Federal law prohibits most undergraduates from working more than 20 hours a week. Therefore, even one full-time undergraduate can probably only digitize 30 books an academic year. This should give you some sense of scale. The only way for libraries to pull this off would be to form a consortium. If 10 libraries joined a consortium, the price per book would drop to as little as $15 for a perpetual license to a work. With 100, it would decrease to $1.50, and so on.
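
The rough economics above can be checked with a short Python sketch. The hours and wages are the figures quoted in the list; the 30-week academic year is an assumption added here to reproduce the "30 books" estimate, and all function names are purely illustrative.

```python
def cost_per_book(hours=(15, 20), wage=(10, 15)):
    """Low and high cost in USD of proofreading one ~300-page novel."""
    low = hours[0] * wage[0]    # fastest proofreader at the lowest wage
    high = hours[1] * wage[1]   # slowest proofreader at the highest wage
    return low, high


def share_per_library(cost, libraries):
    """Split the per-book cost evenly across a consortium."""
    return tuple(amount / libraries for amount in cost)


def books_per_year(hours_per_week=20, weeks=30, hours_per_book=20):
    """Books one student can finish; the 30-week year is an assumption."""
    return (hours_per_week * weeks) // hours_per_book


print(cost_per_book())                          # (150, 300)
print(share_per_library(cost_per_book(), 10))   # (15.0, 30.0)
print(share_per_library(cost_per_book(), 100))  # (1.5, 3.0)
print(books_per_year())                         # 30
```

The $15 and $1.50 figures in the text correspond to the low end of each range; at the high end a 10-library consortium would pay $30 per book.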

I would love to see more input from libraries, but I wanted to lay out some of the benefits and challenges. It’s not going to be as easy as most would hope for, but I do believe that the rewards will be worth it. Wikisource is the only site that offers libraries a way to stop paying annual subscriptions while maintaining scholarly standards. It will save money, but this will be a transitional process that will cost money rather than a simple, free switchover. Languageseeker (talk) 03:08, 16 November 2021 (UTC)

Thanks User:Languageseeker for your well considered reply. It sounds like your eventual end goal is very ambitious. Perhaps we should focus on a modest starting point: getting our best complete Wikisource items integrated, trying to impose minimal additional work on anyone. Your comments about scan-backing and metadata completion are crucial to filtering out which items are "ready". We agree with your intuition that this metadata work is best housed on Wikidata. From a simple query it's nice to see over 8000 texts that have both an author and a copyright status in Wikidata. I can't yet see an easy way of filtering for those backed by scans. I'm sure a typical MARC record would want more metadata. Are the required fields/properties well established? --99of9 (talk) 04:55, 16 November 2021 (UTC)

I wanted to lay out the big picture because I think that you're asking for the hardest bit right now, which is the construction of a bridge between Wikisource and a library catalog. This is going to take careful planning and a clear vision of what the ultimate goal is. As I see it, there will probably need to be some sort of software that can take Wikidata and automatically convert it to a MARC record. The question becomes what do you want to import into your library catalog? Do you want to have links to Wikisource or compiled epub/mobi/pdf? Should they have generic covers like enWS or original covers like frWS? How will the library catalog update the metadata when it changes on Wikidata? What about mitigating vandalism? Should there be a time delay? What changes will need to happen on enWS or Wikidata? Who will write and maintain this software? Who will work on the metadata? I fully support this idea and I would love for it to happen, but I don't think it's right to ask Inductiveload or Xover to write a quite complex piece of software for libraries. Languageseeker (talk) 05:41, 16 November 2021 (UTC)

I really think that it would be useful to get some potential libraries to articulate what their needs are, to have an understanding of what is needed in terms of additional work on top of proofreading and transclusion (e.g. preparing for export onto devices, additional metadata needed, whether they want to merge back the proofread text as a text layer, etc.), areas which could use more people helping to drive them forward. Having outside organizations can be helpful here in providing more motivation and getting more volunteers who might be specially interested in doing that type of work. But as mentioned, there is in addition the area of the integration between the two systems (how to get the data over into their systems and keep it in sync, how to serve the actual ebooks if required). This basically cannot be done without libraries investing in it, as it is likely to vary by library anyway. Projects like "make more works on WS exportable" or "do more with Wikidata" are more generic. Separately, note that the Internet Archive has a catalog of books, https://openlibrary.org/; there might be possibilities to have WS ebooks on there as well. MarkLSteadman (talk) 10:26, 17 November 2021 (UTC)

There are large amounts of projects that the few people developing software here can work on, but one way to get more prioritization on something like automated generation of MARC is getting people excited about using it, volunteers excited about going through and backfilling the huge number of works with the required metadata and people interested in helping to build and maintain the code. While it is easy to talk about doing cool things with works once proofread, it takes a large amount of additional work when there is a large backlog of existing stuff to do on the proofreading infrastructure side. MarkLSteadman (talk) 10:40, 17 November 2021 (UTC)

Open Library has been accepting wikidata ids for a while now and lately, I see that it has a spot for wikisource as well. Open Library and Internet Archive are related enough that the same login works for both.--RaboKarbakian (talk) 16:26, 17 November 2021 (UTC)

The problem is that Open Library does not offer a MARC export either? Also, not all library software is the same. This is going to be a fairly challenging thing to do. I wish that it wasn't, but I don't see a simple way. Languageseeker (talk) 19:51, 18 November 2021 (UTC)

It may be that we don't need to work with the entire MARC record, as for most items in Wikisource there should already be a WorldCat MARC record. We will find out if this is the case, but we may only need to work on one, or a small subset, of fields. As you say, they are already ingesting links to Open Library, so some systems must already be in place. Eothan (talk) 21:42, 18 November 2021 (UTC)

Most of the MARC records are fairly poor. My honest suggestion would be to figure out how to do this manually first and document every step. How do you get from an Index to the transclusion to the Wikidata to the MARC record? What fields in Wikidata map to what in a MARC record? What needs to be there? What should the program log as errors? How do you handle complex cases where a single Index is transcluded to multiple areas? Document everything and ask for help for the small parts. People are willing to help, but the questions have to be specific. In the end, you'll be asking to import something on the scale of over 20,000 items into local catalogs. So the value is there, but the system needs to be created and thought through. Also, Overdrive is not the answer. Instead, the link to the Wikisource work should probably be recorded as 856 40 |y Online book |u . Languageseeker (talk) 01:45, 20 November 2021 (UTC)
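
As a starting point for the "document every step" exercise, here is a minimal Python sketch that formats only the 856 line suggested above. It assumes the pipe-delimited display style used in the comment (indicators 4 and 0, then subfield y for the link text and subfield u for the URL). The function name marc_856 and the example URL are placeholders, and this produces a display string, not a binary MARC record.

```python
def marc_856(url: str, link_text: str = "Online book") -> str:
    """Format an 856 (electronic location) field as a display string.

    First indicator 4 means the resource is reached over HTTP; second
    indicator 0 means the link points to the resource itself.
    """
    if not url.startswith(("http://", "https://")):
        # Something a real importer would probably log as an error.
        raise ValueError(f"not an HTTP(S) link: {url!r}")
    return f"856 40 |y {link_text} |u {url}"


# Placeholder URL for illustration only.
print(marc_856("https://en.wikisource.org/wiki/Example_work"))
```

A full mapping from Wikidata to the other MARC fields would still have to be worked out and agreed on, as the comment above says.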

Styling and semantics on requested works

Hey, this is kind of weird: for some strange reason, pages like Wikisource:Requested_texts/1924 and Wikisource:Requested_texts/1928 have bulleted lists of works that use proper semantics to show that they are lists but weirdly, Wikisource:Requested_texts/1923 doesn't. I propose that all of these pages use actual unordered lists because they are exactly that: lists. Does anyone else think that 1923 is different from every other year and should have a different format? Does anyone else think that listings are somehow better by breaking them up into dozens or hundreds of paragraphs rather than lists? If the community consensus is that these pages should be consistent and that lists should be lists rather than streams of paragraphs, I'll change the 1923 page. Thanks. —Justin (koavf)❤T☮C☺M☯ 21:56, 13 November 2021 (UTC)

The 1923 page is not different in formatting from the other pages. Some items on the pages are bulleted lists, some are not. Some items are separated from each other by blank lines, and some are not. The 1923 page is not different from the other pages in this respect; the same is true of lists on the other pages. Why does the 1923 page have to have a bullet in front of every single item at every level when the lists on the other pages do not? --EncycloPetey (talk) 22:04, 13 November 2021 (UTC)

I'm very interested to see which items are not part of bulleted lists on Wikisource:Requested texts/1924 or Wikisource:Requested texts/1928 or Wikisource:Requested texts/1929 or Wikisource:Requested texts/1930 or Wikisource:Requested texts/1931 or Wikisource:Requested texts/1926 or Wikisource:Requested texts/1945 or Wikisource:Requested texts/1927. It seems like we've used proper semantics on all of those pages to make bulleted lists of works to transcribe and then nested lists for things like individual volumes or specific authors of multiple works or commentary. Am I wrong? —Justin (koavf)❤T☮C☺M☯ 22:09, 13 November 2021 (UTC)

A casual look at those pages will reveal items without bullets and groups separated by blank lines. --EncycloPetey (talk) 22:12, 13 November 2021 (UTC)

Every single work on all of those pages is part of a bulleted, unordered list and all of the commentary is included as a nested list. Give me an example of a work that is not. It is only on the 1923 page that they are not. —Justin (koavf)❤T☮C☺M☯ 22:24, 13 November 2021 (UTC)

Old New York volumes on 1924; Great Gatsby editions on 1925; Yale Shakespeare volumes on 1926. . . --EncycloPetey (talk) 22:34, 13 November 2021 (UTC)

@EncycloPetey: Please actually look at those pages before you spread misinformation. Did you actually look at the Old New York volumes? They have "**: " at the beginning, making them unordered lists nested under an unordered list. As do the Great Gatsby editions in 1925. As do the Yale Shakespeare volumes in 1926. Thank you for proving my point for me. Now, can you actually show me any works in any of the other pages that are not listed in semantically meaningful unordered lists, which I have asked you several times now and you have given me examples of the exact opposite? Why do you think 1923 is different from every other year? —Justin (koavf)❤T☮C☺M☯ 22:56, 13 November 2021 (UTC)

So, you are fine with an absence of visible bullets? What you are arguing for is invisible formatting because. . . . ? Is there any reason for making the change in terms of Wikisource? Is this a desire to impose a specific format on a select group of working pages alone? --EncycloPetey (talk) 23:11, 13 November 2021 (UTC)

@EncycloPetey: You obviously did not look at this edit or the one prior to it or else you would not be asking this question. You have also argued that we should follow the existing practice and that the pages should be similarly formatted: that is literally what I am arguing. I never made an argument about style: I made one about semantics. Proper semantics help structure information in web pages (e.g.) for search engines or screen readers, etc. Why are pages enhanced by removing semantic elements? Why is 1923 different from all other years? Can you actually show me any works in any of the other pages that are not listed in semantically meaningful unordered lists, which I have asked you several times now and you have given me examples of the exact opposite? Did you actually even look at the edits I made or did you blindly revert? —Justin (koavf)❤T☮C☺M☯ 23:30, 13 November 2021 (UTC)

No, I have not made that argument. That is indeed the argument you are making, but I have not made that argument. These are working pages, back-end lists of things to be accomplished, and I see no reason why they should be forced to be consistent. And why would we be concerned with such working space pages in the Wikisource namespace appearing in search engines? But the key question I have been asking is "Is there any reason for making the change in terms of Wikisource?" and you have provided no answer except to say that it should be that way because it is that way, which is circular. Since this discussion is unproductive, I will wait to see what others say. --EncycloPetey (talk) 23:38, 13 November 2021 (UTC)

@EncycloPetey: Since, as you have shown, all the other pages are semantically correct and since there is no reason why 1923 is somehow different than other years, it follows that it should be semantically correct as well. Plus, as I've already explained, all web pages should have proper semantics; this is a web page; therefore, it should have proper semantics. As I have asked you and you have ignored: Why are pages enhanced by removing semantic elements? Why is 1923 different from all other years? Can you actually show me any works in any of the other pages that are not listed in semantically meaningful unordered lists, which I have asked you several times now and you have given me examples of the exact opposite? Did you actually even look at the edits I made or did you blindly revert? Note also that "e.g." refers to examples and that you skipped over screen readers. Please actually answer the questions that have been asked of you, as I have done for you. —Justin (koavf)❤T☮C☺M☯ 00:14, 14 November 2021 (UTC)

It seems to me that an unordered list (i.e. *) is a pretty natural representation for such a list (the other being a section per work, like the MC nominations, which is overkill here, since the sections don't need to be archived). I do not see why this deserved to be instantly reverted; to me it appears to be a perfectly fair AGF/BOLD edit.

On one hand, I don't find myself especially worried by the semantic correctness of such back-room pages (which is really my own laziness and chauvinism as someone who doesn't need a screen-reader, though my excuse is I don't have one and though I have tried to use Orca to see how Wikisource behaves in a screen-reader, I've never really gotten it to work). Also, if we actually made a concerted effort to make Wikisource more accessible (which we should be, but aren't really, doing) I wouldn't necessarily start there of all places. On the other hand, certainly there's nothing wrong in my book with such a change there. Inductiveload—talk/contribs 00:29, 14 November 2021 (UTC)

^^^ This. Who gives a toss, fix it or leave it. They are work pages in the Wikisource: ns. Is this really sucking up our time? — billinghurst sDrewth 01:06, 14 November 2021 (UTC)

Consistency is good, whether as policy, or convention, or unwritten and new. This type of it aids in legibility, and reduces decipherment. Presenting a list as a list in the HTML document I find desirable. And I do not find long bullet-pointed lists bad. —Genesis Bustamante (talk) 01:16, 14 November 2021 (UTC)

@EncycloPetey: Please unprotect and revert yourself per the above. —Justin (koavf)❤T☮C☺M☯ 06:52, 20 November 2021 (UTC)
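
The semantic point argued in this thread can be illustrated with a minimal sketch, assuming a simplified reading of wikitext in which only leading `*` characters mark list depth (real MediaWiki parsing also handles `:`, `#` and `;`, which this ignores). The function names `parse_items` and `render` are hypothetical. It shows how `*` and `**` lines become nested `<ul>` elements, which is the structure a screen reader or search engine can recognise as a list.

```python
import html


def parse_items(lines):
    """Build a tree of (text, children) tuples from '*'-prefixed lines."""
    root = []
    stack = [(0, root)]  # pairs of (depth, list that receives children)
    for line in lines:
        text = line.lstrip("*")
        depth = len(line) - len(text)
        if depth == 0:
            continue  # not a list item; this sketch skips plain lines
        # Climb back up until the top of the stack is a shallower level.
        while stack[-1][0] >= depth:
            stack.pop()
        children = []
        stack[-1][1].append((text.strip(), children))
        stack.append((depth, children))
    return root


def render(items):
    """Render the tree as nested <ul>/<li> markup."""
    if not items:
        return ""
    parts = ["<ul>"]
    for text, children in items:
        parts.append("<li>" + html.escape(text) + render(children) + "</li>")
    parts.append("</ul>")
    return "".join(parts)
```

Under these assumptions, an entry such as `* Old New York` followed by `** Volume 1` renders as a list item containing its own nested list, whereas a line with no `*` carries no list markup at all.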

Indexes created from transcriptions

Recently, I've seen a number of Indexes created that are not scans or print-outs of websites. Instead, they are transcriptions of scans that are converted into PDFs. For example, a PDF of a Project Gutenberg work. What is the policy on those? It seems to make no sense to have these as Indexes and it seriously damages the meaning of scan-backing. After all, if a scan can be a Project Gutenberg text, then what is the difference between scan-backing and non-scan-backing? Is there a specific policy on these? If not, should they be added to the sdelete criteria? Languageseeker (talk) 19:49, 18 November 2021 (UTC)

Scan backing with a PDF from PG is, IMO, just completely pointless, since it's not really any more useful than just the text dump, and introduces a lot more faffing about with pages that never actually existed. If someone must copy-and-paste, just do that. Such transcriptions are no more (or less) valuable than the copy-dump, they just take more effort for no gain. That said, I don't think they need to be specifically militated against other than to avoid people wasting their own time. I don't think it damages the meaning of scan backing in general, any more than painting a racing stripe on a Fiat Punto damages the meaning of a GT 350. People will just wonder what you're smoking. Inductiveload—talk/contribs 17:17, 19 November 2021 (UTC)

I don't think we should be confusing the terminology here--if it's backed by a "scan" that's actually a PDF from PG, it's not really scan-backed. IMO, a better definition of "scan" is all that's needed, and then the policy will naturally follow. — Dcsohl (talk) (contribs) 21:53, 19 November 2021 (UTC)

i am not confused. project gutenberg, not scan backed, should not be deleted. --Slowking4 亞 Farmbrough's revenge 03:21, 20 November 2021 (UTC)

@Inductiveload: I think it's more of a case of someone ordering a GT 350 and finding out that it has a Ford Pinto engine. As Dcsohl said, I always felt that scan backing should mean that the transcription is based on a scan of the original text, not that the transcription is based on a transcription. If scan backing can be a transcription of a transcription, then what is the difference between transcluded and non-transcluded texts? Languageseeker (talk) 11:56, 20 November 2021 (UTC)

Whether the "scan" (loosely used) is "valid" is not our problem. We don't make any claim that a work is automatically "better" because it's "scan backed". It really depends on what it's scan backed by. I don't like these "scans", and I think they're completely pointless and wrong-headed, but there is no basis to forbid them specifically.

I don't personally think we are facing an issue here that needs solving. At worst, there may be a few new users who think that's somehow helpful, but the solution to that is engagement and advice, not pre-emptively forbidding things and deletions. If we focus on scan backing our existing PG works, as you have been doing, the problem will solve itself. PG imports as a process pre-date ProofreadPage: treat it like any other long-term cleanup task.

In general, spamming piecemeal reactive rules is not good for a cohesive policy platform. No "PG scans" is too specific. A better proposal, IMO, would be amending WS:WWI with "no new PG, or similar second-hand, texts of any sort", even if "scan" backed by a PDF export of the text. I'd support that. Inductiveload—talk/contribs 12:11, 20 November 2021 (UTC)

i agree, we need a PG fyi, PG migration process, and newbie onboarding. we do not need more policy, deletion, and block threats. we will not get more quality by increasing the scrap rate, rather we will need a quality circle. Slowking4 亞 Farmbrough's revenge 01:43, 21 November 2021 (UTC)

I think that a policy is appropriate because it makes it seem less like an unwritten law that is imposed on new users rather than actual policy. I always believe that education and a gentle approach are the right way. This formal policy is to make it seem less arbitrary. In many ways, this is a codification of a common law principle. Languageseeker (talk) 03:31, 21 November 2021 (UTC)

Tech News: 2021-47

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Changes later this week

There is no new MediaWiki version this week.
The template dialog in VisualEditor and in the new wikitext mode Beta feature will be heavily improved on a few wikis. Your feedback is welcome.

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe.

20:02, 22 November 2021 (UTC)

Amending WS:WWI

As suggested by Inductiveload in a thread below, I propose amending WS:WWI to create a new section "Defining what is not included" to state "Wikisource no longer accepts any new texts from Project Gutenberg, or similar second-hand transcriptions, of any sort", even if "scan" backed by a DJVU, PDF, or any other format accepted by the Proofread Page extension created from that text. Languageseeker (talk) 00:25, 21 November 2021 (UTC)

I would say "no longer accepts any new" rather than "does not include any new". --EncycloPetey (talk) 00:37, 21 November 2021 (UTC)

Good idea. Updated. Languageseeker (talk) 02:02, 21 November 2021 (UTC)

Support --EncycloPetey (talk) 19:29, 23 November 2021 (UTC)

I generally support, but I wonder if there should be an exception for when there is genuinely no extant scan at all and no possibility of getting our greasy paws on one. WWI follows w:WP:IAR in that it can always be overridden on a case-by-case basis with some level of community agreement. Maybe we should call out that possibility explicitly here (as opposed to a tricky-to-write-clearly policy carve-out)? Inductiveload—talk/contribs 09:56, 22 November 2021 (UTC)

Why? Cygnis insignis (talk) 12:06, 22 November 2021 (UTC)

@Inductiveload: There already is "Some works which may seem to fail the criteria outlined above may still be included if consensus is reached. This is especially true of works of high importance or historical value, and where the work is not far off from being hostable. Such consensus will be based on discussion at the Scriptorium and at Proposed deletions." in the policy. This would take care of your case. Languageseeker (talk) 12:28, 22 November 2021 (UTC)

@Languageseeker Fair enough.

Support. Inductiveload—talk/contribs 12:34, 22 November 2021 (UTC)

Celebrating 18 years of Wikisource

Hello Wikisource enthusiasts and friends of Wikisource,

I hope you are doing alright! I would like to invite you to celebrate 18 years of Wikisource.

The first birthday party is being organized on 24 November 2021 from 1:30 - 3:00 PM UTC (check your local time) where the incoming CEO of the WMF, Maryana Iskander, will be joining us. Feel free to drop me a message on my talk page, telegram (@satdeep) or via email (sgill@wikimedia.org) to add your email address to the calendar invite.

Maryana is hoping to learn more about the Wikisource community and the project at this event and it would be really nice if you can share your answers to the following questions:

What motivates you to contribute to Wikisource?
What makes the Wikisource community special?
What are the major challenges facing the movement going forward?
What are your questions to Maryana?

You can share your responses during the live event, but in case the date and the time don't work for you, you can share your responses on the event page on Wikisource or, in case you would like to remain anonymous, you can share your responses directly with me.

Also, feel free to reach out to me in case you would like to give a short presentation about your and your community's work at the beginning of the session.

We are running a poll to find the best date and time to organize the second birthday party on the weekend right after 24th November. Please share your availability on the following link by next Friday:

https://framadate.org/zHOi5pZvhgDy6SXn

Looking forward to seeing you all soon!

Sent by MediaWiki message delivery (talk) 09:10, 12 November 2021 (UTC)

Note for everyone: there's a live stream here: https://stream.meet.google.com/u/1/stream/60c89368-65a7-4f27-8934-2f053e62f7f1. Even if you are not in the Google Meet, you should be able to listen in there. Inductiveload—talk/contribs 13:24, 24 November 2021 (UTC)

Nope, doesn't work. Languageseeker (talk) 13:48, 24 November 2021 (UTC)

on behalf of User:SGill (WMF)

Monthly challenge displaying error message

Tracked in Phabricator: Task T296092

The Monthly Challenge for this month is displaying an error message ("The time allocated for running scripts has expired") where the texts should be. Why is this, and how can it be fixed? DraconicDark (talk) 16:19, 19 November 2021 (UTC)

There are too many indexes (actually, pages) in the MC and there's a performance issue. A fix is being worked on. Inductiveload—talk/contribs 16:58, 19 November 2021 (UTC)

Done The fix for this has been deployed this week and performance on those pages seems much better. Inductiveload—talk/contribs 12:32, 24 November 2021 (UTC)

uploading scans to IA -- are you experienced?

I followed the instructions, and yet there is no ocr there. Just my uploaded tar file.

Is anyone experienced with IA and knows the magic word or the right place to thump it?--RaboKarbakian (talk) 22:23, 23 November 2021 (UTC)

How long has it been since you uploaded the file to IA? It can take time for the other file types to be generated. --EncycloPetey (talk) 23:12, 23 November 2021 (UTC)

The marc.xml says: 23-Nov-2021 15:02. I pushed their "Derive" button this morning because the files seemed to be just sitting there, and from what I could glean from the "management", all of the files were in the purple bunch, which meant everything was done and fine. I am afraid of annoying them by pushing it a second time.--RaboKarbakian (talk) 01:16, 24 November 2021 (UTC)

Given that we are close to a US Holiday, and that the process can take several days anyway, I would be patient and check back next week. --EncycloPetey (talk) 01:20, 24 November 2021 (UTC)

I think it's supposed to be a zip file, not a tar. A link to the item would be helpful too. Inductiveload—talk/contribs 07:54, 24 November 2021 (UTC)
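
If the upload does need to be a zip rather than a tar, the existing archive can be repacked instead of rebuilt from the page images. The following is a minimal sketch using only the Python standard library; the function name `tar_to_zip` and the file names in the usage note are hypothetical, and whether IA's derive step expects any particular file naming is not confirmed anywhere in this thread.

```python
import tarfile
import zipfile


def tar_to_zip(tar_path, zip_path):
    """Copy every regular file from a tar archive into a new zip archive.

    Returns the number of files copied. Members are stored uncompressed,
    on the assumption that page images are already compressed.
    """
    copied = 0
    with tarfile.open(tar_path) as tar, \
            zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as zf:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links and other special entries
            data = tar.extractfile(member).read()
            zf.writestr(member.name, data)
            copied += 1
    return copied
```

For example, `tar_to_zip("scans.tar", "scans.zip")` would write a zip next to the original tar, which could then be uploaded in its place.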

AWB Approval

Is there a separate process for AWB approval for enWS or is it the same as enWP? Languageseeker (talk) 13:45, 25 November 2021 (UTC)

We haven't put restrictions on its use, and are more interested in the function and intensity of its use; this is related to the fact that, with works and the Page: namespace, we are different, and it hasn't been abused. Note that our designation of bots is a little different, and that AWB can be semi-automated. If you are looking to run AWB as a bot, see WS:Bots. If you are running as a user, then it is about consideration of the RecentChanges, and then considering whether it should be run as a bot. For instance, I run it as an automated bot through my bot account, and semi-automated through this account where I have pattern/formulaic changes that I eyeball prior to saving, or where I need admin rights to process some of what I am doing. I would also note that if you are going to write regexes for changes, that can be undertaken — billinghurst sDrewth 09:29, 26 November 2021 (UTC)

Talk to the Community Tech: The future of the Community Wishlist Survey

Hello!

We, the team working on the Community Wishlist Survey, would like to invite you to an online meeting with us. It will take place on 30 November (Tuesday), 17:00 UTC on Zoom, and will last an hour. Click here to join.

Agenda

Changes to the Community Wishlist Survey 2022. Help us decide.
Become a Community Wishlist Survey Ambassador. Help us spread the word about the CWS in your community.
Questions and answers

Format

The meeting will not be recorded or streamed. Notes without attribution will be taken and published on Meta-Wiki. The presentation (all points in the agenda except for the questions and answers) will be given in English.

We can answer questions asked in English, French, Polish, Spanish, German, and Italian. If you would like to ask questions in advance, add them on the Community Wishlist Survey talk page or send to sgrabarczuk@wikimedia.org.

Natalia Rodriguez (the Community Tech manager) will be hosting this meeting.

Invitation link

Join online
Meeting ID: 82035401393
Dial by your location

We hope to see you! SGrabarczuk (WMF) (talk) 20:03, 26 November 2021 (UTC)

Seeking opinion: Moves in Index: and Page: namespaces

Hi to all. At the moment, users/wikisourcers are able to move pages in the Index: and Page: namespaces (Special:ListGroupRights => Users); however, it leaves redirects which require tidying up by an administrator. We have been tracking this with Special:AbuseFilter/36. It seems to me that this half-pregnant approach is not particularly working. We either need to allow users to move without redirects in that namespace (if it is even possible to have that set to be namespace-specific), or we stop the ability to move from these namespaces for general users.

Interested to hear users' thoughts. — billinghurst sDrewth 01:10, 28 November 2021 (UTC)

My mid-term plan is to introduce a back-end user right to move without redirects specifically in the Page/Index NS and then allow wikis to individually assign this as part of user groups (eg autoconfirmed or a dedicated group) as/if they wish on a wiki-by-wiki basis: phab:T293200. Inductiveload—talk/contribs 10:41, 28 November 2021 (UTC)

Unprotect The Time Machine (Heinemann text)

I'm trying to transclude over the existing text, but the chapters seem to have been under sysop-level protection since 2006. Could this restriction be removed? Languageseeker (talk) 14:14, 28 November 2021 (UTC)

Done. At the same time I think the text should be defeatured (which imo applies to all non-scan backed texts) and renominated after all the work is done. --Jan Kameníček (talk) 16:48, 28 November 2021 (UTC)

@Jan Kameníček Thanks. I think some of the chapters are still under protection. See The Time Machine (Heinemann text)/Chapter III, The Time Machine (Heinemann text)/Chapter V, The Time Machine (Heinemann text)/Chapter IX. Languageseeker (talk) 17:01, 28 November 2021 (UTC)

Done --Jan Kameníček (talk) 17:40, 28 November 2021 (UTC)

@Jan Kameníček The Time Machine (Heinemann text)/Chapter IX still looks protected. Languageseeker (talk) 17:42, 28 November 2021 (UTC)

So now it should be finally OK :-) --Jan Kameníček (talk) 17:46, 28 November 2021 (UTC)

It is! Thank you! :) Languageseeker (talk) 17:49, 28 November 2021 (UTC)

@Jan Kameníček One more request. Could you delete The Time Machine (Heinemann text)/Epilogue, as it is not in the original book? (There's a different epilogue that is already transcluded.) Languageseeker (talk) 18:11, 28 November 2021 (UTC)

Done. It seems to be taken from some much later edition. The text should have never been featured in this state. --Jan Kameníček (talk) 18:38, 28 November 2021 (UTC)

When it was Featured, it was not the Heinemann text, but a later edition of The Time Machine. It became "the Heinemann text" as a result of this move in 2010. --EncycloPetey (talk) 19:26, 28 November 2021 (UTC)

I note that the Table of Contents from this new Heinemann text points to chapters in the Holt text. That can't be right. --EncycloPetey (talk) 16:57, 28 November 2021 (UTC)

Fixed my tab confusion. Languageseeker (talk) 17:01, 28 November 2021 (UTC)

It seems that the work was featured despite the fact that it was incomplete as some chapters were omitted… --Jan Kameníček (talk) 17:49, 28 November 2021 (UTC)

In 2006, when the Featured Text process was in its first year. Also see above. When it was featured it was not "the Heinemann text"; it should never have been declared so, since it did not include the final chapters from the Heinemann text. --EncycloPetey (talk) 19:24, 28 November 2021 (UTC)

Adventures List...

I've reinstated my Adventures list, with a view to the remaining unvalidated (or non-scan-backed) volumes being added to the Monthly Challenge at some point.

There are currently 20 or so works which are not scan backed on Wikisource. ( Only 15 or so of these have no located scans.)

Would other contributors please assist in matching the remaining works to suitable scans? Thanks. ShakespeareFan00 (talk) 18:24, 28 November 2021 (UTC)

@ShakespeareFan00 The list looks amazing. I'm deeply grateful to you for making it. Speaking from the perspective of the MC, I have a few tips/requests. First, only the original publication or a printing of the electrotyped/stereotyped text should be listed. I don't want to run into the situation where I ask users to proofread some derivative edition. Second, no match-and-splits. Many of these texts are extremely popular and have more editions than one can count; match-and-splitting usually creates an unmitigated mess. Third, for translated works, it would be nice to pick a specific translation.

In general, I'm going to add it to the nominations. Jules Verne already failed, so his works will probably be added to the bottom. Is there any particular order that you would like the texts to be featured in? Languageseeker (talk) 00:20, 29 November 2021 (UTC)

Thanks for considering these as nominations for the MC.

Not currently. Somewhere I have the publication order of the UK partwork which inspired the list, and if certain works haven't been featured by the time I find it, I may consider using the ordering of that list. I added a few additional suggestions to the list of volumes the original partwork had (and had to remove a few for copyright reasons).

For translated works, a specific translation is indeed a good idea. Also, in respect of multiple editions, I would suggest concentrating on first editions (and, if we already have those, "popular" editions), or those with illustrations by regarded artists (such as editions with illustrations by Rackham, to give one example). ShakespeareFan00 (talk) 09:47, 29 November 2021 (UTC)

I'm open to suggestions on what could be added to it. (I already added The Four Feathers, for example.)

For example, Kim is another possibility for inclusion. It depends on what you count as the "adventure" genre and what is considered a work of literary or cultural interest.

ShakespeareFan00 (talk) 09:47, 29 November 2021 (UTC)

Category:Bulgarian authors

Hi, I'm wondering why there are so many French authors in the Category:Bulgarian authors. Is it an issue with Wikidata? --M-le-mot-dit (talk) 15:01, 29 November 2021 (UTC)

I'm not sure either. But I don't think it's a Wikidata issue, since authors like Author:Eugène Aubry-Vitet are categorized as Bulgarian and not French, despite not having a single mention of Bulgaria on his Wikidata page. I haven't seen any Bulgarian authors in that category, so it might be a problem with the category? DoublePendulumAttractor (talk) 15:35, 29 November 2021 (UTC)

It was a typo in the auto-categorisation for authors. It's fixed now, thanks for the report. Inductiveload—talk/contribs 15:54, 29 November 2021 (UTC)

Adding Images to Index:Baum - The Wonderful Wizard of Oz.djvu

The Index for The Wizard of Oz has been proofread, but needs the illustration added from a high-resolution LOC scan. Would anyone like to take this request? Languageseeker (talk) 18:41, 28 November 2021 (UTC)

Why LOC scan? Any problem with the jp2 folder of the source site (1)? Moreover, images are already present at 2. Hrishikes (talk) 13:48, 30 November 2021 (UTC)

The LOC has the highest image quality available because they have the original TIFF available. JP2 are a lossy compression and incur further losses with image processing. Languageseeker (talk) 14:15, 30 November 2021 (UTC)

jp2 can be a problem on commons. but you do not really "need" tiffs, just use crop tool on the low res images to finish the work. you can always upgrade images later. this is a perennial issue, since book scans have low res images for size reasons. --Slowking4 亞 Farmbrough's revenge 03:28, 1 December 2021 (UTC)

Tech News: 2021-48

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Changes later this week

The new version of MediaWiki will be on test wikis and MediaWiki.org from 30 November. It will be on non-Wikipedia wikis and some Wikipedias from 1 December. It will be on all wikis from 2 December (calendar).

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe.

21:15, 29 November 2021 (UTC)

Relevant changes to Wikisource due in wmf.11

Some relevant changes to be released in wmf.11 include:

Some page viewer configurations, which you will be able to set at Special:Preferences#mw-prefsection-editing-proofread-pagenamespace:
- You can now set the animation speed on the OpenSeadragon (OSD) viewer, which makes the panning and zoom feel smoother. The default is '0', which is how it's always been. The usual OSD default is 1.2. I personally like about 0.5.
- You can now set the zoom step of the viewer. The default is 1.2, which is roughly similar to how it used to be when zoomed out, but the new viewer does not zoom asymptotically slower as you zoom in (this was arguably a bug or oversight in the old implementation). If you find the zoom too aggressive, you could try to lower it to, say, 1.1.
Page rotation is now supported in the viewer
OSD click-to-zoom is now set to the same as the scroll step
Various other small OSD fixes (e.g. phab:T296153, phab:T296260).
The CodeMirror syntax highlighting editor should now at least be the right size in the Page namespace (Gerrit 740821)
The default name for the Index page config data is now Mediawiki:proofreadpage_index_data_config.json, not Mediawiki:proofreadpage_index_data_config. The old name will continue to work as a fallback.
JS config variables for page number, index name, index fields and image URLs are now set in edit, view and submit modes (phab:T285218, phab:T255345, phab:T167200 and phab:T204384). This will make it easier to get useful information about a given page from a JS script or gadget.
There is now a formal API to get a list of the pages in an index: mw:Extension:ProofreadPage/Index pagination API.
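
As a rough illustration of how a script might use the index pagination API mentioned in the last item, here is a minimal sketch. It only builds a request URL and reads titles out of a decoded response; it performs no network access. The module name `proofreadpagesinindex`, the `prppii` parameter prefix, and the shape of the response are assumptions that should be checked against mw:Extension:ProofreadPage/Index pagination API before use, and `Index:Example.djvu` is a hypothetical index title.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikisource.org/w/api.php"


def build_index_pages_url(index_title):
    """Build a query URL asking for the pages that belong to one index."""
    params = {
        "action": "query",
        "list": "proofreadpagesinindex",  # assumed list module name
        "prppiititle": index_title,        # assumed parameter name
        "prppiiprop": "ids|title",         # assumed property selector
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def page_titles(response):
    """Extract page titles from a decoded JSON response (assumed shape)."""
    pages = response.get("query", {}).get("proofreadpagesinindex", [])
    return [page["title"] for page in pages]


# Hypothetical response, used only to show the shape page_titles() expects.
SAMPLE_RESPONSE = {
    "query": {
        "proofreadpagesinindex": [
            {"pageid": 1, "title": "Page:Example.djvu/1"},
            {"pageid": 2, "title": "Page:Example.djvu/2"},
        ]
    }
}
```

Keeping URL construction and response parsing separate means the parsing can be adjusted independently if the real response turns out to differ from the assumed shape.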

Before wmf.11 lands (tomorrow evening UTC, assuming no train blockers like 2 weeks ago), you can, as always, try out every change that's been merged so far at Beta Wikisource.

The following completed items are still in review and will again be deferred:

Image region OCR (e.g. for OCRing a single column) phab:T294903
Marker lines in the page NS (phab:T296160)
Position persistence after reload
Correctly toggling the indicators for horizontal/vertical layout

The following item needs community input:

Middle-click-to-toggle between zoom and scroll (also adds Ctrl/Shift scroll actions) phab:T296079. Input is still sought on the ideal user experience for scroll/panning at that Phabricator issue or here. No concrete positive (i.e. what people would like to see) suggestions have yet been made.

You can try all the above features, including the current "best-guess" about the scroll and pan here: https://patchdemo.wmflabs.org/wikis/09e7300254/wiki/Main_Page.

Inductiveload—talk/contribs 15:36, 30 November 2021 (UTC)

The wmf.11 train that was expected to deploy a couple of hours ago and would have brought the above changes has been cancelled this week due to "distraction" of the Rel Eng team by "internal changes". None of the above-mentioned changes will therefore be deployed to Wikisource this week. You are still, as always, able to try the current state of merged changes at Beta Wikisource and the patch demo wiki here provides a sneak-peek of proposed changes that are still pending review, and issues can be raised here or at Phabricator even for un-deployed changes. Inductiveload—talk/contribs 22:08, 1 December 2021 (UTC)

Adventures list...

With a few additions/substitutions, it covers most of the list of volumes published as part of the original partwork.

However, I'd like some suggestions on what could be added to cover certain 'adventure' novels that may be lacking.

There is currently no adventure novel that is set in the period of Imperial Rome, in the list. My thought was perhaps to have Last Days of Pompeii assuming we have a scan (or one can be located.) but I'm open to other suggestions.

The other absent 'adventure' is those involving Space exploration. Are there any pre-1964 (and not renewed) novellas of print Science Fiction's Golden era that would be appropriate? (Alternatively First Men in the Moon by H.G. Wells would be a reasonable choice.)

Do other contributors on Wikisource have suggestions on specific 'adventures' that are worth including in the list, suitable for a general audience, but are sub-genres not covered by those already in the list? ShakespeareFan00 (talk) 08:22, 30 November 2021 (UTC)

The scans for Pompeii in first edition are here: IA. Scott's Count Robert of Paris is set in Byzantium. We also could scan back the actual novel from Ancient Rome by Apuleius. Marius the Epicurean is another novel set in Rome. MarkLSteadman (talk) 10:36, 30 November 2021 (UTC)

Comment Which "lirst" [sic.] are you asking about? Where is it? A link would help us know what you're talking about. --EncycloPetey (talk) 23:32, 30 November 2021 (UTC)

@EncycloPetey: The list is at User:ShakespeareFan00/Adventures_List ShakespeareFan00 (talk) 06:43, 1 December 2021 (UTC)

There are 6 works for which a suitable scan needs to be located - User:ShakespeareFan00/Adventures_List#Needs_a_scan_locating ShakespeareFan00 (talk) 17:33, 1 December 2021 (UTC)

Enable mobile talk page tabs (and page NS prev/next/index) for anonymous users

Tracked in Phabricator: Task T47955

Currently, anonymous users on the mobile site do not see the talk page tabs. This means they also do not see the next/previous/index page tabs.

I suggest that we enable this via a configuration change: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/741097. Config changes generally need some level of community support, so please support (or oppose) below. Inductiveload—talk/contribs 10:05, 24 November 2021 (UTC)

This is now deployed: users who are not logged in can now see the "next/prev/index" links in the page namespace. Inductiveload—talk/contribs 12:23, 2 December 2021 (UTC)

Category:Pages with illegal formatting in header fields

"Illegal" is a very strong word. Who decided that certain formatting is "illegal" in header fields? Was this a community decision, or unilateral? I suspect that someone took on the task of getting some consistency into our headers, and I'm sure this was helpful overall... but I would prefer if this category focussed on the fact that the standard template can't do some things; e.g. ":Category:Pages with header field formatting not yet supported by the standard header template" or ":Category:Pages with explicit formatting in header fields".

Headers on pages like A Passionate Pilgrim and Other Tales (Boston: James R. Osgood & Co., 1875)/Madame de Mauves/Part 5 have been crafted to express clearly that this page is part of a particular short story as published in a specific edition of a book. It concerns me that someone might come along and strip out this formatting, and return it to a bog-standard header that doesn't effectively communicate what the page is, just because the formatting has been declared "illegal".

Hesperian 22:40, 29 November 2021 (UTC)

I'm not feeling the love for this topic of discussion. I might just boldly rename it to Category:Pages with explicit formatting in header fields. Hesperian 22:41, 30 November 2021 (UTC)

@Hesperian makes sense to me! We should still try to work towards a template which can properly capture these cases without any formatting. We are seeing a lot more works like "collected works of" and periodicals where the top level work name is actually not the "primary title" of the work. Inductiveload—talk/contribs 22:09, 1 December 2021 (UTC)

Thanks, I agree. Hesperian 23:27, 1 December 2021 (UTC)

Done, but it will take yonks for the old category to depopulate and the new one to populate. Hesperian 02:34, 3 December 2021 (UTC)