User talk:Xover
poem tag question
On Wikisource:Scriptorium/Help I have an open topic about the poem tag formatting carrying over into footnotes included inside the tag. I've had no responses yet. --EncycloPetey (talk) 22:49, 12 August 2024 (UTC)
Archive of Files missing machine-readable data?
Both categories have been cleaned up, and there are now only a few files that go in and out, so I thought maybe we should remove the {{DNAU}} and let the bots archive. Fine with that? — Alien333 (what I did & why I did it wrong) 18:10, 17 August 2024 (UTC)
- @Alien333: Indeed. Thanks for the reminder. Xover (talk) 17:24, 18 August 2024 (UTC)
Daily News
Not really your fault, since Daily News is such a generic title, but you’ve got the wrong Daily News. Our Daily News was created for Daily News/1940/12/24/Cheated Death In Air Battles, Dies In Crash, which is for the New York Daily News, while G.K. Chesterton contributed to the Daily News of London (see The Daily News (UK) on Wikipedia). I’ll try to get scans of the relevant articles, but I can’t promise anything insofar as British newspapers (and library holdings) are concerned. TE(æ)A,ea. (talk) 17:54, 24 August 2024 (UTC)
- @TE(æ)A,ea.: Thanks. I'm not sure I can absolve myself of sloppiness here, because I really should have caught that. I've dab'ed the two and updated links etc. Interestingly, almost all incoming links were intended for the London magazine, so the New York title was somewhat of a squatter. Xover (talk) 18:56, 24 August 2024 (UTC)
IRC #wikisource
Is there still any discussion over there? Because the few times I poked my nose around there wasn't. — Alien333 (what I did & why I did it wrong) 14:18, 28 August 2024 (UTC)
- @Alien333: Very rarely. But most IRC channels are pretty low-volume these days, so I don't know that #wikisource is any worse. Xover (talk) 08:07, 29 August 2024 (UTC)
I saw it mentioned on your page, but I'm pretty sure that there's no need for any further technical work, as {{overfloat image}} fits perfectly (see that page). — Alien333 (what I did & why I did it wrong) 12:16, 3 September 2024 (UTC)
- @Alien333: The issue isn't with Andersen /472, it's with {{img float}}. Feel free to remove the hidden comment as it was just a reminder / todo for myself about the issue. Xover (talk) 12:25, 3 September 2024 (UTC)
Orlando Furioso v4
Could you please generate a DjVu file from File:Orlando Furioso (Rose) v4 1825.pdf? Seven of the eight volumes were available at IA, and have been uploaded to commons:Category:Orlando Furioso (Rose), but volume 4 does not exist there for some reason. TE(æ)A,ea. (talk • contribs) was kind enough to acquire and provide a PDF, but I would prefer a DjVu, so that the whole series is in the same format (and because of the numerous technical issues we're having with PDFs). The DjVu should be named File:Orlando Furioso (Rose) v4 1825.djvu to match the naming pattern for the rest of the series. --EncycloPetey (talk) 20:12, 3 September 2024 (UTC)
- @EncycloPetey: File:Orlando Furioso (Rose) v4 1825.djvu. IA has a scan and HathiTrust has several, it's just that UCal seems to be missing vol. 4 from their physical collection so it's missing in that scan series. I grabbed one of the Harvard copies and uploaded that since it seemed to be decent quality and then I wouldn't have to deal with Google's terrible PDFs. Xover (talk) 15:40, 4 September 2024 (UTC)
- I found my IA copies using a search, which did not turn up a copy of volume 4. And if you look at the pattern in the local IDs, you can infer what would be correct for volume 4 in the set I found, but it's a scan of an entirely different book. I am aware of the copies at Hathi, and I asked TE(æ)A,ea. if one could be provided, but there were complications, I gather, from subsequent conversation.
- Well, thanks, and I'll take a look today to see whether this copy is a complete scan or not. I have come across copies that were missing portions of the original. --EncycloPetey (talk) 16:43, 4 September 2024 (UTC)
- There is no text layer. Could you please generate a text layer for the file? I am also getting zero file size errors, which I never have had previously with DjVu files. --EncycloPetey (talk) 16:52, 4 September 2024 (UTC)
- This problem sorted. --EncycloPetey (talk) 16:59, 4 September 2024 (UTC)
- The text layer exists, but is garbled because it was generated by Google. Where there is text, there can be whole lines placed at the bottom of the page, instead of in their proper sequence, if not missing altogether from the page. I have found pages with randomized punctuation. I may be able to use the OCR tool, since this is a regular and very structured text with a relatively clean scan, but I foresee a higher error rate on this volume, and we have had recent days where the OCR tool failed or was unpredictable. --EncycloPetey (talk) 17:08, 4 September 2024 (UTC)
- @EncycloPetey: The text layer in the DjVu was generated by my tools (tesseract is the OCR engine), not by Google. Spot checking pages in Index:Orlando Furioso (Rose) v4 1825.djvu I see no significant problems with the text layer. On what pages are you seeing problems? Xover (talk) 17:42, 4 September 2024 (UTC)
- I did not keep track of which pages. I checked several dozen to be sure the scan had likely included all the relevant pages without duplicates, and noted bizarre issues like the ones I describe. But looking for a few examples now: scan page 130 has randomized start-of-line punctuation; 190 has text that does not appear on the page; 200 is one of the pages where the text was out of sequence. --EncycloPetey (talk) 17:50, 4 September 2024 (UTC)
- /130 is just Tesseract being really bad at quotation marks. That's a general problem with no fix. /190 is Tesseract being over-eager and detecting the text on the opposite side of the sheet. It'll mostly just happen on empty pages (because it doesn't have real text to correct against), so it's usually not a big problem. The misplaced text on /200, though, is a weird bug. Tesseract detects the relevant line correctly, and with the correct coordinates (if you load it in DjView and turn on hidden text you'll see the text positioned exactly over the letters in the scan), but the line is stored out of order in the OCR output (Tesseract outputs an HTML-like structured format where each detected word is tagged with its coordinates on the page; normally each line appears in the output in the same order as on the page, but here that line comes at the end of the output, and hence also at the end of the plain text shown in the text box). I'm guessing this is because it is getting confused by the first-line indentation and thinking the page is a two-column layout. I'll try to see if there are any settings I can tweak or something, but I'm not hopeful and it probably won't happen soon in any case. IOW, unless the problems with this are more severe than currently apparent this is as good as it's going to get for now. Xover (talk) 18:46, 4 September 2024 (UTC)
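The out-of-order line described above could in principle be repaired after the fact, because the coordinates are correct even when the storage order is not. What follows is only a minimal sketch, not the tooling actually used for this DjVu: it assumes the OCR output is hOCR, that each line is a `span` with class `ocr_line` whose `title` carries `bbox left top right bottom`, and that the page is a single column. The names `HocrLineParser` and `lines_in_reading_order` are hypothetical and exist only for this illustration.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox left top right bottom" part of an hOCR title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrLineParser(HTMLParser):
    """Collect (top, left, text) for every span with class 'ocr_line'."""

    def __init__(self):
        super().__init__()
        self.lines = []       # finished lines as (top, left, text)
        self._current = None  # [top, left, [words]] while inside a line
        self._depth = 0       # span nesting depth inside the current line

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            # Already inside a line: only track nesting of word spans.
            if tag == "span":
                self._depth += 1
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocr_line" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                left, top = int(match.group(1)), int(match.group(2))
                self._current = [top, left, []]
                self._depth = 1

    def handle_endtag(self, tag):
        if self._current is not None and tag == "span":
            self._depth -= 1
            if self._depth == 0:
                top, left, words = self._current
                self.lines.append((top, left, " ".join(words)))
                self._current = None

    def handle_data(self, data):
        if self._current is not None and data.strip():
            self._current[2].append(data.strip())


def lines_in_reading_order(hocr):
    """Return line texts sorted top-to-bottom (single-column assumption)."""
    parser = HocrLineParser()
    parser.feed(hocr)
    # Tuples sort by top coordinate first, then by left coordinate.
    return [text for _, _, text in sorted(parser.lines)]
```

Sorting purely by the top coordinate would scramble a genuinely two-column page, so a sketch like this is only plausible for a regular single-column text such as this volume.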
Vector 2022
Hey, I'd like to revive the topic of making Vector 2022 the default here. Before I start a discussion in Scriptorium though, I wanted to check in with you. Do you see any issues that need to be addressed (fixed, explained, regardless) either before or reasonably shortly after deployment? Maybe we could do some things before involving more people, esp. the less technical editors. Thanks! SGrabarczuk (WMF) (talk) 19:20, 4 September 2024 (UTC)
- @SGrabarczuk (WMF): This is just a quick braindump before morning coffee. Once the caffeine kicks in I may regret everything and take it all back. Or something like that… 😎
- I think the biggest issue is going to be general pushback from the community of the kind enWP so emphatically provided, even if somewhat more muted and on different causes. Partly that's going to be motivated by resistance to change (we have contributors still using Monobook for no articulable reason), but partly also because Vector 2022 reflects different priorities than their own. Its major focuses are things that make sense on a Wikipedia, but not so much on Wikisource; it moves around UI elements that are now harder to find and get at than before; and it reflects WMF priorities over community priorities (e.g. the language selector vs. the mw-indicators positioning). I'm afraid the community here will see little benefit in the changes Vector 2022 makes, and things like having to go to a submenu to find the link to your own user talk page will be viewed as significant drawbacks. I could be wrong, but that's my concern.
- In more concrete and technical terms I'm not aware of any major things of the "breaks core workflow" variety. The new menus are breaking some Gadgets that modify them (bigChunkedUpload is the latest I've noticed). The interlanguage links we manually add to Special:RecentChanges by way of MediaWiki:Recentchangestext and {{Interwiki Wikisource}} no longer work in Vector 2022 (they work in all other skins). Vector also overlaps our Dynamic Layouts (essentially MediaWiki:Gadget-PageNumbers.js) while providing no community control, not integrating with our layout system, and not providing any facilities that make our implementation easier (the Gadget is somewhat fragile and prone to FOUC-type problems).
- Also, because Wikisource is so poorly supported by the WMF (so far as I can tell not a single developer has ever been allocated to Wikisource; we depend entirely on the good will of individual developers and teams with other responsibilities for everything we need) we are dependent on a large number of gadgets and user scripts that make repetitive editing tasks more efficient. Vector 2022 is designed with hiding these away as an apparent goal (stuff added to #p-toolbox is now hidden in the looong and cluttered Tools dropdown menu), leaving us with no clear way to surface editing helpers that need to be Fitts's law-compliant. The 2017 editing toolbar and Visual Editor don't support Wikisource (at all), and the 2010 editor is really primitive in terms of extension and integration points (see e.g. T370353 for the completely basic s...tuff that's not there). The paragraph spacing is still broken (compare the text inside the box on this page in Vector 2010 and Vector 2022), and this affects a lot of pages on enWS.
- I haven't done a systematic assessment of Vector 2022 here (partly because it's a moving target, partly because I haven't had time, partly because y'all have been focussed on the Wikipedias), but I have had it set as default since the last time the issue was brought up back in March. My assessment is that the main issue with making Vector 2022 default is that the value proposition for English Wikisource—the "What's in it for me?"—is too poor when held up against both the concrete drawbacks and the need for change in general (all change has a cost; resistance to change is not inherently irrational). If the development of the skin had been more able to identify and incorporate this specific community's needs and priorities in its scope early on I think that calculus could have easily changed. But as it stands the value proposition is going to be perceived as marginal, at best, and the Wikisourcen in general have way too few technical contributors able to follow up with the Web Team to get issues fixed as they crop up (by the time the community has got its act together the team will be onto other tasks and greener pastures). Xover (talk) 06:14, 5 September 2024 (UTC)
- Wow, thanks for the detailed and long response, I really appreciate it! If you'd like to add something or take something back, you can also reach out to me on Discord, Telegram, lots of places - there are very few people named Szymon Grabarczuk, I'm easy to find across platforms :D SGrabarczuk (WMF) (talk) 08:43, 5 September 2024 (UTC)
The text layer on this Index is off by one. Could you please correct this issue? It's in PD in the US this year, but transcription has been held up for months for a variety of issues. --EncycloPetey (talk) 17:47, 8 September 2024 (UTC)
Wikisource News
The latest edition of WS:News is out. Please enjoy. You are welcome to unsubscribe from these notifications by removing your name from this list. MediaWiki message delivery (talk) 15:56, 3 October 2024 (UTC)
Linter/
The remaining tidy font issues in the Wikisource namespace cannot easily be repaired by other contributors (notably @Zinnober9) because the pages are currently protected. Does this protection still need to be there? ShakespeareFan00 (talk) 09:54, 17 October 2024 (UTC)
- @ShakespeareFan00 Very nice timing, I was just composing my thoughts to ask about this since the Tidy Font set was nearing completion. I'll post that here since the discussion has been started.
- Hello Xover,
- Hope you are doing well! In regards to delinting, how does Wikisource handle full protected pages in the event of corrective edit requests for taking care of various Lint errors? Are pages temporarily lowered in page protection, as enwiki does, or do you offer temporary adminship to reliable editors (pending a community's admin nomination approval) as Wikivoyage does? I'm not making any request one way or the other with this comment, just asking questions about how things are done here so that I know how to proceed in getting these and some other currently protected Lint errors addressed.
- ShakespeareFan00 asked me the other month on enwiki if I would join them here in tackling some of the Linter syntax issues, and since I was between error type interests on enwiki and had just finished things on Wikivoyage, I got busy here and made quick work of the Tidy font errors.
- A little bit about me, I've been delinting on enwiki for two years and have eliminated, or helped eliminate, a few error types from enwiki (Tidy font, invalid image options, and fostered table content were some of the major ones). I spent Aug-Sept of this year on Wikivoyage clearing all but 59 of their 30k or so Lint errors in 5.9k edits and was granted temporary (3-mo) adminship there so I could access a few hundred lint issues on protected pages that had been unavailable to me otherwise. All my edits are human and I try to maintain a one edit per page goal in my delinting when I can so that I can minimize any disturbance to other editors. Happy to answer any questions you may have. Zinnober9 (talk) 12:33, 17 October 2024 (UTC)
- What I had actually asked about was Missing tags in content namespaces. ShakespeareFan00 (talk) 22:22, 17 October 2024 (UTC)
Apologies..
I am considering quitting Wikisource. It seems that good faith delinting outside Page: namespace is contentious, and my attempts to interest someone in resolving Missing tags on Page:'s backfired.
I remain perfectly willing to revert any non-Page: delints I made before leaving however, but I am of the view that would just create more noise. ShakespeareFan00 (talk) 22:46, 17 October 2024 (UTC)
- However it seems that some backlogs will be mostly cleared very soon :) ShakespeareFan00 (talk) 22:45, 19 October 2024 (UTC)