Wikisource:Scriptorium/Archives/2019-02

Please do not post any new comments on this page.

This is a discussion archive first created in February 2019, although the comments contained were likely posted before and after this date.


Changes to page numbering


Page numbering in the ProofreadPage extension appears to have been modified, in particular now appearing within the flow of text instead of in the left margin. Does anyone know where this change came from? —Beleg Tâl (talk) 03:00, 4 February 2019 (UTC)

It still appears in the margin for me, at least in the works I checked. Where are you seeing this behavior? --EncycloPetey (talk) 04:42, 4 February 2019 (UTC)

Are you aware that this is a display option in the sidebar? Perhaps you accidentally bumped it.... Hesperian 10:58, 4 February 2019 (UTC)

I was not aware, but I am now! Thanks. —Beleg Tâl (talk) 14:42, 4 February 2019 (UTC)

Tech News: 2019-06

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Changes later this week

It was easy to untick a box by accident in Special:Preferences. This will now be fixed. [1]
The new version of MediaWiki will be on test wikis and MediaWiki.org from 5 February. It will be on non-Wikipedia wikis and some Wikipedias from 6 February. It will be on all wikis from 7 February (calendar).

Meetings

You can join the technical advice meeting on IRC. During the meeting, volunteer developers can ask for advice. The meeting will be on 6 February at 16:00 (UTC). See how to join.

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe.

17:12, 4 February 2019 (UTC)

Resolution of the CharInsert problem


Would anyone know how to update the MediaWiki:Gadgets-definition for CharInsert? This is the instruction I got from w:Wikipedia:Village pump (technical), since the code is a copy of the WP CharInsert.

Our updated script for CharInsert is not fully functional. The problem is that the update no longer writes the user's CharInsert row selection to a cookie as it used to; the script now writes to localStorage instead. — Ineuw talk 20:27, 4 February 2019 (UTC)

A bureaucrat can do it, or give you access to do it -- @Hesperian, @Mpaa, @Zhaladshar: —Beleg Tâl (talk) 22:45, 4 February 2019 (UTC)

Thanks for the reply. Unfortunately I have no clue what to do.— Ineuw talk 01:05, 5 February 2019 (UTC)

However, I found more information on this Mediawiki page.— Ineuw talk 13:38, 5 February 2019 (UTC)

Is this standard procedure?


I was validating this page and noticed the proofreader re-arranged it so that the image doesn't divide the paragraph as in the original source document. Sounds reasonable, but I thought I'd run it by you guys to make sure this is the correct procedure. Abyssal (talk) 20:10, 7 February 2019 (UTC)

It varies by text; there is no uniform standard for this. In this work, there is high density of images, and thus a danger of placing images on consecutive pages together in ways that break the formatting. The original places images in the vertical center of pages (regardless of section or paragraph breaks), which is a feature that cannot be replicated in an electronic format, and which is not desirable when this results in the apparent random breakage of paragraphs. The two of us who worked on the initial text agreed that repositioning some of the images was a good compromise, and would better preserve the integrity of the text. --EncycloPetey (talk) 20:16, 7 February 2019 (UTC)

@EncycloPetey: Makes sense. On this specific page, though, I am kind of concerned that the image isn't listed under the section that discusses it. Am I making too big a deal of things? Abyssal (talk) 01:44, 8 February 2019 (UTC)

More than a few of the images, as published, were not in the section that discusses them. All images are referenced by number from the text, and this image comes immediately before the text that makes reference to it. The alternative would be to move the image and thereby to break the opening paragraph of that section into two separate paragraphs. --EncycloPetey (talk) 02:13, 8 February 2019 (UTC)

@EncycloPetey: What about placing it under the section heading but before the text? Abyssal (talk) 03:10, 8 February 2019 (UTC)

That would look really, really odd. The section heading would look like an image caption instead of a section heading. There's a reason that publishers avoid doing that. --EncycloPetey (talk) 03:23, 8 February 2019 (UTC)

@EncycloPetey: Alright. I'll defer to your judgement and just validate that page and similarly formatted pages. Abyssal (talk) 05:19, 8 February 2019 (UTC)

template improvements needed? Auxiliary Table of Contents


Anyone who knows about template programming, please take a look at this. As you can see, when you use {{TOC line}} and {{Dotted TOC line}} in {{Auxiliary Table of Contents}}, the background color doesn't match. Can this be fixed? Thanks Levana Taylor (talk) 20:42, 9 February 2019 (UTC)

Is there some reason that the illustrators need to be listed as a separate column in an Auxiliary ToC? --EncycloPetey (talk) 22:30, 9 February 2019 (UTC)

I discussed that in the first post of this thread. The reason is that I am exactly imitating the format the magazine used to announce their contents in their advertisements (though the advertisement contents can't be used as a substitute TOC because they're not complete and in order). True, it's not necessary ...

Even if I wasn't using two columns I would still need {{TOC line}} in order for hanging indents to work properly. Plus, other people have combined these templates. Levana Taylor (talk) 22:54, 9 February 2019 (UTC)

I think I solved the colour problem with TOC line by setting the background to transparent instead of white. It seems more difficult with the Dotted TOC line, as the dots would become visible through the text too. The only solution I have is to add another parameter that would make it possible to change the background colour if the template is used within the Auxiliary TOC, see my experiment at Dotted TOC line/sandbox. --Jan Kameníček (talk) 08:59, 10 February 2019 (UTC)

It was never intended that the TOC templates as used in the Page: namespace would be used in the AuxTOC template, which is why combining them is problematic. For the hanging indent spacing problem, you'll need to use a single {{hi}} for all the lines, not just the ones that are longer. For the dotted TOC lines, I suggest that you'll need to consider a different format. Remember that AuxTOC is a construct by us to make a work more easily navigable in the absence of a printed TOC, so it's OK to do things in a pragmatic way that works within the confines of what it's capable of. Beeswaxcandle (talk) 09:17, 10 February 2019 (UTC)

The problem is that Dotted TOC Line depends on the z-index property and a solid background to hide the extraneous dots behind the text; and in CSS backgrounds are not inherited. To fix this you would need to ensure that every single HTML element between the green background and the div that's currently set to white had an explicit "inherit" value. It would probably be better to simply reimplement the dotting functionality into AuxTOC directly. But caveat, I haven't actually looked at how the dotted toc line template actually achieves its effect so that may be a tall order. --Xover (talk) 09:43, 10 February 2019 (UTC)

@Beeswaxcandle: You're absolutely right, the TOC looks vastly better if I don't use AuxTOC but instead recreate a similar effect using {{border}} with a white background. Levana Taylor (talk) 11:28, 10 February 2019 (UTC)

TemplateScript for Greek beta code


I created a TemplateScript tool for entering Greek text using w:Beta Code. If anyone wants to use or improve it, it's at User:Beleg Tâl/Beta.js —Beleg Tâl (talk) 03:23, 11 February 2019 (UTC)
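For readers unfamiliar with Beta Code, the following is a minimal Python sketch of the kind of conversion such a tool performs. It is not the code of User:Beleg Tâl/Beta.js; the function name `beta_to_greek` and the tables are made up for illustration. It assumes the common Beta Code conventions: Latin letters for Greek letters, `*` before a capital, and `)`, `(`, `/`, `\`, `=`, `|`, `+` for diacritics. It does not distinguish ordinary parentheses from breathing marks.

```python
import unicodedata

# Beta Code base letters (lowercase Latin -> lowercase Greek).
LETTERS = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "c": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "u": "υ", "f": "φ", "x": "χ", "y": "ψ", "w": "ω",
}

# Beta Code diacritics -> Unicode combining characters.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute accent
    "\\": "\u0300",  # grave accent
    "=": "\u0342",   # circumflex (perispomeni)
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}


def beta_to_greek(text: str) -> str:
    """Convert a Beta Code string to precomposed Unicode Greek (sketch)."""
    out = []
    pending = []     # marks typed between "*" and the capital letter
    capital = False
    for index, char in enumerate(text):
        key = char.lower()
        if char == "*":
            capital = True
        elif char in MARKS:
            # For capitals the marks precede the letter, so hold them back.
            (pending if capital else out).append(MARKS[char])
        elif key in LETTERS:
            following = text[index + 1:index + 2].lower()
            if capital:
                letter = LETTERS[key].upper()
            elif key == "s" and following not in LETTERS:
                letter = "ς"  # final sigma at the end of a word
            else:
                letter = LETTERS[key]
            out.append(letter)
            out.extend(pending)
            pending.clear()
            capital = False
        else:
            out.append(char)  # spaces and punctuation pass through
    # NFC merges each letter with its combining marks where possible.
    return unicodedata.normalize("NFC", "".join(out))


print(beta_to_greek("lo/gos"))
print(beta_to_greek("*)aqh=nai"))
```

Emitting combining characters and normalizing at the end keeps the lookup tables small, at the cost of relying on Unicode normalization to produce the precomposed polytonic letters.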

Tech News: 2019-07

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Recent changes

You can use the ambox CSS class to show page issues to mobile readers. When you use ambox there are classes you can use.

Changes later this week

The new version of MediaWiki will be on test wikis and MediaWiki.org from 12 February. It will be on non-Wikipedia wikis and some Wikipedias from 13 February. It will be on all wikis from 14 February (calendar).

Meetings

You can join the technical advice meeting on IRC. During the meeting, volunteer developers can ask for advice. The meeting will be on 13 February at 16:00 (UTC). See how to join.


18:45, 11 February 2019 (UTC)

Dynamic layouts...


Is there someone dedicated that could write an explanation of how the page numbering script interacts with LST and proofread page?

I'm asking because I wanted to make a suggestion on how to possibly solve certain problems, but needed to know a little bit more about the internals to do it, to be sure if the fix was an appropriate one. ShakespeareFan00 (talk) 20:03, 14 February 2019 (UTC)

The relevant portion of the page numbering JavaScript makes reference to a number of CSS classes (or overrides) that are defined for various aspects of page display. Is there a diagram explaining the hierarchy of what gets generated for each layout? (There are references made to page and region container classes, for example.)

ShakespeareFan00 (talk) 20:18, 14 February 2019 (UTC)

Wikidata Edit section on the sidebar


On the sidebar, I would like to either place the Wikidata Edit section below the Tools section, or hide it. Would someone have a script which I could copy? Thanks in advance. — Ineuw talk 00:10, 14 February 2019 (UTC)

pst, you could also try out timeless skin, which moves sidebar menu to the top (as you zoom in), and puts edit in upper right. yrmv. Slowking4 ‽ SvG's revenge 00:10, 18 February 2019 (UTC)

OCR button is non functional...


I was attempting to use the OCR button to get the OCR text for a page. It generated the following error: "ws_ocr_daemon robot is not running. Please try again later."

Can someone reboot the bot responsible? Thanks. ShakespeareFan00 (talk) 11:18, 17 February 2019 (UTC)

Tech News: 2019-08

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Recent changes

When you thank someone on the mobile web you will now have two seconds to cancel the thank. This is in case you clicked on the thank button by accident. [2]

Changes later this week

The new version of MediaWiki will be on test wikis and MediaWiki.org from 19 February. It will be on non-Wikipedia wikis and some Wikipedias from 20 February. It will be on all wikis from 21 February (calendar).

Meetings

You can join the technical advice meeting on IRC. During the meeting, volunteer developers can ask for advice. The meeting will be on 20 February at 16:00 (UTC). See how to join.

Future changes

There is a proposal to add a red link to mobile search results if there is no page with that name. This is how it works on desktop. You can leave feedback. [3]


23:14, 18 February 2019 (UTC)

IA Upload bot is down


@Phe, @Samwilson, @Tpt: the IA upload bot is down, FYI —Beleg Tâl (talk) 13:54, 25 February 2019 (UTC)

@Beleg Tâl: Do you mean toolforge:ia-upload? It seems to be up again now, and has been used recently. Might have been a random transient error. —Sam Wilson 22:57, 25 February 2019 (UTC)

@Samwilson: yes it came up since. Thought you fixed it. Sorry to bother you over just a little hiccup. —Beleg Tâl (talk) 23:23, 25 February 2019 (UTC)

@Beleg Tâl: No worries! Ping me anytime. I've also added an uptimerobot tracker for ia-upload, so will be notified of any downtime (I thought I'd done that ages ago, but it seems not). Sam Wilson 23:26, 25 February 2019 (UTC)

Tech News: 2019-09

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Recent changes

There is a new version of the iOS Wikipedia app. It has for example syntax highlighting and new toolbars to make it easier to write wikitext. It also has night mode, a find-on-page function and other things. You can give feedback and suggestions. [4]

Changes later this week

When you look at your watchlist or the recent changes page you can use the new filters for edit review. There you can choose tags to filter different edits. Empty tags will no longer be shown. [5]
The new version of MediaWiki will be on test wikis and MediaWiki.org from 26 February. It will be on non-Wikipedia wikis and some Wikipedias from 27 February. It will be on all wikis from 28 February (calendar).

Meetings

You can join the technical advice meeting on IRC. During the meeting, volunteer developers can ask for advice. The meeting will be on 27 February at 16:00 (UTC). See how to join.

Future changes

The Wikipedia app for Android will invite users to add Wikidata descriptions to Wikidata objects that have Wikipedia articles but no Wikidata descriptions. It will only invite users who have added a number of Wikidata descriptions in the app without being reverted. This is to avoid spam and bad edits. You can read more and leave feedback.


21:17, 25 February 2019 (UTC)

Thirty-One Years on the Plains and in the Mountains


Why is this deleted?--115.27.198.88 15:49, 24 February 2019 (UTC)

(1) no source, (2) no license, (3) no formatting, (4) no author, (5) drive-by copy-paste dump with no verifiable page scans, per this advice:

"While a djvu file at Commons is not currently a requirement, there has been discussion of making it a requirement. Your work will have a better chance of standing the test of time, if it can stand the test of validation to an available scan. Because Commons is a sister site under the same organization as Wikisource, as long as Wikisource, Wikipedia and related sites exist, your work is likely to survive if the page images are stored at Commons."

Drive-by unformatted copy-paste dumps do not add value to Wikisource. --EncycloPetey (talk) 17:21, 24 February 2019 (UTC)

here is the IA [6] and Gutenberg [7]; it shouldn’t be too hard to create an index. Slowking4 ‽ SvG's revenge 01:55, 25 February 2019 (UTC)

I'm not sure I would have been so quick to delete, but Project Gutenberg is a major project that is not going to just go away any time soon. There's no value in just cutting and pasting the text and dropping it here, especially if you're not adapting it to wiki formatting.--Prosfilaes (talk) 03:08, 25 February 2019 (UTC)

Maybe it came from Gutenberg, and maybe it didn't. No source was given. --EncycloPetey (talk) 03:28, 25 February 2019 (UTC)

This edit comment for the creation was "http://www.gutenberg.org/cache/epub/5337/pg5337.txt". That's pretty clear.--Prosfilaes (talk) 05:16, 26 February 2019 (UTC)

I disagree. You might infer that it's the source from the edit comment, but it's not clear whether it is the source, or that it is the full work, part of the work, a citation, or what. None of this justifies drive-by copydumps. --EncycloPetey (talk) 00:29, 27 February 2019 (UTC)

Should we add drive-by copydumps to WS:CSD? Otherwise, a discussion at WS:PD would have been preferable. —Beleg Tâl (talk) 13:20, 25 February 2019 (UTC)

Yes. Every discussion of IP drive-by copydumps ends in either (a) deletion or (b) someone starting a new project with a new scan and Index, which means the original copydump is done away with in favor of the scan-backed version. No discussion has ever resulted in retaining the copydump. --EncycloPetey (talk) 01:23, 26 February 2019 (UTC)

@EncycloPetey: I have officially proposed it above: see #Adding copydumps to WS:CSD. —Beleg Tâl (talk) 13:53, 27 February 2019 (UTC)

here is the index Index:Thirty-One Years on the Plains and in the Mountains.djvu. -- Slowking4 ‽ SvG's revenge 16:12, 27 February 2019 (UTC)

Cleaning up Once a Week namespace


Right now the namespaces for Once a Week are chaotic, because in the past some of the articles were created stating the volume number in Roman numerals and some in Arabic numerals (the existing parts of Volume IV are half-and-half). I think they should all be Roman numerals, since that is how the magazine itself does it. I started trying to clean it up but I was just making myself confused and introducing errors; so can a bot fix this? Change all pagenames and links using an Arabic-numbered volume to Roman-numbered? (Or vice versa, even, it just ought to be consistent.) Levana Taylor (talk) 03:19, 17 February 2019 (UTC)

The preference on Wikisource is usually for Arabic numbering, unless there is some reason to do otherwise. Arabic numbering will sort properly, and be more easily handled with linking. Roman numerals have the disadvantage that they sort alphabetically, not numerically, so for large works with many parts or sections, Arabic numbering is probably the better way to go. --EncycloPetey (talk) 04:26, 17 February 2019 (UTC)
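The sorting point can be illustrated with a small Python sketch. The volume numbers 1 to 10 are only an example; nothing here is taken from the actual Once a Week page names.

```python
# Volume numbers 1-10 written as Roman numerals and as integers.
roman_volumes = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]
arabic_volumes = list(range(1, 11))

# Plain text sorting compares character by character, so Roman numerals
# end up in alphabetical order: IX lands before V, and X comes last.
roman_sorted = sorted(roman_volumes)

# Arabic numbers compared as numbers stay in numerical order.
arabic_sorted = sorted(arabic_volumes)

print(roman_sorted)
print(arabic_sorted)
```

Note that Arabic numerals compared purely as text are not entirely safe either ("10" sorts before "2" as a string), so this sketch compares them as integers; whether a given listing sorts numerically depends on how it is configured.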

I agree that it would make the pagenames easier to handle if they were numbered in Arabic numerals. What is needed in that case is for someone (a bot?) to find and fix pagenames and links that currently use Roman numerals. Some of those I created myself (sorry) but some were there before me.

Nonetheless, all the existing headers (whatever the numbering in the pagename) are piped to display Roman numerals for the volumes. I think this makes sense.

The only remaining thing needed, in that case, is a modification of {{Once a Week link}}. It should display volume numbers as Roman numerals even though they would be entered as Arabic numerals. (Plus, unrelatedly, that template needs to be modified to allow displaying the "article" parameter as the visible title rather than having "link" override "article" the way it does now.) Levana Taylor (talk) 05:14, 17 February 2019 (UTC)
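The conversion being asked for (Arabic number in, Roman numeral out) is simple to state. Here is a minimal Python sketch of the logic, not the template's actual implementation; `to_roman` and `volume_label` are hypothetical names used only for illustration.

```python
# Value/symbol pairs in descending order, including subtractive forms.
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(number: int) -> str:
    """Return the Roman numeral for an integer from 1 to 3999."""
    if not 0 < number < 4000:
        raise ValueError("number must be between 1 and 3999")
    parts = []
    for value, symbol in ROMAN_PAIRS:
        # Use each symbol as many times as it fits, largest first.
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def volume_label(volume: int) -> str:
    """Build a display label from a volume number entered in Arabic numerals."""
    return f"Volume {to_roman(volume)}"


print(volume_label(4))
```

Keeping the stored value Arabic and converting only for display means page names and links stay easy to handle while the visible text matches the magazine's own style.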

Can I offer a time swap? If someone will take care of finding and correcting all Roman numerals in Once a Week, I'll spend a few hours on some tedious task you don't want to do. Levana Taylor (talk) 16:03, 18 February 2019 (UTC)

The best place to seek Bot help is to post at Wikisource:Bot requests, because that is where people expect to see requests for help that requires a Bot. --EncycloPetey (talk) 16:12, 18 February 2019 (UTC)

Also check the sandbox at {{Article link/sandbox}}; I put something there that may be of interest. ShakespeareFan00 (talk) 16:35, 18 February 2019 (UTC)

@EncycloPetey: Thanks, will x-post.

@ShakespeareFan00: Is that intended to display the volume as a Roman numeral? It doesn't seem to be working. Levana Taylor (talk) 21:22, 18 February 2019 (UTC)

Did you specify the additional option? ShakespeareFan00 (talk) 21:23, 18 February 2019 (UTC)

Thanks - Template:Article_link/testcases#Volume/issue ShakespeareFan00 (talk) 21:26, 18 February 2019 (UTC)

The "roman_vol" parameter is great! (Just what I was thinking of.) Did you intend to change the quotes around the title to curly, though? Levana Taylor (talk) 23:20, 18 February 2019 (UTC)

The template code changes are in the sandbox because it would need an interface admin to review and update the main template. Did you want a roman_iss function as well? ShakespeareFan00 (talk) 12:45, 19 February 2019 (UTC)

That'd probably be a good idea, although it's very rarely used. I can think of just one magazine offhand that didn't have volumes but did number its issues I, II, etc. Levana Taylor (talk) 18:26, 19 February 2019 (UTC)

(later) Can someone with the authority to do so please implement that very useful roman-numeral parameter? Thanks. Levana Taylor (talk) 06:45, 3 March 2019 (UTC)

Phrase Books and Other Multi-Lingual Documents


What's the Wikisource policy or guideline for side-by-side English-Other Language documents? Should multi-language documents such as those listed below, which were published in New York, be transcribed here in English Wikisource? If not, where?

Side-by-side translation -- e.g. (1852) "Freibrief und nebengesetze der Deutschen gesellschaft der stadt New-York".[8] The document language on archive.org is "German," the cover is in English, the document is bilingual, side-by-side, English-German.
Phrase book -- e.g. (1855) "English, French and German Conversational Phrase-book: For the Use of Students and Travellers, etc."[9]

-- Outlier59 (talk) 03:14, 28 February 2019 (UTC)

Multilingual works in which the target language is English are welcome at English Wikisource. Normally we handle side-by-side translations by proofreading the English parts on English Wikisource and the other language parts on the other language Wikisource. For an example of this, see Index:Aida Libretto English.djvu and it:Indice:Aida Libretto English.djvu. Phrase books, dictionaries, and similar works targeted at English speakers can be placed here in entirety. —Beleg Tâl (talk) 04:00, 28 February 2019 (UTC)

Thank you, Beleg Tâl. That's very clear. How is the "iwtrans it.wikisource.org." note on the no-proofing pages added -- such as on the top of Page:Aida Libretto English.djvu/8? -- Outlier59 (talk) 16:53, 28 February 2019 (UTC)

@Outlier59: you use the template {{iwpage}}. —Beleg Tâl (talk) 16:55, 28 February 2019 (UTC)

@Beleg Tâl: So half the document is transcluded to namespace in each language Wikisource? Or can both be transcluded here? -- Outlier59 (talk) 17:16, 28 February 2019 (UTC)

@Outlier59: The English section is transcluded to English Wikisource, and the other language section to the other language Wikisource. —Beleg Tâl (talk) 17:21, 28 February 2019 (UTC)

@Beleg Tâl: thank you again. I'll try to add some of this information to Wikisource:Multilingual texts. --Outlier59 (talk) 17:27, 28 February 2019 (UTC)

@Beleg Tâl, @Outlier59: I strongly disagree with the view that only the English part of side-by-side translations should be added here.

If it were true, the template {{bilingual}} would not have been created.
The main reason the original publishers decided to publish the work bilingually was to give readers who can read both languages the opportunity to compare the translation with the original. If we add only the English translation here, we strip the work of this possibility and defeat the publishers' intentions.
As for "target language is English": with side by side translations the target language is also English, only the book is also enriched with the possibility of comparison to those who can read both languages.
Bilingual publications are categorized at Category:Bilingual publications. Only a few have been added here so far, but it will be great if some more follow. --Jan Kameníček (talk) 19:47, 28 February 2019 (UTC)

I understand that contributors who do not speak the original language of the translated work add only the English translation: it is absolutely understandable and it is better to add only the translation than nothing. But if some contributors are able and willing to add both language sides of such a publication, it would be a pity not to allow them to do so. --Jan Kameníček (talk) 19:52, 28 February 2019 (UTC)

@Jan.Kamenicek: you can disagree if you like. This is the way that is preferred, and which is accepted on all the various language Wikisources that we have dealt with on this issue. Sometimes it must be done this way, such as Index:The New Testament in the original Greek - 1881.djvu which has no English translation and is therefore out of scope on enWS, but which has an English introduction which is not permitted on Greek Wikisource. Some works are bilingual for convenience of publication only, such as Index:National anthem act Canada.pdf, and in such cases the non-English text should not be hosted on English Wikisource. In some cases you can use your discretion and host a parallel translation here—but this should only be done where the author clearly intends for the English and non-English to be viewed in parallel, and even then splitting between wikisources is an acceptable alternative. —Beleg Tâl (talk) 20:12, 28 February 2019 (UTC)

I do not know if it is preferred or not (I remember a similar discussion some time ago where somebody advised adding it to mul instead of here, you advised splitting it between two wikisources and somebody else agreed with placing it here entirely, with no clear preference of the community as a whole). Here I have just provided arguments why it would be harmful not to allow side-by-side translations in their entirety, and showed three examples which had been added in this way and to which nobody had objected. To sum up my opinion: if only the translation is added, well done; if both language versions are added side by side, even better. Jan Kameníček (talk) 20:34, 28 February 2019 (UTC)

I agree that side-by-side works should be side-by-side somewhere. If they are out-of-scope at English wikisource, there should be somewhere else to put them. What happens to the principle of reproducing the original text as it was originally published, if you leave out the part of it that's not in English? Levana Taylor (talk) 22:53, 28 February 2019 (UTC)

In most cases, the English is transcribed here, and the other language text on its home WS. There is a tool developed that allows texts from two WS projects to appear side-by-side, or at least there used to be. I'm uncertain whether the tool has been maintained and still works. --EncycloPetey (talk) 22:25, 6 March 2019 (UTC)

If such a tool exists, I would be really interested. However, the biggest problems with transcluding content from other wikisources that I see are: 1) other wikisources have different formatting templates which are not compatible with en.wikisource environment, 2) some wikisources even do not have templates for some specific kinds of formatting that we are used to do here. For these reasons it seems easier to store everything here at en.ws (which does not mean that it cannot be stored at some other language ws as well). --Jan Kameníček (talk) 22:55, 6 March 2019 (UTC)

Some comprehension of a foreign language (or use of Google Translate or a translation dictionary) is necessary to discuss documents and do edits using a foreign language interface on the various language Wikisources. But I don't think comprehension of a foreign language is necessary for all editorial work in foreign languages -- and in some cases "knowing" the language might lead to "corrections" to the original printed text -- which we don't usually do on Wikisource without indicating it's an annotation.

What Wikisource editors should know before editing are the individual characters of the edited written language, not necessarily the words. From my experience, typefaces such as Fraktur can be converted to UTF-8 Latin characters -- with some uncertainties (such as "J" and "I", which render the same, or "B" and "V", which often look very similar). I don't know the character sets of most non-Latin-based languages, so I don't edit those documents.

My point here is that many people who edit any language Wikisource in good faith -- in many languages -- are not polyglots. I greatly appreciate the assistance of people proficient in multiple languages. I am not myself proficient in multiple languages. I simply edit Latin characters as I see them.

That said, "discussion" and "interface pages" might be a problem on Wikisource for those of us who don't understand a foreign language well enough to be confident that we are understanding and writing appropriately. I can't speak for others, but for me I am hesitant to join discussions in foreign languages. -- Outlier59 (talk) 03:45, 1 March 2019 (UTC)

Adding copydumps to WS:CSD


There have been several discussions recently about copydumps, where texts are added to Wikisource with no proofreading and (usually) no source and (sometimes) no authorship or license information. Such texts often contain OCR errors that the contributing editor has no intention of fixing. Such texts frequently end up sitting in our system for years with no further action. You can see some really old ones at Category:Texts requiring OCR fixes.

It has been suggested that we add copydumps to our list of criteria for speedy deletion. Considering that these works are frequently nominated for deletion, and that all nominated copydumps have had full consensus for deletion, I propose modifying the Deletion Policy to allow speedy deletion of copydumped texts. —Beleg Tâl (talk) 13:51, 27 February 2019 (UTC)

Support. Can I support my own proposal? I support this proposal. I would stipulate, however, that to qualify as a copydump, the text must be in such bad shape that it is easier to delete and start over than to match-and-split. Copydumps from curated sources like Gutenberg, where a scanned source can be clearly identified to migrate the text to, should be migrated rather than speedied. —Beleg Tâl (talk) 13:52, 27 February 2019 (UTC)

Comment And what about postponed deletion? That means tagging such a text immediately as one that needs to be improved and the work will be deleted without further discussion if no improvement works start within 2–3 weeks after it was tagged. --Jan Kameníček (talk) 14:55, 27 February 2019 (UTC)

What do you mean by "improved"? People are always welcome to start scan-backed copies, but that is not a reason to retain a copydump. We don't have the manpower to tag and watch garbage. If someone printed out a text copy of the 1911 Encyclopædia Britannica and left it on a desk in a library, there would be no reason the library should be obligated to hold onto that copy and "improve" it for the use of their patrons. --EncycloPetey (talk) 15:35, 27 February 2019 (UTC)

I mean that the tag will inform the contributors who added the work that the contribution is not in accordance with our standards and will give them a chance to improve it towards our standards. In this way we may get a new contributor. Quick deletion usually means loss of the contributor. --Jan Kameníček (talk) 16:36, 27 February 2019 (UTC)

We already have the template {{standardise}} and its redirect {{cleanup}}, which never results in any clean-up. --EncycloPetey (talk) 17:14, 27 February 2019 (UTC)

Support, but not the qualified version. "Migrating" to a scanned text is a significant investment of time, and in my experience it is easier to start fresh than to match-and-split. For example, Gutenberg copies are often modernized, and have spellings standardized to US English, even when the source uses British English. Finding and correcting these many differences takes longer than working from the text layer of a DjVu, and requires a good and experienced eye for long periods of time. In the interim we have an inauthentic copy; and the extra time and work is not worthwhile. --EncycloPetey (talk) 15:30, 27 February 2019 (UTC)

I am rather surprised at this, as I have migrated many such texts, and found it much much much easier and far less time consuming than proofing from OCR (especially since most of these texts have crappy text layers and crappy OCR results). Fixing the occasional misspelled word or misplaced phrase is much quicker than re-typing every other word. I just now migrated Fishin' Jimmy which I had copydumped from Gutenberg when I was a new user in 2014, and it was extremely easy. —Beleg Tâl (talk) 16:09, 27 February 2019 (UTC)

I validated The First Men in the Moon, and found it tedious to identify all the US spellings against the original British. It would have taken less than half the time to work from the text layer in the scan. --EncycloPetey (talk) 16:13, 27 February 2019 (UTC)

I guess it depends on the quality of the scan and on the quality of the text. —Beleg Tâl (talk) 16:19, 27 February 2019 (UTC)

That is one of the older ones; more than half of them have gone through Distributed Proofreaders, which doesn't change spellings. It's entirely possible that it was transcribed from an American edition; this is even possible for a DP edition. It wasn't until 2006 that Google made PDFs available to download, so works were all scanned from copies available to random volunteers, which frequently were old American editions. --Prosfilaes (talk) 05:05, 28 February 2019 (UTC)

Support per EncycloPetey. I've completely given up on Gutenberg texts, and they are at least usually high quality for what they are. Stuff that has even more problems is not really worth the effort of fixing (vs. starting over from a scan). But since we're discussing speedy here, perhaps add some stipulation along the lines of it being obvious the uploader is not planning to further improve the text, to avoid new contributors getting their work deleted overnight by an overzealous speedy; which has happened more than once on enwp where the new page patrollers exhibit remarkable zeal(otry). enWS is probably a small enough community that that won't be a problem in practice, but might as well codify it in case someone with +sysop feels hyper-efficient that day. --Xover (talk) 15:50, 27 February 2019 (UTC)

Oppose fix it do not delete. it is a wiki. the fact that newbies view us as a dumping ground for gutenberg ascii text, is not a deletion rationale. and it is bitey. how will we recruit those newbies to higher quality standards if the first response is delete. and a backlog of low quality text is not a deletion rationale. will you now go through and delete all non-scan backed texts? put on a maintenance category, and i will work it. Slowking4 ‽ SvG's revenge 16:08, 27 February 2019 (UTC)

I think Jan Kameníček's suggestion above is a reasonable compromise. We could have some kind of delayed-speedy process, whereby works that are not keepable in their current form are tagged with {{OCR-errors}} or {{no license}} or whatever relevant template, and if they are untouched by the uploader for a certain amount of time (a month would probably be best) they can be considered abandoned copydump and speedied. I guess the CSD category in that case could be called abandoned copydump. This would also cover my stipulation in my own support comment. —Beleg Tâl (talk) 16:15, 27 February 2019 (UTC)

enWP has something they call proposed deletion: you tag an article and if nobody objects in 7 days the article is deleted with no further process. Anyone can contest the PROD by simply removing the tag. It is, aiui, designed to be something midway between a speedy delete and a full on deletion discussion. It could work for this provided there is some consensus on what is eligible for this process. --Xover (talk) 16:32, 27 February 2019 (UTC)

importing process from english is a sign of failure. i can work a backlog, but not under a sword of damocles. my time is taken up by other events such as art+feminism, and i cannot be bothered to monitor an artificial deletion-clock. Slowking4 ‽ SvG's revenge 16:34, 27 February 2019 (UTC)

The point of PROD is that it leads to deletion only if nobody objects. Once a PROD tag has been removed, which anyone can do, it may not be re-added. Let's say we set the limit to one month: if someone thinks the text is worth saving they simply remove the tag. Admins processing PROD requests also have the option to decline the request on their own cognisance if they feel it is worth saving. This is in contrast with speedy deletion which leads to automatic deletion in a matter of hours (sometimes; depends on when an admin happens by to process it). --Xover (talk) 17:09, 27 February 2019 (UTC)

With our small community, the PROD approach does not make sense. We need a general solution that doesn't require a nomination and discussion for every single copydump. Creating yet another procedure would be counterproductive for the community. --EncycloPetey (talk) 05:13, 28 February 2019 (UTC)

Far be it from me to argue too assiduously for this, but I think there is confusion about what the "proposed deletion" process actually is. Its practical effects are these: anyone may propose a text for deletion if it meets certain criteria. These criteria can be less stringent than speedy criteria because it takes longer before deletion happens and it can more easily be challenged. To propose a text for deletion any editor may place a {{prod}} template on a text, including a brief reason in a parameter. The template displays some suitable boilerplate on the text, and places the text in a maintenance category ([[:Category:Proposed deletions from February 2019]] say). If anyone disagrees they simply remove the {{prod}} template. Once removed it may not be re-added, so if anyone wants to pursue it further it would have to be through the full deletion process. If nobody objects, admins will process the maintenance categories after a suitable amount of time (can be anything: a week, a month, ... whatever we decide is good) and delete texts that have been proposed and not challenged. Admins have latitude to decline the proposed deletion if they think it improper or that the text is worth saving. This is a very lightweight process, but it is midway between speedy and full deletion discussion. It's not so hard and fast as speedy, which means less risk of biting and the criteria can be laxer; but it doesn't require a full discussion and assessing of consensus. Speedy deletion is, of course, still the fastest and easiest process and preferable iff the category of text can be neatly fit into relatively stringent criteria. I suggested adopting PROD because the discussion here indicated that perhaps the category of text wasn't entirely clear cut, and there were concerns with using speedy. --Xover (talk) 05:38, 28 February 2019 (UTC)

@Xover: I suspect the confusion is because "proposed deletion" is what we call our existing process at WS:PD which is completely unlike and unrelated to what you are describing. —Beleg Tâl (talk) 11:52, 28 February 2019 (UTC)

@Slowking4:, while I appreciate your willingness to work a backlog, we still have a backlog dating back 10+ years of untouched copydumps, and the backlog increases much faster than any single willing editor can keep up with. —Beleg Tâl (talk) 11:56, 28 February 2019 (UTC)

all backlogs are small if the number of editors is larger. and it does not matter how many text dumps there are because it is not paper. the answer is recruit more editors by making it fun. deleting new low quality texts is not fun. i.e. you are creating a downward spiral by your gatekeeper behavior. we are increasing scan backed, and proofread pages. i do not see an increased proportion of low quality, rather, i see increased quality over time.[10] but you could show the statistics to support your fear-based argument. you might well prefer we acted more like german wikisource, but we are bigger and growing faster. Slowking4 ‽ SvG's revenge 03:37, 1 March 2019 (UTC)

~~Oppose~~. I'm with Slowking. Our barriers to entry are already too high. Let's not make them higher. Hesperian 07:46, 28 February 2019 (UTC)

Oppose, per Slowking4 and Hesperian. --Zyephyrus (talk) 13:20, 28 February 2019 (UTC)

Question How will this affect WikiProject US Code? For an example, I just sort of did a copy dump at United States Code/Title 39/Chapter 10. I like the modification that Xover suggested, where we implement a w:WP:PROD-like system (not WS:PD, but like EN-Wikipedia's version). This gives a chance for a contributor to contest the deletion. That's just me, though.–MJL ‐Talk‐^☖ 19:57, 19 March 2019 (UTC)

@MJL: The text you linked to appears to be properly formatted and proofread, and is part of a larger project with clearly identified sources. Compare that to something like Journal of the Gypsy Lore Society/Volume III, which is the sort of text we are having trouble with. —Beleg Tâl (talk) 21:03, 19 March 2019 (UTC)

Beleg Tâl, I didn't read your proposal thoroughly enough. In opposing, I was thinking about text copied from PG. Slowking4 also refers to PG in their oppose rationale, so possibly they were thinking the same. I would support speedy deletion of unsourced uncorrected OCR copydumps only. Hesperian 00:15, 20 March 2019 (UTC)

@Beleg Tâl, @Hesperian: Ah... Yeah that would be something I'd want speedy deleted (especially under Hesperian's criteria if no one finds objections with it). Either way, I'm firmly going to say:

Support. –MJL ‐Talk‐^☖ 00:52, 20 March 2019 (UTC)