
Wikisource:Bot requests/Archives/2007


Not done

Patrol edits by trusted users

It'd be very useful to have a bot that automatically patrols edits by trusted users. Currently only edits by administrators are autopatrolled, which means that bots like Pathosbot and trusted users like Pmsyyz leave red flags whenever they edit. The bot should ideally get the list of trusted users from a protected Wikisource page. —{admin} Pathoschild 02:02:21, 21 February 2007 (UTC)

I believe a bot is somewhat unsuitable for this kind of task. Instead, an autopatrol feature should be included in the MediaWiki software.
  • I check my watchlist and/or recent changes several times a day, and the most recent items interest me most. To remove the red marks I most want gone, a bot would have to autopatrol trusted users' edits in nearly real time.
  • Edits are patrolled by recent change ID, not by revision ID, so it's not possible to just inspect the user contributions of trusted users. Instead, the recent changes list must be loaded (with what limit? That page does not have a paging feature) and each entry inspected individually.
  • Classes/routines would have to be written for pywikipedia to represent a recent change, to parse the list of recent changes and to mark a recent change as patrolled. There might be other bot frameworks where some of that work is already done, however.
Here's how an autopatrol feature might be implemented in MediaWiki:
  1. There is a list of autopatrolled users. Whenever this list is edited, MediaWiki stores/updates the list in a suitable format in the database. The list might be a page in the MediaWiki namespace, e.g. Mediawiki:Autopatrolusers. Another option would be that bureaucrats can edit the list in their preferences.
  2. Whenever admins access their watchlists or the list of recent changes, trusted users' edits are automatically unflagged. Or, if it's not too expensive, the software could check after each edit if the editor is an autopatrol user and not flag the edit in the first place.
I just checked, there is a bugzilla entry with a similar feature request. We might want to vote for that (or, given the somewhat confused nature of that request, open a new one).--GrafZahl 13:15, 1 March 2007 (UTC)
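To make the second bullet above concrete: patrolling is keyed on the recent change ID rather than the revision ID, so a bot has to walk the loaded recent changes list and pick out the entries to mark. A minimal sketch, assuming the list has already been fetched and parsed into records. `RecentChange` and `rcids_to_patrol` are hypothetical names, not part of pywikipedia or MediaWiki; reading the trusted list from a protected page and sending the actual "mark as patrolled" request are left out.

```python
from dataclasses import dataclass


@dataclass
class RecentChange:
    """One parsed entry from the recent changes list (hypothetical fields)."""
    rcid: int        # recent change ID: what patrolling is keyed on
    revid: int       # revision ID: what user contributions show
    user: str
    patrolled: bool


def rcids_to_patrol(changes, trusted_users):
    """Return the rcids of unpatrolled changes by trusted users, in list order."""
    trusted = set(trusted_users)
    return [c.rcid for c in changes if c.user in trusted and not c.patrolled]


changes = [
    RecentChange(101, 9001, "Pathosbot", False),
    RecentChange(102, 9002, "SomeNewUser", False),
    RecentChange(103, 9003, "Pmsyyz", True),
    RecentChange(104, 9004, "Pmsyyz", False),
]
print(rcids_to_patrol(changes, ["Pathosbot", "Pmsyyz"]))  # [101, 104]
```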

Linking between subdomains

Is it possible to add links between different language subdomains housing separate versions of a work? For instance, Journey to the Interior of the Earth and fr:Voyage au centre de la Terre are linked via their index/table of contents pages but not via their respective chapters. The chapters progress in a predictable way (i.e. chapter 2 follows chapter 1 in both languages), and it would be really tedious to add all those links manually. Around the World in Eighty Days, as well as many other translated works, could be helped in the same way. --Metal.lunchbox 02:14, 19 April 2007 (UTC)

I think only the main page should be linked. This reduces maintenance if subpages are renamed (which could require updates to several different wikis for every subpage), without decreasing the usefulness of interlanguage links (which are still on the main page). Crosslinking subpages would be further complicated for works where division is unclear and different projects split pages in different places. —{admin} Pathoschild 08:10:48, 26 April 2007 (UTC)

Done

Task description

Several UN Security Council resolutions have been restored recently. They need to be moved to subpages of UN Security Council Resolution, by request of Jusjih.

After subsequent discussion, we decided on a new course.
  1. All page names will be of the form United Nations Security Council Resolution <number> (no subpages)
  2. The pages UN Security Council Resolution <number> will be proper redirects
  3. The pages UN Security Council Resolution/<number> will be made soft redirects
Zhaladshar (Talk) 17:00, 30 November 2006 (UTC)
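The naming scheme agreed above can be summarised as a small helper that derives all three titles from a resolution number. This is a sketch assuming only the three title patterns listed; `unsc_titles` and `hard_redirect_text` are hypothetical names, not part of TalBot/un-res.py. `#REDIRECT [[...]]` is standard MediaWiki redirect syntax; the soft redirect markup is not shown because its exact form isn't given here.

```python
def unsc_titles(number: int) -> dict:
    """Return the three page titles involved for one resolution number."""
    canonical = f"United Nations Security Council Resolution {number}"
    return {
        "canonical": canonical,                                       # real page, no subpages
        "hard_redirect": f"UN Security Council Resolution {number}",  # proper redirect
        "soft_redirect": f"UN Security Council Resolution/{number}",  # soft redirect
    }


def hard_redirect_text(target: str) -> str:
    """Wikitext of an ordinary (hard) MediaWiki redirect."""
    return f"#REDIRECT [[{target}]]"


titles = unsc_titles(1559)
print(hard_redirect_text(titles["canonical"]))
```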
I've updated the script.--GrafZahl 16:38, 1 December 2006 (UTC)

Execution schedule

Description: All in one batch, if possible
Script: TalBot/un-res.py
Start: 4 December 2006, ca. 9:00 UTC
Commit interval: 20 seconds
Estimated execution time: roughly 18 hours

Status: Done. Remaining issues: UN Security Council Resolution/181 needs to be deleted (we don't have UNSC resolution 181 on Wikisource), UN Security Council Resolution 1559 is a duplicate.--GrafZahl 10:14, 6 December 2006 (UTC)

Thank you very much! I've taken care of both of these pages now.—Zhaladshar (Talk) 13:58, 6 December 2006 (UTC)
    • I have merged the page history of UN Security Council Resolution 1559 and UN Security Council Resolution/1559 into United Nations Security Council Resolution 1559. Someone misidentified UN General Assembly Resolution 181 as UN Security Council Resolution 181, so deleting the redirect is no problem.--Jusjih 09:54, 7 December 2006 (UTC)

Task description

Execution schedule

Description: Several runs will be needed due to the large number of pages to be edited. However, the script can be interrupted and restarted at any time.
Script: TalBot/dickinson-uncat.sh with datafile TalBot/dickinson.tree
Start: 28 November 2006, ca. 9:00 UTC
Commit interval: 20 seconds
Estimated execution time: ca. 20 hours in total

Task description

All of these from Executive Order 12844 onward need to be moved to an article without the dash; namely, Executive Order 12844, for example. Then create soft redirects. --Spangineerwp (háblame) 02:38, 18 October 2006 (UTC)

Execution schedule

Description: All in one batch.
Script: TalBot/clinton-xo.py
Start: 9 November 2006, ca. 9:00 UTC
Commit interval: 20 seconds
Estimated execution time: ca. 6½ hours

The relative links in the headers of many Security Council resolutions have been broken recently during a page move (example: United Nations Security Council Resolution 2). They should be fixed. For example, a link such as [[../123]] should be changed to [[United Nations Security Council Resolution 123]].

--GrafZahl 11:39, 7 December 2006 (UTC)

That's right, but any internal links to other UN Security Council Resolutions also have to be fixed. Since the words "United Nations" are spelt out instead of "UN" in the Security Council Resolution titles, General Assembly Resolutions should also spell out "United Nations", and wikilinks should no longer use "UN".--Jusjih 17:23, 7 December 2006 (UTC)
Assigned to Pathosbot. 1740 pages will be edited at 5 second intervals to bypass redirects and correct relative links. —{admin} Pathoschild 04:56, 6 January 2007 (UTC)
Done. 574 existing pages edited in 5h44. —{admin} Pathoschild 21:43, 7 January 2007 (UTC)
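The link repair requested here can be sketched as two substitutions, assuming relative links take exactly the `[[../123]]` form from the example and that only links beginning with "UN Security Council Resolution " or "UN General Assembly Resolution " (followed by a space) should be spelled out. `fix_links` is a hypothetical helper, not the code Pathosbot ran; old `UN Security Council Resolution/123` subpage links are deliberately left alone, since those point at soft redirects.

```python
import re

PREFIX = "United Nations Security Council Resolution"

# [[../123]] or [[../123|label]]; the optional group keeps an existing label
RELATIVE_LINK = re.compile(r"\[\[\.\./(\d+)(\|[^\]]*)?\]\]")
# [[UN Security Council Resolution 123]] etc.; the lookahead requires a space,
# so the old "/123" subpage form is not touched
ABBREVIATED_LINK = re.compile(r"\[\[UN (Security Council|General Assembly) Resolution(?= )")


def fix_links(text: str) -> str:
    """Make relative resolution links absolute and spell out "United Nations"."""
    text = RELATIVE_LINK.sub(
        lambda m: f"[[{PREFIX} {m.group(1)}{m.group(2) or ''}]]", text)
    return ABBREVIATED_LINK.sub(r"[[United Nations \1 Resolution", text)


print(fix_links("See [[../123]] and [[UN General Assembly Resolution 181]]."))
```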

Standardize headers for documents in Wikisource:UN Security Council Resolutions

Combined with the one below, this will help bring the UNSC resolutions to a much more consistent standard.

I propose that the header for these documents be:

{{header
| title    = {{subst:PAGENAME}}
| section  = 
| author   = | override_author=the UN Security Council Resolution
| previous = [[Wikisource:UN Security Council Resolutions]]
| next     =
| notes    = 
}}

Zhaladshar (Talk) 22:07, 7 December 2006 (UTC)

Also, not all pages have headers, some merely have the backlink "<Wikisource:UN Security Council Resolutions".—Zhaladshar (Talk) 22:09, 7 December 2006 (UTC)
Assigned to Pathosbot. The following adjusted template will be used; notes will be copied.
{{header2
| title    = {{subst:PAGENAME}}
| section  = 
| author   = | override_author=by the United Nations
| previous = [[United Nations Security Council Resolutions]]
| next     =
| notes    = 
}}
{admin} Pathoschild 01:18, 8 January 2007 (UTC)
Done. 585 pages were edited in 04h10 at 5-second intervals. The bot seems to have missed a few backlinks, and several pages have obsolete header notes linking to the deleted UN Copyright template. I'll run the bot through the pages to remove backlinks and categorize pages with notes to Category:UN resolutions with notes pending human review. —{admin} Pathoschild 08:20, 11 January 2007 (UTC)
Done. 90 pages corrected in 01h05 at 1-second intervals (most of the time is spent checking pages). —{admin} Pathoschild 01:57, 12 January 2007 (UTC)

Soft redirect maintenance

Task description

Many of our soft redirects need to be deleted. We have thousands that are long overdue for deletion, and they are making the page count very inaccurate right now. So here is what I'm requesting:

  1. For the Soft redirects of the months of June, July, August, and September, generate a list of all pages that link to those soft redirects.
  2. Preferably, have the bot correct those links, but the list of links that need to be corrected can be posted on WS and we editors can manually do it.
  3. Write a delete script to delete those old pages. We can give the bot temporary sysop privileges through the normal means of adminship for this task.

Zhaladshar (Talk) 19:43, 2 December 2006 (UTC)
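Steps 1 and 3 boil down to splitting the expired soft redirects into those that can be deleted now and those whose incoming links must be corrected first. A sketch of that selection only, assuming the redirects and their back-links have already been collected into plain dicts; `plan_soft_redirect_cleanup` and the data shapes are hypothetical. Fetching "What links here" and the deletion itself (which needs sysop rights) are left out.

```python
from collections import defaultdict

EXPIRED_MONTHS = {"June", "July", "August", "September"}


def plan_soft_redirect_cleanup(redirects, backlinks):
    """Split expired soft redirects into deletable ones and pages to fix.

    redirects: {redirect title: month it was filed under}
    backlinks: {redirect title: [pages linking to it]}
    Returns (deletable titles, {linking page: [redirects it links to]}).
    """
    deletable, to_fix = [], defaultdict(list)
    for title, month in sorted(redirects.items()):
        if month not in EXPIRED_MONTHS:
            continue
        linking = backlinks.get(title, [])
        if linking:
            for page in linking:
                to_fix[page].append(title)
        else:
            deletable.append(title)
    return deletable, dict(to_fix)


print(plan_soft_redirect_cleanup({"A": "June", "B": "July", "C": "October"},
                                 {"B": ["P1", "P2"], "C": ["P3"]}))
```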

AFAIK Xenophon already has sysop privileges, so it might be quickest if I give the deletion script to Jude.--GrafZahl 12:39, 5 December 2006 (UTC)
I've written a bot script, but given the delicacy of the task, it should be discussed/audited by several users before being run (possibly by Xenophon instead of TalBot because of the sysop privileges). I'll make an announcement in the Scriptorium.--GrafZahl 11:39, 7 December 2006 (UTC)
I'm currently checking all the soft redirects of some month (I think June). I believe Pathoschild is currently writing a script that will be able to check for incoming links to soft redirects and then delete the ones that don't have any.—Zhaladshar (Talk) 15:13, 7 December 2006 (UTC)
That would be great because together with his script, we have a way to separate the sysop-required part of the task from the rest. For example, my script fixes all incoming links (no sysop privileges required), then his script will delete all redirects.--GrafZahl 16:10, 7 December 2006 (UTC)
Wait, so you have a script that can detect/correct links that point to soft redirects?—Zhaladshar (Talk) 17:49, 7 December 2006 (UTC)
Affirmative. As I have said, it is pretty much untested right now, but with some care and scrutiny it should work.--GrafZahl 09:07, 8 December 2006 (UTC)
Are you sure all the soft redirects are correctly tagged? I think there is a possibility that some pages that should be kept as ordinary redirects are marked as soft redirects. As an example, maybe To The River —— should be kept since it redirects to a page title (To The River ——) which includes characters that are not easy to type into the search box. (Redirects are not bad, so I don't really see why you have this system with soft redirects. Diary of Samuel Pepys, April 1663 is not very useful, but there is no harm in keeping it either.) But if you feel that you absolutely need to delete redirects, I think the deletion log comment should include links to the targets of the redirects. /81.229.40.5 18:59, 7 December 2006 (UTC)
If I understand correctly, soft redirects are a convenience feature to keep unneeded redirects for a limited period instead of deleting them outright. A needed redirect should be an ordinary redirect, not a soft one, and should be changed if necessary. The link in the deletion log is a good idea, thanks! I'll incorporate that into the script.--GrafZahl 09:07, 8 December 2006 (UTC)

The script is able to automatically shortcut soft redirects. This raises the question of what links should not be corrected. I can think of

  • lists of soft redirects created by users to coordinate their own maintenance efforts. User:Pathoschild/Soft_redirects is an example. Comprehensive lists are easy to spot, but I might overlook the less comprehensive ones.
  • discussion archives (pages containing the string /Archive). Their integrity should be preserved. Possibly unarchived discussions as well.

Anything I missed?--GrafZahl 00:21, 20 December 2006 (UTC)

I think all links should be updated, regardless of where they are. Broken links are notoriously unhelpful. :)
Archives are useful for preserving past discussion; updating links helps do that by ensuring that a page relevant to a discussion can still be reached in two years.
In most cases, maintenance lists should be updated as well. Most such pages (such as Xenophon's list of pages with no header) are concerned with the content of the pages, not the page titles, and should be updated to preserve the usefulness of the list. A few (such as my table of soft redirects) are concerned with the page titles; those pages don't need to be updated, but there's no harm in updating them if exclusion requires extra effort.
I see no reason not to update links in discussion and on user pages, and the same benefits apply as anywhere else.
{admin} Pathoschild 06:08, 29 December 2006 (UTC)
I didn't see your reply until now, sorry. Must have been too eager to get the script going. What you write makes sense. When I checked the changes made by the current run, my idea was that in the future, I'll always go through the list of affected pages manually and decide if they should be corrected (feasible as long as the number of such pages stays low). As for the current month, I'll try to manually correct the pages left out. I'm still a little unsure about user pages, but the worst thing that can happen is that the change has to be reverted. Let's see what feedback I get (if any).--GrafZahl 19:44, 30 December 2006 (UTC)
I corrected the links manually except where the title mattered (your redirect list and some page move discussions).--GrafZahl 23:51, 30 December 2006 (UTC)

When correcting links on talk pages, I'd suggest piping them to maintain relevance; it can be useful to know what page was being discussed at the time. For example, see Pathosbot's edit to Wikisource:Bot requests; that discussion would not make much sense later if the links were not piped to maintain the same text. Another example (this one hypothetical) of discussion which wouldn't make much sense later is "I suggest combining Wikisource:Style guide and Wikisource:Style guide into Wikisource:Style guide." —{admin} Pathoschild 02:47, 31 December 2006 (UTC)

Good point. I'd go even further and pipe all links. Sometimes, users hyperlink text in the main namespace. Changing them might destroy text integrity. Granted, it isn't necessary in all cases, but usually not harmful either. Since it's difficult for a bot to decide what to do, I'd lean towards piping unpiped links by default.--GrafZahl 17:29, 31 December 2006 (UTC)
I'm not sure about the main namespace; piping inline links is necessary to maintain integrity, but piping links in a see-also section or in the header might lead to confusingly outdated link names ("see Wikisource:Title format"). Logging changes publicly would solve the problem, but that may add a great deal of complexity to the script; maybe it could be implemented as an extension usable by other scripts. If that's too difficult, I think piping all links would do. For example:
Time stamp | Page | Changes (old link → new link)
2006-12-29 07:07 | Wikisource:Bot requests |
    [[Help:Author pages]] → [[Wikisource:Style guide|Help:Author pages]]
    [[Help:Text editors for Wikisource]] → [[Wikisource:Tools and scripts|Help:Text editors for Wikisource]]
    [[Help:Images and other uploaded files]] → [[m:Help:Images and other uploaded files|Help:Images and other uploaded files]]
2006-12-29 07:08 | User talk:Kyle |
    [[Help:Images and other uploaded files|Uploading Images and Files]] → [[m:Help:Images and other uploaded files|Uploading Images and Files]]
    [[Help:Text editors for Wikisource|Text Editors]] → [[Wikisource:Tools and scripts|Text Editors]]

{admin} Pathoschild 20:22, 31 December 2006 (UTC)
I've updated and tested User:TalBot/rm-soft-redir.py. Now links are piped by default, exceptions can be specified on the command line as regular expressions. I can also publicly post how the links are changed before the script is actually run by extracting that information from the test logs. Before I tackle the July 2006 soft redirects, I'll write a script which detects double redirects and other anomalies to be removed manually before the main script is run.--GrafZahl 17:12, 5 January 2007 (UTC)
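Exceptions given on the command line as regular expressions could work roughly like this: each pattern is searched for in every linking page's title, and matching pages are set aside instead of corrected (for example discussion archives, or a page where the title itself matters). This is a hypothetical sketch, not the interface of rm-soft-redir.py; the `--except` option name and both functions are assumptions.

```python
import argparse
import re


def build_parser():
    # Hypothetical option name; the real script's interface is not shown here.
    parser = argparse.ArgumentParser(description="Bypass soft redirects")
    parser.add_argument("--except", dest="exceptions", action="append",
                        default=[], metavar="REGEX",
                        help="skip pages whose title matches REGEX (repeatable)")
    return parser


def split_pages(titles, exceptions):
    """Return (pages to correct, pages excluded by an exception regex)."""
    compiled = [re.compile(rx) for rx in exceptions]
    kept, excluded = [], []
    for title in titles:
        if any(rx.search(title) for rx in compiled):
            excluded.append(title)
        else:
            kept.append(title)
    return kept, excluded


args = build_parser().parse_args(["--except", "/Archive"])
print(split_pages(["Wikisource:Scriptorium/Archive 2006", "Author:Emily Dickinson"],
                  args.exceptions))
```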

Execution schedule

General procedure
  1. Run User:TalBot/spot-double-redirects.py to detect double redirects and other anomalies (redirects with no target or with a target outside this wiki). These redirects (usually few in number) will be corrected/deleted manually.
  2. Run User:TalBot/rm-soft-redir-helper.py to detect hard redirects erroneously classified as soft and to obtain a list of soft redirects with their targets. This list is publicly posted. This list is later used in the decision whether, for each redirect, link correction should preserve the old text (e.g. replacing [[Link A]] with [[Link B|Link A]]) or not (e.g. replacing [[Link A]] with [[Link B]]). Existing pipes will not be altered.
  3. Run User:TalBot/rm-soft-redir.py in fake mode, i.e. don't alter the wiki but provide a list of pages which would receive link correction and a list of all text replacements that would be done. This information is also posted publicly.
  4. Decide which pages should be excluded from link correction and which links should be excluded from text preservation. Post this information and wait a few days, so people have the opportunity to speak up if they're unhappy with the decisions.
  5. Run User:TalBot/rm-soft-redir.py in real mode.
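The replacement rule from step 2 (pipe unpiped links to keep the old text, never alter an existing pipe) fits in a single regular-expression substitution. This is not the actual make_search_replace_list() from rm-soft-redir.py; `retarget_links` is a hypothetical stand-in. It matches the old title literally, so MediaWiki's first-letter case-insensitivity, underscore/space equivalence and [[Title#section]] links are not handled.

```python
import re


def retarget_links(text, old, new, preserve_text=True):
    """Point [[old]] and [[old|label]] links at new.

    Unpiped links become [[new|old]] when preserve_text is True, else [[new]].
    Existing pipes keep their label; only the link target changes.
    """
    # The target must be followed directly by "|" or "]]", so [[old extra]] is untouched.
    pattern = re.compile(r"\[\[" + re.escape(old) + r"(\|[^\]]*)?\]\]")

    def repl(m):
        if m.group(1):                      # existing pipe: keep its label
            return f"[[{new}{m.group(1)}]]"
        return f"[[{new}|{old}]]" if preserve_text else f"[[{new}]]"

    return pattern.sub(repl, text)


print(retarget_links("[[Link A]] and [[Link A|label]]", "Link A", "Link B"))
# [[Link B|Link A]] and [[Link B|label]]
```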
Description: The following pages will receive automatic link correction:

The following pages will not receive automatic link correction despite containing links to soft redirects:

Script: TalBot/rm-soft-redir.py
Start: 30 December 2006
Commit interval get/put/delete: 5 seconds/20 seconds/no limit
Estimated execution time: unknown; actual edits will be few; most time is spent on loading pages and references
  • Status: Done. One redirect had to be deleted manually due to the same bug which forced a restart earlier (with a different redirect). This is a spurious bug which I was unable to reproduce. But it does no harm either, as long as it happens only rarely. I will manually correct some of the pages which were excluded from link correction. If you don't like that, please notify me or simply revert me.--GrafZahl 21:51, 30 December 2006 (UTC)
Information: list of redirects (talk), references (talk), changes (talk), exceptions (talk)
Scripts: spot-double-redirects.py, rm-soft-redir-helper.py and rm-soft-redir.py
Start (stage 5): 15 January 2007, ca. 9:00 UTC
Commit interval get/put/delete: 5 seconds/20 seconds/no limit
Estimated execution time: ca. 8 hours
Information: list of redirects (talk), references (talk), changes (talk), exceptions (talk)
Scripts: spot-double-redirects.py, rm-soft-redir-helper.py and rm-soft-redir.py
Start (stage 5): 22 January 2007, ca. 9:00 UTC
Commit interval get/put/delete: 5 seconds/20 seconds/no limit
Estimated execution time: ca. 3 hours
Information: list of redirects (talk), references (talk), changes (talk), exceptions (talk)
Start (stage 5): 29 January 2007, ca. 9:00 UTC
Commit interval get/put/delete: 5 seconds/20 seconds/no limit
Estimated execution time: ca. 1 hour

Re-implement categorical author sorting

Pathosbot will re-implement category sorting of the 1180 author pages by reading sort keys defined in category tags or {{DEFAULTSORT}}. The bot will process pages differently depending on the conditions below. For the sake of simplification, the bot will simply categorize and skip nonstandard pages.

Condition met | Add category | Other actions
no sort keys or author template | [[author pages needing human processing]] | none
nonstandard author template | [[author pages needing standardization]] | none
conflicting sort keys | [[author pages with conflicting sort keys]] | add parameter with first sort key; do not remove conflicting sort keys
standard, with sort keys | none | add parameter; remove sort keys

{admin} Pathoschild 02:34, 16 January 2007 (UTC)
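The decision table above can be read as an ordered series of checks. A sketch, assuming the first row means "no sort keys, or no author template" and that rows are checked top to bottom; `plan_author_page` is hypothetical, not Pathosbot's code, and returns (category to add, DEFAULTSORT value, whether to remove the old sort keys).

```python
def plan_author_page(has_template, standard_template, sort_keys):
    """Apply the decision table, checking the rows from top to bottom."""
    distinct = list(dict.fromkeys(sort_keys))    # unique keys, original order
    if not has_template or not distinct:
        return "author pages needing human processing", None, False
    if not standard_template:
        return "author pages needing standardization", None, False
    if len(distinct) > 1:
        # use the first key, but keep the conflicting ones for a human
        return "author pages with conflicting sort keys", distinct[0], False
    return None, distinct[0], True               # standard page: clean conversion


print(plan_author_page(True, True, ["Doe, John", "Doe, John"]))
```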

I've eliminated the categories in favour of [[category:author pages without defaultsort]], which the bot operates from. The bot now makes a pretty accurate guess at the defaultsort key when there's no sort key on the page, so the subcategories aren't needed for tracking so long as I review the edits. There are 695 pages left to process. (Note that this is being done very discontinuously, thus the long period of time before completion.) —{admin} Pathoschild 04:02:08, 18 February 2007 (UTC)
Done in 15 hours 34 minutes runtime. —{admin} Pathoschild 02:04:38, 28 February 2007 (UTC)

Move pages for catholic encyclopedia

I need someone to move the Catholic Encyclopedia (1913) pages from the old ", 1913" page names. You may use this script or find them here. Thanks. --Riccardo (better on it.wikipedia) 23:18, 26 February 2007 (UTC)

I'll do it. —{admin} Pathoschild 01:35:11, 04 March 2007 (UTC)
Done with WikilinkMoveTable. —{admin} Pathoschild 06:06:14, 04 March 2007 (UTC)

Task description

Change to . The Shadow knows! --Benn Newman (AMDG) 23:27, 18 February 2007 (UTC) [Task description edited to reduce image size--GrafZahl 16:19, 20 February 2007 (UTC)]

This will be done in several steps:

  1. Write a script to replace the image link in a given list of pages (this should be easy, because the make_search_replace_list() function from rm-soft-redir.py can be reused).
  2. Retrieve image usage information using Duesentrieb's CheckUsage.
  3. Launch a test run (first without actually changing the wiki, then changing a small number of pages).
  4. Launch real script. This last step should take roughly 7 hours.
  5. Recheck image usage a few days after the run (something might have slipped because of database lag).

--GrafZahl 16:19, 20 February 2007 (UTC)

Execution schedule

Script: TalBot/replace-link.py
Resumes: 6 March 2007, provided I've managed to fix the script by then.
Commit interval: 20–40 seconds
Estimated execution time: unknown
  • Status: Interrupted by an uncaught socket exception. Bug filed. I'll resume the task after stage three of this month's soft redirect maintenance. By then the toolserver should have caught up with the new image usage situation.--GrafZahl 13:31, 2 March 2007 (UTC)
    Couldn't you get usage from the local image page? —{admin} Pathoschild 19:49:46, 02 March 2007 (UTC)
You're right, of course. Thank you for the hint. I tried "What links here" (which doesn't work) and forgot there's a list below the page content. Anyway, I don't have access to the pywikipedia machine until Monday, but there are fewer than 600 redirects to deal with this month, so I can run both tasks concurrently.--GrafZahl 00:05, 3 March 2007 (UTC)

Convert Emily Dickinson's poems to <poem>

Convert Emily Dickinson's poems to <poem>. --Benn Newman (AMDG) 22:33, 13 February 2007 (UTC)

Assigned to Pathosbot. This will be fun. —{admin} Pathoschild 09:18:32, 08 March 2007 (UTC)
Done. 1775 pages formatted in 13h34 runtime. —{admin} Pathoschild 06:48:47, 12 March 2007 (UTC)

Image:??%.png > .svg

Task description

A bot changing all Image:??%.png (Image:00%.png, Image:25%.png, Image:50%.png, Image:75%.png, Image:100%.png) from .png to .svg may be helpful :) Lugusto 17:38, 6 March 2007 (UTC)

I'll spice up the script to handle multiple links at a time. Furthermore, make_search_replace_list() needs a small bugfix. Apart from that, it should be a standard job.--GrafZahl 21:26, 7 March 2007 (UTC)
The script has been updated and is ready to run. During testing, one edit was made accidentally to Author:Mohandas K. Gandhi (reverted). Sorry about that.--GrafZahl (talk) 19:57, 9 March 2007 (UTC)
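Handling several links at once, as mentioned above, can be sketched as one alternation over the five file names. replace-link.py is not reproduced here; `png_to_svg` is a hypothetical name. Assumptions: the images are referenced as `Image:NN%.png` with the prefix written `Image:` or `image:`; references without the prefix (for example a bare file name passed to a template) are not caught.

```python
import re

NAMES = ["00%", "25%", "50%", "75%", "100%"]

# Image:00%.png ... Image:100%.png; other files such as Image:150%.png are left alone
PATTERN = re.compile(r"([Ii]mage:)(" + "|".join(re.escape(n) for n in NAMES) + r")\.png")


def png_to_svg(text):
    """Switch the five progress images from .png to .svg, keeping everything else."""
    return PATTERN.sub(r"\1\2.svg", text)


print(png_to_svg("[[Image:25%.png|25px]] [[Image:150%.png]]"))
```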

Execution schedule

Script: TalBot/replace-link.py
Start: 13 March 2007
Commit interval: 20 seconds
Estimated execution time: ca. 6 hours

Standard chapter headings

We need Pathosbot or someone else to add the default chapter headers to the following books.

Assigned The Princess and the Goblin, Jacob's Room, and The Mill on the Floss to Pathosbot. The bot cannot recognize the chapter order in Seize the Time, since subpages are named rather than numbered. —{admin} Pathoschild 06:55:49, 12 March 2007 (UTC)
Done (except Seize the Time); Pathosbot can't process unnumbered subpage titles. —{admin} Pathoschild 07:05:07, 20 March 2007 (UTC)
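Why named subpages are a problem: previous/next links can only be derived automatically when the order is recoverable from the titles. A hedged sketch, assuming each numbered subpage title ends in its chapter number; `chapter_navigation` is hypothetical, not Pathosbot's code. The sort is numeric, so "Chapter 10" follows "Chapter 2" rather than "Chapter 1"; named subpages are returned separately for a human to order.

```python
import re

TRAILING_NUMBER = re.compile(r"(\d+)$")


def chapter_navigation(subpages):
    """Map each numbered subpage to its (previous, next) neighbours."""
    numbered, named = [], []
    for title in subpages:
        (numbered if TRAILING_NUMBER.search(title) else named).append(title)
    numbered.sort(key=lambda t: int(TRAILING_NUMBER.search(t).group(1)))
    nav = {}
    for i, title in enumerate(numbered):
        prev = numbered[i - 1] if i > 0 else None
        nxt = numbered[i + 1] if i + 1 < len(numbered) else None
        nav[title] = (prev, nxt)
    return nav, named


nav, named = chapter_navigation(["Chapter 10", "Chapter 2", "Chapter 1", "Preface"])
print(nav["Chapter 2"], named)
```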

Standardize headers (Complete Encyclopaedia of Music)

I'm going to have Pathosbot standardize the headers in the Complete Encyclopaedia of Music and switch to the transitional {{header2}} standard. Since there's a huge number of pages involved, I'd like some feedback first.

current formatting:
{{header
| previous=
| next=[[/Title|Title Page]]
| title=Complete Encyclopaedia of Music
| section=<br>''Table of Contents''
| author=John W Moore
| notes=''1880 Edition''
}}

proposed formatting:
{{header2
 | title    = [[../../]]
 | author   = John Weeks Moore
 | section  = {{subst:SUBPAGENAME}}
 | previous = $<previous>
 | next     = $<next>
 | notes    = $<notes>
}}

The extra whitespace for the more legible column view is missing, but perhaps the biggest problem is the extra formatting in the section parameter. The newline tag (<br />) will break display when we complete the transition to the new header format. Instead of "Title (section)" as normal, it will display as:
"Title (
section)".

The title, author, and section parameters will be overwritten by the text above. The previous, next, and notes parameters will be carried over from the current format. The bot can perform other changes simultaneously, so this would be the best time to take a look at the work and see if anything else can be improved automatically. One possibility is to add all pages to a category for the work with the article title as sort key. —{admin} Pathoschild 06:59, 12 January 2007 (UTC)
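The proposed conversion can be sketched as follows, assuming the old header's parameters have already been extracted into a dict. Because title, author and section are overwritten, the problematic <br> in the old section parameter simply disappears, while previous, next and notes are carried over. `encyclopaedia_header` is a hypothetical helper; the column alignment mirrors the proposed formatting.

```python
ORDER = ["title", "author", "section", "previous", "next", "notes"]
FIXED = {
    "title": "[[../../]]",
    "author": "John Weeks Moore",
    "section": "{{subst:SUBPAGENAME}}",
}


def encyclopaedia_header(old_params):
    """Build the proposed {{header2}} text from the old header's parameters."""
    # previous/next/notes are carried over from the current header...
    params = {k: old_params.get(k, "").strip() for k in ("previous", "next", "notes")}
    # ...while title/author/section are overwritten, which also drops the <br>
    params.update(FIXED)
    lines = ["{{header2"]
    lines += [f" | {k:<8} = {params[k]}".rstrip() for k in ORDER]
    lines.append("}}")
    return "\n".join(lines)


print(encyclopaedia_header({"previous": "", "next": "[[/Title|Title Page]]",
                            "section": "<br>''Table of Contents''",
                            "notes": "''1880 Edition''"}))
```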

Moved into unassigned for now; a lot of other things on to-do list before this one. :) —{admin} Pathoschild 09:03:18, 08 March 2007 (UTC)
(edit conflict) Inspecting some random subpages, it seems this task will not be as easy as indicated:
Suggested course of action:
  1. Convert all hard redirects to soft redirects.
  2. Obtain all equivalence classes of page titles which differ only in capitalisation. Convert all but one of the titles in a class to soft redirects. Before doing that, the contents of the pages should be compared to each other. This may involve manual labour.
  3. Eventually, all links to redirects will self-correct via soft redirect maintenance. Then only the totally incorrect links remain. These can be corrected by obtaining a sorted list of subpages and doing a successor/predecessor comparison.
--GrafZahl 09:53, 8 March 2007 (UTC)
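Step 2 above can be started by grouping titles on a case-folded key. `capitalisation_classes` is a hypothetical helper and the titles in the usage line are made-up examples; comparing the page contents within each group is still left to a human.

```python
from collections import defaultdict


def capitalisation_classes(titles):
    """Group titles that differ only in letter case; keep groups of two or more."""
    groups = defaultdict(list)
    for title in titles:
        groups[title.casefold()].append(title)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(capitalisation_classes(["Example/Abbe", "Example/ABBE", "Example/Abel"]))
```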
Pathosbot will convert headers to the new format and standardize the title, author, and section. The link corrections are beyond its capabilities. —{admin} Pathoschild 22:22:33, 20 March 2007 (UTC)
Done, 1431 pages edited in 10h44. —{admin} Pathoschild 05:10:20, 23 March 2007 (UTC)

Implement new header

A bot will need to implement the new header format (see "Tweak standarised header", Scriptorium, October 2006, and User:Pathoschild/Sandbox6). This requires converting all {{header}} usage to {{header2}}. Pathosbot will eventually do this (although any other bot is welcome to it), but the to-do list is too extensive to remember everything. :)

The following heuristic regular expressions should correctly convert most instances, but it will require human review of every edit. This would be easier to do with a dedicated script; Pathosbot is a pure regex bot.

search
^header[^\|]*\|(?:title=(?<title1>[^\|]*)|author=(?<author1>[^\|]*)|section=(?<section1>[^\|]*)|previous=(?<previous1>[^\|]*)|next=(?<next1>[^\|]*)|notes=(?<notes1>[^\|]*))\|(?:title=(?<title2>[^\|]*)|author=(?<author2>[^\|]*)|section=(?<section2>[^\|]*)|previous=(?<previous2>[^\|]*)|next=(?<next2>[^\|]*)|notes=(?<notes2>[^\|]*))\|(?:title=(?<title3>[^\|]*)|author=(?<author3>[^\|]*)|section=(?<section3>[^\|]*)|previous=(?<previous3>[^\|]*)|next=(?<next3>[^\|]*)|notes=(?<notes3>[^\|]*))\|(?:title=(?<title4>[^\|]*)|author=(?<author4>[^\|]*)|section=(?<section4>[^\|]*)|previous=(?<previous4>[^\|]*)|next=(?<next4>[^\|]*)|notes=(?<notes4>[^\|]*))\|(?:title=(?<title5>[^\|]*)|author=(?<author5>[^\|]*)|section=(?<section5>[^\|]*)|previous=(?<previous5>[^\|]*)|next=(?<next5>[^\|]*)|notes=(?<notes5>[^\|]*))\|(?:title=(?<title6>[^\|]*)|author=(?<author6>[^\|]*)|section=(?<section6>[^\|]*)|previous=(?<previous6>[^\|]*)|next=(?<next6>[^\|]*)|notes=(?<notes6>[^\|]*))$
replace
header2
 | title    = ${title1}${title2}${title3}${title4}${title5}${title6}
 | author   = ${author1}${author2}${author3}${author4}${author5}${author6}
 | section  = ${section1}${section2}${section3}${section4}${section5}${section6}
 | previous = ${previous1}${previous2}${previous3}${previous4}${previous5}${previous6}
 | next     = ${next1}${next2}${next3}${next4}${next5}${next6}
 | notes    = ${notes1}${notes2}${notes3}${notes4}${notes5}${notes6}

{admin} Pathoschild 03:50:50, 05 March 2007 (UTC)
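As an alternative to the six-way regular expression, a dedicated script could parse the parameters in any order. A minimal sketch, not Pathosbot's method: it matches nested {{...}} and [[...]] so that piped links inside values (which `[^\|]*` cannot handle) survive, and it leaves the page unchanged whenever it meets a parameter outside the six known names (such as override_author), so those cases go to human review. All function names are hypothetical; triple-brace parameters and HTML comments are not handled.

```python
import re

PARAMS = ["title", "author", "section", "previous", "next", "notes"]


def find_template(text, name):
    """Return (start, end) of the first {{name|...}} call, matching nested braces."""
    head = "[" + name[0].upper() + name[0].lower() + "]" + re.escape(name[1:])
    m = re.search(r"\{\{\s*" + head + r"\s*(?=\||\}\})", text)
    if m is None:
        return None
    depth, i = 0, m.start()
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return m.start(), i
        else:
            i += 1
    return None  # unbalanced braces: leave for a human


def split_top_level(s):
    """Split on '|' characters that are not inside [[...]] or {{...}}."""
    parts, buf, depth, i = [], [], 0, 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            buf.append(pair)
            i += 2
        elif pair in ("]]", "}}") and depth > 0:
            depth -= 1
            buf.append(pair)
            i += 2
        else:
            if s[i] == "|" and depth == 0:
                parts.append("".join(buf))
                buf = []
            else:
                buf.append(s[i])
            i += 1
    parts.append("".join(buf))
    return parts


def convert_header(text):
    """Convert the first {{header}} to {{header2}}; return text unchanged if unsure."""
    span = find_template(text, "header")
    if span is None:
        return text
    start, end = span
    parts = split_top_level(text[start + 2:end - 2])[1:]  # drop the template name
    values = {}
    for part in parts:
        if not part.strip():
            continue
        key, sep, value = part.partition("=")
        if not sep or key.strip() not in PARAMS:
            return text  # unknown or positional parameter: human review
        values[key.strip()] = value.strip()
    body = "\n".join(f" | {k:<8} = {values.get(k, '')}".rstrip() for k in PARAMS)
    return text[:start] + "{{header2\n" + body + "\n}}" + text[end:]
```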

Human review of every edit? There are tens of thousands of instances of {{header}}, so whatever the method, it will be an enormous task. Here is what I could offer:
  1. Write a script which loads one page containing {{header}} at a time and compares its title against a list of rejected pages. If the page is not rejected, do the regex-driven replacement and print a diff using Python's difflib library. Then the user is asked to accept or reject the proposed changes. Rejected titles are appended to the rejected list; accepted changes are committed. The list of rejected titles is written to a file, so the script can be interrupted and resumed at any time without rejected titles being presented twice.
  2. Write a noninteractive script which instead of committing the changes writes them to a directory. This may cost several hundred MiB of disk space. Then a second script goes through all the files, presents the changes and marks them as accepted or rejected. A third script may then commit the changes. Again, interruptibility must be implemented. This has the advantage of less wait time between interactive prompts.
Is that what you had in mind?--GrafZahl 16:38, 5 March 2007 (UTC)
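The review loop from option 1 can be sketched with the standard library's `difflib.unified_diff`. This is a hypothetical outline, not an existing script: `review` and `load_rejected` are assumed names, committing the accepted text is left out, and the prompt function is passed in so it can be replaced in testing. Accepted pages no longer contain {{header}} once committed, so only rejections need to be remembered across interruptions.

```python
import difflib
import os


def load_rejected(path):
    """Read the titles rejected in earlier runs (one per line)."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f if line.strip()}


def review(title, old, new, rejected_path, ask=input):
    """Show a diff for one page and return True if the change should be committed.

    Rejections are appended to rejected_path immediately, so an interrupted
    run can be resumed without presenting the same page twice.
    """
    if title in load_rejected(rejected_path):
        return False
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(),
                                fromfile=title, tofile=title + " (proposed)",
                                lineterm="")
    print("\n".join(diff))
    if ask("Accept? [y/n] ").strip().lower() == "y":
        return True
    with open(rejected_path, "a", encoding="utf-8") as f:
        f.write(title + "\n")
    return False
```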
I use a similar method after the edit. I load all the diffs from Special:Contributions/Pathosbot in tabs, and for each one I quickly glance through to make sure the conversion is correct and tap the button combination to close the tab and view the next. The result requires very little time and effort per tab, although it will still take a long time. Occasionally I need to edit pages to fix mistakes, but the bot can then check for that mistake and correct it automatically, steadily reducing the need for human editing. —{admin} Pathoschild 18:08:15, 05 March 2007 (UTC)
I see. As long as the server can cope with the increased number of edits, your method is probably better because the diffs shown by MediaWiki are more colourful and thus easier to check. But how do you open all the diffs in tabs quickly? Have you got a special browser plugin? JavaScript?--GrafZahl 09:27, 6 March 2007 (UTC)
I would think if you load the bot's contributions page, you could click on the "diff" link by each edit and have them all load in new tabs. I know in FireFox and IE, if you hold Ctrl (of course, this is on a Windows machine) and click a link, it opens the link in a new tab.—Zhaladshar (Talk) 16:07, 6 March 2007 (UTC)
Thanks. It doesn't work with Mozilla, but it does work with Iceweasel (and thus likely Firefox). Very useful and probably the simplest method. Pathoschild, if you don't mind, we can lighten your burden of manual review by splitting the work among interested editors. Like, one would post to this page e.g. "I'll check all of Pathosbot's edits from point in time X through point in time Y" and post again when it's done. What do you think?--GrafZahl 21:26, 7 March 2007 (UTC)
Sorry for the slow response. Yes, that would be great. I put together a much simpler and more effective set of patterns based on a recent bot task assigned to Pathosbot; this should greatly decrease the possibility of error. —{admin} Pathoschild 07:22:41, 20 March 2007 (UTC)
I'm willing to help double check some of the edits, too, to break up the load a bit (just let me know when I can help out). We've got ~50,000 pages to check, so it's definitely a job for multiple people. And, since you wrote the scripts and are running the bot, it's only fair some other editors help out ;).—Zhaladshar (Talk) 20:19, 20 March 2007 (UTC)
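The division of labour proposed above, where each volunteer announces a block of Pathosbot's edits from one point in time to another, could be planned with a few lines of Python. This is a minimal sketch assuming the bot's contributions have already been collected as a chronologically ordered list of revision IDs; `split_ranges` is a hypothetical helper, not part of pywikipedia or any script mentioned here.

```python
def split_ranges(revids, reviewers):
    """Split an ordered list of revision IDs into one contiguous
    (first, last) range per reviewer, as evenly as possible."""
    n, k = len(revids), len(reviewers)
    ranges = {}
    start = 0
    for i, name in enumerate(reviewers):
        # The first n % k reviewers get one extra edit each.
        size = n // k + (1 if i < n % k else 0)
        if size:
            ranges[name] = (revids[start], revids[start + size - 1])
        start += size
    return ranges
```

With ten edits and three reviewers, the first reviewer gets four edits and the other two get three each; the blocks stay contiguous, so each volunteer can post a simple "from X through Y" claim and no edit is checked twice.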

Archived; I re-opened an updated section, to be merged into the archives. —{admin} Pathoschild 05:30:57, 23 March 2007 (UTC)

Task description

The endnotes at The Public Orations of Demosthenes/Endnotes need to be incorporated into the text of that work. The endnotes page is broken up by headers for each speech, and then contains notes like this for the sections within the speech (indentation added).

Sec. 18. "the first hundred, &c". Demosthenes thinks...
"by lot". In this and other clauses...

These comments refer back to [n] markers within the referenced section of the speech (there are two [n]s in section 18 of this speech; sections in the speech are identified using the {{verse}} template). The endnote material needs to be placed inside <ref> tags at the location of the relevant [n]. The task is complicated somewhat by the fact that some endnote headers look like this:

Sec. 10, 11. The argument is this...
"acknowledged foes". i.e. probably Thebes..

Robth 19:13, 3 December 2006 (UTC)

In this task, a certain level of ambiguity during parsing is to be expected. Therefore I'll write an interactive bot, prompting before committing changes. To simplify the automated part further, the bot will ask for user input when writing a footnote. Since endnotes are sequential, this should not be a great burden on the user.--GrafZahl (talk) 16:28, 13 March 2007 (UTC)
The script is available now: demosthenes-endnotes.py.--GrafZahl (talk) 12:51, 15 March 2007 (UTC)
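The matching described in this request can be sketched as follows. This is not demosthenes-endnotes.py; it is an illustration assuming each speech's text has already been split into a dict keyed by the section number given in {{verse}}, and that the endnotes for that speech are available as plain lines. `parse_endnotes` and `insert_refs` are hypothetical names. A combined header such as "Sec. 10, 11." fills the [n] markers of the listed sections in order, and any mismatch between notes and markers raises an error so a human can decide, in the spirit of the interactive approach above.

```python
import re

# "Sec. 18. ..." or "Sec. 10, 11. ..." starts a new group of notes.
SEC_RE = re.compile(r'^Sec\. ([\d, ]+)\.\s*(.*)$')
MARK = '[n]'

def parse_endnotes(lines):
    """Group endnote lines under their section header.

    Returns a list of (sections, notes) pairs, e.g. ((18,), [note, note]).
    """
    groups = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        m = SEC_RE.match(line)
        if m:
            sections = tuple(int(s) for s in m.group(1).split(',') if s.strip())
            # Text on the header line itself counts as the group's first note.
            groups.append((sections, [m.group(2)] if m.group(2) else []))
        elif groups:
            groups[-1][1].append(line)
    return groups

def insert_refs(sections, groups):
    """Replace [n] markers, in reading order, with <ref> tags holding the notes.

    Modifies and returns `sections`; raises ValueError on a count mismatch.
    """
    for secs, notes in groups:
        queue = list(notes)
        for sec in secs:
            text = sections[sec]
            while MARK in text and queue:
                text = text.replace(MARK, '<ref>%s</ref>' % queue.pop(0), 1)
            sections[sec] = text
        leftover = [s for s in secs if MARK in sections[s]]
        if queue or leftover:
            raise ValueError('note/marker mismatch in sections %s' % (secs,))
    return sections
```

Raising instead of guessing matters most for the combined headers, where the first line may be a general comment rather than a note tied to an [n].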

Execution schedule

Script demosthenes-endnotes.py
Start I'll run the script whenever I've got time. The total work to be done is spread over 13 pages.
Commit interval interactive

Remove extra white space in Bush's executive orders

All the pages listed at Author:George W. Bush/Executive orders seem to have an extra blank line before {{header}}. --01:56, 18 February 2007 (UTC)

After visiting Special:Random a few more times, I found that the pages at Author:William Jefferson Clinton/Presidential Proclamations have this problem too. The pages listed at Author:Ronald Reagan/Executive orders have extra stuff before {{header}} too. --Benn Newman (AMDG)
I'll start assessing the situation tomorrow (i.e. find out in which pages everything before {{header}} can just be deleted) and alter the pages in a few days from now.--GrafZahl (talk) 15:35, 20 June 2007 (UTC)
Running xo_pp_check.py revealed that all extra stuff before {{header}} is whitespace or the page title. A script to delete this extra stuff is in preparation.--GrafZahl (talk) 15:07, 21 June 2007 (UTC)
script used TalBot/xo_pp_fix.py
progress done

--GrafZahl (talk) 09:11, 27 June 2007 (UTC)
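The clean-up described in this section (deleting anything before {{header}} only when it is whitespace or the page title, which is what the check found) might look like the sketch below. It is not TalBot/xo_pp_fix.py; `strip_before_header` is a hypothetical helper that only transforms page text, leaving fetching and saving to whatever bot framework is used.

```python
import re

# First call of {{header}} (or {{header2}}); the first letter may be capitalised.
HEADER_RE = re.compile(r'\{\{\s*[Hh]eader')

def strip_before_header(title, text):
    """Return text without whatever precedes the first {{header...}} call.

    Returns None when there is nothing to do or when the prefix is anything
    other than whitespace or the page title (left for manual review).
    """
    m = HEADER_RE.search(text)
    if m is None or m.start() == 0:
        return None  # no header, or nothing to strip
    prefix = text[:m.start()].strip()
    if prefix in ('', title):
        return text[m.start():]
    return None  # unexpected content: do not touch automatically
```

Returning None for unexpected prefixes keeps the bot conservative: only the two cases the check actually found are edited.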

Perform routine {{header}} checks

A while back, Jude's bot Xenophon did a massive check of all the pages in WS's main article namespace and compiled a list of the ones which didn't have the header template. Of course, the current list he has is quite outdated (to my knowledge, it hasn't been actively checked or used by anyone here), and many pages have been added since which do not have the header template.

If we could get a bot that routinely (like, twice a year or so) would run through WS's main namespace and compile a list of which pages do not have the header template, that would really help with standardizing our pages.—Zhaladshar (Talk) 18:43, 17 March 2007 (UTC)

It should be possible to maintain a small list of header generating templates (are there others besides {{header}} and {{header2}}?), get their references and subtract them from the list of main namespace pages. That way it would not be necessary to download nearly the whole wiki.--GrafZahl (talk) 21:16, 19 March 2007 (UTC)
There are many header templates, but they should all use either {{header}} or {{header2}}. {{header2}} is transitional and will eventually replace and become {{header}}. —{admin} Pathoschild 02:55:14, 20 March 2007 (UTC)
I've written a script header_check.py to detect all pages from the Main, Author, Help, Page, Portal and Wikisource namespaces which neither transclude one of the {{author}}, {{header}}, {{header2}} or {{process header}} templates nor belong to Category:Soft redirects or Category:Protected deleted pages, and posted the result of the first run to User:TalBot/Missing headers. Since the script does not actually load the listed pages, its runtime is just over seven minutes on TalBot's host, so adding improvements and testing is uncomplicated. This also means it can be run much more often than only twice a year. Of course, any improvement suggestions are very welcome.--GrafZahl (talk) 14:28, 28 June 2007 (UTC)
I've updated User:TalBot/Missing headers. Pages from the Page: namespace were removed as the new index system makes headers superfluous. {{EB1911}} was added to the list of header-generating templates. I'll create an entry on /Persistent tasks for this script.--GrafZahl (talk) 09:21, 18 July 2007 (UTC)
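The set-subtraction approach suggested above, which avoids downloading page text, fits in a few lines. This sketch assumes the page list, each template's transclusions and each category's members have already been fetched into Python sets (with the framework's page generators, not shown here); `missing_headers` is a hypothetical name and not the code of header_check.py. The template and category names are the ones listed in this section.

```python
HEADER_TEMPLATES = {'author', 'header', 'header2', 'process header', 'EB1911'}
EXEMPT_CATEGORIES = {'Soft redirects', 'Protected deleted pages'}

def missing_headers(all_pages, transclusions, category_members):
    """Titles that use no header-generating template and are not exempt.

    all_pages: iterable of titles in the namespaces being checked.
    transclusions: dict mapping template name -> set of transcluding titles.
    category_members: dict mapping category name -> set of member titles.
    """
    covered = set()
    for template in HEADER_TEMPLATES:
        covered |= transclusions.get(template, set())
    for category in EXEMPT_CATEGORIES:
        covered |= category_members.get(category, set())
    return sorted(set(all_pages) - covered)
```

Only reference and category lists are needed, which is why such a run can finish in minutes rather than requiring a crawl of the whole wiki.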

Shakespeare's sonnets

Right now, Shakespeare's sonnets are in subpage notation. However, we have many other sonnets (like "Sonnet 1 (Author)" or "Sonnet 13 (Author)") for other poets which aren't in subpage notation. The only reason I can think of for Shakespeare's being in subpage notation is that his poems are commonly collected together. However, his sonnets, like those of other poets, are standalone works and should have standard notation to reflect it.

If a bot could change the titles from "The Sonnets/#" to "Sonnet # (Shakespeare)" that would be great and would save a lot of time and manual page moves.—Zhaladshar (Talk) 20:11, 14 April 2007 (UTC)

Done. You can move many pages in just a few minutes using my WikilinkMoveTable script with a tab-based browser (which is what I used). Pathosbot converted the 154 resulting redirects to soft redirects in 18 minutes. —{admin} Pathoschild 05:58:33, 11 August 2007 (UTC)
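A move table like the one used here can be generated mechanically. This sketch only builds the (old title, new title) pairs from a list of existing titles; performing the moves and converting the resulting redirects is left to the browser script or bot framework. `sonnet_moves` is a hypothetical helper.

```python
import re

SUBPAGE_RE = re.compile(r'The Sonnets/(\d+)')

def sonnet_moves(titles):
    """Return (old, new) pairs renaming "The Sonnets/N" to "Sonnet N (Shakespeare)"."""
    moves = []
    for title in titles:
        m = SUBPAGE_RE.fullmatch(title)
        if m:  # the index page and any non-numeric subpages are left alone
            moves.append((title, 'Sonnet %s (Shakespeare)' % m.group(1)))
    return moves
```

For the 154 numbered subpages this yields 154 pairs, matching the number of redirects converted afterwards.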

Implement new header format, standardisation and normalisation

Description

Summary of tasks
  • standardisation
  • normalisation
    • normalise spacing and arrangement of header parameters;
    • replace double-hyphens with em-dashes or en-dashes;
    • fix multiline paragraphs;
    • convert HTML and URL entities to literal characters;
    • unsubstitute {{wikipediaref}} and normalize spacing;
    • normalize heading spacing.
  • maintenance
    • detect main pages with no license templates or with only a non-US-applicable template.
Statistics
  • 53647 pages targeted, of which 37753 use {{header}} and 15894 use {{header2}}.
  • error rate: 3/2311 (0.13%)
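Several of the normalisation tasks listed above are plain regular-expression substitutions, and, as noted in the discussion below, the job relies on running several patterns consecutively. The sketch shows that structure with three of the tasks; it is not Pathosbot's actual pattern set. The dash rule (en dash between digits, em dash between word characters) and the set of entities kept encoded because they are meaningful in wikitext are assumptions. Multiline paragraphs and {{wikipediaref}} are not covered.

```python
import html
import re

# Entities left encoded because they matter in wikitext (assumption).
KEEP_ENCODED = {'&lt;', '&gt;', '&amp;', '&nbsp;', '&quot;'}
ENTITY_RE = re.compile(r'&(?:#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);')

def _decode(match):
    entity = match.group(0)
    return entity if entity in KEEP_ENCODED else html.unescape(entity)

# Ordered (pattern, replacement) rules; each runs on the output of the previous one.
RULES = [
    (ENTITY_RE, _decode),                                   # &eacute; -> literal character
    (re.compile(r'(?<=\d)--(?=\d)'), '\u2013'),             # 1914--1918 -> en dash
    (re.compile(r'(?<=\w)--(?=\w)'), '\u2014'),             # word--word -> em dash
    (re.compile(r'^(=+)[ \t]*([^=\s].*?)[ \t]*\1[ \t]*$', re.M),
     r'\1 \2 \1'),                                          # ==Title== -> == Title ==
]

def normalise(text):
    for pattern, repl in RULES:
        text = pattern.sub(repl, text)
    return text
```

Order matters: the digit rule must run before the general word rule, because `\w` also matches digits.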
Discussions

Following previous discussion, Pathosbot will convert all pages using the former {{header}} standard to the transitional {{header2}} format, then switch pages back to {{header}} once the new format has been merged into the old name. The first run will also include various maintenance and normalisation changes on all pages using {{header}} (37,753) or {{header2}} (15,894), although most {{header2}} pages should be standard already.

The bot uses heuristic regular expressions; although I expect a low error rate due to constant refinement, the number of pages involved means that many pages may not be parsed correctly. I'm looking for interested users to help review the bot's edits during the initial transition. I will post short lists of edits to the bot's user space for each interested user, spreading them out so nobody is overwhelmed with a huge list at any one time.

A typical edit will look like this; since the changes are usually all in one place, most pages can be reviewed in under a second once one is used to it. You can report any errors or suggest improvements in this discussion, and I will make the necessary adjustments to the code. Please state whether you'd like to participate throughout the whole transition, for a limited time, or review a specific number of pages. If you get tired of it later, you can of course notify me and stop reviewing.

If you're interested, please comment here. I will review all remaining edits, so the more volunteers the better. ;) The following users have volunteered so far:

{admin} Pathoschild 04:02:40, 25 March 2007 (UTC)

Currently running a semi-automated test run. —{admin} Pathoschild 04:02:40, 25 March 2007 (UTC)
I'm extending the test run to reach pages beyond the 1911 Encyclopædia Britannica, which only tests some minor formatting fixes. The error rate so far is 0.656% (3/457). —{admin} Pathoschild 06:25:48, 26 March 2007 (UTC)
Aborted. Pathosbot crashed while saving the settings, corrupting the file and losing all improvements to the patterns made since 20 March 2007. I'm putting this aside until I feel like starting over from there. If another bot operator would like to take over the task, I'll share the patterns used; note that it requires a bot framework that supports multiple consecutive pattern execution, such as the AutoWikiBrowser. —{admin} Pathoschild 06:29:59, 14 April 2007 (UTC)
I was able to restore the settings file (and made an extra backup). Continuing, still in the 1911 Encyclopædia Britannica range. —{admin} Pathoschild 15:31:19, 14 April 2007 (UTC)

Discussion

Pathoschild: I'm willing to help with the hand-checking.—Zhaladshar (Talk) 18:43, 24 March 2007 (UTC)

Thanks; I'll leave a message on your talk page when the first batch is ready. :) —{admin} Pathoschild 02:45:31, 25 March 2007 (UTC)
Hello Zhaladshar. You volunteered to help with the header bot run, but I expanded the scope after you did so. Since it will be affecting most pages on Wikisource, I added various standardization and normalization tasks Pathosbot commonly performs. This will make reviewing some pages more difficult (typical changes: nearly standard, most common, less common) although the error rate for the extra tasks should be virtually zero. Are you still interested in reviewing the edits? —{admin} Pathoschild 04:09:52, 25 March 2007 (UTC)
The new things that Pathosbot is doing are fine; I'm still willing to help.—Zhaladshar (Talk) 17:08, 25 March 2007 (UTC)
Thanks. :) —{admin} Pathoschild 23:31:58, 25 March 2007 (UTC)

Archived. This is way out of date; Pathosbot has been offline for months, and is still some time away from being capable of this task again. —{admin} Pathoschild 06:03:54, 11 August 2007 (UTC)

Move image pages for The New Student's Reference Work to the Page: namespace

Task description

The image pages for The New Student's Reference Work, starting with The New Student's Reference Work/1-0001, should be moved to the Page: namespace.--GrafZahl (talk) 15:05, 28 June 2007 (UTC)

Procedure:
  1. Pages will be moved to the Page: namespace.
  2. Redirects will be converted to soft redirects.
  3. Navigation template will be removed.
  4. Index page will be created (manually).
--GrafZahl (talk) 08:56, 21 August 2007 (UTC)

Execution schedule

Script TalBot/nsrw-move.py
Start 22 August 2007
Commit interval 20 seconds
  • Status: Crashed. First, the infamous socket.error struck, see [1]. Now, after updating to rev 4096 it is no longer possible to alter redirect pages. Furthermore, the throttle does not appear to be working as of this revision. I'll file bug reports and retry tomorrow.--GrafZahl (talk) 15:03, 23 August 2007 (UTC)
This error was raised because the Wikimedia servers were temporarily down today. The bug report in your link suggests catching the exception in pywikipedia, but you can also catch it in your code. ThomasV 18:47, 23 August 2007 (UTC)
Certainly, but it's somewhat hard for the user to find out just what kinds of exceptions there might be raised. Anyway, the pywikipedia developers have already fixed the bug and everything seems to be running fine again. Sorry if I got a bit shirty yesterday.--GrafZahl (talk) 08:19, 24 August 2007 (UTC)
OK, forget what I wrote in my last post. All these exceptions are derived from Exception and you can catch just that if you're indifferent anyway. While this practice might be slightly dangerous in general, it's certainly practical in a bot script. Sorry for my previous failure to understand.--GrafZahl (talk) 08:30, 24 August 2007 (UTC)
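Catching the exception in the bot script itself, as suggested above, can be sketched as a small retry wrapper. This is an illustration, not code from TalBot/nsrw-move.py: `with_retries` is a hypothetical helper, `action` stands for whatever save or move call the script makes, and the 20-second default delay simply mirrors the commit interval listed above. It catches socket.error (an alias of OSError in Python 3) rather than the broader Exception, so unrelated bugs still stop the bot.

```python
import socket
import time

def with_retries(action, attempts=3, delay=20.0, sleep=time.sleep):
    """Call action(); on a network error, wait and retry, up to `attempts` calls."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except socket.error as exc:  # alias of OSError in Python 3
            if attempt == attempts:
                raise  # give up and let the operator see the error
            print('attempt %d failed (%s); retrying in %.0f s' % (attempt, exc, delay))
            sleep(delay)
```

Passing `sleep` as a parameter is only there so the wrapper can be tested without actually waiting.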

Task description

This request is split from an earlier request completed by Pathosbot, who can't perform the extra maintenance mentioned. —{admin} Pathoschild 23:48:05, 23 March 2007 (UTC)

There's serious mayhem going on with previous and next. For example, there are all-caps hard redirects, such as Complete Encyclopaedia of Music/A/ACATHIST JS. These may point to all-lowercase titles, such as Complete Encyclopaedia of Music/A/acathist js. The correct page, however, is Complete Encyclopaedia of Music/A/Acathist js. There are also wrong pages with title capitalisation (e.g. Complete Encyclopaedia of Music/A/A Ballata) or correct pages with a completely wrong link (e.g. Complete Encyclopaedia of Music/A/Acolythia).

Suggested course of action:

  1. Convert all hard redirects to soft redirects.
  2. Obtain all equivalence classes of page title which differ only in capitalisation. Convert all but one of the titles in a class to soft redirects. Before doing that, the contents of the pages should be compared to each other. This may involve manual labour.
  3. Eventually, all links to redirects will self-correct via soft redirect maintenance. Then only the totally incorrect links remain. These can be corrected by obtaining a sorted list of subpages and doing a successor/predecessor comparison.

--GrafZahl 09:53, 8 March 2007 (UTC)
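Steps 2 and 3 are straightforward to express once the list of subpage titles is available. The sketch below assumes such a list (fetched separately, with soft redirects already excluded for step 3) and that the work's intended order is a case-insensitive alphabetical sort; neither function is taken from TalBot/ceom_equivalence.py, and both names are hypothetical.

```python
from collections import defaultdict

def case_classes(titles):
    """Groups of titles that differ only in capitalisation (step 2)."""
    groups = defaultdict(list)
    for title in titles:
        groups[title.lower()].append(title)
    return [sorted(g) for g in groups.values() if len(g) > 1]

def expected_neighbours(titles):
    """Map each title to its (previous, next) title in sorted order (step 3)."""
    ordered = sorted(titles, key=str.lower)
    result = {}
    for i, title in enumerate(ordered):
        prev = ordered[i - 1] if i > 0 else None
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        result[title] = (prev, nxt)
    return result
```

For the titles quoted above, `case_classes` would put Complete Encyclopaedia of Music/A/ACATHIST JS, Complete Encyclopaedia of Music/A/Acathist js and Complete Encyclopaedia of Music/A/acathist js into one class for manual comparison. Comparing each page's existing previous/next links against `expected_neighbours` would then flag the completely wrong links.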

Execution schedule

Step 1: convert hard redirects to soft redirects
Script TalBot/ceom_hard2soft.py
Start 18 July 2007
Commit interval 20 seconds
Step 2: handle equivalence classes
Script TalBot/ceom_equivalence.py
Start 9 August 2007
Commit interval interactive
This step will have to wait until all soft redirects generated in the previous steps have been processed by soft redirect maintenance.

Convert Talk:Author: redirects to soft redirects

These pages were probably created with the introduction of the Author: namespace. They all appear to be redirects. Since these pages are not needed any more, I suggest converting them all to soft redirects.--GrafZahl (talk) 20:49, 26 August 2007 (UTC)

To clarify, it is these pages that need to be fixed: Special:Prefixindex/Talk:Author: ; they need to be converted to {{subst:dated soft redirect|[[target]]}} John Vandenberg 12:41, 13 November 2007 (UTC)

My bot has completed this task. SQLQuery me! 04:51, 14 November 2007 (UTC)
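The conversion requested here, like the hard-to-soft redirect steps in the earlier tasks, amounts to rewriting each page's text. A minimal sketch, assuming the page contains a standard #REDIRECT line; `to_soft_redirect` is a hypothetical helper, and the bot that actually performed this task may have worked differently.

```python
import re

# Matches "#REDIRECT [[Target]]", optionally with a colon or a piped label.
REDIRECT_RE = re.compile(r'^\s*#redirect\s*:?\s*\[\[([^\]|]+)(?:\|[^\]]*)?\]\]',
                         re.IGNORECASE)

def to_soft_redirect(text):
    """Return soft-redirect wikitext for a hard redirect page, or None."""
    m = REDIRECT_RE.match(text)
    if m is None:
        return None  # not a redirect; leave the page untouched
    return '{{subst:dated soft redirect|[[%s]]}}' % m.group(1).strip()
```

Because the template is substituted, the saved page will contain the expanded soft-redirect text with its date rather than the `subst:` call itself.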