User talk:Tarmstro99/Archives/2009

From Wikisource


TarmstroBot

Hi,

I've granted TarmstroBot bot status. Sorry it took so long, I completely forgot the request was still standing.—Zhaladshar (Talk) 21:40, 19 March 2009 (UTC)

Thanks, and not to worry. Tarmstro99 (talk) 01:06, 20 March 2009 (UTC)

WS and OA

Comment from Germany http://de.wikisource.org/w/index.php?title=Wikisource:Skriptorium&oldid=704342#Wikisource_als_Open-Access-Repositorium.3F --FrobenChristoph (talk) 13:18, 20 June 2009 (UTC)

Wikisource as a repository

Hi Tim, I picked up a news item from Open Access News about your recent conference talk on Wikisource and wrote a bit about it on my blog. [1] It's really heartening to see professionals participating in the Wikimedia projects and especially sharing these ideas back to their professional communities. cheers, Pfctdayelise (talk) 15:46, 20 June 2009 (UTC)

US Statutes at Large

Hi, could I ask for a little assistance/guidance on US Statutes at Large, please? I find the formatting challenges much more interesting than plain proofreading, so it helps to keep me motivated! Sorry if it's a bit long, but I have run out of ways to answer the questions on my own.

On Vol 33, I realised that I had been creating my pages in the root wiki space, rather than under a United_States_Statutes_at_Large/Volume_33/ type structure, so I've started to impose the better structure by moving the pages over. However, I'm a little concerned that I will be leaving behind a forest of orphaned redirects if I do so. Is this a problem, or can admins and/or bots delete orphan redirect pages?

Also, what is the received wisdom on directory structure for the US Statutes at Large? "1st session" "2nd session" etc. for the next level after Volume (as used in Volume 6 - the most developed comp I can find) will not accommodate Presidential Proclamations etc., will they, or other elements not tied to a specific session? Would it be better to use Public Acts, Private Acts, Concurrent Resolutions etc. as the next directory level down, if that is how the relevant Volume is constructed?

Equally, do we include page numbers when dealing with index pages, or is it better to put everything into a prettytable? Do we maintain the double column format for index pages? I think that we have to maintain double column for treaties where they are printed side by side in both operative languages, so is there a way of letting the OCR engine know that a page is double column?

Finally, when I attempt to look behind some of the verified pages in Vol 6 to copy the perfect layout in use there, I see a blank space rather than the underlying code, which I have only been able to get hold of by clumsy copy/pasting out of history pages via Word. Do I have old Javascript or do the noinclude tags blank this content to me?

Thanks CharlesSpencer (talk) 17:35, 15 October 2009 (UTC)

Thanks for the note, and sorry for my delay in replying; I have been out of town. To take your questions in order:
  1. On the issue of redirects, I will look through your contributions and see whether any more of them need to be moved under the “correct” directory structure. You can use the {{dated soft redirect}} template to mark a redirect that will be automatically deleted by a bot after a certain period of time. Don’t worry about this if it is too confusing, though; I am happy to review what you have done and take care of any additional moves or deletions myself.
  2. On the directory structure, take a look at United States Statutes at Large/Volume 1, which is the farthest along. In brief, the directory structure we use should mimic the structure of the source we are reproducing. For volumes of the Statutes at Large that include separate sections dedicated to Presidential proclamations, constitutional amendments, and so forth, it makes sense to include those sections as subdirectories under that particular volume. That basically means that there will be no single “standard” structure or table of contents that applies across all the various volumes of the Statutes at Large, but that’s just the way they are.
  3. Ideally, we could create an #ifeq construct in index pages so that if you’re viewing the scanned page, the link takes you to the corresponding scanned page, but if you’re viewing the index as it has been transcluded outside the Page namespace, the link would just take you to the corresponding transcluded text. But that entails some pretty heavy lifting. I’m indifferent as to whether we keep the two-column structure of the indexes, but if you want to, the {{colbegin}} and {{colend}} templates are the easiest way to go about it. We should keep the side-by-side appearance of the treaty texts; tables are probably the best way to do that.
  4. I guess I am not understanding the problem you are having with page text not showing up when you click the edit button. Any chance you could create a screenshot or describe the problem in more detail? I’ll do what I can to help. Tarmstro99 (talk) 18:25, 19 October 2009 (UTC)


Tarmstro99, thank you very much for your help. I'd really appreciate you taking a look at my directory structure - I think that it may well be the underlying problem to some apparently rogue redlinking in the USStatHeader line of some of these pages. BTW, is there a way to make the "CH." element of a USStatHeader read "CHS." where that's how the scanned page shows it (in Vol. 33, for instance)? The style of earlier volumes such as Vol. 6 seems to be fixed at CH. even with multiple chapters on a single page. And (sorry to ask for formatting advice AGAIN!) how would you recommend I do page numbering, say, here? I don't think I was right to use a running USStatHeader, which is plainly not present in the scanned text.
Here's a screen shot of my page text problem (cropped in the middle to reduce file size) - as you can see, the Page body box is entirely blank!
CharlesSpencer (talk) 17:20, 21 October 2009 (UTC)
OK, I will see about moving some of your pages under United States Statutes at Large/Volume 33, which presumably is where they belong. I’ll add dated soft redirects so the “old” page names will gradually disappear on their own.
Regarding the formatting question, what do you think of the way I’ve handled Page:United_States_Statutes_at_Large_Volume_33_Part_2.djvu/4? It seems more closely to match the scanned original, but if there is something you really don’t like about it, then it can be changed.
Not sure what to make of your blank Page Body edit box! Seems like a possible browser error. Would it be possible for you to try either (1) disabling JavaScript, or (2) viewing the page in a different browser such as Firefox? One or the other of those might fix it. Afraid that is about as far as my technological competence extends, unfortunately. Tarmstro99 (talk) 18:19, 22 October 2009 (UTC)
Perfect! Thank you very much indeed - and I'll give Firefox a go. CharlesSpencer (talk) 09:35, 23 October 2009 (UTC)

An FYI just in case...

Hello,

First, I must say the Stat. project looks so good it makes me envious. Thanks for taking a crack at all those volumes.

Second, you mention in passing something about the legislative histories of the 1976 Copyright Act being kind of limited or scarce on your front page. I happen to know of a free source with just about everything BUT the actual Act itself (mis-labeled as the 1979 Act or something). All the related committee reports and bills from introduction to conference that were generated by both chambers prior to enactment do seem to come up just fine though.

Anyway, I can get to it via this URL but depending on what you're running and junk, you might need to start from the beginning using the lower menu of collections found on this page (DOC Digital Legislative History Holdings I believe, the default), and just look for the Public Law 94-553 folder.

I figure if anybody can find it useful, it would be you. Prost. George Orwell III (talk) 21:24, 21 October 2009 (UTC)

Thanks for your note and for the kind words. That’s a great find! Still not complete, unfortunately, but much more so than anything we have online here, at least for now. The university library here has the complete legislative history of the ’76 Act, beginning with a set of studies commissioned by the Senate back in the 1950s; it runs to 16 volumes in total (and they are all entitled, maddeningly, “Copyright Law Revision,” which is going to make one giant disambiguation page when all is said and done). I have been thinking that it would be necessary to have all 16 volumes scanned, but your discovery will certainly eliminate the need for a good part of that work. Tarmstro99 (talk) 18:29, 22 October 2009 (UTC)
Glad to hear the find was of some worth to you. I also felt Public Law 82-593, enacting Title 35 (the codification of Patent law & related), was a nice find there too. Anyway, I've already made it a point to collect semi-related sites and docs for this topic long ago and if I come across something similar I'll pass it on again.
See you in the trenches and feel free to call if you need grunt work or something done; I'd be more than willing to pitch in if I can. George Orwell III (talk) 23:45, 22 October 2009 (UTC)

User:TarmstroBot

Hi, watching recent changes through IRC from time to time, I see your bot doing something I'm also doing for some books about geology. I use a different workflow: I correct the djvu text layer directly, correcting each whole book by hand while maintaining two lists of the typical corrections made, then use these lists for the next book and so on (not applicable to your workflow, as the pages already exist). Here are the lists: User:Phe/Dict tess error and User:Phe/Dict tess typography; some entries in the first list may help you improve your own dictionary of corrections. Entries come as pairs of lines: the first line is the regex to match, the second the replacement string. Some of them are probably useless for you, being too specific to geology, and others are perhaps dangerous, but some are probably useful. Note that spaces at the start and end are meaningful; for example, the first entry is " t0 " --> " to ". I do it this way because I prefer to play safe with the regex and miss some opportunities for correction rather than occasionally make a wrong correction; anyway, it's up to you to decide if and how you can use these data sets. The second list is oriented toward typography corrections but is more dangerous to use; for example (at the bottom of User:Phe/Dict tess typography), is removing the space after "(" and before ")" always right? Phe (talk) 12:17, 25 October 2009 (UTC)

Thanks for the note! I agree with you about playing it (relatively) safe by checking for wordspace boundaries on either side of the text to be corrected. From your error list, it looks like Tesseract is making some of the same mistakes for you as it is for me. I’ll certainly check some of the items on your list to see whether I should be looking for them, too. One unusual substitution I have seen Tesseract make lately is “1·” for lowercase “r”; it was only recently that I figured out how to include Unicode characters in the regex search pattern (that centered dot is 0x00B7). In any case, I appreciate the input! Tarmstro99 (talk) 12:46, 26 October 2009 (UTC)
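
For anyone reading this thread later, the approach discussed above can be sketched roughly as follows. This is only an illustration, not the code TarmstroBot or Phe actually runs: the function names, the two-rule list, and the sample sentence are all made up for the example. It assumes the correction list is stored as pairs of lines (regex, then replacement), that leading and trailing spaces in a line are significant, and that the centered dot is written as the Unicode escape for U+00B7.

```python
import re


def parse_rules(text):
    """Parse a correction list stored as pairs of lines.

    The first line of each pair is the regex to match and the second is
    the replacement string. Lines are NOT stripped, because leading and
    trailing spaces are part of the rule.
    """
    lines = text.split("\n")
    # A final newline leaves one empty trailing element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    if len(lines) % 2 != 0:
        raise ValueError("rules must come in pattern/replacement pairs")
    return [(re.compile(lines[i]), lines[i + 1])
            for i in range(0, len(lines), 2)]


def apply_rules(rules, page_text):
    """Apply every rule in order to the text of one page."""
    for pattern, replacement in rules:
        page_text = pattern.sub(replacement, page_text)
    return page_text


# Hypothetical two-rule list (not copied from either user's dictionary):
#   " t0 "  -> " to "   (spaces on both sides keep the match conservative)
#   "1\u00b7" -> "r"    (digit one followed by a centered dot, U+00B7)
RULES_TEXT = " t0 \n to \n1\u00b7\nr\n"

rules = parse_rules(RULES_TEXT)
sample = "sent t0 the cou1\u00b7t; see item t01."
corrected = apply_rules(rules, sample)
print(corrected)
```

Requiring a space on both sides means the "t0" inside "t01." is left alone, which is the trade-off described above: some real errors are missed in exchange for fewer wrong corrections.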

I can't take it anymore....

First off, that external link to the collection of scanned Statutes at Large rocked. Big thanks for that.

Now, I've reached the end of my rope with this goofball 'Acts of Congress' category, or whatever you'd call it, and am hoping to try this djvu thing and at least move the accumulated plain text stuff under something more along the lines of the Front Matter lists found in each volume. I tried gleaning bits and pieces on how and what to do, but other than copying what you and some of the others have done already, I didn't get very far with it. I'm not looking to go back to our roots but if I could just hit some of these key pieces of legislation from the late 60's on forward I might actually be able to walk some folks back to ERISA, the 68 Omnibus Crime bill, and the host of the others I'm sure you know about already that today's legislation is constantly citing & amending so that there is just a chance of there being one less jackazz out there gumming up the works for everybody else with "manufactured" facts (& in the worst case scenario, that azz would be me).

I'm no expert but I'm persistent & not afraid to learn so if you have any pointers on where to get better info or where I should start it would be greatly appreciated. Apologies for the intrusion on your time. George Orwell III (talk) 14:58, 14 November 2009 (UTC)

I definitely think you are on the right track with trying not to reinvent the wheel, but instead preserving (to the extent possible) the structure of the Statutes at Large for older federal legislation. The Statutes at Large already provides, in each volume’s table of contents, a chronological and canonical listing of all federal legislation enacted during a particular period. No reason at all why we couldn’t select a couple of pertinent volumes from the mid-’60s (lots of very consequential legislation during the Kennedy & Johnson admins, should be easy to get a bunch of people interested) and give them the kind of treatment we are (slowly) giving to Index:United States Statutes at Large Volume 1.djvu. (My library has microfiche of the Statutes at Large through 1983, not that I’m looking to spend all that much more of my time in front of a fiche scanner.) I don’t really think that a category is the right tool here, because the number of documents is likely to overwhelm the category. On the other hand, more narrowly defined subject matter categories like “U.S. federal civil rights legislation (20th century)” might work. Tarmstro99 (talk) 02:14, 18 November 2009 (UTC)
Yeah but I need to draw upon recent sessions of Congress and the Public Laws enacted during those sessions just as much as I need to go farther back to instances where major legislation first came into existence as well. For the most part, I need to get away from the Wikisource Lists that are rooted in some bogus notion of "Acts of Congress" or whatever and just follow the Statutes at Large front matter lists for Public Laws and so on (sort of like THIS). I'm talking the first 20 pages typically in each volume - can't overwhelm the category by creating just the indexes can we? George Orwell III (talk) 02:37, 18 November 2009 (UTC)

With Index:The Records of the Federal Convention of 1787 Volume 1.djvu and Vol. 3, if the alternate PDF files have a text layer, we could look to replace the existing djvu files with updates of them. We would just need to ensure that the files corresponded page for page through the document. If we reloaded at Commons, and then purged the files, then the text layers would be available within the Page: environment. Now we could either do this ourselves, or look to upload to archive.org and have them prepare the files for us as part of their normal automated processes.-- billinghurst (talk) 01:25, 31 December 2009 (UTC)

The versions of these files that are presently hosted at Commons were created by djvu’ing a set of TIFF scans that were made by the Library of Congress, so (as you already know) they do not include a text layer. I’ll do some poking around online and see whether I can locate an alternate version that includes a text layer. It’s fairly likely that one of the big digitization projects has already tackled this text (it’s not particularly obscure); just need to make sure their version matches the pagination of the 1911 original that is online here. Tarmstro99 (talk) 23:41, 31 December 2009 (UTC)