Wikisource:Scriptorium/Archives/2009-03/double pages in djvu

double pages in djvu

With the help of Help:DjVu files I have now managed to create a djvu file from my png scans. But one issue remains: my scans are of double pages, always a left side and a right side on one png. So my scan of a 180-page book results in a djvu of 90 pages. Is there any convenient way to split the original pngs, or the pages in the djvu, so that I get a djvu with 180 pages? Does anybody know how to solve this problem? --Slomox (talk) 15:08, 18 February 2009 (UTC)

I do something like this all the time, but under Linux. Given files labeled 001.png through 999.png that are 3500 pixels across and 300 DPI:

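 mkdir Output
 for i in $(seq -w 1 999)
 do
     [ -f "$i".png ] || continue    # skip numbers that have no scan
     pngtopnm "$i".png > temp.pnm
     # left page: columns 0 through 1749
     pnmcut -right 1749 temp.pnm > temp1.pnm
     cjb2 -dpi 300 temp1.pnm Output/"$i"a.djvu
     # right page: columns 1750 through 3499
     pnmcut -left 1750 temp.pnm > temp1.pnm
     cjb2 -dpi 300 temp1.pnm Output/"$i"b.djvu
     rm temp.pnm temp1.pnm
 done
 # bundle the single-page files, in name order, into one document
 djvm -c book.djvu Output/[0-9][0-9][0-9][ab].djvu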

If the two pages aren't the same width, cutting at half the width (1750 in this case) may not work, and you may want to cut a bit off the edges, too. If the scans aren't consistent from one sheet to the next, you may need to change that value partway through the book. Probably less than helpful, but that's how I do it.--Prosfilaes (talk) 16:47, 18 February 2009 (UTC)
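For scans of a different width, the cut columns can be worked out rather than hard-coded. What follows is only a sketch: the function name half_columns and its optional trim argument are made up for illustration, and it assumes the gutter sits exactly in the middle of the scan, which real scans often do not honour. It prints the -left and -right arguments that pnmcut would need for each half.

```shell
#!/bin/sh
# Print pnmcut column arguments for the two halves of a double-page scan.
# $1 = scan width in pixels
# $2 = pixels to trim from each outer edge (hypothetical option, default 0)
half_columns() {
    width=$1
    trim=${2:-0}
    mid=$((width / 2))
    # left page: columns trim .. mid-1
    echo "-left $trim -right $((mid - 1))"
    # right page: columns mid .. width-1-trim
    echo "-left $mid -right $((width - 1 - trim))"
}

half_columns 3500
# prints:
# -left 0 -right 1749
# -left 1750 -right 3499
```

Each printed line can be pasted into a pnmcut call in place of the fixed value. If the gutter is off-centre, the midpoint still has to be adjusted by hand.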

The unpaper utility, which I generally try to use when cleaning up scanned pages, will optionally convert a single scanned image of two side-by-side pages into two separate output files (see the --input-pages and --output-pages options in the documentation). It locates the proper content for each page semi-intelligently by searching for margins consisting of mostly white space. I have been happy with its output so far. Tarmstro99 (talk) 17:15, 18 February 2009 (UTC)

Unpaper looks good, but I couldn't find a pre-compiled download. Although personally I prefer GUI programs, I'm fine with command-line tools. But having to compile the program myself is a bit too much for me ;-) Is there a pre-compiled Windows version of unpaper available? --Slomox (talk) 17:56, 18 February 2009 (UTC)

Google is your friend! :-) See http://www.abs.net/~donovan/pgdp.html. Tarmstro99 (talk) 18:26, 18 February 2009 (UTC)

Thank you. I still have one problem: if I provide a multi-page pbm as input, it only handles the first page. Is there a special parameter I have to provide to handle all pages? --Slomox (talk) 20:44, 18 February 2009 (UTC)

I don’t believe so. The solution is to split document.pbm into doc0000.pbm, doc0001.pbm, doc0002.pbm, ... doc0099.pbm with pamsplit, then feed the resulting files into unpaper (which will accept an input parameter such as doc%04d.pbm to automatically start processing multiple files starting from doc0000.pbm). If you want to start processing at, say, page doc0004.pbm instead of counting from 0, just give unpaper the parameters -si 4 doc%04d.pbm. E-mail me if you have further problems with unpaper; I’ve used it for quite a few projects now, and the time spent mastering its idiosyncrasies is well worth it given the quality of its output. Tarmstro99 (talk) 21:02, 18 February 2009 (UTC)
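To see which file names a pattern like doc%04d.pbm stands for, the same printf-style expansion can be reproduced in the shell. This is only an illustration of the numbering: expand_pattern is a made-up helper, not part of unpaper or pamsplit, and it calls neither tool.

```shell
#!/bin/sh
# List the file names a printf-style pattern expands to.
# $1 = pattern containing one %d-style field, e.g. doc%04d.pbm
# $2 = first index, $3 = last index (both inclusive)
expand_pattern() {
    pattern=$1
    i=$2
    end=$3
    while [ "$i" -le "$end" ]; do
        # %04d becomes the index zero-padded to four digits
        printf "$pattern\n" "$i"
        i=$((i + 1))
    done
}

expand_pattern 'doc%04d.pbm' 4 6
# prints:
# doc0004.pbm
# doc0005.pbm
# doc0006.pbm
```

Running this against the range you expect, and comparing it with a directory listing, is a quick way to check that the split files are named the way the pattern assumes before starting a long batch.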