User:Mattwj2002/1911 Encyclopedia scripts

From Wikisource

These are some scripts for working on the 1911 Encyclopedia project.


Here is version 1 of my OCR script. It converts the DjVu volume to one TIFF per page and runs Tesseract on each page.

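#!/bin/bash
# Convert the DjVu volume to a multi-page TIFF, then split it into one TIFF per page.
ddjvu -format=tiff eb1911-vol01-a-androphagi.djvu eb1911-vol01-a-androphagi.tif
tiffsplit eb1911-vol01-a-androphagi.tif eb1911
rm eb1911-vol01-a-androphagi.tif

# OCR each page in order; tesseract writes page1.txt, page2.txt, ...
i=1
for tif in eb1911*.tif; do
    echo "$i"
    tesseract "$tif" "page$i" -l eng
    i=$((i + 1))
done

# Remove the per-page TIFFs once OCR is done.
rm eb1911*.tif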

Here is version 2 of my OCR script. After page 32 it splits each page image into left and right halves and runs OCR on each half separately.

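#!/bin/bash

# Convert the DjVu volume to a multi-page TIFF, then split it into one TIFF per page.
ddjvu -format=tiff volume1.djvu eb1911-vol01-a-androphagi.tif
tiffsplit eb1911-vol01-a-androphagi.tif eb1911
rm eb1911-vol01-a-androphagi.tif

i=1
for tif in eb1911*.tif; do
    echo "$i"
    if [ "$i" -le 32 ]; then
        # Pages 1-32: OCR the whole page at once.
        tesseract "$tif" "page$i" -l eng
    else
        # Later pages: split into left and right halves and OCR each half.
        convert "$tif" -crop 50%x100% +repage tmp%02d.tif
        tesseract tmp00.tif tmp00 -l eng
        tesseract tmp01.tif tmp01 -l eng
        cat tmp00.txt tmp01.txt > "page$i.txt"
    fi
    # Keep the page image under its page number.
    mv "$tif" "$i.tif"
    i=$((i + 1))
done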
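
The version 2 script switches from whole-page OCR to per-half OCR after page 32. As a minimal sketch of that decision, the helper below prints the commands that would run for one page instead of executing them, which may help when checking the cutoff before a long OCR run. The names `plan_page` and `SINGLE_COLUMN_LAST` are hypothetical and not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical dry-run helper: prints the OCR commands for one page.
# SINGLE_COLUMN_LAST mirrors the hard-coded cutoff of 32 in the script above.
SINGLE_COLUMN_LAST=32

plan_page() {
    local i=$1 tif=$2
    if [ "$i" -le "$SINGLE_COLUMN_LAST" ]; then
        # Up to the cutoff: OCR the full page in one pass.
        echo "tesseract $tif page$i -l eng"
    else
        # Past the cutoff: split into halves, OCR each, join the text.
        echo "convert $tif -crop 50%x100% +repage tmp%02d.tif"
        echo "tesseract tmp00.tif tmp00 -l eng"
        echo "tesseract tmp01.tif tmp01 -l eng"
        echo "cat tmp00.txt tmp01.txt > page$i.txt"
    fi
}
```

For example, `plan_page 33 eb1911aag.tif` lists the four commands used for a split page, while `plan_page 32 eb1911aaf.tif` lists a single `tesseract` call.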

This is my script for converting TIFF files into a single DjVu file.

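#!/bin/bash
# Encode each bitonal TIFF as a single-page DjVu file named 1.djvu, 2.djvu, ...
i=1
for tif in *.TIFF; do
    cjb2 "$tif" "$i.djvu"
    i=$((i + 1))
done

# Create the bundled document from page 1, then append the remaining pages.
djvm -c volume1.djvu 1.djvu
for ((i = 2; i <= 1029; i++)); do
    echo "$i"
    djvm -i volume1.djvu "$i.djvu"
done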
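
The loop above hard-codes 1029 pages. One possible alternative, sketched here on the assumption that the page files are named `1.djvu`, `2.djvu`, and so on as produced by the `cjb2` loop, is to list them in numeric order and pass them all to a single `djvm -c` call, which accepts several page files. The helper names `list_pages` and `build_volume` are hypothetical, and `mapfile` assumes bash 4 or later.

```shell
#!/bin/bash
# Hypothetical helper: print the N.djvu files in a directory in numeric order.
list_pages() {
    local dir=${1:-.} f
    for f in "$dir"/*.djvu; do
        f=${f##*/}                      # strip the directory part
        case ${f%.djvu} in
            ''|*[!0-9]*) continue ;;    # skip non-numeric names such as volume1.djvu
        esac
        echo "${f%.djvu}"
    done | sort -n | sed 's/$/.djvu/'
}

# Hypothetical wrapper: bundle all numbered pages in the current directory.
build_volume() {
    local pages
    mapfile -t pages < <(list_pages .)
    [ "${#pages[@]}" -gt 0 ] && djvm -c volume1.djvu "${pages[@]}"
}
```

Running `build_volume` in the directory that holds the page files would then replace both the `djvm -c` call and the counting loop, with no page total to keep up to date.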

Here is my script for cropping images.

#!/bin/bash
# Keep the bottom 99% of each page image, trimming a strip from the top edge.
for tif in *.TIF; do
    convert "$tif" +compress -gravity South -crop 100%x99%+0+0 +repage "$tif.TIFF"
done

PNG files to PDF files

#!/bin/bash
# Convert each PNG to a single-page PDF named 1.pdf, 2.pdf, ...
i=1
for png in *.png; do convert "$png" +compress "$i.pdf"; echo "$i"; i=$((i + 1)); done
# Start from page 1, then append pages 2 through i-1 one at a time.
mv 1.pdf outputfile.pdf
for ((j = 2; j < i; j++)); do
    pdfjoin outputfile.pdf "$j.pdf" --outfile outputfile.pdf
    echo "$j"
done
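
These scripts number their output files with a plain counter, and a shell glob or `ls` sorts such names lexically, so `10.pdf` is listed before `2.pdf`. A minimal sketch of one way around this is to zero-pad the counter when building file names. The helper name `padded_name` and the width of four digits are assumptions chosen for illustration.

```shell
#!/bin/bash
# Hypothetical helper: zero-pad a page number so lexical order equals page order.
padded_name() {
    local n=$1 ext=$2
    printf '%04d.%s\n' "$n" "$ext"
}
```

With names built this way, for example `convert "$png" +compress "$(padded_name "$i" pdf)"`, a listing of `*.pdf` returns the pages in page order.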