User:Mattwj2002/1911 Encyclopedia scripts
These are some scripts for working on the 1911 Encyclopedia project.
Here is version 1 of my OCR script. It converts a DjVu volume to TIFF, splits it into one image per page, and runs Tesseract on each page.
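#!/bin/bash
# Convert the DjVu volume to a multi-page TIFF, split it into one TIFF per page,
# then OCR each page with Tesseract (output: page1.txt, page2.txt, ...).
set -e
ddjvu -format=tiff eb1911-vol01-a-androphagi.djvu eb1911-vol01-a-androphagi.tif
tiffsplit eb1911-vol01-a-androphagi.tif eb1911
rm eb1911-vol01-a-androphagi.tif
i=1
for page in eb1911*.tif; do
  echo "$i"
  tesseract "$page" "page$i" -l eng
  i=$((i + 1))
done
# Remove the per-page images once OCR is finished.
rm eb1911*.tif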
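Tesseract writes one text file per page (page1.txt, page2.txt, ...). To read the volume as a whole, these can be joined into one file, but a plain `*.txt` glob sorts page10.txt before page2.txt. The sketch below is a hypothetical helper (`join_ocr_pages` is not part of the original scripts) that walks the page numbers explicitly and stops at the first missing page.

```bash
# Hypothetical helper: join page1.txt, page2.txt, ... into one file in
# numeric page order, stopping at the first page number with no file.
join_ocr_pages() {
  local out="$1"   # name of the combined output file
  local i=1
  : > "$out"       # start from an empty output file
  while [ -e "page$i.txt" ]; do
    cat "page$i.txt" >> "$out"
    i=$((i + 1))
  done
  echo "joined $((i - 1)) pages into $out"
}
```

Usage: run `join_ocr_pages volume1.txt` in the directory holding the OCR output.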
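Here is version 2 of my OCR script. It OCRs the first 32 pages as whole images and splits every later page into left and right halves, so each of the two text columns is read separately before the halves are joined into one text file per page.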
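#!/bin/bash
# Version 2: OCR pages 1-32 whole; split every later page into two column
# halves, OCR each half, and concatenate the results into pageN.txt.
set -e
ddjvu -format=tiff volume1.djvu eb1911-vol01-a-androphagi.tif
tiffsplit eb1911-vol01-a-androphagi.tif eb1911
rm eb1911-vol01-a-androphagi.tif
i=1
for page in eb1911*.tif; do
  echo "$i"
  if [ "$i" -le 32 ]; then
    tesseract "$page" "page$i" -l eng
  else
    # Cut the page into two tiles: tmp00.tif (left half), tmp01.tif (right half).
    convert "$page" -crop 50%x100% +repage tmp%02d.tif
    tesseract tmp00.tif tmp00 -l eng
    tesseract tmp01.tif tmp01 -l eng
    cat tmp00.txt tmp01.txt > "page$i.txt"
  fi
  # Keep the page image, renamed to its page number.
  mv "$page" "$i.tif"
  i=$((i + 1))
done
rm -f tmp00.tif tmp01.tif tmp00.txt tmp01.txt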
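Because a full volume takes a long time to OCR, it may help to check beforehand which pages the 32-page threshold above will OCR whole and which it will split into columns. This is a minimal dry-run sketch; the function `plan_ocr` and the variable `SINGLE_COLUMN_PAGES` are hypothetical names, and the default of 32 is simply the value used in the version 2 script (other volumes may need a different value).

```bash
# Hypothetical dry run: print "<page number> <file> single|split" for each
# page image, without running any OCR. SINGLE_COLUMN_PAGES (assumed name)
# defaults to 32, the threshold used in the version 2 script.
plan_ocr() {
  local limit="${SINGLE_COLUMN_PAGES:-32}"
  local i=1 page
  for page in "$@"; do
    if [ "$i" -le "$limit" ]; then
      echo "$i $page single"
    else
      echo "$i $page split"
    fi
    i=$((i + 1))
  done
}
```

Usage: after `tiffsplit`, run `plan_ocr eb1911*.tif | less` and look at the pages around the threshold.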
This is my script for converting TIFF files into a single DjVu file.
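#!/bin/bash
# Encode each black-and-white TIFF page as a single-page DjVu file (1.djvu,
# 2.djvu, ...) with cjb2, then bundle all pages in numeric order into volume1.djvu.
set -e
pages=()
i=1
for page in *.TIFF; do
  echo "$i"
  cjb2 "$page" "$i.djvu"
  pages+=("$i.djvu")
  i=$((i + 1))
done
djvm -c volume1.djvu "${pages[@]}"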
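If `cjb2` fails on one page, the bundled volume ends up short or out of step with the page images. A minimal sketch for catching this before bundling: compare the number of `*.TIFF` inputs with the numbered `.djvu` pages that exist. The function name `check_djvu_pages` is hypothetical, and it assumes pages are numbered 1.djvu, 2.djvu, ... as in the script above.

```bash
# Hypothetical check: every page 1..N (N = number of *.TIFF files) should
# have a matching N.djvu. Prints the first missing page and returns 1.
check_djvu_pages() {
  local tiffs=( *.TIFF )
  local n=0 i
  # If nothing matches, the glob stays literal; treat that as zero pages.
  if [ -e "${tiffs[0]}" ]; then
    n=${#tiffs[@]}
  fi
  for ((i = 1; i <= n; i++)); do
    if [ ! -e "$i.djvu" ]; then
      echo "missing $i.djvu"
      return 1
    fi
  done
  echo "all $n pages present"
}
```

Usage: run `check_djvu_pages` in the working directory after the `cjb2` loop and before `djvm`.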
Here is my script for cropping images. It trims the top 1% of each page.
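#!/bin/bash
# Keep a region 100% wide and 99% high anchored at the bottom (South) edge,
# i.e. trim the top 1%. The explicit +0+0 offset makes ImageMagick cut one
# region instead of tiling the image. Output is uncompressed, named <name>.TIF.TIFF.
for page in *.TIF; do
  convert "$page" -gravity South -crop 100%x99%+0+0 +repage +compress "$page.TIFF"
done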
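The crop script writes each result next to its source as `<name>.TIF.TIFF`. If a long batch is interrupted, a helper like the one below (hypothetical, not part of the original scripts) lists the `.TIF` files that have no `.TIFF` output yet, so only those need to be cropped again.

```bash
# Hypothetical helper: print each *.TIF file whose cropped <name>.TIF.TIFF
# output does not exist yet.
pending_crops() {
  local f
  for f in *.TIF; do
    [ -e "$f" ] || continue          # the glob matched nothing
    [ -e "$f.TIFF" ] || echo "$f"    # no cropped output yet
  done
}
```

Its output can be fed to the same `convert` command through a `while read -r` loop.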
PNG files to a PDF file
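This script converts each PNG image to a one-page PDF and joins the pages into outputfile.pdf.
#!/bin/bash
# Convert each PNG to a one-page PDF (1.pdf, 2.pdf, ...), then join them
# in numeric order into outputfile.pdf with a single pdfjoin call.
set -e
pages=()
i=1
for image in *.png; do
  convert +compress "$image" "$i.pdf"
  pages+=("$i.pdf")
  echo "$i"
  i=$((i + 1))
done
pdfjoin "${pages[@]}" --outfile outputfile.pdf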
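This script takes the PNG files in the order the shell lists them, which is alphabetical: page10.png comes before page2.png. If the images carry unpadded page numbers, a natural-order listing keeps the PDF pages in sequence. A minimal sketch, assuming a `sort` that supports `-V` (version sort, as in GNU coreutils); the function name is hypothetical.

```bash
# Hypothetical helper: list *.png files in natural order (page2.png before
# page10.png) using sort -V. Prints nothing if there are no PNG files.
png_in_page_order() {
  local f
  for f in *.png; do
    [ -e "$f" ] && printf '%s\n' "$f"
  done | sort -V
}
```

Usage: `png_in_page_order | while read -r image; do ...; done` in place of listing `*.png` directly.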