Would you like to extract text from a scanned image of a document? Or do you wish you didn’t have to retype text from a PDF document? gImageReader, a free OCR (optical character recognition) program, helps you do that easily. It’s a GUI front end to Google’s Tesseract OCR, perhaps one of the most accurate open-source OCR engines, and can be considered an open-source alternative to the professional OCR software ABBYY FineReader.
Main Features of gImageReader:
- Supports popular languages such as English, Spanish, French, German, Japanese, Italian, and Korean.
- Supports JPEG, PNG, GIF, and TIFF images and PDF files.
- Can acquire source images directly from digital scanners.
- Supports spell checking.
1. First, download and install Tesseract OCR with the English language data from here (current version 3, 1.8 MB).
2. Then download and install gImageReader (16 MB) from here. After installation, run it. A configuration window will appear.
3. In the configuration window, the field ‘Directory containing tesseract’ should be filled in automatically. If it is not, enter the path C:\Program Files\Tesseract-OCR (Windows 7).
4. In the field ‘Directory containing dictionaries’, enter the path C:\Program Files\Tesseract-OCR\tessdata and apply the settings. [If this causes a problem, right-click the tessdata folder, select Properties, copy the location path, and paste it into the field.]
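If the configuration window rejects the paths from steps 3 and 4, it can help to confirm that the folders really exist before typing them in again. The following is only a rough sketch: the function name check_tesseract_dirs is made up for illustration, and it assumes that language data is stored as .traineddata files inside tessdata, as it is for Tesseract 3.

```python
from pathlib import Path


def check_tesseract_dirs(install_dir):
    """Return a list of problems found with a Tesseract install folder.

    Hypothetical helper: it only looks at the folder layout described
    in steps 3 and 4 and does not run Tesseract itself.
    """
    install = Path(install_dir)
    tessdata = install / "tessdata"
    problems = []
    if not install.is_dir():
        problems.append(f"missing folder: {install}")
    elif not tessdata.is_dir():
        problems.append(f"missing folder: {tessdata}")
    elif not any(tessdata.glob("*.traineddata")):
        # Assumption: language data files end in .traineddata
        problems.append(f"no .traineddata language files in {tessdata}")
    return problems


# Example call on Windows 7 (the path is the one given in step 3):
# print(check_tesseract_dirs(r"C:\Program Files\Tesseract-OCR"))
```

An empty list means both folders were found and at least one language file is present.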
Now run the program, open a scanned image or PDF document, drag to select the area you want to extract text from, and click the ‘Recognize Selection’ button. That’s it.
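Since gImageReader is a front end to Tesseract, a whole image can also be recognized without the GUI by calling the tesseract command directly. This is a minimal sketch, not how gImageReader works internally: the function names are invented for illustration, it assumes the tesseract program is on your PATH, and it processes the full image rather than a dragged selection.

```python
import subprocess


def build_tesseract_command(image_path, output_base, language="eng"):
    """Build the command line: tesseract <image> <output base> -l <language>."""
    return ["tesseract", image_path, output_base, "-l", language]


def run_ocr(image_path, output_base, language="eng"):
    """Run Tesseract on one image and return the name of the text file.

    Assumption: 'tesseract' is installed and reachable through PATH.
    Tesseract appends '.txt' to the output base name by itself.
    """
    subprocess.run(
        build_tesseract_command(image_path, output_base, language),
        check=True,  # raise an error if Tesseract reports a failure
    )
    return output_base + ".txt"


# Example (hypothetical file names):
# text_file = run_ocr("scan.png", "scan_text")
```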
Note: For spell checking, download the spell checker dictionaries from OpenOffice and extract the files to C:\Program Files\gImageReader\Spelling Dictionaries (Windows 7). For best results, the resolution of the source image should be between 200 dpi and 300 dpi for normal 10-12 pt text.
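If you are not sure what resolution a scan has, you can estimate it from the image width in pixels and the width of the paper in inches. Here is a small sketch of that arithmetic; the function names and the example numbers are illustrative only, and the 200 to 300 dpi limits are the ones from the note above.

```python
def scan_dpi(pixel_width, paper_width_inches):
    """Estimate resolution as pixels per inch across the page width."""
    return pixel_width / paper_width_inches


def dpi_in_recommended_range(dpi, low=200, high=300):
    """Check a resolution against the 200-300 dpi range from the note."""
    return low <= dpi <= high


# A page 8.5 inches wide scanned to 2550 pixels across: 2550 / 8.5 = 300 dpi
letter_scan = scan_dpi(2550, 8.5)
print(letter_scan, dpi_in_recommended_range(letter_scan))

# The same page scanned to only 850 pixels across: 850 / 8.5 = 100 dpi
small_scan = scan_dpi(850, 8.5)
print(small_scan, dpi_in_recommended_range(small_scan))
```

A scan that falls below the range is usually better rescanned at a higher setting than enlarged afterwards.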
Similar post: Google Doc as OCR
The specific versions of OpenOffice spell checking dictionaries that are required by gImageReader are no longer available from OpenOffice. HOW CAN I OBTAIN THEM SO I CAN STILL USE gImageReader???????