Word documents (.doc or .docx)
These general-purpose formats are suitable for reading or for documents that will be edited further. SensusAccess has a special option for converting image-type documents in Arabic, or bilingual Arabic/English, into Word documents.
RTF is a general-purpose format that can be edited in a wide variety of word processors. It is also a good choice if the document is to be read on a Braille notetaker, where RTF is usually well supported.
Tagged PDF is an accessible form of PDF capable of containing not only the visual contents of a document such as text and illustrations, but also the semantic structure of the document. When converting image-type documents into tagged PDF, any structural elements recognised in the process are stored in the resulting PDF document. Tagged PDF is suitable if the document is going to be read using assistive technology or read using the text-to-speech capabilities of Adobe Acrobat.
Many instances of the SensusAccess web form have two options for converting PDF and image-type documents into tagged PDF: “pdf – Tagged PDF (text over image)” and “pdf – Tagged PDF (image over text)”, both in the drop-down menu in the accessibility conversion options section.
Selecting the first option will cause PDF and image-type documents to be OCR processed and returned with the recognised text in a layer on top of the original image. Selecting the second option will cause PDF and image-type documents to be OCR processed and returned with the original image in a layer on top of the recognised text. The quality of the text recognition is identical in the two options.
In most cases, presenting the recognised text on top of the original image will result in much clearer text. However, logos and other graphical elements may appear blurred or even disfigured. Presenting the original image on top of the recognised text will retain all original graphical elements, but the visual presentation of the text will not be sharpened.
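The trade-off between the two tagged PDF options can be summarised as a simple decision rule. The following is only an illustrative sketch: the function name and its parameter are hypothetical and are not part of SensusAccess, and the two option labels are copied from the drop-down menu described above.

```python
# Option labels as they appear in the SensusAccess drop-down menu.
TEXT_OVER_IMAGE = "pdf – Tagged PDF (text over image)"
IMAGE_OVER_TEXT = "pdf – Tagged PDF (image over text)"


def choose_tagged_pdf_option(graphics_must_be_preserved: bool) -> str:
    """Hypothetical helper: pick a tagged PDF option for a scanned document.

    Text over image usually gives clearer text but may blur logos and other
    graphics. Image over text keeps the graphics intact but does not sharpen
    the text. Recognition quality is the same in both cases.
    """
    if graphics_must_be_preserved:
        return IMAGE_OVER_TEXT
    return TEXT_OVER_IMAGE


# A letter with an important logo versus a plain scanned article.
print(choose_tagged_pdf_option(graphics_must_be_preserved=True))
print(choose_tagged_pdf_option(graphics_must_be_preserved=False))
```

In practice the choice is made by hand in the web form; the sketch only restates the guidance in a compact form.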
In cases where image-type documents contain statistics, timetables or other kinds of tabular information, users may find it easier to work with the document if it is presented as a spreadsheet. SensusAccess has options to convert image-type documents into Excel in either XLS or XLSX format, or into the more generic comma-separated values (CSV) format.
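For readers who want to process a CSV result further, the sketch below shows one way such a file could be read with Python's standard csv module. The sample timetable, its column names and the load_rows helper are invented for illustration; the layout of a real converted file depends on the source document and on how well its table structure was recognised.

```python
import csv
import io

# Hypothetical sample standing in for a CSV file returned by a conversion.
# The column names and values are invented for illustration.
SAMPLE_CSV = """Route,Departure,Arrival
10A,08:15,08:47
10A,09:15,09:47
22,08:30,09:05
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)


rows = load_rows(SAMPLE_CSV)

# Collect the departure times for one route.
departures_10a = [row["Departure"] for row in rows if row["Route"] == "10A"]
print(departures_10a)
```

With a real file, the text would be read from disk (for example with open(path, newline="", encoding="utf-8")) instead of from the in-memory sample.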
SensusAccess supports conversion of image-type documents into HTML if the user’s preferred reading platform is a web browser.
When a document is converted to plain text, only the recognised text is included in the result. All formatting, illustrations and similar elements are stripped from the document. The format is especially useful for blind readers and in situations where the semantic structure of a document is to be recreated manually from scratch. SensusAccess has a special option for converting image-type documents in Arabic, or bilingual Arabic/English, into plain text files.