ℹ️ Pro Tip: The below sample can also be used with Image Documents we just need to use ImageDocument#getDocument() when creating the PdfProcessorTask. We simply need to load the document and then use PdfProcessor to apply a PdfProcessorTask configured to perform OCR. This is your typical scanned document: It’s a bit roughed up, mostly legible, and saved as a PDF. Let’s start by looking at the document we’ll be working with today. For a more in-depth look at OCR, check out our Optical Character Recognition in Scanned PDFs blog post. This means we can select, copy, highlight, and redact any part of the text. Then, we extract lines of text from those areas.įinally, we embed the textual information into the PDF using an invisible layer of text above the image.Īfter this process is complete, we can work with the new PDF the same way we would with any other document. So how does this work in the case of PSPDFKit? Here’s a simple outline of the steps needed:įirst, we need to identify areas of text in a PDF. It’s the process of taking a plain picture and extracting machine-readable text from it. OCR stands for Optical Character Recognition. ![]() Before getting started, let me give you a quick reminder of what OCR actually is so you know what will be happening. More specifically, we’ll be taking a scanned-in document, performing OCR to make the text machine readable, and then automatically redacting certain parts of it. We added support for OCR in PSPDFKit 6.5 for Android, and in this small how-to, I’ll walk you through how to use PSPDFKit for Android to perform OCR on scanned documents and then explain what can be done with the resulting files.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |