Welcome to the third post in our series uncovering the key components of Artificial Intelligence, written to give greater understanding to business leaders who may currently have FOMO (Fear Of Missing Out) amid the blizzard of acronyms and hype.
Here, we look at Computer Vision (CV), one of the main applications of AI, in which computers are made to gain a high-level understanding of digital images or videos.
Critically, Computer Vision is concerned with automatic extraction of data, enabling documents that have handwriting and random layouts to become machine-readable.
Huge data volumes
Computer Vision needs a lot of data to be able to distinguish and recognize images.
In a way, it looks like a jigsaw puzzle where you assemble all the scattered tiles to make an image. Neural networks for CV work on the same principle.
Yet the computer is not given the final image. Instead, it is fed hundreds, if not thousands, of related images that train it to recognize specific objects.
To identify a cat, the computer would not be shown individual elements such as ears, whiskers and tail, but millions of pictures of cats so that it can model the features of our feline friends.
CV is used for visual surveillance, medical image processing for patient diagnosis and navigation by autonomous vehicles.
But in Altilia’s development of Intelligent Document Processing (IDP), CV has several key roles to play.
With Optical Character Recognition (OCR) and Intelligent Character Recognition (ICR), we are able to convert scanned documents into machine-readable PDFs, and with Handwritten Text Recognition (HTR) we can incorporate items such as signatures.
The end goal of an IDP solution is to extract meaningful information that is “hidden” in unstructured texts and documents, so we first need to break words down in a way that a machine can understand.
This is especially relevant when the documents that need to be processed are low-quality scans of items such as contracts, forms, invoices or ID cards.
We then apply OCR to recognize both printed and handwritten text, splitting it into smaller units called tokens. Metadata is added to each token, which is useful later in a search engine.
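As a rough illustration of what "tokens with metadata" can look like, here is a minimal Python sketch. It is not Altilia's implementation: the Token fields (page, character offsets, a handwritten flag) and the tokenize helper are hypothetical names chosen for this example, and the input is assumed to be text already produced by an OCR step.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str          # the token as recognized
    page: int          # page the token came from
    start: int         # offset of the token's first character
    end: int           # offset one past the token's last character
    handwritten: bool  # hypothetical flag supplied by the recognition step


def tokenize(text, page, handwritten=False):
    """Split recognized text into word tokens and attach metadata to each."""
    return [
        Token(m.group(), page, m.start(), m.end(), handwritten)
        for m in re.finditer(r"\w+", text)
    ]


tokens = tokenize("Invoice total: 1200 EUR", page=1)
for token in tokens:
    print(token)
```

Because each token carries its page and position, a search engine could later point a user back to the exact place in the document where a match was found.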
In IDP, it is useful to distinguish a photo from text and to tag elements such as signatures, stamps and markings, saving human labor time by automating checks such as whether a contract is signed and marked.
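To make the idea of an automated check concrete, the sketch below assumes that an earlier CV step has already tagged each detected element with a kind such as "text", "signature" or "stamp". The Element class, the tag names and the missing_marks helper are all assumptions made for illustration, not part of any specific product.

```python
from dataclasses import dataclass


@dataclass
class Element:
    kind: str  # assumed tag, e.g. "text", "photo", "signature", "stamp"
    page: int  # page on which the element was detected


def missing_marks(elements, required=("signature", "stamp")):
    """Return the required element kinds that were not found in the document."""
    found = {element.kind for element in elements}
    return [kind for kind in required if kind not in found]


contract = [Element("text", 1), Element("text", 2), Element("signature", 3)]
print(missing_marks(contract))
```

An empty result would mean the contract carries every required mark, so only documents with a non-empty result need to go to a human reviewer.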
Finally, we focus on document layout analysis so that unsorted documents can be classified, after which we can apply different machine learning algorithms and route each document type to its own ML pipeline.
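The classify-then-route pattern can be sketched in a few lines of Python. Everything here is a simplified assumption: the keyword-based classify function merely stands in for a trained layout model, and the pipeline functions and field names are hypothetical placeholders.

```python
def process_invoice(text):
    # Placeholder pipeline: a real one would extract these fields.
    return {"type": "invoice", "fields": ["supplier", "total", "due_date"]}


def process_contract(text):
    return {"type": "contract", "fields": ["parties", "signature", "stamp"]}


def process_unknown(text):
    # Fallback for documents that no specialized pipeline handles.
    return {"type": "unknown", "fields": []}


PIPELINES = {"invoice": process_invoice, "contract": process_contract}


def classify(text):
    """Toy keyword classifier standing in for a trained layout model."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "agreement" in lowered or "contract" in lowered:
        return "contract"
    return "unknown"


def route(text):
    """Classify a document, then hand it to the matching pipeline."""
    return PIPELINES.get(classify(text), process_unknown)(text)


print(route("INVOICE no. 42, total due 1200 EUR"))
```

Keeping the pipelines in a lookup table means a new document type can be supported by adding one entry, without touching the routing logic.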
These core capabilities allow Altilia’s solution to work as a general purpose platform, rather than a point solution for specific document types and formats. We have also developed a patented solution for document layout analysis.
For more information on how Altilia Intelligent Automation can help your organization, schedule a free demo here.