🧱 Document Structures
Document structure is another important consideration when building a document understanding system. A document's structure is the way its information is organized and presented.
Depending on a document's structure and complexity, different approaches and techniques may be needed to extract information from it.
In our context, there are three main types of document structure:
Structured
Fixed page format
Indicates where to enter information and what information to enter
Data-entry areas are clearly defined and labeled (e.g., text boxes, checkboxes)
Each field maps one-to-one to a single value (e.g., Account Number)
Examples:
Tax forms
Identification cards
Application forms
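A structured form has a fixed page format, so one common approach is template-based extraction, where each field is read from a known region of the page. The sketch below rests on several assumptions. It takes OCR output as a list of words with bounding boxes in page coordinates normalized to [0, 1]. The `Word` class, the `TEMPLATE` regions, and the field names are hypothetical and are not taken from any particular OCR library.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word with its bounding box (hypothetical format, normalized to [0, 1])."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


# Hypothetical template: field name -> (x0, y0, x1, y1) region on the page.
# A real template would be measured from the actual form layout.
TEMPLATE = {
    "account_number": (0.10, 0.20, 0.50, 0.25),
    "full_name": (0.10, 0.30, 0.90, 0.35),
}


def extract_fields(words, template):
    """Assign each OCR word to the template region that contains its center."""
    result = {}
    for field, (rx0, ry0, rx1, ry1) in template.items():
        inside = [
            w for w in words
            if rx0 <= (w.x0 + w.x1) / 2 <= rx1 and ry0 <= (w.y0 + w.y1) / 2 <= ry1
        ]
        # Sort left to right so single-line, multi-word values read in order.
        inside.sort(key=lambda w: w.x0)
        result[field] = " ".join(w.text for w in inside)
    return result
```

Consider a word such as `"123-456"` whose center falls inside the `account_number` region. It is returned as that field's value. This approach only works while every document shares the same layout, and semi-structured documents lack exactly that property.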
Semi-structured
No fixed page format
Information is usually grouped into logical blocks (e.g., billing details, line items, totals)
Examples:
Invoices
Receipts
Purchase orders
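Semi-structured documents have no fixed page format, so fixed page regions break down. A common alternative is to anchor on the label text and capture the value next to it. The following is a minimal sketch that assumes the document has already been converted to plain text by OCR. The `FIELD_PATTERNS` regular expressions and the `extract_by_label` helper are hypothetical and would need tuning for real invoices.

```python
import re

# Hypothetical label patterns. A value's position varies between layouts, so we
# search for its label and capture the text that follows instead of using fixed regions.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "total": re.compile(
        r"(?:grand\s+)?total\s*(?:due)?\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def extract_by_label(text, patterns):
    """Return the first captured value for each field, or None if its label is absent."""
    result = {}
    for field, pattern in patterns.items():
        match = pattern.search(text)
        result[field] = match.group(1) if match else None
    return result
```

The same patterns find the invoice number whether it appears at the top or the bottom of the page. They still depend on predictable labels, however, and unstructured documents often lack those.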
Unstructured
Little to no organization
Continuous, verbose, text-heavy content
Information is conveyed in sentences and paragraphs rather than in labeled fields
Difficult for non-subject-matter experts to read and understand
Examples:
Contracts
Legal documents
Medical records
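In unstructured documents the information sits inside running prose, so extraction means searching the text itself. The sketch below is deliberately naive: it locates the sentences that mention a term of interest, such as a termination clause in a contract. The `find_sentences` helper and its regex-based sentence splitting are illustrative assumptions; production systems typically use trained NLP models instead.

```python
import re


def find_sentences(text, keywords):
    """Naively split text into sentences and keep those that mention any keyword."""
    # Split after '.', '!' or '?' followed by whitespace. Abbreviations such as
    # "e.g." cause false splits, one reason real systems use trained NLP models.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lowered = [k.lower() for k in keywords]
    return [s for s in sentences if any(k in s.lower() for k in lowered)]
```

Finding the sentence is only the first step. Interpreting it (for example, who may terminate and with what notice) is where the subject-matter complexity of unstructured documents comes in.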