Business Results

  • Automate document routing and reduce manual labor costs

  • 96% reduction in training time

  • 60% faster inferencing time

  • 46% faster data preprocessing

  • 65% prediction accuracy


Background

Enterprises use intelligent document analysis (IDA) to examine documents (such as policies, contracts, and legal agreements) for specific terms, and then identify those documents that may pose a risk to the business. IDA can also identify a document's type (such as legal, finance, or marketing) so that it can be categorized and routed to the appropriate department.

Paper-based documents still account for 46% of all records, which represents a substantial cost to public sector organizations. An average government agency receives and manually routes approximately 3.5 million documents annually. Each document takes seven to ten minutes to read before it can be routed, which makes the manual process time-consuming and costly.

The majority of documents managed by intelligent document processing (IDP) solutions are structured or semi-structured, leaving a significant portion of unstructured documents unmanaged. AI can make automated processing and categorizing of documents—structured, semi-structured, and unstructured—more cost-effective.

Solution

Term frequency-inverse document frequency (TF-IDF) was used to quantify the importance of each term in the documents. A support vector classification (SVC) model was trained to categorize the documents. The publicly available dataset used in the training contained about 200K topic-related documents obtained from HuffPost*. Dataset text was cleaned using stop word removal, stemming, and tokenization. The trained model classifies each document, based on its headline, into one of 42 predetermined categories, such as entertainment or politics.
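To make the cleaning and weighting steps concrete, here is a minimal sketch in plain Python. It is not the reference kit's code: the stop-word list, the suffix-stripping stand-in for a real stemmer, the sample headlines, and the function names `clean` and `tf_idf` are all assumptions chosen for illustration, and the weighting uses one common TF-IDF variant (term frequency multiplied by log(N / document frequency)).

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption, not the kit's list).
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "on", "for", "and", "is"}


def clean(headline):
    """Lowercase, tokenize on letters, drop stop words, strip a few suffixes."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    stems = []
    for tok in tokens:
        if tok in STOP_WORDS:
            continue
        # Crude stand-in for a real stemmer: strip one common suffix
        # if at least three characters remain.
        for suffix in ("ing", "es", "s"):
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        stems.append(tok)
    return stems


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [clean(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical headlines, not taken from the dataset.
headlines = [
    "Senate votes on the budget",
    "Actor wins award for new film",
    "Senate debates film funding",
]
weights = tf_idf(headlines)
print(weights[0])
```

In this sketch, a term that appears in only one headline ("budget") receives a higher weight than a term shared across headlines ("senate"), which is the property that makes TF-IDF features useful as classifier input.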

Data ingest and text processing were optimized using Intel® Distribution of Modin* and ran 46% faster than with stock Modin. Training and inferencing of the SVC model were optimized using Intel® Extension for Scikit-learn*. The optimizations reduced training time by 96% and made inferencing 60% faster. The model classified documents with 65% accuracy. Intel Distribution of Modin and Intel Extension for Scikit-learn are part of Intel’s end-to-end AI software portfolio of tools and framework optimizations that are powered by oneAPI.
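As a rough sketch of how the training optimization is typically enabled, Intel Extension for Scikit-learn is applied by calling `patch_sklearn()` from the `sklearnex` package before the scikit-learn estimators are imported. The example below covers only the training and inferencing step (Modin is not shown) and falls back to stock scikit-learn when the extension is not installed. The sample headlines, labels, and default SVC settings are assumptions for illustration and are not the reference kit's configuration.

```python
# Optional acceleration: patch scikit-learn if the Intel extension is installed.
try:
    from sklearnex import patch_sklearn

    patch_sklearn()  # call before importing the estimators to be accelerated
    PATCHED = True
except ImportError:
    PATCHED = False  # fall back to stock scikit-learn

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical training data: headline text and its category label.
headlines = [
    "Senate votes on the budget",
    "President signs new trade bill",
    "Governor announces election campaign",
    "Actor wins award for new film",
    "Singer releases surprise album",
    "Studio delays superhero movie sequel",
]
labels = [
    "POLITICS",
    "POLITICS",
    "POLITICS",
    "ENTERTAINMENT",
    "ENTERTAINMENT",
    "ENTERTAINMENT",
]

# TF-IDF features feed a support vector classifier (default kernel assumed).
model = make_pipeline(TfidfVectorizer(stop_words="english"), SVC())
model.fit(headlines, labels)

predictions = model.predict(["Senate debates new budget bill"])
print("patched:", PATCHED, "prediction:", predictions[0])
```

Because the patch swaps in optimized implementations behind the same scikit-learn API, the pipeline code itself does not change between the stock and optimized runs.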

Technology

Optimized with Intel oneAPI for Better Performance
 

Benefits

Data scientists can build a better IDP solution that handles semi-structured and unstructured documents. The time saved in training and inference allows data scientists to put more AI models into production.

Government organizations can automate the processing and categorization of more incoming semi-structured and unstructured documents and realize cost savings.

Benefits include:

  • Less time needed to build the machine learning pipeline, with a set of instructions that covers data ingest, model development, and deployment
  • Compute savings from faster data preprocessing, model training, and inferencing time using oneAPI optimizations from Intel
  • Optimized performance using your compute of choice (such as CPU, GPU, or FPGA) with oneAPI interoperability across hardware architectures


References

IDC Survey Spotlight: What Types of Documents Are Organizations Managing with Intelligent Document Processing (IDP) Solutions, April 2021 (Available by paid subscription only.)

News Category Dataset, Kaggle, Inc. Licensed under Creative Commons 1.0 Universal (CC0 1.0) Public Domain Dedication

 

 
