Analytics Automation through Natural Language Processing for Government
From IC Insider Alteryx, Inc.
Every government agency, at every level and regardless of primary mission, has a procurement function. Good or bad, procurement is a fact of government operations. In the United States, the federal government spends more than $3.5 trillion and issues over 5.7 million procurement contracts annually. To improve transparency on where and how taxpayer money is spent, the Digital Accountability and Transparency Act of 2014, or “DATA Act,” requires the U.S. federal government to transform its spending information into open data.
The DATA Act objectives of increasing accountability and transparency are sound, but they also create large, complex challenges. Many agencies are seeing their contract volumes increase while their staffing levels decline. These factors make it difficult to reach a high level of data quality and, in many agencies, legacy processes require large amounts of manual processing. Based on market research and discussions with prospects and customers in both public and private sector organizations, we estimate that the average organization loses approximately 7 hours per data worker in productivity due to manual legacy data processes.
Imagine that you are a federal procurement analyst tasked with drafting a contract. You gather inputs from multiple systems and stakeholders while maintaining compliance with various federal acquisition regulations. As you dig through emails and folders for information to complete multiple pages and hundreds of data fields, you find yourself trying to interpret vague, incorrect, and incomplete data, such as whether an address is up to date, an award description is meaningful, or contract dates are accurate. In some cases, you can perform reference checks with third-party websites. In other cases, you send emails, make phone calls, and schedule meetings to confirm the data you need.
What if there was a way to wade through all this information, automate the review, and identify errors and incomplete data? Imagine a process that could automatically analyze the terms used to identify descriptions and find those that are unclear, confusing and non-descriptive. Better yet, what if there was an automated process that could proactively notify a user when there is incomplete or incorrect address and vendor information, or when unclear, non-compliant award descriptions are used?
Furthermore, what if the same process could utilize natural language processing (NLP) algorithms to perform automated analysis on the contract description fields to determine if the contract description is meaningful? Finally, what if these processes could be deployed and scheduled to perform automation tasks within the procurement systems or work in tandem with data entry processes performed by government employees?
We estimate that an approach like this could reduce the total contract creation and correction process time by over 866,000 labor-hours per year across all agencies. Additionally, imagine the impact of analytics automation on the morale of analysts and procurement specialists. Consider how this increase in analytics productivity would improve the quality of data and ability to audit the procurement award process across an agency.
ResonantLogic, an Alteryx partner, has developed a proof of concept that leverages analytics automation, specifically NLP, to tackle the manual nature of procurement processes. While this process deals with the analysis of procurement data, it could easily be deployed to tackle many other manual, analysis-based processes that require the review of data for omissions, lack of clarity, or syntax-related issues.
The solution developed by ResonantLogic with the Alteryx platform utilizes the following methodology to enable the functionality shown in Figure 1.
Step 1: Connect
- The solution connects to procurement data in the Federal Procurement Data System and USAspending.gov. Alteryx Designer was used to extract, transform, and load federal procurement data, and Alteryx Server was used to schedule this activity to ensure the latest information is always available for users.
Step 2: Verify and Validate
- The solution translates the DATA Act Information Model Schema (DAIMS) validation rules into an easy-to-use Alteryx Designer workflow. This allows users to build and test logical algorithms using a drag-and-drop user interface without the need to write complex code. Because Alteryx Designer makes managing data workflows easy, users can simply update the workflows as conditions and validation rules change over time.
- The solution uses NLP to perform syntax analysis, context analysis, entity extraction, and classification analysis to interpret the meaningfulness of user-generated award descriptions.
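The rule-based half of this step can be illustrated outside a drag-and-drop tool. The following is a minimal sketch in Python, not the ResonantLogic workflow: the field names and the two rule types are hypothetical stand-ins chosen for illustration and are not the actual DAIMS validation rules.

```python
from datetime import date

# Hypothetical rule builders for illustration; these are NOT the DAIMS rules.
def check_required(field):
    """Build a rule that flags a missing or blank field."""
    def rule(record):
        value = record.get(field)
        if value is None or str(value).strip() == "":
            return f"{field} is missing"
        return None
    return rule

def check_date_order(start_field, end_field):
    """Build a rule that flags an end date earlier than the start date."""
    def rule(record):
        start, end = record.get(start_field), record.get(end_field)
        if start is not None and end is not None and end < start:
            return f"{end_field} precedes {start_field}"
        return None
    return rule

# Rules live in a plain list so they can be updated as requirements change.
RULES = [
    check_required("award_description"),
    check_required("vendor_address"),
    check_date_order("period_start", "period_end"),
]

def validate(record, rules=RULES):
    """Return a list of human-readable rule violations for one award record."""
    return [msg for rule in rules if (msg := rule(record)) is not None]

# Made-up example record with a blank description and reversed dates.
record = {
    "award_description": "  ",
    "vendor_address": "123 Main St, Springfield, IL",
    "period_start": date(2021, 6, 1),
    "period_end": date(2021, 1, 31),
}
print(validate(record))
# ['award_description is missing', 'period_end precedes period_start']
```

Keeping each rule as a small function in a list mirrors the idea that validation logic should be easy to swap out when the rules change.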
Step 3: Correct
- The solution identifies errors and exceptions and intelligently suggests corrected data.
- The solution automatically performs reference checks and provides corrected values using third-party sources such as Dun & Bradstreet and the U.S. Postal Service (CASS).
- The solution uses a machine learning model that can be trained by users to determine whether a user-generated award description is meaningful. The system allows users to update a training library and view the results of analyzed award descriptions.
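The article does not say which model the solution uses, so the sketch below substitutes a tiny multinomial naive Bayes classifier to show what a user-trainable "meaningful description" model could look like. The class name, labels, and training examples are all hypothetical, and the sketch assumes each label has at least one training example before `predict` is called.

```python
import math
from collections import Counter

class DescriptionClassifier:
    """Tiny multinomial naive Bayes over lowercase word tokens (illustrative)."""

    def __init__(self):
        self.word_counts = {"meaningful": Counter(), "vague": Counter()}
        self.doc_counts = Counter()

    def add_example(self, text, label):
        """Add one labeled description to the training library."""
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def predict(self, text):
        """Return the label with the highest log posterior for the text."""
        vocab = set().union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # Log prior plus Laplace-smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(vocab)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

# Made-up training library; a real one would be curated by users.
clf = DescriptionClassifier()
clf.add_example("janitorial services for federal courthouse building", "meaningful")
clf.add_example("purchase of laboratory centrifuge equipment and installation", "meaningful")
clf.add_example("misc", "vague")
clf.add_example("see attached", "vague")
clf.add_example("other services", "vague")

print(clf.predict("installation of laboratory equipment"))  # meaningful
print(clf.predict("misc"))                                  # vague
```

Because training is just a matter of calling `add_example`, reviewers can correct a wrong prediction by adding the description to the library with the right label.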
Step 4: Notify
- The solution creates a report that lists the contract awards, the data fields with errant values, and the recommended values.
- The solution can provide custom reports to identify defect trends that can be used to support user training and other continuous improvement activities.
For this effort, an NLP macro was developed that utilizes Google’s Natural Language API to analyze the award description. This provides the macro workflow with natural language processing features including sentiment analysis, entity analysis, entity sentiment analysis, content classification, and syntax analysis. In addition to the information provided by the API, the macro contains a library of words that are commonly used by the federal government. With both the NLP and library data, the macro can score the award description. When the score exceeds a certain threshold, the award description is determined to be meaningful.
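The scoring idea can be sketched as follows. This is an assumption-laden illustration rather than the macro's actual formula: the word library, the weights, and the threshold are hypothetical, and `tokens` and `entity_count` stand in for values that an NLP service would return, so no API call is made here.

```python
# Hypothetical word library and threshold, chosen only for illustration.
GOV_TERMS = {"maintenance", "construction", "software",
             "training", "services", "equipment"}
THRESHOLD = 3.0

def score_description(tokens, entity_count, library=GOV_TERMS):
    """Combine NLP-derived features with library hits into one score.

    tokens and entity_count are placeholders for NLP service output.
    """
    words = [t.lower() for t in tokens]
    library_hits = sum(1 for w in words if w in library)
    # One point per library term and per extracted entity, plus a small
    # length credit that is capped so padding cannot dominate the score.
    return library_hits + entity_count + min(len(words), 10) * 0.1

def is_meaningful(tokens, entity_count, threshold=THRESHOLD):
    """A description is meaningful when its score exceeds the threshold."""
    return score_description(tokens, entity_count) > threshold

detailed = "HVAC maintenance services for Building 12 at Fort Meade".split()
print(is_meaningful(detailed, entity_count=2))   # True
print(is_meaningful(["misc"], entity_count=0))   # False
```

Separating the score from the threshold check makes it straightforward to tune the cutoff without touching the feature logic.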
On Demand Award Description NLP Analysis: USAspending.gov is the official source for spending data for the U.S. government. Its mission is to show the American public what the federal government spends every year and how it spends the money. The website provides an API that contains information for individual awards, including the award description. The workflow below is run on Alteryx Server when an award number is entered. The workflow connects to the USAspending.gov API and requests the award description for the award number. The NLP macro is then utilized to score the award description for meaningfulness and the output is displayed.
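The on-demand lookup can be approximated in a few lines of Python. This sketch assumes that the USAspending.gov API exposes an award at `/api/v2/awards/<award id>/` and that the JSON response carries a `description` field; both should be confirmed against the current API documentation. The scorer passed in is a placeholder for the NLP macro, and the example runs against a fake fetcher with a made-up award ID, so it makes no network request.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed endpoint shape; verify against the USAspending API documentation.
BASE_URL = "https://api.usaspending.gov/api/v2/awards/"

def fetch_json(url):
    """Perform a GET request and decode the JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)

def get_award_description(award_id, fetch=fetch_json):
    """Request one award and return its description, or None if absent."""
    payload = fetch(f"{BASE_URL}{quote(award_id, safe='')}/")
    description = payload.get("description")
    if isinstance(description, str) and description.strip():
        return description.strip()
    return None

def analyze_award(award_id, scorer, fetch=fetch_json):
    """Fetch the description and attach a score from the supplied scorer."""
    description = get_award_description(award_id, fetch)
    score = scorer(description) if description else None
    return {"award_id": award_id, "description": description, "score": score}

# Offline demonstration with a fake fetcher and a placeholder scorer.
def fake_fetch(url):
    return {"description": "  ROOF REPAIR AT BUILDING 4  "}

def word_count_scorer(text):
    return len(text.split())

print(analyze_award("EXAMPLE_AWARD_ID", word_count_scorer, fetch=fake_fetch))
# {'award_id': 'EXAMPLE_AWARD_ID', 'description': 'ROOF REPAIR AT BUILDING 4', 'score': 5}
```

Passing the fetcher and scorer in as arguments keeps the lookup testable without network access and lets a real scoring routine be swapped in later.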
The NLP macro described here is only one example of the power of analytics automation. Automation is the next evolution of the analytics value chain and will create significant benefit for all government organizations that are looking for sustainable ways to tackle their largest data challenges. With the Alteryx platform and the innovation of our partners like ResonantLogic and others, government entities will benefit from a unified platform experience that eases and automates a wide range of data tasks, analytics outcomes, and business processes with hundreds of automation building blocks that enable millions of analytics, data science and process automation to accelerate mission outcomes for government.
Our next article will go deeper into analytics automation and explore how the Alteryx platform plays a central role in scaling digital transformation through its ability to democratize data and analytics, automate business processes, and upskill people to unleash the power of actionable insights.
ResonantLogic specializes in the planning, design, development, and implementation of enterprise IT solutions and services for its customers in the healthcare, education, insurance, and public sector industries. Its mission-driven, agile approach delivers software, analytics, and consulting solutions centered on its customers’ objectives. ResonantLogic was founded in 2012 with a focus on developing software and analytics solutions for the U.S. healthcare market.
About Alteryx, Inc.
Revolutionizing business through data science and analytics, Alteryx offers an end-to-end analytics platform that empowers data analysts and scientists alike to break data barriers, deliver insights, and experience the thrill of getting to the answer faster. Organizations all over the world rely on Alteryx daily to deliver actionable insights. For more information, visit www.alteryx.com.
Alteryx is a registered trademark of Alteryx, Inc.
About IC Insiders
IC Insiders is a special sponsored feature that provides deep-dive analysis, interviews with IC leaders, perspective from industry experts, and more. Learn how your company can become an IC Insider.