Can’t Find What You’re Looking For? You Need Identity-Centric AI

From IC Insider Basis Technology

By Bruce Lawhorn

How many analysts have thought to themselves during a typical workday, “I know what I’m looking for, but I just can’t find it”? Analysts spend far too much time searching for “dots,” the pieces of information that contain actionable intelligence. They spend so much time finding the dots that they have little time left to connect them and make crucial intelligence decisions.

Why does “dot discovery” take so much time? “Dots” are the nuggets of information in your data: a person, place, organization, or any other item of value. The many ways people refer to these dots create a massive challenge for analysts:

  • Variety of Expression – Different words often mean the same thing. For example, China’s fifth-generation stealth fighter, the Chengdu J-20, is also referred to as the Mighty Dragon, and “Myanmar” and “Burma” both name the same country.
  • Ambiguity of Expression – A single identity can go by several names, and a single name can point to several identities. “Bill Clinton” may appear as “William Clinton” or “the 42nd President of the US,” while a bare “Bush” could refer to either of two presidents.
  • Language Barriers – Analysts often need to find identities in foreign languages. An English-speaking analyst won’t be able to search for and find the “dots” if the content is written in a foreign language like Chinese, Arabic, or Russian.

Variety and ambiguity of expression are daunting hurdles even before you begin to analyze content in multiple languages. And these challenges require that the analyst seek out the knowledge and expertise of others within their agency.

The Search for Subject Matter Expertise and Institutional Knowledge

One of the greatest challenges for analysts is obtaining the subject matter expertise and institutional knowledge needed to find specific information in content.

Let’s say an English-speaking analyst needs to find content about a terrorist organization and its members. The analyst would need to reach out to subject matter experts on that organization to obtain the names of individuals of interest, along with their social media accounts, aliases, phone numbers, and relationships. Based on the information those experts provide, the analyst then creates a 4,000-word query (or modifies a predecessor’s 4,000-word query) and enters it into the system. That query returns a few hundred thousand documents, and the information the analyst needs is buried somewhere among them. Then begins the slow, painstaking, and error-prone process of finding actionable intelligence.

No analyst is omniscient, so analysts often reuse previous queries, consult colleagues and experts, and slowly learn the folklore and expertise required to obtain actionable intelligence. This slow, laborious process can be shortcut when your group’s subject matter expertise and institutional knowledge are captured in an AI system. With that same AI system, you can also eliminate the challenges of variety, ambiguity, and language.

Less Digging, More Analysis with AI

AI and machine learning can dramatically improve the accuracy of your search by automating key parts of discovery, allowing analysts to focus on connecting crucial pieces of information. Instead of the analyst trying to think of every possible variation of a name to search for, an AI algorithm can do that work for them.
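The following minimal Python sketch shows the mechanical part of that idea: given one name, it produces a query covering every known variant, so the analyst does not have to list them by hand. It is an illustration under assumptions, not Rosette’s API. `ALIASES` and `expand_query` are hypothetical, and the variants here come from a fixed lookup table rather than anything learned by a model.

```python
# Hypothetical alias table. A real system would draw these variants from a
# knowledge base maintained by analysts and models, not a hard-coded dict.
ALIASES = {
    "chengdu j-20": ["Chengdu J-20", "J-20", "Mighty Dragon"],
    "myanmar": ["Myanmar", "Burma"],
    "burma": ["Myanmar", "Burma"],
}


def expand_query(term: str) -> str:
    """Build a boolean OR query that covers every known name for `term`."""
    variants = ALIASES.get(term.lower(), [term])
    # Quote each variant so multi-word names are matched as exact phrases.
    return " OR ".join(f'"{v}"' for v in variants)


print(expand_query("Burma"))         # -> "Myanmar" OR "Burma"
print(expand_query("Chengdu J-20"))  # -> "Chengdu J-20" OR "J-20" OR "Mighty Dragon"
```

Unknown names fall through unchanged, so the sketch never makes a query narrower than what the analyst typed.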

The concept of using AI and machine learning to automate intelligence tasks is not new to the U.S. Intelligence Community. In January 2019, the Director of National Intelligence (DNI) introduced the Augmenting Intelligence Using Machines (AIM) initiative, which encourages the Intelligence Community to adopt these technologies in order to close the gap between data collection and making intelligent decisions.

At Basis Technology, we deliver AI and machine learning that remove the challenges of variety, ambiguity, and language. We also use these technologies to capture your analysts’ subject matter expertise and institutional knowledge. Our AI and machine learning capabilities are available through Rosette® Identity Resolver, a solution that extracts and disambiguates the things you care about in unstructured text, then transforms those mentions into meaningful “dots.”

Transforming Entities into Identities with AI

The search process begins with the AI transforming entities — mentions in text of people, places, organizations, and things of importance — into identities, which are specific people, places, organizations, and things.
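One way to picture the distinction is as two separate records: a mention, which is just text at a position in a document, and an identity, which is an entry in a knowledge base. The Python sketch below is a hypothetical data model for illustration only; the class names, fields, and the `id-gwb` identifier are assumptions, not Identity Resolver’s actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Identity:
    """A specific person, place, organization, or thing in the knowledge base."""
    identity_id: str
    canonical_name: str
    aliases: Set[str] = field(default_factory=set)


@dataclass
class EntityMention:
    """A span of text that names something, not yet tied to a specific identity."""
    text: str
    start: int  # character offset where the mention begins
    end: int    # character offset just past the end of the mention
    identity_id: Optional[str] = None  # filled in once the mention is resolved


sentence = "Bush walked next to NFO Lt. Ryan Phillips"
bush = Identity("id-gwb", "George W. Bush", {"Bush", "George W. Bush", "President Bush"})

mention = EntityMention(text="Bush", start=0, end=4)
assert sentence[mention.start:mention.end] == mention.text

# Resolution step: the entity (a string in a document) becomes a pointer to an identity.
mention.identity_id = bush.identity_id
```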

Entities go through an AI-driven disambiguation process so that vague and ambiguous mentions are resolved to specific identities in a knowledge base. When an analyst enters a query, they search for a specific identity instead of entities that may or may not be related to their current search. We call this AI-driven search process “identity-centric querying.”

For example, the phrase “Bush walked next to NFO Lt. Ryan Phillips” contains the entities “Bush” and “Ryan Phillips.” Identity Resolver transforms those entities into specific identities: “George W. Bush, 43rd President of the United States” and “Lieutenant Ryan Phillips,” the naval flight officer who flew him onto the USS Abraham Lincoln. Analysts no longer have to dig through thousands of mentions of “Bush”; they can simply query “Show me mentions of George W. Bush,” and the AI will do the rest.
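The difference between keyword search and identity-centric querying can be sketched in a few lines of Python, assuming the disambiguation step has already run and attached an identity ID to every mention. The documents and identity IDs below are made up for illustration. Keyword search over-matches (a different president, a singer) and under-matches (a title with no surname), while the identity query returns exactly the mentions of one person.

```python
# Hypothetical resolver output: (document id, surface text, resolved identity id).
RESOLVED_MENTIONS = [
    ("doc-1", "Bush", "george_w_bush"),
    ("doc-2", "Bush", "george_h_w_bush"),
    ("doc-3", "Kate Bush", "kate_bush"),
    ("doc-4", "the 43rd President of the United States", "george_w_bush"),
]


def keyword_search(term):
    """String matching: every document whose mention text contains `term`."""
    term = term.lower()
    return sorted({doc for doc, text, _ in RESOLVED_MENTIONS if term in text.lower()})


def identity_search(identity_id):
    """Identity-centric query: only documents resolved to this one identity."""
    return sorted({doc for doc, _, ident in RESOLVED_MENTIONS if ident == identity_id})


print(keyword_search("Bush"))            # -> ['doc-1', 'doc-2', 'doc-3']
print(identity_search("george_w_bush"))  # -> ['doc-1', 'doc-4']
```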

Return Content Based on Context

Identity Resolver returns content based on the context of the search. Let’s say an analyst is searching for Jhon Jairo Velásquez Vásquez, an infamous contract killer who worked for the Colombian Medellín Cartel. In content, he is rarely referred to by his full name, but rather by his nickname, “Popeye.” When the analyst searches for Jhon Jairo Velásquez Vásquez, the Identity Resolver knowledge base knows his aliases and nicknames and includes them in the search results.

There are many “Popeyes” in the world, including a cartoon character and Jeff Dabe, a world-class arm wrestler. Identity Resolver uses the contextual information around each mention to narrow the many possible “Popeyes” down to the specific identity. Identity-centric querying for Jhon Jairo Velásquez Vásquez would return content related specifically to Popeye the contract killer, not the arm wrestler.
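A very simple stand-in for context-based disambiguation is to compare the words around a mention with a profile of each candidate identity and pick the best overlap. The Python sketch below does that for “Popeye.” It is a deliberately crude bag-of-words technique chosen for illustration, not the method Identity Resolver uses; the candidate IDs and keyword profiles are hypothetical.

```python
import re
import unicodedata

# Hypothetical candidate profiles for the ambiguous name "Popeye". A production
# resolver would learn such signals from data; these keywords are hand-picked.
CANDIDATES = {
    "popeye_contract_killer": {"medellin", "cartel", "escobar", "colombia", "sicario"},
    "popeye_cartoon": {"spinach", "olive", "sailor", "cartoon", "bluto"},
    "popeye_arm_wrestler": {"arm", "wrestling", "wrestler", "forearm", "tournament"},
}


def tokens(text):
    """Lowercase word set, with accents stripped so 'Medellín' matches 'medellin'."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    return set(re.findall(r"[a-z]+", ascii_text.lower()))


def disambiguate(context):
    """Return the candidate sharing the most words with the context, or None."""
    words = tokens(context)
    scores = {cid: len(profile & words) for cid, profile in CANDIDATES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("Popeye, the Medellín Cartel sicario, was arrested in Colombia"))
# -> popeye_contract_killer
print(disambiguate("Popeye won the arm wrestling tournament"))
# -> popeye_arm_wrestler
```

Returning `None` when nothing overlaps is a design choice: with no contextual evidence, the sketch declines to guess rather than defaulting to the most common “Popeye.”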

Eliminate Language Barriers

To overcome language barriers, many analyst organizations have turned to machine translation of their foreign-language content. Unfortunately, analysts navigate content mostly by proper nouns, which typically do not follow standard language rules, so machine translation often renders them as ordinary words. This is how the city of “Raqqa” becomes “tenderness” when translated from Arabic into English, and how “Springfield,” translated into another language, becomes “field of springs” or “field of metal coils.”

The loss of fidelity when translating content is dramatic, and it affects most proper nouns, the things analysts care about most. Identity Resolver avoids this problem by processing content in its native language, enabling English-speaking analysts to find identities in foreign-language content.
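Searching foreign-language content without translating it first can be sketched as a knowledge base that stores each identity’s names in their original scripts. In the hypothetical Python example below, an English query for “Raqqa” is mapped to an identity, and that identity’s Arabic and Russian aliases are matched directly against untranslated text. Plain substring matching is an assumption made for brevity: it would miss inflected forms (Russian, for example, declines place names), which is why real systems need language-specific analysis.

```python
# Hypothetical knowledge-base entry: one identity with aliases in several scripts.
KB = {
    "raqqa_city": {"Raqqa", "Ar-Raqqah", "الرقة", "Ракка"},
}


def find_identity(name):
    """Map any known alias, in any script, to its identity ID."""
    for identity_id, aliases in KB.items():
        if name in aliases:
            return identity_id
    return None


def documents_mentioning(identity_id, documents):
    """IDs of native-language documents that contain any alias of the identity."""
    aliases = KB[identity_id]
    return [doc_id for doc_id, text in documents.items()
            if any(alias in text for alias in aliases)]


docs = {
    "ar-1": "وصلت القافلة إلى الرقة صباح اليوم",   # Arabic, never translated
    "ru-1": "Ракка расположена на берегу Евфрата",  # Russian, never translated
    "en-1": "Weather report for Springfield",
}

# The analyst types the English name; matching happens in each document's own script.
print(documents_mentioning(find_identity("Raqqa"), docs))  # -> ['ar-1', 'ru-1']
```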

Automate the Capture of Subject Matter Expertise and Institutional Knowledge

Identity Resolver captures your organization’s expertise and tradecraft in a knowledge base. It then uses that knowledge to authoritatively resolve the many different references to an identity into a single record. With identity-centric querying, analysts don’t have to make the long journey of gathering expertise and knowledge from others in the organization, build queries with thousands of search terms, or comb through massive volumes of false-positive content looking for the needle in the haystack.
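Capturing tradecraft in a knowledge base can be pictured as analysts recording which names refer to which identities, along with who contributed each one, so the next analyst inherits that knowledge instead of rebuilding it. The Python class below is a hypothetical sketch of that bookkeeping, not Identity Resolver’s storage model; the class name, method names, and identity IDs are illustrative assumptions.

```python
from collections import defaultdict


class KnowledgeBase:
    """Hypothetical store for analyst-contributed aliases, with provenance."""

    def __init__(self):
        self.alias_index = defaultdict(set)  # normalized alias -> identity IDs
        self.provenance = defaultdict(list)  # identity ID -> [(alias, analyst)]

    def add_alias(self, identity_id, alias, analyst):
        """Record one analyst's knowledge that `alias` can refer to `identity_id`."""
        self.alias_index[alias.casefold()].add(identity_id)
        self.provenance[identity_id].append((alias, analyst))

    def candidates(self, name):
        """All identities `name` might refer to; more than one means ambiguity."""
        return set(self.alias_index.get(name.casefold(), set()))


kb = KnowledgeBase()
kb.add_alias("velasquez_vasquez", "Jhon Jairo Velásquez Vásquez", "analyst_a")
kb.add_alias("velasquez_vasquez", "Popeye", "analyst_b")
kb.add_alias("popeye_cartoon", "Popeye", "analyst_c")

print(sorted(kb.candidates("POPEYE")))  # -> ['popeye_cartoon', 'velasquez_vasquez']
```

The sketch deliberately lets one alias point at several identities, since choosing among them is the job of the context-based disambiguation step rather than the knowledge base itself.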

Uncover the “Ghosts” in Your Data

The “unknown unknowns” are the most challenging part of the analytical process. How do you find what you didn’t know you were looking for? As it ingests content, Identity Resolver uncovers identities that were previously unknown to the system. We call these identities “ghosts.” Every time the system detects an entity in text, it checks whether that entity has appeared before. If the entity is known, Identity Resolver links the mention to the corresponding identity record in the knowledge base. If the entity has not been seen before, Identity Resolver automatically creates a new identity in the knowledge base and starts collecting information about it.
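The known-or-new check described above can be sketched in Python as follows. Matching on a case-folded name is a simplifying assumption; a real resolver would run full disambiguation before deciding that an entity is new. The class, its methods, and the `ghost-N` ID format are hypothetical, and “J. Roe” is a placeholder name.

```python
import itertools


class GhostTracker:
    """Hypothetical sketch: link known entities, mint 'ghost' identities for new ones."""

    def __init__(self, known):
        self.identities = {name.casefold(): ident for name, ident in known.items()}
        self.mentions = {}  # identity ID -> list of document IDs that mention it
        self._counter = itertools.count(1)

    def ingest(self, doc_id, entity):
        """Resolve one detected entity, creating a ghost identity if it is new."""
        key = entity.casefold()
        identity_id = self.identities.get(key)
        if identity_id is None:
            # Never seen before: create a new identity and start collecting on it.
            identity_id = f"ghost-{next(self._counter)}"
            self.identities[key] = identity_id
        self.mentions.setdefault(identity_id, []).append(doc_id)
        return identity_id


tracker = GhostTracker({"George W. Bush": "george_w_bush"})
tracker.ingest("doc-1", "George W. Bush")  # known identity
tracker.ingest("doc-2", "J. Roe")          # unknown, becomes ghost-1
tracker.ingest("doc-3", "j. roe")          # same ghost, new evidence
print(tracker.mentions["ghost-1"])         # -> ['doc-2', 'doc-3']
```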

Now that the system is uncovering these “unknown unknowns,” analysts can get easy answers to very difficult questions such as “Who are the associates of Identity-1?” and “Show me all the content we have on those associates.”
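Once every mention resolves to an identity, both questions reduce to simple graph operations. The Python sketch below treats “appears in the same document” as the definition of an associate, which is an assumption made for illustration; a real system might weight relationships or rely on extracted relations instead. All identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical resolver output: document ID -> identity IDs mentioned in it.
DOC_IDENTITIES = {
    "doc-1": {"identity-1", "identity-2"},
    "doc-2": {"identity-1", "identity-3"},
    "doc-3": {"identity-2", "identity-4"},
    "doc-4": {"identity-3"},
    "doc-5": {"identity-5"},
}


def build_cooccurrence(doc_identities):
    """Undirected graph linking identities that appear in the same document."""
    graph = defaultdict(set)
    for ids in doc_identities.values():
        for a, b in combinations(sorted(ids), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def associates(identity_id, graph):
    """Identities that co-occur with `identity_id` at least once."""
    return sorted(graph.get(identity_id, set()))


def content_on(identity_ids, doc_identities):
    """Documents that mention any of the given identities."""
    wanted = set(identity_ids)
    return sorted(doc for doc, ids in doc_identities.items() if ids & wanted)


graph = build_cooccurrence(DOC_IDENTITIES)
close = associates("identity-1", graph)
print(close)                              # -> ['identity-2', 'identity-3']
print(content_on(close, DOC_IDENTITIES))  # -> ['doc-1', 'doc-2', 'doc-3', 'doc-4']
```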

Let Identity-Centric AI Do the Heavy Lifting

The science of artificial intelligence has progressed to the point where solutions like Identity Resolver can significantly reduce the burden that variety, ambiguity, language, and gaps in subject matter expertise and institutional knowledge place on analysts. With AI doing the work required to implement identity-centric queries, analysts can focus on connecting the dots rather than finding them.

About Basis Technology

Analyzing text, the hardest part of big data, is critical to verifying identity, understanding customers, anticipating world events, and uncovering crime. Companies such as Airbnb®, Luminoso®, Recorded Future®, Medallia, and Société Générale, as well as agencies across the U.S. Intelligence Community, use Rosette to solve their toughest human language problems. For over 20 years, Basis Technology has been at the forefront of natural language processing applied to enterprise search, social listening, e-commerce, and e-discovery. Our cyber forensics team pioneers state-of-the-art, faster, and more cost-effective techniques to extract digital evidence, keeping government and law enforcement ahead of exponential data volume growth.

For more information on how you can implement identity-centric querying with Rosette Identity Resolver, visit

About IC Insiders

IC Insiders is a special sponsored feature that provides deep-dive analysis, interviews with IC leaders, perspective from industry experts, and more. Learn how your company can become an IC Insider.