Artificial Intelligence and Data Analytics in Fraud and Corruption Investigations

When a legal team needs to find the facts behind fraud and corruption allegations in a government investigation, technology can drive substantial new efficiencies. By filtering and evaluating vast amounts of information, artificial intelligence (AI) can:

effectively sort text messages, audio files, emails and other unstructured data into manageable groups;
identify potential relationships between parties accused of fraud or corruption; and
recognise patterns of frequency or timing, which may support a client’s defence.

Technology-assisted data analysis can supply the diligence and reliable quality control needed to give the government conclusions that it can trust.

This update explains how the process of gathering, sorting and evaluating enormous volumes of data has changed, and why skilled human intelligence is likely to remain a required component of an accurate analysis.

Government investigations – first steps

When conducting investigations, the goal is to determine the facts. In most cases, a vague allegation will have been made – usually through an internal whistleblower hotline or a subpoena from the government. Most often, the legal team conducting the investigation has:

a list of documents that the government entity is requesting;
a complaint that somebody has submitted to a hotline;
an anonymous email that reports an allegation; or
a news article.

As such, legal teams will often have a certain amount of information which will give an approximation of the situation. The goal then is to determine whether there is an issue, and whether it is the same issue that has been identified. If it is a government investigation, the following questions must be asked:

What is the government looking at?
What is the strength of the evidence?
What are the legal or regulatory defences that can be used to advocate?
What is the client’s exposure?
How can the firm explain this to in-house general counsel, the chief compliance officer, the board, the board’s audit committee and outside auditors to give them an assessment of the risk?

If the government is requesting a meeting, the legal team must be able to clearly demonstrate:

how it conducted its investigation;
why this investigation can be relied on;
that it found all relevant facts; and
that based on the information available, it has taken sufficient steps to rule out issues (where necessary).

The government will be unwilling to simply trust the legal team, so it is paramount to demonstrate that the team has looked at the whole picture.

There are other constituencies that drive investigations, especially for big public companies. For example, is the company trying to obtain a line of credit? Is it considering a possible merger in which someone may ask, as part of due diligence, whether it has any issues? If so, the legal team must assess the issues, what steps have been taken to resolve them and how much confidence it has in the results.

Obtained data – structured and unstructured

Companies keep data in structured and unstructured formats. Structured data is essentially kept in an accounting or enterprise resource planning (ERP) system, such as SAP or Oracle. The data housed there is a record of all of the transactions that have been undertaken, and the legal team will work with a forensic accounting firm to define a set of data analytic tests that can be run.

Those tests can use a variety of different parameters to show basic fraud or corruption criteria – in particular:

Are there round-number transactions?
Are there sequential invoices from the same vendor (eg, is the vendor issuing sequentially numbered invoices to one customer when it should ostensibly have many customers)?
Is there a mismatch in the location of the work and the actual route of payment (eg, is work being conducted in Colombia, but the payment going through France)?
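
The three tests above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tests that a forensic accounting firm would actually define: the Payment fields, the 1,000-unit threshold for a 'round' amount, the minimum run of three consecutive invoice numbers and the sample records are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Payment:
    # Hypothetical fields; a real ERP export will use different names.
    vendor: str
    invoice_no: int
    amount: Decimal
    work_country: str
    payment_country: str


def is_round_number(p: Payment, unit: Decimal = Decimal("1000")) -> bool:
    """Flag amounts that are an exact multiple of `unit` (an assumed threshold)."""
    return p.amount > 0 and p.amount % unit == 0


def has_location_mismatch(p: Payment) -> bool:
    """Flag payments routed through a country other than where the work was done."""
    return p.work_country != p.payment_country


def sequential_invoice_vendors(payments: list[Payment], min_run: int = 3) -> set[str]:
    """Return vendors with at least `min_run` consecutive invoice numbers."""
    by_vendor: dict[str, set[int]] = defaultdict(set)
    for p in payments:
        by_vendor[p.vendor].add(p.invoice_no)
    flagged = set()
    for vendor, numbers in by_vendor.items():
        run = longest = 0
        previous = None
        for n in sorted(numbers):
            # Extend the current run if this number directly follows the last one.
            run = run + 1 if previous is not None and n == previous + 1 else 1
            longest = max(longest, run)
            previous = n
        if longest >= min_run:
            flagged.add(vendor)
    return flagged


# Invented sample records, for illustration only.
payments = [
    Payment("Acme Consulting", 101, Decimal("50000.00"), "CO", "FR"),
    Payment("Acme Consulting", 102, Decimal("25000.00"), "CO", "FR"),
    Payment("Acme Consulting", 103, Decimal("12480.55"), "CO", "CO"),
    Payment("Bolt Logistics", 877, Decimal("9312.40"), "US", "US"),
    Payment("Bolt Logistics", 912, Decimal("4100.00"), "US", "US"),
]

round_hits = [p.invoice_no for p in payments if is_round_number(p)]
mismatch_hits = [p.invoice_no for p in payments if has_location_mismatch(p)]
sequential_hits = sequential_invoice_vendors(payments)
```

A hit on any of these tests is only a prompt to pull the underlying documents; it does not establish that anything improper took place.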

The legal team should run those parameters across structured data and compile the transactions that can be tested by taking what has been journaled in the accounting system and looking at the underlying documents. If a contract exists, does its description match the payment description in the system? If there are deliverables under the contract, are they general and vague or measurable and specific? Can the legal team determine that the transaction has actually taken place?

The scope of those data analytics can be narrowed if there is a specific question. For example, if a consultant is allegedly paying bribes to government employees, the legal team can look at that consultant, the contract, the signatories on the contract and the description of work under the contract. The team will be looking for evidence of the work and the payment terms in order to determine whether they are commensurate with the value being delivered. The question then becomes whether this is fair market value. The goal is to test the genuineness of the contractor business arrangement.

Imagine a situation where a client has paid a significant amount of money to hire a well-known lawyer from another firm. However, that lawyer has limited or no expertise in the relevant area. How can this be explained? Although there may be a legitimate explanation, such a situation should always raise a red flag. While the investigation may not provide a clear answer, it can help to highlight what to look out for.

Unstructured data is essentially what people generate when they use communications systems. It includes text messages, emails, messaging apps (eg, Viber and WhatsApp) and other types of point-to-point encrypted communication. There has recently been a significant increase in unstructured data.

Managing data

How to narrow data

To start narrowing data, the first step is collection. Unstructured data can be viewed as a series of concentric circles. Collection begins with the email system. Legal teams must then collect individuals’ devices (eg, laptops) and image their hard drives. If possible, and depending on the data protection rules, the team should then collect peripheral devices (eg, smartphones, external drives and USB sticks) that store data.

Once all of this data has been collected, the team must exclude data-heavy items that have little value (eg, program files and photographs) in order to create a filterable set of data.
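
As a rough illustration of this exclusion step, the sketch below drops files by extension. The list of excluded extensions and the sample file paths are assumptions made for the example; what counts as low-value will depend on the matter.

```python
from pathlib import PurePosixPath

# Assumed exclusion list for illustration; the actual list depends on the matter.
EXCLUDED_SUFFIXES = {".exe", ".dll", ".sys", ".jpg", ".jpeg", ".png", ".gif"}


def is_reviewable(path: str) -> bool:
    """Keep a file unless its extension marks it as a program file or photograph."""
    return PurePosixPath(path).suffix.lower() not in EXCLUDED_SUFFIXES


# Invented file paths standing in for a collected data set.
collected = [
    "laptop01/Documents/consulting_agreement.docx",
    "laptop01/Program Files/app/runtime.DLL",
    "phone02/DCIM/IMG_0042.jpg",
    "usb03/exports/payments_2019.xlsx",
]

filterable = [p for p in collected if is_reviewable(p)]
```

Here only the agreement and the spreadsheet remain in the filterable set.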

Techniques to filter data

The most basic technique for filtering data is to apply search terms. At the start, the legal team should come up with a list of words relating to the investigation and apply them across the data to see whether any documents contain those words. These documents must then be reviewed at the first and second level to see whether they are relevant to the investigation.
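
A minimal sketch of a search-term filter follows. The terms and the sample documents are hypothetical, and a review platform would typically offer its own search syntax; the point is only to show terms being applied across a data set to produce a list for human review.

```python
import re

# Hypothetical search terms; a real list is drawn up for the specific investigation.
SEARCH_TERMS = ["consultant", "facilitation payment", "invoice"]


def compile_terms(terms: list[str]) -> re.Pattern:
    """Build one case-insensitive pattern matching any term as a whole word or phrase."""
    alternatives = "|".join(re.escape(t) for t in terms)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def matching_terms(text: str, pattern: re.Pattern) -> set[str]:
    """Return the distinct terms (lower-cased) that occur in the text."""
    return {m.group(0).lower() for m in pattern.finditer(text)}


# Invented documents, keyed by a made-up document ID.
documents = {
    "email-001": "Please approve the Consultant invoice before Friday.",
    "email-002": "Lunch on Thursday?",
    "chat-017": "He called it a facilitation payment, nothing more.",
}

pattern = compile_terms(SEARCH_TERMS)
hits = {doc_id: matching_terms(text, pattern) for doc_id, text in documents.items()}

# Documents with at least one hit go forward to first-level review.
for_review = sorted(doc_id for doc_id, terms in hits.items() if terms)
```

Because the pattern matches whole words only, a variant such as 'invoices' would not be caught by the term 'invoice'. Term lists therefore usually need variants or wildcards, which is one reason the list tends to be revised as the review proceeds.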

There are other techniques that can be used, such as an algorithm – so-called ‘technology-assisted review’. This process involves taking a set of documents and reviewing it with a subject matter expert on the investigation. The subject matter expert will review hundreds or thousands of documents to create a ‘seed set’ that helps the algorithm determine which are relevant. This process essentially hones the algorithm to choose documents that are more likely to be relevant. The probability stratifications can help to eliminate a portion of documents that are less likely to be relevant or responsive.
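
The seed-set idea can be illustrated with a small text classifier. The sketch below uses a naive Bayes model purely as a stand-in: commercial technology-assisted review tools use their own, often different, algorithms, and a real seed set would contain hundreds or thousands of documents coded by the subject matter expert rather than the six invented examples here. All names and texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lower-case alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(seed_set: list[tuple[str, bool]]) -> dict:
    """Count word frequencies separately for relevant and non-relevant seed documents."""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for text, relevant in seed_set:
        counts[relevant].update(tokenize(text))
        docs[relevant] += 1
    vocab = set(counts[True]) | set(counts[False])
    return {"counts": counts, "docs": docs, "vocab": vocab}


def relevance_probability(model: dict, text: str) -> float:
    """Estimate P(relevant | text) with Laplace-smoothed multinomial naive Bayes.

    Assumes the seed set contains at least one document of each kind.
    """
    total_docs = model["docs"][True] + model["docs"][False]
    log_score = {}
    for label in (True, False):
        counts = model["counts"][label]
        denom = sum(counts.values()) + len(model["vocab"])
        score = math.log(model["docs"][label] / total_docs)
        for word in tokenize(text):
            if word in model["vocab"]:  # ignore words never seen in the seed set
                score += math.log((counts[word] + 1) / denom)
        log_score[label] = score
    diff = log_score[False] - log_score[True]
    if diff > 700:  # avoid overflow in exp(); the probability is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


# Invented seed set: (document text, coded relevant by the subject matter expert?).
seed_set = [
    ("wire the consultant fee through the paris account", True),
    ("the minister expects the payment before the tender closes", True),
    ("consultant invoice has no deliverables attached", True),
    ("team lunch moved to thursday", False),
    ("please submit your holiday requests", False),
    ("parking garage closed for maintenance", False),
]

unreviewed = {
    "doc-A": "consultant asked about the payment to the paris account",
    "doc-B": "holiday lunch is on thursday",
}

model = train(seed_set)
scores = {doc_id: relevance_probability(model, text) for doc_id, text in unreviewed.items()}

# Highest estimated probability of relevance first.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Ranking by estimated probability is what allows the review population to be stratified: documents below a chosen score can be set aside or sampled, while those above it go to human reviewers first.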

Although no true AI is available, certain applications allow concept searches. These programs allow teams to decide what concepts to look for and then use technology-assisted review to search for documents that contain those concepts.

It is recommended to run different techniques as a way to cross-reference results. This will help to examine larger amounts of data more quickly and efficiently. However, while technology can go a long way, it ultimately takes human evaluation and intelligence to determine relevance.

It is paramount to remember that each situation is dynamic and iterative. Keep an open mind when reviewing documents. It may be the case that a new term needs to be considered or interviews need to be conducted. If interviews are conducted, the question then becomes whether the information matches, or whether the interviewee raises another issue which must now be reviewed. Alternatively, there may be another whistleblower email, competitor complaint or newspaper article to examine.


All of these techniques help to make investigations more efficient – and efficiency means cost effectiveness. Clients are getting more comfortable with data and techniques to analyse data, to the point that some clients have not only lawyers and accountants, but also data scientists in their compliance programmes. A big multinational client with tens or hundreds of thousands of employees will likely have staff who can design state-of-the-art search engines and training algorithms and use them to leverage resources.

While only the largest clients will have those capabilities, many have forensic technologists, both in-house and at forensic consulting companies, with which legal teams can also work. These technologists are familiar with different search techniques and technologies and the way to leverage them in order to process and filter large amounts of data.

There have been significant advances in technology in the past few years and increasing interest in this regard. This is largely due to ever-increasing amounts of data and, as a result, ever-increasing costs. Legal teams must know how to get costs to a controllable, reasonable level and work with people who understand and are comfortable with relevant concepts. They must be able to articulate to clients (if the clients are unfamiliar with these techniques) and to the government (in order to defend them) what they are doing, how they are doing it and why it is reliable. Governments use these techniques as well, so most of them are familiar with them. It is instead a matter of ensuring that there is a sufficient level of reliability.

For further information on this topic please contact Peter S Spivack at Hogan Lovells US LLP by telephone (+1 202 637 5600).