
By Ariel Markelevich, Professor of Accounting, Sawyer Business School, Suffolk University
Artificial intelligence is increasingly used to assist in financial decision making, from financial forecasting to predicting stock prices and identifying fraud. Users of AI know there are hurdles to overcome, such as hallucinations and errors. A logical question is whether AI produces results accurate enough to be relied on for financial analysis.
As academic researchers, we took on the challenge of testing the ability of artificial intelligence to correctly read financial statement data under varying conditions. Our goal was to understand what situations led to the greatest success in AI’s ability to accurately identify financial data.
Our test involved pulling a random sample of 5,000 annual reports published on the Securities and Exchange Commission (SEC) EDGAR system between 2014 and 2023. We queried an untrained LLM for 26 accounting metrics found in the balance sheet, income statement, statement of cash flows, and notes to the financial statements. Because we were working with historical data, we could easily assess the accuracy of the queried results and determine which situations produce data errors and which produce accurate results.
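To make the retrieval task concrete, here is a minimal sketch of the kind of query involved, written with the OpenAI Python client; the model name, prompt wording, and excerpt are illustrative assumptions, not the study's actual pipeline.

```python
# Illustrative sketch (not the study's pipeline): ask an LLM to pull one
# metric from a filing excerpt and report the value at its stated scale.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

filing_excerpt = """
CONSOLIDATED BALANCE SHEETS (in thousands)
...
Total assets                                   1,234,567
"""

prompt = (
    "From the financial statement excerpt below, report Total Assets as a "
    "plain number in US dollars, applying any 'in thousands' or 'in millions' "
    "scaling stated in the document.\n\nExcerpt:\n" + filing_excerpt
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical model choice for the example
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```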
We used commercial databases and provided financials in text, HTML, and XBRL (structured, standardized) format. We found:
- Metrics from the main financial statements (Income Statement, Balance Sheet, Statement of Cash Flows), such as total assets, have lower error rates for AI retrieval, with variation depending on the format of the data used.
- Metrics taken from the notes to the financial statements, such as current income tax expense, are more difficult for AI to retrieve, again with variation depending on the format.
- As shown in the table below, structured, standardized data that contains contextual information about each fact, such as XBRL, has a significantly lower error rate for all metric types.
Error Rates by Report Section and Source Document

| Report Section | Text | HTML | XBRL |
| --- | --- | --- | --- |
| Main financial statement metrics | 16.99% | 14.70% | 9.46% |
| Metrics from the notes | 29.19% | 28.45% | 7.37% |

- AI makes more mistakes with more complex financial statements, as measured by company size, number of operating segments, number of geographic segments, and accounting reporting complexity (ARC).
- When examining the types of errors AI made, we found that about half were scaling errors: AI has particular difficulty determining the scale of reported data, for example, whether a number is reported in millions or in thousands. Data in a structured, contextualized format (XBRL) virtually eliminates scaling errors: the scaling error rate was 8.16% for text, 5.81% for HTML, and only 0.11% for XBRL.
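To see why the structured format helps with scale, the minimal sketch below parses a hand-made Inline XBRL fragment (an assumption about a typical tagged fact, not data from the study): the unit, rounding, and scale are explicit attributes, so the true magnitude can be computed without guessing from surrounding prose.

```python
# Minimal sketch: an Inline XBRL fact carries its unit and scale explicitly,
# so "in thousands" never has to be inferred from the document's text.
from lxml import etree

IX = "http://www.xbrl.org/2013/inlineXBRL"

snippet = f"""
<div xmlns:ix="{IX}">
  Total assets:
  <ix:nonFraction name="us-gaap:Assets" contextRef="c1"
                  unitRef="usd" decimals="-3" scale="3">1,234,567</ix:nonFraction>
</div>
""".strip()

fact = etree.fromstring(snippet).find(f"{{{IX}}}nonFraction")
raw = float(fact.text.replace(",", ""))          # value exactly as printed
value = raw * 10 ** int(fact.get("scale", "0"))  # scale is an explicit power of ten
print(fact.get("name"), fact.get("unitRef"), value)  # us-gaap:Assets usd 1234567000.0
```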
Artificial intelligence offers an important opportunity to produce faster, more robust financial analysis. Our study shows that when it’s powered by structured, standardized data, like XBRL, researchers can have much greater confidence in the accuracy of results.
In the United States alone, there are large datasets of structured XBRL data available, including data from public companies, investment management companies, banks, and public utilities. Even more openly and freely available data exists in non-US markets.
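As one illustration of how accessible this data is, the sketch below pulls every reported total-assets figure for a single filer from the SEC's free XBRL company-concept API; the filer (Apple, CIK 0000320193) and the us-gaap Assets tag are arbitrary choices for the example.

```python
# Illustrative sketch: the SEC's XBRL "company concept" API returns every
# reported value of a tag for one filer as structured JSON.
import requests

url = ("https://data.sec.gov/api/xbrl/companyconcept/"
       "CIK0000320193/us-gaap/Assets.json")  # Apple Inc.; CIK zero-padded to 10 digits
headers = {"User-Agent": "your-name your-email@example.com"}  # SEC asks for contact info

data = requests.get(url, headers=headers, timeout=30).json()
for fact in data["units"]["USD"]:
    if fact.get("form") == "10-K":      # keep annual-report values only
        print(fact["end"], fact["val"])  # period end date, value in US dollars
```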
Pairing these highly structured, contextualized datasets with AI is a powerful combination that can turbo-charge our ability to perform robust, granular financial analysis.
Access the full study: AI determinants of success and failure: The case of financial statements. For more information about this study, email amarkelevich@suffolk.edu.