
Getting XBRL in LLMs for as-filed research
Posted Thursday, March 19, by David Tauriello, Vice President of Operations, XBRL US
It's generally accepted that structured data improves analysis, and recent AI advances mean machines are also getting better at finding and accessing machine-understandable structured data.
This post touches on four topics related to XBRL in research tasks supported by AI:
- prompting to get as-filed, context-rich, unambiguous XBRL directly from regulatory sources;
- getting connected to the Public Filings Database* with the XBRL US MCP server, built on the Model Context Protocol, the open standard for connecting LLMs (Large Language Models) to external sources;
- using the MCP to get as-filed XBRL data, full report context and more; and
- understanding why reviewing any work completed by LLMs remains an essential task.
The latest commercial releases of LLMs from Anthropic, OpenAI and Google (and possibly others) now find and use XBRL embedded in HTML reports collected by regulators when prompted, instead of inferring meaning from text scraped from the HTML.
Try asking Claude's Opus 4.6 for XBRL at its source by name:
Here's the prompt and response, along with a screenshot of the result from March 2026 and the spreadsheet created by it, below.
While gathering data from a single company's tax disclosure is generally not a difficult task, the promising signal for the technology's handling of XBRL is the LLM's awareness: after trying unsuccessfully to use tools like the SEC's APIs, it knew to fall back to finding and parsing facts from the inline XBRL itself.
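That fallback step can be sketched in a few lines: inline XBRL facts are ordinary elements in the filing's markup, tagged in the ix namespace, so they can be extracted directly. A minimal sketch, assuming a small well-formed document; the sample markup and concept below are illustrative, not taken from a real filing:

```python
# Sketch of the fallback described above: inline XBRL (iXBRL) facts are
# regular elements in the filing's markup, tagged in the ix namespace.
# Real filings are large HTML documents and may need an HTML-tolerant
# parser; this self-contained sample is illustrative only.
import xml.etree.ElementTree as ET

IX_NS = "http://www.xbrl.org/2013/inlineXBRL"

sample = """<html xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"><body>
<span>Income tax expense of $<ix:nonFraction
  name="us-gaap:IncomeTaxExpenseBenefit" contextRef="FY2025"
  unitRef="usd" decimals="-6" scale="6">1,234</ix:nonFraction> million
</span></body></html>"""

def extract_facts(doc):
    """Return (concept, contextRef, raw value) for each ix:nonFraction fact."""
    root = ET.fromstring(doc)
    return [
        (el.get("name"), el.get("contextRef"), el.text.strip())
        for el in root.iter(f"{{{IX_NS}}}nonFraction")
    ]

print(extract_facts(sample))
# [('us-gaap:IncomeTaxExpenseBenefit', 'FY2025', '1,234')]
```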
Similar prompts for FERC and ESEF reports with Opus 4.6 also worked, suggesting a future where direct, large-scale retrieval from regulatory sources and analysis using freely-available structured XBRL data like this could be possible, albeit at a significant cost in processing power.
As Kristin Bitterly of Citi Wealth recently noted, "It's not just about the AI tools. (It's about) who has the data?"** Provenance will always be critical, and efficiency is likely to be just as important a consideration as AI costs grow. The ability to harness regulatory data with related insights from a single API call for use in LLM-related tasks should lower processing costs and reduce time spent tracking down inconsistent or unexpected results.
To illustrate the potential of commingling as-filed data and insights with AI, we've created a set of tools in an MCP server, which extends the work of Jose Antonio Huizar at 2H Software and was built with help from Hamid Vakilzadeh at the University of Wisconsin-Whitewater. The MCP server queries the Public Filings Database, which includes XBRL reports ingested from regulatory and other sources along with any data quality issues logged during the load process, and returns results to the LLM for processing in tasks.
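To make the "single API call" idea concrete, here's a hedged sketch of composing one request to the XBRL US fact-search endpoint. The endpoint path and parameter names follow the XBRL API's published field nomenclature, but verify them against the current documentation; the CIK and access token are placeholders:

```python
# Sketch: compose one fact-search call to the XBRL US API that returns
# as-filed facts plus metadata in a single request. Parameter names follow
# the XBRL API's field nomenclature; check the current docs before use.
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE = "https://api.xbrl.us/api/v1/fact/search"

def build_fact_query(concept, cik, fields):
    """Build the query URL for one concept and one filer."""
    params = {
        "concept.local-name": concept,   # e.g. IncomeTaxExpenseBenefit
        "entity.cik": cik,               # filer's SEC CIK, zero-padded
        "fields": ",".join(fields),      # columns to return
    }
    return f"{BASE}?{urlencode(params)}"

def fetch(url, token):
    """Execute the call with an OAuth2 bearer token from XBRL US."""
    req = Request(url, headers={"Authorization": f"Bearer {token}"})
    with urlopen(req) as resp:
        return json.load(resp)

url = build_fact_query(
    "IncomeTaxExpenseBenefit",
    "0000320193",  # illustrative CIK only
    ["fact.value", "period.fiscal-year", "unit"],
)
# data = fetch(url, ACCESS_TOKEN)  # requires a valid XBRL US access token
```

A call like this returns the facts and their metadata in one round trip, which is what keeps processing cost and LLM context usage down compared with scraping and re-parsing HTML.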
To get started with the XBRL US Filing Repository MCP server:
- create a free XBRL US Web account (for best results, use an organizational email and password instead of Gmail or Yahoo)
- create a free account at Smithery (login >> Sign up), a registry of MCP servers
- click the "Humans" tab on the XBRL US Filing Repository entry at Smithery and follow the instructions for Claude
- after the MCP is listed in Claude, click Connect and, when the login prompt below appears in the browser, enter your XBRL US Web account email and password (watch this brief video for a step-through)
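For readers who prefer configuring Claude Desktop by hand rather than the Connect flow, MCP servers are declared under the `mcpServers` key of Claude's `claude_desktop_config.json`. A minimal sketch (the server name, command and args below are placeholders; use the exact values provided on Smithery's "Humans" tab):

```json
{
  "mcpServers": {
    "xbrl-us-filing-repository": {
      "command": "npx",
      "args": ["-y", "@smithery/cli@latest", "run", "xbrl-us-filing-repository"]
    }
  }
}
```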

Once it's enabled as a connector in Claude's desktop or web interface, you'll be ready to include XBRL data and more from our Database in your AI work. It's important to note that as a demo, this MCP restricts the volume of data retrieved for AI processing. Use the XBRL API discussion forum to let us know how the MCP is working for you or contact us if you have questions or are interested in learning more about the Database, tools and methodologies or the development of a customized/comprehensive version of the MCP.
Prompting LLMs with an MCP still requires some knowledge and thought.
A good starting point with the MCP is to ask for a summary of available tools to get a sense of how it will query for data. Some familiarity with the field nomenclature used in the XBRL API (different from the SEC's APIs) will help. Also, several terms can focus the work of gathering data from our Public Filings Database - latest (or most recent), annual (or quarterly) and dimension members are a few words that help filter data. In addition, the tools will seek clarification if there are conflicts or multiple options - for example, if you ask for data about Ford, you might be asked to confirm Ford Motor Company vs. Ford Motor Credit Company.
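As an illustration of those filter terms working together in a single request (a hypothetical prompt, not one taken from the examples in this post):

```
Using the XBRL US Filing Repository tools, get the latest annual income
tax facts for Ford Motor Company (not Ford Motor Credit Company),
including dimension members, and list any DQC issues logged for that
filing.
```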
Returning to the Tax Disclosure prompt used initially, a modified prompt asking for a comparison between the data sourced from the SEC and the data retrieved through the MCP returned nearly identical results, with the Public Filings Database also incorporating any Data Quality Committee (DQC) issues logged during ingestion.
Here is the second prompt and response, as well as a screenshot of the result below and the Tax Disclosure comparison report as a PDF completed the same day (March 14).

Similar prompts can create some different results.
While both methods retrieved XBRL, there were a few variances noted in section 6 of the report, due to: 1) a restatement; 2) a DQC issue (one of several for the filing); 3) Claude's methodology for extracting data; and 4) rounding. The data quality insights are a value-add, as a researcher might be very interested to know about inconsistencies in a company's income tax reporting.
For data users and analysts, there are good reasons to cheer these AI advances, especially as enhancements underway to the XBRL Specification will result in a more LLM-friendly, freely-available standard, making it increasingly efficient for AI to understand the underlying model for reported data, consolidate sources and analyze XBRL directly from regulators (or elsewhere).
It's still important to note that LLMs are advancing and MCP is an even more recent development. Even using highly standardized data today, what's done to 'interpret' the data is still largely unknown - the engineering and training across LLMs might make different versions less suitable to specific tasks and could skew how the data is used, creating unintended consequences in AI-generated results.
* The Public Filings Database is a repository of facts, document details and taxonomies we ingest in XBRL from US regulator websites (the SEC and FERC), as well as from XBRL International's collection of ESEF filings. The XBRL API is a good tool for fully exploring what we curate, and in addition to the MCP we've also published to our Data Community page a number of free resources that work with the XBRL API.