Resources for Extracting Machine-Readable Data

Posted on Wednesday, November 26, 2025

XBRL US Members and others using our Public Filings Database and the XBRL API know that extracting data from machine-readable reports yields highly granular data for unambiguous analysis.

Extraction involves an XBRL processor: software that reads tagged facts and interprets them according to the taxonomies defined for the report. Altova's MapForce and XMLSpy offer mapping and export functionality in the user interface, and the open-source tool Arelle can also extract data. The XULE syntax works in Altova and Arelle and can be used to filter data and attributes for extraction, as well as to complete additional steps, calculations or analysis as required before output.

XULE is an abstraction of the XBRL model; as a public working draft XBRL specification it is known as the xBRL Query and Rules Language. The US Securities and Exchange Commission (SEC) now uses XULE to check every XBRL submission against the Data Quality Committee Rules maintained in the Financial Accounting Standards Board's US GAAP Taxonomy. Beyond evaluating data in reports, XULE can extract, gather and analyze data; convert, restructure and normalize it; create taxonomies; and more.

This Jupyter Notebook has a complete process for extracting data using XULE with Arelle. The first interactive cell installs Arelle, XULE and some SEC-specific tools in a Python environment. The second cell takes several user inputs, including a decision to use one of two separate XULE expressions to extract data from a report. Click the 'Show code' text for the cell to review the two options:

- In the first XULE expression, between the first two sets of ''' characters in the Notebook's second cell, the code list({covered @}) instructs the processor to consider all XBRL facts in the report. The subsequent code filters the facts so that only numeric facts are included in the dictionary, and defines the attributes to be included for each fact. Finally, the fact extraction expression defines the output format and location, and whether output is appended to an existing file. Click the image at right to see the XULE expression for numeric fact extraction in our syntax highlighter (get details for this free VS Code extension from the XULE syntax link above).

- The second XULE expression works much like the first; taxonomy().cubes is the instruction for the processor to find all occurrences of data hypercubes in the report. This structure combines facts and attributes with taxonomy characteristics like the human-readable label, order and hierarchical structure of facts within the report (all good details for AI training, by the way). The use of table elements is what enables fact cubes to be identified in a report - think statements or schedules. This is an emerging best practice but not an SEC requirement, so the data output may not cover a complete report. The Federal Energy Regulatory Commission's eForms are built so that this XULE expression will return fact cubes for each report schedule. As in the facts example, the subsequent code in the cube XULE expression defines the attributes of returned facts in each cube, as well as format, location and append details.
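The logic of the first expression (take every fact, keep the numeric ones, pick attributes, append to an output file) can be pictured with a plain-Python analogy. This is not XULE and not the Notebook's code; the sample facts, the field names and the is_numeric flag are all made up for illustration.

```python
import csv
import io

# Made-up sample facts; the field names are assumptions, not XULE attributes.
FACTS = [
    {"concept": "Assets", "value": "1000", "unit": "USD",
     "period": "2024-12-31", "is_numeric": True},
    {"concept": "EntityRegistrantName", "value": "Example Co", "unit": None,
     "period": "2024-12-31", "is_numeric": False},
    {"concept": "Liabilities", "value": "400", "unit": "USD",
     "period": "2024-12-31", "is_numeric": True},
]

# The attributes we choose to keep for each fact.
COLUMNS = ["concept", "value", "unit", "period"]


def numeric_rows(facts):
    """Keep only numeric facts and project the chosen attributes."""
    return [{c: f[c] for c in COLUMNS} for f in facts if f["is_numeric"]]


def append_rows(rows, handle, write_header):
    """Write rows as CSV to an open handle; write the header only once."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    if write_header:
        writer.writeheader()
    writer.writerows(rows)


# Simulate two reports appended to the same output.
buffer = io.StringIO()
append_rows(numeric_rows(FACTS), buffer, write_header=True)
append_rows(numeric_rows(FACTS), buffer, write_header=False)
print(buffer.getvalue())
```

In the Notebook, the XBRL processor supplies the facts and the XULE expression handles the filtering and output; the sketch only mirrors the order of those steps.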

At the end of the second code block is a sequence of steps that compiles the .xule file into a .zip and relocates it; this .zip is what Arelle uses to process reports.
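For readers working outside the Notebook, the compile step might look roughly like the sketch below. The flag names (--plugins xule, --xule-compile, --xule-rule-set) are assumptions based on the XULE plugin's command-line options and should be checked against the Notebook and your installed version; the file names are hypothetical. The code only assembles and prints the command, it does not run Arelle.

```python
import shlex


def build_compile_command(xule_file, rule_set_zip):
    """Assemble an Arelle command that compiles a .xule file into a .zip.

    Flag names are assumed, not confirmed by the post; verify before use.
    """
    return [
        "arelleCmdLine",
        "--plugins", "xule",
        "--xule-compile", xule_file,
        "--xule-rule-set", rule_set_zip,
    ]


# Hypothetical file names for illustration.
cmd = build_compile_command("extract-facts.xule", "extract-facts.zip")
print(shlex.join(cmd))
```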

The last step in the Notebook defines a report to use and runs Arelle, invoking the .zip file to produce the output. It's important to note that once the XULE expression is compiled as a .zip, it can be used to process data in ANY XBRL report. With file-append set to true, the code in this cell could be revised to iterate over a list of reports (or an RSS feed) to build a data collection for use.
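A minimal sketch of that iteration, assuming the compiled .zip already exists: the loop below calls Arelle once per report. The flags (-f, --xule-rule-set, --xule-run) are assumptions to verify against the Notebook, the report URLs are placeholders, and the runner argument is a hypothetical hook that lets the loop be exercised without Arelle installed.

```python
import subprocess
from types import SimpleNamespace


def build_run_command(report_url, rule_set_zip):
    """Assemble an Arelle command that runs a compiled rule set on one report.

    Flag names are assumed, not confirmed by the post; verify before use.
    """
    return [
        "arelleCmdLine",
        "--plugins", "xule",
        "-f", report_url,
        "--xule-rule-set", rule_set_zip,
        "--xule-run",
    ]


def process_reports(report_urls, rule_set_zip, runner=subprocess.run):
    """Run the rule set against each report and collect (url, return code)."""
    results = []
    for url in report_urls:
        completed = runner(build_run_command(url, rule_set_zip), check=False)
        results.append((url, completed.returncode))
    return results


def dry_run(cmd, check=False):
    """Stand-in runner: print the command instead of executing it."""
    print(" ".join(cmd))
    return SimpleNamespace(returncode=0)


# Placeholder URLs; replace with real report locations.
reports = [
    "https://example.com/report-1.htm",
    "https://example.com/report-2.htm",
]
print(process_reports(reports, "extract-facts.zip", runner=dry_run))
```

Dropping the runner argument makes the loop call subprocess.run for real. Because the XULE expression appends to its output file, each pass adds that report's rows to the same collection.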
