XML, CSV, XBRL - what’s the difference?

Posted on Wednesday, January 4, 2017

David Tauriello, VP Operations, XBRL US

Formats are sometimes mistaken for financial data standards by regulators, government agencies and businesses.

Recently, we’ve seen and heard U.S. regulators discuss and decide to collect data in XML or CSV <format /> while mistakenly referring to it as a <financial data standard />. The expectation is that this will result in comprehensive, consistent, interoperable data that lets investors and other data users automate data extraction and analysis.

The danger in this approach is that a <format /> is simply a means to exchange numbers that have no embedded meaning.

The XML format supports related schemas that help further define the context of the reported data, but the nature of the data represented in a schema is not guaranteed to be the same across multiple data collection scenarios. For example, one XML data collection system may represent the time period using the ISO 8601 standard, while another builds the time period into the name of the data field reported, e.g., “Revenues for 2016”, “Revenues for 2017”.
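To make this concrete, here is a minimal Python sketch of two hypothetical XML collection systems reporting the same revenue figure. The element names, the `period` attribute and both reader functions are assumptions invented for illustration; they are not taken from any real filing system.

```python
import xml.etree.ElementTree as ET

# Hypothetical system A: the period is an ISO 8601 interval in an attribute.
SYSTEM_A = """<report>
  <revenue period="2016-01-01/2016-12-31">500</revenue>
</report>"""

# Hypothetical system B: the period is baked into the element name.
SYSTEM_B = """<report>
  <RevenuesFor2016>500</RevenuesFor2016>
</report>"""


def read_system_a(xml_text):
    """Read the period from the attribute that system A happens to use."""
    element = ET.fromstring(xml_text).find("revenue")
    return {"period": element.get("period"), "value": int(element.text)}


def read_system_b(xml_text):
    """Recover the period by picking apart the element name."""
    element = ET.fromstring(xml_text)[0]
    year = element.tag[-4:]  # last four characters of "RevenuesFor2016"
    return {"period": year, "value": int(element.text)}


print(read_system_a(SYSTEM_A))
print(read_system_b(SYSTEM_B))
```

Both documents are well-formed XML, yet each needs its own reader, and a data user has to know in advance where each system keeps the period.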

The C,S,V,file,format is a good way to transmit numbers, but the reported values themselves contain no contextual data to help humans and machines understand what the data means.
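A short Python sketch of the same point, using a made-up two-row CSV submission (the fund names, amounts and the `read_rows` helper are hypothetical):

```python
import csv
import io

# Hypothetical CSV submission: a fund name and a reported amount.
CSV_TEXT = "Fund A,1000000\nFund B,2500000\n"


def read_rows(csv_text):
    """Parse CSV text into a list of rows; every cell is a plain string."""
    return list(csv.reader(io.StringIO(csv_text)))


rows = read_rows(CSV_TEXT)
for name, amount in rows:
    # Nothing in the file says whether `amount` is assets or revenue,
    # dollars or euros, or which date it refers to.
    print(name, amount)
```

The parser returns the values faithfully, but any meaning has to be supplied from outside the file, for example from a separate document describing the column layout.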

More than a <format />, the XBRL financial data standard is designed to capture the characteristics of financial information.

For example, an effort to collect financial data like assets requires recording the currency of the investments. In XBRL, the XBRL International Units Registry provides a clearly defined mechanism to record this information; the XBRL specification requires that currency be selected from the Registry and recorded in the same way by every reporting entity.

In an XML format, the way currency is recorded is defined by the designer of the data collection system. Across separate collection systems, there is no requirement that the currency for assets be recorded consistently. The same is true for durations of time, the entity reporting the information, breakdowns by class of security, etc.

With XBRL, each number’s context is captured, communicating definition, time period, units and name of reporting entity consistently. Built and maintained with collaborative consensus among accounting, finance information and software professionals, the XBRL specification has clearly defined mechanisms to handle important aspects of financial data – without them, the reader is lost in a sea of meaningless numbers.
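As a rough illustration of how a reported number carries its context, the Python sketch below reads a simplified, hand-written XBRL-style instance. It omits parts a real filing needs, such as the reference to a taxonomy. The `ex` taxonomy namespace, the entity scheme, the identifier `FUND-A` and the `resolve_facts` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"
NS = {"xbrli": XBRLI}

# Simplified instance: one context, one unit, one fact (hypothetical values).
INSTANCE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
    xmlns:ex="http://example.com/taxonomy">
  <xbrli:context id="c1">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com/entities">FUND-A</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period><xbrli:instant>2016-12-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <xbrli:unit id="u1"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <ex:Assets contextRef="c1" unitRef="u1" decimals="0">1000000</ex:Assets>
</xbrli:xbrl>"""


def resolve_facts(xml_text):
    """Attach entity, period and unit to every fact in the instance."""
    root = ET.fromstring(xml_text)
    contexts = {
        ctx.get("id"): {
            "entity": ctx.findtext("xbrli:entity/xbrli:identifier", namespaces=NS),
            "instant": ctx.findtext("xbrli:period/xbrli:instant", namespaces=NS),
        }
        for ctx in root.findall("xbrli:context", NS)
    }
    units = {
        unit.get("id"): unit.findtext("xbrli:measure", namespaces=NS)
        for unit in root.findall("xbrli:unit", NS)
    }
    facts = []
    for element in root:
        if element.tag.startswith("{" + XBRLI + "}"):
            continue  # contexts and units are not facts
        context = contexts[element.get("contextRef")]
        facts.append({
            "concept": element.tag.rsplit("}", 1)[-1],
            "entity": context["entity"],
            "instant": context["instant"],
            "unit": units[element.get("unitRef")],
            "value": int(element.text),
        })
    return facts


print(resolve_facts(INSTANCE))
```

Because the fact points to its context and unit in the same way in every instance, one reader of this kind can in principle be reused across reports instead of being rewritten for each collection system.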

When financial data collection systems are built on XML, CSV or some other <format />, the method used to define units such as currencies, periods of time, the entity the data relates to, and disaggregation of data is re-created every time. This means data cannot be easily compared without manual reconciliation. The system designer wastes time addressing these issues for every data set. There’s no intelligence built into the system. Users – either human or machine – cannot automatically extract and analyze the reported data.

Does a ‘simple’ financial data collection system require a ‘simple’ data format?

Sometimes organizations say the financial data reporting need is ‘simple’, so “XML is good enough”. In its final rule for Investment Company Reporting Modernization, the SEC adopts XML over XBRL because, in part:

“…We believe that requiring funds to report information on Form N-PORT in XML will be both efficient and cost-effective for funds…For this data set, the additional flexibility offered by a broader XML based framework such as XBRL incurs data volume and processing overhead with little incremental benefit; for example, the information funds will report will be as of a single reporting date, the units of measurement are predetermined or are constrained by the data type, and there is little value in customizing the content or presentation.” (see page 429)

There are some inaccuracies in the SEC’s explanation:

“[XBRL] incurs data volume and processing overhead.”

This statement suggests that XBRL would be more expensive for regulators, reporting entities and data users than XML. We disagree. An XBRL implementation for financial reporting requirements would be significantly less expensive than building a new XML standard from scratch because:
- The SEC will incur costs in creating a new standard when it could leverage an existing standard (the US GAAP Financial Reporting Taxonomy) for the same needs.
- Software providers to the fund and user communities will need to build new tools customized to work with a new XML standard. Software applications that work with one XBRL taxonomy can be adapted to work with any XBRL taxonomy. Applications that are not currently XBRL-enabled can be adapted to work with the XBRL specification. This ensures a competitive, cost-effective marketplace of tools for fund data use.
- Consumers of the data will have to review it manually, because the context (time period, definitions, reporting entity) will not be represented in the same way in every fund report submitted. Applications that work with XBRL data can consume reported information automatically because of the greater contextual metadata the XBRL standard provides. Processing costs are therefore actually lower than with an XML standard, which does not provide sufficient context for the reported data and would require manual translation of the data before analysis can begin.

“…the additional flexibility offered by a broader XML based framework such as XBRL.”

XBRL is actually significantly less flexible than XML. The restrictions in the structure of XBRL, which require issuers to conform to a single method of conveying time period, currency, scale, reporting entity, etc., and the requirement to adhere to agreed-upon definitions of data fields are what make the reported structured data consistently understandable and enable it to be consumed automatically. Extensions in XBRL submissions do allow ‘additional flexibility’, but permitting them is the data collector’s decision.

“…the information funds will report will be as of a single reporting date, the units of measurement are predetermined or are constrained by the data type, and there is little value in customizing the content or presentation.”

This implies that the data reported will not be used in multi-fund comparisons, in trend analyses of a single fund’s financials across time periods, or in analyses comparing fund data to other types of reporting entities such as public companies. The XBRL standard would enable and automate these analyses. An XML implementation would require analysts to review and translate the data before such analysis could be performed. Separately, while XBRL does allow for the creation of custom data fields for line items that are specific to a single reporting entity, XML does too. Any regulatory implementation of a standard, however, can preclude the use of custom elements, regardless of the standard or format chosen.

The Bottom Line

Fully appreciating the difference between a <format /> and a financial data standard requires rethinking data. Most organizations putting a standards program in place are looking to streamline processing, improve consistency and accuracy, and reduce cost and duplication. Only the appropriate financial data standard will really do the job.
