Posted on Wednesday, April 21, 2021

By Steven Huddart, Smeal College of Business, Penn State University

All US public companies submit their financial statement data to the Securities and Exchange Commission (SEC) in standardized XBRL format. These structured disclosures are based on the U.S. Generally Accepted Accounting Principles (GAAP) Financial Reporting Taxonomy, include values in the financial statements and footnotes, and can be tabulated to meet a range of data consumers’ needs.

Despite the ready availability of XBRL data, many investors, analysts, businesses, policy-setters, and academics use corporate financial statement data assembled by commercial providers, which are normalized from SEC filings using opaque, often proprietary, methodologies.

Advantages of XBRL Data

XBRL (or “as-filed”) data have certain advantages over data from commercial providers:

  1. More timely. Whereas commercial data providers may take days or even weeks to compile financial statement data and make them available to their users, there is no delay between when a corporation files its Form 10-K and when users can obtain all the information in the filing in a format suitable for statistical analysis.
  2. More granular. Compustat’s Fundamental Annual dataset contains about 900 data items. In contrast, the 2020 GAAP taxonomy specifies 643 unique balance sheet tags, 574 unique income statement tags, and 766 unique cash flow statement tags. The greater number of detailed items identified in the taxonomy gives users the flexibility to construct metrics appropriate to their specific purpose.
  3. Authoritative. XBRL data adhere to an authoritative taxonomy, created by the Financial Accounting Standards Board (FASB) and approved by the SEC, and so are verifiable and reproducible.

Predictive Nature of XBRL Data

Given these advantages, it is natural to ask whether use of XBRL data affects investment decisions.

To tackle this question, Kai Du (also at Penn State’s Smeal College of Business), X. Daniel Jiang (School of Accounting and Finance, University of Waterloo), and I tested whether portfolios constructed to exploit the accruals anomaly contained different stocks or earned different returns depending on whether the analysis was based on XBRL data or Compustat data. We chose the accruals anomaly because it is perhaps the best-known accounting-based return anomaly. The anomaly is that stocks for which reported earnings contain a large accrual component (and a small cash-flow component) tend to underperform, while stocks for which reported earnings contain a small accrual component (and a large cash-flow component) tend to outperform.
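To make the sorting step concrete, here is a minimal Python sketch of how stocks might be ranked on accruals and split into low- and high-accruals portfolios. It rests on assumptions that are not taken from the paper: operating accruals are measured as net income minus operating cash flow, scaled by average total assets (one common definition, not necessarily the one we used), and the tickers, figures, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FirmYear:
    """One firm-year of inputs. All values below are made up for illustration."""
    ticker: str
    net_income: float
    operating_cash_flow: float
    avg_total_assets: float


def accruals_ratio(firm: FirmYear) -> float:
    # Assumed definition: the part of earnings not backed by operating
    # cash flow, scaled by average total assets.
    return (firm.net_income - firm.operating_cash_flow) / firm.avg_total_assets


def sort_into_portfolios(firms, n_per_side):
    """Return (low_accruals_tickers, high_accruals_tickers)."""
    ranked = sorted(firms, key=accruals_ratio)  # ascending accruals
    low = [f.ticker for f in ranked[:n_per_side]]
    high = [f.ticker for f in ranked[-n_per_side:]]
    return low, high


# Hypothetical sample, not data from any filing.
firms = [
    FirmYear("AAA", 100.0, 150.0, 1000.0),
    FirmYear("BBB", 100.0, 20.0, 1000.0),
    FirmYear("CCC", 50.0, 50.0, 500.0),
    FirmYear("DDD", 30.0, 90.0, 600.0),
    FirmYear("EEE", 80.0, 10.0, 500.0),
]

low, high = sort_into_portfolios(firms, n_per_side=2)
print("low accruals (long):", low)
print("high accruals (short):", high)
```

Because the ranking depends entirely on the input items, any difference between data sources in net income, operating cash flow, or total assets can move a stock from one portfolio to another.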

We find significant discrepancies between the as-filed data and the Compustat data for several accounting items involved in calculating operating accruals. These discrepancies tend to be greater for firms that are smaller or are experiencing higher growth. Furthermore, discrepancies tend to be larger when (i) it is more difficult to compare the accounting practices between industry peers; (ii) the filing contains more industry-specific XBRL tags; or (iii) the financial statements present more granular items (e.g., the change in accounts payable to related parties). In other words, in cases where the accounting is complex or the registrant discloses uncommon accounting items, greater data discrepancies arise from Compustat’s normalizations.

One effect of these discrepancies is that stocks selected for the high- and low-accruals portfolios differ markedly depending on the data source used. This, in turn, leads to different returns on the hedge portfolios. In particular, over the 2012 to 2018 period, we find that the accruals anomaly is significant (i.e., the low-accruals stocks earn significantly higher returns than the high-accruals stocks) using as-filed data, but not significant using Compustat data. We repeated the analysis with a second commercial data provider and obtained similar results.
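The two comparisons described here, how much the portfolios overlap across data sources and how the hedge returns differ, can be sketched in a few lines of Python. Everything in the sketch is an assumption for illustration: the tickers, portfolio memberships, and returns are invented, the overlap measure (shared names divided by all names in either portfolio) is one simple choice, and the hedge return is an equal-weighted long-minus-short spread. None of the numbers are results from our study.

```python
from statistics import mean


def overlap_share(portfolio_a, portfolio_b):
    """Share of names common to both portfolios (intersection over union)."""
    a, b = set(portfolio_a), set(portfolio_b)
    return len(a & b) / len(a | b)


def hedge_return(long_names, short_names, returns):
    """Equal-weighted return of the long side minus that of the short side."""
    long_leg = mean(returns[name] for name in long_names)
    short_leg = mean(returns[name] for name in short_names)
    return long_leg - short_leg


# Hypothetical subsequent returns by ticker (made up).
returns = {"AAA": 0.12, "BBB": 0.02, "CCC": 0.05,
           "DDD": 0.10, "EEE": -0.04, "FFF": 0.01}

# Hypothetical portfolios produced by the same sort on two data sources.
as_filed = {"low": ["DDD", "AAA"], "high": ["BBB", "EEE"]}
vendor = {"low": ["DDD", "CCC"], "high": ["BBB", "FFF"]}

low_overlap = overlap_share(as_filed["low"], vendor["low"])
as_filed_spread = hedge_return(as_filed["low"], as_filed["high"], returns)
vendor_spread = hedge_return(vendor["low"], vendor["high"], returns)

print(f"low-accruals overlap: {low_overlap:.2f}")
print(f"hedge return, as-filed data: {as_filed_spread:.3f}")
print(f"hedge return, vendor data:   {vendor_spread:.3f}")
```

A real test would also need standard errors for the spread over many periods; the sketch only shows why differing portfolio membership translates into differing hedge returns.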

We also gathered indirect evidence on whether institutional investors (e.g., mutual funds and hedge funds) base their trading decisions on Compustat data or XBRL data. Our tests are consistent with the view that some institutional investors are indeed making trading decisions based on XBRL data, especially when they directly retrieve XBRL filings from the SEC’s EDGAR site.

Our findings are not limited to the accruals anomaly. We find that at least four other accounting-based anomalies are similarly affected by discrepancies between data sources.

What does this mean for users of corporate financials?

Every data user has different needs and technological capacity. Commercial data providers’ normalized datasets serve an important role. By normalizing, vetting, and, in many cases, cleaning the data, they make financial statement data easier to use than as-filed data.

But as-filed data offer benefits, too. The SEC ensures that they are freely available. Their flexibility, timeliness, and granularity are advantages that sophisticated data users should consider. Finally, the fact that XBRL data adhere to the official taxonomy facilitates dialogue among preparers, users, and regulators of financial reports.

To receive a copy of the full research paper, please email huddart@psu.edu.