Lists of similar concepts

    • #119805
      Tim Bui

      Hello, since companies are free to create their own names for tags (concepts), does XBRL US keep a list of similar concepts in the edgar_db database so that users can save time on standardization?

      For example, to get a simple Total Revenue, so far I found 5 different concepts:

      1. Revenues
      2. SalesRevenueNet
      3. SalesRevenueGoodsNet
      4. TotalRevenuesAndOtherIncome
      5. RevenueFromContractWithCustomerIncludingAssessedTax

      To find Nonrecurring charges (such as impairment and restructuring), so far I found 13 concepts:

      1. Assetacquisitioncharge
      2. AssetImpairmentCharges
      3. ImpairmentOfLongLivedAssetsToBeDisposedOf
      4. RestructuringCostsAndAssetImpairmentCharges
      5. ImpairmentOfIntangibleAssetsIndefinitelivedExcludingGoodwill
      6. Impairmentandrestructuringexpenses
      7. RestructuringSettlementAndImpairmentProvisions
      8. GoodwillImpairmentLoss
      9. RestructuringCharges
      10. RestructuringCosts
      11. RestructuringChargesAndAcquisitionRelatedCosts
      12. GoodwillAndIntangibleAssetImpairment
      13. ImpairmentOfLongLivedAssetsHeldForUse

      The above 13 non-recurring items are exclusive of the additional 7 gain/loss concepts below:

      1. GainsLossesOnExtinguishmentOfDebt
      2. DerivativeGainLossOnDerivativeNet
      3. GainLossOnSaleOfBusiness
      4. GainLossOnSaleOfNonstrategicBusinessesAndAssets
      5. GainLossRelatedToLitigationSettlement
      6. GainLossOnDispositionOfAssets1
      7. GainLossOnDispositionOfIntangibleAssets
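One way to keep groupings like the three lists above reusable is a small lookup table. The sketch below is only an illustration: the group labels (TotalRevenue, NonrecurringCharges, GainLoss) and the function names are made up here, and matching is case-insensitive on the assumption that company extensions such as Assetacquisitioncharge may differ from standard names only in capitalization.

```python
# Hypothetical group labels; the concept names are the ones listed above.
CONCEPT_GROUPS = {
    "TotalRevenue": [
        "Revenues", "SalesRevenueNet", "SalesRevenueGoodsNet",
        "TotalRevenuesAndOtherIncome",
        "RevenueFromContractWithCustomerIncludingAssessedTax",
    ],
    "NonrecurringCharges": [
        "Assetacquisitioncharge", "AssetImpairmentCharges",
        "ImpairmentOfLongLivedAssetsToBeDisposedOf",
        "RestructuringCostsAndAssetImpairmentCharges",
        "ImpairmentOfIntangibleAssetsIndefinitelivedExcludingGoodwill",
        "Impairmentandrestructuringexpenses",
        "RestructuringSettlementAndImpairmentProvisions",
        "GoodwillImpairmentLoss", "RestructuringCharges", "RestructuringCosts",
        "RestructuringChargesAndAcquisitionRelatedCosts",
        "GoodwillAndIntangibleAssetImpairment",
        "ImpairmentOfLongLivedAssetsHeldForUse",
    ],
    "GainLoss": [
        "GainsLossesOnExtinguishmentOfDebt", "DerivativeGainLossOnDerivativeNet",
        "GainLossOnSaleOfBusiness",
        "GainLossOnSaleOfNonstrategicBusinessesAndAssets",
        "GainLossRelatedToLitigationSettlement",
        "GainLossOnDispositionOfAssets1",
        "GainLossOnDispositionOfIntangibleAssets",
    ],
}


def build_lookup(groups):
    """Invert {group: [concepts]} into {lowercased concept: group}."""
    lookup = {}
    for group, concepts in groups.items():
        for concept in concepts:
            lookup[concept.lower()] = group
    return lookup


LOOKUP = build_lookup(CONCEPT_GROUPS)


def standardize(concept_name):
    """Return the group for a concept name, or None if it is not mapped."""
    return LOOKUP.get(concept_name.lower())
```

Calling standardize("SalesRevenueNet") returns "TotalRevenue", while an unmapped name returns None, which makes it easy to list the tags that still need a home.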

      Thank you!

    • #119925
      David Tauriello

      Tim – we don’t maintain lists like the ones posted here. You’ve done a great job creating a few groups for standardizing elements, and you’ve tagged the post so others can find it easily and add further groups or additions to those listed above.

    • #119958
      Tim Bui

      For trend analysis, we need to look at historical numbers. However, some companies have changed their tag names, so it is very difficult to pull a consistent series of numbers using the API. If anyone has a solution, please share it with me.

      For example, Microsoft (CIK: 0000789019) used 4 tags for its total revenue over the last few years:
      – For FY 6/18 (which has data for 6/18, 6/17, and 6/16), it used “revenuefromcontractwithcustomerexcludingassessedtax”
      – For FY 6/17, it used “SalesRevenueGoodsNet”
      – For FY 6/15, it used “SalesRevenueNet”
      – For FY 6/10, it used “Revenues”
      Using only one of the above tags returns only 3 years of data. Using all 4 tags returns duplicates, as in the list below. Even with all 4 tags, I am not able to get total revenue for FY2018.

      entity.cik  period.fiscal-year  period.end  concept.local-name    fact.value
      0000789019  2009                2009-07-01  Revenues              58437000000
      0000789019  2010                2010-07-01  Revenues              62484000000
      0000789019  2009                2009-07-01  SalesRevenueNet       58437000000
      0000789019  2010                2010-07-01  SalesRevenueNet       62484000000
      0000789019  2011                2011-07-01  SalesRevenueNet       69943000000
      0000789019  2012                2012-07-01  SalesRevenueNet       73723000000
      0000789019  2013                2013-07-01  SalesRevenueNet       77849000000
      0000789019  2014                2014-07-01  SalesRevenueNet       86833000000
      0000789019  2014                2014-07-01  SalesRevenueGoodsNet  72948000000
      0000789019  2017                2017-07-01  SalesRevenueGoodsNet  57190000000
      0000789019  2017                2017-07-01  SalesRevenueNet       89950000000
      0000789019  2016                2016-07-01  SalesRevenueGoodsNet  61502000000
      0000789019  2016                2016-07-01  SalesRevenueNet       85320000000
      0000789019  2015                2015-07-01  SalesRevenueGoodsNet  75956000000
      0000789019  2015                2015-07-01  SalesRevenueNet       93580000000
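A possible way to collapse duplicates like these is to rank the tags and keep, for each fiscal year, the value from the highest-ranked tag that has one. The sketch below uses the rows listed above; the ranking itself is an assumption (SalesRevenueGoodsNet is placed last because, in these rows, its values are lower than SalesRevenueNet for the same year and so appear to be a component rather than the total), and the function name is hypothetical.

```python
# (fiscal year, concept, value) taken from the rows listed above.
ROWS = [
    (2009, "Revenues", 58437000000),
    (2010, "Revenues", 62484000000),
    (2009, "SalesRevenueNet", 58437000000),
    (2010, "SalesRevenueNet", 62484000000),
    (2011, "SalesRevenueNet", 69943000000),
    (2012, "SalesRevenueNet", 73723000000),
    (2013, "SalesRevenueNet", 77849000000),
    (2014, "SalesRevenueNet", 86833000000),
    (2014, "SalesRevenueGoodsNet", 72948000000),
    (2017, "SalesRevenueGoodsNet", 57190000000),
    (2017, "SalesRevenueNet", 89950000000),
    (2016, "SalesRevenueGoodsNet", 61502000000),
    (2016, "SalesRevenueNet", 85320000000),
    (2015, "SalesRevenueGoodsNet", 75956000000),
    (2015, "SalesRevenueNet", 93580000000),
]

# Assumed preference order: earlier entries win when a year has several tags.
PRIORITY = ["Revenues", "SalesRevenueNet", "SalesRevenueGoodsNet"]


def one_value_per_year(rows, priority):
    """Keep a single (concept, value) per fiscal year using the tag ranking."""
    rank = {name: position for position, name in enumerate(priority)}
    best = {}
    for year, concept, value in rows:
        if concept not in rank:
            continue  # ignore tags that are not part of the ranking
        current = best.get(year)
        if current is None or rank[concept] < rank[current[0]]:
            best[year] = (concept, value)
    return dict(sorted(best.items()))


SERIES = one_value_per_year(ROWS, PRIORITY)
for year, (concept, value) in SERIES.items():
    print(year, concept, value)
```

This gives one row per fiscal year from 2009 through 2017. It does not solve the missing FY2018 value, which would need the newer tag to be added to the ranking once its facts can be retrieved.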

    • #119959
      Tim Bui

      The trend in Gross Margin (GrossProfit / Total Revenue) says a lot about the business condition of a company. However, not every company reports Gross Profit. For those that do not, we have to subtract Cost of Goods Sold from Total Revenue. I do not have a solution for getting consistent Total Revenue numbers yet. For Cost of Goods Sold, however, I have found a few different tags:
      – CostOfGoodsAndServicesSold
      – CostOfRevenue
      – CostOfGoodsSoldExcludingDepreciationDepletionAndAmortization
      – CostsAndExpenses
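The fallback described above (use GrossProfit when it is reported, otherwise subtract a cost tag from total revenue) could be sketched like this. The "TotalRevenue" key is a hypothetical, already-standardized revenue figure, and the order in which the cost tags are tried is simply the order of the list above, not a recommendation.

```python
# Cost tags from the list above, tried in this order (an assumption).
COST_TAGS = [
    "CostOfGoodsAndServicesSold",
    "CostOfRevenue",
    "CostOfGoodsSoldExcludingDepreciationDepletionAndAmortization",
    "CostsAndExpenses",
]


def gross_margin(facts, revenue_key="TotalRevenue"):
    """Return gross margin as a fraction, or None if it cannot be computed.

    `facts` maps a concept name to its value for one company and period.
    """
    revenue = facts.get(revenue_key)
    if not revenue:
        return None  # missing or zero revenue
    gross_profit = facts.get("GrossProfit")
    if gross_profit is None:
        # Fall back to revenue minus the first cost tag that is present.
        cost = next((facts[tag] for tag in COST_TAGS if tag in facts), None)
        if cost is None:
            return None
        gross_profit = revenue - cost
    return gross_profit / revenue
```

For example, gross_margin({"TotalRevenue": 200.0, "CostOfRevenue": 150.0}) evaluates to 0.25.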

    • #123764
      David Tauriello

      Tim – to get lists of concepts in the base (standard) US GAAP and IFRS Taxonomies, query: /dts/search?dts.taxonomy=US%20GAAP,IFRS&fields=dts.id,dts.taxonomy-name

      With each dts.id from above that you want concept information for, query like this:


      Every company references a base taxonomy in its filing as a starting point. The company filing is essentially a taxonomy that is inheriting from the base and adding concepts as necessary according to the company’s policies and practices with respect to its financial statements.

      NOTE: you could include concept.local-name with a comma-delimited list before the fields that are returned to get only specific concepts.

      If you pull an entire taxonomy, understand that there’s a great deal of information and a significant number of records involved, so this may take a while. Fortunately, these taxonomies don’t change – new releases annually – so the details only need to be pulled one time.
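The two queries described in this post can be assembled as URLs along the following lines. This is a sketch with loud assumptions: BASE_URL is assumed and is not given in the thread, the concept query's field list is borrowed from the formula Tim posts later in this thread rather than from the example that is missing above, and authentication is not shown at all.

```python
from urllib.parse import quote

# Assumption: base URL of the XBRL API; the thread only gives the paths.
BASE_URL = "https://api.xbrl.us/api/v1"


def dts_search_url(taxonomies=("US GAAP", "IFRS")):
    """URL that lists dts.id values for the base taxonomies."""
    names = quote(",".join(taxonomies), safe=",")  # space -> %20, commas kept
    return (
        f"{BASE_URL}/dts/search?dts.taxonomy={names}"
        "&fields=dts.id,dts.taxonomy-name"
    )


def concept_search_url(dts_id, local_names=None):
    """URL that returns concept details for one dts.id.

    Passing local_names puts a comma-delimited concept.local-name filter
    before the fields, as the NOTE above suggests.
    """
    url = f"{BASE_URL}/concept/search?dts.id={dts_id}"
    if local_names:
        url += "&concept.local-name=" + ",".join(local_names)
    url += "&fields=concept.local-name.sort(ASC),concept.*,label.*"
    return url


print(dts_search_url())
print(concept_search_url(257590, ["Revenues", "SalesRevenueNet"]))
```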

    • #124615
      Tim Bui

      Hi David,
      I finally got around to applying the queries that you described above. It’s just amazing how much information is returned from the call.

      A couple of questions, please:
      1. Using =CONCATENATE(A1&"/concept/search?dts.id=257590&fields=concept.local-name.sort(ASC),concept.*,label.*") returns only about 2000 concept local names. How can I change the query to get the complete list, so that I can pick and choose what I need?

      2. I am still trying to find ways to shorten my queries to get more fact data for more companies per call (to stay under the length limit that Google imposes). Can I use the concept.id in place of the concept.local-name in the call?

      As always, thank you for help!


    • #124655
      David Tauriello

      Use ENDPOINT.offset(integer) in the fields= portion of your query to get additional concepts. Something like concept.offset(2001) will show concepts after 2,000 (for Power User and Sole Practitioner Individuals, as well as all Organizational XBRL US Members). This offset works for all endpoints (fact, dts, etc.)

      See the documentation and this thread for more information: https://xbrl.us/forums/topic/how-to-get-a-sample-of-records-via-xbrl-api-for-evaluation/#post-116618.

      Again – because you’re querying for the base taxonomy concepts, the details you return will not change (these are published once), so your best/most effective approach will be to copy the details to a tab (or file) and use that as your reference.

      Like other “id” parameters, concept.id returns a unique integer that corresponds with concept.local-name so it can be used as a reliable substitute.
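To page through a large result with the offset described above, the fields= value can be generated once per page. The sketch below is hedged: the page size of 2,000 comes from the roughly 2,000 names Tim reports per call, the offsets follow the concept.offset(2001) example literally (so the next page is assumed to start at 4001), and the function name is hypothetical. Whether the API counts from 0 or 1 should be checked against the documentation.

```python
BASE_FIELDS = "concept.local-name.sort(ASC),concept.*,label.*"


def paged_fields(total_records, page_size=2000, base_fields=BASE_FIELDS):
    """Return one fields= value per page needed to cover total_records."""
    pages = []
    start = 1  # 1-based position of the first record on the page (assumed)
    while start <= total_records:
        if start == 1:
            pages.append(base_fields)  # first page needs no offset
        else:
            pages.append(f"{base_fields},concept.offset({start})")
        start += page_size
    return pages


# Example: a taxonomy with 17,035 concepts at about 2,000 per call.
PAGES = paged_fields(17035)
for fields in PAGES:
    print(fields)
```

Because the base taxonomy does not change once published, the combined pages only need to be fetched and saved once.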

    • #124690
      Tim Bui

      Thank you, David. I was able to download 17,035 concepts. The next part is to understand this data and select which concepts to use.

      Thank you again for your help.

    • #133617
      Tommy Carstensen

      Tim, did you ever manage to create a list of identical concepts? It’s a bit of a mess when it’s not standardized. I’m surprised the SEC chose to allow companies to use whatever names they see fit. It completely defeats the purpose of XBRL.

      • #133655
        Tim Bui

        Hi Tommy,

        I started on the standardization but have not finished it yet. At the beginning, I planned to use data from XBRL US, but David Tauriello pointed me to the SEC website, where I can get all of the data from all filers more efficiently (https://www.sec.gov/dera/data/financial-statement-data-sets.html). I am importing this data into SQL Server to do the standardization. I am learning the SEC data structure and think I have a way to do better standardization, but I need to test this method further. I would be happy to share my methodology with you if it works. In the meantime, if you want, I can give you what I have done so far, but it is incomplete. I use financial information for investing, so I only standardize the items that I think are relevant to my work.
        Yes, I agree that whichever entity (SEC, AICPA, CFA,…) allows companies to name their tags (concepts) at will really weakens the case for XBRL and disadvantages “smaller” financial data users like me. It’s illegal to fudge the numbers, but accountants can name the numbers whatever they want and can change these names when it’s convenient, which makes peer comparison or historical comparison extremely difficult. Data providers such as Bloomberg, Factset, CapitalIQ and Thomson Reuters will be in business for a very long time.

      • #133680
        Tommy Carstensen

        Aye Tim, I think Morningstar and the rest of them will continue to thrive as long as the data is not standardised. I wanted to plot the data over time and across companies within an industry, but that turned out not to be trivial, because there is no requirement for the data to be homogenous over time and across industries. I hope the law and XBRL specifications are changed and data is homogenised going forward to enable small fintech companies to compete.

        Here is my first attempt to plot data for General Mills over time: [plot not preserved]

        I shall be watching this thread to learn more about your attempts to standardise the data. I appreciate your efforts on behalf of the community. Thanks!

      • #139650
        D Q

        I too am interested in standardizing concepts for investing purposes.
        I don’t know SQL but DM me if I can help.

    • #139657
      Tim Bui

      Hello DQ, I am still working on standardizing the tags. I think I am on the right track; however, this is not a trivial task. I was able to download 87 million lines of data from the SEC, and now I am parsing them out using SQL Server. It is slow going because I have to check and recheck to make sure the data match the 10-Ks and 10-Qs. I am happy to give the results to whoever wants them, because they are not proprietary data and I too am a beneficiary of communities like XBRL US (David Tauriello at XBRL US has spent a lot of time bringing me up to speed).

      To use this massive amount of data, one will need some kind of database for sorting and screening. Microsoft Excel or Access cannot handle this much data. Maybe we can all put in a request to XBRL US to allow members to contribute to this standardization effort by creating a depository area on XBRL US pgAdmin.

      Commercial data providers do provide their own standardization, but depending on one’s needs, the standardization has to be customized somewhat. Sorry for the oxymoron of customizing the standardization. For example, companies report several types of accounts receivable. There are 948 distinct tags for AccRec alone. Most data providers have just 1 line for AccRec. I try to break them down into 4 subcategories: Acc_Rec_Trade_Short_Term, Acc_Rec_Finance_Short_Term, Acc_Rec_Trade_Long_Term and Acc_Rec_Finance_Long_Term. The Acc_Rec_Trade items are the receivables from regular customers. The Acc_Rec_Finance items are the receivables from financing activities, such as a promissory note coming due or GM financing its dealers’ floorplans. Short term or long term determines whether they sit in current assets or long-term assets. The change in each of these subcategories gives a different type of information to readers of the financials.
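A first pass at sorting receivable tags into trade/finance and short-term/long-term buckets could be keyword-based, as in the sketch below. The keywords are assumptions chosen for illustration, not the rules actually used for the 948 tags: "Noncurrent" in the name is read as long term, anything else as short term, and "Financing", "Notes" or "Loans" is read as a financing receivable. Real tags would still need manual review.

```python
# Assumed keywords that mark a receivable as coming from financing activities.
FINANCE_KEYWORDS = ("financing", "notes", "loans")


def acc_rec_bucket(tag_name):
    """Assign a receivable tag to one of the four subcategories (heuristic)."""
    lowered = tag_name.lower()
    kind = "Finance" if any(k in lowered for k in FINANCE_KEYWORDS) else "Trade"
    # Check "noncurrent" explicitly, since it also contains "current".
    term = "Long_Term" if "noncurrent" in lowered else "Short_Term"
    return f"Acc_Rec_{kind}_{term}"


for tag in (
    "AccountsReceivableNetCurrent",
    "AccountsReceivableNetNoncurrent",
    "NotesAndLoansReceivableNetCurrent",
    "NotesAndLoansReceivableNetNoncurrent",
):
    print(tag, "->", acc_rec_bucket(tag))
```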

      In the meantime, I would highly recommend checking out the XBRL XL (https://xbrlxl.com/) website created by Jim Truscott. Jim is also working on standardization. He gave a demonstration of his Excel API, hosted by XBRL US, a few months back.

      Let’s hope we hear from XBRL US on this matter.

      • #140245
        David Tauriello

        “To use this massive amount of data, one will need some kind of database for sorting and screening. Microsoft Excel or Access cannot handle this much data. Maybe we can all put in a request to XBRL US to allow members to contribute to this standardization effort by creating a depository area on XBRL US pgAdmin.”

        I’ve raised this idea internally; another possibility might be creating some sort of common Google Sheet that holds these standardized terms (taking off from Peter Guldberg’s template for balance sheet)

    • #139658
      Tim Bui

      By the way, DQ, there is a company named Intrinio (https://intrinio.com/) that provides standardization in Excel. The prices seem to be very reasonable. I tested their system and found that their interface is very easy to use. I think their data is suitable for most purposes. I do my own parsing of the raw data because I want to create further subsegments for my own use.

    • #162239
      Nathan Suderman

      I have been working in Python to load all company XBRL filings into SQL Server. Now that I have the data available, I too am running into standardization issues with the tags (concepts). I don’t want to reinvent the wheel if a mapping table has already been created.

      Has anyone already standardized/mapped the key tags?

    • #162295
      Tim Bui

      Hi Nathan, I too was able to upload the Financial Data Set into SQL Server, but even after spending a tremendous amount of time trying to standardize the tags, I have failed to come up with a workable solution. There are just way too many inconsistencies and variations in the way companies name their tags. They name the tags to best describe their particular financial items at that moment in time and change the tags if their situations change. It is difficult to find consistency for the same company over time. It becomes more difficult when comparing tags of companies within the same economic sector or across sectors.

      Being an investor, I try to standardize the tags to get the granularity I need for financial analysis. I am not sure what your goal for standardization is, but if you want, we can collaborate and share ideas on this issue. Tim

      • #179790
        Chinmay Laddha

        Hello Tim and Nathan, I am working on a project where anyone can populate the balance sheet, income statement, cash flow statement and some financial ratios for 6-7 years of data into a Google Sheet, in a format/template and with parameters of my choice. It is a Python program for which I could create a GUI: a user selects the company and the parameters he/she wants to see, and the selected items then become visible in the Google Sheet.

      • #179795
        D Q

        Hi Chinmay, how do you solve the problem of standardizing the different tags?

    • #179792
      Tim Bui

      Hi Chinmay, I would love to see your project. I have just begun to study Python, so I am not sure if I can help with the coding, but I do understand finance and financial analysis, so if I can assist in any way, I would like to do that. Thank you!

    • #179799
      Tim Bui

      Hi DQ, if you have your own method to standardize, maybe you can share it with us. Here is how I do it, but I am not there yet.
      I download the 2020 US GAAP Taxonomy (https://xbrl.us/xbrl-taxonomy/2020-us-gaap/) in xls format. The “Presentation” tab inside this spreadsheet lists all of the US GAAP tags (“name” column) and their sources (“definition” column) in hierarchical order, as given by the numbers in the “depth” and “order” columns. I extract similar tags and group them the way I want to see them on the financial statements. However, since there are so many extensions in addition to the standard tags, I am still missing a lot of tags. I do all of the grouping in SQL Server.

      I get additional extensions by using the data downloaded from XBRL.US xsheet. David Tauriello said there is no source to get all of the extensions in one place as they are company specific.

      For grouping, I use names like ca_101_cash, ca_102_mrkt_sec, ca_103_restricted_cash,….

      So far I get about 26 tags for the cash group.
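Because the “Presentation” tab is in hierarchical order, the tags under a given parent can be collected by walking the rows and watching the “depth” column. The sketch below assumes the rows have already been read from the spreadsheet, in sheet order, into dictionaries with "name" and "depth" keys; the sample rows are illustrative and are not copied from the taxonomy file, and the function name is hypothetical.

```python
def descendants(rows, parent_name):
    """Return the names listed under parent_name in presentation order.

    A row belongs to the parent while its depth stays greater than the
    parent's depth; the first row at the same or a shallower depth ends it.
    """
    found = []
    parent_depth = None
    for row in rows:
        if parent_depth is None:
            if row["name"] == parent_name:
                parent_depth = row["depth"]
            continue
        if row["depth"] <= parent_depth:
            break
        found.append(row["name"])
    return found


# Illustrative rows only (already in sheet order).
SAMPLE_ROWS = [
    {"name": "AssetsAbstract", "depth": 2},
    {"name": "AssetsCurrentAbstract", "depth": 3},
    {"name": "CashAndCashEquivalentsAtCarryingValue", "depth": 4},
    {"name": "RestrictedCashCurrent", "depth": 4},
    {"name": "AssetsCurrent", "depth": 4},
    {"name": "AssetsNoncurrentAbstract", "depth": 3},
    {"name": "PropertyPlantAndEquipmentNet", "depth": 4},
]

print(descendants(SAMPLE_ROWS, "AssetsCurrentAbstract"))
```

The returned names could then be assigned to group labels such as ca_101_cash by hand or with a lookup table.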

    • #179800
      Tim Bui

      Sorry for the typo on my note above. It’s not XBRL.US xSheet. It’s XBRLXL xSheet (https://xbrlxl.com/)

    • #186667
      Husein Kirefu

      I read through the xbrl.org site and was left with the impression that they had created, in conjunction with the FASB and the SEC, a standardized way of tagging lines of financial data with what they call a “concept”. However, I could not find an XBRL-based general ledger giving these “concepts”.

      After reading this thread, am I correct in concluding that there is no IFRS/GAAP/etc. standard for pulling data across periods within a company and across companies?

      If so, what value is there in XBRL for an investor if a company can change or create a “concept” to reclassify a ledger item, requiring a manual audit of the data for integrity?

      • #186684
        David Tauriello

        Hi Husein – thanks for taking a look at the information on our site and at XBRL International. It’s not clear from your post what specific data you’re looking for – our Public Filings Database is designed to return all data filed under the SEC’s requirement for public companies since 2009, as well as the base taxonomies published by the FASB and the IFRS. You can use the XBRL API to get started with the data (https://xbrl.us/xbrl-api-community) – we’ve posted documentation and several templates and tools that can help familiarize you with the data set. We also link to taxonomy viewers for the base taxonomies – see the links on the right side of https://xbrl.us/2020-us-gaap.

        The extensibility of the business reporting standard for reporting is one of its great strengths. The US SEC’s implementation for public company financials is an ‘open reporting’ environment (see the glossary of our Taxonomy Development Handbook – https://xbrl.us/tdh). In adopting the use of XBRL for public companies, the US SEC acknowledged its responsibility to ‘limit the use of extensions to circumstances where the appropriate financial statement element does not exist in the standard list of tags’ (https://www.federalregister.gov/d/E9-2334/p-520).

        We continue to work with both the FASB and the SEC to recommend taxonomy modelling for the US GAAP accounting standard and identify issues related to filing patterns. Limiting the use of extensions has been an on-going part of our discussions.

    • #186668
      Tim Bui

      Hi Husein, XBRL has done a tremendous job in bringing financial reporting data into the 21st century. However, from my perspective, this data reporting mechanism is helpful only to the filers and to the data vendors at this present time. Unless one knows a lot about programming, individual investors like me still do not have an easy way to use this information. Companies have so much leeway in using tags and extensions to describe their particular situations. Without standardization, it is impossible to properly do trend or comparative analysis.

    • #191253
      Tim Bui

      Hi Mikko, I am not a real programmer, but here is how I get the Financial Data Set into my PostgreSQL database (I do not use MySQL).
      – Within each quarterly release, there is a readme.html file that lists the fields of each of the 4 txt files. I create 4 tables in PostgreSQL with these fields, using the required data types. As primary keys, I use the fields listed for each table in Section 3 (Organization).
      – I then use Python to clean up the data in each txt file and save it as a csv file with ‘~’ as the delimiter (not a comma, because some fields contain commas).
      – Finally, I use the COPY command in PostgreSQL to import the csv files.
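The clean-up and export step above might look roughly like the sketch below. It assumes the txt files are tab-delimited, limits "clean-up" to trimming stray whitespace (the actual clean-up is not described in this post), and uses hypothetical file and table names in the usage comments.

```python
import csv
import io


def tsv_to_tilde_csv(src, dst):
    """Copy tab-delimited rows from src to dst using '~' as the delimiter.

    Returns the number of rows written, header included.
    """
    reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
    writer = csv.writer(dst, delimiter="~", lineterminator="\n")
    count = 0
    for row in reader:
        writer.writerow([field.strip() for field in row])
        count += 1
    return count


# Small in-memory demonstration; commas inside a field are left alone.
source = io.StringIO("adsh\tname\n0001\tAcme, Inc.\n")
target = io.StringIO()
ROWS_WRITTEN = tsv_to_tilde_csv(source, target)
print(target.getvalue())

# With real files (hypothetical paths):
#   with open("num.txt", newline="", encoding="utf-8") as src, \
#        open("num.csv", "w", newline="", encoding="utf-8") as dst:
#       tsv_to_tilde_csv(src, dst)
#
# A matching import in PostgreSQL, assuming a table named num already exists:
#   COPY num FROM '/path/to/num.csv' WITH (FORMAT csv, DELIMITER '~', HEADER true);
```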

    • #192501
      Matthew Beveridge

      Tim, Mikko, and others: I have a primitive (so far) Python package I am working on to simplify manipulating this data by hosting the DERA financials database remotely and standardizing query results into pandas dataframes. If you are interested in collaborating on this, don’t hesitate to reach out. I plan for an alpha release in the next few weeks.

      Functionality will be along the lines of:

      # import the package
      from mypackage import fundamentals
      # get the data
      form = fundamentals.ten_k('aapl', [2018, 2019])
      # manipulate the data
      debt_cap = form.debt_capitalization()
      margin = form.gross_margin()
      roe = form.return_on_equity()

      and so on. Results will look like:

      fy                                                                 2018           2019
      tag                                               uom
      AccountsPayableCurrent                            USD      5.588800e+10   4.623600e+10
      AccountsReceivableNetCurrent                      USD      2.318600e+10   2.292600e+10
      AccruedIncomeTaxesNoncurrent                      USD      3.358900e+10   2.954500e+10
      AccumulatedDepreciationDepletionAndAmortization…  USD      5.909900e+10   5.857900e+10
      AccumulatedOtherComprehensiveIncomeLossNetOfTax   USD     -3.454000e+09  -5.840000e+08
      …                                                                     …              …
      UnrecordedUnconditionalPurchaseObligationBalanc…  USD      9.328000e+09   8.211000e+09
      UnrecordedUnconditionalPurchaseObligationDueAft…  USD      6.600000e+07   1.100000e+08
      WeightedAverageNumberDilutedSharesOutstandingAd…  shares   4.473200e+07   3.107900e+07
      WeightedAverageNumberOfDilutedSharesOutstanding   shares   5.000109e+09   4.648913e+09
      WeightedAverageNumberOfSharesOutstandingBasic     shares   4.955377e+09   4.617834e+09

      [251 rows x 2 columns]

      fy                             2018      2019
      tag                 uom
      DebtCapitalization  ratio  0.707029  0.732692

      • #192521
        Mikko Olkkonen

        Hi Tim, Matthew and others,
        I am now reading DERA data (and European-style ESEF XBRL) into MongoDB databases (i.e. I abandoned SQL). Together with a collaborator, I have developed server functionality for querying the data. Our server code is mainly TypeScript and JavaScript/Node.js. I guess that we have produced a somewhat similar solution to Matthew’s. Loose standardisation/use of the taxonomy/tags is our key problem, even though we only need a limited number of key items including ProfitLoss, Revenue, Cash, …
        Tim, Matthew: where do you collaborate? On GitHub?

      • #192528
        Matthew Beveridge

        Tim and Mikko, I generally use github. I can share the repo with you all if you PM me (it’s currently private). Otherwise, I’ll get things in order to make it public and post the link within the next week or so.

    • #192505
      Tim Bui

      Hi Matthew, I would love to pitch in and help in any way I can. After spending months on it, I almost gave up on the project because I couldn’t think through the proper way to standardize the financial items. I am a new student of Python, so hopefully I can learn from your Python program. Please let me know what I can do to help.

    • #192522
      Tim Bui

      Hi Mikko, thank you for reaching out. I would love to see your program. I created this GitHub account.
      Is this what you are asking?

      Thank you!

      • #192541
        Mikko Olkkonen

        Tim, I am now following you on github. I have uploaded to my github account a bash script (deramongo.sh) I use for creating/maintaining my mongodb dera database. I have not yet uploaded the node.js server code (the node.js server offers http API for accessing the financials data stored in the mongoDB database).
        However, some version of server may be up and running at:
        vps-09403655.vps.ovh.net:8198/list returns company names and cik codes
        vps-09403655.vps.ovh.net:8198/firm?cik=1503518 returns financials data corresponding to a cik
        and so forth

    • #192544
      Tim Bui

      Thank you, Mikko! I just uploaded the files into my Postgres to check it out. I am also reading your DERA Financials data on Github right now. Thanks again
