Extracting Blocktext tags (i.e. sections of filings)
- This topic has 6 replies, 2 voices, and was last updated 2 years, 3 months ago by Satish Sahoo.
Tuesday, August 23, 2022 at 5:57 AM #202987 Satish Sahoo (Participant)
Hi David/Others,
I want to extract certain sections of 10-K filings. For example, I tried to extract segment information using the ‘SegmentReportingDisclosureTextBlock’ tag. I was hoping to get the entire section (including the tables, text, etc.).
While I am able to pull the tag, its fact.value just gives me the details of the tag.
For example, I get:
<p style='margin-top:0pt; margin-bottom:0pt'><font style="font-family:Times New Roman;font-size:10pt;font-weight:bold;margin-left:0px;">NOTE 4.</font><font style="font-family:Times New Roman;font-…<div style="font-family:Times New Roman;font-size:10pt;"><div style="line-height:120%;padding-top:18px;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;font-weight:bold;">ACQUISITI…
ACQUISITIONS, GOODWILL, AND ACQUIRED INTANGIBLE ASSETS<div style="line-height:120%;padding-top:6px;text-indent:16px;font-size:10pt;"><span style="font-family:inherit;font-size:10pt;font-style:ital…
How can I get the entire section’s text and formatting information so that I can put it in another HTML or text file? I tried footnote.* as well, without any luck.
Your help would be much appreciated.
Thanks
Wednesday, August 24, 2022 at 3:01 PM #203004 David Tauriello (Keymaster)
Hi Satish – these facts are HTML encoded; there is no ‘plain text’ version – the data is in there, but might be under several HTML tags for formatting purposes. You have a couple of options:
- use regex in your routine to remove the tags after you’ve retrieved the data (something like <.*?> should leave you with plain text, which might be tough to read … maybe replace each tag with spaces, tabs or line breaks instead of deleting it?)
- concatenate the fact.id with this string to create a URL that renders the fact: CONCAT( https://csuite.xbrl.us/php/dispatch.php?Task=htmlExportFact&FactID= , xxxxxx ) – we’re using this approach in some of the spreadsheet templates posted in the XBRL Data Community
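Both options above can be sketched in a few lines of Python. This is only an illustration: the helper names `strip_tags` and `fact_render_url` and the sample fragment are made up for the example, and the regex is the rough `<.*?>` pattern suggested above, not a real HTML parser.

```python
import html
import re
from urllib.parse import urlencode

# Rough tag matcher from the suggestion above; DOTALL lets a tag span lines.
TAG_PATTERN = re.compile(r"<.*?>", re.DOTALL)


def strip_tags(fragment):
    """Replace each tag with a space, decode entities, collapse whitespace."""
    text = TAG_PATTERN.sub(" ", fragment)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()


def fact_render_url(fact_id):
    """Build the browser-view URL for a fact by appending its fact.id."""
    query = urlencode({"Task": "htmlExportFact", "FactID": fact_id})
    return "https://csuite.xbrl.us/php/dispatch.php?" + query


# Hypothetical fragment shaped like a TextBlock fact.value.
sample = (
    '<p style="margin-top:0pt"><font style="font-weight:bold">NOTE 4.</font>'
    " Segment &amp; other information</p>"
)

print(strip_tags(sample))
print(fact_render_url(221545926))
```

Replacing tags with a space rather than deleting them keeps words from adjacent cells or paragraphs from running together; swapping in a line break for block-level tags would keep more of the layout.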
Thursday, August 25, 2022 at 4:18 AM #203027 Satish Sahoo (Participant)
Hi David,
Thanks a lot for your response. This is very helpful.
I have another quick, related question. At least since Inline XBRL started, the section files have been posted separately with the filings on the EDGAR website. Is it possible to get the URLs of those files through the API? I am not sure whether this is within the scope of the API; if it is, that would be great. Thanks
Friday, August 26, 2022 at 3:44 PM #203046 David Tauriello (Keymaster)
Hi Satish – thanks for your question. As part of the process to keep our Public Filings Database current, we make exact copies of the documents submitted to the SEC, FERC and other regulators that contain XBRL (as .xml instances or .html files that have inline XBRL in them). We do not copy the exhibit files (.htm but without XBRL), images, text files, etc. You can use the report.sec-url field to get the page on EDGAR where these additional files exist.
Friday, August 26, 2022 at 2:55 PM #203044 Satish Sahoo (Participant)
Hi David,
After some more digging, I could get to the files that contain the text for any particular TextBlock tag. But I realized that the fact.value doesn’t really contain the entire block; rather, it seems to be truncated. Is there a size limit on the fact.value output? If so, is there any setting that makes fact.value contain the entire text block? As an example, you can check the fact.value of the following fact id.
https://csuite.xbrl.us/php/dispatch.php?Task=htmlExportFact&FactID=221545926
Thanks
Friday, August 26, 2022 at 4:08 PM #203047 David Tauriello (Keymaster)
Hi Satish – there might be a character limit if you’re trying to get the HTML from a spreadsheet. This is why we use a hyperlink in the spreadsheet to the browser view of the fact when there’s a “<\” character combination.
If you query with curl or Python, or use an API testing tool, you should see all of the HTML (the data in the HTML we present is the same data in our database).
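A query from Python might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the base URL, the `concept.local-name` and `fields` parameter names, the bearer-token header, and the helper names `build_fact_request` and `fetch_facts`. The XBRL API Interactive Documentation is the place to confirm the actual endpoint and OAuth2 flow.

```python
import json
import urllib.parse
import urllib.request

# Assumption: illustrative endpoint; confirm against the API documentation.
API_BASE = "https://api.xbrl.us/api/v1/fact/search"


def build_fact_request(concept, access_token, base_url=API_BASE):
    """Prepare (but do not send) a request for facts tagged with `concept`."""
    params = {
        "concept.local-name": concept,       # assumed parameter name
        "fields": "fact.id,fact.value",      # assumed field selector
    }
    url = base_url + "?" + urllib.parse.urlencode(params)
    headers = {"Authorization": "Bearer " + access_token}
    return urllib.request.Request(url, headers=headers)


def fetch_facts(request):
    """Send the prepared request and return the parsed JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the request needs no network access; fetch_facts() does.
request = build_fact_request("SegmentReportingDisclosureTextBlock", "YOUR_TOKEN")
print(request.full_url)
```

Calling `fetch_facts(request)` with a valid token would return the raw JSON, so the length of each fact.value can be checked before any spreadsheet or display layer touches it.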
Saturday, August 27, 2022 at 4:59 AM #203054 Satish Sahoo (Participant)
Thanks, David. I think you pointed me in the right direction. I guess the truncation happens when I write the JSON list returned by the API into a pandas DataFrame. So the API is still producing the entire section; it’s just the output rendering that causes the truncation. Thanks
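One way to sidestep display truncation altogether is to write each fact.value straight from the parsed JSON to its own file, which also covers the original goal of putting a section into another HTML file. The sketch below is a guess at the workflow, not the poster's code: the payload shape (a `data` list of rows keyed by `fact.id` and `fact.value`) and the helper name `save_fact_values` are assumptions.

```python
import json
import tempfile
from pathlib import Path


def save_fact_values(payload, out_dir):
    """Write each row's fact.value to <out_dir>/fact_<fact.id>.html."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for row in payload.get("data", []):  # assumed response layout
        target = out_dir / f"fact_{row['fact.id']}.html"
        target.write_text(row["fact.value"], encoding="utf-8")
        written.append(target)
    return written


# Stand-in for an API response carrying one long TextBlock value.
long_html = "<div>" + "segment text " * 1000 + "</div>"
sample_payload = json.loads(
    json.dumps({"data": [{"fact.id": 1, "fact.value": long_html}]})
)

with tempfile.TemporaryDirectory() as tmp:
    paths = save_fact_values(sample_payload, tmp)
    restored = paths[0].read_text(encoding="utf-8")

# The file holds the full string, whatever a console preview would show.
print(len(restored) == len(long_html))
```

If the values do need to stay in a pandas DataFrame, `pd.set_option("display.max_colwidth", None)` should lift the column-width limit on what gets printed; the strings stored in the frame are complete either way.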