What Thesma is
A developer-friendly REST API that turns raw government data into clean, structured JSON. We parse and normalise SEC EDGAR financial filings, US Census Bureau demographic data, and Bureau of Labor Statistics employment data so you get human-readable fields, consistent schemas, and a single API to query it all in seconds.
The problem
SEC EDGAR
SEC EDGAR is the authoritative source for US public company financial data, but working with it directly is painful. Companies tag the same line item with different XBRL elements (or their own custom extensions), filing formats change over time, and extracting structured data requires deep domain knowledge. Most developers give up or pay for expensive data providers.
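To make the tagging problem concrete, here is a minimal sketch of one way to normalise a single field: try a priority-ordered list of candidate XBRL elements and take the first one a filing reports. The three us-gaap revenue elements are real taxonomy names, but the priority order, the `normalise_field` helper, and the flat `facts` dictionary are illustrative assumptions, not Thesma's actual mapping.

```python
from typing import Mapping, Optional, Tuple

# Assumed priority order for revenue; a production mapping would also handle
# custom extensions, reporting periods, and units.
REVENUE_TAGS: Tuple[str, ...] = (
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
    "us-gaap:Revenues",
    "us-gaap:SalesRevenueNet",  # mostly seen in older filings
)


def normalise_field(
    facts: Mapping[str, float], candidates: Tuple[str, ...]
) -> Optional[Tuple[str, float]]:
    """Return (tag, value) for the first candidate tag present in `facts`, or None."""
    for tag in candidates:
        if tag in facts:
            return tag, facts[tag]
    return None
```

Returning the tag alongside the value keeps the normalised field auditable: you can always see which element a company actually used.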
US Census Bureau
The Census Bureau publishes a wealth of demographic and economic data, but the developer experience is brutal. Over 20,000 variable codes use opaque names like B19013_001E instead of median_household_income. Margin of error values live in separate variables (B19013_001M sits alongside B19013_001E), and most developers skip them entirely, which leads to unreliable analysis.
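The Census Data API returns JSON as an array of arrays whose first row is the header, with values typically encoded as strings. A minimal sketch of renaming variables and pairing each estimate (the `E` suffix) with its margin of error (the `M` suffix) might look like this; the `READABLE_NAMES` table and the `humanise_rows` helper are hypothetical, and the sketch ignores the API's special annotation values and null cells.

```python
from typing import Any, Dict, List

# Hypothetical curated mapping from ACS variable stems to readable names.
READABLE_NAMES = {"B19013_001": "median_household_income"}


def humanise_rows(rows: List[List[str]]) -> List[Dict[str, Any]]:
    """Rename curated variables and nest each estimate with its margin of error."""
    header, *data = rows
    records = []
    for row in data:
        record: Dict[str, Any] = {}
        for key, value in zip(header, row):
            stem, suffix = key[:-1], key[-1:]
            if stem in READABLE_NAMES and suffix in ("E", "M"):
                field = record.setdefault(READABLE_NAMES[stem], {})
                slot = "estimate" if suffix == "E" else "margin_of_error"
                field[slot] = float(value)
            else:
                # Non-curated columns (NAME, geography codes) pass through unchanged.
                record[key] = value
        records.append(record)
    return records
```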
The geographic hierarchy (states, counties, places, tracts, metro areas) offers no straightforward way to join across levels, because places and metro areas don't nest neatly inside counties and states. There's no time series endpoint, so you call each year separately and stitch the results together yourself. Rate limits kick in, with no API key management to help. It's publicly available data that requires serious engineering effort to actually use.
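Without a time series endpoint, the stitching looks roughly like the sketch below: build one request per year against the ACS 5-year endpoint (`https://api.census.gov/data/{year}/acs/acs5`), fetch each with whatever HTTP client you use, then pivot the per-year tables into one series per geography. The helper names are hypothetical, fetching is left out, and the optional API key parameter is omitted.

```python
from typing import Dict, List
from urllib.parse import urlencode


def acs5_url(year: int, variables: List[str], geography: str) -> str:
    """Build an ACS 5-year query URL for one vintage, e.g. geography='state:*'."""
    query = urlencode(
        {"get": ",".join(["NAME", *variables]), "for": geography},
        safe=",:*",  # keep the API's comma/colon/star syntax unescaped
    )
    return f"https://api.census.gov/data/{year}/acs/acs5?{query}"


def stitch_years(
    responses: Dict[int, List[List[str]]], variable: str, geo_key: str
) -> Dict[str, Dict[int, str]]:
    """Pivot per-year API tables into {geography_id: {year: value}}."""
    series: Dict[str, Dict[int, str]] = {}
    for year, rows in sorted(responses.items()):
        header, *data = rows
        value_idx, geo_idx = header.index(variable), header.index(geo_key)
        for row in data:
            series.setdefault(row[geo_idx], {})[year] = row[value_idx]
    return series
```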
Bureau of Labor Statistics
BLS publishes data via opaque series IDs: 13- to 21-character codes like CES0000000001 or JTS000000000000000JOL that encode industry, geography, data type, and seasonal adjustment in positional fields. No human can read them without a lookup table. Four separate datasets (CES, QCEW, OEWS, JOLTS) use different file formats, release schedules (monthly, quarterly, annual), and geographic granularity, with no unified query interface.
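As an example of the positional encoding, a CES series ID is 13 characters: the `CE` survey prefix, a seasonal-adjustment code (`S` adjusted, `U` not adjusted), an 8-digit supersector/industry code, and a 2-digit data type code (`01` is all employees). A minimal decoder, using a hypothetical `parse_ces_id` helper, might look like this; other surveys such as JOLTS use different, longer layouts and would need their own parsers.

```python
from typing import NamedTuple


class CesSeriesId(NamedTuple):
    survey: str     # "CE"
    seasonal: str   # "S" = seasonally adjusted, "U" = not adjusted
    industry: str   # 8-digit supersector + industry code
    data_type: str  # 2-digit data type code, e.g. "01" = all employees


def parse_ces_id(series_id: str) -> CesSeriesId:
    """Split a 13-character CES series ID into its positional fields."""
    if len(series_id) != 13 or not series_id.startswith("CE"):
        raise ValueError(f"not a CES series ID: {series_id!r}")
    return CesSeriesId(
        survey=series_id[0:2],
        seasonal=series_id[2],
        industry=series_id[3:11],
        data_type=series_id[11:13],
    )
```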
Roughly 60% of county-level QCEW data is suppressed for confidentiality, with no built-in fallback: you get blanks where you expected numbers. There's no cross-reference to SEC companies either. SEC filings classify companies by SIC code while BLS organises its data by NAICS, so connecting a public company to its industry employment trends requires resolving SIC → NAICS concordance chains across 5 vintage translations. Free data, in other words, is not the same as usable data.
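Conceptually, resolving a code through several concordance vintages means pushing a set of codes through each crosswalk in turn, because one code can split into several successors (and several can merge into one). The sketch below assumes each crosswalk is a plain `{old_code: {new_codes}}` dictionary; the `resolve_chain` name is hypothetical, and it ignores the allocation weights a real many-to-many concordance would need for apportioning employment.

```python
from typing import Dict, List, Set

# One vintage-to-vintage mapping, e.g. SIC 1987 -> NAICS 1997 (structure assumed).
Crosswalk = Dict[str, Set[str]]


def resolve_chain(code: str, chain: List[Crosswalk]) -> Set[str]:
    """Map `code` through each crosswalk in order, collecting every successor code."""
    codes = {code}
    for crosswalk in chain:
        # A code missing from a crosswalk drops out of the result.
        codes = set().union(*(crosswalk.get(c, set()) for c in codes))
    return codes
```

An empty result is a useful signal: some link in the chain has no mapping for that code, and it needs manual review.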
What we cover
We cover ~3,000 US public companies — about 98% of the investable US equity market by market cap.
SEC EDGAR
10-K / 10-Q
Annual and quarterly financials — income statement, balance sheet, cash flow — with 68 normalised fields.
Form 4
Insider trades with full transaction details, parsed from XML filings.
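For a sense of what parsing Form 4 XML involves, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It reads non-derivative transactions from an `ownershipDocument`; the element paths reflect the SEC ownership XML layout as we understand it and should be checked against EDGAR's technical specification, and `parse_form4_transactions` is a hypothetical name. Derivative transactions, footnotes, and multiple reporting owners are left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def parse_form4_transactions(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Extract non-derivative transactions from a Form 4 ownershipDocument."""
    root = ET.fromstring(xml_text)  # root element: <ownershipDocument>
    amounts = "transactionAmounts/"
    trades = []
    for txn in root.iterfind("nonDerivativeTable/nonDerivativeTransaction"):
        trades.append({
            "date": txn.findtext("transactionDate/value"),
            # Transaction code, e.g. P = purchase, S = sale.
            "code": txn.findtext("transactionCoding/transactionCode"),
            "shares": txn.findtext(amounts + "transactionShares/value"),
            "price": txn.findtext(amounts + "transactionPricePerShare/value"),
            # A = acquired, D = disposed.
            "acquired_disposed": txn.findtext(amounts + "transactionAcquiredDisposedCode/value"),
        })
    return trades
```

The standard-library parser is not hardened against maliciously constructed XML, so untrusted input would normally go through a hardened parser such as `defusedxml`.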
8-K
Material events, parsed and categorised by event type.
13F
Institutional holdings with quarter-over-quarter position changes.
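Quarter-over-quarter changes reduce to a diff of two holdings snapshots. This sketch assumes each quarter's 13F information table has already been aggregated to `{cusip: shares}` (a real filing can list the same CUSIP on several rows); `position_changes` and its labels are illustrative, not Thesma's schema.

```python
from typing import Dict, Tuple


def position_changes(
    prev: Dict[str, int], curr: Dict[str, int]
) -> Dict[str, Tuple[str, int]]:
    """Label each CUSIP's change between two quarters and return (label, share delta)."""
    changes: Dict[str, Tuple[str, int]] = {}
    for cusip in prev.keys() | curr.keys():
        before, after = prev.get(cusip, 0), curr.get(cusip, 0)
        delta = after - before
        if before == 0 and after > 0:
            label = "new"
        elif before > 0 and after == 0:
            label = "exited"
        elif delta > 0:
            label = "increased"
        elif delta < 0:
            label = "decreased"
        else:
            label = "unchanged"
        changes[cusip] = (label, delta)
    return changes
```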
DEF 14A
Executive compensation and board composition from proxy filings.
SC 13D / 13G
Beneficial ownership disclosures. Activist positions and large shareholdings parsed from XML filings.
US Census Bureau
American Community Survey (ACS)
5-year estimates across income, population, housing, education, employment, race & ethnicity, and health insurance. 26 curated metrics with human-readable names and margin of error on every value.
Geographic Coverage
6 levels from nation down to census tract — states, counties, places, metro areas. Time series for year-over-year comparison at every level.
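Part of what makes geographic joins tractable is that Census GEOIDs nest by prefix along the state → county → tract spine: a tract GEOID is 11 digits (2 state + 3 county + 6 tract), so its county is the first 5 digits and its state the first 2. Places and metro areas don't nest cleanly inside counties and need separate relationship files. The sketch below, with hypothetical helper names, rolls tract-level counts up to counties and approximates the combined margin of error as the square root of the sum of squared MOEs, the Census Bureau's standard approximation for sums. Medians such as median household income cannot be aggregated this way.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple


def tract_county(tract_geoid: str) -> str:
    """Return the 5-digit county GEOID (state + county FIPS) for an 11-digit tract GEOID."""
    if len(tract_geoid) != 11 or not tract_geoid.isdigit():
        raise ValueError(f"not a tract GEOID: {tract_geoid!r}")
    return tract_geoid[:5]


def rollup_to_county(
    tracts: Dict[str, Tuple[float, float]]
) -> Dict[str, Tuple[float, float]]:
    """Sum (estimate, moe) count pairs from tracts to counties; MOE via root-sum-of-squares."""
    totals: Dict[str, float] = defaultdict(float)
    squared_moe: Dict[str, float] = defaultdict(float)
    for geoid, (estimate, moe) in tracts.items():
        county = tract_county(geoid)
        totals[county] += estimate
        squared_moe[county] += moe ** 2
    return {c: (totals[c], math.sqrt(squared_moe[c])) for c in totals}
```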
Bureau of Labor Statistics
CES (Current Employment Statistics)
Monthly payroll employment and average earnings by industry. National, state, and metro coverage. 14 supersectors down to detailed NAICS industries.
QCEW (Quarterly Census of Employment & Wages)
County-level employment and wages by industry. 3,100+ counties. Suppression-aware fallback to coarser NAICS levels when data is confidential.
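Suppression-aware fallback can be pictured as walking up the NAICS hierarchy, from a 6-digit industry toward its 2-digit sector, one digit at a time until a published value turns up. In this sketch suppressed cells are represented as `None` and `with_fallback` is a hypothetical name; real QCEW data also uses range-style sector codes such as `31-33` for manufacturing, which plain prefix truncation doesn't handle.

```python
from typing import Mapping, Optional, Tuple


def with_fallback(
    values: Mapping[str, Optional[float]], naics: str
) -> Optional[Tuple[str, float]]:
    """Return (code, value) for the most detailed non-suppressed NAICS level, or None."""
    code = naics
    while len(code) >= 2:
        value = values.get(code)  # missing and suppressed cells both read as None
        if value is not None:
            return code, value
        code = code[:-1]  # drop one digit to move to the parent level
    return None
```

Returning the code that actually supplied the value matters: a fallback number describes a broader industry than the one requested, and callers should be able to tell.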
OEWS (Occupational Employment & Wage Statistics)
Hourly and annual wages by occupation, industry, and geography. 800+ detailed occupations with percentile distributions.
JOLTS (Job Openings & Labor Turnover Survey)
Job openings, hires, quits, and layoff rates by industry and state. Labor market tightness signals for macro analysis.
Cross-dataset intelligence
Thesma lets you enrich SEC company data with BLS industry employment and county-level wages in a single request: just add ?include=labor_context to any company query. No concordance tables, no multi-source stitching, no separate API keys. That one parameter bridges financial filings and labor market data, the kind of cross-dataset join that separates Thesma from raw government APIs and single-source competitors.
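Because enrichment is a single query parameter, client code only needs to append it. Here is a minimal sketch with the standard library; the base URL and path in the test are placeholders, not Thesma's documented endpoints, and the sketch assumes `include` is passed as an ordinary query parameter alongside any existing ones.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_labor_context(url: str) -> str:
    """Add include=labor_context to a company query URL, keeping existing parameters."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if ("include", "labor_context") not in params:
        params.append(("include", "labor_context"))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

The membership check makes the helper idempotent, so calling it on a URL that already requests labor context leaves the URL unchanged.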