Technical Guide
Financial Statements with Python
February 20, 2026 • 12 min read
Analysts and founders live in Excel, but most financial statements still arrive as PDFs or clunky web tables. Copy-paste does not scale when you track dozens of tickers.
With Python, you can automate downloading, parsing, and reshaping balance sheets, income statements, and cash-flow statements. pandas can then write the result to an Excel workbook laid out to match your existing models.
Workflow Overview
- Locate a consistent data source for statements.
- Fetch HTML or PDF pages with Python.
- Parse tables into pandas DataFrames.
- Normalize tickers, periods, and line-item names.
- Export to Excel using a compatible layout.
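The normalization step in the list above can be sketched as follows. This is a minimal illustration, not a complete mapping: the column names (`ticker`, `period_end`, `line_item`, `value`), the alias table, the `FY<year>` label convention, and the sample rows are all assumptions made for the example. It also assumes the fiscal year equals the calendar year of the period-end date, which will not hold for every company.

```python
import pandas as pd

# Hypothetical alias map: lower-cased source label -> canonical line-item name.
LINE_ITEM_ALIASES = {
    "total revenue": "Revenue",
    "revenues": "Revenue",
    "net sales": "Revenue",
    "net income (loss)": "Net Income",
}


def normalize_statement(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with canonical tickers, period labels, and line-item names."""
    out = df.copy()
    # Tickers: trim whitespace and upper-case, e.g. " abc" -> "ABC".
    out["ticker"] = out["ticker"].str.strip().str.upper()
    # Periods: label each row by the calendar year of its period-end date (assumption).
    out["period"] = "FY" + pd.to_datetime(out["period_end"]).dt.year.astype(str)
    # Line items: look up the lower-cased label; keep the trimmed original when unmapped.
    stripped = out["line_item"].str.strip()
    out["line_item"] = stripped.str.lower().map(LINE_ITEM_ALIASES).fillna(stripped)
    return out


# Illustrative rows only; tickers and values are made up.
raw = pd.DataFrame(
    {
        "ticker": [" abc", "xyz "],
        "period_end": ["2024-12-31", "2023-06-30"],
        "line_item": ["Net sales", "Total Revenue"],
        "value": [100.0, 250.0],
    }
)
clean = normalize_statement(raw)
```

Keeping the alias map in one place makes it easier to extend as new sources introduce new spellings of the same line item.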
Scraping HTML Tables
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://example.com/company/financials"
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop early on 4xx/5xx responses

soup = BeautifulSoup(response.text, "html.parser")
table = soup.select_one("table.financials-table")
if table is None:
    raise ValueError("No table with class 'financials-table' found")

rows = [[cell.get_text(strip=True) for cell in tr.select("th, td")] for tr in table.select("tr")]
df = pd.DataFrame(rows[1:], columns=rows[0])  # first row holds the column headers
df.to_excel("income_statement.xlsx", index=False)  # writing .xlsx requires openpyxl
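Cells extracted with `get_text` are strings, so figures such as "1,234" or "(567)" need converting before Excel can treat them as numbers. The sketch below shows one way to do that. The helper name `to_number`, the set of placeholder strings, the `Line Item` column name, and the sample values are assumptions for illustration; real sources may use other conventions (currency codes, unit suffixes, footnote markers).

```python
import pandas as pd


def to_number(text: str) -> float:
    """Convert a scraped cell such as "1,234", "(567)", or "$89.5" to a float."""
    s = text.strip().replace(",", "").replace("$", "")
    # Treat common empty-cell placeholders as missing values.
    if s in {"", "-", "N/A"}:
        return float("nan")
    # Accounting notation: parentheses mark a negative amount.
    if s.startswith("(") and s.endswith(")"):
        return -float(s[1:-1])
    return float(s)


# Illustrative scraped table; labels and figures are made up.
scraped = pd.DataFrame(
    {
        "Line Item": ["Revenue", "Net Income"],
        "FY2024": ["1,234", "(567)"],
        "FY2023": ["$89.5", "-"],
    }
)

numeric = scraped.copy()
for col in numeric.columns:
    if col != "Line Item":  # leave the label column as text
        numeric[col] = scraped[col].map(to_number)
```

Running the conversion before `to_excel` means the workbook cells arrive as numbers rather than text that has to be re-typed or coerced in Excel.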
Using pandas.read_html
When the tables are clean, pandas can parse them directly from the URL. This works for financial portals that serve statement tables as static HTML; tables rendered client-side by JavaScript will not appear in the fetched page.
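import pandas as pd

url = "https://example.com/company/balance-sheet"
tables = pd.read_html(url)  # list of DataFrames, one per <table>; needs lxml or bs4 + html5lib
df = tables[0]  # assumes the statement is the first table on the page
df.to_excel("balance_sheet.xlsx", sheet_name="BalanceSheet", index=False)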
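Taking `tables[0]` only works when the statement happens to be the first table on the page. A slightly more defensive approach is to pick the table by content. The sketch below assumes the line-item labels sit in the first column; the helper name `pick_statement_table` and the sample tables are hypothetical.

```python
import pandas as pd


def pick_statement_table(tables: list[pd.DataFrame], required_labels: list[str]) -> pd.DataFrame:
    """Return the first table whose first column contains every required label."""
    wanted = {label.lower() for label in required_labels}
    for table in tables:
        if table.empty:
            continue
        # Compare case-insensitively against the labels in the first column.
        first_col = table.iloc[:, 0].astype(str).str.strip().str.lower()
        if wanted.issubset(set(first_col)):
            return table
    raise ValueError(f"No table contains all of: {required_labels}")


# Stand-ins for the list returned by pd.read_html; contents are made up.
nav_like = pd.DataFrame({0: ["Home", "About"]})
balance_like = pd.DataFrame(
    {"Item": ["Total Assets", "Total Liabilities"], "FY2024": [100, 60]}
)
chosen = pick_statement_table([nav_like, balance_like], ["Total assets", "Total liabilities"])
```

`pd.read_html` also accepts a `match` argument that keeps only tables containing text matching a string or regular expression, which may be enough on its own for simpler pages.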
Combining Multiple Statements
import pandas as pd

with pd.ExcelWriter("financial_statements.xlsx", engine="openpyxl") as writer:  # income, balance, cashflow: DataFrames built earlier
    income.to_excel(writer, sheet_name="IncomeStatement", index=False)
    balance.to_excel(writer, sheet_name="BalanceSheet", index=False)
    cashflow.to_excel(writer, sheet_name="CashFlow", index=False)
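Many Excel models expect one row per line item and one column per period. If the parsed data is in long form (one row per line item and period), it can be pivoted before writing. The following is a sketch under assumptions: the column names `line_item`, `period`, and `value`, the row order, and the figures are illustrative only.

```python
import pandas as pd

# Illustrative long-form data; labels and figures are made up.
long_df = pd.DataFrame(
    {
        "line_item": ["Revenue", "Revenue", "Net Income", "Net Income"],
        "period": ["FY2023", "FY2024", "FY2023", "FY2024"],
        "value": [90.0, 100.0, 9.0, 12.0],
    }
)

statement_order = ["Revenue", "Net Income"]  # assumed row order of the target model

wide = (
    long_df.pivot(index="line_item", columns="period", values="value")
    .reindex(statement_order)  # pivot sorts rows alphabetically; restore statement order
    .reset_index()
)
wide.columns.name = None  # drop the leftover "period" axis label
```

The resulting `wide` frame can be passed to `to_excel` in the same way as the sheets above.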