Analysts and founders live in Excel, but most financial statements still arrive as PDFs or clunky web tables. Copy-paste does not scale when you track dozens of tickers.

With Python, you can automate downloading, parsing, and reshaping balance sheets, income statements, and cash-flow data. With pandas, you can then export the result as a clean Excel workbook laid out to match your existing models.

Workflow Overview

  • Locate a consistent data source for statements.
  • Fetch HTML or PDF pages with Python.
  • Parse tables into pandas DataFrames.
  • Normalize tickers, periods, and line-item names.
  • Export to Excel in a layout that matches your existing models.
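
The normalization step in the list above can be sketched roughly as follows. This is a minimal illustration, not a complete mapping: the function names, the alias table, and the target formats ("FY2023", "2024Q1") are assumptions chosen for the example, and a real pipeline would need aliases that match its own sources and models.

```python
import re

# Hypothetical alias map: source labels (lowercase) -> the names your model uses.
LINE_ITEM_ALIASES = {
    "total revenue": "Revenue",
    "revenues": "Revenue",
    "net sales": "Revenue",
    "cost of revenue": "Cost of Goods Sold",
}


def normalize_ticker(raw: str) -> str:
    """Strip whitespace and a leading '$', then uppercase."""
    return raw.strip().lstrip("$").upper()


def normalize_period(raw: str) -> str:
    """Map 'Q1 2024' to '2024Q1' and 'FY 2023' or '2023' to 'FY2023'."""
    text = raw.strip().upper()
    quarter = re.fullmatch(r"Q([1-4])[\s\-]*(\d{4})", text)
    if quarter:
        return f"{quarter.group(2)}Q{quarter.group(1)}"
    year = re.fullmatch(r"(?:FY)?\s*(\d{4})", text)
    if year:
        return f"FY{year.group(1)}"
    return text  # unknown formats pass through unchanged (uppercased)


def normalize_line_item(raw: str) -> str:
    """Collapse whitespace and look the label up in the alias map."""
    key = re.sub(r"\s+", " ", raw).strip().lower()
    return LINE_ITEM_ALIASES.get(key, raw.strip())
```

Labels with no alias are returned as they arrived, so gaps in the map show up in the workbook instead of being silently renamed.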

Scraping HTML Tables

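import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://example.com/company/financials"
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
soup = BeautifulSoup(response.text, "html.parser")

# The CSS selector is site-specific; adjust it to the page's actual markup.
table = soup.select_one("table.financials-table")
if table is None:
    raise ValueError(f"No financials table found at {url}")

# The first row is treated as the header; every cell is kept as text.
rows = [[cell.get_text(strip=True) for cell in tr.select("th, td")] for tr in table.select("tr")]
df = pd.DataFrame(rows[1:], columns=rows[0])
df.to_excel("income_statement.xlsx", index=False)  # needs an Excel engine such as openpyxl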
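
The cells scraped above are all strings, so Excel will treat figures such as "(1,234)" as text unless they are converted first. Below is a minimal sketch of that conversion. The helper name parse_amount is hypothetical, and it assumes US-style formatting (comma thousands separators, parentheses for negatives); other locales would need different rules.

```python
from typing import Optional

# Placeholders that commonly stand for "no value" in statement tables.
EMPTY_MARKERS = {"", "-", "\u2014", "N/A"}


def parse_amount(cell: str) -> Optional[float]:
    """Convert a cell such as '1,234', '(567.8)', or '-' to a float or None."""
    text = cell.replace(",", "").replace("$", "").replace("\u2212", "-").strip()
    if text.upper() in EMPTY_MARKERS:
        return None
    # Accounting notation: parentheses mark a negative number.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    try:
        value = float(text)
    except ValueError:
        return None  # not a number, e.g. a footnote marker or label
    return -value if negative else value
```

Applied to the DataFrame from the scraping step, each period column could be converted with something like df[column].map(parse_amount) before calling to_excel.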

Using pandas.read_html

When a page serves well-formed HTML tables, pandas can parse them directly from the URL. This works for financial portals that include statement tables in the initial HTML rather than loading them with JavaScript.

import pandas as pd

url = "https://example.com/company/balance-sheet"
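# read_html returns one DataFrame per table on the page; it needs lxml, or bs4 with html5lib.
tables = pd.read_html(url)
df = tables[0]  # assumes the statement is the first table on the page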
df.to_excel("balance_sheet.xlsx", sheet_name="BalanceSheet", index=False)

Combining Multiple Statements

import pandas as pd

# income, balance, and cashflow are DataFrames built as in the earlier steps.
with pd.ExcelWriter("financial_statements.xlsx", engine="openpyxl") as writer:
    income.to_excel(writer, sheet_name="IncomeStatement", index=False)
    balance.to_excel(writer, sheet_name="BalanceSheet", index=False)
    cashflow.to_excel(writer, sheet_name="CashFlow", index=False)
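
When one workbook holds sheets for many tickers, the sheet names are usually generated. Excel limits worksheet names to 31 characters and rejects the characters : \ / ? * [ ], so a small helper along these lines can keep generated names valid. The helper name and the "TICKER_Statement" naming scheme are assumptions for illustration only.

```python
import re

MAX_SHEET_NAME_LENGTH = 31  # Excel's limit on worksheet name length
FORBIDDEN_CHARS = re.compile(r"[:\\/?*\[\]]")  # characters Excel rejects in sheet names


def safe_sheet_name(ticker: str, statement: str) -> str:
    """Build a 'TICKER_Statement' sheet name that Excel will accept."""
    name = FORBIDDEN_CHARS.sub("_", f"{ticker}_{statement}")
    return name[:MAX_SHEET_NAME_LENGTH]
```

The result can be passed as sheet_name in the to_excel calls above. Truncation can make two long names collide, so a real pipeline may want to check for duplicates before writing.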

Automate your data pipeline

BohdSolutions designs and maintains end-to-end scraping and reporting pipelines for fintech.