Ai for Smart Investors

Step-by-Step Guide: Using AI to Analyze Company Financials

Using AI to analyze company financials doesn’t have to be mysterious. With a repeatable workflow, a handful of tools, and the right prompts, you can speed up research, reduce manual errors, and produce reproducible investment insights. This guide walks you step by step from raw filings to a defensible investment thesis, with concrete prompts, output formats, and risk controls.


TL;DR

  1. Define the research question.
  2. Gather structured financials + qualitative documents.
  3. Clean and normalize data.
  4. Use an LLM to extract and summarize key metrics.
  5. Calculate ratios and run scenario tests.
  6. Build an AI-assisted investment thesis and check red flags.
  7. Backtest, validate, and continuously monitor.

1) Step 1 — Define Objective & Scope

Start with a clear question. Examples:

  • “Is Company X undervalued compared to peer Y over a 3-year horizon?”
  • “Does Company X have improving free cash flow and realistic margin expansion?”

Define outputs you want: summary paragraph, 5-year ratio table, valuation assumptions, and red-flag checklist. Clear scope prevents information overload.


2) Step 2 — Gather Data (What to collect)

Collect both structured and unstructured data:

Structured:

  • Historical financial statements (income, balance sheet, cash flow) — 5 years ideally.
  • Price history and volumes.
  • Consensus estimates (EPS, revenue).

Unstructured:

  • 10-K / 10-Q / annual reports / management commentary.
  • Earnings call transcripts.
  • News, analyst notes, and alternative data (hiring, app downloads, social sentiment).

Useful sources/tools: EDGAR (SEC), company investor relations pages, Yahoo Finance / Alpha Vantage / Quandl (now Nasdaq Data Link), Koyfin, FinChat, AlphaSense, Sentieo. Export to CSV or Google Sheets for structured work.


3) Step 3 — Clean & Normalize Data

Why: AI and models need consistent inputs.

Checklist:

  • Align fiscal year vs calendar year.
  • Convert all amounts to same currency and units (e.g., millions USD).
  • Fill missing values only where justified (e.g., interpolating a smooth series), and flag every imputed value.
  • Tag timestamps and data sources for auditability.

Store cleaned tables as CSV/Parquet or in a database. Keep an immutable raw-data copy.
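
As a rough illustration of the checklist above, here is a minimal Python sketch that converts one raw figure to millions of USD and tags it with its source and retrieval time. The `NormalizedValue` record, the `normalize` helper, the file name, and the 1.10 USD/EUR rate are all assumptions made up for this example; a real pipeline would pull exchange rates from a dated source.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Multipliers that convert a reported unit into millions.
UNIT_TO_MILLIONS = {"units": 1e-6, "thousands": 1e-3, "millions": 1.0, "billions": 1e3}


@dataclass(frozen=True)
class NormalizedValue:
    fiscal_year: int
    metric: str
    value_musd: float   # value in millions of USD
    source: str         # where the raw number came from
    retrieved_at: str   # ISO-8601 UTC timestamp, kept for auditability


def normalize(fiscal_year, metric, raw_value, unit, fx_to_usd, source, retrieved_at=None):
    """Convert one raw figure to millions of USD and tag its provenance."""
    if unit not in UNIT_TO_MILLIONS:
        raise ValueError(f"unknown unit: {unit!r}")
    if retrieved_at is None:
        retrieved_at = datetime.now(timezone.utc).isoformat()
    value_musd = raw_value * UNIT_TO_MILLIONS[unit] * fx_to_usd
    return NormalizedValue(fiscal_year, metric, round(value_musd, 3), source, retrieved_at)


# Hypothetical input: revenue of 2,500,000 thousand EUR at an assumed 1.10 USD/EUR rate.
row = normalize(2023, "revenue", 2_500_000, "thousands", 1.10, "annual_report_2023.pdf")
print(row.metric, row.value_musd, row.source)
```

Making the record frozen is a small design choice: once a value is normalized and tagged, nothing downstream can silently change it.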


4) Step 4 — Use an LLM to Extract & Summarize

LLMs are excellent at reading filings and extracting the narrative. Use structured prompts and request machine-readable output (JSON) for downstream automation.

Prompt template — summarize 10-K and extract key metrics (JSON):

You are a financial analyst. Summarize the key financial highlights from this 10-K for [Company Name]. Then extract the following metrics for the last 5 fiscal years: revenue, gross profit, operating income, net income, EBITDA, EPS, free cash flow, total debt, cash on hand, share count. Finally, list 5 management commentary quotes that indicate strategic direction.

Return output as JSON:
{
 "summary": "...",
 "metrics": {
   "YYYY": {"revenue":..., "gross_profit":..., ...},
   ...
 },
 "top_quotes": ["...", "..."]
}

Why JSON? It’s easy to parse into Python/Excel for ratio calculations and charts.
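
To show what "easy to parse" can look like, here is a minimal sketch that loads a reply in the shape requested above and refuses to continue if a year is missing a core metric. The `parse_llm_metrics` helper, the `REQUIRED` set, and the sample numbers are illustrative assumptions, not output from a real filing.

```python
import json

# Metrics this sketch insists on; extend the set to match your own prompt.
REQUIRED = {"revenue", "gross_profit", "operating_income", "net_income"}


def parse_llm_metrics(raw):
    """Parse the model's JSON reply into {year: {metric: float}}, failing loudly on gaps."""
    payload = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    parsed = {}
    for year, values in payload["metrics"].items():
        missing = REQUIRED - set(values)
        if missing:
            raise ValueError(f"{year}: missing {sorted(missing)}")
        parsed[int(year)] = {name: float(v) for name, v in values.items()}
    return dict(sorted(parsed.items()))


# Illustrative reply with made-up numbers (millions USD), not real company data.
sample_reply = """
{
  "summary": "Revenue grew while margins held steady.",
  "metrics": {
    "2023": {"revenue": 1200, "gross_profit": 540, "operating_income": 180, "net_income": 120},
    "2022": {"revenue": 1000, "gross_profit": 450, "operating_income": 150, "net_income": 100}
  },
  "top_quotes": ["We expect continued investment in automation."]
}
"""

metrics_by_year = parse_llm_metrics(sample_reply)
print(list(metrics_by_year))  # years in ascending order
```

Failing on a missing metric is deliberate: a silent gap in year 3 of 5 is much harder to spot later in a ratio table.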


5) Step 5 — Automated Ratio Calculation & Visualization

With the JSON or CSV results, compute standardized ratios:

  • Gross Margin = Gross Profit / Revenue
  • Operating Margin = Operating Income / Revenue
  • Net Margin = Net Income / Revenue
  • ROE = Net Income / Shareholders’ Equity
  • Free Cash Flow Yield = FCF / Market Cap
  • Debt/Equity = Total Debt / Shareholders’ Equity
  • Current Ratio = Current Assets / Current Liabilities

Use Excel formulas or a short Python/pandas script to create trend tables and charts. Visuals matter: margin trend, revenue CAGR, and cash conversion are often decisive.
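
The ratios above can be sketched in a few lines of plain Python. The field names (`shareholders_equity`, `current_assets`, and so on) and every number below are assumptions for illustration; Debt/Equity is taken here as total debt over shareholders' equity, and the same logic ports directly to pandas columns.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero or missing."""
    if not denominator:
        return None
    return numerator / denominator


def compute_ratios(m, market_cap):
    """Compute the standardized ratios from one fiscal year's figures."""
    return {
        "gross_margin": safe_div(m["gross_profit"], m["revenue"]),
        "operating_margin": safe_div(m["operating_income"], m["revenue"]),
        "net_margin": safe_div(m["net_income"], m["revenue"]),
        "roe": safe_div(m["net_income"], m["shareholders_equity"]),
        "fcf_yield": safe_div(m["free_cash_flow"], market_cap),
        "debt_to_equity": safe_div(m["total_debt"], m["shareholders_equity"]),
        "current_ratio": safe_div(m["current_assets"], m["current_liabilities"]),
    }


def cagr(first_value, last_value, years):
    """Compound annual growth rate between two values that are `years` apart."""
    return (last_value / first_value) ** (1 / years) - 1


# Made-up figures in millions USD, purely for illustration.
fy2023 = {
    "revenue": 1200, "gross_profit": 540, "operating_income": 180, "net_income": 120,
    "shareholders_equity": 800, "free_cash_flow": 90, "total_debt": 400,
    "current_assets": 600, "current_liabilities": 300,
}
ratios = compute_ratios(fy2023, market_cap=3000)
revenue_cagr = cagr(1000, 1210, years=2)
for name, value in ratios.items():
    print(f"{name}: {value:.2%}" if value is not None else f"{name}: n/a")
```

Returning `None` instead of raising on a zero denominator keeps one bad year from breaking a whole trend table, while still making the gap visible.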


6) Step 6 — Build an AI-Assisted Investment Thesis

Ask the LLM to synthesize quantitative and qualitative signals into a concise thesis.

Prompt template — write investment thesis:

Using the metrics and management quotes below, write a 6-point investment thesis for [Company]. Include: bull case, base case assumptions, bear case, 12-month target price (with assumptions), and 3 red flags to monitor. Use concise bullet points.

Require the model to list assumptions (growth rates, margin expansion, multiples) so you can test them numerically.


7) Step 7 — Scenario Analysis & Stress Testing

Translate narrative assumptions into scenarios:

  • Base: revenue CAGR = X%, margin expands to Y% in 3 years.
  • Bull: higher adoption + margin upside.
  • Bear: demand slump, margin compression.

Run sensitivity tables: vary revenue growth and margin, then compute the implied valuation range for each combination. For more rigor, and if you have the capacity, use Monte Carlo sampling to simulate a distribution of outcomes.
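
Here is one way the sensitivity table and Monte Carlo run might be sketched. The valuation model is a deliberately simple assumption chosen for this example (terminal-year net income times an earnings multiple), and the growth, margin, and multiple inputs are made up; swap in your own model and distributions.

```python
import random
import statistics


def implied_value(revenue, growth, margin, multiple, years=3):
    """Toy valuation: net income after `years` of growth, times an earnings multiple."""
    return revenue * (1 + growth) ** years * margin * multiple


def sensitivity_table(revenue, growths, margins, multiple, years=3):
    """Implied value for every (growth, margin) pair, as {growth: {margin: value}}."""
    return {
        g: {m: implied_value(revenue, g, m, multiple, years) for m in margins}
        for g in growths
    }


def monte_carlo(revenue, growth_mu, growth_sd, margin_mu, margin_sd, multiple,
                n=10_000, seed=0):
    """Sample growth and margin from normal distributions; return outcome percentiles."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    draws = sorted(
        implied_value(revenue, rng.gauss(growth_mu, growth_sd),
                      rng.gauss(margin_mu, margin_sd), multiple)
        for _ in range(n)
    )
    return {
        "p5": draws[n * 5 // 100],
        "median": statistics.median(draws),
        "p95": draws[n * 95 // 100],
    }


# Illustrative inputs: 1,000 (millions USD) of revenue and an assumed 15x earnings multiple.
table = sensitivity_table(1000, growths=[0.05, 0.10, 0.15], margins=[0.15, 0.20, 0.25], multiple=15)
for growth, row in table.items():
    print(f"{growth:.0%}", [round(v) for v in row.values()])
print(monte_carlo(1000, 0.10, 0.03, 0.20, 0.02, multiple=15))
```

Sampling growth and margin independently is a simplification; in practice the two are often correlated, which widens or narrows the outcome range.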


8) Step 8 — Backtest & Validate

Before trusting outputs:

  • Backtest your signals (e.g., EPS surprise vs subsequent 30-day returns).
  • Run out-of-sample tests (train on data through year T, test on T+1).
  • Compare AI-extracted numbers with a trusted source (FactSet, Bloomberg, company filings) to validate accuracy.

Flag systematic errors (e.g., consistently misreading footnote accounting).
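
A cross-check against a trusted source can be as simple as the sketch below. The `find_discrepancies` helper, the 1% tolerance, and the numbers are assumptions for illustration; the reference figures would come from whichever source you trust.

```python
def find_discrepancies(extracted, reference, rel_tol=0.01):
    """List (year, metric, extracted, reference) where values differ by more than rel_tol.

    A metric missing from the extraction is reported with an extracted value of None.
    """
    issues = []
    for year, ref_metrics in reference.items():
        for metric, ref_value in ref_metrics.items():
            got = extracted.get(year, {}).get(metric)
            if got is None:
                issues.append((year, metric, None, ref_value))
            elif abs(got - ref_value) > rel_tol * abs(ref_value):
                issues.append((year, metric, got, ref_value))
    return issues


# Made-up numbers: the extraction misreads 2023 net income and skips 2022 revenue.
extracted = {2023: {"revenue": 1200.0, "net_income": 210.0}, 2022: {"net_income": 100.0}}
reference = {2023: {"revenue": 1201.0, "net_income": 120.0}, 2022: {"revenue": 1000.0, "net_income": 100.0}}

for issue in find_discrepancies(extracted, reference):
    print(issue)
```

If the same metric shows up in the list filing after filing, that points to a systematic error in the prompt or source parsing rather than a one-off slip.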


9) Step 9 — Monitor, Governance & Drift Detection

Models and sources change:

  • Schedule periodic re-extraction (weekly / monthly).
  • Track model performance metrics: accuracy of ratios, prediction error, number of hallucinations.
  • Log each AI output with timestamp, prompt, and model version for audits.

If results start drifting, revise the prompts and workflow, and re-verify the data sources.
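
For the audit log, an append-only JSON Lines file is one lightweight option. This is a sketch under assumptions: the `log_ai_output` helper, the field names, and the hash of the output are choices made for this example, not a required format.

```python
import hashlib
import json
from datetime import datetime, timezone


def log_ai_output(path, prompt, output, model_version):
    """Append one audit record (timestamp, prompt, model version, output) as a JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "prompt": prompt,
        "output": output,
        # A hash makes it cheap to detect when a re-extraction changed the output.
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


# Example call (hypothetical file name and model label):
# log_ai_output("ai_audit_log.jsonl", "Summarize the 10-K ...", "Revenue grew ...", "model-2024-05")
```

Comparing `output_sha256` across scheduled re-extractions of the same document gives a quick first signal of drift before you dig into the numbers.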


Prompt Bank — Practical Examples You Can Copy

  1. Summarize earnings call (short):
Summarize the Q2 earnings call for [Company] in 5 bullets: revenue drivers, margin commentary, guidance, key risks, management tone.
  2. Extract red flags:
From this 10-Q text, list any accounting red flags or one-time items in bullet form (e.g., revenue recognition changes, large write-downs).
  3. Peer comparison table:
Compare [Company A] vs [Company B], [Company C] across revenue CAGR (3yr), EPS CAGR, EV/EBITDA, and FCF yield. Return as CSV.
  4. Valuation assumptions checklist:
Generate a checklist of 8 assumptions to justify a 12-month target price for [Company].
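
If you keep these prompts in version control, a small helper can fill the bracketed placeholders and refuse to send a prompt with one left blank. The `fill_prompt` function and the company names below are hypothetical, written only to illustrate the idea.

```python
import re

# Matches placeholders such as [Company] or [Company A].
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_prompt(template, values):
    """Replace each [Placeholder] with its value; raise if any placeholder has no value."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder [{key}]")
        return values[key]
    return PLACEHOLDER.sub(substitute, template)


PEER_PROMPT = (
    "Compare [Company A] vs [Company B], [Company C] across revenue CAGR (3yr), "
    "EPS CAGR, EV/EBITDA, and FCF yield. Return as CSV."
)

# Hypothetical company names, for illustration only.
prompt = fill_prompt(
    PEER_PROMPT,
    {"Company A": "Alpha Corp", "Company B": "Beta Inc", "Company C": "Gamma Ltd"},
)
print(prompt)
```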

Common Pitfalls & How to Avoid Them

  • Hallucinations: Always ask the model to cite source text and return offsets or exact quotes.
  • Garbage In → Garbage Out: Clean, reliable inputs are non-negotiable.
  • Overfitting to Past Events: Use out-of-sample validation and scenario analysis.
  • Ignoring Footnotes: Configure the workflow to extract footnote data (leases, non-recurring items).
  • Blind Automation: Keep a human in the loop for final judgement and trade execution.
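
One cheap guard against hallucinated quotes is to check that every quote the model returns actually appears in the source text. The sketch below assumes a hypothetical `verify_quotes` helper and a made-up filing excerpt; it only normalizes whitespace, case, and curly quotes, so paraphrased quotes will correctly fail.

```python
import re


def _normalize(text):
    """Lowercase, unify curly quotes, and collapse whitespace so line breaks don't matter."""
    for curly, plain in (("\u2018", "'"), ("\u2019", "'"), ("\u201c", '"'), ("\u201d", '"')):
        text = text.replace(curly, plain)
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_quotes(quotes, source_text):
    """Map each quote to True if it appears verbatim (after normalization) in the source."""
    haystack = _normalize(source_text)
    return {quote: _normalize(quote) in haystack for quote in quotes}


# Made-up filing excerpt and model output, for illustration only.
filing_text = "We expect continued\ninvestment in automation. Demand remained   stable."
model_quotes = [
    "We expect continued investment in automation.",
    "We will double revenue next year.",  # not in the source, so it should be flagged
]

for quote, found in verify_quotes(model_quotes, filing_text).items():
    print("OK     " if found else "MISSING", quote)
```

Any quote marked missing should go back to a human reviewer rather than into the thesis.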

Compliance & Ethics

  • Ensure you don’t process material non-public information (insider data).
  • Retain logs for auditability (prompts, outputs, timestamps).
  • Respect data licenses for paid sources.
  • Avoid giving the LLM prompts that reconstruct or infer private personal data.

Suggested Tech Stack (lightweight)

  • LLMs / AI: ChatGPT (via API), FinChat, custom LLMs for enterprise.
  • Data & Docs: EDGAR, Alpha Vantage, Yahoo Finance, Sentieo, AlphaSense.
  • Processing: Google Sheets / Excel for small scale; Python (pandas) for automation.
  • Visualization: Koyfin, TradingView, Matplotlib / Plotly.
  • Workflow: Notion or Airtable + Git (for versioning prompts).

Example Mini-Workflow (30–60 minutes for one company)

  1. Download last 5 years of statements (CSV).
  2. Run an LLM prompt to extract management commentary and red flags (5–10 min).
  3. Parse JSON output into Excel/pandas; compute ratios (5–10 min).
  4. Generate 3 scenario valuations (10–15 min).
  5. Produce 1-page investment summary with thesis and red flags (5–10 min).

Result: reproducible research you can attach to a trade idea.


Key Takeaways

  • Use AI to accelerate reading and structure outputs — not to replace judgment.
  • Always request machine-readable outputs (JSON/CSV) for reproducibility.
  • Validate: cross-check numbers, backtest signals, and log everything for audits.
  • Combine quantitative ratios + qualitative management commentary to form a balanced thesis.