
Using AI to analyze company financials doesn’t have to be mystical. With a repeatable workflow, a handful of tools, and the right prompts, you can speed up research, reduce human error, and produce reproducible investment insights. This guide walks you, step-by-step, from raw filings to a defensible investment thesis — with concrete prompts, output formats, and risk controls.
TL;DR
- Define the research question.
- Gather structured financials + qualitative documents.
- Clean and normalize data.
- Use an LLM to extract and summarize key metrics.
- Calculate ratios and run scenario tests.
- Build an AI-assisted investment thesis and check red flags.
- Backtest, validate, and continuously monitor.
1) Step 1 — Define Objective & Scope
Start with a clear question. Examples:
- “Is Company X undervalued compared to peer Y over a 3-year horizon?”
- “Does Company X have improving free cash flow and realistic margin expansion?”
Define outputs you want: summary paragraph, 5-year ratio table, valuation assumptions, and red-flag checklist. Clear scope prevents information overload.
2) Step 2 — Gather Data (What to collect)
Collect both structured and unstructured data:
Structured:
- Historical financial statements (income, balance sheet, cash flow) — 5 years ideally.
- Price history and volumes.
- Consensus estimates (EPS, revenue).
Unstructured:
- 10-K / 10-Q / annual reports / management commentary.
- Earnings call transcripts.
- News, analyst notes, and alternative data (hiring, app downloads, social sentiment).
Useful sources/tools: EDGAR (SEC), company investor relations pages, Yahoo Finance, Alpha Vantage, Quandl (now Nasdaq Data Link), Koyfin, FinChat, AlphaSense, Sentieo (now part of AlphaSense). Export to CSV or Google Sheets for structured work.
3) Step 3 — Clean & Normalize Data
Why: LLMs and valuation models need consistent inputs. Mixed units or misaligned periods produce ratios that are silently wrong.
Checklist:
- Align fiscal year vs calendar year.
- Convert all amounts to same currency and units (e.g., millions USD).
- Fill missing values only where justified (e.g., interpolation), and flag every imputed value.
- Tag timestamps and data sources for auditability.
Store cleaned tables as CSV/Parquet or in a database. Keep an immutable raw-data copy.
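As an illustration of the checklist above, here is a minimal pandas sketch that converts mixed units and currencies to millions of USD and keeps source tags for auditability. The column names, the sample figures, and the exchange rate are assumptions made up for this example, not values from any real filing.

```python
import pandas as pd

# Hypothetical raw rows with mixed units and currencies, as they might
# arrive from different sources. All figures are made up for illustration.
raw = pd.DataFrame(
    {
        "fiscal_year": [2022, 2023, 2023],
        "metric": ["revenue", "revenue", "total_debt"],
        "value": [1_250_000.0, 1.4, 300.0],
        "unit": ["thousands", "billions", "millions"],
        "currency": ["USD", "USD", "EUR"],
        "source": ["10-K FY2022", "10-K FY2023", "annual report 2023"],
    }
)

# Multipliers that convert each reporting unit to millions.
UNIT_TO_MILLIONS = {"thousands": 1e-3, "millions": 1.0, "billions": 1e3}

# Assumed FX rates to USD; in practice use the rate for each period.
FX_TO_USD = {"USD": 1.0, "EUR": 1.10}


def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with values in millions of USD plus an audit timestamp."""
    out = df.copy()  # never mutate the raw table
    unit_factor = out["unit"].map(UNIT_TO_MILLIONS)
    fx_factor = out["currency"].map(FX_TO_USD)
    if unit_factor.isna().any() or fx_factor.isna().any():
        raise ValueError("unknown unit or currency in raw data")
    out["value_usd_m"] = out["value"] * unit_factor * fx_factor
    out["normalized_at"] = pd.Timestamp.now(tz="UTC")
    return out[["fiscal_year", "metric", "value_usd_m", "source", "normalized_at"]]


clean = normalize(raw)
print(clean)
```

Raising on an unknown unit or currency, rather than guessing, keeps bad rows out of the cleaned table; the raw frame is left untouched so it can serve as the immutable copy.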
4) Step 4 — Use an LLM to Extract & Summarize
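LLMs are well suited to reading filings and extracting the narrative. Use structured prompts and request machine-readable output (JSON) for downstream automation. A single 10-K typically presents three years of income statements and two years of balance sheets, so supply several filings if you want a full 5-year history.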
Prompt template — summarize 10-K and extract key metrics (JSON):
You are a financial analyst. Summarize the key financial highlights from this 10-K for [Company Name]. Then extract the following metrics for the last 5 fiscal years: revenue, gross profit, operating income, net income, EBITDA, EPS, free cash flow, total debt, cash on hand, share count. Finally, list 5 management commentary quotes that indicate strategic direction.
Return output as JSON:
{
  "summary": "...",
  "metrics": {
    "YYYY": {"revenue":..., "gross_profit":..., ...},
    ...
  },
  "top_quotes": ["...", "..."]
}
Why JSON? It’s easy to parse into Python/Excel for ratio calculations and charts.
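Before the JSON goes into any calculation, it is worth checking that the reply actually matches the requested shape. The sketch below uses only the standard library; the sample reply, the reduced set of required metrics, and the function name `parse_extraction` are hypothetical and chosen for illustration.

```python
import json

# Hypothetical model reply following the template above; numbers are made up.
llm_reply = """
{
  "summary": "Revenue grew while margins held steady.",
  "metrics": {
    "2022": {"revenue": 1250.0, "gross_profit": 500.0, "net_income": 100.0},
    "2023": {"revenue": 1400.0, "gross_profit": 574.0, "net_income": null}
  },
  "top_quotes": ["We expect continued investment in our core platform."]
}
"""

REQUIRED_KEYS = {"summary", "metrics", "top_quotes"}
REQUIRED_METRICS = {"revenue", "gross_profit", "net_income"}  # subset for brevity


def parse_extraction(text: str) -> tuple[dict, list[str]]:
    """Parse the reply and collect problems instead of silently accepting them."""
    data = json.loads(text)  # raises json.JSONDecodeError on malformed output
    problems = []
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        problems.append(f"missing top-level keys: {sorted(missing)}")
    for year, row in data.get("metrics", {}).items():
        if not (year.isdigit() and len(year) == 4):
            problems.append(f"unexpected year label: {year!r}")
        for name in sorted(REQUIRED_METRICS):
            value = row.get(name)
            # bool is a subclass of int, so exclude it explicitly
            if not isinstance(value, (int, float)) or isinstance(value, bool):
                problems.append(f"{year}.{name} is missing or not numeric")
    return data, problems


data, problems = parse_extraction(llm_reply)
print(problems)
```

A non-empty `problems` list is a signal to re-prompt or to fill the gap from a structured source, not to guess the missing number.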
5) Step 5 — Automated Ratio Calculation & Visualization
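With the JSON or CSV results, compute standardized ratios. ROE, Debt/Equity, and the Current Ratio need shareholders’ equity, current assets, and current liabilities, which the Step 4 prompt does not request, so add them to the extraction list or pull them from your structured data: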
- Gross Margin = Gross Profit / Revenue
- Operating Margin = Operating Income / Revenue
- Net Margin = Net Income / Revenue
- ROE = Net Income / Shareholders’ Equity
- Free Cash Flow Yield = FCF / Market Cap
- Debt/Equity = Total Debt / Shareholders’ Equity
- Current Ratio = Current Assets / Current Liabilities
Use Excel formulas or a short Python/pandas script to create trend tables and charts. Visuals matter: margin trend, revenue CAGR, and cash conversion are often decisive.
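A short pandas sketch of the ratio calculations listed above might look like the following. The figures and the market capitalization are invented for illustration, and the column names are assumptions about how your cleaned table is laid out.

```python
import pandas as pd

# Hypothetical figures in millions of USD; replace with your cleaned data.
fin = pd.DataFrame(
    {
        "revenue": [1000.0, 1100.0, 1210.0],
        "gross_profit": [400.0, 451.0, 508.2],
        "operating_income": [150.0, 176.0, 205.7],
        "net_income": [100.0, 121.0, 145.2],
        "equity": [800.0, 880.0, 968.0],
        "total_debt": [400.0, 396.0, 387.2],
        "current_assets": [500.0, 540.0, 600.0],
        "current_liabilities": [250.0, 300.0, 300.0],
        "fcf": [90.0, 110.0, 140.0],
    },
    index=pd.Index([2021, 2022, 2023], name="fiscal_year"),
)
market_cap = 2800.0  # assumed current market capitalization, millions USD

# One row per fiscal year, one column per ratio.
ratios = pd.DataFrame(
    {
        "gross_margin": fin["gross_profit"] / fin["revenue"],
        "operating_margin": fin["operating_income"] / fin["revenue"],
        "net_margin": fin["net_income"] / fin["revenue"],
        "roe": fin["net_income"] / fin["equity"],  # year-end equity, not average
        "debt_to_equity": fin["total_debt"] / fin["equity"],
        "current_ratio": fin["current_assets"] / fin["current_liabilities"],
    }
)


def cagr(series: pd.Series) -> float:
    """Compound annual growth rate between the first and last observation."""
    periods = len(series) - 1
    return (series.iloc[-1] / series.iloc[0]) ** (1 / periods) - 1


revenue_cagr = cagr(fin["revenue"])
fcf_yield = fin["fcf"].iloc[-1] / market_cap  # latest FCF over market cap

print(ratios.round(3))
print(f"Revenue CAGR: {revenue_cagr:.1%}, FCF yield: {fcf_yield:.1%}")
```

From here, `ratios.plot()` (with Matplotlib installed) gives a quick margin-trend chart.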
6) Step 6 — Build an AI-Assisted Investment Thesis
Ask the LLM to synthesize quantitative and qualitative signals into a concise thesis.
Prompt template — write investment thesis:
Using the metrics and management quotes below, write a 6-point investment thesis for [Company]. Include: bull case, base case assumptions, bear case, 12-month target price (with assumptions), and 3 red flags to monitor. Use concise bullet points.
Require the model to list assumptions (growth rates, margin expansion, multiples) so you can test them numerically.
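One way to keep the thesis prompt reproducible is to assemble it in code from the extracted data rather than pasting by hand. The sketch below reuses the template wording from this step; the function name `build_thesis_prompt`, the section labels, and the sample inputs are hypothetical.

```python
import json

INSTRUCTION = (
    "Using the metrics and management quotes below, write a 6-point investment "
    "thesis for {company}. Include: bull case, base case assumptions, bear case, "
    "12-month target price (with assumptions), and 3 red flags to monitor. "
    "Use concise bullet points. List every assumption (growth rates, margin "
    "expansion, multiples) as a number so it can be tested."
)


def build_thesis_prompt(company: str, metrics: dict, quotes: list[str]) -> str:
    """Combine the instruction with the extracted data in a fixed layout."""
    parts = [
        INSTRUCTION.format(company=company),
        "METRICS (JSON):",
        json.dumps(metrics, indent=2, sort_keys=True),  # stable key order
        "QUOTES:",
        "\n".join(f"- {q}" for q in quotes),
    ]
    return "\n\n".join(parts)


# Hypothetical inputs for illustration only.
prompt = build_thesis_prompt(
    "Company X",
    {"2023": {"revenue": 1400.0, "net_income": 135.0}},
    ["We expect continued investment in our core platform."],
)
print(prompt)
```

Because the same inputs always produce the same prompt text, the prompt can be versioned and logged alongside the model's reply.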
7) Step 7 — Scenario Analysis & Stress Testing
Translate narrative assumptions into scenarios:
- Base: revenue CAGR = X%, margin expands to Y% in 3 years.
- Bull: higher adoption + margin upside.
- Bear: demand slump, margin compression.
Run sensitivity tables: vary revenue and margin and compute implied valuation ranges. For more rigor, use Monte Carlo sampling (simulate distribution of outcomes) if you have the capacity.
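The sensitivity table and Monte Carlo step can be sketched with the standard library alone. The valuation model here (year-3 earnings times an exit multiple) is a deliberately simple assumption chosen for illustration, as are the base revenue, the multiple, and the normal distributions used for sampling.

```python
import random

BASE_REVENUE = 1000.0   # assumed latest revenue, millions USD
EXIT_MULTIPLE = 15.0    # assumed price-to-earnings multiple in year 3
YEARS = 3


def implied_value(growth: float, margin: float) -> float:
    """Year-3 earnings times an exit multiple: a deliberately simple model."""
    revenue = BASE_REVENUE * (1 + growth) ** YEARS
    return revenue * margin * EXIT_MULTIPLE


# Sensitivity table: rows are revenue CAGR, columns are net margin.
growth_rates = [0.00, 0.05, 0.10]
margins = [0.08, 0.10, 0.12]
print("growth " + "".join(f"{m:>10.0%}" for m in margins))
for g in growth_rates:
    cells = "".join(f"{implied_value(g, m):>10.0f}" for m in margins)
    print(f"{g:>6.0%} {cells}")

# Monte Carlo: sample both drivers from assumed normal distributions.
rng = random.Random(42)  # fixed seed so the run is reproducible
samples = sorted(
    implied_value(rng.gauss(0.05, 0.03), max(rng.gauss(0.10, 0.02), 0.0))
    for _ in range(10_000)
)
# Read percentiles directly off the sorted sample.
p5, p50, p95 = (samples[int(len(samples) * q)] for q in (0.05, 0.50, 0.95))
print(f"5th pct: {p5:.0f}  median: {p50:.0f}  95th pct: {p95:.0f}")
```

The spread between the 5th and 95th percentiles is often more informative than the median, since it shows how much of the thesis rests on the growth and margin assumptions.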
8) Step 8 — Backtest & Validate
Before trusting outputs:
- Backtest your signals (e.g., EPS surprise vs subsequent 30-day returns).
- Run out-of-sample tests (train on data through year T, test on T+1).
- Compare AI-extracted numbers with a trusted source (FactSet, Bloomberg, company filings) to validate accuracy.
Flag systematic errors (e.g., consistently misreading footnote accounting).
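The cross-check against a trusted source can be automated with a tolerance test. In this sketch the two dictionaries stand in for the AI-extracted values and the reference values; all numbers, the 1% tolerance, and the function name `find_mismatches` are assumptions for illustration.

```python
from collections import Counter

# Hypothetical values in millions USD: LLM extraction vs a trusted source.
extracted = {
    ("2022", "revenue"): 1250.0,
    ("2022", "net_income"): 100.0,
    ("2023", "revenue"): 1400.0,
    ("2023", "net_income"): 135.0,
}
reference = {
    ("2022", "revenue"): 1250.0,
    ("2022", "net_income"): 92.0,
    ("2023", "revenue"): 1401.0,
    ("2023", "net_income"): 124.0,
}


def find_mismatches(extracted, reference, rel_tol=0.01):
    """Return (key, extracted, reference, rel_error) for values outside tolerance."""
    mismatches = []
    for key, ref_value in reference.items():
        got = extracted.get(key)
        if got is None:
            mismatches.append((key, None, ref_value, None))  # value not extracted
            continue
        if ref_value == 0:
            rel_err = 0.0 if got == 0 else float("inf")
        else:
            rel_err = abs(got - ref_value) / abs(ref_value)
        if rel_err > rel_tol:
            mismatches.append((key, got, ref_value, rel_err))
    return mismatches


mismatches = find_mismatches(extracted, reference)

# A metric that fails in several years points to a systematic error
# (for example, picking up an adjusted figure instead of the reported one).
by_metric = Counter(key[1] for key, *_ in mismatches)
print(by_metric)
```

Here revenue passes in both years while net income fails in both, which is the pattern worth investigating rather than correcting year by year.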
9) Step 9 — Monitor, Governance & Drift Detection
Models and sources change:
- Schedule periodic re-extraction (weekly / monthly).
- Track performance metrics: error in extracted ratios versus a trusted source, prediction error, and the count of hallucinated values (figures or quotes with no support in the source text).
- Log each AI output with timestamp, prompt, and model version for audits.
If results start drifting, revise the prompts or workflow and re-verify data sources.
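Logging each AI output for audits can be as simple as appending one JSON line per call. The sketch below writes to an in-memory stream so it is self-contained; the field names, the function name `log_ai_output`, and the model identifier are hypothetical.

```python
import hashlib
import io
import json
from datetime import datetime, timezone


def log_ai_output(stream, prompt: str, output: str, model_version: str) -> dict:
    """Append one JSON line describing an AI call and return the record."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "prompt": prompt,
        # A hash makes prompt changes easy to spot without diffing long strings.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "output": output,
    }
    stream.write(json.dumps(record) + "\n")
    return record


# In practice, open a file in append mode, e.g. open("ai_audit.jsonl", "a").
log_stream = io.StringIO()
record = log_ai_output(
    log_stream,
    prompt="Summarize the Q2 earnings call for [Company] in 5 bullets.",
    output="(model reply goes here)",
    model_version="example-model-v1",  # hypothetical identifier
)
print(log_stream.getvalue())
```

An append-only JSON Lines file is easy to load into pandas later, which helps when comparing outputs across model versions to detect drift.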
Prompt Bank — Practical Examples You Can Copy
- Summarize earnings call (short):
Summarize the Q2 earnings call for [Company] in 5 bullets: revenue drivers, margin commentary, guidance, key risks, management tone.
- Extract red flags:
From this 10-Q text, list any accounting red flags or one-time items in bullet form (e.g., revenue recognition changes, large write-downs).
- Peer comparison table:
Compare [Company A] vs [Company B], [Company C] across revenue CAGR (3yr), EPS CAGR, EV/EBITDA, and FCF yield. Return as CSV.
- Valuation assumptions checklist:
Generate a checklist of 8 assumptions to justify a 12-month target price for [Company].
Common Pitfalls & How to Avoid Them
- Hallucinations: Ask the model to cite source text with exact quotes or offsets, then check that each quote actually appears in the filing.
- Garbage In → Garbage Out: Clean, reliable inputs are non-negotiable.
- Overfitting to Past Events: Use out-of-sample validation and scenario analysis.
- Ignoring Footnotes: Prompt explicitly for footnote data (leases, non-recurring items) and spot-check it against the filing.
- Blind Automation: Keep a human in the loop for final judgement and trade execution.
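For the hallucination check in the list above, a quote the model returns can be tested mechanically against the source text. This is a minimal sketch assuming exact-match quotes after whitespace and quote-mark normalization; the sample text and the function names are made up for illustration.

```python
import re


def _normalize(text: str) -> str:
    """Collapse whitespace and unify curly quotes so formatting noise is ignored."""
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def check_quotes(quotes, source_text):
    """Map each quote to True if it appears in the source after normalization."""
    haystack = _normalize(source_text)
    return {q: _normalize(q) in haystack for q in quotes}


# Hypothetical transcript excerpt and model-returned quotes.
source_text = """We expect gross margin to remain under pressure
in the first half, before improving as new capacity comes online."""
quotes = [
    "gross margin to remain under pressure in the first half",
    "we expect record margins next quarter",  # not in the source
]

results = check_quotes(quotes, source_text)
unsupported = [q for q, ok in results.items() if not ok]
print(unsupported)
```

Any quote in `unsupported` should be dropped or re-requested; a paraphrase that fails this check may still be accurate, but it needs a human to confirm it.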
Compliance & Ethics
- Ensure you don’t process material non-public information (insider data).
- Retain logs for auditability (prompts, outputs, timestamps).
- Respect data licenses for paid sources.
- Avoid giving the LLM prompts that reconstruct or infer private personal data.
Suggested Tech Stack (lightweight)
- LLMs / AI: ChatGPT (via API), FinChat, custom LLMs for enterprise.
- Data & Docs: EDGAR, Alpha Vantage, Yahoo Finance, Sentieo, AlphaSense.
- Processing: Google Sheets / Excel for small scale; Python (pandas) for automation.
- Visualization: Koyfin, TradingView, Matplotlib / Plotly.
- Workflow: Notion or Airtable + Git (for versioning prompts).
Example Mini-Workflow (30–60 minutes for one company)
- Download last 5 years of statements (CSV).
- Run LLM prompt to extract management commentary and red flags (5–10 min).
- Parse JSON output into Excel/pandas; compute ratios (5–10 min).
- Generate 3 scenario valuations (10–15 min).
- Produce 1-page investment summary with thesis and red flags (5–10 min).
Result: reproducible research you can attach to a trade idea.
Key Takeaways
- Use AI to accelerate reading and structure outputs — not to replace judgment.
- Always request machine-readable outputs (JSON/CSV) for reproducibility.
- Validate: cross-check numbers, backtest signals, and log everything for audits.
- Combine quantitative ratios + qualitative management commentary to form a balanced thesis.