How I combined LangChain, RAG, MCP tools, and Claude 3.5 Sonnet to build a SaaS Growth Intelligence Agent from scratch — and what it taught me about building production-grade AI systems.
Every revenue team I've ever spoken to has the same problem: too much data, not enough insight. They have a CRM, a data warehouse, a product analytics tool, and a BI dashboard. But they still spend Monday mornings manually assembling spreadsheets, trying to answer the same questions every week.
Why did our win rate drop? Which accounts are about to churn? Where is our pipeline thin? — The questions every RevOps team asks on repeat
These aren't hard questions to answer in isolation. But doing it consistently, across segments, at speed, while also comparing findings against company strategy? That's where most teams fall short. So I asked myself: what if an AI agent could do this work automatically?
That question became the Growth Intelligence Agent — a portfolio project I built to demonstrate how modern AI tooling can act as a virtual revenue analyst for a SaaS company. Here's the full story: what it does, how it's built, the decisions I made, and what I'd do differently.
The agent monitors a set of growth metrics, each with an explicit target, across three domains:
| Domain | Metrics Monitored | Target |
|---|---|---|
| New Logo Growth | Pipeline Coverage, Win Rate, Sales Cycle Length, Avg Deal Size | Coverage ≥ 3.5x · Win Rate ≥ 28% |
| Expansion Revenue | Net Revenue Retention, Product Attach Rate, Seat Expansion | NRR ≥ 120% · Attach ≥ 35% |
| Retention Risk | Usage At-Risk Rate, inactive accounts, renewal proximity | At-risk < 10% |
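Those targets double as the agent's comparison standard, so it helps to keep them in one small config that both the system prompt and the alert engine can read. A minimal sketch (the dict layout and key names here are illustrative, not the project's exact schema):

```python
# Illustrative targets config mirroring the table above; the real
# project may structure this differently.
GROWTH_TARGETS = {
    "pipeline_coverage":      {"target": 3.50, "direction": "above"},  # >= 3.5x
    "win_rate":               {"target": 0.28, "direction": "above"},  # >= 28%
    "net_revenue_retention":  {"target": 1.20, "direction": "above"},  # >= 120%
    "product_attach_rate":    {"target": 0.35, "direction": "above"},  # >= 35%
    "usage_at_risk_rate":     {"target": 0.10, "direction": "below"},  # < 10%
}

def is_on_target(metric_name: str, value: float) -> bool:
    """True if a metric meets its configured target."""
    t = GROWTH_TARGETS[metric_name]
    return value >= t["target"] if t["direction"] == "above" else value < t["target"]
```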
When you ask it something like "What's driving our EMEA win rate decline?", it doesn't just query a database. It pulls metrics from the warehouse, fetches CRM pipeline data by region, retrieves product usage signals, looks up your growth strategy from a RAG knowledge base, and synthesizes everything into a structured insight with root cause analysis and recommended actions.
A multi-layer AI agent system — tools, retrieval, and reasoning working in concert.
The system has three pillars working together:
```
┌─────────────────────────────────────────────────────────┐
│                      Streamlit UI                        │
│          (Dashboard · Chat · Alerts · Insights)          │
└────────────────────────┬────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────┐
│               Growth Intelligence Agent                  │
│          (LangChain Agent + Claude 3.5 Sonnet)           │
└──────┬────────────────┬──────────────────┬──────────────┘
       │                │                  │
   MCP Tools        RAG System         Metrics DB
   (5 tools)   (FAISS/HuggingFace)    (CSV/Pandas)
       │                │                  │
  CRM/Usage         Company            dbt-style
     Data          Playbooks          SQL Metrics
```
| Layer | Choice | Why |
|---|---|---|
| LLM | Claude 3.5 Sonnet | Best tool-use reliability; strong structured output |
| Agent Framework | LangChain | Mature tool-calling support, well-documented |
| Vector DB | FAISS (local) | No server needed, fast for small knowledge bases |
| Embeddings | HuggingFace all-MiniLM-L6-v2 | Free, fast, good quality for English business docs |
| Data | CSV + Pandas | Simulates a data warehouse without the overhead |
| UI | Streamlit | Rapid prototyping, easy to deploy |
Before building any AI, I needed data. I used Python's Faker library to generate five tables that mirror a real B2B SaaS company: Accounts (200 companies across 6 industries and 4 regions), Opportunities (600 deals with realistic stage distribution), Product Usage (multi-product adoption per account), Marketing Leads (1,500 leads with source attribution), and Subscription Revenue (contract values and renewal flags).
The key was making the data analytically coherent — deal sizes follow a log-normal distribution, win rates vary realistically by stage, and usage patterns create genuine churn signals. Generic random data produces garbage metrics; I wanted metrics worth reasoning about.
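To give a flavor of the approach, here's a minimal sketch of how the opportunities table might be generated, assuming numpy and Faker; the column names and stage weights are illustrative rather than the project's exact schema:

```python
import numpy as np
import pandas as pd
from faker import Faker

fake = Faker()
rng = np.random.default_rng(42)

STAGES = ["Prospecting", "Qualification", "Proposal", "Negotiation",
          "Closed Won", "Closed Lost"]
# Later stages are rarer; closed outcomes are weighted to yield a
# plausible baseline win rate rather than a uniform 50/50 split.
STAGE_WEIGHTS = [0.25, 0.20, 0.15, 0.10, 0.12, 0.18]

opportunities = pd.DataFrame({
    "opportunity_id": range(600),
    "account_name": [fake.company() for _ in range(600)],
    # Log-normal deal sizes: many small deals, a long tail of large ones
    "deal_size": rng.lognormal(mean=10, sigma=0.6, size=600).round(-2),
    "stage": rng.choice(STAGES, size=600, p=STAGE_WEIGHTS),
})
```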
This is the most underrated part of the project. Instead of writing ad-hoc queries everywhere, I built a metrics computation engine inspired by dbt. Each metric is a standalone Python function with a SQL-equivalent comment explaining the logic:
```python
import pandas as pd

def model_win_rate(opps: pd.DataFrame) -> float:
    """
    SQL equivalent:
        SELECT SUM(CASE WHEN stage = 'Closed Won' THEN 1 ELSE 0 END)::float
             / NULLIF(SUM(CASE WHEN stage IN ('Closed Won', 'Closed Lost') THEN 1 ELSE 0 END), 0)
               AS win_rate
        FROM opportunities
    """
    # Win rate = closed-won deals / all closed deals (won + lost)
    closed = opps[opps["stage"].isin(["Closed Won", "Closed Lost"])]
    won = (closed["stage"] == "Closed Won").sum()
    return round(won / len(closed), 4) if len(closed) else 0.0
```
The output is a clean, structured metrics table with four columns: metric_name, segment, metric_value, date. This single table is what the agent queries — no matter how complex the underlying logic. Define once, query anywhere. That pattern separates a prototype from a production-grade data product.
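In code, "define once, query anywhere" can be as simple as a registry that runs every model function and emits rows in that four-column shape. A sketch, assuming a plain dict registry (only model_win_rate above is real; the wiring is illustrative):

```python
from datetime import date
import pandas as pd

# Hypothetical registry mapping metric names to their model functions
METRIC_MODELS = {
    "win_rate": model_win_rate,
    # "pipeline_coverage": model_pipeline_coverage, ...
}

def build_metrics_table(opps: pd.DataFrame) -> pd.DataFrame:
    """Run every registered model and emit the long-format metrics table."""
    rows = [
        {"metric_name": name, "segment": "all",
         "metric_value": fn(opps), "date": date.today()}
        for name, fn in METRIC_MODELS.items()
    ]
    return pd.DataFrame(rows, columns=["metric_name", "segment",
                                       "metric_value", "date"])
```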
This is where the project moves from "dashboarding" to "intelligence." A vanilla agent with metrics tools can tell you what happened. The RAG layer helps it explain why it matters — and recommend what to do — in the context of your specific company.
The RAG pipeline: company documents → chunking → embeddings → FAISS vector store → contextual retrieval during agent reasoning.
I created three markdown documents representing the kind of internal knowledge most RevOps teams have scattered across Notion and Google Docs.
These are chunked into 800-token segments with 100-token overlap, embedded with HuggingFace's all-MiniLM-L6-v2, and stored in FAISS. When the agent receives a question, it retrieves the top-4 most relevant chunks alongside metric data.
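The indexing step looks roughly like this in LangChain. A minimal sketch, assuming the playbooks sit in a docs/ folder; note that RecursiveCharacterTextSplitter counts characters by default, so treat the 800/100 figures as approximate token budgets:

```python
from pathlib import Path

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Load the markdown playbooks (the path is illustrative)
docs = []
for path in Path("docs").glob("*.md"):
    docs.extend(TextLoader(str(path)).load())

# Chunk with overlap so a strategy statement isn't split mid-thought
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Embed locally and persist the index
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local("vectorstore/")

# At question time, pull the top-4 most relevant chunks
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```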
Without RAG: "Your win rate is 24%, below the 28% target."
With RAG: "Your win rate is 24%, below the 28% target. Based on your MEDDIC framework, deals lacking an identified Champion have a 60% lower close rate — consider reviewing open deals for champion qualification gaps before the end of quarter."
I designed five tools following the Model Context Protocol (MCP) pattern — each is a LangChain @tool with a carefully written docstring that tells the LLM exactly what the tool does, what inputs it accepts, and when to use it:
```python
@tool
def get_pipeline_by_segment(query: str = "all") -> str:
    """
    Retrieve CRM opportunity pipeline data, segmented by stage, region,
    or industry. Input examples: 'by region', 'by stage', 'by industry',
    'open deals', 'all'. Returns pipeline value, deal counts, and win
    rates per segment.
    """
```
The tool descriptions are not boilerplate — they are the prompt engineering that determines whether the agent uses the right tool at the right time. I rewrote them several times to reduce tool misselection. Vague descriptions lead to random tool selection; precise, example-rich descriptions lead to reliable reasoning chains.
The heart of the system is a LangChain tool-calling agent powered by Claude 3.5 Sonnet. The system prompt does three critical things:
It defines a persona with clear responsibilities — not "you are a helpful assistant" but a specific role with named metrics, explicit targets, and a defined output format. Every response must include: Key Insight, Root Cause Analysis, Supporting Data, and Recommended Actions.
It establishes a comparison standard — the agent knows the targets and will automatically flag when metrics fall short, not just report the raw values.
It's paired with an alert system — a separate module compares current metrics against a prior period and fires prioritized alerts (High / Medium / Low) when any metric changes beyond a configurable threshold.
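Wiring those pieces together looks roughly like this. A minimal sketch: the system prompt is heavily abridged, and the exact model string is an assumption:

```python
from langchain_anthropic import ChatAnthropic
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate

SYSTEM_PROMPT = """You are a SaaS Growth Intelligence Analyst.
You monitor pipeline coverage (target >= 3.5x), win rate (target >= 28%),
NRR (target >= 120%), and usage risk (target < 10%). Flag any metric that
misses its target. Structure every answer as: Key Insight, Root Cause
Analysis, Supporting Data, Recommended Actions."""  # abridged

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", SYSTEM_PROMPT),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),  # where tool calls and results accumulate
])

tools = [get_pipeline_by_segment]  # plus the other four tools

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke({"input": "What's driving our EMEA win rate decline?"})
```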
The Growth Intelligence Agent dashboard — 8 KPI cards, AI chat, automated alerts, and full analysis in four tabs.
The UI has four tabs: AI Chat for natural language questions with multi-turn conversation history; Metrics Explorer with Plotly charts for win rate by region, pipeline distribution, and product usage; Alerts for one-click anomaly detection with severity-coded inline recommendations; and Insights for a full multi-tool analysis sweep across all metrics, segments, and risk signals.
The dark theme was intentional — the entire UI uses CSS custom properties for consistency, with a color system that maps green/amber/red directly to metric health status. Every KPI card tells you at a glance whether you're on track.
Replace simulated data with real integrations. The most impactful upgrade would be connecting to Salesforce, HubSpot, or Snowflake APIs. The tool layer is already designed to make this a drop-in replacement.
Add metric history. The current alert system simulates prior-period data. A real time-series store (even SQLite) would make trend analysis and anomaly detection far more meaningful.
Streaming responses. For better UX, agent responses should stream token-by-token rather than appearing after a 15–30 second wait. LangChain's streaming callbacks make this straightforward (see the sketch after this list).
Evaluation framework. I'd add a test suite for the agent — sample questions with expected answers — to catch regressions when the prompt or tools change. This is the discipline that separates AI demos from AI products.
Dockerize it. A docker-compose.yml would make the project far easier to run and far more impressive to a hiring manager reviewing it.
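For the streaming item above, a minimal sketch of token-level streaming via a callback handler; the model string is an assumption, and in Streamlit you'd write tokens into an st.empty() placeholder instead of printing:

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.callbacks import BaseCallbackHandler

class TokenPrinter(BaseCallbackHandler):
    """Print each token as the model emits it (swap print for a UI callback)."""
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(token, end="", flush=True)

llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    streaming=True,
    callbacks=[TokenPrinter()],
)
llm.invoke("Summarize this week's pipeline risk in two sentences.")
```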
The single biggest factor in whether the agent picks the right tool is the quality of the @tool docstring. I rewrote mine three times before the agent reliably routed questions to the correct data source.
Generic documents produce generic recommendations. The value comes from grounding the agent in your company's actual strategy, targets, and playbooks — not boilerplate business content.
If the data is messy or the metric definitions are inconsistent, the agent's reasoning will be wrong regardless of how good the LLM is. Invest in the data layer first — always.
The full project is live on GitHub: github.com/ShrikantLambe/growth_intelligence_agent
It's structured across six phases with a clean quickstart — you can have the agent running locally in under 10 minutes with your own Anthropic or OpenAI API key.
```bash
# Clone and run
git clone https://github.com/ShrikantLambe/growth_intelligence_agent.git
cd growth_intelligence_agent

pip install -r requirements.txt      # install dependencies
python scripts/generate_data.py      # generate the simulated SaaS dataset
python rag/build_vectorstore.py      # build the FAISS knowledge base
streamlit run ui/app.py              # launch the dashboard
```
I'm planning to extend this into a multi-agent architecture — a Supervisor agent routing questions to specialist sub-agents (Pipeline Agent, Expansion Agent, Churn Risk Agent) for deeper domain reasoning. If you're building something similar, or have questions about any part of the stack, connect with me on LinkedIn.