How I combined LangChain, RAG, MCP tools, and Claude 3.5 Sonnet to build a SaaS Growth Intelligence Agent from scratch — and what it taught me about building production-grade AI systems.
Every revenue team I've ever spoken to has the same problem: too much data, not enough insight. They have a CRM, a data warehouse, a product analytics tool, and a BI dashboard. But they still spend Monday mornings manually assembling spreadsheets, trying to answer the same questions every week.
Why did our win rate drop? Which accounts are about to churn? Where is our pipeline thin? — The questions every RevOps team asks on repeat
These aren't hard questions to answer in isolation. But doing it consistently, across segments, at speed, while also comparing findings against company strategy? That's where most teams fall short. So I asked myself: what if an AI agent could do this work automatically?
That question became the Growth Intelligence Agent — a portfolio project I built to demonstrate how modern AI tooling can act as a virtual revenue analyst for a SaaS company. Here's the full story: what it does, how it's built, the decisions I made, and what I'd do differently.
The agent monitors a set of growth metrics, each with an explicit target, across three domains:
| Domain | Metrics Monitored | Target |
|---|---|---|
| New Logo Growth | Pipeline Coverage, Win Rate, Sales Cycle Length, Avg Deal Size | Coverage ≥ 3.5x · Win Rate ≥ 28% |
| Expansion Revenue | Net Revenue Retention, Product Attach Rate, Seat Expansion | NRR ≥ 120% · Attach ≥ 35% |
| Retention Risk | Usage At-Risk Rate, inactive accounts, renewal proximity | At-risk < 10% |
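Those targets double as the agent's comparison standard, so it helps to keep them in one small config that both the system prompt and the alert engine can read. A minimal sketch (the dict layout and key names here are illustrative, not the project's exact schema):

```python
# Illustrative targets config mirroring the table above; the real
# project may structure this differently.
GROWTH_TARGETS = {
    "pipeline_coverage":      {"target": 3.50, "direction": "above"},  # >= 3.5x
    "win_rate":               {"target": 0.28, "direction": "above"},  # >= 28%
    "net_revenue_retention":  {"target": 1.20, "direction": "above"},  # >= 120%
    "product_attach_rate":    {"target": 0.35, "direction": "above"},  # >= 35%
    "usage_at_risk_rate":     {"target": 0.10, "direction": "below"},  # < 10%
}

def is_on_target(metric_name: str, value: float) -> bool:
    """True if a metric meets its configured target."""
    t = GROWTH_TARGETS[metric_name]
    return value >= t["target"] if t["direction"] == "above" else value < t["target"]
```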
When you ask it something like "What's driving our EMEA win rate decline?", it doesn't just query a database. It pulls metrics from the warehouse, fetches CRM pipeline data by region, retrieves product usage signals, looks up your growth strategy from a RAG knowledge base, and synthesizes everything into a structured insight with root cause analysis and recommended actions.
A multi-layer AI agent system — tools, retrieval, and reasoning working in concert.
The system has three pillars working together:
```
┌─────────────────────────────────────────────────────────┐
│                      Streamlit UI                        │
│          (Dashboard · Chat · Alerts · Insights)          │
└────────────────────────┬────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────┐
│               Growth Intelligence Agent                  │
│          (LangChain Agent + Claude 3.5 Sonnet)           │
└──────┬────────────────┬──────────────────┬──────────────┘
       │                │                  │
   MCP Tools        RAG System         Metrics DB
   (5 tools)   (FAISS/HuggingFace)    (CSV/Pandas)
       │                │                  │
  CRM/Usage         Company            dbt-style
     Data          Playbooks          SQL Metrics
```
| Layer | Choice | Why |
|---|---|---|
| LLM | Claude 3.5 Sonnet | Best tool-use reliability; strong structured output |
| Agent Framework | LangChain | Mature tool-calling support, well-documented |
| Vector DB | FAISS (local) | No server needed, fast for small knowledge bases |
| Embeddings | HuggingFace all-MiniLM-L6-v2 | Free, fast, good quality for English business docs |
| Data | CSV + Pandas | Simulates a data warehouse without the overhead |
| UI | Streamlit | Rapid prototyping, easy to deploy |
Before building any AI, I needed data. I used Python's Faker library to generate five tables that mirror a real B2B SaaS company: Accounts (200 companies across 6 industries and 4 regions), Opportunities (600 deals with realistic stage distribution), Product Usage (multi-product adoption per account), Marketing Leads (1,500 leads with source attribution), and Subscription Revenue (contract values and renewal flags).
The key was making the data analytically coherent — deal sizes follow a log-normal distribution, win rates vary realistically by stage, and usage patterns create genuine churn signals. Generic random data produces garbage metrics; I wanted metrics worth reasoning about.
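To give a flavor of the approach, here's a minimal sketch of how the opportunities table might be generated, assuming numpy and Faker; the column names and stage weights are illustrative rather than the project's exact schema:

```python
import numpy as np
import pandas as pd
from faker import Faker

fake = Faker()
rng = np.random.default_rng(42)

STAGES = ["Prospecting", "Qualification", "Proposal", "Negotiation",
          "Closed Won", "Closed Lost"]
# Later stages are rarer; closed outcomes are weighted to yield a
# plausible baseline win rate rather than a uniform 50/50 split.
STAGE_WEIGHTS = [0.25, 0.20, 0.15, 0.10, 0.12, 0.18]

opportunities = pd.DataFrame({
    "opportunity_id": range(600),
    "account_name": [fake.company() for _ in range(600)],
    # Log-normal deal sizes: many small deals, a long tail of large ones
    "deal_size": rng.lognormal(mean=10, sigma=0.6, size=600).round(-2),
    "stage": rng.choice(STAGES, size=600, p=STAGE_WEIGHTS),
})
```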
This is the most underrated part of the project. Instead of writing ad-hoc queries everywhere, I built a metrics computation engine inspired by dbt. Each metric is a standalone Python function with a SQL-equivalent comment explaining the logic:
```python
import pandas as pd

def model_win_rate(opps: pd.DataFrame) -> float:
    """
    SQL equivalent:
        SELECT SUM(CASE WHEN stage = 'Closed Won' THEN 1 ELSE 0 END)::float
             / NULLIF(SUM(CASE WHEN stage IN ('Closed Won', 'Closed Lost') THEN 1 ELSE 0 END), 0)
               AS win_rate
        FROM opportunities
    """
    # Win rate = closed-won deals / all closed deals (won + lost)
    closed = opps[opps["stage"].isin(["Closed Won", "Closed Lost"])]
    won = (closed["stage"] == "Closed Won").sum()
    return round(won / len(closed), 4) if len(closed) else 0.0
```
The output is a clean, structured metrics table with four columns: metric_name, segment, metric_value, date. This single table is what the agent queries — no matter how complex the underlying logic. Define once, query anywhere. That pattern separates a prototype from a production-grade data product.
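In code, "define once, query anywhere" can be as simple as a registry that runs every model function and emits rows in that four-column shape. A sketch, assuming a plain dict registry (only model_win_rate above is real; the wiring is illustrative):

```python
from datetime import date
import pandas as pd

# Hypothetical registry mapping metric names to their model functions
METRIC_MODELS = {
    "win_rate": model_win_rate,
    # "pipeline_coverage": model_pipeline_coverage, ...
}

def build_metrics_table(opps: pd.DataFrame) -> pd.DataFrame:
    """Run every registered model and emit the long-format metrics table."""
    rows = [
        {"metric_name": name, "segment": "all",
         "metric_value": fn(opps), "date": date.today()}
        for name, fn in METRIC_MODELS.items()
    ]
    return pd.DataFrame(rows, columns=["metric_name", "segment",
                                       "metric_value", "date"])
```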
This is where the project moves from "dashboarding" to "intelligence." A vanilla agent with metrics tools can tell you what happened. The RAG layer helps it explain why it matters — and recommend what to do — in the context of your specific company.
The RAG pipeline: company documents → chunking → embeddings → FAISS vector store → contextual retrieval during agent reasoning.
I created three markdown documents representing the kind of internal knowledge most RevOps teams have scattered across Notion and Google Docs.
These are chunked into 800-token segments with 100-token overlap, embedded with HuggingFace's all-MiniLM-L6-v2, and stored in FAISS. When the agent receives a question, it retrieves the top-4 most relevant chunks alongside metric data.
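The indexing step looks roughly like this in LangChain. A minimal sketch, assuming the playbooks sit in a docs/ folder; note that RecursiveCharacterTextSplitter counts characters by default, so treat the 800/100 figures as approximate token budgets:

```python
from pathlib import Path

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Load the markdown playbooks (the path is illustrative)
docs = []
for path in Path("docs").glob("*.md"):
    docs.extend(TextLoader(str(path)).load())

# Chunk with overlap so a strategy statement isn't split mid-thought
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Embed locally and persist the index
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local("vectorstore/")

# At question time, pull the top-4 most relevant chunks
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```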
Without RAG: "Your win rate is 24%, below the 28% target."
With RAG: "Your win rate is 24%, below the 28% target. Based on your MEDDIC framework, deals lacking an identified Champion have a 60% lower close rate — consider reviewing open deals for champion qualification gaps before the end of quarter."
I designed five tools following the Model Context Protocol (MCP) pattern — each is a LangChain @tool with a carefully written docstring that tells the LLM exactly what the tool does, what inputs it accepts, and when to use it:
```python
@tool
def get_pipeline_by_segment(query: str = "all") -> str:
    """
    Retrieve CRM opportunity pipeline data, segmented by stage, region,
    or industry. Input examples: 'by region', 'by stage', 'by industry',
    'open deals', 'all'. Returns pipeline value, deal counts, and win
    rates per segment.
    """
```
The tool descriptions are not boilerplate — they are the prompt engineering that determines whether the agent uses the right tool at the right time. I rewrote them several times to reduce tool misselection. Vague descriptions lead to random tool selection; precise, example-rich descriptions lead to reliable reasoning chains.
The heart of the system is a LangChain tool-calling agent powered by Claude 3.5 Sonnet. The system prompt does three critical things:
It defines a persona with clear responsibilities — not "you are a helpful assistant" but a specific role with named metrics, explicit targets, and a defined output format. Every response must include: Key Insight, Root Cause Analysis, Supporting Data, and Recommended Actions.
It establishes a comparison standard — the agent knows the targets and will automatically flag when metrics fall short, not just report the raw values.
It's paired with an alert system — a separate module compares current metrics against a prior period and fires prioritized alerts (High / Medium / Low) when any metric changes beyond a configurable threshold.
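Wiring those pieces together looks roughly like this. A minimal sketch: the system prompt is heavily abridged, and the exact model string is an assumption:

```python
from langchain_anthropic import ChatAnthropic
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate

SYSTEM_PROMPT = """You are a SaaS Growth Intelligence Analyst.
You monitor pipeline coverage (target >= 3.5x), win rate (target >= 28%),
NRR (target >= 120%), and usage risk (target < 10%). Flag any metric that
misses its target. Structure every answer as: Key Insight, Root Cause
Analysis, Supporting Data, Recommended Actions."""  # abridged

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", SYSTEM_PROMPT),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),  # where tool calls and results accumulate
])

tools = [get_pipeline_by_segment]  # plus the other four tools

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke({"input": "What's driving our EMEA win rate decline?"})
```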
The Growth Intelligence Agent dashboard — 8 KPI cards, AI chat, automated alerts, and full analysis in four tabs.
The UI has four tabs: AI Chat for natural language questions with multi-turn conversation history; Metrics Explorer with Plotly charts for win rate by region, pipeline distribution, and product usage; Alerts for one-click anomaly detection with severity-coded inline recommendations; and Insights for a full multi-tool analysis sweep across all metrics, segments, and risk signals.
The dark theme was intentional — the entire UI uses CSS custom properties for consistency, with a color system that maps green/amber/red directly to metric health status. Every KPI card tells you at a glance whether you're on track.
Replace simulated data with real integrations. The most impactful upgrade would be connecting to Salesforce, HubSpot, or Snowflake APIs. The tool layer is already designed to make this a drop-in replacement.
Add metric history. The current alert system simulates prior-period data. A real time-series store (even SQLite) would make trend analysis and anomaly detection far more meaningful.
Streaming responses. For better UX, agent responses should stream token-by-token rather than appearing after a 15–30 second wait. LangChain's streaming callbacks make this straightforward (see the sketch after this list).
Evaluation framework. I'd add a test suite for the agent — sample questions with expected answers — to catch regressions when the prompt or tools change. This is the discipline that separates AI demos from AI products.
Dockerize it. A docker-compose.yml would make the project far easier to run and far more impressive to a hiring manager reviewing it.
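For the streaming item above, a minimal sketch of token-level streaming via a callback handler; the model string is an assumption, and in Streamlit you'd write tokens into an st.empty() placeholder instead of printing:

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.callbacks import BaseCallbackHandler

class TokenPrinter(BaseCallbackHandler):
    """Print each token as the model emits it (swap print for a UI callback)."""
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(token, end="", flush=True)

llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    streaming=True,
    callbacks=[TokenPrinter()],
)
llm.invoke("Summarize this week's pipeline risk in two sentences.")
```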
The single biggest factor in whether the agent picks the right tool is the quality of the @tool docstring. I rewrote mine three times before the agent reliably routed questions to the correct data source.
Generic documents produce generic recommendations. The value comes from grounding the agent in your company's actual strategy, targets, and playbooks — not boilerplate business content.
If the data is messy or the metric definitions are inconsistent, the agent's reasoning will be wrong regardless of how good the LLM is. Invest in the data layer first — always.
The full project is live on GitHub: github.com/ShrikantLambe/growth_intelligence_agent
It's structured across six phases with a clean quickstart — you can have the agent running locally in under 10 minutes with your own Anthropic or OpenAI API key.
```bash
# Clone and run
git clone https://github.com/ShrikantLambe/growth_intelligence_agent.git
cd growth_intelligence_agent

pip install -r requirements.txt      # install dependencies
python scripts/generate_data.py      # generate the simulated SaaS dataset
python rag/build_vectorstore.py      # build the FAISS knowledge base
streamlit run ui/app.py              # launch the dashboard
```
I'm planning to extend this into a multi-agent architecture — a Supervisor agent routing questions to specialist sub-agents (Pipeline Agent, Expansion Agent, Churn Risk Agent) for deeper domain reasoning. If you're building something similar, or have questions about any part of the stack, connect with me on LinkedIn.