open to senior & staff roles
milpitas, ca · remote-friendly

Analytics Engineering Leader for streaming-scale data systems.

15+ years building data infrastructure at Workday and Lyft — real-time pipelines, growth analytics at scale, and self-healing DataOps. I built the analytics stack that measured multi-million-dollar acquisition campaigns at Lyft. Now I'm building agentic systems that keep data infrastructure reliable at streaming scale.

15+ · years in enterprise analytics
RT · Kafka + PySpark streaming pipelines shipped
8 · AI/ML systems built and deployed
40% · reporting overhead removed at Workday
why streaming + content analytics

From GTM data to content intelligence.

The problems I've solved at Lyft and Workday map directly to what streaming platforms need: real-time event pipelines, audience segmentation at scale, growth attribution that holds up under scrutiny, and self-healing infrastructure that doesn't page your on-call at 3am.

At Lyft I built the growth analytics stack — the data systems that measured rider acquisition, attribution windows, and marketing ROI across multi-million-dollar campaigns. The same infrastructure patterns apply to content performance, subscriber acquisition, and engagement scoring. At Workday I shipped an AI Companion that answers natural-language questions about marketing performance, grounded in a governed semantic layer — exactly the kind of query interface content analytics teams are building.

My Real-Time Retail Sales Pipeline project is a production-grade streaming system: Kafka ingestion, PySpark transformations, Snowflake + dbt modeling, Airflow orchestration. I know how to build data infrastructure that runs at the speed content decisions demand.

Shrikant Lambe
selected work · streaming & analytics focus

Real-time pipelines and self-healing infrastructure.

All 8 projects →
03 / retail-pipeline
Data Engineering · Streaming

Real-Time Retail Sales Pipeline — production-grade streaming.

End-to-end real-time data pipeline from simulated POS events to analytics-ready Snowflake models: Kafka ingestion, PySpark streaming transformations, a dbt semantic layer, Airflow orchestration, and Docker containerization. Full system design documentation is included. The same architecture maps to content event streaming: play events, pause signals, and completion rates.

🔄 Kafka → PySpark → Snowflake → dbt · full streaming system design · production-ready architecture
RT · Real-time ingestion
E2E · Pipeline coverage
dbt · Semantic layer
Apache Kafka · PySpark · Snowflake · dbt · Airflow · Docker
retail-pipeline · streaming dashboard
STREAMING PIPELINE · LIVE METRICS
14.2K Events/min · +8% vs avg
$847K Rev today · +12% DoD
1.4s P99 latency · watch
-- dbt model: real-time content events
SELECT
    content_id,
    user_segment,
    COUNT(*)            AS play_events,
    AVG(completion_pct) AS avg_completion
FROM raw.streaming_events
GROUP BY 1, 2
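The project performs its transformations in PySpark and dbt. As a dependency-free illustration of the rollup the dbt model computes (event count and mean completion per content item and user segment), here is a minimal pure-Python sketch that also buckets events into one-minute tumbling windows on event time. The `ContentEvent` shape, the `tumbling_window_rollup` name, and the 60-second window are assumptions for illustration, not code from the project.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentEvent:
    # Hypothetical event shape; field names mirror the dbt model's columns.
    ts_epoch_s: int        # event time, in seconds since the epoch
    content_id: str
    user_segment: str
    completion_pct: float  # 0.0 to 100.0


def tumbling_window_rollup(events, window_s=60):
    """Assign each event to a fixed, non-overlapping window, then count events
    and average completion per (window_start, content_id, user_segment)."""
    counts = defaultdict(int)
    completion_sums = defaultdict(float)
    for e in events:
        # Floor the timestamp to the start of its window.
        window_start = e.ts_epoch_s - (e.ts_epoch_s % window_s)
        key = (window_start, e.content_id, e.user_segment)
        counts[key] += 1
        completion_sums[key] += e.completion_pct
    # Return results in key order so output is deterministic.
    return {
        key: {"play_events": n, "avg_completion": completion_sums[key] / n}
        for key, n in sorted(counts.items())
    }


events = [
    ContentEvent(0, "c1", "drama", 50.0),
    ContentEvent(30, "c1", "drama", 100.0),
    ContentEvent(65, "c1", "drama", 20.0),  # falls in the window starting at 60
]
for key, stats in tumbling_window_rollup(events).items():
    print(key, stats)
```

In PySpark the same grouping is typically written with `pyspark.sql.functions.window`, for example `groupBy(window(col("event_ts"), "1 minute"), "content_id", "user_segment")` over a hypothetical `event_ts` column; the pure-Python version only makes the windowing arithmetic explicit.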
07 / sentinel
Multi-Agent · DataOps

Pipeline Sentinel — a self-healing agent for Airflow.

At streaming scale, pipeline failures compound: a broken ingestion job means stale content recommendations, broken A/B test measurements, and on-call engineers pulled in at night. Pipeline Sentinel is built to absorb those failures. Five agents diagnose and remediate Airflow failures in about 11 seconds, using pattern memory to prevent repeat incidents. Built on Claude Sonnet + LangGraph.

⚡ Self-heals pipeline failures in ~11s · 94% median confidence · zero human pages on low-risk incidents
5 · Agents in ReAct loop
94% · Median confidence
11s · Time-to-resolution
Claude Sonnet · LangGraph · LangSmith · FastAPI · Airflow · Streamlit
pipeline-sentinel · agent dashboard
⚠ INCIDENT · content_pipeline · enrich_metadata
Monitor · task failure detected · T+04s
Diagnosis · upstream schema drift on `content_type` · T+07s
Blast Radius · recommendation pipeline affected · T+08s
Remediation · applying `reload_schema` strategy · T+09s
🛡 self-healed · confidence 94%
Schema reloaded, downstream tasks re-queued. Recommendation pipeline restored. No human intervention.
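The page does not say how Sentinel decides when an incident is safe to self-heal. One plausible shape, sketched below under stated assumptions, is a pattern memory that tracks per-signature success rates for each remediation strategy, with auto-remediation gated on a low-risk allowlist and a confidence threshold; anything else pages a human. `Incident`, `PatternMemory`, `route`, and the 0.9 threshold are hypothetical names and values, not Pipeline Sentinel's code. In the real system, diagnosis is done by LLM agents rather than a lookup table.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    dag_id: str
    task_id: str
    error_signature: str  # hypothetical normalized key, e.g. "schema_drift:content_type"


@dataclass
class PatternMemory:
    """Hypothetical store: signature -> {strategy: [successes, attempts]}."""
    outcomes: dict = field(default_factory=dict)

    def record(self, signature, strategy, succeeded):
        stats = self.outcomes.setdefault(signature, {}).setdefault(strategy, [0, 0])
        stats[0] += int(succeeded)
        stats[1] += 1

    def best_strategy(self, signature):
        """Return (strategy, success_rate) for the most reliable known fix,
        or (None, 0.0) if this signature has never been seen."""
        candidates = self.outcomes.get(signature, {})
        if not candidates:
            return None, 0.0
        strategy, (wins, tries) = max(
            candidates.items(), key=lambda kv: kv[1][0] / kv[1][1]
        )
        return strategy, wins / tries


def route(incident, memory, low_risk_strategies, threshold=0.9):
    """Auto-remediate only when the remembered fix is allowlisted as low-risk
    and has a success rate at or above the threshold; otherwise page a human."""
    strategy, confidence = memory.best_strategy(incident.error_signature)
    if strategy in low_risk_strategies and confidence >= threshold:
        return ("auto_remediate", strategy, confidence)
    return ("page_human", strategy, confidence)


memory = PatternMemory()
for _ in range(19):
    memory.record("schema_drift:content_type", "reload_schema", True)
memory.record("schema_drift:content_type", "reload_schema", False)

incident = Incident("content_pipeline", "enrich_metadata", "schema_drift:content_type")
print(route(incident, memory, {"reload_schema", "retry_task"}))
```

A deterministic gate like this is one way to let an LLM propose a fix while only allowlisted, historically reliable fixes run without a human.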
06 / growth-agent
Agentic AI · SaaS Analytics

Growth Intelligence Agent — autonomous growth analytics.

An autonomous agent that monitors growth metrics across 6 categories, detects anomalies with severity scoring, and surfaces RAG-grounded strategic recommendations from company playbooks. Built on Claude 3.5 Sonnet + LangChain. The same pattern translates to content performance monitoring: subscriber acquisition, engagement anomalies, and retention signals.

🤖 Surfaces playbook-grounded recommendations autonomously · natural-language query interface
6 · Metric categories
FAISS · Playbook retrieval
NL · Query interface
Claude 3.5 Sonnet · LangChain · FAISS RAG · Streamlit · Plotly
growth-intelligence-agent · week 17
SUBSCRIBER METRICS · WEEK 17
$2.4M MRR · +6.2% WoW
118% NRR · above target
↑2.1% Churn risk · anomaly
ANOMALY · HIGH — engagement drop flagged in 3 content segments
Why did engagement drop in the drama segment this week?
Per content playbook §3.2: segments with <60% completion rate + no new titles in 14d show highest churn signal. 3 sub-segments match. Recommend scheduling 2 new releases in next 7 days.
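The page says the agent scores anomalies by severity without saying how. A common, simple approach (assumed here, not taken from the project) is a z-score of the current value against a trailing baseline, with fixed cut-offs for each severity level. `anomaly_severity` and its 2.0 and 3.0 thresholds are illustrative choices.

```python
import math
from statistics import mean, stdev


def anomaly_severity(history, current, medium_z=2.0, high_z=3.0):
    """Score `current` against a trailing baseline with a z-score and map its
    magnitude to a severity label. Thresholds are illustrative defaults."""
    if len(history) < 2:
        raise ValueError("need at least two baseline points to estimate spread")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation of the baseline
    if sigma == 0:
        # Flat baseline: any deviation at all is treated as maximally surprising.
        z = 0.0 if current == mu else math.copysign(math.inf, current - mu)
    else:
        z = (current - mu) / sigma
    if abs(z) >= high_z:
        label = "HIGH"
    elif abs(z) >= medium_z:
        label = "MEDIUM"
    else:
        label = "NONE"
    return z, label


weekly_engagement = [10, 12, 11, 9, 13]  # illustrative baseline values
print(anomaly_severity(weekly_engagement, 5))
```

Against that baseline (mean 11, sample standard deviation about 1.58), a reading of 5 sits about 3.8 standard deviations below the mean and scores HIGH. A production system would more likely use a seasonal or robust baseline, but the mapping from deviation to severity works the same way.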
05 / cortex
Snowflake · Enterprise AI

AI-Native Data Platform on Snowflake Cortex.

A production architecture for querying content performance, subscriber analytics, and A/B test results in natural language, directly inside Snowflake, with zero data egress and end-to-end governance. It combines Cortex LLM functions, Cortex Search, and Cortex Analyst, with live demos. It is the stack that lets content teams self-serve instead of waiting on SQL tickets.

🏗 Natural-language analytics · zero data egress · live interactive demo on GitHub Pages
3 · Cortex layers
0 · Data egress
E2E · Governed AI
Snowflake Cortex · Cortex Analyst · Anthropic API · SQL
snowflake-cortex-architecture
IN-WAREHOUSE GOVERNED AI · CONTENT ANALYTICS
Which content genres drove the highest subscriber retention in Q1?
Documentary and limited series drove 94% 90-day retention vs 71% for reality content. Subscribers who watched 3+ docs in first 30 days had 2.4x lower churn. Recommend prioritizing doc pipeline in Q2 acquisition campaigns.
◆ let's talk streaming data

Let's build content analytics that scales.

I'm actively talking to teams building the next generation of streaming analytics, content intelligence, and self-healing data infrastructure. I know how to ship at the velocity that content teams need.

analytics engineering lead · data platform architect · AI engineering manager