Nitish.
Systems case study · hybrid AI · live production

SproutRoute

A hybrid AI travel planner that separates what AI should generate from what must be verified, deterministic, or cached. I defined the product, designed the system architecture, and operate the web app in production on Railway. The web stack uses attraction memory and verified data layers, while the native iOS app uses WeatherKit for forecast display.

The Real Complexity

Why this was technically hard.

Input

Fuzzy natural-language input

Users type free text like "beach trip with two toddlers next week." The system must extract destination, dates, children, pets, and travel mode from ambiguous input before any planning begins.
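
A minimal sketch of the structured target that this extraction step might fill, assuming the parsing model is asked to return JSON. The `TripIntent` shape, the `parseTripIntent` name, and every field name are illustrative assumptions, not the production schema.

```typescript
// Hypothetical shape for parsed input; field names are illustrative assumptions.
type TravelMode = "car" | "plane" | "train" | "unknown";

interface TripIntent {
  destination: string | null; // null when the text names no place
  startDate: string | null;   // null when the text gives no resolvable date
  children: number;
  pets: number;
  travelMode: TravelMode;
}

const MODES: readonly string[] = ["car", "plane", "train", "unknown"];

// Validate untrusted model output field by field instead of trusting its shape.
function parseTripIntent(raw: string): TripIntent {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    data = {}; // malformed JSON falls back to an empty intent
  }
  const obj = (typeof data === "object" && data !== null ? data : {}) as Record<string, unknown>;

  const count = (v: unknown): number =>
    typeof v === "number" && Number.isInteger(v) && v >= 0 ? v : 0;
  const text = (v: unknown): string | null =>
    typeof v === "string" && v.trim() !== "" ? v.trim() : null;
  const mode = obj.travelMode;

  return {
    destination: text(obj.destination),
    startDate: text(obj.startDate),
    children: count(obj.children),
    pets: count(obj.pets),
    travelMode:
      typeof mode === "string" && MODES.indexOf(mode) !== -1 ? (mode as TravelMode) : "unknown",
  };
}
```

Defaulting every missing or malformed field means planning can start from a partial intent and ask the user only for what is still null.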

APIs

Multiple external APIs with different failure modes

Seven external services with different auth models, rate limits, response shapes, and outage patterns. Orchestrating them into one coherent response requires careful fallback and timeout design.
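
One way to express the fallback and timeout design is a pair of small helpers, sketched here under assumptions: `withTimeout` and `firstAvailable` are hypothetical names, and the real orchestration may be structured differently.

```typescript
// Reject if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Try each provider in order and return the first one that succeeds in time.
async function firstAvailable<T>(
  providers: Array<() => Promise<T>>,
  timeoutMs: number,
): Promise<T> {
  let lastError: unknown = new Error("no providers configured");
  for (const provider of providers) {
    try {
      return await withTimeout(provider(), timeoutMs);
    } catch (err) {
      lastError = err; // remember why this provider failed, then try the next
    }
  }
  throw lastError;
}
```

A caller would pass the primary service first and a cheaper or cached source second, so an outage degrades the response instead of failing it.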

Trust

Factual vs generative data separation

Weather, places, and safety data must be factual. Itinerary text can be generative. Mixing these creates hallucination risk. The architecture must enforce the boundary.

Latency

Latency management across serial dependencies

A trip plan touches geocoding, weather, AI generation, and enrichment. Serial execution meant 8-12 seconds. Cutting that required rethinking what could be parallelized, cached, or removed from the hot path.

Safety

Trust and safety for family travel

Car seat laws, airline pet policies, and travel advisories are legal/safety data. They cannot be AI-generated. They must come from verified, deterministic sources with clear sourcing.

Memory

Personalization and memory across trips

Attraction memory, saved preferences, and profile-aware planning require persistent storage, freshness scoring, and a strategy for what to precompute vs generate on demand.

Product Flow

The product flow is concrete, not conceptual.

These screens are from the actual shipped flow.

System Evolution

Three architectural stages, each driven by a real production problem.

V1: AI-heavy generation

Everything generated by a single AI call: itinerary, packing, safety narrative. Fast to ship, but slow at runtime, expensive, and unreliable. Packing lists hallucinated items. Safety text had no sourcing. Latency was 8-12 seconds.

V2: Deterministic packing + model routing

Packing moved to a deterministic rules engine. Model routing assigned tasks to the right-sized model. Google Places verification replaced open-ended AI place suggestions. Latency dropped. Reliability improved.

V3: Attraction memory + verification cache

Supabase stores verified attractions per city. Freshness scoring ranks them. The AI planner receives a shortlist, not open-ended search. Each trip enriches the data for the next trip. Profile-aware planning is in progress.

Why this matters

The system did not arrive at V3 by planning ahead. Each stage solved a real problem discovered in production. V1 shipped fast but hallucinated. V2 fixed reliability but still had latency. V3 added memory to make each trip improve the next. This is the kind of architectural evolution that separates a demo from a real product.

Runtime architecture

[Figure: SproutRoute end-to-end runtime architecture]

Latency Case Study

How I diagnosed and fixed a multi-second latency problem.

Before

The problem

  • AI-generated packing lists in the hot path added ~2 seconds per request.
  • Open-ended place discovery meant the AI model searched for places from scratch every time.
  • One large model call handled itinerary + packing + safety together, with no parallelization.
  • Full response waited for all tasks to complete before rendering anything.

Total response time: 8-12 seconds

After

The fix

  • Deterministic packing replaced AI packing. Rules engine runs in milliseconds, not seconds.
  • Cached attraction shortlist gives the AI planner verified places instead of open-ended search.
  • Per-task model routing assigns the right model to each task. Input parsing uses a fast model; itinerary uses a capable one.
  • Progressive rendering shows itinerary immediately while packing and safety load in background.

Response time improved significantly. The remaining bottleneck is AI itinerary generation, which is inherently serial.
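
The shift from one blocking response to progressive rendering can be sketched as follows. This is an illustration under assumptions, not the shipped code: `renderProgressively` and the `Section` names are hypothetical, and the sketch emits each section in whatever order it finishes.

```typescript
type Section = "itinerary" | "packing" | "safety";

// Start every task immediately and report each section as soon as it is ready,
// so total wait time approaches the slowest task rather than the sum of all tasks.
async function renderProgressively(
  tasks: Record<Section, () => Promise<string>>,
  onSection: (name: Section, body: string) => void,
): Promise<void> {
  const names = Object.keys(tasks) as Section[];
  await Promise.all(
    names.map(async (name) => {
      const body = await tasks[name]();
      onSection(name, body); // the UI can paint this section without waiting for the rest
    }),
  );
}
```

On a server, `onSection` could write a chunk to a streaming response; on the client, it could update one panel of the page.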

Request lifecycle

[Figure: SproutRoute request lifecycle showing blocking vs non-blocking steps]

Latency before vs after

[Figure: SproutRoute latency comparison before and after optimization]

Key insight

Changing the model alone was not enough. The latency problem was architectural: too many tasks in the serial path, AI doing work that deterministic logic could do faster and more reliably. The fix was restructuring what the AI is responsible for, not just making it faster.

Trust Architecture

The product is not just AI text generation.

Weather

From weather APIs, not AI

Weather.gov for US forecasts and Visual Crossing for international forecasts in the web app. The native iOS app uses WeatherKit for forecast display. Real data with timestamps, not AI-generated predictions.
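
A small sketch of how the web app's provider split might look, assuming the destination has already been geocoded to a country code. The `Forecast` shape and both function names are hypothetical.

```typescript
type WeatherSource = "weather.gov" | "visual-crossing";

interface Forecast {
  source: WeatherSource;
  fetchedAt: string; // ISO timestamp that can be shown next to the forecast
  highTempC: number;
}

// US trips use Weather.gov; everything else uses Visual Crossing.
function pickWeatherSource(countryCode: string): WeatherSource {
  return countryCode.trim().toUpperCase() === "US" ? "weather.gov" : "visual-crossing";
}

// Stamp provider data with its source and retrieval time before it reaches the UI.
function toForecast(countryCode: string, highTempC: number, now: Date): Forecast {
  return {
    source: pickWeatherSource(countryCode),
    fetchedAt: now.toISOString(),
    highTempC,
  };
}
```

Carrying `source` and `fetchedAt` on every record keeps the provenance of each number visible all the way to the screen.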

Places

From Google Places, not AI

Every restaurant and attraction is a real place with ratings, photos, and opening hours from the Google Places API. No hallucinated venues.

Safety

From deterministic rules and static data

Car seat laws from state-level legal data. Airline pet policies from carrier records. Travel advisories from the US State Department. No AI generation for safety-critical information.

Packing

From a deterministic generator

Packing lists are built by rules that consider weather, trip duration, children's ages, and pet presence. The output is consistent and fast, and it never contains hallucinated items.
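
A rules-based generator of this kind can be sketched in a few lines. The rules, thresholds, and item names below are invented for illustration and are not the production rule set.

```typescript
interface PackingInput {
  highTempC: number;
  rainExpected: boolean;
  nights: number;
  childAges: number[];
  hasPet: boolean;
}

interface PackingRule {
  applies: (trip: PackingInput) => boolean;
  items: (trip: PackingInput) => string[];
}

// Illustrative rules only; thresholds and item names are assumptions.
const RULES: PackingRule[] = [
  { applies: () => true, items: (t) => [`${t.nights + 1} changes of clothes per person`] },
  { applies: (t) => t.rainExpected, items: () => ["rain jackets"] },
  { applies: (t) => t.highTempC >= 25, items: () => ["sunscreen", "sun hats"] },
  { applies: (t) => t.childAges.some((age) => age < 3), items: () => ["diapers", "stroller"] },
  { applies: (t) => t.hasPet, items: () => ["leash", "pet food", "water bowl"] },
];

// Same input always yields the same list: rules run in a fixed order, duplicates are skipped.
function buildPackingList(trip: PackingInput): string[] {
  const list: string[] = [];
  for (const rule of RULES) {
    if (!rule.applies(trip)) continue;
    for (const item of rule.items(trip)) {
      if (list.indexOf(item) === -1) list.push(item);
    }
  }
  return list;
}
```

Because each rule is a plain predicate plus an item list, adding a rule is a data change that a unit test can pin down exactly.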

What AI actually generates

The AI generates the itinerary narrative: connecting verified places into a coherent day-by-day plan with timing, transitions, and family-appropriate recommendations. It receives a shortlist of verified attractions, real weather data, and family constraints. The AI writes the story. The facts come from verified sources.

Trust and verification

[Figure: Trust and verification architecture showing AI vs deterministic vs verified data sources]

Key Design Decisions

The tradeoffs that shaped the system.

Decision 1

Deterministic packing over AI packing

AI packing was creative but unreliable. A rules engine is faster, consistent, and never hallucinates "inflatable pool" for a city trip. Reliability beats novelty for a family product.

Decision 2

Per-task model routing over one global model

Input parsing needs speed, not depth. Itinerary needs quality, not speed. Routing each task to the right model cut cost and improved both latency and output quality.
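
Per-task routing can be as simple as a lookup table. In this sketch the task names, model names, and token limits are placeholders, since the case study does not name the models in use.

```typescript
type Task = "parse_input" | "itinerary";

interface ModelChoice {
  model: string;
  maxOutputTokens: number;
}

// Placeholder model names and limits; the real values are not given in this case study.
const ROUTES: Record<Task, ModelChoice> = {
  parse_input: { model: "fast-small-model", maxOutputTokens: 256 },
  itinerary: { model: "capable-large-model", maxOutputTokens: 4096 },
};

// A typed table means adding a new task without a route is a compile-time error.
function routeModel(task: Task): ModelChoice {
  return ROUTES[task];
}
```

Keeping the table in one place also makes cost and latency tradeoffs reviewable as a single diff.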

Decision 3

Cached attraction shortlist over open-ended discovery

Letting the AI search for places from scratch was slow and produced inconsistent results. A verified shortlist from Supabase steers the AI toward real, ranked places.

Decision 4

Factual APIs separated from generative output

Weather, places, and safety are factual. Itinerary is generative. Mixing them in one AI call made everything feel untrustworthy. Separating them made the product feel reliable.

Decision 5

Optional accounts + DB-backed memory

The product works without accounts. But Supabase-backed memory lets returning users benefit from their history. No forced signup, no data wall. Memory is additive.

Decision 6

Demand-driven storage before broad precompute

Instead of precomputing attractions for every city, the system stores attractions as users request trips. High-demand cities accumulate data naturally. Precompute comes later for top destinations.

Personalization Roadmap

Memory and personalization, built incrementally.

Live: Attraction memory

Verified attractions stored per city in Supabase. Freshness scoring ranks them. Each trip enriches the data for the next user.

Live: Demand-driven storage

Attractions are stored as users request trips. No broad precompute yet. High-demand cities accumulate verified data naturally.

Next: Profile-aware planning

AI-imported profile, saved traveler preferences, trip intent overlay. The planner adapts to returning users without requiring manual setup.

Attraction memory layer

[Figure: Attraction memory flywheel showing how trips enrich data for future trips]

The flywheel
  • Trip generates attractions — AI discovers places, Google Places verifies them.
  • Attractions stored in Supabase — city_attractions table with place identity and freshness metadata.
  • Rank endpoint scores by freshness — recently verified attractions rank higher.
  • Next trip consumes shortlist — AI planner gets verified, ranked places instead of searching from scratch.
  • Backfill upgrades legacy data — older rows get enriched with Google Places identity over time.
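
The ranking step in the flywheel above could be implemented in many ways. The sketch below assumes an exponential decay with a 30-day half-life, which is an illustrative choice and not the documented scoring formula; the `Attraction` shape is also hypothetical.

```typescript
interface Attraction {
  placeId: string | null; // null for legacy rows not yet backfilled
  name: string;
  lastVerifiedAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Score is 1.0 for a row verified just now and halves every `halfLifeDays` days.
function freshnessScore(a: Attraction, now: Date, halfLifeDays: number = 30): number {
  const ageDays = Math.max(0, (now.getTime() - a.lastVerifiedAt.getTime()) / DAY_MS);
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Return a new array with the most recently verified attractions first.
function rankByFreshness(items: Attraction[], now: Date): Attraction[] {
  return items.slice().sort((x, y) => freshnessScore(y, now) - freshnessScore(x, now));
}
```

A decay curve, as opposed to a hard cutoff, lets older rows still appear when a city has little data while newer rows win whenever they exist.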

Production Operations

Shipped, tested, and operated as a real system.

Testing

350+ tests across unit, integration, and E2E

Node.js test runner for backend unit and integration tests. Playwright for end-to-end flows. Dependency injection for external services keeps the regression checks fast and reliable.
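
The dependency-injection pattern can be illustrated with a handler that receives its external client as an argument. `Geocoder`, `makeTripHandler`, and the fake below are hypothetical names, not taken from the codebase.

```typescript
interface Geocoder {
  geocode(query: string): Promise<{ lat: number; lon: number }>;
}

// The handler receives its dependency instead of importing a concrete client.
function makeTripHandler(deps: { geocoder: Geocoder }) {
  return async function planTrip(destination: string): Promise<string> {
    const { lat, lon } = await deps.geocoder.geocode(destination);
    return `${destination} @ ${lat.toFixed(2)},${lon.toFixed(2)}`;
  };
}

// In a test, a fake with fixed coordinates replaces the network call.
const fakeGeocoder: Geocoder = {
  geocode: async () => ({ lat: 35.9, lon: -75.6 }),
};
```

With the fake in place, a test exercises the handler's logic without network access, API keys, or rate limits.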

CI/CD

GitHub Actions + Railway auto-deploy

Every push triggers the test suite and a Vite build. A merge to main auto-deploys to Railway. No manual deployment steps.

Persistence

Supabase for attraction memory and profiles

PostgreSQL-backed storage for verified attractions, place identity, and user profiles. Row-level security. Freshness metadata for ranking.

Infrastructure

Railway deployment with in-memory caching

Single Railway instance with LRU caches for geocoding (6h), weather (1h), and advisory data (24h). Health endpoint for monitoring. Rate limiting at 30 requests per 15 minutes.
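
An in-memory LRU cache with a per-cache TTL can be built on `Map`, which preserves insertion order. This is a sketch under assumptions: the `TtlLruCache` class and the capacity of 500 are hypothetical, and only the 6-hour geocoding TTL comes from the text.

```typescript
class TtlLruCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxEntries: number;
  private ttlMs: number;

  constructor(maxEntries: number, ttlMs: number) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= now) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

const HOUR_MS = 60 * 60 * 1000;
// 6h TTL per the text; the capacity of 500 entries is an assumed value.
const geocodeCache = new TtlLruCache<string>(500, 6 * HOUR_MS);
```

Passing `now` explicitly keeps expiry testable without fake timers. An in-memory cache like this is lost on restart and is not shared across processes, which fits the single-instance setup described here.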

What I own end to end
  • Product definition: what to build, what to cut, what to defer.
  • System design: service boundaries, API contracts, data flow.
  • Full-stack implementation: React frontend, Express backend, AI integration.
  • Test infrastructure: DI patterns, mock fixtures, E2E harness.
  • Deployment and operations: Railway config, Supabase setup, CI/CD pipeline.
  • Latency diagnosis and architectural optimization.

Quality signals

  • Tests: 350+
  • External APIs: 7 orchestrated
  • Packing: deterministic
  • Attraction memory: live in production

What I'd Build Next

The current state is intentional. Here is what comes next.

Data quality

Stale shortlist refresh

Attractions older than 90 days get re-verified against Google Places. Closed venues drop from the shortlist. Freshness scoring already supports this; the refresh pipeline is the next step.
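
Since the refresh pipeline is not built yet, the following is only a sketch of how its selection and cleanup steps might look. The `StoredAttraction` shape and both function names are hypothetical; the 90-day threshold comes from the text.

```typescript
interface StoredAttraction {
  placeId: string;
  name: string;
  lastVerifiedAt: Date;
}

const STALE_AFTER_DAYS = 90; // threshold from the text
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Rows whose last verification is more than 90 days old need a re-check.
function selectStale(rows: StoredAttraction[], now: Date): StoredAttraction[] {
  const cutoff = now.getTime() - STALE_AFTER_DAYS * MS_PER_DAY;
  return rows.filter((row) => row.lastVerifiedAt.getTime() < cutoff);
}

// Given rows that were just re-checked, drop the ones reported closed
// and stamp the survivors as freshly verified.
function applyRecheck(
  rechecked: StoredAttraction[],
  closedIds: ReadonlySet<string>,
  now: Date,
): StoredAttraction[] {
  return rechecked
    .filter((row) => !closedIds.has(row.placeId))
    .map((row) => ({ ...row, lastVerifiedAt: now }));
}
```

The re-check itself (a Google Places lookup per stale row) is deliberately left out, since its request shape and quota handling are not described in this case study.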

Generation

Stricter shortlist-driven itinerary generation

Constrain the AI planner to only use attractions from the verified shortlist. Currently the planner can suggest places outside the list. Tighter constraints improve trust and consistency.
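
One way to enforce this constraint after generation is to check every planned stop against the shortlist's place IDs. The `PlannedStop` shape and the `splitByShortlist` name are hypothetical; this is a sketch of the idea, not the planned implementation.

```typescript
interface PlannedStop {
  placeId: string;
  name: string;
}

// Separate the planner's output into stops backed by the verified shortlist
// and stops the planner introduced on its own.
function splitByShortlist(
  stops: PlannedStop[],
  shortlistIds: ReadonlySet<string>,
): { allowed: PlannedStop[]; rejected: PlannedStop[] } {
  const allowed: PlannedStop[] = [];
  const rejected: PlannedStop[] = [];
  for (const stop of stops) {
    if (shortlistIds.has(stop.placeId)) {
      allowed.push(stop);
    } else {
      rejected.push(stop);
    }
  }
  return { allowed, rejected };
}
```

A non-empty `rejected` list could trigger a regeneration with a stricter prompt, or the off-list stops could simply be removed before rendering.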

Personalization

Profile memory

Returning users get plans shaped by their history: preferred activity types, dietary needs, accessibility requirements, past destinations. Supabase schema already supports this.

Scale

Attraction intelligence precompute

For high-demand destinations, precompute and refresh attraction shortlists on a schedule. Demand-driven storage works now; precompute makes the system faster for popular cities.