Architecture
Two pieces that authenticate to each other with HMAC-signed requests:
┌───────────────────────────┐         ┌──────────────────────────────┐
│ Control plane (Vercel)    │         │ Per-user VM (Fly.io)         │
│                           │         │                              │
│ apps/web (Next.js 16)     │ ──HMAC─►│ shell-mux (:8080 public)     │
│ @askrobin/db (Neon)       │         │ integrations (loopback)      │
│ @askrobin/billing         │ ◄─HMAC──│ dispatcher (per-pane mux)    │
│ @askrobin/catalog         │         │ notifier (outbox watcher)    │
│ @askrobin/core            │         │ sshd (:443)                  │
└───────────────────────────┘         └──────────────────────────────┘
              ▲                                       ▲
              │                                       │
              └────── you (browser, SSH, chat) ───────┘
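The diagram only says that the two sides sign their requests with HMAC. As a rough sketch of what that could look like, assuming HMAC-SHA256 over the raw request body, a per-machine shared secret, and a lowercase hex signature (the hash, the encoding, and the helper names signBody and verifyBody are illustrative assumptions, not taken from the codebase):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: sign a raw request body with a shared secret.
// HMAC-SHA256 and hex encoding are assumptions made for this sketch.
function signBody(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Hypothetical helper: check a received signature in constant time.
function verifyBody(secret: string, rawBody: string, signature: string): boolean {
  const expected = Buffer.from(signBody(secret, rawBody), "utf8");
  const received = Buffer.from(signature, "utf8");
  // timingSafeEqual throws when lengths differ, so reject those up front.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(expected, received);
}
```

Signing the raw body rather than a re-serialized object matters: the verifier has to hash exactly the bytes that were sent, before any JSON parsing.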
Control plane
apps/web is a Next.js 16 App Router app on Vercel. Surfaces:
- Marketing (/, /pricing, /privacy, /terms)
- Signup + onboarding wizard
- Cloud-shell SPA at /(app): xterm.js connects to your VM's shell-mux via wss
- Admin (/admin) gated by ADMIN_EMAILS
- API routes for signup, secrets pass-through, ssh-keys, billing portal, OAuth broker, OAuth refresh, Stripe + Postmark webhooks
- OAuth broker at /auth/start/[provider] and /auth/cb/[provider]
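To illustrate the admin gate, here is a minimal sketch. It assumes ADMIN_EMAILS is a comma-separated list of addresses compared case-insensitively; the source only says /admin is gated by ADMIN_EMAILS, so the format and the isAdmin helper are hypothetical:

```typescript
// Hypothetical helper; the comma-separated format is an assumption.
function isAdmin(
  email: string | null | undefined,
  adminEmails: string | undefined,
): boolean {
  // Fail closed when the session has no email or the variable is unset.
  if (!email || !adminEmails) return false;
  const allowed = adminEmails
    .split(",")
    .map((entry) => entry.trim().toLowerCase())
    .filter((entry) => entry.length > 0);
  return allowed.includes(email.trim().toLowerCase());
}
```

A route would call something like isAdmin(session?.user?.email, process.env.ADMIN_EMAILS) and return a 404 or redirect when it is false.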
Auth.js v5 with Google sign-in and a JWT session. Drizzle on Neon Postgres for users, machines, subscriptions, oauth_sessions, audit_log, etc. The control plane never persists OAuth refresh tokens — just relays and forgets.
Per-user VM
Built from infra/vm-image/. Multi-stage Dockerfile, runs Ubuntu 24.04 + Node 22 + Tailscale + tmux + ttyd + sshd + Claude Code + robin-assistant. Image is ~1.4 GB and lives on Fly.
Inside, a supervisor entrypoint (scripts/entrypoint.sh) runs in place of systemd-as-PID-1:
| Service | Port | Profile |
|---|---|---|
| shell-mux | :8080 (public) | always |
| integrations | 127.0.0.1:8081 | always |
| sshd | :443 | always |
| dispatcher | — | claimed only |
| notifier | — | claimed only |
| three scheduler loops | — | claimed only |
Two profiles, branched on INBOUND_KEY:
- warm-pool: pre-spawned Fly machines waiting to be claimed. These skip the claimed-only services to save RAM.
- claimed: full stack. Triggered when the control plane pushes INBOUND_KEY and restarts the machine at signup.
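The real branching lives in scripts/entrypoint.sh, which is a shell script. The TypeScript below is only a model of the decision, assuming a machine counts as claimed once INBOUND_KEY is set and non-empty; detectProfile and servicesFor are hypothetical names:

```typescript
type Profile = "warm-pool" | "claimed";

// Service names from the table above, grouped by the profile that starts them.
const ALWAYS = ["shell-mux", "integrations", "sshd"] as const;
const CLAIMED_ONLY = ["dispatcher", "notifier", "scheduler loops"] as const;

// Assumption: a set, non-empty INBOUND_KEY means the machine has been claimed.
function detectProfile(env: Record<string, string | undefined>): Profile {
  const key = env.INBOUND_KEY?.trim();
  return key ? "claimed" : "warm-pool";
}

// Returns the services the supervisor would start for a given profile.
function servicesFor(profile: Profile): string[] {
  return profile === "claimed" ? [...ALWAYS, ...CLAIMED_ONLY] : [...ALWAYS];
}
```

Under this model a warm-pool machine runs only shell-mux, integrations, and sshd until the control plane pushes the key and restarts it.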
The scheduler loops replace the classic systemd timers (Fly doesn't run systemd as PID 1):
- robin run --due every 5 min
- refresh-tokens.js every 30 min
- update.sh daily, randomized 0–2h offset
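One way to picture a timer replacement is a sleep-then-run loop per job. The sketch below is not the actual supervisor code: the Loop shape, nextDelayMs, and runLoop are hypothetical, and it assumes the randomized offset is added on top of the daily interval.

```typescript
const MINUTE = 60_000;
const HOUR = 60 * MINUTE;

// Assumed shape for illustration; only the commands and intervals come from the text.
interface Loop {
  command: string;
  intervalMs: number;
  maxJitterMs: number; // upper bound of the random delay added to each run
}

const LOOPS: Loop[] = [
  { command: "robin run --due", intervalMs: 5 * MINUTE, maxJitterMs: 0 },
  { command: "refresh-tokens.js", intervalMs: 30 * MINUTE, maxJitterMs: 0 },
  { command: "update.sh", intervalMs: 24 * HOUR, maxJitterMs: 2 * HOUR },
];

// Delay before the next run: the fixed interval plus a uniform offset in [0, maxJitterMs).
function nextDelayMs(loop: Loop, random: () => number = Math.random): number {
  return loop.intervalMs + Math.floor(random() * loop.maxJitterMs);
}

// Runs one loop forever: sleep, run the task, repeat. A failed run is logged, not fatal.
async function runLoop(loop: Loop, task: () => Promise<void>): Promise<never> {
  for (;;) {
    await new Promise((resolve) => setTimeout(resolve, nextDelayMs(loop)));
    try {
      await task();
    } catch (err) {
      console.error(`${loop.command} failed`, err);
    }
  }
}
```

Spreading the daily update across a two-hour window keeps a fleet of VMs from all pulling updates at the same moment.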
Anthropic auth modes
You pick one at signup. Both work; the difference is who Anthropic bills.
- Paste (default). Bring your own Anthropic API key. We never see it. Stored at user-data/secrets/anthropic.json on your VM. Claude Code reads it on launch. We have zero say in your usage.
- Broker. Anthropic billing pass-through. We hold the key on the control plane and proxy your traffic. Per spec §16.1, this ships only after the ToS spike resolves.
Token-refresh dataflow
The most interesting cross-cutting piece. See Integrations → Token refresh for the user-facing version.
VM cron (every 30 min)
└─ refresh-tokens.js scans user-data/secrets/*.json
   └─ for each near-expiring token:
      └─ POST CONTROL_PLANE_URL/api/oauth/refresh
         Headers: x-machine-id, x-robin-signature
         Body: { provider, refreshToken }
            ↓
Control plane:
  1. look up machines.inbound_key by fly_machine_id
  2. verify HMAC over body
  3. catalog.getProvider(id) → broker client_id/secret env
  4. POST to provider's tokenEndpoint, grant_type=refresh_token
  5. return { tokens } (never persisted)
            ↓
VM writes tokens back to user-data/secrets/<provider>.json
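The five control-plane steps can be sketched as a single handler with its dependencies injected. This is a hedged illustration, not the real route: the Deps interface, the status codes, hex-encoded HMAC-SHA256, and sending the client credentials in the form body are all assumptions. Only the step order, header names, and body fields come from the flow above.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical shapes for illustration only.
interface Provider { tokenEndpoint: string; clientId?: string; clientSecret?: string }
interface Deps {
  findInboundKey(machineId: string): Promise<string | null>; // machines.inbound_key
  getProvider(id: string): Provider | undefined;             // catalog.getProvider
  postForm(url: string, form: Record<string, string>): Promise<{ ok: boolean; json: unknown }>;
}
interface Result { status: number; body: unknown }

// Assumption: the signature is a hex-encoded HMAC-SHA256 of the raw body.
function validSignature(key: string, rawBody: string, signature: string): boolean {
  const expected = Buffer.from(createHmac("sha256", key).update(rawBody).digest("hex"));
  const received = Buffer.from(signature);
  return received.length === expected.length && timingSafeEqual(expected, received);
}

async function handleRefresh(
  headers: Record<string, string | undefined>,
  rawBody: string,
  deps: Deps,
): Promise<Result> {
  const machineId = headers["x-machine-id"];
  const signature = headers["x-robin-signature"];
  if (!machineId || !signature) return { status: 401, body: { error: "missing headers" } };

  // 1. Look up the machine's key. 2. Verify the HMAC before parsing anything.
  const key = await deps.findInboundKey(machineId);
  if (!key || !validSignature(key, rawBody, signature)) {
    return { status: 401, body: { error: "bad signature" } };
  }

  let payload: { provider?: unknown; refreshToken?: unknown } | null;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    return { status: 400, body: { error: "invalid JSON" } };
  }
  const providerId = payload?.provider;
  const refreshToken = payload?.refreshToken;
  if (typeof providerId !== "string" || typeof refreshToken !== "string") {
    return { status: 400, body: { error: "provider and refreshToken are required" } };
  }

  // 3. Resolve the provider and its broker credentials.
  const provider = deps.getProvider(providerId);
  if (!provider) return { status: 404, body: { error: "unknown provider" } };
  if (!provider.clientId || !provider.clientSecret) {
    return { status: 503, body: { error: "broker credentials not configured" } };
  }

  // 4. OAuth 2.0 refresh grant against the provider's token endpoint.
  const upstream = await deps.postForm(provider.tokenEndpoint, {
    grant_type: "refresh_token",
    refresh_token: refreshToken,
    client_id: provider.clientId,
    client_secret: provider.clientSecret,
  });
  if (!upstream.ok) return { status: 502, body: { error: "provider rejected refresh" } };

  // 5. Relay the tokens to the VM; nothing is written to the database.
  return { status: 200, body: { tokens: upstream.json } };
}
```

Injecting the lookup, catalog, and HTTP call keeps the handler testable without a database or a live provider, which suits the kind of HMAC-gate and provider-failure cases described for the endpoint.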
Tested in two halves: the control-plane endpoint has 11 vitest cases covering HMAC gates, provider lookup, missing creds, success, refresh-token reuse, and provider failures. The VM-side script is exercised by the docker-build smoke test.
Why per-user VMs
A VM per user is the simplest model that gives you (a) real isolation, (b) a writable filesystem for user-data/, (c) a long-running tmux session that survives between turns, and (d) a place for incoming webhooks to land. We considered a single multi-tenant VM with workspace-per-user; the isolation story was bad enough that we backed out.
Cost is fine because Fly auto-suspends idle machines. The warm pool absorbs the cold-start hit, so first-time signup takes under 5 s.
Source
Code: github.com/kevinkiklee/askrobin.io
Spec: docs/spec.md, audit: docs/AUDIT.md, runbook: docs/RUNBOOK.md.