Architecture
============

Infrintia has three runtime surfaces: the broker, the host agent, and the SDK.

Broker
------

The FastAPI broker owns user accounts, API keys, credit balances, model
metadata, host registration, job state, streaming, metrics, and billing.
Business logic lives in ``app.marketplace`` and route modules under
``app.routes`` keep the HTTP surface thin.

Job execution follows this flow:

1. A user submits a model or equation job.
2. The broker reserves estimated credits and selects the cheapest compatible
   idle host.
3. The host claims the job with its host token.
4. The host streams chunks back to the broker.
5. The broker relays chunks to the user through Server-Sent Events.
6. Completion captures the final cost and refunds excess reserved credits.

Host Agent
----------

The host agent detects CUDA, Apple MPS, or CPU capabilities and registers the
models it can serve. ``host_agent.runtime.HostRuntime`` owns backend selection
and delegates execution to HuggingFace, LangChain, or subprocess-isolated
worker backends.

Worker isolation is handled by ``host_agent.worker_manager.WorkerManager``.
Each model can run in its own virtual environment, which lets hosts keep model
dependencies isolated and optionally warm between jobs.

SDK
---

``sdk.compute.ComputeClient`` and ``sdk.compute.AsyncComputeClient`` wrap
broker HTTP endpoints for user-facing workflows: credits, key rotation, model
listing, job submission, polling, cancellation, and token streaming.

Deployment
----------

The production shape targets Google Cloud Run for the broker, Cloud SQL for
PostgreSQL, Memorystore for Redis, Artifact Registry for images, and Secret
Manager for sensitive configuration.
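
As an illustration of the broker's job flow described under Broker (reserve
estimated credits, select the cheapest compatible idle host, refund the excess
on completion), the following is a minimal Python sketch. It is not the code in
``app.marketplace``: the ``Host`` and ``Account`` classes, their fields, and
the ``select_host`` function are hypothetical names chosen for this example,
and the sketch assumes the final charge is capped at the reserved amount.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Host:
    # Hypothetical host record; field names are assumptions for illustration.
    host_id: str
    models: frozenset
    price_per_token: float
    idle: bool = True


def select_host(hosts: Iterable[Host], model: str) -> Optional[Host]:
    """Return the cheapest idle host that can serve `model`, or None."""
    candidates = [h for h in hosts if h.idle and model in h.models]
    return min(candidates, key=lambda h: h.price_per_token, default=None)


@dataclass
class Account:
    # Hypothetical credit ledger for a single user.
    balance: float
    reserved: float = 0.0

    def reserve(self, amount: float) -> None:
        """Move estimated credits from the balance into a reservation."""
        if amount > self.balance:
            raise ValueError("insufficient credits")
        self.balance -= amount
        self.reserved += amount

    def settle(self, reserved_amount: float, final_cost: float) -> float:
        """Release a reservation, charge the final cost, refund the excess."""
        # Assumption: the charge never exceeds what was reserved.
        charge = min(final_cost, reserved_amount)
        refund = reserved_amount - charge
        self.reserved -= reserved_amount
        self.balance += refund
        return refund
```

A short usage example under the same assumptions:

```python
hosts = [
    Host("a", frozenset({"m1"}), 0.5),
    Host("b", frozenset({"m1"}), 0.2, idle=False),  # cheapest, but busy
    Host("c", frozenset({"m1", "m2"}), 0.3),
]
chosen = select_host(hosts, "m1")  # host "c": cheapest among idle hosts

account = Account(balance=10.0)
account.reserve(4.0)               # estimated cost held at submission
refund = account.settle(4.0, 2.5)  # final cost known at completion
```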