Architecture
Infrintia has three runtime surfaces: the broker, the host agent, and the SDK.
Broker
The FastAPI broker owns user accounts, API keys, credit balances, model
metadata, host registration, job state, streaming, metrics, and billing.
Business logic lives in app.marketplace, while the route modules under
app.routes keep the HTTP surface thin.
Job execution follows this flow:
1. A user submits a model or equation job.
2. The broker reserves estimated credits and selects the cheapest compatible idle host.
3. The host claims the job with its host token.
4. The host streams chunks back to the broker.
5. The broker relays chunks to the user through Server-Sent Events.
6. Completion captures the final cost and refunds excess reserved credits.
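The job flow above can be sketched in plain Python. This is a minimal, in-memory sketch under stated assumptions, not the code in app.marketplace: Host, Account, select_host, reserve, settle, sse_frame, and run_job are hypothetical names; prices are assumed to be integer credits per 1,000 tokens; each streamed chunk is assumed to count as one token; and the host is selected before the reservation because this sketch prices the estimate at that host's rate. Host claiming, persistence, and network transport are left out.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass
class Host:
    host_id: str
    models: set[str]
    price_per_1k_tokens: int  # assumed pricing unit: credits per 1,000 tokens
    idle: bool = True


@dataclass
class Account:
    balance: int  # spendable credits
    reserved: int = 0  # credits held for in-flight jobs


class InsufficientCredits(Exception):
    pass


def select_host(hosts: Iterable[Host], model_id: str) -> Host | None:
    """Cheapest idle host that can serve model_id, or None if there is none."""
    candidates = [h for h in hosts if h.idle and model_id in h.models]
    return min(candidates, key=lambda h: h.price_per_1k_tokens, default=None)


def cost(host: Host, tokens: int) -> int:
    # Round up so a partial block of 1,000 tokens is still charged.
    return (tokens * host.price_per_1k_tokens + 999) // 1000


def reserve(account: Account, host: Host, max_tokens: int) -> int:
    """Move the worst-case cost from balance to reserved and mark the host busy."""
    estimate = cost(host, max_tokens)
    if estimate > account.balance:
        raise InsufficientCredits(f"need {estimate}, have {account.balance}")
    account.balance -= estimate
    account.reserved += estimate
    host.idle = False
    return estimate


def settle(account: Account, host: Host, reserved: int, tokens_used: int) -> int:
    """Capture the final cost (capped at the reservation) and refund the rest."""
    final = min(cost(host, tokens_used), reserved)
    account.reserved -= reserved
    account.balance += reserved - final
    host.idle = True
    return final


def sse_frame(chunk: str) -> str:
    # One SSE event per chunk; each line of the chunk gets its own "data:" field.
    return "".join(f"data: {line}\n" for line in chunk.split("\n")) + "\n"


def run_job(account: Account, hosts: list[Host], model_id: str,
            max_tokens: int, host_chunks: Iterable[str]) -> tuple[int, list[str]]:
    host = select_host(hosts, model_id)
    if host is None:
        raise LookupError(f"no idle host serves {model_id!r}")
    reserved = reserve(account, host, max_tokens)
    frames: list[str] = []
    tokens_used = 0
    try:
        for chunk in host_chunks:  # stand-in for chunks streamed back by the host
            tokens_used += 1  # assumption: one token per chunk
            frames.append(sse_frame(chunk))
    except Exception:
        # Assumed policy: on failure, charge for what streamed and release the rest.
        settle(account, host, reserved, tokens_used)
        raise
    return settle(account, host, reserved, tokens_used), frames
```

Settling against the reservation means a job can never cost more than was held up front, and the refund is simply the reserved amount minus the captured cost.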
Host Agent
The host agent detects CUDA, Apple MPS, or CPU capabilities and registers the
models it can serve. host_agent.runtime.HostRuntime owns backend selection
and delegates execution to HuggingFace, LangChain, or subprocess-isolated worker
backends.
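Capability detection and model registration might look roughly like the following. The sketch assumes PyTorch is used as the probe (torch.cuda.is_available() and torch.backends.mps.is_available() are real PyTorch calls) and falls back to CPU when PyTorch is not installed; ModelSpec, detect_device, and servable_models are hypothetical names, not the HostRuntime API.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    model_id: str
    devices: frozenset[str]  # devices this model is known to run on (hypothetical field)


def detect_device() -> str:
    """Return "cuda", "mps", or "cpu", preferring accelerators PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch: treat the host as CPU-only
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def servable_models(catalog: Iterable[ModelSpec], device: str) -> list[str]:
    """Model ids the host would register for the detected device."""
    return sorted(spec.model_id for spec in catalog if device in spec.devices)
```

A host following this sketch would register servable_models(catalog, detect_device()) with the broker at start-up.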
Worker isolation is handled by host_agent.worker_manager.WorkerManager. Each
model can run in its own virtual environment, which keeps its dependencies
isolated from other models and lets hosts optionally keep its worker warm
between jobs.
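One way to combine per-model virtual environments with warm workers is sketched below using only the standard library. It is an illustration under assumptions, not WorkerManager itself: WorkerPool, Worker, ensure_model_env, and the JSON-lines protocol are hypothetical, the environment is created without installing any model dependencies, and the placeholder worker upper-cases the prompt instead of running a model.

```python
from __future__ import annotations

import json
import subprocess
import sys
import tempfile
import venv
from pathlib import Path

# Hypothetical worker loop run inside each model's environment: one JSON job per
# input line, one JSON result per output line. A real worker would load the model here.
WORKER_LOOP = """
import json, sys
for line in sys.stdin:
    job = json.loads(line)
    print(json.dumps({"id": job["id"], "output": job["prompt"].upper()}), flush=True)
"""


def ensure_model_env(root: Path, model_id: str) -> Path:
    """Create the model's virtual environment on first use and return its interpreter."""
    env_dir = root / model_id
    if sys.platform == "win32":
        python = env_dir / "Scripts" / "python.exe"
    else:
        python = env_dir / "bin" / "python"
    if not python.exists():
        # A real setup would also install the model's pinned dependencies here.
        venv.create(env_dir, with_pip=False, symlinks=sys.platform != "win32")
    return python


class Worker:
    """One long-lived subprocess bound to one model environment."""

    def __init__(self, python: Path) -> None:
        self.proc = subprocess.Popen(
            [str(python), "-c", WORKER_LOOP],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def run(self, job: dict) -> dict:
        assert self.proc.stdin is not None and self.proc.stdout is not None
        self.proc.stdin.write(json.dumps(job) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())

    def close(self) -> None:
        assert self.proc.stdin is not None
        self.proc.stdin.close()  # EOF ends the worker loop
        self.proc.wait(timeout=10)


class WorkerPool:
    """Routes jobs to per-model workers, optionally keeping them warm between jobs."""

    def __init__(self, root: Path | None = None, keep_warm: bool = True) -> None:
        # Demo default: a throwaway directory holding one venv per model.
        self.root = root or Path(tempfile.mkdtemp(prefix="model-envs-"))
        self.keep_warm = keep_warm
        self.warm: dict[str, Worker] = {}

    def run(self, model_id: str, job: dict) -> dict:
        worker = self.warm.get(model_id)
        if worker is None:
            worker = Worker(ensure_model_env(self.root, model_id))
            if self.keep_warm:
                self.warm[model_id] = worker
        try:
            return worker.run(job)
        finally:
            if not self.keep_warm:
                worker.close()

    def shutdown(self) -> None:
        for worker in self.warm.values():
            worker.close()
        self.warm.clear()
```

With keep_warm=False every job pays process start-up again; keeping the worker alive trades host memory for lower per-job latency, which matters most once a real model has to be loaded.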
SDK
sdk.compute.ComputeClient and sdk.compute.AsyncComputeClient wrap broker
HTTP endpoints for user-facing workflows: credits, key rotation, model listing,
job submission, polling, cancellation, and token streaming.
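Token streaming on the client side reduces to parsing the broker's Server-Sent Events. The parser below is a hypothetical, standard-library sketch of a subset of the text/event-stream rules, not the ComputeClient streaming method; the event name "done" used to end a stream is an assumption. It accepts any iterable of decoded lines, for example the lines yielded by an HTTP client's streaming response.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass
class SSEEvent:
    event: str
    data: str


def iter_sse_events(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Parse decoded text/event-stream lines into events (subset of the spec)."""
    event_type = ""
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line dispatches the buffered event if any data field was seen.
            if data:
                yield SSEEvent(event_type or "message", "\n".join(data))
            event_type, data = "", []
            continue
        if line.startswith(":"):
            continue  # comment, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "data":
            data.append(value)
        elif field == "event":
            event_type = value
        # "id" and "retry" fields are ignored in this sketch.
    # An unterminated trailing event is discarded, as the spec requires.


def collect_tokens(lines: Iterable[str]) -> str:
    """Concatenate token chunks until a (hypothetical) "done" event arrives."""
    parts = []
    for ev in iter_sse_events(lines):
        if ev.event == "done":
            break
        parts.append(ev.data)
    return "".join(parts)
```

Keeping the parser independent of the HTTP library makes it reusable by both a synchronous and an asynchronous client, since only the source of lines differs.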
Deployment
The production deployment targets Google Cloud Run for the broker, Cloud SQL for PostgreSQL, Memorystore for Redis, Artifact Registry for images, and Secret Manager for sensitive configuration.
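On Cloud Run, configuration typically reaches the container as environment variables: Cloud Run sets PORT for the server to listen on (8080 by default), and Secret Manager secrets can be exposed to a service as environment variables. A minimal settings loader under that assumption might look like this; DATABASE_URL, REDIS_URL, BrokerSettings, and load_settings are hypothetical names, not the broker's actual configuration keys.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class BrokerSettings:
    port: int
    database_url: str  # e.g. a Cloud SQL for PostgreSQL connection string
    redis_url: str  # e.g. a Memorystore for Redis endpoint


def load_settings(env: Mapping[str, str] = os.environ) -> BrokerSettings:
    """Read settings from the environment, failing fast if required keys are missing."""
    missing = [key for key in ("DATABASE_URL", "REDIS_URL") if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return BrokerSettings(
        port=int(env.get("PORT", "8080")),  # Cloud Run injects PORT
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
    )
```

Failing at start-up surfaces a misconfigured revision immediately rather than on the first request that needs the database or cache.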