Runtime & Language Support
InfraMind is designed as a runtime-agnostic execution layer. The protocol does not enforce a specific framework, language, or serving toolchain—instead, it relies on container isolation and schema-conformant interfaces to validate that a given model behaves as declared. This allows developers to deploy models using the libraries and runtimes they are most familiar with, so long as the output can be verified and the runtime environment is self-contained.
Every model must be served from within a container that exposes either a REST or gRPC endpoint. Input and output must comply with the schema declared in `model.yaml`. Beyond that, runtime selection is entirely up to the deployer.
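For illustration, the sketch below shows what a schema-conformant exchange with a deployed container might look like from the client side. The `/inference` route, port 9000, and the `input`/`result` field names mirror the FastAPI example later on this page; the actual route and fields are whatever your manifest declares.

```python
import requests

# Hypothetical call against a locally running model container.
# Route, port, and field names must match what model.yaml declares;
# the values below mirror the FastAPI example later on this page.
payload = {"input": "Summarize: InfraMind is a runtime-agnostic execution layer."}

resp = requests.post("http://localhost:9000/inference", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json()["result"])
```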
Python Runtime (FastAPI, Flask, raw)
Python is the most widely supported runtime on the mesh, primarily using FastAPI as the serving interface due to its speed, schema compliance, and async support.
Recommended for:

- Transformers
- Custom ML pipelines
- Text-to-text models
- Tabular inference
- Fine-tuned LLMs (e.g. `transformers`, `sentence-transformers`, `scikit-learn`)
Example:

```python
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/inference")
async def infer(req: Request):
    data = await req.json()
    # process() is your model-specific handler; its output must match
    # the schema declared in model.yaml.
    output = process(data["input"])
    return {"result": output}
```
In `model.yaml`:

```yaml
runtime: python3.10
protocol: rest
port: 9000
entrypoint: serve.py
```
The node agent uses the declared runtime to sandbox and execute the model, matching responses against the expected schema and verifying response latency.
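To make that validation step concrete, here is a rough sketch of the kind of check the agent performs: schema conformance plus a latency bound. This is illustrative only; the schema, threshold, and function names are placeholders, not the agent's actual implementation.

```python
import time

import jsonschema  # third-party: pip install jsonschema
import requests

# Illustrative output schema; in practice this comes from model.yaml.
RESULT_SCHEMA = {
    "type": "object",
    "properties": {"result": {"type": "string"}},
    "required": ["result"],
}

def validate_model(endpoint: str, sample_input: dict, max_latency_s: float = 5.0) -> float:
    """Check a single request for schema conformance and response latency."""
    start = time.monotonic()
    resp = requests.post(endpoint, json=sample_input, timeout=max_latency_s)
    latency = time.monotonic() - start

    resp.raise_for_status()
    jsonschema.validate(instance=resp.json(), schema=RESULT_SCHEMA)
    if latency > max_latency_s:
        raise RuntimeError(f"response too slow: {latency:.2f}s")
    return latency
```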
ONNX Runtime
ONNX models are supported via `onnxruntime` or `onnxruntime-gpu`. The container must install `onnxruntime` and expose a wrapper script that accepts JSON input, feeds it to the session, and returns serialized output.
Example ONNX wrapper:

```python
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("model.onnx")

def predict(input_vector):
    # The input name ("input") must match the exported graph's input tensor.
    inputs = {"input": np.array(input_vector).astype(np.float32)}
    return session.run(None, inputs)[0].tolist()
```
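The wrapper above only covers the session call; the container still has to expose it over REST. A minimal sketch, assuming the default `/inference` route and the `predict()` helper above (module and field names are placeholders):

```python
# run_onnx.py: a possible REST shim around the predict() helper above.
from fastapi import FastAPI, Request

from onnx_wrapper import predict  # assumes predict() lives in onnx_wrapper.py

app = FastAPI()

@app.post("/inference")
async def infer(req: Request):
    data = await req.json()
    return {"result": predict(data["input"])}
```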
ONNX Runtime is typically used for:

- Exported scikit-learn pipelines
- Quantized transformers
- Edge-optimized vision models
- Language classifiers
`model.yaml` should declare:

```yaml
runtime: python3.10
protocol: rest
entrypoint: run_onnx.py
```
TensorFlow Runtime
TensorFlow models must be served using either:

- `tensorflow-serving` inside a container
- A custom Flask/FastAPI wrapper using `tf.keras.models.load_model`

InfraMind recommends explicit wrappers for reproducibility and portability.
TensorFlow jobs often require GPU support. The base image should be:
```dockerfile
FROM tensorflow/tensorflow:2.13.0-gpu
```
Containerized wrapper:

```python
import tensorflow as tf

model = tf.keras.models.load_model("export/")

def infer(input_data):
    return model.predict(input_data).tolist()
```
Job containers must disable eager execution if performance optimization is critical.
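Two common ways to do that are the global compat switch and compiling the serving path into a graph with `tf.function`. The sketch below shows both; the names are chosen for illustration and are not prescribed by the protocol.

```python
import tensorflow as tf

# Option 1: global compat switch (call before any TF ops are created).
# tf.compat.v1.disable_eager_execution()

# Option 2 (TF 2.x idiom): stay eager globally, but compile the serving
# path into a graph with tf.function.
model = tf.keras.models.load_model("export/")

@tf.function
def infer_graph(batch):
    return model(batch, training=False)
```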
FastAPI Runtime (Preferred REST server)
FastAPI is the default for REST-based serving in InfraMind. It allows schema enforcement, async request processing, and auto-documentation.
Benefits:

- Integrates easily with the `model.yaml` input/output schema
- Compatible with JSON Schema validation
- Can run under Uvicorn, or Gunicorn with Uvicorn workers
CLI tooling assumes a default `/inference` route with a POST method, though custom paths can be configured in the manifest.
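As an illustration of the schema-enforcement point, a wrapper can declare its input and output as Pydantic models, and FastAPI will validate requests and responses against them automatically. The field names and `run_model()` stub below are placeholders; they should mirror whatever the manifest declares.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InferenceRequest(BaseModel):   # placeholder input schema
    input: str

class InferenceResponse(BaseModel):  # placeholder output schema
    result: str

def run_model(text: str) -> str:
    # Stand-in for the real model call.
    return text.upper()

@app.post("/inference", response_model=InferenceResponse)
async def infer(req: InferenceRequest) -> InferenceResponse:
    return InferenceResponse(result=run_model(req.input))
```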
PyTorch (TorchServe)
TorchServe is supported as a model server backend. Containers must run `torch-model-archiver` to bundle the model and expose the REST inference API via port 8080.
Container layout:

```
/model_store/
└── summarizer.mar
/config/
└── config.properties
```
Start TorchServe in the Dockerfile:

```dockerfile
# --ts-config points TorchServe at the config.properties shown above
CMD ["torchserve", "--start", "--model-store", "/model_store", "--models", "summarizer.mar", "--ts-config", "/config/config.properties"]
```
`model.yaml` example:

```yaml
runtime: torchserve
entrypoint: summarizer.mar
port: 8080
resources:
  gpu: true
```
TorchServe is ideal for large vision or speech models, multi-modal systems, or inference that requires GPU-optimized batch execution.
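Once the container is running, TorchServe serves each archived model under its inference API at `/predictions/<model_name>` on port 8080. A quick client-side check against the `summarizer.mar` example above might look like the sketch below; the exact payload shape depends on the model's handler.

```python
import requests

# Hypothetical smoke test against the TorchServe container above.
# TorchServe exposes archived models at /predictions/<model_name> on port 8080.
resp = requests.post(
    "http://localhost:8080/predictions/summarizer",
    json={"text": "InfraMind routes inference jobs across a mesh of GPU nodes."},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```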
Rust (Coming Soon)
Rust support is in progress via WebAssembly (WASI) and native execution within secure enclaves. The goal is to allow lightweight, compiled model runners written in Rust to serve stateless functions with ultra-low latency.
Expected support:

- `tangram`, `tract`, `onnxruntime-sys`
- Native async web frameworks (`axum`, `actix-web`)
- WASM/TEE model execution with deterministic IO
Example:

```rust
// actix-web-style handler; `model`, `Input`, and `Output` are placeholders
// for the compiled model runner and its request/response types.
#[post("/inference")]
async fn infer(payload: Json<Input>) -> Json<Output> {
    let result = model.predict(&payload.data);
    Json(Output { result })
}
```
Runtime specification will use:

```yaml
runtime: rust
protocol: rest
port: 9090
```
These containers will require static compilation and ABI compatibility. Signed WASM models may also be supported through enclave isolation.
Summary of Supported Runtimes
| Runtime | Base image / server | Port | GPU | Typical use cases |
| --- | --- | --- | --- | --- |
| Python | `python:3.10` | 9000 | optional | LLMs, NLP, tabular, text |
| ONNX | `onnxruntime` | 9000 | optional | Quantized, portable models |
| TensorFlow | `tensorflow:2.x` | 9000 | required | Image, audio, LLM fine-tunes |
| FastAPI | `uvicorn` | 9000 | optional | General REST wrapper |
| TorchServe | `torchserve` | 8080 | required | Large batch, computer vision |
| Rust (WASI) | Static binary / WASM | 9090 | no | Embedded, secure, ultra-fast |
All runtimes must expose input/output handlers that comply with the declared schema in `model.yaml`. The mesh assumes no language, framework, or dependency set beyond what is declared by the deployer inside the container.
InfraMind’s runtime layer is not prescriptive; it is flexible by design. The only requirement is that your model runs deterministically, accepts schema-conformant input, returns valid JSON, and can be validated without trust in the runtime. Whether it is written in Python or Rust, or wrapped in a compiled inference runner, any model can become a global endpoint with a single registration.