Creating a Custom Model Endpoint
InfraMind allows every deployed model to expose its own custom endpoint, making it accessible via a globally routed, verifiable interface. A model endpoint is not tied to any single node, IP, or region — it's an abstracted access layer over a decentralized runtime. When a request is sent to an endpoint, the mesh scheduler finds the optimal node, routes the job, and returns the result.
To make this work in a verifiable, trust-minimized way, each model must explicitly declare its interface — the expected input structure, output structure, optional metadata, and any routing constraints. This declaration lives in the model.yaml file and is enforced at runtime by the node agent.
Defining Input and Output Schema
InfraMind uses JSON Schema (draft-07) to define the structure of model input and output. This enables automatic validation before execution, better model introspection, and security guarantees across arbitrary runtimes.
Example input schema for a summarization model:
```yaml
input_schema:
  type: object
  properties:
    text:
      type: string
      minLength: 10
      maxLength: 8192
  required: [text]
```
Output schema:
```yaml
output_schema:
  type: object
  properties:
    summary:
      type: string
  required: [summary]
```
When a job is received, the agent uses these schemas to verify:
That the incoming request matches expectations
That the returned response conforms to the schema
That the model can be used in chained executions without ambiguity
Schemas are enforced at three points:
At job submission
At node pre-execution
At node post-execution
Invalid responses (e.g., null, wrong keys, wrong types) cause job failure and downgrade the node's reputation.
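As a concrete illustration, a well-formed request to the summarization model above would pass schema validation before any node executes it. The endpoint URL and model name below are assumptions for the example, not fixed InfraMind values:

```bash
# Illustrative request to a summarization endpoint (URL and model name are assumptions).
curl -X POST https://mesh.example.com/models/summarizer-v1/inference \
  -H "Content-Type: application/json" \
  -d '{"text": "Paste the long article you want summarized here, at least ten characters."}'

# Passes input_schema: "text" is a string between 10 and 8192 characters.
# Rejected before execution: {"text": 42} (wrong type) or {"body": "..."} (missing required key).
```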
Securing Endpoints
Each model endpoint is associated with the deployer's wallet address. By default, any public user can call a model endpoint. However, private endpoints can be configured with access control via token scopes or API key signatures.
To enable basic access control, generate a signed access token:
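The exact command depends on your InfraMind CLI version; the invocation below is a sketch with assumed command and flag names:

```bash
# Hypothetical CLI invocation: command name and flags are assumptions.
inframind token create \
  --model summarizer-v1 \
  --scope inference \
  --ttl 24h \
  --key ~/.inframind/deployer.key
```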
This token is included in the request header:
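For example, assuming the standard Authorization: Bearer convention:

```bash
# Header usage is assumed to follow the Bearer convention.
curl -X POST https://mesh.example.com/models/summarizer-v1/inference \
  -H "Authorization: Bearer <signed-jwt>" \
  -H "Content-Type: application/json" \
  -d '{"text": "..."}'
```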
Tokens are JWT-encoded with scope, TTL, and model restrictions. Invalid tokens result in 401 Unauthorized before job scheduling.
To mark an endpoint as private in the manifest:
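A sketch of what the manifest entry might look like (key names are assumptions):

```yaml
# model.yaml (illustrative): the exact key for endpoint visibility may differ.
endpoint:
  visibility: private   # disables unauthenticated public inference
```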
This disables public inference and requires a valid token signed by the deployer.
For enterprise use, token scopes can also define per-user or per-IP rate limits.
Rate Limiting
Rate limiting is enforced at the scheduler and node level. It prevents abuse and allows fine-grained metering of endpoint access.
Basic config in model.yaml:
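A sketch of a rate-limit block (key names are assumptions):

```yaml
# model.yaml (illustrative): rate-limit keys are assumptions.
rate_limit:
  requests_per_minute: 60
  burst: 10
```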
Limits can also be scoped to:
IP addresses
Wallet addresses
Auth tokens
Example:
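The block below is a sketch; scope key names are assumptions, not the canonical schema:

```yaml
# Illustrative scoped limits; scope names are assumptions.
rate_limit:
  requests_per_minute: 60
  per_ip:
    requests_per_minute: 10
  per_wallet:
    requests_per_minute: 30
  per_token:
    requests_per_minute: 20
```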
If limits are exceeded, the job is rejected with 429 Too Many Requests. Rejected jobs are not charged and not counted against the node.
Logging & Usage Tracking
Every model request is logged at the node and scheduler level. The following data is captured per job:
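The record below illustrates the kind of data involved; field names are assumptions rather than the canonical log schema:

```yaml
# Illustrative per-job log record; field names are assumptions.
job_id: 8f2c91d4
model: summarizer-v1
caller: <wallet address or token id>
node: <assigned node id>
status: success            # success | failed | rejected
latency_ms: 842
input_bytes: 2048
output_bytes: 312
timestamp: 2025-01-01T00:00:00Z
```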
Logs are stored at:
To access them via CLI:
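A sketch of an assumed CLI invocation:

```bash
# Hypothetical CLI: subcommand and flags are assumptions.
inframind logs --model summarizer-v1 --tail 100
```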
To view usage stats:
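Again assuming a hypothetical CLI shape:

```bash
# Hypothetical CLI: illustrative only.
inframind usage --model summarizer-v1 --since 24h
```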
Example output:
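Output along these lines (values and layout are illustrative):

```text
model:           summarizer-v1
requests (24h):  1,284
success rate:    99.2%
avg latency:     790 ms
unique callers:  312
```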
For advanced operators, logs can be shipped to external collectors:
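For example (exporter types and keys are assumptions):

```yaml
# Illustrative log-shipping config; exporter names and keys are assumptions.
logging:
  exporters:
    - type: loki
      endpoint: http://loki.internal:3100
```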
or exposed via Prometheus exporter:
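For example (key names are assumptions):

```yaml
# Illustrative metrics config; key names are assumptions.
metrics:
  prometheus:
    enabled: true
    listen: 0.0.0.0:9102
```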
Grafana dashboards can visualize:
Request frequency
Error rate
Node assignment heatmap
Average latency over time
Cache hit/miss ratio
Custom Endpoint Paths and Methods
While the default endpoint is /inference with the POST method, custom paths and methods can be used:
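A sketch of how additional routes might be declared in the manifest (key names are assumptions):

```yaml
# model.yaml (illustrative): route declaration keys are assumptions.
endpoints:
  - path: /summarize
    method: POST
  - path: /health
    method: GET
```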
The container must handle these routes. They are advertised in the model registry and surfaced in the auto-generated OpenAPI docs.
Summary
To create a custom model endpoint:
Define clear input/output schemas using JSON Schema
Configure access control using JWT tokens and allowlists
Enable rate limiting for abuse protection and fairness
Log all executions and expose usage metrics
Optionally expose custom routes for richer APIs
This model endpoint becomes a secure, auditable, mesh-routed interface to your AI system — portable, reproducible, and independent of traditional API infrastructure.