Creating a Custom Model Endpoint
InfraMind allows every deployed model to expose its own custom endpoint, making it accessible via a globally routed, verifiable interface. A model endpoint is not tied to any single node, IP, or region — it's an abstracted access layer over a decentralized runtime. When a request is sent to an endpoint, the mesh scheduler finds the optimal node, routes the job, and returns the result.
To make this work in a verifiable, trust-minimized way, each model must explicitly declare its interface — the expected input structure, output structure, optional metadata, and any routing constraints. This declaration lives in the model.yaml file and is enforced at runtime by the node agent.
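For orientation, the endpoint-related portion of a model.yaml manifest might combine the pieces covered in the rest of this page. The sketch below uses only keys documented in the sections that follow; all values are illustrative:
input_schema:
  type: object
  properties:
    text:
      type: string
  required: [text]
output_schema:
  type: object
  properties:
    summary:
      type: string
  required: [summary]
access:
  public: true
rate_limit:
  max_requests_per_minute: 120
protocol: rest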
Defining Input and Output Schema
InfraMind uses JSON Schema (draft 7) to define the structure of model input and output. This enables automatic validation before execution, better model introspection, and security guarantees across arbitrary runtimes.
Example input schema for a summarization model:
input_schema:
  type: object
  properties:
    text:
      type: string
      minLength: 10
      maxLength: 8192
  required: [text]
Output schema:
output_schema:
  type: object
  properties:
    summary:
      type: string
  required: [summary]
When a job is received, the agent uses these schemas to verify:
That the incoming request matches expectations
That the returned response conforms to the schema
That the model can be used in chained executions without ambiguity
Schemas are enforced at three points:
At job submission
At node pre-execution
At node post-execution
Invalid responses (e.g., null, wrong keys, wrong types) cause job failure and downgrade the node's reputation.
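For example, under the output schema above, only the first of the following responses would be accepted; the other two illustrate common rejection causes:
{"summary": "A concise abstract of the input."}    # accepted
{"summary": null}                                  # rejected: null value
{"result": "A concise abstract of the input."}     # rejected: required "summary" key missing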
Securing Endpoints
Each model endpoint is associated with a wallet address (from the deployer). By default, any public user can call a model endpoint. However, private endpoints can be configured with access control via token scopes or API key signatures.
To enable basic access control, generate a signed access token:
infra token generate --model summarizer-v1 --expires-in 1h
This token is included in the request header:
curl -X POST https://api.inframind.host/inference/v1/summarizer-v1 \
-H "Authorization: Bearer <TOKEN>" \
-H "Content-Type: application/json" \
-d '{"text": "The runtime must be sovereign."}'
Tokens are JWT-encoded with scope, TTL, and model restrictions. Invalid tokens result in 401 Unauthorized before job scheduling.
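Decoded, the payload of such a token might look roughly like the following; the claim names shown here are illustrative and only demonstrate the scope, TTL, and model restriction described above:
{
  "scope": "inference",
  "model": "summarizer-v1",
  "exp": 1720216840
}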
To mark an endpoint as private in the manifest:
access:
  public: false
  allow_list:
    - 0xB1F3... # wallet address
    - 0xD4E2...
  auth_required: true
This disables public inference and requires a valid token signed by the deployer.
For enterprise use, token scopes can also define per-user or per-IP rate limits.
Rate Limiting
Rate limiting is enforced at the scheduler and node level. It prevents abuse and allows fine-grained metering of endpoint access.
Basic config in model.yaml:
rate_limit:
  max_requests_per_minute: 120
  max_concurrent_jobs: 4
Limits can also be scoped to:
IP addresses
Wallet addresses
Auth tokens
Example:
rate_limit:
  per_ip:
    max_requests_per_hour: 1000
  per_token:
    max_rps: 5
If limits are exceeded, the job is rejected with 429 Too Many Requests. Rejected jobs are not charged and not counted against the node.
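Because rejected jobs are not charged, clients can safely back off and retry. Recent versions of curl, for example, already treat 429 as a transient error when --retry is set:
curl --retry 3 --retry-delay 10 -X POST https://api.inframind.host/inference/v1/summarizer-v1 \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"text": "The runtime must be sovereign."}'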
Logging & Usage Tracking
Every model request is logged at the node and scheduler level. The following data is captured per job:
{
  "job_id": "ff01-a832",
  "model": "summarizer-v1",
  "request_size": 512,
  "execution_time_ms": 213,
  "client_address": "0xDEADBEEF...",
  "endpoint": "/inference",
  "timestamp": 1720213240
}
Logs are stored at:
~/.inframind/logs/models/summarizer-v1.log
To access them via CLI:
infra logs --model summarizer-v1 --tail 20
To view usage stats:
infra usage --model summarizer-v1
Example output:
Model: summarizer-v1
Total Requests: 1,823
Avg Latency: 221ms
Success Rate: 99.2%
Last 24h: 243 calls
For advanced operators, logs can be shipped to external collectors:
infra log-forward --to http://your-elk-instance:5044
or exposed via Prometheus exporter:
infra metrics enable --port 9101
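On the Prometheus side, a minimal scrape configuration for this exporter could look like the following; the job name and target host are placeholders, and the port matches the command above:
scrape_configs:
  - job_name: inframind-node
    static_configs:
      - targets: ["localhost:9101"]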
Grafana dashboards can visualize:
Request frequency
Error rate
Node assignment heatmap
Average latency over time
Cache hit/miss ratio
Custom Endpoint Paths and Methods
While the default endpoint is /inference with the POST method, custom paths can be used:
protocol: rest
custom_routes:
  - path: /generate
    method: POST
    description: "Generate a summary"
  - path: /version
    method: GET
    description: "Return version info"
The container must handle these routes. They are advertised in the model registry and surfaced in the auto-generated OpenAPI docs.
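As a sketch, the routes declared above would surface in the generated OpenAPI document roughly like this; server URLs and request/response schemas depend on the deployment and on the model's declared input and output schemas:
paths:
  /generate:
    post:
      summary: "Generate a summary"
  /version:
    get:
      summary: "Return version info"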
Summary
To create a custom model endpoint:
Define clear input/output schemas using JSON Schema
Configure access control using JWT tokens and allowlists
Enable rate limiting for abuse protection and fairness
Log all executions and expose usage metrics
Optionally expose custom routes for richer APIs
This model endpoint becomes a secure, auditable, mesh-routed interface to your AI system — portable, reproducible, and independent of traditional API infrastructure.