Creating a Custom Model Endpoint

InfraMind allows every deployed model to expose its own custom endpoint, making it accessible via a globally routed, verifiable interface. A model endpoint is not tied to any single node, IP, or region — it's an abstracted access layer over a decentralized runtime. When a request is sent to an endpoint, the mesh scheduler finds the optimal node, routes the job, and returns the result.

To make this work in a verifiable, trust-minimized way, each model must explicitly declare its interface — the expected input structure, output structure, optional metadata, and any routing constraints. This declaration lives in the model.yaml file and is enforced at runtime by the node agent.
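
A minimal manifest sketch is shown below. Only the input_schema and output_schema blocks are detailed in the sections that follow; the name, runtime, and routing fields are illustrative assumptions about how a complete declaration might look.

name: summarizer-v1
runtime: docker            # assumption: the actual runtime key may differ
input_schema:              # detailed under "Defining Input and Output Schema" below
  type: object
  properties:
    text: { type: string }
  required: [text]
output_schema:
  type: object
  properties:
    summary: { type: string }
  required: [summary]
routing:                   # hypothetical block for routing constraints
  region_preference: any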


Defining Input and Output Schema

InfraMind uses JSON Schema (draft-07) to define the structure of model input and output. This enables automatic validation before execution, better model introspection, and security guarantees across arbitrary runtimes.

Example input schema for a summarization model:

input_schema:
  type: object
  properties:
    text:
      type: string
      minLength: 10
      maxLength: 8192
  required: [text]

Output schema:

output_schema:
  type: object
  properties:
    summary:
      type: string
  required: [summary]

When a job is received, the agent uses these schemas to verify:

  • That the incoming request matches the declared input schema

  • That the returned response conforms to the output schema

  • That the model can be used in chained executions without ambiguity

Schemas are enforced at three points:

  1. At job submission

  2. At node pre-execution

  3. At node post-execution

Invalid responses (e.g., null, wrong keys, wrong types) cause job failure and downgrade the node's reputation.
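
For example, a request whose text field is shorter than the declared minLength of 10 is rejected before execution. The error body shown in the comment is an illustrative assumption; the actual format may differ:

curl -X POST https://api.inframind.host/inference/v1/summarizer-v1 \
  -H "Content-Type: application/json" \
  -d '{"text": "too short"}'

# Hypothetical rejection payload (HTTP 400):
# {"error": "input_validation_failed", "detail": "text: must be at least 10 characters"}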


Securing Endpoints

Each model endpoint is associated with the deployer's wallet address. By default, anyone can call a model endpoint; private endpoints can be configured with access control via token scopes or API key signatures.

To enable basic access control, generate a signed access token:

infra token generate --model summarizer-v1 --expires-in 1h

This token is included in the request header:

curl -X POST https://api.inframind.host/inference/v1/summarizer-v1 \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"text": "The runtime must be sovereign."}'

Tokens are JWT-encoded and carry scope, TTL, and model restrictions. Invalid or expired tokens are rejected with 401 Unauthorized before the job is scheduled.
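
The claim names are not fixed by this guide, so the decoded payload below is only an illustrative assumption of how those restrictions might be encoded: iss as the deployer's wallet address, model as the restricted target, scope as the permitted action, and exp as the expiry derived from --expires-in.

{
  "iss": "0xB1F3...",
  "model": "summarizer-v1",
  "scope": "inference",
  "exp": 1720216840
}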

To mark an endpoint as private in the manifest:

access:
  public: false
  allow_list:
    - 0xB1F3...      # wallet address
    - 0xD4E2...
  auth_required: true

This disables public inference and requires a valid token signed by the deployer.

For enterprise use, token scopes can also define per-user or per-IP rate limits.


Rate Limiting

Rate limiting is enforced at the scheduler and node level. It prevents abuse and allows fine-grained metering of endpoint access.

Basic config in model.yaml:

rate_limit:
  max_requests_per_minute: 120
  max_concurrent_jobs: 4

Limits can also be scoped to:

  • IP addresses

  • Wallet addresses

  • Auth tokens

Example:

rate_limit:
  per_ip:
    max_requests_per_hour: 1000
  per_token:
    max_rps: 5

If limits are exceeded, the job is rejected with 429 Too Many Requests. Rejected jobs are not charged and not counted against the node.


Logging & Usage Tracking

Every model request is logged at the node and scheduler level. The following data is captured per job:

{
  "job_id": "ff01-a832",
  "model": "summarizer-v1",
  "request_size": 512,
  "execution_time_ms": 213,
  "client_address": "0xDEADBEEF...",
  "endpoint": "/inference",
  "timestamp": 1720213240
}

Logs are stored at:

~/.inframind/logs/models/summarizer-v1.log

To access them via CLI:

infra logs --model summarizer-v1 --tail 20

To view usage stats:

infra usage --model summarizer-v1

Example output:

Model: summarizer-v1
Total Requests: 1,823
Avg Latency: 221ms
Success Rate: 99.2%
Last 24h: 243 calls

For advanced operators, logs can be shipped to external collectors:

infra log-forward --to http://your-elk-instance:5044

or exposed via Prometheus exporter:

infra metrics enable --port 9101
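
Once the exporter is enabled, a standard Prometheus scrape job can pull from the chosen port. A minimal sketch, assuming the exporter runs on the same host and serves the default /metrics path:

scrape_configs:
  - job_name: inframind-models
    static_configs:
      - targets: ["localhost:9101"]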

Grafana dashboards can visualize:

  • Request frequency

  • Error rate

  • Node assignment heatmap

  • Average latency over time

  • Cache hit/miss ratio


Custom Endpoint Paths and Methods

The default endpoint is /inference, served via POST. Custom paths and methods can also be declared in the manifest:

protocol: rest
custom_routes:
  - path: /generate
    method: POST
    description: "Generate a summary"
  - path: /version
    method: GET
    description: "Return version info"

The container must handle these routes. They are advertised in the model registry and surfaced in the auto-generated OpenAPI docs.
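
Assuming custom routes are served under the same base URL as the default /inference endpoint (the exact URL layout for custom routes is an assumption, not confirmed above), calls might look like this:

curl https://api.inframind.host/inference/v1/summarizer-v1/version

curl -X POST https://api.inframind.host/inference/v1/summarizer-v1/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "The runtime must be sovereign."}'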


Summary

To create a custom model endpoint:

  1. Define clear input/output schemas using JSON Schema

  2. Configure access control using JWT tokens and allowlists

  3. Enable rate limiting for abuse protection and fairness

  4. Log all executions and expose usage metrics

  5. Optionally expose custom routes for richer APIs

This model endpoint becomes a secure, auditable, mesh-routed interface to your AI system — portable, reproducible, and independent of traditional API infrastructure.
