Node Architecture
The InfraMind node is the atomic unit of decentralized compute in the protocol. Each node operates as a self-contained, portable inference environment capable of receiving jobs from the mesh, verifying inputs, executing AI workloads inside a secure containerized sandbox, and returning signed proofs of completion. A node can run on any Linux-compatible host with Docker or Podman, and it can scale from low-resource CPU-only servers to high-throughput GPU clusters.
The architecture of the node is modular, designed to minimize dependency on external services while maintaining compliance with the protocol’s job execution, reward, and telemetry standards.
The node agent is the entry point. It exposes a local control plane over gRPC and a public REST inference endpoint (default port 9000). The agent is built with FastAPI and structured around an event loop that listens for inbound job assignments from the mesh scheduler and dispatches them to the container runtime.
The core structure is:
inframind-node/
├── agent/
│ ├── listener.py
│ ├── scheduler_client.py
│ ├── container_manager.py
│ ├── signer.py
│ └── telemetry.py
├── config.yaml
└── entrypoint.py
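The sketch below illustrates how entrypoint.py might wire these modules together. The class names and config keys are illustrative assumptions, not the agent's actual API:

# entrypoint.py (sketch) -- class names and config keys are assumptions
import asyncio
import yaml

from agent.listener import JobListener                # hypothetical class names
from agent.container_manager import ContainerManager
from agent.signer import Signer
from agent.telemetry import TelemetryClient


async def main() -> None:
    # Load node configuration (ports, registry endpoints, resource limits)
    with open("config.yaml") as f:
        config = yaml.safe_load(f)

    signer = Signer(key_path="~/.inframind/identity.key")
    containers = ContainerManager(cache_dir=config["cache_dir"])          # assumed key
    telemetry = TelemetryClient(interval=config.get("heartbeat_interval", 5))

    # The listener owns the event loop: it accepts scheduler assignments,
    # hands each job to the container manager, then to the signer.
    listener = JobListener(config, containers, signer, telemetry)
    await listener.run()


if __name__ == "__main__":
    asyncio.run(main())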
The listener handles job requests. When a scheduler issues a job to a node, the agent verifies the job signature, confirms hardware compatibility, and queues the job for execution.
Example gRPC job payload:
{
  "job_id": "a6b4-23fa",
  "model_hash": "QmWx9a....9Bc",
  "container_uri": "ipfs://QmWx9a...",
  "entrypoint": "serve.py",
  "expected_schema": {
    "input": "text",
    "output": "embedding<float>"
  },
  "deadline": 1719450000
}
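As a rough illustration, the listener's verification step for such a payload could look like the following. The signature-check and compatibility helpers are assumptions, not the protocol's actual interfaces:

# listener.py (sketch) -- helper functions are illustrative assumptions
import time
import asyncio

job_queue: asyncio.Queue = asyncio.Queue()

async def handle_assignment(job: dict, scheduler_pubkey: bytes) -> bool:
    # 1. Reject jobs whose scheduler signature does not verify
    if not verify_signature(job, scheduler_pubkey):   # hypothetical helper
        return False

    # 2. Reject jobs that are already past their deadline
    if job["deadline"] < time.time():
        return False

    # 3. Confirm local hardware can satisfy the job (GPU, memory, etc.)
    if not hardware_compatible(job):                  # hypothetical helper
        return False

    # 4. Queue for execution by the container manager
    await job_queue.put(job)
    return True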
Once received, the container manager is invoked. The manager checks the local container cache layer. If the requested model is not available locally, it is pulled from the registry (IPFS, HTTP, or fallback gateway), validated against the manifest hash, and loaded into a dedicated execution sandbox using Docker.
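A simplified version of that resolve-or-pull flow could look like this. The cache layout and pull helper are assumptions, and a SHA-256 digest stands in for the manifest hash check:

# container_manager.py (sketch) -- cache layout and pull helper are assumed
import hashlib
from pathlib import Path

CACHE_DIR = Path.home() / ".inframind" / "cache"

def resolve_image(container_uri: str, model_hash: str) -> Path:
    """Return a locally cached, hash-validated image archive for the job."""
    cached = CACHE_DIR / model_hash
    if cached.exists() and _digest(cached) == model_hash:
        return cached                          # cache hit: reuse validated image

    # Cache miss: pull from IPFS, HTTP, or the fallback gateway (hypothetical helper)
    data = pull_from_registry(container_uri)

    # Validate against the manifest hash before it ever reaches the sandbox
    if hashlib.sha256(data).hexdigest() != model_hash:
        raise ValueError("image digest does not match manifest hash")

    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(data)
    return cached

def _digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()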
The execution context is launched with strict resource constraints defined in the model.yaml:
resources:
  cpu: 2
  memory: 2Gi
  gpu: true
The agent uses the Docker SDK for Python to isolate jobs:
import docker

client = docker.from_env()  # connect to the local Docker daemon

container = client.containers.run(
    image=model_image,
    command=["python3", entrypoint],
    detach=True,                        # run asynchronously; output is collected later
    cpu_shares=2048,                    # relative CPU weight for the sandbox
    mem_limit="2g",                     # hard memory cap from model.yaml
    environment={"JOB_ID": job_id},
    network_disabled=True,              # no network access during execution
    auto_remove=True                    # clean up the container once it exits
)
The output of the container is captured, parsed against the declared output schema, and passed to the wallet signer module.
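A minimal check against the declared expected_schema might look like the following. The schema vocabulary ("text", "embedding<float>") is taken from the example payload above; the mapping is an assumption for illustration:

# Sketch of output validation against the job's declared schema
import json

def validate_output(raw_output: bytes, expected_schema: dict) -> dict:
    result = json.loads(raw_output)

    if expected_schema.get("output", "").startswith("embedding"):
        # Expect a flat list of floats
        if not (isinstance(result, list) and all(isinstance(x, (int, float)) for x in result)):
            raise ValueError("container output does not match declared embedding schema")
    elif expected_schema.get("output") == "text":
        if not isinstance(result, str):
            raise ValueError("container output does not match declared text schema")

    return {"output": result}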
The wallet signer is responsible for producing a verifiable receipt for every completed job. This receipt is cryptographically signed using the node’s private key (stored locally in ~/.inframind/identity.key) and includes all relevant execution metadata:
{
  "job_id": "a6b4-23fa",
  "output_hash": "0xc8f0..",
  "latency_ms": 204,
  "node": "0xB71c3f...",
  "timestamp": 1719450032,
  "signature": "0x52ff..."
}
This payload is submitted to the job coordinator or optionally included in an on-chain reward oracle batch for claiming.
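A receipt-signing sketch, assuming an Ethereum-style secp256k1 keypair (the address and signature formats above suggest one, but the actual curve and key format are not specified in this section):

# signer.py (sketch) -- assumes an Ethereum-style key; the real format may differ
import json
import time
import hashlib
from pathlib import Path

from eth_account import Account
from eth_account.messages import encode_defunct

KEY_PATH = Path.home() / ".inframind" / "identity.key"

def sign_receipt(job_id: str, output: bytes, latency_ms: int) -> dict:
    acct = Account.from_key(KEY_PATH.read_text().strip())

    receipt = {
        "job_id": job_id,
        "output_hash": "0x" + hashlib.sha256(output).hexdigest(),
        "latency_ms": latency_ms,
        "node": acct.address,
        "timestamp": int(time.time()),
    }
    # Sign the canonical JSON encoding of the receipt fields
    message = encode_defunct(text=json.dumps(receipt, sort_keys=True))
    receipt["signature"] = Account.sign_message(message, acct.key).signature.hex()
    return receipt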
The telemetry client publishes real-time stats to the local agent and optionally to a Prometheus endpoint or the mesh indexer. This includes:
Heartbeat (1–5s interval)
CPU/RAM utilization
Active container jobs
Container cache hit ratio
Job success/failure history
Mean latency (last 50 jobs)
Sample output:
infra status
Node ID: 0xB71c3f...
Jobs Served: 1,042
Avg Latency: 218ms
Uptime: 99.9%
Stake: 512 INFRA
Region: europe-west
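A sketch of what the optional Prometheus export could look like; the metric names here are illustrative, not the node's actual metric set:

# telemetry.py (sketch) -- metric names are illustrative assumptions
from prometheus_client import Gauge, Counter, start_http_server

jobs_served = Counter("inframind_jobs_served_total", "Completed jobs")
job_latency_ms = Gauge("inframind_job_latency_ms", "Latency of the last job")
active_jobs = Gauge("inframind_active_container_jobs", "Containers currently running")
cache_hit_ratio = Gauge("inframind_cache_hit_ratio", "Container cache hit ratio")

def start_exporter(port: int = 9100) -> None:
    # Expose /metrics for Prometheus scraping
    start_http_server(port)

def record_job(latency_ms: float) -> None:
    jobs_served.inc()
    job_latency_ms.set(latency_ms)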
The container cache layer stores previously pulled models locally in a secure volume. Images are keyed by their hash and validated against their manifest before reuse. This reduces cold start latency, lowers network pull pressure, and allows for instant response on recurring workloads.
Node identity is deterministic. Upon first launch, the agent creates a keypair:
infra init --keygen
The key is stored securely and used for job signing, staking operations, and job dispute resolution. Node registration with the scheduler includes this key as part of the registration payload, signed to prevent impersonation.
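A sketch of what infra init --keygen could do under the hood, again assuming an Ethereum-style keypair rather than the node's actual key format:

# Sketch of identity creation -- key type is an assumption; the real CLI may differ
import os
from pathlib import Path
from eth_account import Account

def init_identity() -> str:
    key_path = Path.home() / ".inframind" / "identity.key"
    key_path.parent.mkdir(parents=True, exist_ok=True)

    acct = Account.create()                   # new random keypair
    key_path.write_text(acct.key.hex())
    os.chmod(key_path, 0o600)                 # restrict access to the node operator

    return acct.address                       # used as the node ID (e.g. 0xB71c3f...)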
The full node boot process on a fresh machine:
curl -sL https://inframind.host/install.sh | bash
This script performs the following:
Checks for Docker and installs if missing
Pulls the inframind/node:latest container
Creates a local identity keypair
Starts the node as a systemd service
Registers the node with the mesh index
Begins heartbeat broadcasting and job polling
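The final step, heartbeat broadcasting and job polling, might run as a loop like the one sketched below; the endpoint paths and payload fields are assumptions, not the mesh's actual API:

# Sketch of the heartbeat/poll loop started by the systemd service
import asyncio
import aiohttp

MESH_URL = "https://mesh.inframind.example"    # hypothetical scheduler/indexer endpoint
NODE_ID = "0xB71c3f..."                        # node address derived from identity.key

async def heartbeat_and_poll(interval: float = 5.0) -> None:
    async with aiohttp.ClientSession() as session:
        while True:
            # Heartbeat: report liveness so the mesh index keeps the node eligible
            async with session.post(f"{MESH_URL}/heartbeat", json={"node": NODE_ID}):
                pass

            # Poll: ask the scheduler for pending assignments for this node
            async with session.get(f"{MESH_URL}/jobs", params={"node": NODE_ID}) as resp:
                for job in await resp.json():
                    await dispatch(job)        # hypothetical hand-off to the listener queue

            await asyncio.sleep(interval)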
InfraMind nodes are intentionally stateless between jobs unless the model explicitly requires session persistence (in development). This allows for safe horizontal scaling, portable deployment to edge zones, and automatic failover in case of job timeout or node failure.
The architecture is hardened to tolerate churn, regional instability, or malicious nodes. Slashing and reputation scoring are handled at the protocol layer, while each node independently maintains its own job history and local metrics.
Every part of the InfraMind node is auditable, inspectable, and customizable for developers, contributors, and advanced operators. The agent operates without platform dependencies, without centralized authentication, and without lock-in. The result is a fully programmable compute interface where sovereignty begins at the runtime.