Multi-Node Job Execution
InfraMind supports multi-node job execution for models and workloads that exceed the limits of a single container, whether due to memory, latency, or compute constraints. Multi-node execution allows large inference jobs to be decomposed into discrete, parallelizable tasks that can be distributed across the mesh, executed independently or sequentially, and then recombined into a final result.
This is particularly useful for:
Vision models operating on image batches
Transformer models with long sequence inputs
LLMs requiring split-context streaming
Parallel pipelines (e.g., preprocessing → model A → model B)
Distributed fine-tuning or inference search (e.g., beam decode branches)
While InfraMind itself is not a model orchestration engine, its protocol supports container chaining, DAG-based job definitions, and runtime-aware workload splitting through modular hooks and scheduler directives.
Split Input
The first step in multi-node execution is partitioning the input into discrete units that can be processed independently.
Example: A vision model designed to detect features in satellite imagery may need to process a 20,000×20,000px image. This would be infeasible to load and run on a single node. Instead, a job is submitted with a partitioning strategy:
{
  "job_id": "bd92-e3f1",
  "input": {
    "image_url": "ipfs://Qm123...",
    "split_strategy": "tile_512"
  },
  "shard_count": 16
}
Each shard becomes a sub-job:
job_id: bd92-e3f1-00
job_id: bd92-e3f1-01
...
job_id: bd92-e3f1-15
These jobs are routed to nodes independently, using the same model container. Each processes one tile, returns a partial output, and then optionally pushes to an aggregator container (reducer) or back to the scheduler for recombination.
To define the split strategy in model.yaml:
multi_node:
  enabled: true
  split_mode: tile
  tile_size: 512
  merge_strategy: concat-json
  max_shards: 32
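To make the scatter/gather flow concrete, here is a minimal local sketch of what a tile split and a concat-json merge could look like. It is illustrative only: the helper names (split_into_tiles, merge_concat_json) and the shape of the partial outputs are assumptions, not part of the InfraMind API; in a real job the scheduler performs the split and a reducer container handles the merge.

import json

# Hypothetical helpers; InfraMind's scheduler and reducer perform these steps in practice.
def split_into_tiles(width, height, tile_size=512):
    """Yield one (x, y, w, h) bounding box per tile; each box maps to one sub-job."""
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield (x, y, min(tile_size, width - x), min(tile_size, height - y))

def merge_concat_json(partials):
    """Approximate a concat-json merge: concatenate shard outputs in shard order."""
    merged = []
    for shard in sorted(partials, key=lambda p: p["shard_index"]):
        merged.extend(shard["detections"])  # assumed per-shard output field
    return json.dumps({"detections": merged})

# Two partial results, e.g. from shards bd92-e3f1-00 and bd92-e3f1-01.
partials = [
    {"shard_index": 1, "detections": [{"label": "ship", "tile": [512, 0, 512, 512]}]},
    {"shard_index": 0, "detections": [{"label": "runway", "tile": [0, 0, 512, 512]}]},
]
print(merge_concat_json(partials))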
Pipe Output Between Nodes
In streaming or multi-stage pipelines, the output of one container can be passed directly as input to another.
Example: Transcribe audio → run LLM on transcript → summarize output
InfraMind supports piped execution by defining a DAG in the model manifest or submission payload:
{
  "job_graph": {
    "stages": [
      {
        "model": "asr-v1",
        "input_key": "audio_url"
      },
      {
        "model": "llm-transcribe-v2",
        "input_key": "transcript"
      },
      {
        "model": "summary-v1",
        "input_key": "text"
      }
    ]
  }
}
Each stage becomes a distinct job routed to the most capable node for that model. Node A runs ASR, sends its JSON output to Node B, which runs the LLM, then hands the response to Node C for summarization.
This enables distributed inference pipelines where each model is independently deployed and scaled, yet functionally composable via the protocol.
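As a mental model for how stages hand data forward, the sketch below walks a linear job_graph in order, feeding each stage's output to the next stage under its declared input_key. Everything runs locally, and run_stage is a hypothetical stand-in for dispatching a sub-job to a node; in InfraMind the scheduler performs this routing across machines.

def run_stage(model, payload):
    """Stand-in for dispatching one sub-job; returns a fake JSON-style result."""
    return {"model": model, "output": f"<{model} result for {payload!r}>"}

def run_job_graph(stages, initial_input):
    """Thread a value through the stages, re-keying it per each stage's input_key."""
    current = initial_input
    for stage in stages:
        result = run_stage(stage["model"], {stage["input_key"]: current})
        current = result["output"]  # becomes the next stage's input value
    return current

stages = [
    {"model": "asr-v1", "input_key": "audio_url"},
    {"model": "llm-transcribe-v2", "input_key": "transcript"},
    {"model": "summary-v1", "input_key": "text"},
]
print(run_job_graph(stages, "ipfs://Qm123..."))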
Define chaining explicitly in model.yaml:
chained_models:
  - name: transcription
    model_id: asr-v1
    output_key: transcript
  - name: summarize
    model_id: summary-v1
    input_key: transcript
Intermediate results are logged and optionally persisted to IPFS for audit or caching.
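A rough sketch of what content-addressed persistence of an intermediate result could look like is shown below. The hashing scheme and the in-memory cache are assumptions for illustration; the actual IPFS pinning and audit logging are handled by the node runtime, not by model code.

import hashlib
import json

cache = {}  # stand-in for an IPFS pin set or local result cache

def persist_intermediate(stage_name, output):
    """Hash a stage's output so it can be cached, audited, or pinned later."""
    blob = json.dumps(output, sort_keys=True).encode()
    digest = "0x" + hashlib.sha256(blob).hexdigest()
    cache[(stage_name, digest)] = blob
    return digest

print(persist_intermediate("transcription", {"transcript": "example transcript text"}))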
Use of Ray / Horovod
For workloads that require tightly synchronized parallelism—such as large-scale LLM inference, distributed attention windows, or multi-GPU matrix ops—InfraMind allows job containers to internally initialize frameworks like:
Ray – for distributed Python task execution
Horovod – for multi-node gradient sharing (primarily during fine-tuning)
These libraries must be installed inside the container, and the job manifest must declare port availability and communication mode:
multi_node:
  distributed_runtime: ray
  shard_role: worker
  cluster_size: 4
  coordination_port: 6379
  gpu: true
The container entrypoint must initiate the Ray node with the correct cluster configuration:
import ray

# Join the Ray cluster bootstrapped by the node agents; "auto" discovers the
# head node advertised on the declared coordination_port.
ray.init(address="auto")

@ray.remote
def run_model(shard):
    ...  # model-specific inference on a single shard goes here

# `shards` would come from the job's input payload; fan out and gather results.
results = ray.get([run_model.remote(s) for s in shards])
The InfraMind scheduler handles the assignment of nodes and ensures that cluster peers are colocated in compatible regions when possible. If the job requires synchronized startup, the node agents coordinate over gRPC to enforce delay windows and resource lock reservations.
Ideal Use Cases for Multi-Node Execution
Vision Tile Inference: process massive images by tile on separate nodes
Batched Token Inference: feed multiple prompts to LLM shards in parallel
Split Sequence Transformers: handle context windows beyond 8K tokens via horizontal slicing
Agent-Orchestrated Pipelines: agents call sub-models in sequence or by priority
Distributed Training Epochs: share weights across Horovod/Ray nodes during updates
Tree-Search or Beam Decoding: run multiple candidate generations across the worker pool
Monitoring Multi-Node Jobs
Each shard or stage produces its own job receipt:
{
  "job_id": "bd92-e3f1-06",
  "shard_index": 6,
  "latency_ms": 282,
  "output_hash": "0x812f...",
  "node_id": "0xD2Bc...",
  "status": "success"
}
The final job aggregation includes (see the sketch after this list):
Total shards
Success/failure count
Merge status
Final output hash
Completion timestamp
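A minimal local sketch of how those aggregate fields could be derived from per-shard receipts follows. The receipt shape matches the example above, while the final output hash and merge status logic here are placeholders, not InfraMind's actual merge procedure.

import hashlib
from datetime import datetime, timezone

def aggregate_receipts(receipts):
    """Roll per-shard receipts up into a final job summary."""
    succeeded = [r for r in receipts if r["status"] == "success"]
    failed = [r for r in receipts if r["status"] != "success"]
    ordered = sorted(succeeded, key=lambda r: r["shard_index"])
    combined = "".join(r["output_hash"] for r in ordered)
    return {
        "total_shards": len(receipts),
        "success_count": len(succeeded),
        "failure_count": len(failed),
        "merge_status": "complete" if not failed else "partial",
        "final_output_hash": "0x" + hashlib.sha256(combined.encode()).hexdigest(),
        "completed_at": datetime.now(timezone.utc).isoformat(),
    }

receipts = [
    {"shard_index": 0, "status": "success", "output_hash": "0xabc1"},
    {"shard_index": 1, "status": "success", "output_hash": "0xdef2"},
]
print(aggregate_receipts(receipts))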
Check via CLI:
infra job --id bd92-e3f1 --details
For DAG-style executions:
infra graph --job bd92-e3f1
Returns a JSON structure representing the model graph execution trace.
Summary
Multi-node execution in InfraMind unlocks true parallelism at the protocol level. Instead of trying to vertically scale a single container, workloads are distributed across a global mesh and coordinated through declarative manifests, DAG logic, and intelligent scheduling. Whether you're processing massive image data, running 175B parameter models, or chaining 3–5 models into an autonomous agent loop, InfraMind allows you to do so across independent machines with zero central orchestration. Just containers, code, and verifiable compute.