Multi-Node Job Execution
InfraMind supports multi-node job execution for models and workloads that exceed the limits of a single container, whether due to memory, latency, or compute constraints. Multi-node execution allows large inference jobs to be decomposed into discrete, parallelizable tasks that can be distributed across the mesh, executed independently or sequentially, and then recombined into a final result.
This is particularly useful for:
Vision models operating on image batches
Transformer models with long sequence inputs
LLMs requiring split-context streaming
Parallel pipelines (e.g., preprocessing → model A → model B)
Distributed fine-tuning or inference search (e.g., beam decode branches)
While InfraMind itself is not a model orchestration engine, its protocol supports container chaining, DAG-based job definitions, and runtime-aware workload splitting through modular hooks and scheduler directives.
Split Input
The first step in multi-node execution is partitioning the input into discrete units that can be processed independently.
Example: A vision model designed to detect features in satellite imagery may need to process a 20,000×20,000px image. This would be infeasible to load and run on a single node. Instead, a job is submitted with a partitioning strategy:
{
  "job_id": "bd92-e3f1",
  "input": {
    "image_url": "ipfs://Qm123...",
    "split_strategy": "tile_512"
  },
  "shard_count": 16
}
Each shard becomes a sub-job:
job_id: bd92-e3f1-00
job_id: bd92-e3f1-01
...
job_id: bd92-e3f1-15
These jobs are routed to nodes independently, using the same model container. Each processes one tile, returns a partial output, and then optionally pushes to an aggregator container (reducer) or back to the scheduler for recombination.
To define the split strategy in model.yaml:
multi_node:
  enabled: true
  split_mode: tile
  tile_size: 512
  merge_strategy: concat-json
  max_shards: 32
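To make the scatter/gather flow concrete, here is a minimal local sketch of what a tile split and a concat-json merge could look like. It is illustrative only: the helper names (split_into_tiles, merge_concat_json) and the shape of the partial outputs are assumptions, not part of the InfraMind API; in a real job the scheduler performs the split and a reducer container handles the merge.

import json

# Hypothetical helpers; InfraMind's scheduler and reducer perform these steps in practice.
def split_into_tiles(width, height, tile_size=512):
    """Yield one (x, y, w, h) bounding box per tile; each box maps to one sub-job."""
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield (x, y, min(tile_size, width - x), min(tile_size, height - y))

def merge_concat_json(partials):
    """Approximate a concat-json merge: concatenate shard outputs in shard order."""
    merged = []
    for shard in sorted(partials, key=lambda p: p["shard_index"]):
        merged.extend(shard["detections"])  # assumed per-shard output field
    return json.dumps({"detections": merged})

# Two partial results, e.g. from shards bd92-e3f1-00 and bd92-e3f1-01.
partials = [
    {"shard_index": 1, "detections": [{"label": "ship", "tile": [512, 0, 512, 512]}]},
    {"shard_index": 0, "detections": [{"label": "runway", "tile": [0, 0, 512, 512]}]},
]
print(merge_concat_json(partials))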
Pipe Output Between Nodes
In streaming or multi-stage pipelines, the output of one container can be passed directly as input to another.
Example: Transcribe audio → run LLM on transcript → summarize output
InfraMind supports piped execution by defining a DAG in the model manifest or submission payload:
{
  "job_graph": {
    "stages": [
      {
        "model": "asr-v1",
        "input_key": "audio_url"
      },
      {
        "model": "llm-transcribe-v2",
        "input_key": "transcript"
      },
      {
        "model": "summary-v1",
        "input_key": "text"
      }
    ]
  }
}
Each stage becomes a distinct job routed to the most capable node for that model. Node A runs ASR, sends its JSON output to Node B, which runs the LLM, then hands the response to Node C for summarization.
This enables distributed inference pipelines where each model is independently deployed and scaled, yet functionally composable via the protocol.
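As a mental model for how stages hand data forward, the sketch below walks a linear job_graph in order, feeding each stage's output to the next stage under its declared input_key. Everything runs locally, and run_stage is a hypothetical stand-in for dispatching a sub-job to a node; in InfraMind the scheduler performs this routing across machines.

def run_stage(model, payload):
    """Stand-in for dispatching one sub-job; returns a fake JSON-style result."""
    return {"model": model, "output": f"<{model} result for {payload!r}>"}

def run_job_graph(stages, initial_input):
    """Thread a value through the stages, re-keying it per each stage's input_key."""
    current = initial_input
    for stage in stages:
        result = run_stage(stage["model"], {stage["input_key"]: current})
        current = result["output"]  # becomes the next stage's input value
    return current

stages = [
    {"model": "asr-v1", "input_key": "audio_url"},
    {"model": "llm-transcribe-v2", "input_key": "transcript"},
    {"model": "summary-v1", "input_key": "text"},
]
print(run_job_graph(stages, "ipfs://Qm123..."))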
Define chaining explicitly in model.yaml:
chained_models:
  - name: transcription
    model_id: asr-v1
    output_key: transcript
  - name: summarize
    model_id: summary-v1
    input_key: transcript
Intermediate results are logged and optionally persisted to IPFS for audit or caching.
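A rough sketch of what content-addressed persistence of an intermediate result could look like is shown below. The hashing scheme and the in-memory cache are assumptions for illustration; the actual IPFS pinning and audit logging are handled by the node runtime, not by model code.

import hashlib
import json

cache = {}  # stand-in for an IPFS pin set or local result cache

def persist_intermediate(stage_name, output):
    """Hash a stage's output so it can be cached, audited, or pinned later."""
    blob = json.dumps(output, sort_keys=True).encode()
    digest = "0x" + hashlib.sha256(blob).hexdigest()
    cache[(stage_name, digest)] = blob
    return digest

print(persist_intermediate("transcription", {"transcript": "example transcript text"}))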
Use of Ray / Horovod
For workloads that require tightly synchronized parallelism—such as large-scale LLM inference, distributed attention windows, or multi-GPU matrix ops—InfraMind allows job containers to internally initialize frameworks like:
Ray – for distributed Python task execution
Horovod – for multi-node gradient sharing (primarily during fine-tuning)
These libraries must be installed inside the container, and the job manifest must declare port availability and communication mode:
multi_node:
  distributed_runtime: ray
  shard_role: worker
  cluster_size: 4
  coordination_port: 6379
  gpu: true
The container entrypoint must initiate the Ray node with the correct cluster configuration:
import ray

# Join the Ray cluster bootstrapped by the node agents; "auto" discovers the
# head node advertised on the declared coordination_port.
ray.init(address="auto")

@ray.remote
def run_model(shard):
    ...  # model-specific inference on a single shard goes here

# `shards` would come from the job's input payload; fan out and gather results.
results = ray.get([run_model.remote(s) for s in shards])
The InfraMind scheduler handles the assignment of nodes and ensures that cluster peers are colocated in compatible regions when possible. If the job requires synchronized startup, the node agents coordinate over gRPC to enforce delay windows and resource lock reservations.
Ideal Use Cases for Multi-Node Execution
Vision Tile Inference: process massive images by tile on separate nodes
Batched Token Inference: feed multiple prompts to LLM shards in parallel
Split Sequence Transformers: handle context windows beyond 8K tokens via horizontal slicing
Agent-Orchestrated Pipelines: agents call sub-models in sequence or by priority
Distributed Training Epochs: share weights across Horovod/Ray nodes during updates
Tree-Search or Beam Decoding: run multiple candidate generations across the worker pool
Monitoring Multi-Node Jobs
Each shard or stage produces its own job receipt:
{
  "job_id": "bd92-e3f1-06",
  "shard_index": 6,
  "latency_ms": 282,
  "output_hash": "0x812f...",
  "node_id": "0xD2Bc...",
  "status": "success"
}
The final job aggregation includes (see the sketch after this list):
Total shards
Success/failure count
Merge status
Final output hash
Completion timestamp
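A minimal local sketch of how those aggregate fields could be derived from per-shard receipts follows. The receipt shape matches the example above, while the final output hash and merge status logic here are placeholders, not InfraMind's actual merge procedure.

import hashlib
from datetime import datetime, timezone

def aggregate_receipts(receipts):
    """Roll per-shard receipts up into a final job summary."""
    succeeded = [r for r in receipts if r["status"] == "success"]
    failed = [r for r in receipts if r["status"] != "success"]
    ordered = sorted(succeeded, key=lambda r: r["shard_index"])
    combined = "".join(r["output_hash"] for r in ordered)
    return {
        "total_shards": len(receipts),
        "success_count": len(succeeded),
        "failure_count": len(failed),
        "merge_status": "complete" if not failed else "partial",
        "final_output_hash": "0x" + hashlib.sha256(combined.encode()).hexdigest(),
        "completed_at": datetime.now(timezone.utc).isoformat(),
    }

receipts = [
    {"shard_index": 0, "status": "success", "output_hash": "0xabc1"},
    {"shard_index": 1, "status": "success", "output_hash": "0xdef2"},
]
print(aggregate_receipts(receipts))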
Check via CLI:
infra job --id bd92-e3f1 --details
For DAG-style executions:
infra graph --job bd92-e3f1
Returns a JSON structure representing the model graph execution trace.
Summary
Multi-node execution in InfraMind unlocks true parallelism at the protocol level. Instead of trying to vertically scale a single container, workloads are distributed across a global mesh and coordinated through declarative manifests, DAG logic, and intelligent scheduling. Whether you're processing massive image data, running 175B parameter models, or chaining 3–5 models into an autonomous agent loop, InfraMind allows you to do so across independent machines with zero central orchestration. Just containers, code, and verifiable compute.