Node Configuration
InfraMind nodes are designed to run autonomously and securely, with minimal manual configuration. However, advanced operators and infrastructure teams may need to customize runtime behavior, optimize region placement, enable hardware features, or integrate nodes into their internal monitoring systems. Node configuration is handled through a mix of environment variables, config files, and systemd options, all of which are respected by the node agent at boot.
Most configuration is stored in and read from ~/.inframind/config.yaml, but for ephemeral control and compatibility with orchestrated environments (e.g. Kubernetes, Nomad, systemd), all essential parameters can also be set using environment variables.
Core Environment Variables
These environment variables are read at runtime and override values in the YAML config.
INFRA_NODE_KEY=ed25519:MHcCAQEEIBvYhF6uFs+T1aALHt...
INFRA_LOCATION_HINT=asia-south
ALLOW_GPU=true
CONTAINER_CACHE_PATH=/mnt/inframind/cache
INFRA_LOG_LEVEL=info
INFRA_MAX_JOBS=3
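For systemd-managed nodes, one convenient way to supply these values is a drop-in override on the inframind-node unit used later on this page (a sketch; adjust the unit name to your installation):
sudo systemctl edit inframind-node
Then add, for example:
[Service]
Environment=INFRA_LOCATION_HINT=asia-south
Environment=INFRA_MAX_JOBS=3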
INFRA_NODE_KEY
Defines the node’s private identity key used for signing job receipts and staking transactions. If not present, a key is generated and stored in ~/.inframind/identity.key. The format is base64-encoded Ed25519.
To generate a persistent key:
openssl genpkey -algorithm ed25519 -out ~/.inframind/identity.key
To load from environment:
export INFRA_NODE_KEY=$(base64 -w0 ~/.inframind/identity.key)
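As a quick sanity check that the exported value decodes back to the same key material (standard shell tooling, not an InfraMind command):
echo "$INFRA_NODE_KEY" | base64 -d > /tmp/identity.check
diff /tmp/identity.check ~/.inframind/identity.key && echo "key matches"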
INFRA_LOCATION_HINT
Overrides automatic region detection. Used by the scheduler to optimize routing decisions.
Examples:
INFRA_LOCATION_HINT=us-west
INFRA_LOCATION_HINT=europe-central
INFRA_LOCATION_HINT=africa-north
Recommended when operating in regions with asymmetric latency or mixed IP geolocation.
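Assuming the infra config set pattern shown elsewhere on this page also applies to the region key (region_hint in config.yaml), the hint can be persisted the same way:
infra config set region_hint=asia-south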
ALLOW_GPU
Enables GPU-based job eligibility. Requires nvidia-container-runtime and correct driver installation.
Validate:
nvidia-smi
If ALLOW_GPU is set to true, the agent performs a CUDA test on boot. Jobs with GPU requirements are only routed to nodes with verified GPU capability.
Optional tuning:
CUDA_VISIBLE_DEVICES=0,1
INFRA_GPU_PROFILE=A100
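Before relying on ALLOW_GPU inside containers, a common check that the container runtime can actually reach the GPU (standard NVIDIA tooling, not InfraMind-specific; any CUDA base image works) is:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi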
CONTAINER_CACHE_PATH
Sets a persistent path for container images and model bundles. The default is ~/.inframind/cache.
For large disk volumes:
export CONTAINER_CACHE_PATH=/var/lib/inframind/cache
Set via CLI:
infra config set container_cache=/data/cache
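When pointing the cache at a dedicated volume, make sure the directory exists and is writable by the account running the agent (a sketch assuming the agent runs as your current user):
sudo mkdir -p /var/lib/inframind/cache
sudo chown "$USER": /var/lib/inframind/cache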
INFRA_LOG_LEVEL
Controls log verbosity.
error – fatal failures only
warn – non-critical issues
info – default runtime events
debug – detailed trace, container logs, scheduler handshakes
Set persistently:
export INFRA_LOG_LEVEL=debug
Or update config:
infra config set log_level=debug
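To confirm the new verbosity is being applied, watch the node log (path documented below under Persistence and Resilience):
tail -f ~/.inframind/logs/inframind.log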
INFRA_MAX_JOBS
Limits the number of concurrently running job containers, preventing overload on lower-spec systems.
INFRA_MAX_JOBS=1
If system resources are exceeded, excess jobs are rejected with a 503 Retry response.
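One way to size this value on a given machine (a heuristic, not an official recommendation) is to derive it from the CPU core count:
# one concurrent job per two cores, with a floor of one
cores=$(nproc)
export INFRA_MAX_JOBS=$(( cores / 2 > 1 ? cores / 2 : 1 ))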
Persistence and Resilience
Logs are stored at:
~/.inframind/logs/inframind.log
Logs persist across reboots; to keep them from growing unbounded, rotate them weekly:
logrotate /etc/logrotate.d/inframind
Example logrotate config (note that logrotate does not expand ~, so use an absolute path in the installed file):
~/.inframind/logs/inframind.log {
weekly
rotate 4
compress
delaycompress
missingok
notifempty
copytruncate
}
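After installing this file as /etc/logrotate.d/inframind, logrotate's debug mode reports what would be rotated without touching any files:
logrotate -d /etc/logrotate.d/inframind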
Restart on Failure
For systemd-managed nodes, restart policies should be enforced for high availability:
sudo systemctl edit inframind-node
Insert:
[Service]
Restart=always
RestartSec=5
Then:
sudo systemctl daemon-reload
sudo systemctl restart inframind-node
This ensures automatic recovery after a crash or out-of-memory kill; enable the unit (sudo systemctl enable inframind-node) so the node also starts again after a machine reboot.
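To confirm the policy is active and watch the node come back after a failure, standard systemd tooling applies (assuming the inframind-node unit name above):
systemctl show inframind-node -p Restart -p RestartUSec
journalctl -u inframind-node -f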
Custom Configuration File
Modify ~/.inframind/config.yaml for persistent configuration:
node_id: auto
region_hint: us-east
allow_gpu: true
max_jobs: 2
container_cache: /mnt/cache
log_level: info
heartbeat_interval: 5
Changes take effect on next restart.
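For example, on a systemd-managed node the edit is applied with the same restart command used above:
sudo systemctl restart inframind-node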
Integration with Supervisors
For supervisors such as Docker Compose, the same settings can be passed as environment variables and mounted volumes (the pattern carries over to Kubernetes or Nomad manifests):
services:
  inframind:
    image: inframind/node:latest
    ports:
      - "9000:9000"
    volumes:
      - ~/.inframind:/node
      - /mnt/cache:/mnt/cache
    environment:
      - INFRA_NODE_KEY=${INFRA_NODE_KEY}
      - INFRA_LOCATION_HINT=us-east
      - ALLOW_GPU=true
      - INFRA_LOG_LEVEL=info
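With the snippet saved as docker-compose.yml, the node can be started and followed with standard Compose commands:
docker compose up -d
docker compose logs -f inframind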
InfraMind node configuration is designed to be declarative, override-friendly, and compatible with both local and large-scale deployments. Whether deployed on a single-board computer or inside a multi-GPU cluster, nodes remain protocol-compliant and automatically reflect their configuration in all heartbeat payloads and registry metrics. This ensures the scheduler can route jobs intelligently and that every operator has full control over how their node behaves inside the mesh.