Monitoring your Node
Monitoring an InfraMind node is essential for maintaining high uptime, optimizing job performance, and ensuring that resource declarations reflect actual capacity. The node agent exposes a comprehensive local interface for telemetry inspection, historical usage analysis, and fault diagnostics. These observability layers are native to the InfraMind runtime and require no third-party plugins, though external integrations (Prometheus, Grafana) are also supported for full-stack operators.
Logs, job receipts, node statistics, and scheduler activity are all accessible via command-line tools and optionally via a local or remote Web UI.
Log Access
All node logs are written to:
~/.inframind/logs/inframind.log
These include:
Container execution logs
Job assignment metadata
Scheduling decisions and fallbacks
Heartbeat acknowledgments
Reward claim results
Errors and runtime exceptions
Example command to view in real-time:
tail -f ~/.inframind/logs/inframind.log
Or using journalctl (for systemd deployments):
journalctl -u inframind-node -f
Logs are rotated every 24 hours by default and compressed after 7 days. To modify this behavior, update your system’s logrotate configuration.
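As a starting point, a logrotate stanza matching the defaults described above might look like the following. This is a sketch only: logrotate does not expand `~`, so substitute the absolute home directory of the user running the node, and adjust retention to your own policy.

```
/home/nodeuser/.inframind/logs/inframind.log {
    daily           # rotate every 24 hours
    rotate 7        # keep 7 rotated files
    compress        # gzip rotated logs
    delaycompress   # keep the most recent rotation uncompressed
    missingok       # do not error if the log is absent
    notifempty      # skip rotation when the log is empty
}
```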
Job History
Every job executed by a node is tracked locally as a receipt:
~/.inframind/receipts/{job_id}.json
Each receipt includes:
{
  "job_id": "f23a-8e9c",
  "timestamp": 1719823198,
  "latency_ms": 247,
  "model_ref": "ipfs://QmU1Wxje...",
  "status": "success",
  "output_hash": "0xab8f...",
  "node_signature": "0x4921...",
  "reward": "1.21 INFRA"
}
Receipts can be queried using the CLI:
infra jobs --limit 10
Or to inspect a specific job:
infra job --id f23a-8e9c
Historical job statistics (rolling averages, success rates, proof timestamps) are also calculated and presented at runtime via:
infra stats
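Because receipts are plain JSON files, they can also be aggregated directly when the CLI is unavailable. A minimal sketch in Python, assuming the receipt fields shown above (`status`, `latency_ms`); the directory path follows the layout documented earlier:

```python
import json
from pathlib import Path


def summarize_receipts(receipt_dir: str) -> dict:
    """Aggregate job receipts into count, success rate, and mean latency."""
    receipts = []
    for path in Path(receipt_dir).glob("*.json"):
        with open(path) as f:
            receipts.append(json.load(f))
    if not receipts:
        return {"jobs": 0, "success_rate": 0.0, "avg_latency_ms": 0.0}
    successes = sum(1 for r in receipts if r.get("status") == "success")
    latencies = [r["latency_ms"] for r in receipts if "latency_ms" in r]
    return {
        "jobs": len(receipts),
        "success_rate": successes / len(receipts),
        "avg_latency_ms": sum(latencies) / len(latencies) if latencies else 0.0,
    }


# Example: summarize_receipts("~/.inframind/receipts" expanded to an absolute path)
```

This reproduces a subset of what `infra stats` reports and is useful for feeding receipt data into your own monitoring pipeline.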
Node Status
To monitor node health, system load, and connection state, use:
infra status
Typical output:
Node ID: 0xA39f2b...
Uptime: 19h 32m
Jobs Served: 234
Avg Latency: 213ms
Stake: 200 INFRA
GPU Enabled: true
CPU Usage: 22%
Memory Usage: 3.8 GB / 8.0 GB
Cache: 8.2 GB used / 25 GB allocated
Current Region: europe-west
Mesh Connected: true
Heartbeat OK: every 5s
This command aggregates local telemetry with the last scheduler response. If your node is offline or behind a NAT or firewall, Mesh Connected will report false and the node will be deprioritized.
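For automation (cron checks, alert scripts), the plain-text output above can be parsed into key/value pairs. A sketch, assuming each line follows the `Key: value` shape shown in the sample output:

```python
def parse_status(text: str) -> dict:
    """Parse `Key: value` lines from `infra status` output into a dict."""
    status = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split only on the first colon so values like "0xA39f2b..." survive.
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status
```

For example, `parse_status(output).get("Mesh Connected") != "true"` is a simple condition for firing an offline alert.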
Web UI Dashboard
The InfraMind node agent exposes a local monitoring dashboard by default at:
http://localhost:5050
If running remotely:
http://<your-node-ip>:5050
Dashboard sections include:
Job timeline
Live system metrics
Container cache viewer
Reward claim history
Model execution summaries
Scheduler handshake logs
Node reputation trend
This service is bound to localhost by default, which keeps it inaccessible from other machines. To expose it over the internet, place it behind a reverse proxy such as nginx or caddy, and configure an authentication layer.
To disable or change the port:
dashboard:
  enabled: true
  port: 5050
Or via environment:
export INFRA_DASHBOARD_PORT=8080
Prometheus & Grafana Integration
InfraMind nodes expose a native Prometheus exporter on port 9100:
http://localhost:9100/metrics
Example metrics:
infra_node_jobs_total 342
infra_node_latency_avg_ms 213
infra_node_gpu_available 1
infra_node_container_cache_hits 82
infra_node_stake_total 200.0
infra_node_sla_uptime 0.9912
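These metrics follow the Prometheus text exposition format, so a scrape can be inspected without running a full Prometheus server. A minimal sketch that handles the un-labelled gauges and counters shown above (it skips comment lines and ignores anything that does not parse as `name value`):

```python
import urllib.request


def parse_metrics(text: str) -> dict:
    """Parse un-labelled Prometheus exposition text into name -> float."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            pass  # ignore malformed lines
    return metrics


def fetch_metrics(url: str = "http://localhost:9100/metrics") -> dict:
    """Fetch the exporter endpoint and parse it."""
    with urllib.request.urlopen(url) as resp:
        return parse_metrics(resp.read().decode())
```

Note this sketch does not handle labelled series (`metric{label="x"} 1`); for anything beyond spot checks, use a real Prometheus scrape as configured below.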
Scrape config for prometheus.yml:
- job_name: 'inframind'
  static_configs:
    - targets: ['localhost:9100']
Grafana dashboards are available via InfraMind community templates or can be created manually using PromQL.
Example PromQL:
rate(infra_node_jobs_total[5m])
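Prometheus alerting rules can be layered on the same metrics. A sketch of a rule file, assuming the `infra_node_sla_uptime` gauge exposed above; the 0.99 threshold and 10m window are illustrative, not recommended values:

```
groups:
  - name: inframind
    rules:
      - alert: NodeSLALow
        expr: infra_node_sla_uptime < 0.99
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "InfraMind node SLA uptime below 99%"
```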
Custom Alerts and Diagnostics
To test execution health, simulate a local job run:
infra simulate --model ./summarizer.yaml --input test.json
For GPU stress tests:
infra benchmark --type=gpu --duration=60s
To configure watchdog-style auto-recovery:
infra watchdog enable
This process pings the agent every 60 seconds and restarts it on memory exhaustion, hangs past the timeout, or failed handshake responses.
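For systemd deployments, a restart-on-failure policy can also be expressed at the service level, instead of or alongside the built-in watchdog. A sketch of a drop-in override, assuming the `inframind-node` unit name used earlier; run `systemctl daemon-reload` after creating it:

```
# /etc/systemd/system/inframind-node.service.d/restart.conf
[Service]
Restart=on-failure
RestartSec=10
```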
Remote Monitoring (Optional)
You can register your node to a fleet monitoring UI via the cloud operator interface. This includes:
Uptime leaderboard
Reward heatmaps
Regional job density
Latency scatter plot
Offline notifications (via webhook or Telegram)
This is opt-in and uses anonymized metadata only. Activate with:
infra cloud join --node 0xABC --name "Node 🇳🇱 - Rotterdam"
Summary
All InfraMind nodes provide introspection tools for both personal use and automated monitoring. Whether you run a single bare-metal box or manage a GPU fleet across regions, logs, metrics, and diagnostics are first-class citizens. Full transparency of execution behavior ensures that nodes can be tuned, debugged, and optimized for consistent, high-performance operation in the mesh.