Monitoring Your Node
Monitoring an InfraMind node is essential for maintaining high uptime, optimizing job performance, and ensuring that resource declarations reflect actual capacity. The node agent exposes a comprehensive local interface for telemetry inspection, historical usage analysis, and fault diagnostics. These observability layers are native to the InfraMind runtime and require no third-party plugins, though external integrations (Prometheus, Grafana) are also supported for full-stack operators.
Logs, job receipts, node statistics, and scheduler activity are all accessible via command-line tools and optionally via a local or remote Web UI.
Log Access
All node logs are written to:
~/.inframind/logs/inframind.log
These include:
Container execution logs
Job assignment metadata
Scheduling decisions and fallbacks
Heartbeat acknowledgments
Reward claim results
Errors and runtime exceptions
Example command to view in real time:
tail -f ~/.inframind/logs/inframind.log
Or using journalctl (for systemd deployments):
journalctl -u inframind-node -f
Logs are rotated every 24 hours by default and compressed after 7 days. To modify this behavior, update your system’s logrotate configuration.
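For example, a logrotate rule along these lines (the path and retention values are illustrative; adjust them to your deployment):
/home/youruser/.inframind/logs/inframind.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
}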
Job History
Every job executed by a node is tracked locally as a receipt. Each receipt records identifying, timing, and proof metadata for the run; the exact schema depends on your agent version.
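An illustrative receipt, with hypothetical field names chosen to match the statistics described below:
{
  "job_id": "<job-id>",
  "status": "success",
  "started_at": "<timestamp>",
  "completed_at": "<timestamp>",
  "proof_timestamp": "<timestamp>",
  "reward_claimed": true
}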
Receipts can be queried using the CLI:
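The subcommand names in the examples below are illustrative sketches; check your agent's CLI help for the exact forms. A listing query might look like:
inframind jobs list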
Or to inspect a specific job:
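For instance (illustrative):
inframind jobs show <job-id>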
Historical job statistics (rolling averages, success rates, proof timestamps) are also calculated and presented at runtime via:
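A possible form (illustrative):
inframind jobs stats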
Node Status
To monitor node health, system load, and connection state, use:
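A sketch, assuming a top-level status subcommand (illustrative):
inframind status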
Typical output:
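A hypothetical rendering (field names and values are illustrative; your agent's output will differ):
node_id:        nd-4f2a…
uptime:         7d 3h
cpu_load:       0.42
gpu_util:       61%
mesh:           true
last_handshake: 3s ago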
This command aggregates local telemetry and the last scheduler response. If your node is offline or behind a NAT or firewall, mesh status will return false and the node will be deprioritized.
Web UI Dashboard
The InfraMind node agent exposes a local monitoring dashboard by default, bound to a localhost port (check your agent configuration for the exact address).
If running remotely, forward the port over SSH rather than exposing the dashboard directly:
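A sketch using port 7777 as a placeholder for your configured dashboard port:
ssh -L 7777:localhost:7777 user@your-node-host
Then browse to http://localhost:7777 on your local machine.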
Dashboard sections include:
Job timeline
Live system metrics
Container cache viewer
Reward claim history
Model execution summaries
Scheduler handshake logs
Node reputation trend
This service is secured using local binding by default. To expose it over the internet, place it behind a reverse proxy such as nginx or Caddy and configure an auth layer, as in the sketch below.
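A minimal nginx example (the upstream port 7777 is a placeholder for your dashboard port; TLS certificate directives are omitted for brevity):
server {
    listen 443 ssl;
    server_name node.example.com;

    location / {
        auth_basic "InfraMind Dashboard";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://127.0.0.1:7777;
    }
}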
To disable or change the port:
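A plausible form, assuming a dashboard section in the agent config (keys are hypothetical):
inframind config set dashboard.enabled false
inframind config set dashboard.port 7878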
Or via environment:
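Variable names here are illustrative, not confirmed:
INFRAMIND_DASHBOARD_ENABLED=false inframind-node
INFRAMIND_DASHBOARD_PORT=7878 inframind-node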
Prometheus & Grafana Integration
InfraMind nodes expose a native Prometheus exporter on port 9100:
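Assuming the conventional /metrics path, you can verify the exporter locally with curl:
curl http://localhost:9100/metrics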
Example metrics:
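These names are illustrative of what such an exporter might publish; confirm against your node's actual /metrics output:
inframind_jobs_completed_total 1427
inframind_job_failures_total 12
inframind_gpu_utilization_ratio 0.61
inframind_mesh_connected 1
inframind_reward_claims_total 389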
Scrape config for prometheus.yml:
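A standard scrape job pointing at the exporter (adjust the target for remote nodes):
scrape_configs:
  - job_name: 'inframind-node'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9100']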
Grafana dashboards are available via InfraMind community templates or can be created manually using PromQL.
Example PromQL:
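Using the illustrative metric names above, job throughput over five minutes and a rolling failure rate could be expressed as:
rate(inframind_jobs_completed_total[5m])
rate(inframind_job_failures_total[1h]) / rate(inframind_jobs_completed_total[1h])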
Custom Alerts and Diagnostics
To test execution health, simulate a local job run:
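A sketch, assuming a diagnostics subcommand (the name is hypothetical):
inframind diag run-job --local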
For GPU stress tests:
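Also hypothetical; a standalone tool such as gpu-burn works as an alternative:
inframind diag gpu-stress --duration 60s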
To configure watchdog-style auto-recovery:
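One way to approximate this on systemd hosts is a unit drop-in (a sketch; WatchdogSec assumes the agent speaks systemd's sd_notify keep-alive protocol, which is not confirmed here):
[Service]
Restart=on-failure
RestartSec=10
WatchdogSec=60
MemoryMax=90%
Save this as /etc/systemd/system/inframind-node.service.d/override.conf, then run systemctl daemon-reload and systemctl restart inframind-node.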
This process pings the agent every 60 seconds and restarts it on memory exhaustion, timeout hangs, or a failed handshake response.
Remote Monitoring (Optional)
You can register your node with a fleet monitoring UI via the cloud operator interface. This includes:
Uptime leaderboard
Reward heatmaps
Regional job density
Latency scatter plot
Offline notifications (via webhook or Telegram)
This is opt-in and uses anonymized metadata only. Activate with:
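The command form is illustrative (check your agent's CLI for the actual subcommand):
inframind fleet enroll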
Summary
All InfraMind nodes provide introspection tools for both personal use and automated monitoring. Whether you run a single bare-metal box or manage a GPU fleet across regions, logs, metrics, and diagnostics are first-class citizens. Full transparency of execution behavior ensures that nodes can be tuned, debugged, and optimized for consistent, high-performance operation in the mesh.