This function enables System Administrators to track real-time RAM consumption on physical servers within the AI factory environment. By aggregating data from hardware sensors, it provides critical visibility into memory utilization trends, helping prevent overheating or resource exhaustion. The tool supports proactive capacity planning and alerts for anomalies, ensuring continuous operational availability for high-performance computing tasks.
The system continuously polls physical server nodes to gather current RAM usage metrics from internal hardware sensors.
Data is aggregated and normalized to display consumption rates per application instance or general server load.
Thresholds are configured to trigger immediate notifications when memory levels approach critical capacity limits.
Initialize the monitoring agent on target physical server nodes via the marketplace deployment interface.
Configure alert thresholds based on expected workload patterns and historical memory consumption data.
Enable real-time streaming of RAM metrics to the central dashboard for live visualization.
Review generated reports weekly to adjust capacity planning or trigger scaling actions as needed.
Visualizes real-time memory graphs and historical trends for all monitored physical servers in one view.
Allows admins to set custom warning levels based on specific RAM utilization percentages or absolute limits.
Provides detailed logs linking memory spikes to specific processes or hardware faults for root cause analysis.