Physical Status
Server Hardware Report
Monitoring temperature, load, and performance of validator components
CPU (AMD EPYC 7543)
Warning
Usage
94%
Temperature
78°C
Frequency
2.8 GHz
RAM (DDR4 ECC 256GB)
OK
Usage
68%
Free
81.9 GB
Used
174.1 GB
NVMe SSD (Samsung 980 PRO 2TB)
OK
Usage
42%
Temperature
52°C
Free
1.16 TB
Network Interface (10GbE)
Critical
Usage
98%
Inbound Traffic
9.8 Gbps
Outbound Traffic
8.2 Gbps
Packets/sec
2.4M
Power Supply (1200W)
OK
Load
892W
Efficiency
94%
12V Voltage
11.98V
Motherboard
OK
Chipset Temperature
45°C
VRM Temperature
58°C
Uptime
127 days
Critical CPU Load
Critical
The CPU is running at 94% usage and 78°C, above the recommended thresholds of 85% usage and 75°C.
Sustained high load may lead to missed slots and reduced validator performance.
The current temperature is 2°C below the critical mark of 80°C, at which thermal throttling may begin.
Current CPU Load
94%
CPU Temperature
78°C
Critical Threshold
85% / 80°C
Time to Throttling
~15 min
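The thresholds quoted in this alert can be expressed as a small check. This is a minimal sketch, assuming integer readings. The function name `cpu_status` and the `ok` / `warning` / `critical` labels are hypothetical; only the 85%, 75°C, and 80°C limits come from the report.

```shell
# Classify a CPU reading against the thresholds quoted in the report.
# cpu_status is a hypothetical helper; the label names are an assumption.
# $1 = CPU usage in percent (integer), $2 = temperature in °C (integer)
cpu_status() {
    if [ "$2" -ge 80 ]; then
        echo "critical"    # at or above the 80°C critical mark
    elif [ "$1" -gt 85 ] || [ "$2" -gt 75 ]; then
        echo "warning"     # above the recommended 85% / 75°C
    else
        echo "ok"
    fi
}

cpu_status 94 78   # readings from the report
```

With the reported readings, the function lands in the middle band: both recommended limits are exceeded, but the 80°C mark has not been reached.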
📋
Resolution Instructions
- Immediate Actions: Check which processes are consuming CPU. To monitor the validator process, run:
  top -p "$(pgrep -d, -f solana-validator)"
  (The -f option matches the full command line. Without it, pgrep compares only the first 15 characters of the process name, so the 16-character name solana-validator would never match.)
- Cooling: Ensure the cooling system is working correctly. Check fan speeds with sensors, or with ipmitool sensor on server systems. Increase fan speeds to 70-80% if the temperature continues to rise.
- Validator Optimization: Check the validator settings. Reduce the number of RPC threads or limit concurrent connections. Check the configuration file: ~/.config/solana/validator.yml
- Process Monitoring: Find the processes with the highest CPU load:
  ps aux --sort=-%cpu | head -10
  Suspend or optimize non-essential processes.
- Long-term Measures: Consider upgrading the cooling system or adding fans. If the problem persists, reducing the CPU frequency (underclocking) or updating the BIOS and enabling its power-saving settings may be required.
- Emergency Shutdown: If the temperature exceeds 85°C, the system will automatically reduce performance. At 90°C, stop the validator immediately to prevent hardware damage:
  systemctl stop solana-validator
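The escalation described in the Emergency Shutdown step can be sketched as a small decision function. This is an illustrative dry run, not a tested watchdog. `thermal_action` is a hypothetical helper, the temperature is assumed to be an integer in °C, and the example prints the stop command instead of executing it.

```shell
# Map a CPU temperature (integer °C) to the action the report describes.
# thermal_action is a hypothetical helper; it only prints a label and
# never stops the service itself.
thermal_action() {
    if [ "$1" -ge 90 ]; then
        echo "stop"        # report: stop the validator at 90°C
    elif [ "$1" -gt 85 ]; then
        echo "throttled"   # report: performance is reduced above 85°C
    else
        echo "monitor"
    fi
}

# Dry-run wiring: show the command that would be run, without running it.
if [ "$(thermal_action 78)" = "stop" ]; then
    echo "would run: systemctl stop solana-validator"
fi
```

Keeping the decision separate from the side effect makes the thresholds easy to check before connecting the function to a real temperature source.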
Network Interface Overload
Critical
The 10GbE network interface is operating at 98% of its bandwidth capacity.
Inbound traffic is 9.8 Gbps and outbound is 8.2 Gbps, both close to the 10 Gbps limit.
The interface is processing 2.4 million packets per second. This load may lead to packet loss,
increased latency, and missed messages from other validators. High network load
also increases CPU load because of network interrupt processing.
Channel Usage
98%
Inbound Traffic
9.8 Gbps
Outbound Traffic
8.2 Gbps
Packet Loss
0.12%
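The 98% channel usage is consistent with the busier direction (9.8 Gbps inbound) measured against the 10 Gbps link speed. The sketch below reproduces that arithmetic. It assumes usage is defined as the peak of the two directions divided by link capacity, which the report does not state, and `link_utilization` is a hypothetical helper name.

```shell
# Compute link utilization in percent from per-direction throughput.
# Assumption: utilization = max(inbound, outbound) / capacity.
# $1 = inbound Gbps, $2 = outbound Gbps, $3 = link capacity in Gbps
link_utilization() {
    awk -v rx="$1" -v tx="$2" -v cap="$3" 'BEGIN {
        peak = (rx > tx) ? rx : tx
        printf "%.0f\n", peak / cap * 100
    }'
}

link_utilization 9.8 8.2 10   # figures from the report
```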
📋
Resolution Instructions
- Immediate Analysis: Identify traffic sources with iftop -i eth0 or nethogs. Check for dropped packets:
  ethtool -S eth0 | grep -i drop
- Network Settings Optimization: Increase the maximum socket buffer sizes:
  sysctl -w net.core.rmem_max=134217728
  sysctl -w net.core.wmem_max=134217728
  Enable offloading features:
  ethtool -K eth0 gro on gso on tso on
- Validator Configuration: Limit the number of peers in the validator configuration. Check the known_validators parameter and the number of active connections. Reduce QUIC buffer sizes if the QUIC protocol is used.
- Traffic Prioritization: Configure QoS on the network switch to prioritize validator traffic. Use DSCP marking for critical Solana traffic. Configure firewall rules to limit non-essential traffic.
- Connection Monitoring: Check the number of active TCP/UDP connections with ss -s. Close unused connections. Check the validator logs for network errors.
- Hardware Solution: If the problem persists, consider upgrading to a 25GbE or 40GbE network card, or adding a second network card to distribute the load. Check the switch settings; a firmware update or flow control configuration may be required.
- Emergency Measures: If packet loss exceeds 1% or the link is saturated, temporarily limit the number of peers through the validator configuration. In critical cases, you can temporarily lower the priority of non-essential processes that use the network with nice or ionice (these lower CPU and disk I/O priority, which reduces contention but does not cap network bandwidth).
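The emergency condition above can be sketched as a threshold check. This is a minimal illustration under stated assumptions: `network_action` is a hypothetical helper, "fully overloaded" is read as utilization of 100% or more, and the function only prints a recommendation without changing any configuration.

```shell
# Decide whether the emergency peer limit applies.
# Assumption: "fully overloaded" means utilization >= 100%.
# $1 = packet loss in percent, $2 = link utilization in percent
network_action() {
    awk -v loss="$1" -v util="$2" 'BEGIN {
        if (loss > 1 || util >= 100) { print "limit-peers" } else { print "monitor" }
    }'
}

network_action 0.12 98   # figures from the report
```

At the reported 0.12% loss and 98% usage, neither emergency condition is met, so the remaining steps in the list still apply before peers are limited.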