When you check an Ubuntu server, you will often see high load averages.
Three numbers are reported: the first is the 1-minute average, the second is the 5-minute average, and the third is the 15-minute average.
    ~/backup# uptime
    11:09:31 up 40 days, 17:38, 2 users, load average: 1.01, 0.66, 0.47
    ~/backup# cat /proc/loadavg
    0.40 0.54 0.44 1/255 30135
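The /proc/loadavg line carries two more fields after the three averages. As a minimal sketch (the names `LoadAvg` and `parse_loadavg` are made up for illustration), the line can be split like this. The fourth field is the number of currently runnable scheduling entities over the total number that exist, and the fifth is the PID of the most recently created process:

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one: float       # 1-minute load average
    five: float      # 5-minute load average
    fifteen: float   # 15-minute load average
    runnable: int    # scheduling entities currently runnable
    total: int       # scheduling entities that currently exist
    last_pid: int    # PID of the most recently created process


def parse_loadavg(text: str) -> LoadAvg:
    """Parse the single line found in /proc/loadavg."""
    one, five, fifteen, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return LoadAvg(float(one), float(five), float(fifteen),
                   int(runnable), int(total), int(last_pid))


# The sample line from the terminal session above.
print(parse_loadavg("0.40 0.54 0.44 1/255 30135"))
```

On a live system the same function could be fed the contents of /proc/loadavg instead of the sample string.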
- If the averages are 0.0, then your system is idle.
- If the 1 minute average is higher than the 5 or 15 minute averages, then load is increasing.
- If the 1 minute average is lower than the 5 or 15 minute averages, then load is decreasing.
- If they are higher than your CPU count, then you might have a performance problem (it depends).
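The rules of thumb above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sketch assumes that "increasing" means the 1-minute value is above both longer averages (and "decreasing" means below both), which is one possible reading of the list.

```python
import os


def load_trend(one: float, five: float, fifteen: float) -> str:
    """Classify the trend of the three load averages."""
    if one == five == fifteen == 0.0:
        return "idle"
    if one > five and one > fifteen:
        return "increasing"
    if one < five and one < fifteen:
        return "decreasing"
    return "mixed"  # assumption: anything else is reported as inconclusive


def exceeds_cpu_count(one: float, five: float, fifteen: float, cpus: int) -> bool:
    """True if any average is above the CPU count (a hint, not a verdict)."""
    return any(avg > cpus for avg in (one, five, fifteen))


if __name__ == "__main__":
    try:
        averages = os.getloadavg()
    except OSError:
        print("load averages are not available on this system")
    else:
        cpus = os.cpu_count() or 1  # cpu_count() may return None
        print("trend:", load_trend(*averages))
        print("above CPU count:", exceeds_cpu_count(*averages, cpus))
```

With the uptime values shown earlier (1.01, 0.66, 0.47) the trend is classified as increasing, and with the /proc/loadavg values (0.40, 0.54, 0.44) as decreasing.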
What it means on Linux or Ubuntu is this:
On Linux, load averages are (or try to be) “system load averages”, for the system as a whole, measuring the number of threads that are working and waiting to work (CPU, disk, uninterruptible locks). Put differently, it measures the number of threads that aren’t completely idle. Advantage: includes demand for different resources.
Not the same as other OSes:
On other OSes, load averages are “CPU load averages”, measuring the number of CPU running + CPU runnable threads. Advantage: can be easier to understand and reason about (for CPUs only).
So it is important to know that on Linux the CPU is not the only thing that drives load averages up: threads waiting on disk I/O or on uninterruptible locks count as well.
Some tools recommended to troubleshoot high load averages:
- per-CPU utilization: e.g., using mpstat -P ALL 1
- per-process CPU utilization: e.g., top, pidstat 1, etc.
- per-thread run queue (scheduler) latency: e.g., in /proc/PID/schedstat, delaystats, perf sched
- CPU run queue latency: e.g., in /proc/schedstat, perf sched, Brendan Gregg's runqlat bcc tool.
- CPU run queue length: e.g., using vmstat 1 and the ‘r’ column, or Brendan Gregg's runqlen bcc tool.
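To illustrate the per-thread run queue latency item, here is a rough sketch that reads the per-process schedstat file (/proc/PID/schedstat on current kernels). It assumes the three-field layout described in the kernel's scheduler statistics documentation: time spent on the CPU in nanoseconds, time spent waiting on a run queue in nanoseconds, and the number of timeslices run. Verify this on your kernel before relying on it. The helper names are hypothetical.

```python
from typing import NamedTuple


class SchedStat(NamedTuple):
    on_cpu_ns: int     # time spent running on a CPU, in nanoseconds
    run_delay_ns: int  # time spent waiting on a run queue, in nanoseconds
    timeslices: int    # number of timeslices run


def parse_schedstat(text: str) -> SchedStat:
    """Parse the contents of /proc/PID/schedstat (assumed 3-field layout)."""
    on_cpu, run_delay, timeslices = (int(field) for field in text.split())
    return SchedStat(on_cpu, run_delay, timeslices)


def mean_run_delay_ns(stat: SchedStat) -> float:
    """Average run queue wait per timeslice, or 0.0 if nothing has run yet."""
    if stat.timeslices == 0:
        return 0.0
    return stat.run_delay_ns / stat.timeslices


def read_schedstat(pid: str = "self") -> SchedStat:
    """Read the schedstat file of a process ("self" is the current one)."""
    with open(f"/proc/{pid}/schedstat") as f:
        return parse_schedstat(f.read())
```

Calling `mean_run_delay_ns(read_schedstat("1234"))` for a hypothetical PID 1234 gives the average time that process waited for a CPU per timeslice. A value that keeps growing between samples suggests CPU contention rather than I/O wait.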
The runqlat and runqlen tools mentioned above are part of the bcc (BPF Compiler Collection) project.
This information was taken from Brendan Gregg's post at http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html
You can also use this command to see what state each process is in:
ps -e v
Its STAT column shows the state as a sequence of characters, for example "Ss" or "R+". On Linux, the first character indicates the run state of the process:
- D: uninterruptible sleep (usually disk I/O).
- I: idle kernel thread.
- R: running or runnable (on the run queue).
- S: interruptible sleep (waiting for an event to complete).
- T: stopped by a job control signal.
- t: stopped by a debugger during tracing.
- Z: defunct ("zombie") process, terminated but not reaped by its parent.
Look for processes in the R and D states: on Linux these are the ones that count toward the load average.
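The same check can be scripted. The sketch below tallies process states by reading /proc/PID/stat directly; the function names are made up for illustration. Note that it looks at one entry per process, while the load average counts individual threads, so the totals are only a rough guide.

```python
import os
from collections import Counter


def state_from_stat(stat_text: str) -> str:
    """Return the one-letter state from the contents of /proc/PID/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so the state is the first field after the
    last ')'.
    """
    return stat_text.rsplit(")", 1)[1].split()[0]


def count_states(proc_root: str = "/proc") -> Counter:
    """Count processes per state letter under proc_root."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            path = os.path.join(proc_root, entry, "stat")
            with open(path, errors="replace") as f:
                counts[state_from_stat(f.read())] += 1
        except OSError:
            continue  # the process exited while we were scanning
    return counts


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        counts = count_states()
        print("R:", counts["R"], "D:", counts["D"])
```

If the D count stays high while the R count is low, the load is more likely coming from disk or other uninterruptible waits than from the CPU.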