Problem: scheduler doesn't seem to favor interactive processes:
I have a desktop system with automatic cron-scheduled backups from one disk (btrfs) to another (ext4). The backup process mounts the idle disk (/dev/sda<X>), backs up to it, and finally unmounts it.
Every time the backup process kicks in, the system becomes unusable. The scheduler seems to be failing to do its most basic job of favoring interactive processes over batch ones. While the backup processes run, there's a lot of IO going on and everything else freezes. The keyboard and mouse pointer stop responding. Echo when keys are pressed in any terminal/shell is delayed by several seconds.
As soon as the backup completes, interactive response goes back to normal.
More details on the setup and configs:
The backup process uses rsnapshot (which calls rsync and cp -al) and runs at a lower priority (the backup job is preceded by nice), like so:
nice /usr/bin/rsnapshot -VD -c /etc/my-rsnapshot.conf daily
Running the backup under nice doesn't seem to help. During backups, all interactive processes seem to be starved by the heavy CPU and IO of the rsync and cp processes.
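For context, nice only lowers CPU priority; it does not change I/O priority. A minimal sketch of the same invocation with I/O priority lowered as well, assuming util-linux ionice is installed and the cfq I/O scheduler is active on the disks (cfq honors the idle class). The helper name run_low_priority is hypothetical and introduced here only for illustration; whether this helps in this particular setup is untested.

```shell
#!/bin/sh
# Run a command at the lowest CPU priority and, where possible, in the
# idle I/O class. run_low_priority is a hypothetical helper name.
run_low_priority() {
    if command -v ionice >/dev/null 2>&1; then
        # -c 3 selects the idle I/O class; -t runs the command even if
        # the I/O priority could not be set.
        nice -n 19 ionice -t -c 3 "$@"
    else
        # Fall back to CPU niceness only when ionice is unavailable.
        nice -n 19 "$@"
    fi
}

# Example, commented out so the sketch has no side effects:
# run_low_priority /usr/bin/rsnapshot -VD -c /etc/my-rsnapshot.conf daily
```

The wrapper passes the command's exit status through unchanged, so it can replace the plain nice prefix in the cron entry.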
This is an x86-64 Intel Core i7 system, which should be able to run 8 threads in parallel. Memory is 16GB and some of it is free. Trimmed-down mount output (with the additional disk mounted) is:
/dev/sdb2 on / type btrfs (rw,relatime,subvol=@,thread_pool=4)
/dev/sdb3 on /home type btrfs (rw,relatime,subvol=@home,thread_pool=4)
/dev/sda2 on /media/idisk/root type ext4 (rw,relatime)
/dev/sda3 on /media/idisk/home type ext4 (rw,relatime)
none on /sys/fs/cgroup type tmpfs (rw)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,relatime,cpuset,release_agent=/run/cgmanager/agents/cgm-release-agent.cpuset,clone_children)
cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu,release_agent=/run/cgmanager/agents/cgm-release-agent.cpu)
cgroup on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct,release_agent=/run/cgmanager/agents/cgm-release-agent.cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,relatime,memory,release_agent=/run/cgmanager/agents/cgm-release-agent.memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,relatime,devices,release_agent=/run/cgmanager/agents/cgm-release-agent.devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,relatime,freezer,release_agent=/run/cgmanager/agents/cgm-release-agent.freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,relatime,blkio,release_agent=/run/cgmanager/agents/cgm-release-agent.blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,relatime,perf_event,release_agent=/run/cgmanager/agents/cgm-release-agent.perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,relatime,hugetlb,release_agent=/run/cgmanager/agents/cgm-release-agent.hugetlb)
This is on an up-to-date Ubuntu 14.04 LTS system. The I/O scheduler is set by default to Completely Fair Queuing (cfq):
# cat /sys/block/sda/queue/scheduler
noop deadline [cfq]
# cat /sys/block/sdb/queue/scheduler
noop deadline [cfq]
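The active scheduler is the bracketed entry in each of those files. To check every disk at once, a small helper along these lines could be used. This is a sketch only: the function names active_scheduler and list_schedulers are hypothetical, and the optional directory argument exists just to make the helper easy to try against a scratch directory.

```shell
#!/bin/sh
# Print the active (bracketed) scheduler from a line such as
# "noop deadline [cfq]" read on standard input.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Report the active scheduler for every block device found under a
# sysfs-style root directory (defaults to /sys/block).
list_schedulers() {
    root=${1:-/sys/block}
    for f in "$root"/*/queue/scheduler; do
        [ -r "$f" ] || continue
        dev=${f#"$root"/}      # drop the leading root directory
        dev=${dev%%/*}         # keep only the device name
        printf '%s: %s\n' "$dev" "$(active_scheduler < "$f")"
    done
}
```

Calling list_schedulers with no argument reads the real /sys/block tree and needs no special privileges.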
I was able to find one related question, "scheduler starves processes", which suggests using nice, but I'm already doing this.
Another related question with relevant information is "How do I change the noop scheduler".
How can I make keyboard, mouse, and interactive-shells more responsive when the backup is running?
Thanks in advance.
This is only a partial answer. Since asking, I have done more research and experiments that solved my problem, and since there are no other responses, I am posting what I found.
There are known issues/bugs in the Linux kernel schedulers as of early 2016.
The short summary is that under different circumstances, cores remain idle even though there are runnable processes in the process queue.
A switch from btrfs to ext4 can alleviate these issues:
I personally switched back from btrfs to ext4. I/O performance has noticeably improved.
A switch to SSD can further improve IO performance
SSDs have dropped significantly in price and improved in reliability. A 2TB Samsung SSD (EVO 850) now costs a little over $600. Switching the system (root and home) to SSD now makes the intensive backup activity completely unnoticeable (the system SSD stays snappy while heavy writing goes to a regular ext4-formatted disk on the same system).
Finally: with SSD, the benefit of complex schedulers in the kernel seems to be becoming questionable. I changed my default to noop with no noticeable degradation whatsoever in performance. In fact, with a noop scheduler, I see a reduction in system load, lower CPU scaling numbers, and lower hardware temperatures.
$ cat /sys/block/sda/queue/scheduler
[noop] deadline cfq
$ cat /proc/cpuinfo | grep Hz
model name : Intel(R) Core(TM) i7-4771 CPU @ 3.50GHz
cpu MHz : 836.308
model name : Intel(R) Core(TM) i7-4771 CPU @ 3.50GHz
cpu MHz : 990.253
... similar low actual frequency scaling for all cores ...
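The switch to noop mentioned above can be made at runtime by writing to the same sysfs file that the cat command reads. A minimal sketch, assuming root privileges and a kernel that still offers the legacy noop/deadline/cfq schedulers. The function name set_scheduler and its optional third argument are hypothetical, introduced here for illustration, and a change made this way does not persist across reboots.

```shell
#!/bin/sh
# set_scheduler DEVICE SCHEDULER [SYSFS_ROOT]
# Write SCHEDULER to DEVICE's queue/scheduler file, but only if the
# kernel lists it as available. SYSFS_ROOT defaults to /sys/block.
set_scheduler() {
    dev=$1
    sched=$2
    file=${3:-/sys/block}/$dev/queue/scheduler
    if [ ! -w "$file" ]; then
        echo "cannot write $file (need root?)" >&2
        return 1
    fi
    # Strip the brackets around the active entry, put one name per
    # line, then look for an exact match.
    if sed 's/[][]//g' "$file" | tr ' ' '\n' | grep -qx "$sched"; then
        echo "$sched" > "$file"
    else
        echo "$sched is not offered for $dev" >&2
        return 1
    fi
}

# Example, commented out so the sketch has no side effects:
# set_scheduler sda noop
```

Checking the offered list first gives a clearer error than the bare write would when the requested scheduler is not built into the kernel.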