30 분 가동 시간 후 sync / fsync 호출 속도가 느려집니다.

debugcn 에 게시 Dev

alex.p

로모그래퍼 : 우분투 14.04를 사용하여 가동 시간 30 분 후 하이브리드 SSD 내가 사용하는 IO를 차단하는 많은 프로세스를 참조하십시오 iotop. 이것은 디스크 쓰기 중입니다. 예를 들어 gedit에서 빈 파일을 열고 닫으면 dconf 쓰기 설정으로 인해 종료하는 데 2 초가 걸릴 수 있습니다. 이는 다른 앱에도 유사한 방식으로 영향을 미칩니다. 전체 시스템을 상당히 느리게합니다.

strace를 사용하여 이것을 fsync 호출로 다시 추적하고 거기에서 sync 명령을 사용하여 재현 할 수있었습니다.

요약하자면, 단순히 sync터미널에서 반복적으로 실행 하면 1 ~ 2 초 정도 걸리지 만 가동 시간은 30 분 후에야합니다.

이를 증명하기 위해 동기화를 실행하는 데 걸린 시간에 대한 가동 시간을 초 단위로 출력하는 스크립트를 만들고 매초마다 실행했습니다.

while true;
do
cat /proc/uptime | awk '{printf "%f ",$1}'; /usr/bin/time -f '%e' sync;
sleep 1;
done;

위의 스크립트를 실행하고 한 시간 정도 기다린 다음 (시스템이 유휴 상태로 유지됨) gnuplot에 결과를 플로팅했습니다 (y = 동기화 실행 시간, x = 가동 시간 (초)).

가동 시간에 대해 플롯 된 동기화 시간

그래프가 급등하는 시점은 약 1780 (1780/60 = 약 30 분)입니다.

스크립트를 제외하고는 현재 디스크에 아무것도 쓰지 않아야합니다. 따라서 첫 번째 동기화 후 페이지 캐시에 아무것도 없어야합니다. 이후의 각 동기화는 대략 100 바이트가 될 스크립트에 기록되는 내용을 정확히 기록합니다. 그래서.

이 문제는 재부팅 후에도 지속됩니다. 예를 들어, 속도 저하를 위해 30 분을 기다렸다가 재부팅하면 속도 저하가 계속 발생합니다. 전원을 끄고 재부팅하면 30 분 후까지 문제가 사라집니다.

또 다른 호기심은 위의 그래프를 조사하고 감속이 발생하는 영역을 확대했을 때 다음과 같은 결과를 얻었습니다.

가동 시간에 대해 플롯 된 동기화 시간, 확대

최고점과 최저점은 반복됩니다. 이것은 최저점에서 최저점까지 거의 정확히 10 초마다 발생하며, 최고점은 내려갈 때 꼬임도 발생합니다.

속도 저하 전에 hdparm 테스트 ( hdparm -t /dev/sda및 hdparm -T /dev/sda) 도 실행했습니다 .

/dev/sda:
Timing cached reads:   23778 MB in  2.00 seconds = 11900.64 MB/sec
/dev/sda:
Timing buffered disk reads: 318 MB in  3.01 seconds = 105.63 MB/sec

감속 중 :

/dev/sda:
 Timing cached reads:     2 MB in  2.24 seconds = 915.50 kB/sec
/dev/sda:
Timing buffered disk reads: 300 MB in  3.01 seconds =  99.54 MB/sec

실제 디스크 읽기는 영향을받지 않지만 캐시 된 읽기는 영향을 받는다는 것을 보여 주면 이것이 결국 HD가 아닌 시스템 버스와 관련이 있음을 의미 할 수 있습니까?

내가 시도한 솔루션은 다음과 같습니다.

HD의 스핀 다운 설정을 변경하면 HD가 절전 모드로 전환 될 수 있습니다.
```
hdparm /dev/sda -S252 #(set it to 5 hours before spindown)
```
Change the filesystem's journalling type to writeback rather than ordered so that we get performance improvements - this isn't solving the problem though as it doesn't explain the 30 minutes slowdown-free uptime.
Disabled CRON as it seems to be occuring after a round 30 minutes.
CPU usage is fine and is completely idle so no processes can be blamed however I've tried shutting down every service including the session manager (lightdm) this does nothing as I believe the issue is lower level.
Analysing any new processes coming in at 30 minutes indicates no changes - I've diffed the output of PS before and after and there's no difference.

This only started occuring about 2 weeks ago, nothing was installed and no updates were done around that time. I'm thinking this issue is much lower level so would really appreciate some help here as I'm clueless, even pointing me in the right direction would be helpful - for example is there a way to examine what's being flushed out the page cache?

Write caching is enabled on the disk in question, I've also tried disabling write barriers. SMART data on the HD indicates no problems with the HD itself however I have my suspicions it's the HD doing something mysterious as it persists after reboots.

EDIT:

I've done :

watch -n 1 cat /proc/meminfo

... to see how the memory changes particularly looking at the dirty row and the writeback row which I believe is the HDs disk buffer. They all stay at zero for the most part highest being probably 300kb. Calling sync flushes these as expected back to 0 but during the slowdown calling sync when there is zero dirty pages and zero kb in the disk buffer still locks IO. What else could sync be doing if there's nothing to flush out the page cache and write cache?

Kjetil Jorgensen

The symptoms are very consistent with a mostly saturated IO system, however having for the most part ruled out IO load from the OS/userspace side, another possibility is the drive running self-tests on itself, which may include reading from all the sectors. This should be queryable/tunable from smartctl (At least one place being smartctl -c for querying).

오고 가고 갑자기 시작된 이유는 다음과 같습니다.

드라이브는 수명의 특정 단계 (기록 된 섹터 수, 회전 시간 등)를 통과했으며 드라이브의 펌웨어가 이러한 스캔 중 하나를 트리거했습니다.
나는 이것이 또한 smartctl을 통해 트리거 될 수 있다고 생각하므로 일부 자동화 된 프로세스가 트리거 할 수 있습니다.
이러한 스캔 중 하나가 진행 중이거나 시작된 것으로 표시되고 드라이브가 일정 시간 동안 전원을 켰을 때 드라이브가 처음부터 다시 트리거되거나 중단 된 지점에서 다시 시작됩니다.

이 기사는 인터넷에서 수집됩니다. 재 인쇄 할 때 출처를 알려주십시오.

침해가 발생한 경우 연락 주시기 바랍니다[email protected] 삭제

에서 수정2021-05-28

몇 마디 만하겠습니다

0리뷰

로그인참여 후 검토

Related 관련 기사

기사