How can I find out the source of this glibc backtrace originating with clone()?

mgarey

This backtrace comes from a deadlock in a multi-threaded application. The other deadlocked threads are blocked on a lock inside malloc(), and appear to be waiting on this thread.

I don't understand what creates this thread, since it deadlocks before calling any functions in my application:

Thread 6 (Thread 0x7ff69d43a700 (LWP 14191)):
#0  0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1  0x00007ff6a299460d in _L_lock_27 () from /usr/lib64/libc.so.6
#2  0x00007ff6a29945bd in arena_thread_freeres () from /usr/lib64/libc.so.6
#3  0x00007ff6a2994662 in __libc_thread_freeres () from /usr/lib64/libc.so.6
#4  0x00007ff6a3875e38 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6

clone() is used to implement fork(), pthread_create(), and perhaps other functions.

How can I find out if this trace comes from a fork(), pthread_create(), a signal handler, or something else? Do I just need to dig through the glibc code, or can I use gdb or some other tool? Why does this thread need the internal glibc lock? This would be useful in determining the cause of the deadlock.

Additional information and research:

malloc() is thread-safe, but not reentrant (recursive-safe), so malloc() is also not async-signal-safe. We don't define signal handlers for this process, so I know that we don't call malloc() from signal handlers. The deadlocked threads never call recursive functions, and callbacks are handled in a new thread, so I don't think we need to worry about reentrancy here. (Maybe I'm wrong?)

This deadlock happens when many callbacks are being spawned to signal (ultimately kill) different processes. The callbacks are spawned in their own threads.

Are we possibly using malloc in an unsafe way?

Possibly related:

glibc malloc internals

Malloc inside of signal handler causes deadlock.

How are signal handlers delivered in a multi-threaded application?

glibc fork/malloc deadlock bug that was fixed in glibc-2.17-162.el7. This looks similar, but is NOT my bug - I'm on a fixed version of glibc.

(I've been unsuccessful in creating a minimal, complete, verifiable example. Unfortunately the only way to reproduce is with the application (Slurm), and it's quite difficult to reproduce.)

EDIT: Here are the backtraces from all the threads. Thread 6 is the trace I originally posted. Thread 1 is just waiting in pthread_join(). Threads 2-5 are blocked inside a call to malloc(). Thread 7 is listening for messages and spawning callbacks in new threads (threads 2-5). Those callbacks would eventually signal other processes.

Thread 7 (Thread 0x7ff69e672700 (LWP 12650)):
#0  0x00007ff6a291aa3d in poll () from /usr/lib64/libc.so.6
#1  0x00007ff6a3c09064 in _poll_internal (shutdown_time=<optimized out>, nfds=2,
    pfds=0x7ff6980009f0) at ../../../../slurm/src/common/eio.c:364
#2  eio_handle_mainloop (eio=0xf1a970) at ../../../../slurm/src/common/eio.c:328
#3  0x000000000041ce78 in _msg_thr_internal (job_arg=0xf07760)
    at ../../../../../slurm/src/slurmd/slurmstepd/req.c:245
#4  0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6

Thread 6 (Thread 0x7ff69d43a700 (LWP 14191)):
#0  0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1  0x00007ff6a299460d in _L_lock_27 () from /usr/lib64/libc.so.6
#2  0x00007ff6a29945bd in arena_thread_freeres () from /usr/lib64/libc.so.6
#3  0x00007ff6a2994662 in __libc_thread_freeres () from /usr/lib64/libc.so.6
#4  0x00007ff6a3875e38 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6

Thread 5 (Thread 0x7ff69e773700 (LWP 22471)):
#0  0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1  0x00007ff6a28af7d8 in _L_lock_1579 () from /usr/lib64/libc.so.6
#2  0x00007ff6a28a7ca0 in arena_get2.isra.3 () from /usr/lib64/libc.so.6
#3  0x00007ff6a28ad0fe in malloc () from /usr/lib64/libc.so.6
#4  0x00007ff6a3c02e60 in slurm_xmalloc (size=size@entry=24, clear=clear@entry=false,
    file=file@entry=0x7ff6a3c1f1f0 "../../../../slurm/src/common/pack.c",
    line=line@entry=152, func=func@entry=0x7ff6a3c1f4a6 <__func__.7843> "init_buf")
    at ../../../../slurm/src/common/xmalloc.c:86
#5  0x00007ff6a3b2e5b7 in init_buf (size=16384)
    at ../../../../slurm/src/common/pack.c:152
#6  0x000000000041caab in _handle_accept (arg=0x0)
    at ../../../../../slurm/src/slurmd/slurmstepd/req.c:384
#7  0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6

Thread 4 (Thread 0x7ff6a4086700 (LWP 5633)):
#0  0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1  0x00007ff6a28af7d8 in _L_lock_1579 () from /usr/lib64/libc.so.6
#2  0x00007ff6a28a7ca0 in arena_get2.isra.3 () from /usr/lib64/libc.so.6
#3  0x00007ff6a28ad0fe in malloc () from /usr/lib64/libc.so.6
#4  0x00007ff6a3c02e60 in slurm_xmalloc (size=size@entry=24, clear=clear@entry=false,
    file=file@entry=0x7ff6a3c1f1f0 "../../../../slurm/src/common/pack.c",
    line=line@entry=152, func=func@entry=0x7ff6a3c1f4a6 <__func__.7843> "init_buf")
    at ../../../../slurm/src/common/xmalloc.c:86
#5  0x00007ff6a3b2e5b7 in init_buf (size=16384)
    at ../../../../slurm/src/common/pack.c:152
#6  0x000000000041caab in _handle_accept (arg=0x0)
    at ../../../../../slurm/src/slurmd/slurmstepd/req.c:384
#7  0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6

Thread 3 (Thread 0x7ff69d53b700 (LWP 12963)):
#0  0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1  0x00007ff6a28af7d8 in _L_lock_1579 () from /usr/lib64/libc.so.6
#2  0x00007ff6a28a7ca0 in arena_get2.isra.3 () from /usr/lib64/libc.so.6
#3  0x00007ff6a28ad0fe in malloc () from /usr/lib64/libc.so.6
#4  0x00007ff6a3c02e60 in slurm_xmalloc (size=size@entry=24, clear=clear@entry=false,
    file=file@entry=0x7ff6a3c1f1f0 "../../../../slurm/src/common/pack.c",
    line=line@entry=152, func=func@entry=0x7ff6a3c1f4a6 <__func__.7843> "init_buf")
    at ../../../../slurm/src/common/xmalloc.c:86
#5  0x00007ff6a3b2e5b7 in init_buf (size=16384)
    at ../../../../slurm/src/common/pack.c:152
#6  0x000000000041caab in _handle_accept (arg=0x0)
    at ../../../../../slurm/src/slurmd/slurmstepd/req.c:384
#7  0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6

Thread 2 (Thread 0x7ff69f182700 (LWP 19734)):
#0  0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1  0x00007ff6a28af7d8 in _L_lock_1579 () from /usr/lib64/libc.so.6
#2  0x00007ff6a28a7ca0 in arena_get2.isra.3 () from /usr/lib64/libc.so.6
#3  0x00007ff6a28ad0fe in malloc () from /usr/lib64/libc.so.6
#4  0x00007ff6a3c02e60 in slurm_xmalloc (size=size@entry=24, clear=clear@entry=false,
    file=file@entry=0x7ff6a3c1f1f0 "../../../../slurm/src/common/pack.c",
    line=line@entry=152, func=func@entry=0x7ff6a3c1f4a6 <__func__.7843> "init_buf")
    at ../../../../slurm/src/common/xmalloc.c:86
#5  0x00007ff6a3b2e5b7 in init_buf (size=16384)
    at ../../../../slurm/src/common/pack.c:152
#6  0x000000000041caab in _handle_accept (arg=0x0)
    at ../../../../../slurm/src/slurmd/slurmstepd/req.c:384
#7  0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6

Thread 1 (Thread 0x7ff6a4088880 (LWP 12616)):
#0  0x00007ff6a3876f57 in pthread_join () from /usr/lib64/libpthread.so.0
#1  0x000000000041084a in _wait_for_io (job=0xf07760)
    at ../../../../../slurm/src/slurmd/slurmstepd/mgr.c:2219
#2  job_manager (job=job@entry=0xf07760)
    at ../../../../../slurm/src/slurmd/slurmstepd/mgr.c:1397
#3  0x000000000040ca07 in main (argc=1, argv=0x7fffacab93d8)
    at ../../../../../slurm/src/slurmd/slurmstepd/slurmstepd.c:172
caf

The presence of start_thread() in the backtrace indicates that this is a pthread_create() thread.
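The same distinction can be seen from the application side: a pthread_create() thread runs inside the calling process and reports the same PID, while a fork() child is a separate process with its own PID. The following is only a minimal sketch of that difference; the function names thread_shares_pid() and fork_child_has_new_pid() are hypothetical and are not part of glibc or Slurm.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread start routine: records the PID seen from inside the new thread. */
static void *record_pid(void *arg)
{
    *(pid_t *)arg = getpid();
    return NULL;
}

/* Returns 1 if a pthread_create() thread reports the caller's PID,
 * 0 if it does not, and -1 on error. */
int thread_shares_pid(void)
{
    pthread_t tid;
    pid_t seen = -1;

    if (pthread_create(&tid, NULL, record_pid, &seen) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return seen == getpid();
}

/* Returns 1 if a fork() child reports a PID different from the caller's,
 * 0 if it does not, and -1 on error. */
int fork_child_has_new_pid(void)
{
    pid_t parent = getpid();
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(getpid() != parent ? 0 : 1);   /* child: report via exit status */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Both functions are expected to return 1 when called from main() in a program built with -pthread.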

__libc_thread_freeres() is a function that glibc calls at thread exit, which invokes a set of callbacks to free internal per-thread state. This indicates the thread you have highlighted is in the process of exiting.

arena_thread_freeres() is one of those callbacks. It is for the malloc arena allocator, and it moves the free list from the exiting thread's private arena to the global free list. To do this, it must take a lock that protects the global free list (this is the list_lock in arena.c).

It appears to be this lock that the highlighted thread (Thread 6) is blocked on.
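As a rough user-level analogy (not glibc's actual code), a thread-specific-data destructor behaves in a similar way: it runs at thread exit, after the start routine has returned, and here it takes a global lock to hand per-thread state back to a shared list. All names in this sketch (cache_node, global_lock, release_cache, run_workers) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct cache_node {
    struct cache_node *next;
};

static pthread_key_t cache_key;
static pthread_once_t cache_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static struct cache_node *global_free_list;

/* Runs at thread exit, after the start routine has returned. */
static void release_cache(void *value)
{
    struct cache_node *node = value;

    pthread_mutex_lock(&global_lock);   /* blocks while another thread holds it */
    node->next = global_free_list;
    global_free_list = node;
    pthread_mutex_unlock(&global_lock);
}

static void make_key(void)
{
    pthread_key_create(&cache_key, release_cache);
}

static void *worker(void *arg)
{
    struct cache_node *node;

    (void)arg;
    pthread_once(&cache_once, make_key);
    node = calloc(1, sizeof *node);
    if (node != NULL)
        pthread_setspecific(cache_key, node);
    return NULL;                        /* release_cache() runs after this */
}

/* Starts up to 16 threads, joins them, then drains and counts the
 * nodes that the exiting threads pushed onto the global list. */
int run_workers(int n)
{
    pthread_t tids[16];
    int started = 0, count = 0;

    if (n > 16)
        n = 16;
    for (int i = 0; i < n; i++)
        if (pthread_create(&tids[started], NULL, worker, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_lock(&global_lock);
    while (global_free_list != NULL) {
        struct cache_node *node = global_free_list;
        global_free_list = node->next;
        free(node);
        count++;
    }
    pthread_mutex_unlock(&global_lock);
    return count;
}
```

If some other thread held global_lock indefinitely, every exiting worker would block inside release_cache() with its start routine no longer on the stack, which is comparable to what Thread 6 shows.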

The arena allocator installs pthread_atfork() handlers which lock the list lock at the start of fork() processing, and unlock it at the end. This means that while other pthread_atfork() handlers are running, all other threads will block on this lock.
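To illustrate the effect, here is a small model of that pattern using only the public pthread_atfork() API. It is a sketch under stated assumptions, not glibc's implementation: list_lock_model stands in for the allocator's lock, and observer_prepare() stands in for an application handler that runs while the lock is held. POSIX runs prepare handlers in reverse order of registration, so the observer is registered first in order to run second.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t list_lock_model = PTHREAD_MUTEX_INITIALIZER;
static pthread_once_t handlers_once = PTHREAD_ONCE_INIT;
static int lock_busy_in_prepare;    /* 1 if the observer saw the lock held */

/* Allocator-style handlers: hold the lock for the duration of fork(). */
static void model_prepare(void) { pthread_mutex_lock(&list_lock_model); }
static void model_release(void) { pthread_mutex_unlock(&list_lock_model); }

/* Stand-in for an application prepare handler: probes the lock. */
static void observer_prepare(void)
{
    int rc = pthread_mutex_trylock(&list_lock_model);

    if (rc == EBUSY)
        lock_busy_in_prepare = 1;
    else if (rc == 0)
        pthread_mutex_unlock(&list_lock_model);
}

/* Handlers cannot be unregistered, so install them exactly once. */
static void install_handlers(void)
{
    pthread_atfork(observer_prepare, NULL, NULL);                 /* runs 2nd */
    pthread_atfork(model_prepare, model_release, model_release);  /* runs 1st */
}

/* Returns 1 if the lock was held while the observer handler ran,
 * 0 if it was not, and -1 on error. */
int lock_held_during_fork(void)
{
    int status;
    pid_t pid;

    pthread_once(&handlers_once, install_handlers);
    lock_busy_in_prepare = 0;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return lock_busy_in_prepare;
}
```

In this model, a handler that blocked instead of returning would leave list_lock_model held, and any other thread needing that lock would wait indefinitely.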

Are you installing your own pthread_atfork() handlers? It seems likely that one of these may be causing your deadlock.
