qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

race condition in display device caused by run_on_cpu() dropping the iot


From: Peter Maydell
Subject: race condition in display device caused by run_on_cpu() dropping the iothread lock
Date: Mon, 1 Aug 2022 14:23:55 +0100

I've been debugging a segfault in the raspi3b display device, and I've
tracked it down to a race condition, but I'm not sure what the right
way to fix it is...

The race is that a vCPU thread is handling a guest register write that
says "resize the framebuffer", which it implements by calling
qemu_console_resize(). This ends up doing a qemu_free_displaysurface()
on the old surface and creating a new one.  However, at the same time,
the iothread is in gui_update(), which ends up calling the display
device gfx_update method. This code is thus actively using the old
DisplaySurface, and the crash happens because the memory is freed
while this function is still using it.

I think this happens like this:
 * the iothread runs the gui_update timer callback; it holds the
   iothread lock to do this
 * gui_update eventually calls into the bcm2835_fb.c fb_update_display()
   function
 * That function calls qemu_console_surface(s->con) to
   get the current DisplaySurface, and passes it to
   framebuffer_update_display() to do the work
 * framebuffer_update_display calls memory_region_snapshot_and_clear_dirty()
 * memory_region_snapshot_and_clear_dirty() ends up calling run_on_cpu(),
   which briefly drops the iothread lock.
 * This gives the vCPU thread a chance to dash in, grab the iothread
   lock, run the guest code to resize the fb and destroy the old
   DisplaySurface (and then dropping the iothread lock again)
 * eventually run_on_cpu() does its thing and retakes the iothread lock
 * however, when we get back up into framebuffer_update_display(), this
   function now has a stale DisplaySurface, and when it tries to copy
   the framebuffer pixels into that surface it is writing into freed
   memory, causing a segfault or worse

How is this intended to work? I feel like if run_on_cpu() silently
drops the iothread lock this probably invalidates a lot of assumptions
that QEMU code makes, especially in this kind of setup where
the code making the assumptions is several layers in the callstack
above whatever it is that ends up calling run_on_cpu()...


Here's the full backtraces of the relevant threads. First
the thread that frees the memory:

Thread 3 (Thread 2949085.2949109):
#0  __GI___libc_free (mem=0x73bc5c974010) at malloc.c:3087
#1  0x00006ac565853b1d in  () at /usr/lib/x86_64-linux-gnu/libpixman-1.so.0
#2  0x00006ac565853a5d in pixman_image_unref () at
/usr/lib/x86_64-linux-gnu/libpixman-1.so.0
#3  0x0000556260c82aee in qemu_pixman_image_unref
(image=0x556263261830) at ../../ui/qemu-pixman.c:225
#4  0x0000556260c793fa in qemu_free_displaysurface
(surface=0x5562638c45a0) at ../../ui/console.c:1582
#5  0x0000556260c79dc0 in dpy_gfx_replace_surface (con=0x5562632eda70,
surface=0x669760020270) at ../../ui/console.c:1808
#6  0x0000556260c7b967 in qemu_console_resize (s=0x5562632eda70,
width=1024, height=768) at ../../ui/console.c:2588
#7  0x0000556260d70ac4 in bcm2835_fb_reconfigure (s=0x39284b533bb0,
newconfig=0x5c54602c5600) at ../../hw/display/bcm2835_fb.c:
262
#8  0x0000556260e11cd8 in bcm2835_property_mbox_push
(s=0x39284b534f10, value=550952) at
../../hw/misc/bcm2835_property.c:301
#9  0x0000556260e11e52 in bcm2835_property_write
(opaque=0x39284b534f10, offset=0, value=550824, size=4) at
../../hw/misc/bcm28
35_property.c:344
#10 0x0000556261431bfa in memory_region_write_accessor
(mr=0x39284b5352a0, addr=0, value=0x5c54602c5788, size=4, shift=0,
mask=
4294967295, attrs=...) at ../--Type <RET> for more, q to quit, c to
continue without paging--
../softmmu/memory.c:492
#11 0x0000556261431e48 in access_with_adjusted_size (addr=0,
value=0x5c54602c5788, size=4, access_size_min=1, access_size_max=4
, access_fn=0x556261431b00 <memory_region_write_accessor>,
mr=0x39284b5352a0, attrs=...) at ../../softmmu/memory.c:554
#12 0x0000556261434fd7 in memory_region_dispatch_write
(mr=0x39284b5352a0, addr=0, data=550824, op=MO_32, attrs=...) at
../../s
oftmmu/memory.c:1514
#13 0x00005562614450e4 in address_space_stl_internal
(as=0x39284b535b60, addr=128, val=550824, attrs=..., result=0x0,
endian=DEVICE_LITTLE_ENDIAN) at
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/memory_ldst.c.inc:319
#14 0x0000556261445223 in address_space_stl_le (as=0x39284b535b60,
addr=128, val=550824, attrs=..., result=0x0) at
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/memory_ldst.c.inc:357
#15 0x0000556260e0f86f in stl_le_phys (as=0x39284b535b60, addr=128,
val=550824) at 
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/include/exec/memory_ldst_phys.h.inc:121
#16 0x0000556260e102f0 in bcm2835_mbox_write (opaque=0x39284b535830,
offset=160, value=550824, size=4) at ../../hw/misc/bcm2835_mbox.c:227
#17 0x0000556261431bfa in memory_region_write_accessor
(mr=0x39284b535bc0, addr=160, value=0x5c54602c59d8, size=4, shift=0,
mask=4294967295, attrs=...) at ../../softmmu/memory.c:492
#18 0x0000556261431e48 in access_with_adjusted_size (addr=160,
value=0x5c54602c59d8, size=4, access_size_min=1, access_size_max=4,
access_fn=0x556261431b00 <memory_region_write_accessor>,
mr=0x39284b535bc0, attrs=...) at ../../softmmu/memory.c:554
#19 0x0000556261434fd7 in memory_region_dispatch_write
(mr=0x39284b535bc0, addr=160, data=550824, op=MO_32, attrs=...) at
../../softmmu/memory.c:1514
#20 0x00005562614d39dd in io_writex (env=0x39284b4bf030,
iotlbentry=0x5562640ff670, mmu_idx=14, val=550824, addr=1057011872,
retaddr=58326076618522, op=MO_32) at ../../accel/tcg/cputlb.c:1429
#21 0x00005562614d62ff in store_helper (env=0x39284b4bf030,
addr=1057011872, val=550824, oi=46, retaddr=58326076618522, op=MO_32)
at ../../accel/tcg/cputlb.c:2379
#22 0x00005562614d66da in full_le_stl_mmu (env=0x39284b4bf030,
addr=1057011872, val=550824, oi=46, retaddr=58326076618522) at
../../accel/tcg/cputlb.c:2467
#23 0x00005562614d6718 in helper_le_stl_mmu (env=0x39284b4bf030,
addr=1057011872, val=550824, oi=46, retaddr=58326076618522) at
../../accel/tcg/cputlb.c:2473
#24 0x0000350c1913f71a in code_gen_buffer ()
#25 0x00005562614c0335 in cpu_tb_exec (cpu=0x39284b4b5210,
itb=0x2f7b06d82c80, tb_exit=0x5c54602c6124) at
../../accel/tcg/cpu-exec.c:358
#26 0x00005562614c1208 in cpu_loop_exec_tb (cpu=0x39284b4b5210,
tb=0x2f7b06d82c80, last_tb=0x5c54602c6130, tb_exit=0x5c54602c6124) at
../../accel/tcg/cpu-exec.c:848
#27 0x00005562614c15ca in cpu_exec (cpu=0x39284b4b5210) at
../../accel/tcg/cpu-exec.c:1007
#28 0x00005562614e35bf in tcg_cpus_exec (cpu=0x39284b4b5210) at
../../accel/tcg/tcg-accel-ops.c:67
#29 0x00005562614e3952 in mttcg_cpu_thread_fn (arg=0x39284b4b5210) at
../../accel/tcg/tcg-accel-ops-mttcg.c:97
#30 0x00005562616d1e84 in qemu_thread_start (args=0x55626410ca20) at
../../util/qemu-thread-posix.c:504
#31 0x0000727421ef7609 in start_thread (arg=<optimised out>) at
pthread_create.c:477
#32 0x00006d0e4f1af133 in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95


and the thread that's still using it. (These traces produced running under 'rr',
which is why the top 6 frames look a bit odd.)

#0  0x0000000070000002 in syscall_traced ()
#1  0x00002ae51b07a9ed in _raw_syscall () at
/home/petmay01/src/rr-obj64/bin/../lib/rr/librrpreload.so
#2  0x00002ae51b075ff7 in syscall_hook_internal (call=0x681fffa0) at
/home/petmay01/src/rr/src/preload/syscallbuf.c:278
#3  0x00002ae51b0747ec in syscall_hook (call=0x681fffa0) at
/home/petmay01/src/rr/src/preload/syscallbuf.c:3454
#4  0x00002ae51b074120 in _syscall_hook_trampoline () at
/home/petmay01/src/rr-obj64/bin/../lib/rr/librrpreload.so
#5  0x00002ae51b07417f in __morestack () at
/home/petmay01/src/rr-obj64/bin/../lib/rr/librrpreload.so
#6  0x00002ae51b07419b in _syscall_hook_trampoline_48_3d_00_f0_ff_ff
() at /home/petmay01/src/rr-obj64/bin/../lib/rr/librrpreload.so
#7  0x0000727421efe37c in futex_wait_cancelable (private=<optimised
out>, expected=0, futex_word=0x556262180828 <qemu_work_cond+40>) at
../sysdeps/nptl/futex-internal.h:183
#8  __pthread_cond_wait_common (abstime=0x0, clockid=0,
mutex=0x556262182640 <qemu_global_mutex>, cond=0x556262180800
<qemu_work_cond>) at pthread_cond_wait.c:508
#9  __pthread_cond_wait (cond=0x556262180800 <qemu_work_cond>,
mutex=0x556262182640 <qemu_global_mutex>) at pthread_cond_wait.c:647
#10 0x00005562616d1639 in qemu_cond_wait_impl (cond=0x556262180800
<qemu_work_cond>, mutex=0x556262182640 <qemu_global_mutex>,
file=0x5562617355c1 "../../cpus-common.c", line=162) at
../../util/qemu-thread-posix.c:219
#11 0x0000556260c72283 in do_run_on_cpu (cpu=0x39284b4b5210,
func=0x556261442f65 <do_nothing>, data=..., mutex=0x556262182640
<qemu_global_mutex>) at ../../cpus-common.c:162
#12 0x0000556260ff0efa in run_on_cpu (cpu=0x39284b4b5210,
func=0x556261442f65 <do_nothing>, data=...) at
../../softmmu/cpus.c:391
#13 0x0000556261442fc1 in tcg_log_global_after_sync
(listener=0x5562641304b0) at ../../softmmu/physmem.c:2652
#14 0x00005562614389e3 in memory_global_after_dirty_log_sync () at
../../softmmu/memory.c:2789
#15 0x0000556261436ee1 in memory_region_snapshot_and_clear_dirty
(mr=0x5562635ed060, addr=1007681536, size=614400, client=0) at
../../softmmu/memory.c:2251
#16 0x0000556260d68e85 in framebuffer_update_display
(ds=0x5562638c45a0, mem_section=0x39284b534050, cols=640, rows=480,
src_width=1280, dest_row_pitch=2560, dest_col_pitch=0, invalidate=1,
fn=0x556260d70312 <draw_line_src16>, opaque=0x39284b533bb0,
first_row=0x7ffdca6ab698, last_row=0x7ffdca6ab69c) at
../../hw/display/framebuffer.c:102
#17 0x0000556260d70854 in fb_update_display (opaque=0x39284b533bb0) at
../../hw/display/bcm2835_fb.c:203
#18 0x0000556260c757d3 in graphic_hw_update (con=0x5562632eda70) at
../../ui/console.c:230
#19 0x000055626109b639 in gd_refresh (dcl=0x556264050370) at ../../ui/gtk.c:438
#20 0x0000556260c79ec2 in dpy_refresh (s=0x5562638c4280) at
../../ui/console.c:1842
#21 0x0000556260c75514 in gui_update (opaque=0x5562638c4280) at
../../ui/console.c:165
#22 0x00005562616fed2e in timerlist_run_timers
(timer_list=0x5562632f42a0) at ../../util/qemu-timer.c:576
#23 0x00005562616fedd9 in qemu_clock_run_timers
(type=QEMU_CLOCK_REALTIME) at ../../util/qemu-timer.c:590
#24 0x00005562616ff0bf in qemu_clock_run_all_timers () at
../../util/qemu-timer.c:672
#25 0x00005562616f8b8f in main_loop_wait (nonblocking=0) at
../../util/main-loop.c:607
#26 0x0000556260ff9b28 in qemu_main_loop () at ../../softmmu/runstate.c:734
#27 0x0000556260c706ea in qemu_main (argc=9, argv=0x7ffdca6ab9f8,
envp=0x0) at ../../softmmu/main.c:38
#28 0x0000556260c70720 in main (argc=9, argv=0x7ffdca6ab9f8) at
../../softmmu/main.c:47

thanks
-- PMM



reply via email to

[Prev in Thread] Current Thread [Next in Thread]