bug-hurd
Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault


From: James Clarke
Subject: Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault
Date: Sat, 11 Jul 2015 21:33:44 +0100

I did some more digging around today. I think what’s happening is that ext2fs 
tries to handle a pager RPC while the disk is being remounted.

We do call ports_inhibit_class_rpcs, which waits until all RPCs for the given 
class have finished. However, we call it with diskfs_protid_class, which does 
*not* include the pager ports. Those ports belong to a different class and 
bucket: pager_create (libpager/pager-create.c:32) adds them to _pager_class 
(libpager/priv.h), and create_disk_pager (ext2fs/pager.c) puts them in 
disk_pager_bucket (ext2fs/pager.c). As a result, I believe pager RPCs can still 
be handled while remounting, which leads to the call to ext2_getblk. Below is 
the stack for the call to ext2_getblk that ends up dereferencing sblock while 
it is NULL:

 0  ext2fs/getblk.c:253 (ext2_getblk)
 1  ext2fs/pager.c:147 (find_block)
 2  ext2fs/pager.c:244 (file_pager_read_page)
 3  ext2fs/pager.c:550 (pager_read_page)
 4  libpager/data-request.c:113 (_pager_S_memory_object_data_request)
 5  libpager/memory_objectServer.c:443 (_Xmemory_object_data_request)
 6  libpager/demuxer.c:215 (worker_func)
 7  libpthread/pthread/pt-create.c:64 (entry_point)
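
To make the suspected sequence concrete, here is a minimal single-threaded 
model in C. It is only a sketch of the reasoning above, not Hurd code: 
struct port_class, rpc_may_begin, handle_rpc and rpc_during_remount are 
hypothetical stand-ins and do not use the real libports or libpager 
interfaces. The inhibit_pager_too flag exists purely to contrast the two 
cases; whether inhibiting pager RPCs is a safe fix in the real translator is 
not something this model shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a port class; NOT the libports structure. */
struct port_class { const char *name; bool inhibited; };

/* Hypothetical stand-in for the ext2 superblock. */
struct superblock { unsigned long addr_per_block; };

enum outcome { RPC_BLOCKED, RPC_OK, RPC_WOULD_FAULT };

struct port_class protid_class = { "protid", false };
struct port_class pager_class  = { "pager",  false };
struct superblock *sblock;

/* An RPC may start only if its own class is not inhibited. */
bool rpc_may_begin(const struct port_class *c)
{
  assert(c != NULL);
  return !c->inhibited;
}

/* Models a handler that needs the superblock, as ext2_getblk does. */
enum outcome handle_rpc(const struct port_class *c)
{
  if (!rpc_may_begin(c))
    return RPC_BLOCKED;
  if (sblock == NULL)
    return RPC_WOULD_FAULT;   /* the real code would dereference NULL here */
  return RPC_OK;
}

/* Models an RPC on class `target` arriving inside the remount window,
   after sblock has been cleared and before it has been reloaded. */
enum outcome rpc_during_remount(const struct port_class *target,
                                bool inhibit_pager_too)
{
  static struct superblock disk_sb = { 256 };
  enum outcome result;

  sblock = &disk_sb;
  protid_class.inhibited = true;              /* what remount inhibits */
  pager_class.inhibited = inhibit_pager_too;  /* hypothetical extra step */

  sblock = NULL;                              /* reload window opens */
  result = handle_rpc(target);
  sblock = &disk_sb;                          /* superblock re-read */

  protid_class.inhibited = false;
  pager_class.inhibited = false;
  return result;
}
```

In this model, rpc_during_remount(&protid_class, false) is blocked, while 
rpc_during_remount(&pager_class, false) reaches the NULL sblock, which matches 
the stack above.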

James Clarke

> On 27 Jun 2015, at 20:34, Richard Braun <rbraun@sceen.net> wrote:
> 
> On Sat, Jun 27, 2015 at 03:39:58PM +0100, James Clarke wrote:
>> I have been suffering a lot from my Hurd system (running in VirtualBox) 
>> hanging at startup, just after "Hurd server bootstrap..." but before "INIT: 
>> version 2.88 booting".
>> 
>> I have been able to trace it back to getblk.c:248 (unsigned long 
>> addr_per_block = EXT2_ADDR_PER_BLOCK (sblock);) in ext2_getblk. It faults 
>> because sblock is NULL.
>> 
>> I have traced the execution with debugging statements, and what seems to 
>> happen is as follows:
>> 
>> 1. diskfs_remount is called (because root is remounted as rw)
>> 2. RPCs are inhibited
>> 3. diskfs_reload_global_state is called
>> 4. sblock is set to NULL
>> 5. While this is happening, ext2_getblk is called
>> 
>> If you’re lucky, the superblock is read and sblock is set to point to this 
>> data before 5 (or at least before it gets to dereferencing sblock). If not, 
>> sblock is still NULL and thus a page fault is raised, causing the system to 
>> be stuck.
>> 
>> Does anyone have an idea how this situation could be occurring?
> 
> My initial thought would be "how could it not happen ?".
> 
> Despite diskfs_remount calling ports_inhibit_class_rpcs, other threads
> can very well be running to process previously received messages. There
> seems to be no other form of access synchronization such as locks in
> diskfs_reload_global_state.
> 
> Can you get the call trace leading to ext2_getblk ? I'm not sure about
> backtrace(3) in static executables but it might be worth trying.
> 
> -- 
> Richard Braun



