[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] WIP: Fix ext2fs remount race condition
From: |
Justus Winter |
Subject: |
Re: [PATCH] WIP: Fix ext2fs remount race condition |
Date: |
Wed, 22 Jul 2015 16:56:53 +0200 |
User-agent: |
alot/0.3.5 |
Hi James :)
a few nitpicks upfront:
Quoting James Clarke (2015-07-22 02:46:48)
> --- a/ext2fs/ext2fs.c
> +++ b/ext2fs/ext2fs.c
> @@ -207,10 +207,28 @@ main (int argc, char **argv)
> error_t
> diskfs_reload_global_state ()
> {
> + error_t err;
> +
> pokel_flush (&global_pokel);
> pager_flush (diskfs_disk_pager, 1);
> +
> + /* Paging must specifically be inhibited; if not, a paging request
> + could be handled while sblock is still NULL.
> + While some RPCs are inhibited when this function is called by
> + libdiskfs, paging RPCs are still enabled. Even if we were to
> + inhibit paging RPCs, libpager has its own pool of workers to handle
> + requests asynchronously, which libports is unaware of, so requests
> + can be handled even after the relevant RPCs are disabled. This is
> + all dealt with by {inhibit,resume}_disk_pager. */
> + err = inhibit_disk_pager ();
I don't like that name, it implies that merely the disk pager is
inhibited.
> + if (err)
> + return err;
> +
> sblock = NULL;
Please remove that as discussed. Nulling sblock merely prevents it
from being munmapped.
> --- a/fatfs/fatfs.h
> +++ b/fatfs/fatfs.h
Thanks for looking after fatfs. I usually mimick all changes that I'm
doing to ext2fs. Note however, that fatfs is overall in a bad shape,
to the point that I don't expect it to work, even though it kindof
does for some reason.
> /* Start the worker threads libpager uses to service requests. */
> error_t
> -pager_start_workers (struct port_bucket *pager_bucket)
> +pager_start_workers (struct requests **out_requests,
> + struct port_bucket *pager_bucket)
It might be a matter of taste, but I would have added the out
parameter at the end of the parameter list. Not sure how this is
usually done in the Hurd codebase tbh.
> {
> error_t err;
> int i;
> pthread_t t;
> struct requests *requests;
>
> + if (out_requests == NULL)
> + /* Return rather than using goto done, since that would dereference
> + out_requests. */
> + return EINVAL;
> +
If at all, then `assert (out_requests);'.
> +error_t
> +pager_inhibit_workers (struct requests *requests)
> +{
> + error_t err;
> +
> + pthread_mutex_lock (&requests->lock);
> +
> + /* Check the workers are not already inhibited nor in the process of
> + being inhibited, and only create a new queue if necessary;
> + otherwise the queued requests would be discarded, and queue_out
> + would be leaked. */
> + if (requests->queue_out == requests->queue_in)
Likewise, `assert (requests->queue_out == requests->queue_in);'.
Calling inhibit twice is a breach of protocol.
You might need to serialize the diskfs_reload_global_state function,
but that should reduce the complexity....
> + {
> + /* Any new paging requests will go into a new queue. */
> + struct queue *new_queue = malloc (sizeof *new_queue);
> + if (new_queue == NULL)
> + {
> + err = ENOMEM;
> + goto done_locked;
> + }
> + queue_init (new_queue);
> + requests->queue_in = new_queue;
> + }
> +
> + /* Wait until all the workers are asleep, as then the old queue and
> + all individual worker queues have been drained. */
> + while (requests->asleep < WORKER_COUNT)
> + pthread_cond_wait (&requests->inhibit_wakeup, &requests->lock);
> +
> + pthread_cond_broadcast (&requests->resume_wakeup);
... here, trying to deal with pager_resume_workers being called before
pager_inhibit_workers is finished. That's not a valid use case imho.
> +
> +done_locked:
> + pthread_mutex_unlock (&requests->lock);
> + return err;
> +}
> +
> +void
> +pager_resume_workers (struct requests *requests)
> +{
> + pthread_mutex_lock (&requests->lock);
> +
> + /* Check the workers are inhibited/being inhibited. */
> + if (requests->queue_out != requests->queue_in)
Assert that.
> + {
> + /* If the inhibiting has not yet finished (the old queue has not
> + drained), wait for it to do so. */
> + while (requests->asleep < WORKER_COUNT)
Assert that.
> + pthread_cond_wait (&requests->resume_wakeup, &requests->lock);
> +
> + /* Another resume may have run in the meantime, in which case the
> + old queue has already been freed, so queue_out should not be
> + freed and updated to be queue_in. */
> + if (requests->queue_out != requests->queue_in)
> + {
> + /* The queue has been drained and will no longer be used. */
> + free (requests->queue_out);
> + requests->queue_out = requests->queue_in;
> + /* We need to wake up all workers, as there could be multiple
> + requests in the new queue. */
> + pthread_cond_broadcast (&requests->wakeup);
> + }
> + }
> +
> + pthread_mutex_unlock (&requests->lock);
> +}
Many thanks :)
Justus
- Fixing re-mounting of ext2fs, (continued)
- Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault, Justus Winter, 2015/07/14
- Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault, James Clarke, 2015/07/15
- Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault, Justus Winter, 2015/07/19
- Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault, James Clarke, 2015/07/19
- Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault, Richard Braun, 2015/07/19
- Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault, James Clarke, 2015/07/20
- [PATCH] WIP: Fix ext2fs remount race condition, James Clarke, 2015/07/21
- Re: [PATCH] WIP: Fix ext2fs remount race condition,
Justus Winter <=