Re: [PATCH] WIP: Fix ext2fs remount race condition

bug-hurd

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] WIP: Fix ext2fs remount race condition

From:	Justus Winter
Subject:	Re: [PATCH] WIP: Fix ext2fs remount race condition
Date:	Wed, 22 Jul 2015 16:56:53 +0200
User-agent:	alot/0.3.5

Hi James :)

a few nitpicks upfront:

Quoting James Clarke (2015-07-22 02:46:48)
> --- a/ext2fs/ext2fs.c
> +++ b/ext2fs/ext2fs.c
> @@ -207,10 +207,28 @@ main (int argc, char **argv)
>  error_t
>  diskfs_reload_global_state ()
>  {
> +  error_t err;
> +
>    pokel_flush (&global_pokel);
>    pager_flush (diskfs_disk_pager, 1);
> +
> +  /* Paging must specifically be inhibited; if not, a paging request
> +     could be handled while sblock is still NULL.
> +     While some RPCs are inhibited when this function is called by
> +     libdiskfs, paging RPCs are still enabled. Even if we were to
> +     inhibit paging RPCs, libpager has its own pool of workers to handle
> +     requests asynchronously, which libports is unaware of, so requests
> +     can be handled even after the relevant RPCs are disabled.  This is
> +     all dealt with by {inhibit,resume}_disk_pager.  */
> +  err = inhibit_disk_pager ();

I don't like that name, it implies that merely the disk pager is
inhibited.

> +  if (err)
> +    return err;
> +
>    sblock = NULL;

Please remove that as discussed.  Nulling sblock merely prevents it
from being munmapped.

> --- a/fatfs/fatfs.h
> +++ b/fatfs/fatfs.h

Thanks for looking after fatfs.  I usually mimick all changes that I'm
doing to ext2fs.  Note however, that fatfs is overall in a bad shape,
to the point that I don't expect it to work, even though it kindof
does for some reason.

>  /* Start the worker threads libpager uses to service requests.  */
>  error_t
> -pager_start_workers (struct port_bucket *pager_bucket)
> +pager_start_workers (struct requests **out_requests,
> +                    struct port_bucket *pager_bucket)

It might be a matter of taste, but I would have added the out
parameter at the end of the parameter list.  Not sure how this is
usually done in the Hurd codebase tbh.

>  {
>    error_t err;
>    int i;
>    pthread_t t;
>    struct requests *requests;
>  
> +  if (out_requests == NULL)
> +    /* Return rather than using goto done, since that would dereference
> +       out_requests.  */
> +    return EINVAL;
> +

If at all, then `assert (out_requests);'.

> +error_t
> +pager_inhibit_workers (struct requests *requests)
> +{
> +  error_t err;
> +
> +  pthread_mutex_lock (&requests->lock);
> +
> +  /* Check the workers are not already inhibited nor in the process of
> +     being inhibited, and only create a new queue if necessary;
> +     otherwise the queued requests would be discarded, and queue_out
> +     would be leaked.  */
> +  if (requests->queue_out == requests->queue_in)

Likewise, `assert (requests->queue_out == requests->queue_in);'.
Calling inhibit twice is a breach of protocol.

You might need to serialize the diskfs_reload_global_state function,
but that should reduce the complexity....

> +    {
> +      /* Any new paging requests will go into a new queue.  */
> +      struct queue *new_queue = malloc (sizeof *new_queue);
> +      if (new_queue == NULL)
> +       {
> +         err = ENOMEM;
> +         goto done_locked;
> +       }
> +      queue_init (new_queue);
> +      requests->queue_in = new_queue;
> +    }
> +
> +  /* Wait until all the workers are asleep, as then the old queue and
> +     all individual worker queues have been drained.  */
> +  while (requests->asleep < WORKER_COUNT)
> +    pthread_cond_wait (&requests->inhibit_wakeup, &requests->lock);
> +
> +  pthread_cond_broadcast (&requests->resume_wakeup);

... here, trying to deal with pager_resume_workers being called before
pager_inhibit_workers is finished.  That's not a valid use case imho.

> +
> +done_locked:
> +  pthread_mutex_unlock (&requests->lock);
> +  return err;
> +}
> +
> +void
> +pager_resume_workers (struct requests *requests)
> +{
> +  pthread_mutex_lock (&requests->lock);
> +
> +  /* Check the workers are inhibited/being inhibited.  */
> +  if (requests->queue_out != requests->queue_in)

Assert that.

> +    {
> +      /* If the inhibiting has not yet finished (the old queue has not
> +        drained), wait for it to do so.  */
> +      while (requests->asleep < WORKER_COUNT)

Assert that.

> +       pthread_cond_wait (&requests->resume_wakeup, &requests->lock);
> +
> +      /* Another resume may have run in the meantime, in which case the
> +        old queue has already been freed, so queue_out should not be
> +        freed and updated to be queue_in.  */
> +      if (requests->queue_out != requests->queue_in)
> +       {
> +         /* The queue has been drained and will no longer be used.  */
> +         free (requests->queue_out);
> +         requests->queue_out = requests->queue_in;
> +         /* We need to wake up all workers, as there could be multiple
> +            requests in the new queue.  */
> +         pthread_cond_broadcast (&requests->wakeup);
> +       }
> +    }
> +
> +  pthread_mutex_unlock (&requests->lock);
> +}

Many thanks :)
Justus

[Prev in Thread]

Current Thread

[Next in Thread]

Fixing re-mounting of ext2fs, (continued)

Prev by Date: [PATCH] WIP: Fix ext2fs remount race condition
Next by Date: [PATCH] Fix race condition in ext2fs when remounting
Previous by thread: [PATCH] WIP: Fix ext2fs remount race condition
Next by thread: Problem with natively built binaries on Hurd from Guix
Index(es):
- Date
- Thread