bug-hurd
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Bug#77857: hurd: write_node assertion failed building emacs


From: Roland McGrath
Subject: Bug#77857: hurd: write_node assertion failed building emacs
Date: Sat, 2 Dec 2000 23:32:56 -0500 (EST)

> On Thu, Nov 23, 2000 at 10:34:53PM -0500, Roland McGrath wrote:
> > The first thing to figure out is what the code path looks like that reaches
> > getblk when it sets dn_set_mtime there (I think it should be
> > pager_unlock_page that is doing it, but you should check).
> 
> pager_unlock_page and diskfs_grow are the only callers of ext2_getblk with
> create = 1, so those are the only code paths here. Does it matter if it is
> one or the other?

Yes, because the two situations have different sets of locking issues.

> > It matters to understand whether or not that code is going to
> > eventually call set_node_times before it matters that the mtime update
> > be reflected.  If so, then it doesn't matter whether that mtime update
> > happens before or after the node sync, as long as it doesn't get lost.
> > That should be achievable by rewriting diskfs_set_node_times so it
> > does:
> > 
> >     if (np->dn_set_mtime)
> >       {
> >         ...
> >         np->dn_set_mtime = 0;
> >         np->dn_stat_dirty = 1;
> >       }
> > 
> > instead of the separate tests done now (and same for the other two).  That
> > way, dn_set_mtime is only touched if it was just acted upon--if it gets set
> > right after the test, then it will remain set for later.  Then it would be
> > safe to just remove that assert and let the update take place later.
> 
> Mmh. This makes the race window smaller, but doesn't do away with it
> entirely.

I don't know what you mean.  Please cite the precise race scenario you are
concerned with.  I see no race in the sequence above: if dn_set_mtime is
clear at the time of the test and then set immediately after the test, the
test fails and it won't be cleared by the sequence.

> > But we can't do that before we make sure it's really true that the code
> > path that does the problematic mutation of dn_set_mtime is going to handle
> > it properly in due course and with all appropriate synchronization.  (I
> > would have to read the 1003.1 rules to be sure what the requirements are.)
> 
> Please do so. I don't understand what your concerns are. The way I see it,
> every failure condition with the assertion removed can also happen with the
> assertion, if it happens directly after the assertion check instead shortly
> before. I am probably overlooking something. In any way I am not seeing how
> one could approach if there actually is a failure condition to worry about.

I really don't know what you mean by "failure condition" here.  You must be
specific.  I have been talking about the specific situations in which a
dn_set_mtime setting could or couldn't be lost altogether, and the order of
operations.

I have checked 1003.1-1996 vis a vis access to mmap'd file data (read
access that should update st_atime, and write access that should update
both st_mtime and st_ctime).  It is clear that the requirement is quite
loose: the time updates must happen eventually, but the only required
synchronization is with msync (which we don't have anyway).

For diskfs_grow we need to verify that diskfs_set_node_times (or
diskfs_node_update, which calls it) gets called eventually afterward.  I
think that it certainly does, but I haven't checked all the code carefully.

pager_unlock_page is only called as part of page fault handling in the file
pager.  In the mmap case, it doesn't matter that diskfs_set_node_times get
called any time soon, as long as the update mark is not lost.  The other
case is _diskfs_rdwr_internal; there the update mark likewise must not be
lost but need not be processed immediately.  There is a requirement for
stat, which is taken care of in diskfs_S_io_stat.  

So I think that, in fact, if we eliminate races that could lose update
marks (as opposed to just delaying their processing) then there will be no
danger.  The assert is then overaggressive and can be removed.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]