bug-hurd

Bug#77857: hurd: write_node assertion failed building emacs


From: Roland McGrath
Subject: Bug#77857: hurd: write_node assertion failed building emacs
Date: Thu, 23 Nov 2000 19:57:08 -0500 (EST)

> This seems to be some race condition between the sync thread and other
> dn_set_?time mangling stuff. 

I would tend to agree.  Notice for example thread 3, which appears to be a
peropen in the process of dying.  That is probably the temacs open file
descriptor on the file being written, being closed within the few seconds
while your sleep call was blocking ext2fs from crashing.  Something to
think about is that (last I knew) temacs is writing the data with mmap
rather than write; that indicates the possibility of the file pager being
the suspect agent interacting with the sync thread.

> It's only strange that it never happened before, and building emacs is
> such a reproducible test case (huge file with a hardlink? Can't be the
> only reason, as it does only happen with a full build, not with an
> interrupted and restarted build).

You are repeating it by booting and doing the same pattern of file access
on the same machine, right?  I would tend to think you've just gotten lucky
with the timing that works out in your configuration with this access
pattern, and that timing stays the same when you reboot and do about the
same thing.

> I can reproduce this easily, so if more testing and debugging is required, I
> am happy to do that.

Yes, please.  I can only speculate about what might be going on from
looking at a few of your observations.  You can instrument and tweak things
and learn much more as long as you keep seeing the problem.  A wacky idea
that might help narrow down quickly is to frob every place that sets
dn_set_?time so that instead of setting them to 1 it uses a unique nonzero
value in each location (or maybe fetches the caller's PC or something);
then you should be able to see what code touched it last in the race.
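
A minimal sketch of that tagging idea in C. Everything here is hypothetical:
struct tagged_node, the SITE_* names, and the mark_* helpers are stand-ins
made up for illustration, not the real libdiskfs or ext2fs code. It also
assumes the flags are only ever tested for nonzero, so that a tag other than
1 still counts as "set".

```c
#include <stdio.h>

/* Hypothetical stand-in for the three per-node flags; this is not the
   real libdiskfs struct node.  */
struct tagged_node
{
  int dn_set_atime;
  int dn_set_mtime;
  int dn_set_ctime;
};

/* One unique nonzero tag per place that sets a flag.  The site names
   are invented for this sketch.  */
enum set_site
{
  SITE_READ = 1,
  SITE_WRITE = 2,
  SITE_PAGER_WRITE = 3,
  SITE_WRITE_SYMLINK = 4
};

/* Each call site stores its own tag instead of a bare 1.  */
void
mark_read (struct tagged_node *np)
{
  np->dn_set_atime = SITE_READ;
}

void
mark_write (struct tagged_node *np)
{
  np->dn_set_mtime = SITE_WRITE;
  np->dn_set_ctime = SITE_WRITE;
}

void
mark_pager_write (struct tagged_node *np)
{
  np->dn_set_mtime = SITE_PAGER_WRITE;
  np->dn_set_ctime = SITE_PAGER_WRITE;
}

/* Map a tag back to a readable site name for debugging output.  */
const char *
site_name (int tag)
{
  switch (tag)
    {
    case 0: return "unset";
    case SITE_READ: return "read";
    case SITE_WRITE: return "write";
    case SITE_PAGER_WRITE: return "pager write";
    case SITE_WRITE_SYMLINK: return "write_symlink";
    default: return "unknown";
    }
}

/* Print which site last touched each flag, e.g. just before the
   failing assertion would fire.  */
void
dump_flags (const struct tagged_node *np)
{
  fprintf (stderr, "atime=%s mtime=%s ctime=%s\n",
           site_name (np->dn_set_atime),
           site_name (np->dn_set_mtime),
           site_name (np->dn_set_ctime));
}
```

Calling something like dump_flags right before the assertion is checked would
then name the last writer of each flag rather than just showing a 1.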

> * The node->lock is held, which should probably avoid syncing?

> * Not all parts of the system which set some dn_set_?time flag call
> diskfs_node_update consecutively (for example, write_symlink). 

I don't think that ought to be a problem; in fact, it would be seriously
inefficient to make them do so.  This just means that the time fields don't
need to be updated (e.g. atime for a read) until the node is making its
normal way out to be synchronized.  Whenever someone does call
diskfs_node_update, that will take care of dn_set_?time.  But I'm not
entirely clear on all this code.
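
To illustrate the deferred-update pattern described above, here is a rough
sketch in C. The names (struct lazy_node, touch_read, touch_write,
apply_pending_times) are hypothetical and only model the idea; the actual
libdiskfs code may be organized differently.

```c
/* Hypothetical model of a node with pending-time flags.  */
struct lazy_node
{
  int set_atime;   /* nonzero: atime should be refreshed later */
  int set_mtime;
  int set_ctime;
  long atime_sec;  /* the stored time fields */
  long mtime_sec;
  long ctime_sec;
  int dirty;       /* nonzero: stored fields need writing to disk */
};

/* A read only records that atime is stale; it does no update work.  */
void
touch_read (struct lazy_node *np)
{
  np->set_atime = 1;
}

/* A write records that mtime and ctime are stale.  */
void
touch_write (struct lazy_node *np)
{
  np->set_mtime = 1;
  np->set_ctime = 1;
}

/* Called when the node is on its way out to be synchronized: fold
   every pending flag into the stored fields, then clear the flags.  */
void
apply_pending_times (struct lazy_node *np, long now)
{
  if (np->set_atime)
    np->atime_sec = now;
  if (np->set_mtime)
    np->mtime_sec = now;
  if (np->set_ctime)
    np->ctime_sec = now;

  if (np->set_atime || np->set_mtime || np->set_ctime)
    np->dirty = 1;

  np->set_atime = 0;
  np->set_mtime = 0;
  np->set_ctime = 0;
}
```

In this model any number of touch_read calls cost one flag store each, and
the time field is written only once, when the update pass runs.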



