[Xenomai-main] Possible spurious wakeups in vrtx/posix for xenomai


From: Pulle, Philip (Ex AS01)
Subject: [Xenomai-main] Possible spurious wakeups in vrtx/posix for xenomai
Date: Fri, 15 Aug 2003 15:31:36 +1000

Thanks to any who are still working on the xenomai project or can provide
some insight....

I am porting an old VRTX32 68k system to a PC104 SBC based Linux box. The
application doesn't use any peripherals such as network, video or X, only the
serial ports to talk to field devices.

So far the xenomai system has been great. Many thanks to all those who have
helped to create it.

My Setup:
        Advantech PCM5820 - PC104 motherboard, with CF IDE.
        Advantech PCM3614 - serial expander daughter board
        Advantech PCM3725 - io card
        Stock Linux RH8 with 2.4.18 kernel, patched to core dump threads.
Note: the errors also occurred with the stock 2.4.18 kernel, before the patch
was applied to help in debugging.
        Xenomai-1.1.1 compiled with posix support and using the VRTX
emulator
        libpthread-0.10.so (is there a more recent one, and if so where? I
tried but haven't had time to implement ngpt properly)

The Problem:

As we've activated more features in our system (hence more tasks/threads
active, more serial activity) I'm getting a seemingly random set of errors
after about 2-3 hours of running including:
        'illegal context for call'
        'invalid conjuctive wait'
        'mutex owner not runnable' 
        etc
We've got about 20-30 threads running and there is nearly continuous serial
traffic on 2 serial lines (so far; we want 6). The code runs at less than
2-3% CPU.

I modified xenomai to dump the core when it got these errors (is there any way
of doing this with clean code?), though the errors occurred before the patch.
From the core dumps these errors seem to occur in a variety of tasks, not
just one. They always occur when xenomai seems to be checking whether a task
is in the right state to execute a blocking VRTX function (e.g. sc_pend,
sc_delay, sc_qpend etc). It seems that a thread has started running for some
unknown reason while xenomai was doing something else, so the thread was not
in a valid state and xenomai bailed out. All threads except for the one that
caused the error are generally in sigsuspend.

From the web today I've learnt a new term, 'spurious wakeup', and that there
is a possibility that pthreads can occasionally wake up on their own, which
sounded like exactly my situation. I haven't been able to decipher a
consistent story from the web yet, so I thought I'd try here first.
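
For anyone unfamiliar with the term, here is a minimal sketch of the pattern
usually described for tolerating spurious wakeups with plain POSIX condition
variables: the waiter re-checks a predicate in a loop instead of trusting the
wakeup itself. The `struct event` type and the `event_*` names are
hypothetical and are not part of xenomai or VRTX. Whether xenomai's
sigsuspend-based blocking is exposed to the same effect is exactly the open
question, and this sketch does not settle it.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical one-shot event; illustration only, not xenomai code. */
struct event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            posted;   /* the predicate the waiter really cares about */
};

void event_init(struct event *ev)
{
    pthread_mutex_init(&ev->lock, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->posted = false;
}

void event_post(struct event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->posted = true;                 /* set the predicate first */
    pthread_cond_signal(&ev->cond);    /* then wake one waiter */
    pthread_mutex_unlock(&ev->lock);
}

void event_pend(struct event *ev)
{
    pthread_mutex_lock(&ev->lock);
    /* Loop, not 'if': a wakeup without a post just goes back to waiting. */
    while (!ev->posted)
        pthread_cond_wait(&ev->cond, &ev->lock);
    ev->posted = false;                /* consume the event */
    pthread_mutex_unlock(&ev->lock);
}
```

The point of the loop is that a wakeup is treated only as a hint; the thread
proceeds only once the state it was waiting for is actually true.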

Lastly, when I run the system under gdb/ddd I get a SIG32 and the system
stops. If I use the gdb command 'handle SIG32 nostop noprint nopass' then the
system works properly and I cannot get it to crash. I'll be running it over
the weekend to check it over several days of operation. Of course we cannot
ship with our system running under gdb though!!

Questions:
Can anyone help with the following...
1) I'm using abort() to dump the core when I get these errors in
xn_pod_fatal(), would this obscure the actual details in the core dump? Is
there a better way to snapshot the conditions causing the error?
2) Would the 'faked' functions in posix.h be a factor? Would using a more RT
Linux core help? Any suggestions?
3) is there a known problem in the xenomai core related to this? Can the
original designers think of any issue that they were uncertain of?
4) can anyone provide a coherent viewpoint of the spurious thread wakeup
issue wrt how xenomai works? Anyone familiar with this issue, are there
kernel patches, updated libraries, compilers etc I should be using to stop
the problem, or have I got the issue around the wrong way and this is proper
behaviour?
5) if there isn't sufficient information above to detail the problem, can
you suggest what I would need to do to get more details? I can follow
straightforward kernel compiling/patching etc from a FAQ but am not enough of
a Linux expert to work with very obscure stuff without a bit of direction
(I'd be interested to learn though!).
6) what could be the significance of the SIG32 handling under gdb? Could it be
the slower operation under gdb, or the suppressing of SIG32? How would I do the
equivalent of the gdb command 'handle SIG32 nostop...' in the actual code?
7) Is there a way to determine which (if any) signal has caused the thread
to wakeup? From the core, or some construct in the code.
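
To make question 7 concrete, here is a minimal sketch of the kind of construct
it has in mind, using only standard POSIX calls: a handler installed with
sigaction() records the signal number, and the code that was blocked in
sigsuspend() reads it back after waking. The names `record_signal`,
`install_recorder` and `wait_and_report` are hypothetical, and this is not how
xenomai itself is known to work. It is demonstrated with an ordinary signal;
whether it is safe to hook a signal that the thread library uses internally
(such as SIG32) is an assumption that would need checking first.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Last signal number seen by the handler (0 = none yet). */
static volatile sig_atomic_t last_signal = 0;

/* Hypothetical handler: only records which signal was delivered. */
static void record_signal(int signo, siginfo_t *info, void *ctx)
{
    (void)info;
    (void)ctx;
    last_signal = signo;
}

/* Install the recording handler for one signal; returns 0 on success. */
int install_recorder(int signo)
{
    struct sigaction sa;
    sa.sa_sigaction = record_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Sleep in sigsuspend() until a handled signal arrives, then report which
 * signal the handler recorded. In a threaded program pthread_sigmask() is
 * the counterpart of sigprocmask() used here. */
int wait_and_report(int signo)
{
    sigset_t block, old, wait_mask;

    sigemptyset(&block);
    sigaddset(&block, signo);
    sigprocmask(SIG_BLOCK, &block, &old);   /* hold signo until we wait */

    wait_mask = old;
    sigdelset(&wait_mask, signo);           /* signo deliverable only in wait */

    last_signal = 0;
    sigsuspend(&wait_mask);                 /* returns after a handler ran */

    sigprocmask(SIG_SETMASK, &old, NULL);   /* restore the caller's mask */
    return last_signal;
}
```

Logging the returned value at the point where a task resumes would show
whether an unexpected signal was responsible for the wakeup.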

Again, any help, leads or observations would be appreciated.

Phil




