paparazzi-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Paparazzi-devel] Race condition


From: heechul Yun
Subject: Re: [Paparazzi-devel] Race condition
Date: Mon, 25 Oct 2010 22:18:41 -0500

Thank you very much for your kind explanation. 
If I understand your explanation correctly, the race was between the following two function. 

(a) Interrupt handler 
dma1_ch2_irq_handler() - lisa/arch/stm32/lisa_overo_link_arch.c 
{
overo_link.status = DATA_AVAILABLE; 

(b) userland periodic function (called from main loop) 
OveroLinkPeriodic() - lisa/lisa_overo_link.h 
{
if ( overo_link.status != LOST && 
     overo_link.status != DATA_AVAILABLE )  
{  
                // <-- if the interrupt occurs here, then the received data will be lost. 
overo_link.status = LOST; 
}

I think this is a common mistake and disabling the interrupt is a right solution. 

I will be very thankful if you share similar race problems (interrupt or thread related) you experienced during the development of paparazzi. 

Regards, 

Heechul 

On Sat, Oct 23, 2010 at 4:38 PM, Tobias Fuchs <address@hidden> wrote:
Hi,

i think i can.
We implemented a straight forward SPI communication between two controllers.
In every SPI transaction, a MISO and a MOSI package (i.e. structs) are exchanged on the respective busses. Nothing special there.

In the communication process, the following components are involved:
- An SPI driver for handling SPI transfers in an interrupt (so, non-userland)
- An SPI userland periodic connection handler that is in effect running in the main loop, calling function OveroLinkPeriodic and OveroLinkEvent.

Some details that are important to understand the context of the problem:
The code you are looking at is for the slave side of the connection. In this case, SPI data is received in a DMA interrupt. In its handler, a flag is set that indicates the reception of new data, while in userland this flag is checked constantly in a periodic event OveroLinkPeriodic. If the flag is set to DATA_AVAILABLE, a data receive handler OveroLinkEvent is called, again in userland. This is a common routine so far.

We discovered that the connection was stable for higher package rates, but failed almost immediately on low package rates.
The problem was in the connection loss detection.

If i remember correctly (i am sorry, this is some time ago and i worked on lots of drivers), the critical part was this one:

[ ... ]

+    if (overo_link.timeout_cnt < OVERO_LINK_TIMEOUT)                   \
+      overo_link.timeout_cnt++;                                                \
+    else {                                                             \
+      __disable_irq();                                                 \
+      if (overo_link.status != LOST && overo_link.status != DATA_AVAILABLE ) { \
+       overo_link.status = LOST;                                       \
+       __enable_irq();                                                 \
[ ... ]

We are disabling interrupts - that is, the DMA interrupt handling SPI transfers - for a single if statement.
Why? Well, the flag "overo_link.status" is set in the DMA interrupt. When we were detecting a connection timeout, once in a while the DMA interrupt was run before we could test this flag to decide the handling ( = race condition) and would set the flag overo_link.status to DATA_AVAILABLE, as now there in fact was data on the buffer again.
So: Disable interrupts to ensure the DMA interrupt handler leaves our flags the way they are.

This actually is not an academic best practice for synchronizing - semaphores and mutexes usually are what you are told at university. In this case, the low-level approach was preferred over the introduction of a sophisticated (i.e. time consuming and error inducing) refactoring of how this flag is shared between driver and user land.

Side note: In total, this race condition took days of frustrating debugging and discussions over Skype until everyone could reproduce the bug (first problem with RCs) and we could locate its origin. The fix was written within some minutes and the connection is stable since :)

I hope this was helpful for your needs - otherwise don't hesitate to write some more questions, there is a lot of great embedded engineers on this mailing list!

Cheers,
tobi



2010/10/12 heechul Yun <address@hidden>
Hi, 

I am studying examples of race condition in real embedded systems. 
From the commit log of paparazzi, I found the following "race condition" description between revision 5348 and 5349. 
Can someone explain more detailed information about what this problem is? 

Thanks 

------------------------------------------------------------------------
r5349 | poine | 2010-08-12 16:36:29 -0500 (Thu, 12 Aug 2010) | 2 lines

removed a race condition in the timeout detection of the overo_link

------------------------------------------------------------------------


Index: sw/airborne/lisa/lisa_overo_link.c
===================================================================
--- sw/airborne/lisa/lisa_overo_link.c  (revision 5348)
+++ sw/airborne/lisa/lisa_overo_link.c  (revision 5349)
@@ -4,10 +4,11 @@

 void overo_link_init(void) {
  overo_link.status = IDLE;
-  overo_link.timeout = OVERO_LINK_TIMEOUT-1;
+  overo_link.timeout_cnt = OVERO_LINK_TIMEOUT-1;
  overo_link.msg_cnt = 0;
  overo_link.crc_err_cnt = 0;
  overo_link.crc_error = FALSE;
+  overo_link.timeout = FALSE;
  overo_link_arch_init();
 }

Index: sw/airborne/lisa/lisa_overo_link.h
===================================================================
--- sw/airborne/lisa/lisa_overo_link.h  (revision 5348)
+++ sw/airborne/lisa/lisa_overo_link.h  (revision 5349)
@@ -21,8 +21,10 @@
    struct OVERO_LINK_MSG_DOWN msg;
    uint8_t array[sizeof(union AutopilotMessage)];
  } down;
+  uint8_t timeout_cnt;
+  /* flags used to reset hardware */
+  uint8_t crc_error;
  uint8_t timeout;
-  uint8_t crc_error;
 };

 extern struct LisaOveroLink overo_link;
@@ -36,21 +38,44 @@

 #include "lisa_overo_link_arch.h"

+#if 0  /* that doesn't work yet */
 #define OveroLinkPeriodic(_timeout_handler) {                          \
-    if (overo_link.timeout < OVERO_LINK_TIMEOUT)                       \
-      overo_link.timeout++;                                            \
+    if (overo_link.timeout_cnt < OVERO_LINK_TIMEOUT)                   \
+      overo_link.timeout_cnt++;                                                \
    else {                                                             \
      if (overo_link.status != LOST && overo_link.status != DATA_AVAILABLE ) { \
+       SPI_Cmd(SPI1, DISABLE);                                         \
       overo_link.status = LOST;                                       \
       LED_OFF(OVERO_LINK_LED_OK);                                     \
       LED_ON(OVERO_LINK_LED_KO);                                      \
+       overo_link.timeout = TRUE;                                      \
       _timeout_handler();                                             \
      }                                                                        \
    }                                                                  \
  }
+#else   /* this one does */
+#define OveroLinkPeriodic(_timeout_handler) {                          \
+    if (overo_link.timeout_cnt < OVERO_LINK_TIMEOUT)                   \
+      overo_link.timeout_cnt++;                                                \
+    else {                                                             \
+      __disable_irq();                                                 \
+      if (overo_link.status != LOST && overo_link.status != DATA_AVAILABLE ) { \
+       overo_link.status = LOST;                                       \
+       __enable_irq();                                                 \
+       LED_OFF(OVERO_LINK_LED_OK);                                     \
+       LED_ON(OVERO_LINK_LED_KO);                                      \
+       _timeout_handler();                                             \
+      }                                                                        \
+      __enable_irq();                                                  \
+    }                                                                  \
+  }
+#endif



+
+
+
 /*
 *
 * Passing telemetry through Overo Link
Index: sw/airborne/lisa/arch/stm32/lisa_overo_link_arch.h
===================================================================
--- sw/airborne/lisa/arch/stm32/lisa_overo_link_arch.h  (revision 5348)
+++ sw/airborne/lisa/arch/stm32/lisa_overo_link_arch.h  (revision 5349)
@@ -5,11 +5,11 @@


 #define OveroLinkEvent(_data_received_handler, _crc_failed_handler) {  \
-    if (overo_link.status == DATA_AVAILABLE) {                   /* set by DMA interrupt */ \
+    if (overo_link.status == DATA_AVAILABLE) {        /* set by DMA interrupt */ \
      while(SPI_I2S_GetFlagStatus(SPI1, SPI_I2S_FLAG_RXNE)==RESET);    \
      while(SPI_I2S_GetFlagStatus(SPI1, SPI_I2S_FLAG_TXE) ==RESET);    \
      while(SPI_I2S_GetFlagStatus(SPI1, SPI_I2S_FLAG_BSY) ==SET);      \
-      overo_link.timeout = 0;                                          \
+      overo_link.timeout_cnt = 0;                                      \
      if((SPI_I2S_GetFlagStatus(SPI1, SPI_FLAG_CRCERR)) == RESET) {    \
       LED_ON(OVERO_LINK_LED_OK);                                      \
       LED_OFF(OVERO_LINK_LED_KO);                                     \
@@ -32,6 +32,11 @@
      overo_link_arch_prepare_next_transfert();                                \
      overo_link.crc_error = FALSE;                                    \
    }                                                                  \
+    if (overo_link.timeout &&                       /* if we've had a timeout         */ \
+       !GPIO_ReadInputDataBit(GPIOA, GPIO_Pin_4)) { /* and we're not selected anymore */ \
+      overo_link_arch_prepare_next_transfert();                                \
+      overo_link.timeout = FALSE;                                      \
+    }                                                                  \
  }





--

Heechul

_______________________________________________
Paparazzi-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/paparazzi-devel



_______________________________________________
Paparazzi-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/paparazzi-devel



reply via email to

[Prev in Thread] Current Thread [Next in Thread]