

From: Julio Cesar Aguilar Zerpa
Subject: [lwip-users] Device crashes while connected via TCP and Serial simultaneously
Date: Thu, 26 Jan 2017 11:42:50 +0100
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0

Hi there guys,

I don't know how to approach this problem. Hopefully, you can give me some tips.

I am working with lwIP 1.4.1 on the Texas Instruments RM57 HDK, without an OS.

Sorry for the long email, but I want to give you a good overview of my problem :-)

How my program works:
  • sensor data comes in through a serial interface in asynchronous mode
  • once all data is received, it is sent to a client on a PC via TCP (the server is the board)

Problem:

  • after some random time (normally around an hour), the program crashes with a "data fetch" error

Things I noticed:

  • The L4, L4_ABT and L4_USR registers point to a problem in the serial interface code (bad address). I know that the instruction the L4 register points to is not necessarily where the problem lies: the register is set when the debugger or board detects the fault, and the actual error could have happened several instructions earlier. However, I also notice that when the problem occurs, the buffers (of the double buffer) that I use in the serial receive interrupt routine point to an address outside the allowed region. These buffers are part of my application, not system-level buffers.
  • I use another serial interface (in synchronous mode) to send debug information to my development PC. When I increased the amount of debug data and the frequency at which it is sent, the program crashed sooner: normally it takes about an hour to crash, but with more debug output it took about 15 minutes or less.
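For comparison, here is a minimal sketch of an interrupt-fed double buffer of the kind described above. It assumes one received byte per call and a fixed 1056-byte frame (the sensor frame size seen on connection B); the names serial_rx_isr_byte, take_frame, FRAME_LEN and overruns are hypothetical and not part of the TI HAL or lwIP. It illustrates two properties worth checking in the real ISR: the write index is bounds-checked before every store, so the ISR cannot write outside its buffer, and the ISR only swaps buffers after the main loop has finished copying the previous frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical frame size, taken from the 1056-byte sensor frame. */
#define FRAME_LEN 1056u

/* Two receive buffers: the ISR fills rx_buf[rx_fill] while the main loop
   copies the other one. volatile stops the compiler from caching these
   shared variables or reordering the accesses to them relative to each
   other. */
static volatile uint8_t rx_buf[2][FRAME_LEN];
static volatile uint32_t rx_fill = 0u;     /* buffer the ISR is filling */
static volatile uint32_t rx_pos = 0u;      /* next write index in it    */
static volatile bool frame_ready = false;  /* a completed frame waits   */
static volatile uint32_t overruns = 0u;    /* frames dropped by the ISR */

/* Hypothetical per-byte handler, called from the serial RX interrupt. */
void serial_rx_isr_byte(uint8_t byte)
{
    /* Bounds check before every store: never write past the buffer. */
    if (rx_pos >= FRAME_LEN) {
        rx_pos = 0u;
    }
    rx_buf[rx_fill][rx_pos] = byte;
    rx_pos = rx_pos + 1u;

    if (rx_pos == FRAME_LEN) {
        if (frame_ready) {
            /* Previous frame not consumed yet: drop this one instead of
               swapping onto the buffer the main loop may be copying. */
            overruns = overruns + 1u;
        } else {
            rx_fill = rx_fill ^ 1u;  /* ISR now fills the other buffer */
            frame_ready = true;      /* finished frame: rx_buf[rx_fill ^ 1] */
        }
        rx_pos = 0u;
    }
}

/* Main-loop side: copy the completed frame out, then release it. */
bool take_frame(uint8_t *dst)
{
    uint32_t src;
    uint32_t i;

    if (!frame_ready) {
        return false;
    }
    src = rx_fill ^ 1u;       /* the buffer the ISR is NOT writing */
    assert(src < 2u);         /* debug check: index must stay valid */
    for (i = 0u; i < FRAME_LEN; i++) {
        dst[i] = rx_buf[src][i];
    }
    frame_ready = false;      /* only now may the ISR swap again */
    return true;
}
```

If the main loop is too slow, this sketch drops the new frame (counted in overruns) rather than overwriting the buffer being copied; whether dropping or overwriting is preferable depends on the application.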

Tests I did:

  • I left the program running without the TCP server for a whole day, with an increased amount and frequency of data sent over the debug serial interface. Sensor data was being received in asynchronous mode. The program did NOT crash.
  • I left the program running with the TCP server for a whole day, sending a single static buffer that was initialized once and never changed. The serial interface to the sensor was not active. The debug serial interface was active but only sent data occasionally. The program did NOT crash.
  • I left the program running with the TCP server for 4 hours, sending a static double buffer that was updated every 60 ms with dummy values. The serial interface to the sensor was not active. The debug serial interface was active but only sent data occasionally. The program did NOT crash. (I did this to test whether my copy function was faulty.)
  • I ran the TCP server and the sensor serial interface at the same time, but without copying the serial buffer to the TCP buffer. In one test the TCP server sent the single static buffer that never changes; in the other, it sent the static double buffer (updated every 60 ms with dummy values). The program crashed in both tests.

The problem only occurs when both the TCP and the asynchronous serial communication are active at the same time.

(Maybe related) The TCP client on the PC is actually a GUI that displays my sensor data. While it is connected, the "image" of the sensor data "jumps" roughly every 2 seconds because the data is wrong. At first I thought this was a copy error from the serial buffer to the TCP buffer. However, before sending the data over TCP, the board processes it and checks whether it is wrong or corrupted; if it is, the board sends several error signals (LEDs and serial debug data). Whenever the image in the GUI jumps, I also get these error signals from the board, which means the data really is wrong on the board. When the server is not connected to the GUI, I do NOT get those error signals.

It looks as if the TCP server somehow affects the interrupt routine of the serial interface (or shares the same memory area?) and corrupts its buffers. In my port (which I took from a Texas Instruments tutorial on lwIP for the board I am using), I saw that the functions below are called constantly because SYS_LIGHTWEIGHT_PROT = 1. I thought that maybe the serial interface doesn't like its interrupt being disabled and re-enabled that frequently.

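/* Prototypes for the helpers defined below, so they are declared before use. */
void IntMasterIRQEnable(void);
void IntMasterIRQDisable(void);

/* Save the current interrupt state, then mask IRQs. */
sys_prot_t
sys_arch_protect(void)
{
  sys_prot_t status;
  /* IntMasterStatusGet() is provided elsewhere in the port; bit 7 (0x80)
     of the returned status is the IRQ mask bit tested in sys_arch_unprotect(). */
  status = (IntMasterStatusGet() & 0xFF);

  IntMasterIRQDisable();
  return status;
}

void
sys_arch_unprotect(sys_prot_t lev)
{
  /* Only turn interrupts back on if they were originally on when the matching
     sys_arch_protect() call was made. */
  if((lev & 0x80) == 0) {
    IntMasterIRQEnable();
  }
}

/* Unmask IRQs globally (clears the CPSR I bit). */
void IntMasterIRQEnable(void)
{
    _enable_IRQ();
    return;
}

/* Mask IRQs globally (sets the CPSR I bit). */
void IntMasterIRQDisable(void)
{
    _disable_IRQ();
    return;
}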

I then changed that define to SYS_LIGHTWEIGHT_PROT = 0. These functions were no longer called, but the program still crashes.
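To make the intended behavior of sys_arch_protect()/sys_arch_unprotect() easier to check, here is a host-side simulation. It is a sketch under stated assumptions: the variable fake_cpsr stands in for the real CPSR, bit 7 (0x80) is assumed to be the IRQ mask bit as the port's lev & 0x80 test implies, and IntMasterStatusGet/IntMasterIRQEnable/IntMasterIRQDisable here are stand-ins for the port's versions, not the real ones. The hypothetical demo_nesting() shows that a nested protect/unprotect pair leaves IRQs masked until the outermost unprotect, and that IRQs are only re-enabled if they were enabled to begin with.

```c
#include <assert.h>
#include <stdint.h>

/* Simulation only: fake_cpsr stands in for the real CPSR. */
#define CPSR_I_BIT 0x80u            /* assumed IRQ mask bit (set = masked) */

typedef uint32_t sys_prot_t;

static uint32_t fake_cpsr = 0x13u;  /* hypothetical start value: IRQs enabled */

/* Stand-ins for the port's helpers, operating on the fake register. */
static uint32_t IntMasterStatusGet(void) { return fake_cpsr; }
static void IntMasterIRQDisable(void) { fake_cpsr |= CPSR_I_BIT; }
static void IntMasterIRQEnable(void)  { fake_cpsr &= ~CPSR_I_BIT; }

/* Same logic as the port: save the old state, then mask IRQs. */
sys_prot_t sys_arch_protect(void)
{
    sys_prot_t status = IntMasterStatusGet() & 0xFFu;
    IntMasterIRQDisable();
    return status;
}

/* Re-enable IRQs only if they were enabled when protect was called. */
void sys_arch_unprotect(sys_prot_t lev)
{
    if ((lev & CPSR_I_BIT) == 0u) {
        IntMasterIRQEnable();
    }
}

int irqs_enabled(void)
{
    return (fake_cpsr & CPSR_I_BIT) == 0u;
}

/* Nested critical sections: only the outermost unprotect unmasks IRQs. */
void demo_nesting(void)
{
    sys_prot_t outer;
    sys_prot_t inner;

    assert(irqs_enabled());
    outer = sys_arch_protect();
    assert(!irqs_enabled());

    inner = sys_arch_protect();   /* saved state already has IRQs masked */
    sys_arch_unprotect(inner);
    assert(!irqs_enabled());      /* still masked: inner must not unmask */

    sys_arch_unprotect(outer);
    assert(irqs_enabled());       /* back to the original state */
}
```

Note that this pair masks all IRQs globally for the duration of each critical section, not only the serial interrupt.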

(Maybe related) I have a second TCP connection to another part of the GUI. Both connections work at the same time, but with both open the board crashes faster than with a single TCP connection.

(Unrelated) I checked both connections with Wireshark. For connection A (52 bytes of data), the protocol is shown as TCP and the description shows "PSH", "ACK" and similar flags. For connection B (sensor data: 1056 bytes), however, the protocol is shown as ECHO and the description just says "response". Both servers have the same structure. Why would one show a different protocol type? What does ECHO mean?

By the way, I had the lwIP stats enabled when the crash occurred, but they don't show any errors.

I don't have much experience in embedded programming, and I don't know how to investigate this further. What else can I check? Could my port be wrong? Does anyone have a port for the RM57 HDK that I could compare against?

I really appreciate any help.

Best regards,

Julio

