


From: Sergio R. Caprile
Subject: Re: [lwip-users] Device crashes while connected via TCP and Serial simultaneously
Date: Mon, 30 Jan 2017 10:54:53 -0300
User-agent: Mozilla/5.0 (Windows NT 6.1; rv:45.0) Gecko/20100101 Thunderbird/45.7.0

The micro you have is a Cortex-R, it does have an MPU.
Your system might be setting up protected regions, and that could trigger exceptions. Those would more likely show up at startup than sporadically in the long run, but you need to be sure. You should know your init functions and be able to tell whether the MPU is on and whether any regions are protected.

You should be able to find out exactly who is causing that exception by fetching the fault information from a particular place. That would probably point you right to the culprit without much guessing and debugging. I can't help you with Cortex-R; I'm an -M guy. The MPU or the fetch unit triggered an exception because an instruction at some address (who) wanted to access memory at some place (where) that either does not exist (fetch unit) or is not allowed (MPU). Since you said it is a data fetch exception, I bet it is the fetch unit and not the MPU, but it is your processor in your hardware and you have to know its intricacies. Once you have the address, you can check the map file the linker outputs to find the function that is doing that. Then you can probably put a breakpoint there and find out why it happens.
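The map-file lookup can be sketched as follows. This is only an illustration in portable C, not something taken from your toolchain: the symbol table, its addresses and the function names are all made up, standing in for the entries you would read from your linker's map file.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per function, as it would appear in the map file. */
struct map_symbol {
    uint32_t    start;   /* start address of the function */
    const char *name;
};

/* Hypothetical entries, sorted by start address. Real ones come from
 * the map file your linker outputs. */
static const struct map_symbol map_symbols[] = {
    { 0x00001000u, "main" },
    { 0x00001200u, "serial_rx_isr" },
    { 0x00001280u, "eth_rx_isr" },
    { 0x00001400u, "tcp_send_chunk" },
};

/* Return the name of the last symbol that starts at or below addr,
 * or NULL if addr lies below the first symbol. The table carries no
 * end addresses, so anything past the last entry is attributed to it. */
const char *symbol_for_address(uint32_t addr)
{
    size_t lo = 0;
    size_t hi = sizeof map_symbols / sizeof map_symbols[0];
    const char *found = NULL;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (map_symbols[mid].start <= addr) {
            found = map_symbols[mid].name;  /* candidate, look further up */
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return found;
}
```

In practice you do this by eye in the map file, or let the debugger resolve the address for you; the point is only that the faulting address falls between the start of one function and the start of the next.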

Running an "I wrote it myself" application is not good enough, particularly if you have never done this before. Known-to-work applications are, for example, those in the contrib tree (1.4.1) or in the app tree (2.0.x). Run one of those apps AND your serial stuff, remove all kinds of communication between them, and only when you are sure neither of them is blowing up your system, start having each one write to the other's buffer. If you could make the echo app run without a hassle, that is a good sign, but not enough: I've seen it run on bad ports that mix contexts. Netio would probably be the same, unless you run it as a master.

Again:
You must have:
 NO_SYS=1
 a main loop
 some interrupt handlers.

You must call lwIP functions in only one context, either main or interrupt, but not both. The code snippet you sent looks OK enough to me not to cause a blowup (though that proves nothing), but you should check that there is enough room in the send buffer (see tcp_sndbuf()) before calling tcp_write(). You are sending frames in the main context. What do you do when the hardware signals that an Ethernet frame is ready? You must not call lwIP from there; you need to queue the frame somewhere, or keep it in the chip and raise a flag or equivalent, and then the main loop will deliver the frame to lwIP. You cannot have frames delivered to lwIP in interrupts and call lwIP to send frames in the main loop without wreaking havoc. Your raw_data_current structure might get corrupted by a wandering pointer. You could add an assert on raw_data_current->buffer before calling tcp_write(), or you could add breakpoints or print it out to check that it is where it should be. You can also enable

Please verify these and by all means learn your hardware and decode all the information in the exception, because that will let you decide what to do next.
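The "queue it in the interrupt, deliver it in the main loop" rule can be sketched like this. It is a host-runnable illustration under my own assumptions, not your driver: eth_rx_isr(), net_poll(), deliver_to_stack() and the queue sizes are hypothetical names and values, and deliver_to_stack() only counts frames where a real port would hand a pbuf to the netif input function.

```c
#include <stdint.h>
#include <string.h>

#define RX_SLOTS   4u      /* hypothetical queue depth */
#define FRAME_MAX  1518u   /* largest untagged Ethernet frame */

struct rx_slot {
    uint16_t len;
    uint8_t  data[FRAME_MAX];
};

static struct rx_slot rx_queue[RX_SLOTS];
static volatile uint32_t rx_head;      /* written only by the ISR       */
static volatile uint32_t rx_tail;      /* written only by the main loop */
static volatile uint32_t rx_dropped;   /* frames the ISR had to discard */
static uint32_t delivered_frames;      /* frames handed to the stack    */

/* Interrupt context: copy the frame out and return. No lwIP calls here. */
void eth_rx_isr(const uint8_t *frame, uint16_t len)
{
    if (len > FRAME_MAX || (rx_head - rx_tail) >= RX_SLOTS) {
        rx_dropped++;                  /* oversized frame or queue full */
        return;
    }
    struct rx_slot *slot = &rx_queue[rx_head % RX_SLOTS];
    memcpy(slot->data, frame, len);
    slot->len = len;
    rx_head++;                         /* publish only after the slot is filled */
}

/* Hypothetical stand-in for the stack: a real port would wrap the bytes
 * in a pbuf and pass it to the netif input function at this point. */
static void deliver_to_stack(const uint8_t *data, uint16_t len)
{
    (void)data;
    (void)len;
    delivered_frames++;
}

/* Main-loop context: the only place where the stack is ever called. */
void net_poll(void)
{
    while (rx_tail != rx_head) {
        const struct rx_slot *slot = &rx_queue[rx_tail % RX_SLOTS];
        deliver_to_stack(slot->data, slot->len);
        rx_tail++;                     /* give the slot back to the ISR */
    }
    /* A real NO_SYS=1 main loop would also call sys_check_timeouts()
     * and do its tcp_write() work here, in this same context. */
}
```

The ISR and the main loop each write only their own index, so neither needs to lock the other out; the price is that frames are dropped when the main loop falls behind, which TCP recovers from through retransmission.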

And I read your first mail again... Your serial handler has a big problem and is trashing memory. Your debugger is telling you that with color signs and bells, and you already found the culprit: "I also notice that when the problem occurs the buffers (in the double buffer) I use in the receiving interrupt routine of the serial interface point to an address outside the allowed region"

Once you have verified that you are using lwIP properly, fix that first.
How?
Well, put breakpoints, log the pointer addresses, simulate the code, execute step by step, use the MPU, ask in the forum.
Why?
Maybe your code is OK but the pointer gets trashed by another function, which causes the handler to trigger the exception. In that case, you'll have to find who is pestering your pointer. The most common cause is an out-of-bounds array access, like trying to fit 20 elements in a 15-element array without noticing... but your mileage may vary. One tactic I use a lot is to disable all the suspect functions and enable them one by one. If you have a good debugger, you could trigger on accesses to the pointer address and check whether those come from your function or from "someone else", or you could program your MPU to detect that and trigger an exception. Again, ask in the forum or "hire an engineer" ;)
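As an illustration of the "20 elements in a 15-element array" case, here is a minimal sketch of a double-buffered serial receive handler that refuses to write past the end of its buffer and asserts that its pointer still points at one of the two buffers. All names and sizes (serial_rx_isr(), serial_rx_swap(), RX_BUF_SIZE) are hypothetical; I have not seen your handler, so take this as the shape of the check, not as your code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE 15u   /* hypothetical buffer size */

static uint8_t rx_buf[2][RX_BUF_SIZE];         /* the double buffer        */
static uint8_t *volatile rx_active = rx_buf[0];/* buffer the ISR fills     */
static volatile size_t rx_count;               /* bytes in the active one  */
static volatile uint32_t rx_overruns;          /* bytes that did not fit   */

/* True if p points at the start of one of the two buffers. */
static bool rx_pointer_is_sane(const uint8_t *p)
{
    return p == rx_buf[0] || p == rx_buf[1];
}

/* Called from the serial receive interrupt with one received byte. */
void serial_rx_isr(uint8_t byte)
{
    /* Catches a trashed pointer before it is used to write anywhere. */
    assert(rx_pointer_is_sane(rx_active));

    if (rx_count >= RX_BUF_SIZE) {
        rx_overruns++;     /* buffer full: count the byte and drop it */
        return;
    }
    rx_active[rx_count] = byte;
    rx_count++;
}

/* Called from the main loop: hand back the filled buffer and its
 * length, and let the ISR continue in the other buffer. A real port
 * would mask the serial interrupt around this swap. */
const uint8_t *serial_rx_swap(size_t *len)
{
    uint8_t *full = rx_active;
    *len = rx_count;
    rx_active = (full == rx_buf[0]) ? rx_buf[1] : rx_buf[0];
    rx_count = 0;
    return full;
}
```

If the assert fires, the handler is only the victim and someone else is overwriting the pointer; if the overrun counter climbs, the handler itself is being fed more than it was sized for.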



