Re: [lwip-users] Infinite hang in tcp_slowtmr()


From: Sylvain Rochet
Subject: Re: [lwip-users] Infinite hang in tcp_slowtmr()
Date: Wed, 14 Oct 2015 19:33:51 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Hi Stephen,

On Wed, Oct 14, 2015 at 09:13:59AM -0500, Stephen Cowell wrote:
> Hey Enrico,
> I'm using GNU toolchain/compiler, supplied with Atmel Studio 6.1.
> Since I've added the code I've had no other problems; I really don't
> have much time to research this, what with other pressures at work.
> 
> It seems the issue is not unknown... sometimes the pdb ends up pointing
> to itself.  These times appear to be correlated to high-stress I/O.
> 
> Obviously the last pdb should point to null... and it should never point
> to itself.  It is easy enough to catch it pointing to itself and make that
> null.  I verified that this was the first pdb, that we weren't going to
> have a memory leak when we just terminated the list.  I did not have
> the resources to chase down when the pointer to self happened...
> I only know that it does, and that the pdb that this happens to is
> at the first allocated pdb address.  The obvious thing to do was to
> correct the pointer to break the endless loop... seems to work.
> 
> As Sylvain wrote, the Atmel port has some serious differences from
> what he's used to seeing... I'm assuming this has something to do
> with it.  As I get more time (the product ships soon) I'll be able to
> spend some more time on this issue.  I'm just glad to get it out there
> and let others know it's happening.

A linked list corruption is a very serious problem; you really must not
ship your product with such a known bug. Your workaround only mitigates
a single common corruption pattern on a linked list, but that's only one
of them. It will break sooner or later with another pattern.
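
To illustrate why a self-pointer check covers only one pattern, here is
a minimal sketch in C. The struct node type and both function names are
hypothetical stand-ins, not lwIP code. A two-node cycle (a -> b -> a)
passes the per-node self-pointer test, yet walking it still never
terminates; a generic cycle check (Floyd's tortoise and hare) flags
both cases.

```c
#include <stddef.h>

/* Hypothetical stand-in for a list element; the real lwIP structures differ. */
struct node {
    struct node *next;
};

/* The kind of check the workaround performs: only detects a node whose
 * next pointer refers to the node itself. */
static int is_self_loop(const struct node *n)
{
    return n != NULL && n->next == n;
}

/* Floyd's cycle detection: returns 1 if following next pointers from head
 * never reaches NULL, whatever the length of the cycle. */
static int has_cycle(const struct node *head)
{
    const struct node *slow = head;
    const struct node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;        /* advances one step */
        fast = fast->next->next;  /* advances two steps */
        if (slow == fast) {
            return 1;
        }
    }
    return 0;
}
```

Even the generic check only detects cycles. It does not catch other
corruption patterns, such as a next pointer referring to freed memory,
and it does not address the cause.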

If a linked list is corrupted, it's because there is a reentrancy
problem in the functions modifying the linked list, which really limits
the scope where reentrancy can occur. We have critical sections for
!NO_SYS systems; you could use the critical section hooks to check
whether the reentrancy constraints are respected:
SYS_ARCH_DECL_PROTECT/SYS_ARCH_PROTECT/SYS_ARCH_UNPROTECT.
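
One possible way to use the hooks as a checker is sketched below. It is
a standalone simulation under stated assumptions, not the Atmel port's
code: current_context, checked_protect(), checked_unprotect() and
run_demo() are hypothetical names, and the "context" is a plain
variable where a real port would ask the CPU or RTOS whether it is
running in the main loop or in an interrupt handler. The hooks here
only record violations; they do not protect anything.

```c
#include <stdio.h>

/* Hypothetical: identifies the running context (0 = main loop, 1 = an ISR).
 * A real port would derive this from the CPU or RTOS state. */
static int current_context = 0;

static int prot_owner = -1;      /* context currently inside the region */
static int prot_depth = 0;       /* current nesting level */
static int prot_violations = 0;  /* number of violations detected so far */

typedef int sys_prot_t;

/* Enter the region; returns the nesting level seen on entry. */
static sys_prot_t checked_protect(void)
{
    if (prot_depth > 0 && prot_owner != current_context) {
        /* Another context is already inside: reentrancy violation. */
        prot_violations++;
        fprintf(stderr, "reentrancy: context %d entered while context %d is inside\n",
                current_context, prot_owner);
    } else {
        prot_owner = current_context;
    }
    return prot_depth++;
}

/* Leave the region; lev must be the value returned by the matching entry. */
static void checked_unprotect(sys_prot_t lev)
{
    prot_depth--;
    if (prot_depth != lev) {
        prot_violations++;  /* unbalanced or out-of-order unprotect */
    }
    if (prot_depth == 0) {
        prot_owner = -1;
    }
}

#define SYS_ARCH_DECL_PROTECT(lev) sys_prot_t lev
#define SYS_ARCH_PROTECT(lev)      lev = checked_protect()
#define SYS_ARCH_UNPROTECT(lev)    checked_unprotect(lev)

/* Simulated scenario: nested use from main, then an "interrupt" that
 * enters the region while main is still inside. Returns the violation count. */
static int run_demo(void)
{
    SYS_ARCH_DECL_PROTECT(main_outer);
    SYS_ARCH_DECL_PROTECT(main_inner);
    SYS_ARCH_DECL_PROTECT(isr_lev);

    current_context = 0;
    SYS_ARCH_PROTECT(main_outer);
    SYS_ARCH_PROTECT(main_inner);    /* same context nesting: allowed */
    SYS_ARCH_UNPROTECT(main_inner);

    current_context = 1;             /* simulated interrupt */
    SYS_ARCH_PROTECT(isr_lev);       /* recorded as a violation */
    SYS_ARCH_UNPROTECT(isr_lev);
    current_context = 0;

    SYS_ARCH_UNPROTECT(main_outer);
    return prot_violations;
}
```

In a real build the violation branch would more likely be a breakpoint
or an assertion, so the offending call path can be seen in the
debugger.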

At least, if you want to ship your product very quickly, just define
those hooks to something appropriate (those are recursive locks, so
you'll have to take care of that) and you should be safe, for now.
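
A common way to get recursive behaviour on a bare-metal target is to
save the interrupt state on entry and restore it on exit, rather than
unconditionally re-enabling interrupts. The sketch below simulates that
with a plain flag: irq_enabled and
inner_unprotect_keeps_irqs_off() are hypothetical, and a real port
would read and write the CPU's interrupt mask instead. Whether
disabling interrupts is the appropriate lock for this particular port
is an assumption.

```c
#include <stdbool.h>

/* Simulated global interrupt-enable flag (hypothetical stand-in for the
 * CPU's interrupt mask). */
static bool irq_enabled = true;

typedef bool sys_prot_t;

/* Disable interrupts and return whether they were enabled before. */
static sys_prot_t sys_arch_protect(void)
{
    sys_prot_t was_enabled = irq_enabled;
    irq_enabled = false;
    return was_enabled;
}

/* Restore the state saved by the matching sys_arch_protect() call. */
static void sys_arch_unprotect(sys_prot_t was_enabled)
{
    irq_enabled = was_enabled;
}

#define SYS_ARCH_DECL_PROTECT(lev) sys_prot_t lev
#define SYS_ARCH_PROTECT(lev)      lev = sys_arch_protect()
#define SYS_ARCH_UNPROTECT(lev)    sys_arch_unprotect(lev)

/* Nested use: the inner unprotect must leave interrupts disabled,
 * because the outer region is still active. */
static bool inner_unprotect_keeps_irqs_off(void)
{
    SYS_ARCH_DECL_PROTECT(outer);
    SYS_ARCH_DECL_PROTECT(inner);
    bool still_off;

    SYS_ARCH_PROTECT(outer);
    SYS_ARCH_PROTECT(inner);
    SYS_ARCH_UNPROTECT(inner);
    still_off = !irq_enabled;   /* restored to "disabled", not "enabled" */
    SYS_ARCH_UNPROTECT(outer);  /* restores the original enabled state */
    return still_off;
}
```

Because each level restores exactly what it saved, nesting works
without a separate counter. An implementation that simply re-enabled
interrupts in the unprotect hook would reopen the window as soon as the
innermost region exits.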

Sylvain
