discuss-gnuradio

Re: [Discuss-gnuradio] Thread safety of PMT objects in python


From: Marcus Müller
Subject: Re: [Discuss-gnuradio] Thread safety of PMT objects in python
Date: Sat, 9 Jul 2016 14:58:05 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1

Hello Jonas,

> The problem arises when accessing 'old' PMTs. That is, PMTs that were
> handed over to Python from the C++ domain in the past, e.g. through a
> message-handling callback. It appears the PMTs are only valid throughout
> the duration of the function they were handed to.
Hm, yes, that sounds like the typical C++ object lifetime. (In fact, as I'll explain below, the problem lies deeper than threading: it's about object ownership, which is kind of borked here for PMTs, and actually not only for those.)

So, great that you attached a test case! By the way, segfaults in valid code (and, with few exceptions, that means any Python code) are usually bugs, and you're more than welcome to open a bug report at [1]; you'll need a gnuradio.org Redmine account to see the "New Issue" button, though.

Rather than just presenting an answer, I'll explain what I'm doing here, so that you (and others) can reproduce it. I don't think there will be much new info in here for you, Jonas, but rather than just doing what I did to verify, I thought I'd share it.

Roughly:

  1. Get a test case. You supplied one; I can easily verify that, yes, this crashes! Great! (This is the rare occasion where one can say "great, it crashes!"; cherish these moments...)
  2. Understand the test case; you already supplied an explanation of what it does, and that is very helpful here.
  3. Throw your debugger at the problem
  4. ???
  5. Profit!

So, basically, we're at step 3. The wiki page [2] explains what you can do with bog-standard GDB and Python scripts. The current state of affairs is that at least Fedora (and, I suspect, Arch, too) ships GDB and python-devel (or their Arch/pacman equivalents) with a script that automatically enables Python symbol name resolution when debugging a Python process. That's great, because it lets us see in which Python functions things go wrong!

Then it all comes down to running the following (after installing the debug info for a lot of libraries; luckily, my GDB even prints the exact package manager commands I need to run to install the missing debug symbols):

gdb --args python /tmp/min_err_repro.py

then typing "run" at the GDB prompt, waiting for the crash, and then typing "bt" (short for "backtrace"). For me, this produced the following output:

#0  0x00007fffef62d2c5 in boost::detail::atomic_count::atomic_exchange_and_add (dv=1, pw=0x39) at /usr/include/boost/smart_ptr/detail/atomic_count_gcc_x86.hpp:67
#1  boost::detail::atomic_count::operator++ (this=0x39) at /usr/include/boost/smart_ptr/detail/atomic_count_gcc_x86.hpp:30
#2  pmt::intrusive_ptr_add_ref (address@hidden) at /home/marcus/src/gnuradio/gnuradio-runtime/lib/pmt/pmt.cc:66
#3  0x00007fffe7e184c5 in boost::intrusive_ptr<pmt::pmt_base>::intrusive_ptr (rhs=..., this=<optimized out>) at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:92
#4  boost::intrusive_ptr<pmt::pmt_base>::operator= (rhs=..., this=<synthetic pointer>) at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:129
#5  _wrap_write_string (args=<optimized out>, kwargs=<optimized out>) at /home/marcus/src/gnuradio/build/gnuradio-runtime/swig/pmt_swigPYTHON_wrap.cxx:39897
#6  0x00007ffff7af2796 in call_function (oparg=<optimized out>, pp_stack=0x7fffde621220) at /usr/src/debug/Python-2.7.11/Python/ceval.c:4427
#7  PyEval_EvalFrameEx (
    address@hidden 0x7fffdf682730, for file /home/marcus/.usrlocal/lib64/python2.7/site-packages/pmt/pmt_swig.py, line 3295, in write_string (obj=<swig_int_ptr(this=<SwigPyObject at remote 0x7fffdfcfaed0>) at remote 0x7fffdf659a10>), 
    address@hidden) at /usr/src/debug/Python-2.7.11/Python/ceval.c:3061
#8  0x00007ffff7af23e2 in fast_function (nk=<optimized out>, na=<optimized out>, n=1, pp_stack=0x7fffde621360, func=<optimized out>) at /usr/src/debug/Python-2.7.11/Python/ceval.c:4513
#9  call_function (oparg=<optimized out>, pp_stack=0x7fffde621360) at /usr/src/debug/Python-2.7.11/Python/ceval.c:4448
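If you want to capture that backtrace without typing into the GDB prompt (say, to attach it to a bug report), the same session can be scripted. This is a small sketch, not part of GNU Radio; the helper names are hypothetical, while `-batch`, `-ex` and `--args` are standard GDB command-line options. If the script exits without crashing, `bt` will simply report that there is no stack.

```python
import subprocess


def gdb_backtrace_cmd(script, python_exe="python"):
    """Return an argv that runs `script` under GDB and prints a backtrace.

    Mirrors the interactive session: start `gdb --args python script`,
    type "run", wait for the crash, then type "bt".
    """
    return [
        "gdb",
        "-batch",           # execute the -ex commands, then exit
        "-ex", "run",       # start the interpreter on the script
        "-ex", "bt",        # print the backtrace once it has crashed
        "--args", python_exe, script,
    ]


def run_under_gdb(script, python_exe="python"):
    """Run the reproducer under GDB (requires gdb to be installed)."""
    return subprocess.call(gdb_backtrace_cmd(script, python_exe))


if __name__ == "__main__":
    # Print the command line rather than running it.
    print(" ".join(gdb_backtrace_cmd("/tmp/min_err_repro.py")))
```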

So, yes, your suspicion was pretty much right: this has something to do with the handling of objects in "pythonland".

PMTs are a bit special in a number of ways. I don't like all of them, because they make these polymorphic types, which are meant for portability, less portable :)

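So, first of all, pmt::pmt_t is actually a typedef for boost::intrusive_ptr<pmt_base>, a reference-counting smart pointer whose count lives inside the pointed-to pmt_base object itself (that's the atomic_count you can see in frames #0 and #1 of the backtrace).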
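To make the "intrusive" part concrete, here is a toy, single-threaded Python model. All class and method names are hypothetical stand-ins, not GNU Radio API; the real count is an atomic integer manipulated by `intrusive_ptr_add_ref()` and `intrusive_ptr_release()`. The point is that every owning pointer bumps a counter stored in the object, and the object is destroyed when the last owner lets go.

```python
class PmtBase(object):
    """Toy stand-in for pmt::pmt_base: the count lives inside the object."""

    def __init__(self, value):
        self.value = value
        self.refcount = 0
        self.alive = True

    def add_ref(self):
        # Plays the role of intrusive_ptr_add_ref().
        self.refcount += 1

    def release(self):
        # Plays the role of intrusive_ptr_release(); C++ deletes at zero.
        self.refcount -= 1
        if self.refcount == 0:
            self.alive = False


class IntrusivePtr(object):
    """Toy stand-in for boost::intrusive_ptr<pmt_base>, i.e. pmt::pmt_t."""

    def __init__(self, target):
        self.target = target
        target.add_ref()                  # constructing an owner bumps the count

    def copy(self):
        return IntrusivePtr(self.target)  # copying bumps it again

    def reset(self):
        if self.target is not None:
            self.target.release()         # dropping an owner decrements it
            self.target = None


if __name__ == "__main__":
    obj = PmtBase("hello")
    a = IntrusivePtr(obj)
    b = a.copy()
    a.reset()
    print("refcount=%d alive=%s" % (obj.refcount, obj.alive))  # refcount=1 alive=True
    b.reset()
    print("refcount=%d alive=%s" % (obj.refcount, obj.alive))  # refcount=0 alive=False
```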

Now, if you hand a pmt_t over from C++ to Python, Python needs your object to be a CPython PyObject, the Python-internal "universal" struct behind every single Python object. GNU Radio could have hand-written "glue code" for every single thing we want to expose to Python from C++; instead, it uses SWIG, which (more or less) automatically generates wrapper code for C and C++ functions and adds PyObjects with the appropriate properties and function delegates (including type conversions etc.) to all the classes we need in Python.

So this is all a bit of an onion situation:

Python(SWIG-generated PyObject(SWIG type abstraction(Intrusive Pointer (pmt_base) ) ) )

Notice how we have a bit of a problem here:

Python has its own reference counting for the PyObject*s it handles. So, when you do

        key = self.get_tags_in_range(0, offs, offs+1)[0].key

Python increases the refcount of the PyObject that "self.get...[0].key" evaluates to and binds the name "key" to it, but that does not increase the intrusive_ptr's refcount! Meanwhile, after the GNU Radio scheduler is done calling work() (through C++-to-Python PyEval delegation), it runs a "pruning" step that identifies the tags that no longer need to be held in the block's internal tag registry and removes them from it, reducing their refcount. If that count hits 0, the pmt_base the intrusive_ptr points to (and the intrusive_ptr itself) gets deallocated.

Python's PyObject* doesn't notice any of that. It just happily calls pmt:: functions on objects that no longer exist when you do

print self.tags

which can segfault as early as the second iteration.

Exactly the same thing happens with the contents of your self.messages, except that in a single-sender/single-receiver scenario, messages hit a refcount of zero more reliably.
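The ownership mismatch described above can be modelled in a few lines of plain Python. This is a simplified sketch, not GNU Radio's actual code: the class names are hypothetical, and a `freed` flag with an explicit check stands in for what is undefined behaviour (a read of freed memory, typically a segfault) in the real wrappers. The registry is the only owner of each PMT; the wrapper handed to Python is kept alive by Python's own refcount but never adds a reference to the PMT behind it.

```python
class PmtBase(object):
    """Toy pmt::pmt_base with an intrusive reference count."""

    def __init__(self, value):
        self.value = value
        self.refcount = 0
        self.freed = False

    def add_ref(self):
        self.refcount += 1

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True           # C++ would delete the object here


class BorrowedWrapper(object):
    """Toy SWIG proxy: Python refcounts this wrapper, but creating it
    does NOT call add_ref() on the PMT it points to."""

    def __init__(self, target):
        self._target = target

    def read(self):
        if self._target.freed:
            # The real wrapper has no such check; it touches freed memory.
            raise RuntimeError("use after free (a segfault in real code)")
        return self._target.value


class TagRegistry(object):
    """Toy block tag store: the only owner of each PMT's reference."""

    def __init__(self):
        self._owned = []

    def add(self, value):
        obj = PmtBase(value)
        obj.add_ref()                   # the registry holds one reference
        self._owned.append(obj)

    def get_tags(self):
        # Like get_tags_in_range(): hands out wrappers, not new references.
        return [BorrowedWrapper(obj) for obj in self._owned]

    def prune(self):
        # Like the scheduler's pruning step after work() returns.
        for obj in self._owned:
            obj.release()
        self._owned = []


if __name__ == "__main__":
    registry = TagRegistry()
    registry.add("burst_start")
    key = registry.get_tags()[0]        # Python keeps the wrapper alive...
    print(key.read())                   # ...and reading works inside work()
    registry.prune()                    # but the PMT behind it is now gone
    try:
        key.read()
    except RuntimeError as err:
        print(err)
```

The message case works the same way, with the message dispatcher playing the registry's role and dropping its reference once the handler returns.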

Workaround: yeah.
Either extract the information you actually need from the PMTs the moment you get them and store it in native Python types (which is what I do most of the time), or make a copy using PMT functions and store that, or fix the PMT code (which would arguably be the only sane thing to do, but my time currently doesn't allow for that).
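A sketch of the first workaround, assuming GNU Radio 3.7's Python `pmt` module, whose `pmt.to_python` converts a PMT into a native Python value (for symbols such as tag keys, a plain string). The class and method names are hypothetical. In a real Python block you would register `handle_msg` with the block's `set_msg_handler` and call `record_tags` from `work()` on the result of `get_tags_in_range()`. The converter is injectable only so the sketch can be exercised without GNU Radio installed.

```python
class PmtSnapshotter(object):
    """Keep native Python copies of PMT data, never the PMTs themselves.

    `to_python` converts one PMT into a plain Python value; if omitted,
    pmt.to_python from GNU Radio's pmt module is used (an assumption
    about the installed GNU Radio version).
    """

    def __init__(self, to_python=None):
        if to_python is None:
            import pmt                  # GNU Radio's PMT bindings
            to_python = pmt.to_python
        self._to_python = to_python
        self.messages = []              # native values only
        self.tags = []                  # (offset, key, value) tuples

    def handle_msg(self, msg):
        # Convert inside the callback, while C++ still holds a reference.
        self.messages.append(self._to_python(msg))

    def record_tags(self, tags):
        # Convert inside work(), before the scheduler prunes the tags.
        for tag in tags:
            self.tags.append((int(tag.offset),
                              self._to_python(tag.key),
                              self._to_python(tag.value)))
```

The design choice is simply to never let a PMT escape the callback or `work()` call it arrived in; everything stored on `self` is an ordinary Python object whose lifetime Python fully controls.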

Cheers,

Marcus

[1] http://gnuradio.org/redmine/projects/gnuradio/issues
[2] http://gnuradio.org/redmine/projects/gnuradio/wiki/TutorialsGDB



On 04.07.2016 21:33, Jonas Deitmerg wrote:
Hello everyone,

I've recently experienced some unexpected behavior when working with
PMTs in messages and tags. Although I have already figured out how to
avoid this issue, I'd like to know whether it's a systematic error or
just a misunderstanding on my part.

The problem arises when accessing 'old' PMTs. That is, PMTs that were
handed over to Python from the C++ domain in the past, e.g. through a
message-handling callback. It appears the PMTs are only valid throughout
the duration of the function they were handed to.

To illustrate the problem, I have attached some Python code that will
reliably crash with a segmentation fault.


Here's my current understanding of what's happening:

1. The block's thread sees a message that needs to be processed.

2. It dispatches the message (packed as a pmt::pmt_t) to the callback
function through SWIG. I assume the reference counting of the PMT
object is lost here.

3. The python function works on the data, e.g. saves it for later use.

4. Control returns to the C++ side, the pmt object goes out of scope and
is freed.

5. Some other python code tries to access the pmt object and a segfault
occurs.


Is this roughly correct? If so, is there a way to solve this nicely?
It's obviously possible to unpack the pmt object in step 3 and save the
contained data for later use. But I'm probably not the last one to get
bitten by this, and it's not exactly fun to debug.

My setup consists of gnuradio 3.7.9.2, swig 3.0.10 and python 2.7.11
running on Arch Linux, kernel 4.6.3, 64 bit.

Thanks in advance
Jonas


_______________________________________________
Discuss-gnuradio mailing list
address@hidden
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio

