[Toon-members] PyCPP: interchanging TooN, CVD, and STL objects between Python and C++


From: Damian Eads
Subject: [Toon-members] PyCPP: interchanging TooN, CVD, and STL objects between Python and C++
Date: Mon, 13 Jul 2009 04:04:42 -0600

Hi Ed (and other people who might be interested),

Late last week, I started working on a new template library for
interchanging TooN, LIBCVD, and STL objects between C++ and Python. It
makes ample use of template-based pattern matching. You can see a
draft of the library here on my private SVN server,

    https://svn.eadsware.com/pycpp

You may take issue with some of the template style, but I think it is
a great start and much better than our earlier interfaces. The most
basic class is Converter<X,Y>. It contains functions for converting an
X object to a Y object. Each instantiation recursively instantiates
further Converter classes in order to completely convert all branches
and leaves of the structure of an instance of X into an instance of Y,
and vice versa.

The template function convert<X, Y> simply calls Converter<X,
Y>::convert(x, y), and if you're lucky, you don't even need the
template arguments: C++ template argument deduction will figure them
out for you. When converting from a C++ object to a Python object, the
structure of the conversion is enforced at compile time, so a run-time
error will rarely occur.

    PyDictObject *myPy;              // receives the converted dictionary
    map<string, vector<float> > m;
    vector<float> v;
    v.push_back(4.0f);
    m["test"] = v;                   // the key is a std::string
    convert(m, myPy);
    return myPy; /*** return the last object */

In the example above, a converter function is built recursively at
compile time (nice!). The structure of the C++-to-Python conversion is
also enforced at compile time (nice!). The STL map of vectors gets
converted to a Python dictionary object mapping Python strings to
Python lists of floats.
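
The recursive instantiation described above can be sketched without the
Python headers. This is only an illustration of the pattern, not the
PyCPP source: std::list and a second std::map stand in for the Python
list and dictionary, and the leaf conversion, the two partial
specializations, and example() are all assumptions made for the sketch.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>
#include <vector>

// Primary template (assumed): leaf conversion by plain static_cast.
template <typename From, typename To>
struct Converter {
  static void convert(const From &from, To &to) { to = static_cast<To>(from); }
};

// vector<X> -> list<Y>: each element is converted by Converter<X, Y>,
// which the compiler instantiates recursively.
template <typename X, typename Y>
struct Converter<std::vector<X>, std::list<Y>> {
  static void convert(const std::vector<X> &from, std::list<Y> &to) {
    to.clear();
    for (const X &x : from) {
      Y y;
      Converter<X, Y>::convert(x, y);
      to.push_back(y);
    }
  }
};

// map<K1, V1> -> map<K2, V2>: keys and values are converted independently.
template <typename K1, typename V1, typename K2, typename V2>
struct Converter<std::map<K1, V1>, std::map<K2, V2>> {
  static void convert(const std::map<K1, V1> &from, std::map<K2, V2> &to) {
    to.clear();
    for (const auto &kv : from) {
      K2 key;
      Converter<K1, K2>::convert(kv.first, key);
      Converter<V1, V2>::convert(kv.second, to[key]);
    }
  }
};

// Both template arguments are deduced from the call.
template <typename From, typename To>
void convert(const From &from, To &to) {
  Converter<From, To>::convert(from, to);
}

// Hypothetical usage: a map of float vectors becomes a map of double lists.
std::map<std::string, std::list<double>> example() {
  std::map<std::string, std::vector<float>> m;
  m["test"].push_back(4.0f);
  std::map<std::string, std::list<double>> out;
  convert(m, out);
  assert(out.at("test").size() == 1);
  return out;
}
```

A pair of types with no usable conversion fails to compile in the
primary template, which is the compile-time enforcement in miniature.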

When converting from Python to C++, there is little structure in the
Python object that can be enforced at compile time. With templates, we
can enforce the *expected* structure of the C++ object but most of the
checks of a valid match have to be done at run time (no free lunch
again, such is life) unlike conversion in the opposite direction.
Consider the following example of converting from a Python object to a
C++ object.

          // pyDict is a PyDictObject*
          map<string, vector<float> > m;
          convert(pyDict, m);

We can guarantee the outer-most Python object is a Python dictionary
but any substructure inside isn't known until it is examined at run
time. Fatal mismatches are handled by throwing a run-time C++
exception. However, the conversion functions can properly resolve some
kinds of mismatch. For example, if there is a Python string where a
float is expected in a Python list, the Converter will attempt to
convert the string to a float.
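
A rough sketch of that run-time policy, assuming C++17 and using
std::variant as a stand-in for a dynamically typed Python value. The
names DynValue, to_float, and to_float_vector are hypothetical and are
not part of PyCPP.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a Python object that is a float or a string.
using DynValue = std::variant<double, std::string>;

// Convert one dynamic value to float. A string is tolerated if it parses
// completely as a number; anything else is a fatal mismatch.
float to_float(const DynValue &v) {
  if (const double *d = std::get_if<double>(&v)) {
    return static_cast<float>(*d);
  }
  const std::string &s = std::get<std::string>(v);
  std::size_t used = 0;
  float f = 0.0f;
  try {
    f = std::stof(s, &used);
  } catch (const std::exception &) {
    throw std::runtime_error("cannot convert '" + s + "' to float");
  }
  if (used != s.size()) {
    throw std::runtime_error("trailing characters in '" + s + "'");
  }
  return f;
}

// Convert a dynamic list element by element, checking each one at run time.
std::vector<float> to_float_vector(const std::vector<DynValue> &items) {
  std::vector<float> out;
  out.reserve(items.size());
  for (const DynValue &v : items) {
    out.push_back(to_float(v));
  }
  return out;
}
```

Only the shape of the result (a vector of floats) is fixed at compile
time; every element is inspected when the conversion runs.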

In many cases, convert(x, y) is preferable to y=convert(x) when
converting from Python to C++. Why? The y=convert(x) form is
return-by-value, so there is the run-time overhead of an extra copy of
the return value in addition to the conversion. To avoid this extra
copy, a better approach is to create an empty skeleton y and pass it
by reference using the convert(x, y) form. All Converter classes keep
to the 2-arity form so that any arbitrary structure can be recursively
matched. But we will see in a moment why it is not a universal
solution.

In some cases y=convert(x) is the only appropriate form and has
significantly different semantics from convert(x, y). Consider the
following example, which tries to copy a NumPy array's buffer pointer
into a TooN::Vector<*, *, Reference>'s buffer but doesn't.

    /** Let's think we're being clever and use 0 as a sentinel. **/
    TooN::Vector<TooN::Dynamic> x((double*)0, len(myPyArray));
    convert(myPyArray, x);

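The TooN buffer pointer is immutable (i.e. Precision * const), so
passing a TooN::Reference vector constructed on a null buffer as y to
convert(x, y) does not produce the desired behavior. Invoking
Vector::operator= in the conversion function with a new Vector<*, *,
Reference> wrapping the actual NumPy array buffer causes a core dump,
because operator= copies element by element into the destination's
existing (here null) buffer instead of rebinding the pointer: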

        /// operator = from copy
        inline Vector& operator= (const Vector& from){
                SizeMismatch<Size,Size>::test(size(), from.size());
                const int s=size();
                for(int i=0; i<s; i++){
                        (*this)[i]=from[i];
                }
                return *this;
        }

Although this behavior was unexpected at first, it is the desired
behavior. The buffer should remain immutable after construction. It
occurred to me that one might want to copy the contents of a NumPy
array into a Vector<*,*,Reference> object that has a meaningful buffer
already associated with it. Therefore, convert(x, y) can't perform the
full space of conversions like I had originally hoped. y=convert(x)
seems like the only form that makes sense for copying NumPy array
buffer pointers (but not the content they point to).

     /** The correct way to copy the NumPy buffer pointer. */
    TooN::Vector<TooN::Dynamic> x(convert(myPyArray));

Thus, in the context of converting a NumPy array to a
Vector/Matrix<..., Reference>, the two forms have two separate uses:

     1. convert(x, y):    copy the contents of the NumPy array x into
an already meaningful buffer stored in the Vector/Matrix<...,
Reference> object y.
     2. y=convert(x):    copy the NumPy buffer pointer in x into a new
Vector/Matrix<..., Reference> object y, which gets returned and then
copied (but that's not a problem because only a pointer gets copied).
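
The difference between the two uses can be sketched with stand-in
types. RefVector and FakeArray below are hypothetical simplifications
(they are not the TooN or NumPy types), chosen only to show a
non-owning vector whose pointer is fixed at construction.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reference-style vector: it does not own its storage and the
// pointer cannot be reseated after construction.
struct RefVector {
  double *const data;
  const std::size_t size;

  RefVector(double *d, std::size_t n) : data(d), size(n) {}
  RefVector(const RefVector &) = default;

  // Assignment copies elements into the existing buffer; `data` never changes.
  RefVector &operator=(const RefVector &from) {
    assert(size == from.size);
    for (std::size_t i = 0; i < size; ++i) {
      data[i] = from.data[i];
    }
    return *this;
  }
};

// Hypothetical stand-in for a NumPy array of doubles.
struct FakeArray {
  double *buffer;
  std::size_t length;
};

// Use 1: copy the array contents into y's already meaningful buffer.
void convert(const FakeArray &x, RefVector &y) {
  y = RefVector(x.buffer, x.length);
}

// Use 2: return a new view that shares x's buffer; only a pointer is copied.
RefVector convert_copy(const FakeArray &x) {
  return RefVector(x.buffer, x.length);
}
```

With this layout, calling the first form on a y that was constructed
around a null pointer writes through that null pointer, which is
consistent with the crash described earlier.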

The two forms are defined as template functions; in the code, the
return-by-value form y=convert(x) is named convert_copy.

  template <typename From, typename To>
  void convert(const From &from, To &to) {
    Converter<From, To>::convert(from, to);
  }

  template <typename From, typename To>
  To convert_copy(const From &from) {
    return Converter<From, To>::convert(from);
  }

The compiler has no difficulty with any of my examples when I omit the
template arguments for convert(x, y). However, it cannot deduce the
return type for the second form, so the template arguments have to be
given explicitly. From what I know about type systems this should be
possible for a language in principle, but it may not be possible in
C++. Any ideas on how to get around this?
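
One idiom sometimes used for this, offered only as a sketch and not as
something PyCPP does: return a small proxy object whose templated
conversion operator lets the compiler pick To from the type being
initialized. ConvertProxy, this one-argument convert_copy, and the
return-by-value Converter below are all assumptions for illustration.

```cpp
#include <cassert>
#include <string>

// Assumed return-by-value converter; the primary template uses static_cast.
template <typename From, typename To>
struct Converter {
  static To convert(const From &from) { return static_cast<To>(from); }
};

// Example specialization: int -> std::string.
template <>
struct Converter<int, std::string> {
  static std::string convert(const int &from) { return std::to_string(from); }
};

// The proxy remembers the source; the destination type is deduced later,
// when the proxy is converted to whatever is being initialized.
template <typename From>
class ConvertProxy {
 public:
  explicit ConvertProxy(const From &from) : from_(from) {}

  template <typename To>
  operator To() const {
    return Converter<From, To>::convert(from_);
  }

 private:
  const From &from_;  // must outlive the proxy
};

// Only From is deduced here; To is supplied by the conversion operator.
template <typename From>
ConvertProxy<From> convert_copy(const From &from) {
  return ConvertProxy<From>(from);
}
```

With this sketch, `std::string s = convert_copy(n);` selects
Converter<int, std::string> without explicit template arguments. The
proxy holds a reference, so it should be converted in the same
expression that creates it, and direct initialization such as
`T y(convert_copy(x))` can become ambiguous when T has several
one-argument constructors.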

The PyCPP library is split into four different headers, each of which
can be optionally included.

     * PyCPP/PyTooN.hpp: for converting between NumPy arrays and TooN
matrices/vectors.
     * PyCPP/PySTL.hpp: for converting between arbitrary STL
structures and Python objects.
     * PyCPP/PyCVD.hpp: for converting between NumPy arrays and
CVD::Image arrays.
     * PyCPP/PyC.hpp: for converting between NumPy arrays and C arrays.

The compiler can convert combinations of STL, CVD, and TooN structures
when their respective headers are included together. For example, one
could convert a Python dictionary mapping strings to NumPy arrays to
an STL map mapping std::string objects to CVD::BasicImage objects
(without copying the image buffers, just their buffer pointers!)

Also note that when the contents of a NumPy array buffer are copied
into a TooN or CVD data structure, the buffer can be non-contiguous.
The buffer needs to be contiguous only when no copy is performed, i.e.
when only the buffer pointer is copied. In the future, TooN could
support more complicated striding like what's supported in NumPy, but
this should do for now.
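
To illustrate why a copy tolerates a non-contiguous source, here is a
minimal sketch of gathering strided elements into contiguous storage.
NumPy reports strides in bytes; the function name copy_strided and its
signature are hypothetical and not taken from PyCPP.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Gather `count` doubles that start at `base` and lie `stride_bytes` apart
// into a freshly allocated contiguous vector.
std::vector<double> copy_strided(const void *base, std::ptrdiff_t stride_bytes,
                                 std::size_t count) {
  std::vector<double> out(count);
  const unsigned char *p = static_cast<const unsigned char *>(base);
  for (std::size_t i = 0; i < count; ++i) {
    // memcpy avoids assumptions about the alignment of the source element.
    std::memcpy(&out[i], p + static_cast<std::ptrdiff_t>(i) * stride_bytes,
                sizeof(double));
  }
  return out;
}
```

For example, reading every second element of a six-element array
starting at index 1 uses a stride of 2 * sizeof(double) and a count of
3. Sharing the pointer instead of copying offers no such gather step,
hence the contiguity requirement in that case.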

Being less than a week old, the library is still very, very alpha, and
should not be used for production or research code yet. Please let me
know what you think.

Thanks!

Cheers,

Damian

PS: As an optimization, should Vector::operator= only perform the deep
copy when (this != &from) is true?



-----------------------------------------------------
Damian Eads                           Ph.D. Candidate
University of California             Computer Science
1156 High Street         Machine Learning Lab, E2-489
Santa Cruz, CA 95064    http://www.soe.ucsc.edu/~eads



