OK, very nice. I profiled a particular script of mine with OProfile and sorted the function calls by CPU usage (columns: address, sample count, percentage of samples, image, symbol):
00546b60 99 0.0666 liboctave.so.1.0.1 xgemm(Matrix const&, Matrix const&, blas_trans_type, blas_trans_type)
0053c190 94 0.0633 liboctave.so.1.0.1 Matrix::is_symmetric() const
004d92a0 92 0.0619 liboctave.so.1.0.1 ComplexNDArray::all_elements_are_real() const
00bc6380 9 0.0061 liboctinterp.so.1.0.1 symbol_table::fcn_info::fcn_info_rep::find_user_function()
00b8c150 9 0.0061 liboctinterp.so.1.0.1 std::_Rb_tree<std::string, std::pair<std::string const, std::list<load_path::file_info, std::allocator<load_path::file_info> > >, std::_Select1st<std::pair<std::string const, std::list<load_path::file_info, std::allocator<load_path::file_info> > > >, std::less<std::string>, std::allocator<std::pair<std::string const, std::list<load_path::file_info, std::allocator<load_path::file_info> > > > >::_M_erase(std::_Rb_tree_node<std::pair<std::string const, std::list<load_path::file_info, std::allocator<load_path::file_info> > > >*)
004a7770 9 0.0061 liboctinterp.so.1.0.1 std::_Deque_base<action_container::elem*, std::allocator<action_container::elem*> >::_Deque_base(std::_Deque_base<action_container::elem*, std::allocator<action_container::elem*> >&&)
009489c0 9 0.0061 liboctinterp.so.1.0.1 octave_value_list::make_storable_values()
004a9350 9 0.0061 liboctinterp.so.1.0.1 octave_value_list::elem(int)
00483cdd 9 0.0061 liboctinterp.so.1.0.1 __gnu_cxx::__exchange_and_add_dispatch(int*, int) [clone .constprop.313]
00773ff0 9 0.0061 liboctinterp.so.1.0.1 Array<octave_value>::operator=(Array<octave_value> const&)
...
...
...
As we can see, xgemm() is at the top. I see lots of good candidates for OpenMP parallel regions in the array classes; bear with me. ;)
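To give a rough idea of what such a parallel region might look like, here is a minimal sketch of a symmetry check parallelized over columns with an OpenMP reduction. It is not Octave's actual code: DenseMatrix and is_symmetric_omp are hypothetical names made up for this illustration, and the column-major layout is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense square matrix (column-major storage).
// This is NOT Octave's Matrix class, only an illustration.
struct DenseMatrix {
  std::size_t n;
  std::vector<double> data;

  explicit DenseMatrix(std::size_t n_) : n(n_), data(n_ * n_, 0.0) {}

  double& operator()(std::size_t i, std::size_t j) { return data[i + j * n]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i + j * n]; }
};

// Returns true if a(i,j) == a(j,i) for every pair below the diagonal.
// Each thread checks a subset of columns; the per-thread results are
// combined with a logical-AND reduction.
bool is_symmetric_omp(const DenseMatrix& a) {
  const long n = static_cast<long>(a.n);
  int symmetric = 1;

  // Columns have different amounts of work (triangular loop), so a
  // dynamic schedule is used to balance the load between threads.
  #pragma omp parallel for reduction(&&:symmetric) schedule(dynamic, 16)
  for (long j = 0; j < n; ++j) {
    for (long i = j + 1; i < n; ++i) {
      const std::size_t r = static_cast<std::size_t>(i);
      const std::size_t c = static_cast<std::size_t>(j);
      if (a(r, c) != a(c, r)) {
        symmetric = 0;
      }
    }
  }
  return symmetric != 0;
}
```

With GCC this would be built with -fopenmp; without that flag the pragma is ignored and the loop simply runs serially. Note that, unlike a serial loop that can return on the first mismatch, this sketch always scans the whole matrix, so whether it actually pays off would have to be measured.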
If you have some heavy scripts, I would like to profile them; please send them to this thread.
Júlio.