discuss-gnuradio



From: Nick Foster
Subject: Re: [Discuss-gnuradio] Work around for gr.multiply_cc( ) for non NEON enabled ARM devices
Date: Tue, 19 Feb 2013 11:07:08 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130106 Thunderbird/17.0.2

I'm a little confused here. The RPi has no NEON unit. So Volk, which is designed to accelerate common functions for vector processing units like NEON, won't actually do anything on RPi; every kernel executed will be the unaccelerated generic version. I understand the motivation behind Tom's earlier suggestion to fix Volk on non-NEON ARM devices by forcing all kernels to "generic" in volk_config, but I don't understand the reason to attempt to get volk_profile to run on a device which doesn't support anything other than the generic kernels.

That said, Volk includes a routine to detect NEON support on ARM platforms (volk/tmpl/volk_cpu.tmpl.c:static int has_neon(void)), and should correctly refuse to run NEON kernels on non-NEON platforms, although I only have one report from a user who tested the negative case (no NEON) successfully. So all this is to say I have no idea why volk_profile is hanging, but it's pointless to run it on your platform anyway. If you really still want to run volk_profile, try turning down the number of iterations of each routine to 1 or something and see if it completes for you.
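For anyone who wants to double-check whether a given ARM board advertises NEON at all, here is a minimal sketch. It is not the has_neon() routine from volk_cpu.tmpl.c; it assumes a 32-bit ARM Linux system where /proc/cpuinfo has a "Features" line listing "neon" when the unit is present. The function names cpuinfo_lists_neon and this_cpu_has_neon are made up for this illustration.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Returns true if /proc/cpuinfo-style text lists "neon" on a "Features" line.
// Illustration only: this is NOT the detection code used inside Volk.
bool cpuinfo_lists_neon(const std::string& cpuinfo_text) {
    std::istringstream lines(cpuinfo_text);
    std::string line;
    while (std::getline(lines, line)) {
        // Assumption: 32-bit ARM Linux reports capabilities on a "Features" line.
        if (line.compare(0, 8, "Features") != 0) continue;
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;
        // Split the part after the colon into whitespace-separated flags.
        std::istringstream words(line.substr(colon + 1));
        std::string word;
        while (words >> word) {
            if (word == "neon") return true;
        }
    }
    return false;
}

// Convenience wrapper that reads the real file; yields false if it is unreadable.
bool this_cpu_has_neon() {
    std::ifstream in("/proc/cpuinfo");
    std::stringstream buf;
    buf << in.rdbuf();
    return cpuinfo_lists_neon(buf.str());
}
```

If this_cpu_has_neon() returns false on the board, only the generic kernels should be selectable there, which is the situation described above.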

--n

On 02/19/2013 09:14 AM, Karl Petrow wrote:

I recompiled the program with all non-multiply kernels commented out (see attached). It has still been pegged at 100% for 2 hours and is still going. Do you know specifically which kernel corresponds to the multiply_cc function?

 

Thanks again,

 

kyp

 

From: address@hidden [mailto:address@hidden] On Behalf Of Tom Rondeau
Sent: Friday, February 15, 2013 1:09 PM
To: Karl Petrow
Cc: address@hidden
Subject: Re: [Discuss-gnuradio] Work around for gr.multiply_cc( ) for non NEON enabled ARM devices

 

On Fri, Feb 15, 2013 at 12:21 PM, Karl Petrow <address@hidden> wrote:

The Pi is having some problems running:

sudo volk_profile

 

It pegs the CPU at 100%, and I just killed it after 24 hours. I am going to reboot and run it from a terminal-only session, overclocked, over the weekend, but do you have any other suggestions? I was really excited about Volk after reading up on it, but unfortunately it seems to be too much for the RPi.

 

Thanks

 

karl

 

You can edit volk/apps/volk_profile.cc and change the number of samples and iterations it runs for each kernel. The macro is (from volk/lib/qa_utils.h):

 

#define VOLK_PROFILE(func, tol, scalar, len, iter, results)

 

So len and iter can be decreased for your processor's needs.
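To get a feel for why len and iter matter so much on a slow processor, here is a rough sketch of the work one profiling entry implies. It is not the VOLK_PROFILE implementation; multiply_generic and run_profile_like are hypothetical names, and the loop is just a plain element-wise complex multiply of the kind a generic kernel performs. The total number of multiplies is len * iter.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<float> cfloat;

// Plain element-wise complex multiply with no SIMD intrinsics.
void multiply_generic(cfloat* out, const cfloat* a, const cfloat* b, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        out[i] = a[i] * b[i];
    }
}

// Runs the loop `iter` times over `len` samples and returns how many
// complex multiplies were performed in total (len * iter).
unsigned long long run_profile_like(std::size_t len, std::size_t iter) {
    std::vector<cfloat> a(len, cfloat(1.0f, 2.0f));
    std::vector<cfloat> b(len, cfloat(3.0f, -1.0f));
    std::vector<cfloat> out(len);
    unsigned long long multiplies = 0;
    for (std::size_t n = 0; n < iter; ++n) {
        multiply_generic(out.data(), a.data(), b.data(), len);
        multiplies += len;
    }
    return multiplies;
}
```

With the values used for volk_32fc_x2_multiply_32fc_a in the attached file (len 204600, iter 1000), that comes to 204,600,000 complex multiplies for each implementation tested; dropping iter to 1 cuts it to 204,600.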

 

Tom

 

 

From: address@hidden [mailto:address@hidden] On Behalf Of Tom Rondeau
Sent: Thursday, February 14, 2013 10:42 AM
To: Karl Petrow
Cc: address@hidden
Subject: Re: [Discuss-gnuradio] Work around for gr.multiply_cc( ) for non NEON enabled ARM devices

 

On Thu, Feb 14, 2013 at 10:08 AM, Karl Petrow <address@hidden> wrote:

Has anyone developed a workaround for replacing gr.multiply_cc() so that it works on ARM devices that have to disable NEON in order to install? I have two lines in gr.ais that keep giving me overflow.

 

Thanks ahead of time,

 

karl

 

You can turn it off. Have you run 'volk_profile' on the system yet? If so, that would have generated a ~/.volk/volk_config. You can edit that file by hand to specify 'generic' as the architecture for whatever kernel is causing you trouble. The generic version just runs a standard C for loop with no SIMD instructions (except whatever the compiler tries to do).
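As a sketch of what that hand edit amounts to, the helper below rewrites volk_config-style text so that every kernel selects 'generic'. The line format is an assumption taken from the header comment that volk_profile writes ("the function name is followed by the preferred architecture"), i.e. one "kernel_name architecture" pair per line with '#' comment lines. force_generic is a hypothetical name, not part of Volk.

```cpp
#include <sstream>
#include <string>

// Rewrites volk_config-style text so every kernel line selects "generic".
// Assumed format: "<kernel_name> <architecture>" per line; '#' starts a comment.
std::string force_generic(const std::string& config_text) {
    std::istringstream in(config_text);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) {
        // Pass comments and blank lines through unchanged.
        if (line.empty() || line[0] == '#') {
            out << line << '\n';
            continue;
        }
        std::istringstream fields(line);
        std::string kernel;
        fields >> kernel;
        if (kernel.empty()) {  // whitespace-only line
            out << line << '\n';
            continue;
        }
        // Keep the kernel name, replace whatever architecture followed it.
        out << kernel << " generic\n";
    }
    return out.str();
}
```

To change only one troublesome kernel instead of all of them, the same edit can be made by hand on that single line of ~/.volk/volk_config.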

 

Tom

 

 


volk_profile.cc

#include "qa_utils.h"
extern "C" {
#include <volk/volk.h>
#include <volk/volk_prefs.h>
}
#include <vector>
#include <string>
#include <cstdlib>
#include <boost/foreach.hpp>
#include <iostream>
#include <fstream>
#include <sys/stat.h>
#include <sys/types.h>

int main(int argc, char *argv[]) {

    std::vector<std::string> results;

    //VOLK_PROFILE(volk_16i_x5_add_quad_16i_x4_a, 1e-4, 2046, 10000, &results);
    //VOLK_PROFILE(volk_16i_branch_4_state_8_a, 1e-4, 2046, 10000, &results);
   // VOLK_PUPPET_PROFILE(volk_32fc_s32fc_rotatorpuppet_32fc_a, volk_32fc_s32fc_x2_rotator_32fc_a, 1e-2, (lv_32fc_t)lv_cmake(.95393, .3), 20460, 10000, &results);
  //  VOLK_PROFILE(volk_16ic_s32f_deinterleave_real_32f_a, 1e-5, 32768.0, 204600, 10000, &results);
  //  VOLK_PROFILE(volk_16ic_deinterleave_real_8i_a, 0, 0, 204600, 10000, &results);
  //  VOLK_PROFILE(volk_16ic_deinterleave_16i_x2_a, 0, 0, 204600, 10000, &results);
  //  VOLK_PROFILE(volk_16ic_s32f_deinterleave_32f_x2_a, 1e-4, 32768.0, 204600, 1000, &results);
  //  VOLK_PROFILE(volk_16ic_deinterleave_real_16i_a, 0, 0, 204600, 10000, &results);
 //   VOLK_PROFILE(volk_16ic_magnitude_16i_a, 1, 0, 204600, 100, &results);
  //  VOLK_PROFILE(volk_16ic_s32f_magnitude_32f_a, 1e-5, 32768.0, 204600, 1000, &results);
   // VOLK_PROFILE(volk_16i_s32f_convert_32f_a, 1e-4, 32768.0, 204600, 10000, &results);
   // VOLK_PROFILE(volk_16i_s32f_convert_32f_u, 1e-4, 32768.0, 204600, 10000, &results);
   // VOLK_PROFILE(volk_16i_convert_8i_a, 0, 0, 204600, 10000, &results);
  //  VOLK_PROFILE(volk_16i_convert_8i_u, 0, 0, 204600, 10000, &results);
    //VOLK_PROFILE(volk_16i_max_star_16i_a, 0, 0, 204600, 10000, &results);
    //VOLK_PROFILE(volk_16i_max_star_horizontal_16i_a, 0, 0, 204600, 10000, &results);
    //VOLK_PROFILE(volk_16i_permute_and_scalar_add_a, 1e-4, 0, 2046, 10000, &results);
    //VOLK_PROFILE(volk_16i_x4_quad_max_star_16i_a, 1e-4, 0, 2046, 10000, &results);
   // VOLK_PROFILE(volk_16u_byteswap_a, 0, 0, 204600, 10000, &results);
  //  VOLK_PROFILE(volk_16u_byteswap_u, 0, 0, 204600, 10000, &results);
 //   VOLK_PROFILE(volk_16i_32fc_dot_prod_32fc_a, 1e-4, 0, 204600, 10000, &results);
 //   VOLK_PROFILE(volk_32f_accumulator_s32f_a, 1e-4, 0, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32f_x2_add_32f_a, 1e-4, 0, 204600, 10000, &results);
 //   VOLK_PROFILE(volk_32f_x2_add_32f_u, 1e-4, 0, 204600, 10000, &results);
    VOLK_PROFILE(volk_32fc_32f_multiply_32fc_a, 1e-4, 0, 204600, 1000, &results);
 //   VOLK_PROFILE(volk_32fc_s32f_power_32fc_a, 1e-4, 0, 204600, 50, &results);
 //   VOLK_PROFILE(volk_32f_s32f_calc_spectral_noise_floor_32f_a, 1e-4, 20.0, 204600, 1000, &results);
 //   VOLK_PROFILE(volk_32fc_s32f_atan2_32f_a, 1e-4, 10.0, 204600, 100, &results);
    //VOLK_PROFILE(volk_32fc_x2_conjugate_dot_prod_32fc_a, 1e-4, 0, 2046, 10000, &results);
//    VOLK_PROFILE(volk_32fc_x2_conjugate_dot_prod_32fc_u, 1e-4, 0, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32fc_deinterleave_32f_x2_a, 1e-4, 0, 204600, 1000, &results);
 //   VOLK_PROFILE(volk_32fc_deinterleave_64f_x2_a, 1e-4, 0, 204600, 1000, &results);
 //   VOLK_PROFILE(volk_32fc_s32f_deinterleave_real_16i_a, 0, 32768, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32fc_deinterleave_imag_32f_a, 1e-4, 0, 204600, 5000, &results);
//    VOLK_PROFILE(volk_32fc_deinterleave_real_32f_a, 1e-4, 0, 204600, 5000, &results);
//    VOLK_PROFILE(volk_32fc_deinterleave_real_64f_a, 1e-4, 0, 204600, 1000, &results);
//    VOLK_PROFILE(volk_32fc_x2_dot_prod_32fc_a, 1e-4, 0, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32fc_32f_dot_prod_32fc_a, 1e-4, 0, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32fc_index_max_16u_a, 3, 0, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32fc_s32f_magnitude_16i_a, 1, 32768, 204600, 100, &results);
//    VOLK_PROFILE(volk_32fc_magnitude_32f_a, 1e-4, 0, 204600, 1000, &results);
//    VOLK_PROFILE(volk_32fc_magnitude_32f_u, 1e-4, 0, 204600, 1000, &results);
//    VOLK_PROFILE(volk_32fc_magnitude_squared_32f_a, 1e-4, 0, 204600, 1000, &results);
//    VOLK_PROFILE(volk_32fc_magnitude_squared_32f_u, 1e-4, 0, 204600, 1000, &results);
    VOLK_PROFILE(volk_32fc_x2_multiply_32fc_a, 1e-4, 0, 204600, 1000, &results);
    VOLK_PROFILE(volk_32fc_x2_multiply_32fc_u, 1e-4, 0, 204600, 1000, &results);
    VOLK_PROFILE(volk_32fc_x2_multiply_conjugate_32fc_a, 1e-4, 0, 204600, 1000, &results);
    VOLK_PROFILE(volk_32fc_x2_multiply_conjugate_32fc_u, 1e-4, 0, 204600, 1000, &results);
 //   VOLK_PROFILE(volk_32fc_conjugate_32fc_a, 1e-4, 0, 204600, 1000, &results);
//    VOLK_PROFILE(volk_32fc_conjugate_32fc_u, 1e-4, 0, 204600, 1000, &results);
//    VOLK_PROFILE(volk_32f_s32f_convert_16i_a, 1, 32768, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32f_s32f_convert_16i_u, 1, 32768, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32f_s32f_convert_32i_a, 1, 2<<31, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32f_s32f_convert_32i_u, 1, 2<<31, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32f_convert_64f_a, 1e-4, 0, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32f_convert_64f_u, 1e-4, 0, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32f_s32f_convert_8i_a, 1, 128, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32f_s32f_convert_8i_u, 1, 128, 204600, 10000, &results);
    //VOLK_PROFILE(volk_32fc_s32f_x2_power_spectral_density_32f_a, 1e-4, 2046, 10000, &results);
//    VOLK_PROFILE(volk_32fc_s32f_power_spectrum_32f_a, 1e-4, 0, 20460, 100, &results);
//    VOLK_PROFILE(volk_32fc_x2_square_dist_32f_a, 1e-4, 0, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32fc_x2_s32f_square_dist_scalar_mult_32f_a, 1e-4, 10, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32f_x2_divide_32f_a, 1e-4, 0, 204600, 2000, &results);
//    VOLK_PROFILE(volk_32f_x2_dot_prod_32f_a, 1e-4, 0, 204600, 5000, &results);
//    VOLK_PROFILE(volk_32f_x2_dot_prod_32f_u, 1e-4, 0, 204600, 5000, &results);
 //   VOLK_PROFILE(volk_32f_x2_dot_prod_16i_a, 1e-4, 0, 204600, 5000, &results);
    //VOLK_PROFILE(volk_32f_s32f_32f_fm_detect_32f_a, 1e-4, 2046, 10000, &results);
 //   VOLK_PROFILE(volk_32f_index_max_16u_a, 3, 0, 204600, 5000, &results);
//    VOLK_PROFILE(volk_32f_x2_s32f_interleave_16ic_a, 1, 32768, 204600, 3000, &results);
//    VOLK_PROFILE(volk_32f_x2_interleave_32fc_a, 0, 0, 204600, 5000, &results);
//    VOLK_PROFILE(volk_32f_x2_max_32f_a, 1e-4, 0, 204600, 2000, &results);
//    VOLK_PROFILE(volk_32f_x2_min_32f_a, 1e-4, 0, 204600, 2000, &results);
    VOLK_PROFILE(volk_32f_x2_multiply_32f_a, 1e-4, 0, 204600, 10000, &results);
    VOLK_PROFILE(volk_32f_x2_multiply_32f_u, 1e-4, 0, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32f_s32f_normalize_a, 1e-4, 100, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32f_s32f_power_32f_a, 1e-4, 4, 204600, 100, &results);
//    VOLK_PROFILE(volk_32f_sqrt_32f_a, 1e-4, 0, 204600, 100, &results);
//    VOLK_PROFILE(volk_32f_s32f_stddev_32f_a, 1e-4, 100, 204600, 3000, &results);
//    VOLK_PROFILE(volk_32f_stddev_and_mean_32f_x2_a, 1e-4, 0, 204600, 3000, &results);
//    VOLK_PROFILE(volk_32f_x2_subtract_32f_a, 1e-4, 0, 204600, 5000, &results);
//    VOLK_PROFILE(volk_32f_x3_sum_of_poly_32f_a, 1e-4, 0, 204600, 5000, &results);
 //   VOLK_PROFILE(volk_32i_x2_and_32i_a, 0, 0, 204600, 10000, &results);
//    VOLK_PROFILE(volk_32i_s32f_convert_32f_a, 1e-4, 100, 204600, 10000, &results);
 //   VOLK_PROFILE(volk_32i_s32f_convert_32f_u, 1e-4, 100, 204600, 10000, &results);
 //   VOLK_PROFILE(volk_32i_x2_or_32i_a, 0, 0, 204600, 10000, &results);
 //   VOLK_PROFILE(volk_32u_byteswap_a, 0, 0, 204600, 2000, &results);
    //VOLK_PROFILE(volk_32u_popcnt_a, 0, 0, 2046, 10000, &results);
//    VOLK_PROFILE(volk_64f_convert_32f_a, 1e-4, 0, 204600, 10000, &results);
//    VOLK_PROFILE(volk_64f_convert_32f_u, 1e-4, 0, 204600, 10000, &results);
 //   VOLK_PROFILE(volk_64f_x2_max_64f_a, 1e-4, 0, 204600, 1000, &results);
 //   VOLK_PROFILE(volk_64f_x2_min_64f_a, 1e-4, 0, 204600, 1000, &results);
 //   VOLK_PROFILE(volk_64u_byteswap_a, 0, 0, 204600, 1000, &results);
    //VOLK_PROFILE(volk_64u_popcnt_a, 0, 0, 2046, 10000, &results);
 //   VOLK_PROFILE(volk_8ic_deinterleave_16i_x2_a, 0, 0, 204600, 3000, &results);
 //   VOLK_PROFILE(volk_8ic_s32f_deinterleave_32f_x2_a, 1e-4, 100, 204600, 3000, &results);
 //   VOLK_PROFILE(volk_8ic_deinterleave_real_16i_a, 0, 256, 204600, 3000, &results);
 //   VOLK_PROFILE(volk_8ic_s32f_deinterleave_real_32f_a, 1e-4, 100, 204600, 3000, &results);
 //   VOLK_PROFILE(volk_8ic_deinterleave_real_8i_a, 0, 0, 204600, 10000, &results);
    VOLK_PROFILE(volk_8ic_x2_multiply_conjugate_16ic_a, 0, 0, 204600, 400, &results);
    VOLK_PROFILE(volk_8ic_x2_s32f_multiply_conjugate_32fc_a, 1e-4, 100, 204600, 400, &results);
//    VOLK_PROFILE(volk_8i_convert_16i_a, 0, 0, 204600, 20000, &results);
//    VOLK_PROFILE(volk_8i_convert_16i_u, 0, 0, 204600, 2000, &results);
//    VOLK_PROFILE(volk_8i_s32f_convert_32f_a, 1e-4, 100, 204600, 2000, &results);
//    VOLK_PROFILE(volk_8i_s32f_convert_32f_u, 1e-4, 100, 204600, 2000, &results);
    //VOLK_PROFILE(volk_32fc_s32fc_multiply_32fc_a, 1e-4, lv_32fc_t(1.0, 0.5), 204600, 1000, &results);
    VOLK_PROFILE(volk_32fc_s32fc_multiply_32fc_u, 1e-4, 0, 204600, 1000, &results);
    VOLK_PROFILE(volk_32f_s32f_multiply_32f_a, 1e-4, 1.0, 204600, 10000, &results);
    VOLK_PROFILE(volk_32f_s32f_multiply_32f_u, 1e-4, 0, 204600, 1000, &results);


    char path[256];
    get_config_path(path);
    std::string config_path(path);
    std::ofstream config;
    std::cout << "filename: " << config_path << std::endl;
    config.open(config_path.c_str());
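    if(!config.is_open()) { //either we don't have write access or we don't have the dir yet
        const char *home = getenv("HOME");
        if(home == NULL) { //building a std::string from NULL is undefined behavior
            std::cout << "Error: HOME is not set" << std::endl;
            return -1;
        }
        std::string dir(home);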
        dir += "/.volk";
        if(mkdir(dir.c_str(), 0777) == -1) {
            std::cout << "Error creating directory " << dir << std::endl;
            return -1;
        }
        config.open(config_path.c_str());
        if(!config.is_open()) {
            std::cout << "Error opening file " << config_path << std::endl;
            return -1;
        }
    }

    config << "\
#this file is generated by volk_profile.\n\
#the function name is followed by the preferred architecture.\n\
";

    BOOST_FOREACH(std::string result, results) {
        config << result << std::endl;
    }
    config.close();
}

