Code Example of Power/Performance Optimization on Android Using Intel Intrinsics

Introduction

It goes without saying that battery life, especially on mobile devices, is critically important for users. We’ve all been in situations where we lose power right when we need it the most: navigating a new city, in the middle of an important call, and so on. It may not be completely intuitive, but optimizing application performance also reduces power consumption, because work that finishes sooner lets the processor return to a low-power idle state sooner, and that helps users.

Analyzing Apps with a Combination of Intel® Graphics Performance Analyzers + VTune™ Amplifier

What is the first step to improve the power/performance of your application? First, you have to determine whether your app is CPU bound or GPU bound. You can do this using a combination of Intel® tools:

Intel® Graphics Performance Analyzers (GPA) is a tool for graphics analysis and optimization of Microsoft DirectX* applications and Android* OpenGL ES* applications. You can find more about it here: https://software.intel.com/en-us/articles/gpa-which-version

For purposes of Android optimization I prefer the GPA console client. You can read about it here: https://software.intel.com/en-us/android/articles/using-intel-graphics-performance-analyzers-console-client-for-android-application

VTune™ Amplifier helps you analyze the algorithm choices and identify where and how your application can benefit from available hardware resources. Use VTune Amplifier to locate or determine the following:

  • The most time-consuming functions (hotspots) in your application and/or on the whole system
  • Sections of code that do not effectively utilize available processor time
  • The best sections of code to optimize for sequential performance and for threaded performance
  • Synchronization objects that affect the application performance
  • Whether, where, and how your application spends time on input/output operations
  • The performance impact of different synchronization methods, different numbers of threads, or different algorithms
  • Thread activity and transitions
  • Hardware-related bottlenecks in your code

Configure the data collection on the host system (Linux*, OS X*, or Windows*) and run the analysis on a remote system (Linux or Android). Remote analysis on Android and embedded Linux systems is supported only by VTune Amplifier for Systems.

You can read more here: https://software.intel.com/en-us/node/496918

The figure below shows how to use a combination of GPA and VTune Amplifier to analyze and optimize your application.

What are Intel® Intrinsics?

Intel® intrinsics are assembly-coded functions that allow you to use C/C++ function calls and variables instead of assembly instructions. Intrinsics provide access to instructions that cannot be generated using the standard constructs of the C and C++ languages.

Intrinsics are expanded inline, eliminating function call overhead. While providing the same benefit as inline assembly, intrinsics also improve code readability, assist instruction scheduling, and help reduce debugging effort.

You can read more here: https://software.intel.com/en-us/node/523351
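
To make the idea concrete, here is a minimal sketch of what calling an intrinsic looks like, assuming an x86 target with SSE support. The helper name add4 is hypothetical and used only for illustration; the intrinsics themselves (_mm_loadu_ps, _mm_add_ps, _mm_storeu_ps) come from the standard SSE header xmmintrin.h.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper for illustration: adds four pairs of floats
// with one packed SSE addition instead of four scalar additions.
inline void add4(const float* a, const float* b, float* out)
{
    __m128 va = _mm_loadu_ps(a);             // load 4 floats (no alignment required)
    __m128 vb = _mm_loadu_ps(b);             // load 4 floats (no alignment required)
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  // packed add, then store 4 results
}
```

The call site stays ordinary C++ (for example, add4(a, b, out) on three arrays of four floats), while the compiler is free to keep the values in registers and schedule the instructions itself.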

How to find and connect Intel® C++ Compiler for Android* OS to your project?

Intel® C++ Compiler for Android* OS is included in the Intel® INDE suite. Intel® C++ Compiler for Android* integrates into the Android NDK and provides an optimized alternative for compiling x86 libraries.

Download and install Intel C++ Compiler for Android. Provide the path to the NDK directory during the installation to integrate Intel C++ Compiler for Android into the Android NDK.

After successful installation, the Intel® C++ Compiler for Android is automatically integrated into the Android NDK toolchain and compiles optimized libraries for the x86 architecture.

Example

To demonstrate the usage of Intel intrinsics, let’s look at this C++ code:

         float x = 1.0f / sqrtf( y );

This type of code often appears in hotspots, especially in physics algorithms. Analyzing this line in VTune Amplifier shows that the compiler generates sqrt + div instead of rsqrt.

The way to fix it is to use Intel intrinsics:

         float x = rsqrt( y );

Where rsqrt is:

         #include <xmmintrin.h>

         …

         inline float rsqrt(const float x)
         {
             float r;
             _mm_store_ss(&r, _mm_rsqrt_ss( _mm_load_ss(&x)));
             return r;
         }

For more such Android resources and tools from Intel, please visit the Intel® Developer Zone

Source: https://software.intel.com/en-us/articles/code-example-of-powerperformance-optimization-on-android-using-intel-intrinsics
