Low-power real-time image processing on the edge with Intel® Movidius™ Neural Compute Stick

Updated on 21-Dec-2018
HIGHLIGHTS

A primer on how you can use the Intel Movidius Neural Compute Stick to run an image processing project on the edge

Deep learning has gone deep, penetrating the very fibre of tasks like speech recognition, image processing and machine translation. With the significantly heavy matrix operations involved, there has been a surge in demand for GPUs (graphical processing units) and TPUs (tensor processing units) that are capable of accelerating these computations. But never before has there been a smart, sleek, one-stop-shop providing both hardware and software tools for deep learning applications.

What is the Intel® Movidius™?

Enter the Intel® Movidius™ Neural Compute Stick (NCS) – an ambitious, potentially game-changing offering by Intel. A completely offline device the size of a thumb drive, the Intel® Movidius™ NCS has brought deep learning to applications on the edge of the Internet of Things. Its high energy efficiency is a neat plus that makes it a viable companion to controller boards such as the Aaeon UP2. Bonus feature: it has no fan! The NCS 2, launched recently, features Intel's latest VPU, the Myriad™ X. This nifty chip boasts up to eight times the performance of the previous generation of NCS, thanks to more cores and a new dedicated hardware accelerator for deep neural network inference.

The Intel® Neural Compute Stick 2 is miles ahead of its previous version.

As a huge favour to the aspiring developer, the Intel® Movidius™ NCS comes with a full-fledged software development kit in the form of the OpenVINO™ toolkit, to facilitate rapid prototyping, testing, validation and deployment of neural network architectures. The Intel® Movidius™ NCS essentially serves as a high-performance, USB-connected accelerator for offline deep learning. Do note, however, that models must be trained elsewhere first – the Intel® Movidius™ VPU (Vision Processing Unit) only runs them in real time. In short, the NCS is an inference-time coprocessor, not a training processor.

Here is an overview of the compatibilities of the Intel® Movidius™:

  • It can be used with Ubuntu 16.04, CentOS 7.4, Yocto, Windows 10 and Linux for FPGA.
  • It supports TensorFlow, Caffe, MxNet, Kaldi as well as ONNX for Deep Neural Networks.
  • Intel® Movidius™ allows you to run deep learning models like SqueezeNet and GoogLeNet even on a machine with low processing power.

The Intel® Movidius™ Vision Processing Unit (VPU) in fact already provides real-time visual computing capabilities to a number of battery-powered consumer and industrial edge devices like Google Clips, the DJI Spark drone, the Motorola 360 camera, HuaRay industrial smart cameras, and many more.

TensorFlow? Caffe? What?

TensorFlow is an open source library that allows heavy matrix computations to be represented as a 'computational graph', where nodes in the graph represent mathematical operations, and the graph edges represent the multidimensional arrays (tensors) communicated between them. This makes TensorFlow a natural computational representation of the matrix algebra, back-propagation and other methods essential to neural networks – and one of the industry standards in deep learning today. Written in C++, Python and CUDA, TensorFlow boasts compatibility with not just the three major desktop operating systems (Windows, macOS and Linux) but also with Android and JavaScript. No surprise, as it's a Google Brain product.
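To make the 'computational graph' idea concrete, here is a minimal sketch using the TensorFlow 1.x API (current at the time of writing); the constants are purely illustrative:

```
import tensorflow as tf  # TensorFlow 1.x

# Nodes are operations; the edges between them carry tensors.
a = tf.constant([[1.0, 2.0]])    # 1x2 matrix
b = tf.constant([[3.0], [4.0]])  # 2x1 matrix
c = tf.matmul(a, b)              # a matrix-multiplication node in the graph

# Nothing is computed until the graph is run inside a session.
with tf.Session() as sess:
    print(sess.run(c))  # [[11.]] = 1*3 + 2*4
```

Building the graph and running it are separate steps – exactly the separation that lets a toolkit like OpenVINO take a finished graph and run it on different hardware.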

Caffe, on the other hand, is a similar deep learning framework written in C++, with a Python interface. Although notoriously hard to install, Caffe is one of the best libraries for image processing today. After all, it was developed at Berkeley, one of the top schools for computer vision.

Intimidating as these may sound, one does not need to dive into the depths of hardware acceleration, tensor algebra, or back-propagation in order to get a working model off the ground. In fact, it’s very likely that freely available open source neural network code could be directly adapted to a number of interesting applications.

Why Intel® Movidius™ and not a TPU or GPU?

This question reflects a common misconception among those first introduced to the Intel® Movidius™. A TPU or GPU is a processing unit that can perform, at pretty high speeds, the heavy linear algebraic operations required to train a deep neural network. The Intel® Movidius™ is not a processor but a coprocessor. So, it will not train a network for you – that's simply not what it's for!

Rather, the Intel® Movidius™ is a low-powered, portable device that allows you to run inference on pre-trained neural networks at high speeds. How are these different, you ask? It’s actually pretty straightforward.

Any deep neural network accepts its input (be it an image, an audio file, or something else entirely) as a collection of numbers – pixel values, audio frequencies, and so on. These numbers are combined in various ways by the different 'layers' of the network to spit out a new number or collection of numbers. This result encodes useful information! For example, we could have a code where 1, 2 and 3 mean 'cat', 'dog' and 'unidentified' respectively. If the neural net outputs 1 on an image, it thinks the image is a cat.
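As a toy illustration of that output code (using zero-based indices rather than the 1/2/3 scheme above), the final layer of a classifier typically produces one score per class, and we simply pick the largest:

```
import numpy as np

labels = {0: 'cat', 1: 'dog', 2: 'unidentified'}

# Pretend this is the network's final output: one score per class.
scores = np.array([0.91, 0.07, 0.02])

print(labels[int(np.argmax(scores))])  # 'cat' – the highest-scoring class
```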

We usually start with a 'randomly initialised' network – this will probably spit out random outputs for each input. In order to 'train' the network, we use algorithms like back-propagation – a combination of mathematics and trial and error – to adjust the network's weights so that its predictions become more and more accurate. This process of training involves tons of linear algebra and calculus. GPUs and TPUs are significantly faster than a standard CPU at these calculations, and this is precisely why they're used for training.
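To see what 'adjusting the weights' means in miniature, here is a single-weight sketch of gradient descent; real networks do the same thing across millions of weights, with back-propagation supplying the gradients:

```
import numpy as np

# Toy data: we want the model to learn y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

w = np.random.randn()  # randomly initialised weight
learning_rate = 0.01

for step in range(100):
    y_pred = w * x                     # forward pass
    error = y_pred - y
    gradient = 2 * np.mean(error * x)  # d(mean squared error)/dw
    w -= learning_rate * gradient      # nudge w to reduce the error

print(round(w, 3))  # converges towards 2.0
```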

Now, after we have a well-trained network, all we have to do is push new inputs into the existing network. If we have given it enough data to learn on, it will probably do a good job of identifying cats and dogs, or even millions of other objects. Tricks like convolutional neural networks have made deep learning ridiculously good at such tasks. While this ‘inference’ is not as computationally demanding as ‘training’ the network, it still requires a fairly powerful processor and is hard to do on portable devices.

What the Intel® Movidius™ NCS offers is a well-designed, energy-efficient coprocessor built specifically for pushing new inputs through these pre-trained networks. Aside from its portability, the platform comes in variants with different core counts – 1, 2, 4 or even 8 cores – so you can match the compute power to your project, and Intel is working with ODMs to bring out designs with even greater core counts. The low power consumption and high energy efficiency mean it can run longer and still process large amounts of data per second at relatively low energy cost – a boon for portable applications!
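For a flavour of what 'pushing an input through a pre-trained network' looks like in practice, here is a minimal sketch using the OpenVINO Python API roughly as it existed at the time of writing; the IEPlugin-based flow and the model file names are assumptions, so check `help()` against your installed version:

```
import cv2
import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin

# A model already converted to OpenVINO's IR format (FP16 for the NCS).
model_xml = 'squeezenet1.1.xml'  # hypothetical file names
model_bin = 'squeezenet1.1.bin'

plugin = IEPlugin(device='MYRIAD')  # target the NCS
net = IENetwork(model=model_xml, weights=model_bin)
input_blob = next(iter(net.inputs))
n, c, h, w = net.inputs[input_blob].shape

# Preprocess one image into the network's expected NCHW layout.
image = cv2.resize(cv2.imread('cat.jpg'), (w, h)).transpose((2, 0, 1))

exec_net = plugin.load(network=net)
result = exec_net.infer({input_blob: image[np.newaxis]})  # runs on the stick
print(next(iter(result.values())).argmax())  # index of the top class
```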

The Intel® Movidius™ also scales well with your throughput. More than one Intel® Movidius™ NCS can easily be connected to a single device, scaling up the rate at which you can make inferences. Intel® Movidius™ offers good support for interfacing with multiple sticks, with straightforward commands to connect to, use and troubleshoot each device individually.
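As a sketch of that multi-stick enumeration, here is roughly what it looked like with the older NCSDK v1 Python API (names recalled from the v1 bindings, so treat this as illustrative):

```
from mvnc import mvncapi as mvnc  # NCSDK v1 Python bindings

device_names = mvnc.EnumerateDevices()
if not device_names:
    raise RuntimeError('No Neural Compute Stick found')
print('Found %d stick(s)' % len(device_names))

# Open every attached stick; inputs can then be spread across them.
devices = []
for name in device_names:
    device = mvnc.Device(name)
    device.OpenDevice()
    devices.append(device)

# ... allocate a graph on each device and round-robin inputs across them ...

for device in devices:
    device.CloseDevice()
```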

Playing around with the Intel® Movidius™

It is useful for any beginner to understand the Intel® Movidius™ workflow. Crucially, only a pre-trained model can be loaded onto the NCS. After that, the NCS can be used in conjunction with a computer for profiling, tuning and compiling (if you're working with C++ code). Finally, the NCS can be deployed with a computer, an Aaeon UP2 board or any other USB-enabled host for on-the-go inference.

The Intel® Movidius™ workflow, culminating in an inference model that can be run using either a computer or a microprocessor.

So how exactly can you load a deep learning model onto the Intel® Movidius™ stick?

How do I get started?

Step 1: Install the SDK

Once you have the Intel® Movidius™ Neural Compute Stick, you should set up your SDK.

Pre-requisites:

  • An x86_64 computer running Ubuntu 16.04 (instructions may vary for other operating systems)
  • An Intel® Neural Compute Stick 2
  • An internet connection

Go ahead and download the Intel Distribution of the OpenVINO toolkit, then open your Terminal and navigate to the Downloads folder.

Do note that if you like working in Python virtual environments, the toolkit has a configuration file that can be edited prior to installation so that it installs into a virtual environment – something that is definitely recommended for OpenVINO. Run the following commands in your Terminal to extract the tarball, install the dependencies and launch the installer:

```
tar xvf l_openvino_toolkit_<VERSION>.tgz
cd l_openvino_toolkit_<VERSION>
./install_cv_sdk_dependencies.sh
./install_GUI.sh
```

While you’re free to customize your installation, these are the recommended settings for standard functionality.

Step 2: Get the drivers you need

In order for the NCS to communicate with your computer, you may need to modify the udev rules. Run the following commands in your Terminal to do this (don't forget to navigate to the Downloads folder):

```
cat <<EOF > 97-usbboot.rules
SUBSYSTEM=="usb", ATTRS{idProduct}=="2150", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="2485", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="f63b", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
EOF
```

This creates a file called ‘97-usbboot.rules’ containing – you guessed it – rules for booting the NCS via USB. Then, run the following to load these rules:

```
sudo cp 97-usbboot.rules /etc/udev/rules.d/
sudo udevadm control --reload-rules
sudo udevadm trigger
sudo ldconfig
rm 97-usbboot.rules
```

Step 3: Test your installation

Plug the NCS into your computer and open a new Terminal window. Run the sample code below:

```
cd ~/intel/computer_vision_sdk/deployment_tools/model_optimizer/install_prerequisites/
./install_prerequisites.sh
cd ~/intel/computer_vision_sdk/deployment_tools/demo
./demo_squeezenet_download_convert_run.sh -d MYRIAD
```

This might take a while, as the script begins by installing all the prerequisites (the first two lines above).

There is a chance that you will get an error that looks like '[ ERROR ] Can not init USB device: NC_DEVICE_NOT_FOUND'. In that case, try unplugging and re-plugging your device, rerunning this step, or running `sudo udevadm test /dev/sda` to test the udev rules. If this command shows any invalid key/value pairs, you may need to fix the udev rules from Step 2.
And your installation should be done!

Now, if you’re feeling experimental, feel free to play around with the other sample models in the Intel repository. One of our favourites is the face, emotion, age, gender and pose detection application. Do note that it requires a webcam, and that results may not always be accurate! If you ran the first 3 steps, the code you need is below:

```
cd ~/inference_engine_samples/intel64/Release
./interactive_face_detection_demo -d MYRIAD \
  -m ~/intel/computer_vision_sdk/deployment_tools/intel_models/face-detection-retail-0004/FP16/face-detection-retail-0004.xml \
  -d_ag MYRIAD -m_ag ~/intel/computer_vision_sdk/deployment_tools/intel_models/age-gender-recognition-retail-0013/FP16/age-gender-recognition-retail-0013.xml \
  -d_em MYRIAD -m_em ~/intel/computer_vision_sdk/deployment_tools/intel_models/emotions-recognition-retail-0003/FP16/emotions-recognition-retail-0003.xml \
  -d_hp MYRIAD -m_hp ~/intel/computer_vision_sdk/deployment_tools/intel_models/head-pose-estimation-adas-0001/FP16/head-pose-estimation-adas-0001.xml
```

There is a significantly easier way if you installed the GitHub-based Neural Compute Application Zoo (covered under 'Support from Intel' below):

  1. Navigate to the folder containing the network you want to run (say, examples/tensorflow/inception_v1) in your Terminal
  2. Type `make all` to build the example
  3. Type `make run` to load the compiled graph onto the NCS and run it on the sample inputs coded in the file (a sketch of what this does under the hood follows)
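Under the hood, those Makefile targets compile the network into a graph file and push inputs through it with the NCSDK Python API. A minimal, hedged sketch of the same flow (NCSDK v1 names, hypothetical file names and input shape):

```
import numpy as np
from mvnc import mvncapi as mvnc  # NCSDK v1 Python bindings

# Open the first attached stick.
device = mvnc.Device(mvnc.EnumerateDevices()[0])
device.OpenDevice()

# Load a pre-compiled graph file (produced by mvNCCompile / `make all`).
with open('graph', mode='rb') as f:
    graph = device.AllocateGraph(f.read())

# Push one input through the network and fetch the result.
input_tensor = np.random.rand(224, 224, 3).astype(np.float16)  # stand-in image
graph.LoadTensor(input_tensor, 'user object')
output, user_object = graph.GetResult()
print(output.argmax())  # index of the top-scoring class

graph.DeallocateGraph()
device.CloseDevice()
```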

Support from Intel

Intel is working to increase the number of freely available examples for the Intel® Movidius™ NCS. In an endorsement of Python's vibrant open-source community, Intel has created the Neural Compute Application Zoo – a GitHub repo full of scripts to download models and compile graphs for Caffe and TensorFlow. The repo also hosts a number of example applications along with data to use with them.

Intel has also made sure the OpenVINO suite for the Intel® Movidius™ is extremely well documented. Make sure to use `help(function_name)` liberally in Python for anything you're unsure how to work with.
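For instance, a quick exploratory session in the Python REPL might look like this (class names depend on the OpenVINO version you have installed):

```
# Pull up the built-in documentation for the inference engine classes.
from openvino.inference_engine import IENetwork, IEPlugin

help(IEPlugin)       # constructor arguments, supported devices, etc.
help(IEPlugin.load)  # how to load a network onto a device
```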

Open source enthusiasts should fork the repo and contribute any of their own applications and use cases. Just follow the instructions to upload additions via GitHub pull requests.

If you face any issues with the NCS or OpenVINO, the NCS Troubleshooting Guide might come in extremely handy. For other tech support, discussions, and simply the feeling of belonging to the community, there is also the NCS User Forum that comes highly recommended and contains community discussions on tonnes of common issues and topics.

To keep in touch with Intel's innovations in the AI and deep learning community, make sure to check Intel's AI Academy.

To know more about what's going on at Intel, check out the Intel Developer Zone here.
