Erich Elsen, a systems researcher at Baidu’s Silicon Valley AI Lab (SVAIL), has written a blog post titled “Optimizing RNN Performance.” This is the first in a series of technical posts by SVAIL researchers and engineers on AI techniques, tips and trends.

Erich writes:  

“Most researchers engaging in neural network research have been using GPUs for training for some time now due to the speed advantage they have over CPUs. GPUs from NVIDIA are almost universally preferred because they come with high quality BLAS (cuBLAS) and convolution (cuDNN) libraries.

Achieving optimal performance across a wide range of hardware and input sizes is extremely challenging for library writers and there has been some work outside of NVIDIA on libraries focused on achieving even better performance for problem sizes relevant to deep learning.

Scott Gray of Nervana Systems has written high performance GEMM and space-domain convolution libraries for Maxwell architecture GPUs which are used in their high performance deep learning framework Neon. Facebook has focused on frequency-domain based techniques for their convolution libraries. We have also written some libraries internally for special cases not covered by any existing libraries….”

Erich’s post is targeted to:

– People who find deep learning exciting and want to learn more about it.

– Researchers using a deep learning framework such as Torch7, Theano or Caffe.

– Authors of deep learning frameworks.

– Low-level library implementors.

Read the full post here: