SVAIL Tech Notes: Optimizing RNNs with Differentiable Graphs

This week we posted a new Tech Note in which Jesse Engel discusses a technique for speeding up the training of deep recurrent neural networks. This is Part II of a multi-part series detailing some of the techniques we’ve used here at Baidu’s Silicon Valley AI Lab (SVAIL) to accelerate the training of recurrent neural networks. While Part I focused on the role that minibatch size and memory layout play in recurrent GEMM performance, here we shift our focus to tricks we can use to optimize the algorithms themselves.

Jesse comments: 

There are two main takeaways in this blog post. First, differentiable graphs are a simple and useful tool for visually calculating complicated derivatives. Second, these graphs can also inspire algorithmic optimizations. As an example, we show how to accelerate Gated Recurrent Units (GRUs) by up to 40 percent.
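The Tech Note itself walks through the derivation; as a rough illustration of the kind of restructuring a computation graph can suggest (not necessarily the post's exact optimization), consider that a GRU's input-side matrix multiplies do not depend on the hidden state, so they can be hoisted out of the recurrence and batched into one large GEMM. All names and shapes in this sketch are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(gx, h, Uz, Ur, Uh):
    """One GRU step, given the precomputed input contribution gx = x_t @ Wx.

    gx packs the update (z), reset (r), and candidate (h~) input terms.
    """
    H = h.shape[0]
    z = sigmoid(gx[:H] + h @ Uz)            # update gate
    r = sigmoid(gx[H:2*H] + h @ Ur)         # reset gate
    h_tilde = np.tanh(gx[2*H:] + (r * h) @ Uh)  # candidate state
    return (1 - z) * h + z * h_tilde

def gru_naive(X, h0, Wx, Uz, Ur, Uh):
    """Baseline: one small input GEMM per timestep, inside the loop."""
    h = h0
    for t in range(X.shape[0]):
        gx = X[t] @ Wx                      # T small (D x 3H) GEMMs
        h = gru_step(gx, h, Uz, Ur, Uh)
    return h

def gru_batched_input(X, h0, Wx, Uz, Ur, Uh):
    """Optimized: all input GEMMs fused into one large matrix multiply.

    The recurrent GEMMs (Uz, Ur, Uh) still run sequentially, but the
    input-side work now runs as a single, better-parallelized GEMM.
    """
    GX = X @ Wx                             # one (T x D)(D x 3H) GEMM
    h = h0
    for t in range(X.shape[0]):
        h = gru_step(GX[t], h, Uz, Ur, Uh)
    return h
```

Both variants compute identical hidden states; the batched version simply trades many small matrix multiplies for one large one, which GPUs execute far more efficiently.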

The post is targeted to:

– Researchers using frameworks that require explicit gradient calculation, such as Torch or Caffe. (We will see how to easily visualize and infer gradients in terms of GPU kernels.)

– Researchers developing new iterative algorithms. (We will develop variations of iterative algorithms such as RNNs that are more efficiently parallelized.)

– Authors of Deep Learning frameworks that use automatic differentiation, such as Theano, TensorFlow, Torch Autograd, or Neon. (These methods will hopefully inspire implicit graph optimizations, moving towards systems that can better balance the tradeoff between memory usage and computation.)

Read the full post here:

 SVAIL Tech Notes are written by engineers for engineers on topics related to AI technologies, techniques, tips and trends.

 Previous issues:

Around the World in 60 Days, by Ryan Prenger and Tony Han

Deploying Deep Neural Networks Efficiently, by Chris Fougner

Optimizing RNN Performance, by Erich Elsen

Relevant Links:

SVAIL GitHub Blog

June 15th, 2016