Time: 15:00-16:00
Location: TPOC-4 room 351
Speaker: Dr. Pavlo Molchanov (NVIDIA)
Abstract: Convolutional neural networks (CNNs) are used extensively in computer vision applications. While modern deep CNNs are composed of a variety of layer types, runtime during prediction is dominated by the evaluation of convolutional layers. With the goal of speeding up inference, a number of acceleration techniques have been proposed to reduce computation. In this talk I will focus on two methods we have been working on at NVIDIA.

First, I will cover methods that remove the least important neurons from a trained network, a technique known as pruning. In particular, I will discuss removing entire feature maps (neurons) according to a variety of criteria that heuristically approximate their importance. We evaluate a number of such criteria on a transfer learning task where training a smaller network is impossible without overfitting. By applying pruning we observe a 6x to 10x reduction in computation for fine-grained classification.

In the second part of the talk I will focus on another technique for speeding up inference, called conditional inference. Such methods condition the computation of later layers on features evaluated earlier, avoiding redundant computation. I will describe the details of one such architecture, named IamNN, which evolved from the ResNet family and has 12x fewer parameters along with a 3x saving in computation.
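To make the two ideas concrete, here are two toy sketches. The first is a minimal PyTorch illustration of feature-map pruning: it ranks the output feature maps of a single convolutional layer by an L1-magnitude importance criterion (just one simple heuristic, not necessarily the criteria covered in the talk) and zeroes out the lowest-ranked maps. The function name prune_feature_maps and the keep_ratio parameter are hypothetical, introduced only for this example.

    # A minimal sketch of feature-map pruning, assuming PyTorch.
    # The L1-magnitude criterion below is one simple heuristic for
    # importance; the talk surveys several such criteria.
    import torch
    import torch.nn as nn

    def prune_feature_maps(conv: nn.Conv2d, keep_ratio: float = 0.5) -> None:
        # Approximate the importance of each output feature map by the
        # L1 norm of its filter weights (shape: out x in x kH x kW).
        importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))
        n_keep = max(1, int(keep_ratio * conv.out_channels))
        # Indices of the least important feature maps.
        drop = importance.argsort()[: conv.out_channels - n_keep]
        with torch.no_grad():
            conv.weight[drop] = 0.0          # zero the pruned filters
            if conv.bias is not None:
                conv.bias[drop] = 0.0

    conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
    prune_feature_maps(conv, keep_ratio=0.75)  # drop the 25% least important maps

In practice the pruned channels would be removed from the layer entirely (shrinking the following layer's input as well) and the network fine-tuned afterwards; zeroing here only illustrates the ranking step.

The second sketch illustrates conditional inference in its simplest form: an early classifier head gates whether the deeper layers run at all. IamNN's actual mechanism, with weight sharing across ResNet-style blocks, is more involved; this only shows the general idea of conditioning later computation on earlier features. All class and parameter names here are hypothetical.

    # A toy sketch of conditional inference, assuming PyTorch: skip the
    # deep path when an early prediction is already confident. This is
    # an illustration of the general idea, not the IamNN architecture.
    import torch
    import torch.nn as nn

    class EarlyExitNet(nn.Module):
        def __init__(self, num_classes: int = 10):
            super().__init__()
            self.stem = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
            self.early_head = nn.Linear(64, num_classes)
            self.deep = nn.Sequential(nn.Linear(64, 64), nn.ReLU(),
                                      nn.Linear(64, num_classes))

        def forward(self, x, threshold: float = 0.9):
            h = self.stem(x)
            logits = self.early_head(h)
            conf = logits.softmax(dim=-1).max(dim=-1).values
            # If every sample in the batch is confident enough,
            # avoid the redundant deep computation (batch-level gate
            # kept deliberately simple for this sketch).
            if bool((conf > threshold).all()):
                return logits
            return self.deep(h)

    net = EarlyExitNet()
    out = net(torch.randn(8, 32))  # runs the deep path unless the gate fires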
Bio: Dr. Pavlo Molchanov obtained his PhD from Tampere University of Technology, Finland, in 2014 in the area of signal processing. His dissertation focused on designing automatic target recognition systems for radars. Since 2015 he has been with the Learning and Perception Research team at NVIDIA, where he currently holds a senior research scientist position. His research focuses on methods for neural network acceleration and on designing novel human-computer interaction systems for in-car driver monitoring. He received the EuRAD Best Paper Award in 2011 and the EuRAD Young Engineer Award in 2013.