MAESTRO

An Open-source Infrastructure for Modeling Dataflows within Deep Learning Accelerators

Hyoukjun Kwon and Tushar Krishna
Georgia Institute of Technology

Michael Pellauer
NVIDIA

Emails: Please refer to our members page


News


Overview

Deep learning techniques, especially convolutional neural networks
(CNNs), have pervaded vision applications such as image classification, face recognition, and video processing due to the high accuracy they provide. Both industry and academia are exploring specialized hardware accelerator ASICs as a solution for providing low latency and high throughput for CNN workloads.

The convolution operation is a deeply nested multiply-accumulate loop. For throughput and energy efficiency, each accelerator chooses a different strategy for ordering and tiling the loops of the convolution operation and for mapping data spatially and temporally onto compute units; we collectively refer to these choices as the dataflow. The throughput and energy efficiency of a dataflow change dramatically depending on both the DNN topology (i.e., layer shapes and sizes) and the accelerator's hardware resources (buffer sizes and network-on-chip (NoC) bandwidth). This makes dataflow a first-order consideration for deep learning accelerator ASICs, both at design time, when hardware resources (buffers and interconnects) are allocated on-chip, and at compile time, when different layers need to be mapped optimally for high utilization and energy efficiency.
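To make this concrete, below is a minimal sketch (plain Python, purely illustrative, not taken from MAESTRO) of the convolution loop nest and of one possible dataflow choice. The dimension names K, C, R, S, Y, X and the K-parallel tiling are assumptions chosen for illustration; any reordering, tiling, or spatial mapping of these loops constitutes a different dataflow with different reuse, buffer, and NoC behavior.

    # Illustrative only: a convolution layer as a nested multiply-accumulate
    # loop. Dimension names (K, C, R, S, Y, X) follow common CNN loop-nest
    # notation; the tiling/parallelization below is one example dataflow.

    def conv2d_loop_nest(inputs, weights, outputs, K, C, R, S, Y, X):
        # Untransformed loop nest: every ordering/tiling of these loops,
        # plus the choice of which loop(s) run in parallel on the PE array,
        # defines a different dataflow.
        for k in range(K):              # output channels
            for c in range(C):          # input channels
                for y in range(Y):      # output rows
                    for x in range(X):  # output columns
                        for r in range(R):      # filter rows
                            for s in range(S):  # filter columns
                                outputs[k][y][x] += (
                                    weights[k][c][r][s] * inputs[c][y + r][x + s]
                                )

    def conv2d_k_parallel_tiled(inputs, weights, outputs, K, C, R, S, Y, X,
                                pe_count):
        # Example dataflow: tile the K loop and map each tile spatially across
        # `pe_count` processing elements; the inner loops then iterate
        # temporally on each PE. Changing which loop is tiled or parallelized
        # changes data reuse, buffer requirements, and NoC traffic.
        for k0 in range(0, K, pe_count):
            for pe in range(min(pe_count, K - k0)):   # spatial map over PEs
                k = k0 + pe
                for c in range(C):
                    for y in range(Y):
                        for x in range(X):
                            for r in range(R):
                                for s in range(S):
                                    outputs[k][y][x] += (
                                        weights[k][c][r][s] * inputs[c][y + r][x + s]
                                    )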

The research community lacks a tool to accurately model and reason about dataflows.

We present MAESTRO (Modeling Accelerator Efficiency via
Spatio-Temporal Resource Occupancy), an open-source tool for
modeling and evaluating the performance and energy-efficiency of
different dataflows.

Key Features

  • A concise domain-specific language (DSL) for describing arbitrary convolution dataflows as a set of pragmas
  • An analysis framework that takes the dataflow description, a hardware resource description, and a DNN layer description as inputs and generates buffer requirements, buffer access counts, network-on-chip (NoC) bandwidth requirements, and roofline performance estimates (see the illustrative sketch after this list)
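As a rough illustration of the last output above, the sketch below computes a simple roofline bound for one layer-dataflow pair. The function name, the example numbers, and the simple min() model are assumptions for illustration only; they are not MAESTRO's actual analysis or cost model.

    # Illustrative roofline sketch (not MAESTRO's cost model): attainable
    # performance is bounded by either peak compute or the available
    # bandwidth times the operational intensity of the mapped layer.

    def roofline_attainable_macs_per_sec(total_macs,
                                         bytes_moved,
                                         peak_macs_per_sec,
                                         bandwidth_bytes_per_sec):
        """Return the attainable MAC throughput for one layer/dataflow pair."""
        operational_intensity = total_macs / bytes_moved          # MACs per byte
        bandwidth_bound = operational_intensity * bandwidth_bytes_per_sec
        return min(peak_macs_per_sec, bandwidth_bound)

    # Example with assumed numbers: a 3x3 conv layer on a 256-PE accelerator.
    if __name__ == "__main__":
        attainable = roofline_attainable_macs_per_sec(
            total_macs=1.8e9,                # MACs in the layer (assumed)
            bytes_moved=45e6,                # data moved for this dataflow (assumed)
            peak_macs_per_sec=256 * 1e9,     # 256 PEs at 1 GHz
            bandwidth_bytes_per_sec=25.6e9,  # assumed memory bandwidth
        )
        print(f"Attainable throughput: {attainable / 1e9:.1f} GMACs/s")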

Update:

  • We are actively working on an enhanced, validated version of MAESTRO, which we will release at our HPCA 2019 tutorial. Watch this page for updates. You can also sign up on the request page to be notified of the release.

Resources


Publications

MAESTRO: An Open-source Infrastructure for Modeling Dataflows within Deep Learning Accelerators
Hyoukjun Kwon, Michael Pellauer, and Tushar Krishna
arXiv 2018
[paper]

