MAERI Tutorial @ ISCA 2018

MAERI: Enabling Rapid Design Space Exploration and Prototyping of DNN Accelerators

Tutorial at ISCA 2018.

Date: June 3, 2018, 8:30 AM to 12:00 noon


Tushar Krishna
Assistant Professor
Georgia Tech

Michael Pellauer
Sr. Research Scientist
NVIDIA Research

Hyoukjun Kwon
PhD student
Georgia Tech

  • More details about MAESTRO and MAERI.
  • To request the MAESTRO binary and the MAERI source code, please use this link.
  • Slides and videos from the tutorial are linked in the tutorial schedule below.
  • For any questions or feedback, email Hyoukjun, Tushar, and/or Michael.


The right microarchitecture of a DNN ASIC accelerator is an area of active research. Here are a few key challenges that computer architects face:

  • DNN topologies are evolving at a rapid rate, and the most recent topologies commonly combine convolutional, recurrent, pooling, and fully-connected layers with varying input and filter sizes.
  • DNNs today have millions of parameters, which may be dense or sparse and stored in a variety of encoding schemes.
  • Because the number of PEs on chip is fixed, DNNs can be partitioned in myriad ways (both within and across layers) to exploit data reuse (of weights and intermediate outputs) and then mapped onto the PEs. Different partitioning and mapping strategies lead to different energy trade-offs, because they change how much data is reused at each level of the memory hierarchy.
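To make the partitioning point concrete, the sketch below counts how many temporal passes ("folds") are needed when independent work items are spread across a fixed number of PEs, and what fraction of PE slots stays busy. The one-item-per-PE-per-pass model, the function name `fold_mapping`, and the example layer sizes are illustrative assumptions; the sketch ignores data movement entirely, so it captures only the utilization side of the trade-off.

```python
import math
from typing import Tuple


def fold_mapping(num_items: int, num_pes: int) -> Tuple[int, float]:
    """Spatially map independent work items onto a fixed PE array.

    Hypothetical model: each PE handles one work item per pass, and items
    that do not fit are folded into additional temporal passes.
    Returns (number of passes, average PE utilization across all passes).
    """
    if num_items < 1 or num_pes < 1:
        raise ValueError("num_items and num_pes must be positive")
    passes = math.ceil(num_items / num_pes)
    utilization = num_items / (passes * num_pes)
    return passes, utilization


# Hypothetical layer: 100 output channels (K) and 56 output rows (Y) on 64 PEs.
print("partition over K:", fold_mapping(100, 64))  # 2 passes, 100 of 128 slots busy
print("partition over Y:", fold_mapping(56, 64))   # 1 pass, 56 of 64 slots busy
```

Here partitioning over the dimension that better matches the array size wins, but a real mapping decision also has to weigh which operands each partitioning lets the PEs reuse.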

All of the above leads to myriad dataflows within the accelerator substrate, making dataflow optimization a first-class component of the microarchitecture.
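The effect of a dataflow's loop ordering on reuse can be illustrated with a toy 1D convolution, O[x] = Σ_s W[s] · I[x + s]. The sketch assumes a single PE with a one-entry register for the current weight and another for the current partial sum, and counts how often each register must be refilled under two loop orders. It is a hypothetical illustration, not MAESTRO's cost model; the names `loop_nest`, `count_refills`, and `conv1d` are invented here.

```python
from typing import Dict, Iterator, List, Tuple


def loop_nest(order: str, X: int, S: int) -> Iterator[Tuple[int, int]]:
    """Yield (x, s) index pairs of a 1D convolution in the chosen loop order."""
    if order == "output_stationary":      # for x: for s:
        for x in range(X):
            for s in range(S):
                yield x, s
    elif order == "weight_stationary":    # for s: for x:
        for s in range(S):
            for x in range(X):
                yield x, s
    else:
        raise ValueError(f"unknown loop order: {order}")


def count_refills(order: str, X: int, S: int) -> Dict[str, int]:
    """Count refills of a one-entry weight register and a one-entry psum register."""
    held_w, held_o = None, None
    weight_refills, psum_refills = 0, 0
    for x, s in loop_nest(order, X, S):
        if s != held_w:          # needed weight is not in the register
            weight_refills += 1
            held_w = s
        if x != held_o:          # needed partial sum is not in the register
            psum_refills += 1
            held_o = x
    return {"weight_refills": weight_refills, "psum_refills": psum_refills}


def conv1d(order: str, weights: List[int], inputs: List[int]) -> List[int]:
    """Compute the convolution itself; the result does not depend on loop order."""
    S = len(weights)
    X = len(inputs) - S + 1
    out = [0] * X
    for x, s in loop_nest(order, X, S):
        out[x] += weights[s] * inputs[x + s]
    return out


for order in ("output_stationary", "weight_stationary"):
    print(order, count_refills(order, X=8, S=3), conv1d(order, [1, 2, 3], [1, 2, 3, 4, 5]))
```

With X = 8 outputs and S = 3 taps, the output-stationary order refills the weight register on all 24 iterations but each partial sum only 8 times, while the weight-stationary order does the opposite (3 weight refills, 24 partial-sum refills); both orders produce the same outputs. Which order is cheaper depends on the relative cost of moving each operand, which is the kind of trade-off a dataflow model has to quantify.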

The research community today lacks a simulation infrastructure to evaluate DNN dataflows and architectures systematically and to reason about the performance, power, and area implications of various design choices.

In this tutorial, we will present two tools for enabling rapid design-space exploration of DNN accelerators:

  • MAESTRO [arXiv 2018 paper][website] is an analytical tool for modeling and analyzing different convolutional dataflows. Using a simple DSL, it lets users model different dataflows by varying loop ordering, loop unrolling, spatial tiling, and temporal tiling, and study their effects on overall runtime and energy on a spatial DNN accelerator with a user-specified number of PEs and buffer sizes.
  • MAERI [ASPLOS 2018 paper][website] is a parameterizable DNN accelerator generator that builds accelerators from a suite of plug-and-play building blocks rather than as a monolithic, tightly coupled entity. It outputs the accelerator's RTL, which can then be sent through an ASIC or FPGA flow for latency, power, and area estimates. MAERI supports mapping convolutional, LSTM, pooling, and fully-connected layers, enabling end-to-end runs of modern DNNs.
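As a purely functional illustration of the building-block idea, the sketch below lets a row of multiplier stages produce products and then reduces variable-sized groups of those products with a pairwise adder tree, so that, for example, a 3-tap and a 5-tap dot product can share eight multipliers. All names and the grouping interface are hypothetical; the sketch does not model MAERI's actual switch configuration, network topology, or timing, only the arithmetic and the adder-tree depth (ceil(log2 n) levels for n inputs).

```python
from typing import List, Tuple


def multiplier_stage(weights: List[int], activations: List[int]) -> List[int]:
    """Hypothetical multiplier switches: each computes one weight x activation product."""
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    return [w * a for w, a in zip(weights, activations)]


def tree_reduce(values: List[int]) -> Tuple[int, int]:
    """Pairwise (binary adder tree) reduction; returns (sum, number of tree levels)."""
    level = list(values)
    depth = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:           # an odd leftover value passes through to the next level
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth


def reduce_groups(products: List[int], group_sizes: List[int]) -> List[Tuple[int, int]]:
    """Reduce consecutive, variable-sized groups of products independently."""
    if any(size < 1 for size in group_sizes) or sum(group_sizes) > len(products):
        raise ValueError("group sizes must be positive and fit in the multiplier row")
    results, start = [], 0
    for size in group_sizes:
        results.append(tree_reduce(products[start:start + size]))
        start += size
    return results


# Eight hypothetical multipliers shared by a 3-tap and a 5-tap dot product.
products = multiplier_stage([1] * 8, list(range(1, 9)))
print(reduce_groups(products, [3, 5]))  # [(6, 2), (30, 3)]
```

Each tuple holds a group's sum and the number of adder-tree levels it needed; multipliers not covered by any group simply stay idle in this model.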

Tutorial Schedule


  • We will distribute the MAESTRO and MAERI code as a VM.
  • Please install VirtualBox on your laptop before coming to the tutorial.
  • We will bring some pen drives with the VM image to the tutorial.
8:30 – 9:00 AM Introduction and Background on DNN accelerators [Slides] [Video]
9:00 – 9:40 AM MAESTRO: A performance and cost model for DNN dataflows 
– Dataflow Taxonomy based on tile, temporal/spatial map, and merge/unroll pragmas
– DSL for describing dataflows



9:40 – 10:00 AM MAESTRO hands-on exercises
– Impact of Dataflow
– Describe and evaluate state-of-the-art accelerator dataflows (Eyeriss, NVDLA, ShiDianNao, and more)
– Impact of DNN Topology
– Evaluate different layers of VGGNet
– Impact of Microarchitecture
– Vary the number of PEs, buffer sizes, and interconnect bandwidth

Bring your laptop!

The VM will be distributed on a pen drive.

10:00 – 10:30 AM Coffee Break (+ continued hands-on exercises)
10:30 – 11:15 AM MAERI: An Open-Source RTL for Flexible DNN Accelerators
– MAERI Building Blocks
– Multiplier Switches, Adder Switches, Simple Switches, Distribution Network, Collection Network, Activation Units, Prefetch Buffer
– Full Microarchitecture
– RTL Modules and Code Organization



11:15 – 11:45 AM MAERI hands-on exercises
– Configure MAERI and generate RTL
– Mapping a DNN over MAERI
– Running performance evaluations
– ASIC and FPGA Synthesis flow for Area and Power

Bring your laptop!

The VM will be distributed on a pen drive.

Wrap-up and Future Extensions [Slides] [Video]

Target Audience:

The tutorial targets students, faculty, and researchers who want to

  • architect novel DNN accelerators, or
  • study performance implications of dataflow mapping strategies, or
  • plug a DNN accelerator RTL into their system

Prerequisite Knowledge: A basic understanding of DNNs and of RTL.



The whole is greater than the sum of its parts