MAERI: An Open Source Framework for Generating Modular DNN Accelerators supporting Flexible Dataflow 

Tutorial at ISCA 2018.

Date: June 3, 2018: 8 AM to 12 Noon (Tentative)


More details about the MAERI Project can be found here.

Stay tuned for more details!

Choosing the right microarchitecture for a DNN ASIC accelerator is an area of active research. Computer architects face a few key challenges:

  • DNN topologies are evolving rapidly, and recent topologies commonly mix convolutional, recurrent, pooling, and fully-connected layers with varying input and filter sizes.
  • DNNs may be dense or sparse, with a variety of encoding schemes.
  • Because the number of PEs on-chip is fixed, DNNs can be partitioned in myriad ways (within and across layers) and mapped over the PEs to exploit data reuse (of weights and intermediate outputs). Different partitioning and mapping strategies lead to different energy trade-offs, because they change the amount of data reuse at each level of the memory hierarchy.
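To make "data reuse" concrete, the short sketch below counts how many multiply-accumulates (MACs) touch each weight, each output, and (on average) each input of a stride-1, unpadded 2-D convolution. The `ConvLayer` class and `reuse_factors` helper are hypothetical illustrations, not part of MAERI. A mapping that keeps a weight resident in a PE can amortize one fetch over up to `weight_reuse` MACs; one that keeps an output resident can accumulate all `output_reuse` partial sums before writing it back.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Hypothetical 2-D convolution layer description (stride 1, no padding)."""
    n: int  # batch size
    k: int  # number of filters (output channels)
    c: int  # input channels
    r: int  # filter height
    s: int  # filter width
    h: int  # input height
    w: int  # input width

    @property
    def out_h(self) -> int:
        return self.h - self.r + 1

    @property
    def out_w(self) -> int:
        return self.w - self.s + 1


def reuse_factors(layer: ConvLayer) -> dict:
    """Count MACs per weight, per output, and (averaged) per input element."""
    macs = (layer.n * layer.k * layer.c * layer.r * layer.s
            * layer.out_h * layer.out_w)
    num_weights = layer.k * layer.c * layer.r * layer.s
    num_outputs = layer.n * layer.k * layer.out_h * layer.out_w
    num_inputs = layer.n * layer.c * layer.h * layer.w
    return {
        "macs": macs,
        # Every weight is used once per output pixel per batch item.
        "weight_reuse": macs // num_weights,
        # Every output accumulates c * r * s partial sums.
        "output_reuse": macs // num_outputs,
        # Input reuse varies near the borders, so only the average is reported.
        "input_reuse_avg": macs / num_inputs,
    }
```

For example, a layer with 64 filters of size 3×3 over a 32×32 RGB input (`ConvLayer(n=1, k=64, c=3, r=3, s=3, h=32, w=32)`) performs 1,555,200 MACs; each weight is reused 900 times and each output accumulates 27 partial sums.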

All of the above lead to myriad dataflows within the accelerator substrate, making dataflow optimization a first-class component of the microarchitecture. Unfortunately, most DNN accelerators today support only fixed internal dataflow patterns, because they carefully co-design the PEs and the network-on-chip (NoC) around those patterns. Indeed, the majority are optimized only for traffic within a convolutional layer. This makes it challenging to map arbitrary dataflows efficiently onto the fabric and can lead to underutilization of the available compute resources. As a result, each new dataflow optimization has tended to produce a new accelerator proposal built specifically for it.

The research community today lacks a simulation infrastructure to evaluate DNN dataflows and architectures systematically and reason about performance, power, and area implications of various design choices.

We recently proposed MAERI (Multiply-Accumulate Engine with Reconfigurable Interconnect), a modular design methodology for building DNN accelerators. MAERI makes the case for assembling accelerators from a suite of plug-and-play building blocks rather than building them as a monolithic, tightly coupled entity. These building blocks can be reconfigured at runtime through MAERI's novel tree-based configurable interconnection fabrics, enabling efficient mapping of myriad dataflows.
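One way to see why a configurable tree helps is that a single adder tree can sum several independent dot products of different sizes at once, as long as each one occupies a contiguous group of multipliers. The Python model below is a behavioral sketch under our own simplifying assumptions: it describes what such a tree computes (a segmented reduction, level by level), not MAERI's actual adder-switch microarchitecture. The names `merge`, `segmented_tree_reduce`, and `neuron_ids` are hypothetical.

```python
from typing import List, Tuple

# (neuron id, partial sum) carried upward through the tree.
Segment = Tuple[int, float]


def merge(left: List[Segment], right: List[Segment]) -> List[Segment]:
    """Combine two children's outputs; adjacent segments of the same neuron are added."""
    if left and right and left[-1][0] == right[0][0]:
        gid = left[-1][0]
        return left[:-1] + [(gid, left[-1][1] + right[0][1])] + right[1:]
    # Different neurons meet at this node: forward both partial sums unchanged.
    return left + right


def segmented_tree_reduce(products: List[float],
                          neuron_ids: List[int]) -> List[Segment]:
    """Reduce multiplier outputs pairwise, one tree level at a time.

    Each leaf is one multiplier's product tagged with the neuron it belongs to.
    Leaves of the same neuron must be contiguous.
    """
    level = [[(g, p)] for g, p in zip(neuron_ids, products)]
    while len(level) > 1:
        if len(level) % 2:  # pad an odd-sized level with an empty node
            level.append([])
        level = [merge(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0] if level else []
```

With eight multipliers split into neurons of sizes 3, 3, and 2, `segmented_tree_reduce([1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 1, 1, 1, 2, 2])` returns `[(0, 6), (1, 15), (2, 15)]`: all three sums come out of one tree in three levels, with no multiplier left idle.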

In this tutorial, we will present the MAERI platform. MAERI serves three key roles:

  • Rapid Design-Space Exploration: MAERI can simulate different dataflows by letting users vary loop ordering, loop unrolling, spatial tiling, and temporal tiling, and then study the effects on overall runtime and energy of a spatial DNN accelerator with a user-specified number of PEs and buffer sizes.
  • DNN Accelerator RTL Generation: MAERI can generate DNN accelerators with configurable interconnects, fine-grained computing building blocks, and tiny distributed buffers, enabling efficient mapping of myriad dataflows. It outputs the RTL for the accelerator, which can then be sent through an ASIC or FPGA flow for latency/power/area estimates.
  • End-to-End DNN Simulation: MAERI supports mapping convolutional, LSTM, pooling, and fully-connected layers, enabling end-to-end runs of modern DNNs.
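Loop ordering and temporal tiling can be illustrated on a toy loop nest. The pure-Python sketch below is hypothetical and does not reflect MAERI's input format: it runs a stride-1 1-D convolution `out[k][x] += w[k][c][r] * inp[c][x + r]` with a user-chosen loop order and a temporal tile `x_tile` on the output dimension, and counts how often a single-entry weight register would have to be refilled. Spatial tiling and unrolling, which distribute iterations across PEs, are not modeled.

```python
import itertools
from typing import List, Tuple


def run_loop_nest(inp: List[List[int]], weights: List[List[List[int]]],
                  order: str, x_tile: int) -> Tuple[List[List[int]], int]:
    """Execute a 1-D convolution in the given loop order.

    inp[c][i]:        C input channels of width W
    weights[k][c][r]: K filters, each C x R
    order:            permutation of "tkcrx"; 't' is the outer x-tile loop
    Returns (outputs, weight_fetches) for a one-entry weight register.
    """
    assert sorted(order) == sorted("tkcrx"), "order must permute t, k, c, r, x"
    K, C, R = len(weights), len(weights[0]), len(weights[0][0])
    X = len(inp[0]) - R + 1                # output width (stride 1, no padding)
    T = (X + x_tile - 1) // x_tile         # number of temporal tiles along x
    bounds = {"t": T, "k": K, "c": C, "r": R, "x": x_tile}
    out = [[0] * X for _ in range(K)]
    held, fetches = None, 0
    for idx in itertools.product(*(range(bounds[d]) for d in order)):
        it = dict(zip(order, idx))
        x = it["t"] * x_tile + it["x"]
        if x >= X:                         # skip the overhang of a partial last tile
            continue
        k, c, r = it["k"], it["c"], it["r"]
        if held != (k, c, r):              # weight not in the register: fetch it
            held, fetches = (k, c, r), fetches + 1
        out[k][x] += weights[k][c][r] * inp[c][x + r]
    return out, fetches
```

With `inp = [[1, 2, 3, 4, 5]]` and two width-2 filters `[[[1, 1]], [[1, -1]]]`, the orders `"tkcrx"` and `"txkcr"` (both with `x_tile=4`) produce identical outputs but need 4 and 16 weight fetches respectively. Shrinking `x_tile` to 2 raises the `"tkcrx"` count to 8, because each weight is reloaded once per tile.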

List of Topics to be Covered (Tentative)

  • Brief Introduction to DNN Dataflows
    • Taxonomy, Performance/Power Trade-offs
  • MAERI Building Blocks
    • Multiplier Switches, Adder Switches, Simple Switches, Distribution Tree, Reduction Tree, LSTM blocks, Look Up Table, Prefetch Buffer
  • Design-space Exploration of Dataflows with MAERI
    • Loop Ordering, Loop Unrolling, Spatial Tiling, Temporal Tiling
  • Mapping Dataflows over MAERI
  • RTL Modules and Code Organization
  • Hands-on Exercises
    • Assembling a DNN accelerator using MAERI building blocks
    • Mapping a DNN over MAERI
    • Running performance evaluations
    • Running power and area analysis
  • Extensions and Future Development

Target Audience:

The tutorial targets students, faculty, and researchers who want to

  • architect novel DNN accelerators, or
  • study performance implications of dataflow mapping strategies, or
  • plug a DNN accelerator RTL into their system

Prerequisite Knowledge: A basic understanding of DNNs and of RTL.

The whole is greater than the sum of its parts