MAPLE: A Massively Parallel Digital Learning Processor T67

H. Graf, S. Cadambi, I. Durdanovic, V. Jakkula, M. Sankaradass, E. Cosatto, S. Chakradhar; NEC Laboratories America

Architecture scales to thousands of cores
· Parallel Vector Processing Elements (VPE)
· Parallel data I/O
· Parallel memory banks

128 VPE on one FPGA chip; 6 memory banks

[Figure: Comparison of compute speed (relative to CPU)]

Compare with CPU:
· 19x to 30x higher performance
· 7x reduced power dissipation
· FPGA: Configurable architecture
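
The parallel organization listed on the poster (many VPEs working side by side, fed from several memory banks) can be pictured with a small software emulation. The sketch below is only an illustration: the counts of 128 VPEs and 6 memory banks come from the poster, but the round-robin row assignment, the VPE-to-bank mapping, and the names `assign_rows`, `bank_of`, and `parallel_matvec` are assumptions made for this example and are not taken from the MAPLE design. It uses a matrix-vector product as a stand-in workload and runs each VPE's share sequentially.

```python
# Illustrative only: the VPE and bank counts come from the poster; the
# row-to-VPE and VPE-to-bank mappings are assumptions made for this sketch.
NUM_VPES = 128   # VPEs on one FPGA chip (from the poster)
NUM_BANKS = 6    # memory banks (from the poster)


def assign_rows(num_rows, num_vpes=NUM_VPES):
    """Round-robin assignment of matrix rows to VPEs (hypothetical scheme)."""
    assignment = [[] for _ in range(num_vpes)]
    for row in range(num_rows):
        assignment[row % num_vpes].append(row)
    return assignment


def bank_of(vpe, num_banks=NUM_BANKS):
    """Hypothetical mapping of a VPE to the memory bank it reads from."""
    return vpe % num_banks


def parallel_matvec(matrix, vector, num_vpes=NUM_VPES):
    """Compute matrix @ vector, emulating each VPE's work sequentially."""
    result = [0.0] * len(matrix)
    for vpe, rows in enumerate(assign_rows(len(matrix), num_vpes)):
        for row in rows:  # each VPE computes the dot products for its rows
            result[row] = sum(a * x for a, x in zip(matrix[row], vector))
    return result
```

In this toy model the rows are independent, so adding more VPEs only shortens each VPE's list of rows. That is the sense in which such a layout could scale, under the assumption that memory bandwidth keeps up with the added elements.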