High Performance Soft Processor Architectures for Applications with Irregular Data- and Instruction-level Parallelism

Date

2014-07-14

Abstract

Embedded systems based on FPGAs frequently incorporate soft processors, which are popular because they are flexible and can be adapted to the application. However, soft processors deliver only moderate performance compared to hard cores and custom logic, so faster soft processors are desirable.

Many soft processor architectures have been studied in the past, including vector processors and VLIWs. These architectures target regular applications, in which data- and/or instruction-level parallelism can be extracted offline. Applications with irregular parallelism, however, benefit only marginally from such architectures. Targeting such applications, we investigate superscalar, out-of-order (OoO), and Runahead execution on FPGAs. Although these architectures have been investigated in the ASIC world, they have not been studied thoroughly for FPGA implementations.
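The distinction between regular and irregular parallelism can be illustrated with a minimal sketch. The two functions below are hypothetical examples, not taken from the thesis: the first loop has independent iterations and a statically known access pattern, while in the second each load address depends on the result of the previous load.

```c
#include <stddef.h>

/* Regular parallelism (illustrative): iterations are independent and
   the access pattern is known before run time, so a vector or VLIW
   schedule could be built offline. */
void scale_array(int *dst, const int *src, size_t n, int k) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] * k;
    }
}

/* Hypothetical linked-list node used only for this illustration. */
struct node {
    int value;
    struct node *next;
};

/* Irregular parallelism (illustrative): the address of each load is
   produced by the previous load, so independent work and cache-miss
   behaviour are only discovered at run time. */
long sum_list(const struct node *head) {
    long total = 0;
    for (const struct node *p = head; p != NULL; p = p->next) {
        total += p->value;
    }
    return total;
}
```

Workloads resembling the second loop are the kind for which offline scheduling helps little, which is the motivation the abstract gives for dynamic techniques such as OoO and Runahead execution.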

We start by investigating the challenges of implementing a typical in-order pipeline on FPGAs and propose effective solutions that shorten the processor's critical path. We then show that superscalar processing is undesirable on FPGAs because its wide datapaths lead to low clock frequency and high area cost. Accordingly, we focus on investigating and proposing FPGA-friendly OoO and Runahead soft processors.

We propose FPGA-friendly alternatives for various mechanisms and components used in OoO execution. We introduce CFC, a novel copy-free checkpointing mechanism that exploits FPGA block RAMs for fast and dense storage. Using CFC, we propose an FPGA-friendly register renamer and investigate the design and implementation of instruction schedulers on FPGAs.

We then investigate Runahead execution and introduce NCOR, a non-blocking cache tailored for FPGAs. NCOR removes the CAM-based structures used in conventional designs and achieves a high clock frequency of 278 MHz. Finally, we introduce SPREX, a complete Runahead soft core incorporating CFC and NCOR. Compared to Nios II, SPREX provides as much as 38% higher performance for applications with irregular data-level parallelism, with minimal area overhead.

Keywords

FPGA, Soft Core, Architecture, High Performance

Creative Commons

Attribution-ShareAlike 2.5 Canada

Items in TSpace are protected by copyright, with all rights reserved, unless otherwise indicated.