# Benchmark data for an AJIT 1x1x32 FPGA-based system

Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India

March 17, 2023

## 1 Overview

The AJIT 1x1x32 is a single core processor in the AJIT family of processors. The single core contains a single issue, in-order pipelined thread, and this thread implements the SPARC-V8 instruction set architecture. The processor core includes 32KB data and instruction caches, an MMU with a 256 entry TLB, a 2-bit branch-predictor with 256 entries, and a return address stack with 16 entries. A pipelined floating point unit with single and double precision support (IEEE 754) is included, together with non-pipelined square-root and division units.

The reference evaluation system uses a Xilinx KC705 FPGA card, with a DRAM interface with 1GB of memory available. The system is nominally clocked at 100MHz. There is no L2 cache in the system used for these tests.

The benchmarks checked are the following: Dhrystone, Coremark, Whetstone, and Linpack.

## 2 Dhrystone

The Dhrystone benchmark consists of two separate source files. The *legal* way to run Dhrystone is to compile the files separately, without inlining functions, and without using link-time optimization. We use the following options to compile the two files separately.

`-O2 -funroll-loops`

The *best-effort* run of Dhrystone is to compile the two files together, permitting function inlining. This results in fewer instructions being used for each iteration, leading to dramatic improvements in performance. We use the following compile time option for the best-effort run.

`-O3`

We report both numbers (legal and best-effort) together with achieved instructions per cycle (IPC).

| Mode              | Dhrystones/s | DMIPS/MHz | IPC  |
|-------------------|--------------|-----------|------|
| Legal (no-inline) | 229K  | 1.3       | 0.73 |
| Best-effort       | 420K  | 2.39      | 0.78 |
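The DMIPS/MHz column is consistent with reading the 229K and 420K figures as Dhrystone iterations per second and dividing by the conventional VAX 11/780 reference of 1757 Dhrystones/s and by the clock frequency. The sketch below reproduces that arithmetic and also estimates instructions per iteration from the reported IPC. It assumes the nominal 100MHz clock applied during the runs; the function names are illustrative and not part of any benchmark harness.

```python
# Conventional reference: a VAX 11/780 (1 DMIPS) runs 1757 Dhrystones/s.
VAX_DHRYSTONES_PER_SEC = 1757.0
CLOCK_MHZ = 100.0  # assumption: nominal system clock from the overview


def dmips_per_mhz(dhrystones_per_sec, clock_mhz=CLOCK_MHZ):
    """Convert a raw Dhrystones/s rate into DMIPS/MHz."""
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC / clock_mhz


def instructions_per_iteration(dhrystones_per_sec, ipc, clock_mhz=CLOCK_MHZ):
    """Estimate instructions retired per Dhrystone iteration."""
    cycles_per_iteration = clock_mhz * 1e6 / dhrystones_per_sec
    return ipc * cycles_per_iteration


# (Dhrystones/s, IPC) as reported in the table above.
runs = {"legal": (229e3, 0.73), "best-effort": (420e3, 0.78)}
for mode, (rate, ipc) in runs.items():
    print(f"{mode}: {dmips_per_mhz(rate):.2f} DMIPS/MHz, "
          f"{instructions_per_iteration(rate, ipc):.0f} instructions/iteration")
```

Under these assumptions the legal run works out to roughly 319 instructions per iteration and the best-effort run to roughly 186, which matches the observation that inlining reduces the instruction count per iteration while IPC changes only slightly.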

## 3 Coremark

Coremark is a synthetic benchmark that addresses some of the shortcomings of Dhrystone. We ran the unmodified Coremark source, compiled with GCC 4.7.4 using the following option:

`-O2`

This is the legal way to run Coremark.

However, compilers can apply optimizations targeted specifically at Coremark that transform the code. One such transformation is described at:

```
https://community.arm.com/arm-community-blogs/b/embedded-blog/posts/coremark-and-compiler-performance
```

We have replicated the transformation described above and run Coremark on this transformed code. We call this the best-effort Coremark run.

We report both numbers (legal and best-effort) together with achieved instructions per cycle (IPC).

| Mode        | Coremark | Coremarks/MHz | IPC  |
|-------------|----------|---------------|------|
| Legal       | 208.98   | 2.09          | 0.84 |
| Best-effort | 239.64   | 2.40          | 0.79 |
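The best-effort run has a higher score but a lower IPC, so the gain must come from executing fewer instructions per iteration. The following sketch decomposes the speedup into an IPC ratio and an instruction-count ratio. It assumes both runs used the nominal 100MHz clock and that the score is in iterations per second; the variable and function names are illustrative.

```python
CLOCK_MHZ = 100.0  # assumption: nominal system clock from the overview

# Score (iterations/s) and IPC as reported in the table above.
legal = {"score": 208.98, "ipc": 0.84}
best = {"score": 239.64, "ipc": 0.79}


def coremark_per_mhz(score, clock_mhz=CLOCK_MHZ):
    """Normalize the raw score by the clock frequency in MHz."""
    return score / clock_mhz


def instructions_per_iteration(score, ipc, clock_mhz=CLOCK_MHZ):
    """Instructions per iteration = IPC * (cycles per second / iterations per second)."""
    return ipc * clock_mhz * 1e6 / score


speedup = best["score"] / legal["score"]
ipc_ratio = best["ipc"] / legal["ipc"]
instr_ratio = instructions_per_iteration(**legal) / instructions_per_iteration(**best)

# At a fixed clock, speedup = (IPC ratio) * (instruction-count reduction).
print(f"speedup      = {speedup:.3f}")
print(f"IPC ratio    = {ipc_ratio:.3f}")
print(f"instr. ratio = {instr_ratio:.3f}")
```

With the reported figures, the speedup of about 1.15 is the product of an IPC ratio of about 0.94 and an instruction-count reduction of about 1.22.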

## 4 Whetstone

Whetstone is a synthetic benchmark to check floating point performance. We compile it using the -O3 option and obtain the following score:

| MWIPS  | Mflops1 | Mflops2 | Mflops3 | Cosmops | Expmops | Fixpmops | Ifmops | Eqmops |
|--------|---------|---------|---------|---------|---------|----------|--------|--------|
| 39.153 | 36.363  | 37.943  | 14.286  | 1.280   | 0.281   | 46.875   | 99.991 | 21.428 |

## 5 Linpack

The Linpack benchmark was run for a matrix size of 64x64. The results obtained are shown below.

MFLOPS = 20.614634
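To put this figure in context, the sketch below estimates the time for one 64x64 solve and the floating point operations completed per clock cycle. It assumes the conventional Linpack operation count of 2n³/3 + 2n² and the nominal 100MHz clock; whether the run reported here used exactly this count is an assumption, and the names are illustrative.

```python
def linpack_flops(n):
    """Conventional Linpack operation count for an n x n solve (assumed here)."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


n = 64                # matrix size stated above
mflops = 20.614634    # reported result
clock_mhz = 100.0     # assumption: nominal system clock from the overview

ops = linpack_flops(n)
seconds = ops / (mflops * 1e6)       # estimated time for one factor-and-solve
flops_per_cycle = mflops / clock_mhz

print(f"operations per solve: {ops:.0f}")
print(f"estimated time per solve: {seconds * 1e3:.2f} ms")
print(f"floating point operations per cycle: {flops_per_cycle:.3f}")
```

Under these assumptions a single solve takes a little under 9 ms, and the core completes about 0.21 floating point operations per cycle.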