Intel® High Level Synthesis Compiler Standard Edition: User Guide

ID 683306
Date 12/18/2019
Public

A.3.1. Loop Analysis Example

Figure 4 shows the loop analysis from an example High Level Design Report (report.html) file. The component design is taken from the transpose_and_fold.cpp file, which is part of the tutorial files provided in <quartus_installdir>/hls/examples/tutorials/best_practices/loop_memory_dependency.

Consider the following example code snippet for transpose_and_fold.cpp:

01: #include "HLS/hls.h"
02: #include <stdio.h>
03: #include <stdlib.h>
04: 
05: #define SIZE 32
06: 
07: typedef ihc::stream_in<int> my_operand;
08: typedef ihc::stream_out<int> my_result;
09: 
10: component void transpose_and_fold(my_operand &data_in, my_result &res)
11: {
12:   int i;
13:   int j;
14:   int in_buf[SIZE][SIZE];
15:   int tmp_buf[SIZE][SIZE];
16:   for (i = 0; i < SIZE * SIZE; i++) {
17:     in_buf[i / SIZE][i % SIZE] = data_in.read();
18:     tmp_buf[i / SIZE][i % SIZE] = 0;
19:   }
20: 
21:   #ifdef USE_IVDEP
22:   #pragma ivdep safelen(SIZE)
23:   #endif
24:   for (j = 0; j < SIZE * SIZE * SIZE; j++) {
25:   #pragma unroll
26:     for (i = 0; i < SIZE; i++) {
27:       tmp_buf[j % SIZE][i] += in_buf[i][j % SIZE];
28:     }
29:   }
30:   for (i = 0; i < SIZE * SIZE; i++) {
31:     res.write(tmp_buf[i / SIZE][i % SIZE]);
32:   }
33: }
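
As a cross-check on what the component computes, the following is a minimal software sketch in plain C++. It is not part of the tutorial files. It assumes the contents of the input and output streams can be modeled as flat std::vector<int> values in row-major order, and the function name transpose_and_fold_model is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int SIZE = 32;  // same value as the #define in transpose_and_fold.cpp

// Hypothetical helper (not in the tutorial): returns the values the component
// would write to its output stream, given the SIZE * SIZE values it reads
// from its input stream, both in row-major order.
std::vector<int> transpose_and_fold_model(const std::vector<int> &in) {
  assert(in.size() == static_cast<std::size_t>(SIZE) * SIZE);
  std::vector<int> tmp(SIZE * SIZE, 0);  // models tmp_buf, zero-initialized

  // Same iteration space as lines 24 to 29 of the component.
  for (int j = 0; j < SIZE * SIZE * SIZE; j++) {
    for (int i = 0; i < SIZE; i++) {
      // tmp_buf[j % SIZE][i] += in_buf[i][j % SIZE];
      tmp[(j % SIZE) * SIZE + i] += in[i * SIZE + (j % SIZE)];
    }
  }
  return tmp;  // written out in row-major order, as on lines 30 to 32
}
```

Because each row index j % SIZE is visited SIZE * SIZE times, every output value in this model equals SIZE * SIZE (1024) times the corresponding element of the transposed input, provided the sums do not overflow int. That gives a testbench a simple expected result to compare against.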
Figure 4. Loop Analysis Report of the transpose_and_fold Component

The transpose_and_fold component has four loops. The loop analysis report shows that the compiler performed different kinds of loop optimizations:

  • The loop on line 26 is fully unrolled, as defined by #pragma unroll.
  • The loops on lines 16 and 30 are pipelined with an initiation interval (II) value of ~1. The value is ~1 rather than exactly 1 because both loops contain stream accesses that could stall. If these accesses stall, then the loop II becomes greater than 1.
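
The effect of stream stalls on the II can be illustrated with a toy calculation. The sketch below is a simplified model, not how the compiler or the report computes II. It assumes that a loop scheduled at II = 1 launches one iteration per cycle, and that each stalled stream access delays the next launch by a whole number of cycles. The function name effective_ii and the per-iteration stall counts are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Toy model (an assumption for illustration): stall_cycles[k] is the number
// of extra cycles that iteration k waits on a stream access. With no stalls,
// one iteration launches per cycle, which corresponds to II = 1.
double effective_ii(const std::vector<int> &stall_cycles) {
  assert(!stall_cycles.empty());
  long total_cycles = 0;
  for (int stalls : stall_cycles) {
    assert(stalls >= 0);
    total_cycles += 1 + stalls;  // one launch cycle plus any stall cycles
  }
  // Average number of cycles between successive iteration launches.
  return static_cast<double>(total_cycles) / stall_cycles.size();
}
```

In this model, a run with no stalls gives an average of exactly 1 cycle per iteration, while any stalled access pushes the average above 1, which is the behavior that the ~1 notation warns about.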

The Block1.start loop in the loop analysis report is not present in the code. It is an implicit infinite loop that the compiler adds so that the component can run continuously instead of only once. In hardware, the component runs continuously and checks its inputs to see whether it should start executing.
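
Conceptually, the implicit loop behaves like an outer polling loop wrapped around the component body. The following is a rough software analogy only, not the hardware that the compiler generates. It assumes the start condition can be modeled as one boolean per cycle, and it bounds the loop by the length of that sequence so that the sketch terminates. The names run_with_implicit_loop, start_per_cycle, and component_body are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy model (an assumption for illustration): each element of start_per_cycle
// says whether the component is asked to start in that cycle. In hardware the
// outer loop never exits; here it ends when the sequence is exhausted.
int run_with_implicit_loop(const std::vector<bool> &start_per_cycle,
                           const std::function<void()> &component_body) {
  int invocations = 0;
  for (bool start : start_per_cycle) {  // stands in for the implicit loop
    if (start) {                        // check the inputs each time around
      component_body();                 // one execution of the component
      ++invocations;
    }
  }
  return invocations;
}
```

The point of the analogy is that the component body written in the source file corresponds to one pass through this outer loop, which is why the report lists a loop that has no matching line in the code.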