Intel® High Level Synthesis Compiler Pro Edition: Reference Manual

ID 683349
Date 10/04/2021
Public

6.8. Loop Interleaving Control (max_interleaving Pragma)

The Intel® HLS Compiler Pro Edition tries to maximize the throughput and hardware resource occupancy of pipelined inner loops in a loop nest by issuing new inner loop iterations as frequently as possible (minimizing the loop initiation interval, or II). When the compiler cannot achieve a loop II of 1 for an inner loop, the compiler configures the loop nest to interleave iterations of one invocation of the inner loop with iterations of other invocations of the inner loop.
Terminology Reminder
A loop iteration is the single execution of a loop body. A loop invocation is the start of pipelined execution of loop iterations.
Figure 12. Interleaving Example 1

Interleaving Example 2

As another example, consider the following loop nest:
// Loop j is pipelined with ii=1
for (int j = 0; j < M; j++) {
  int a[N];
  // Loop i is pipelined with ii=2
  for (int i = 1; i < N; i++) {
      a[i] = foo(i);
  }
}

In this example, the inner loop i is pipelined with II=2. Under normal pipelining, this II means that the inner loop hardware only achieves 50% utilization, since one iteration of the i loop is initiated every other cycle. To take advantage of these idle cycles, the compiler interleaves a second invocation of the i loop from the next iteration of the outer j loop.

Because the i loop resides inside the j loop, and the j loop has a trip count of M, the i loop is invoked M times. The j loop is the outermost loop and is invoked once.

The following table shows the difference between normal pipelined execution of the i loop versus interleaved execution for this example for N=5.

Cycle   Pipelined Loop Iterations   Interleaved Loop Iterations
        (j loop, i loop)            (j loop, i loop)
0       (0,0)                       (0,0)
1       ---                         (1,0)
2       (0,1)                       (0,1)
3       ---                         (1,1)
4       (0,2)                       (0,2)
5       ---                         (1,2)
6       (0,3)                       (0,3)
7       ---                         (1,3)
8       (0,4)                       (0,4)
9       ---                         (1,4)
10      (1,0)                       (2,0)
11      ---                         (3,0)
12      (1,1)                       (2,1)
13      ---                         (3,1)
14      (1,2)                       (2,2)
15      ---                         (3,2)
16      (1,3)                       (2,3)
17      ---                         (3,3)
18      (1,4)                       (2,4)
19      ---                         (3,4)

This table shows the values (j,i) for each inner loop iteration that is initiated at each cycle. At cycle 0, both modes of execution initiate the (0,0)th iteration of the i loop. Under normal pipelined execution, no i loop iteration is initiated at cycle 1. Under interleaved execution, the (1,0)th iteration of the innermost loop, i.e. the first iteration of the next (j=1) invocation of the i loop, is initiated. By cycle 10, interleaved execution has initiated all of the iterations of both the j=0 invocation of the i loop, and the j=1 invocation of the i loop. This represents twice the efficiency of the normal pipelined execution.
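
The issue pattern in the table can be reproduced with a small software model. The following is only a sketch under stated assumptions, not compiler output: the names Slot, issue_schedule, and format_slot are hypothetical, pipeline latency is ignored, and degree is assumed to be the number of inner loop invocations that share the pipeline (1 for normal pipelined execution, II for full interleaving).

```cpp
#include <cassert>
#include <string>
#include <vector>

// One entry per cycle: the (j, i) inner loop iteration issued, or an idle slot.
// Hypothetical type used only for this illustration.
struct Slot {
    bool issued;
    int j;  // outer loop iteration, that is, which invocation of the i loop
    int i;  // inner loop iteration index, numbered from 0 as in the table
};

// Simplified issue-slot model (an assumption, not the compiler's algorithm):
// `degree` invocations of the inner loop share the pipeline, and each
// invocation issues one iteration every `ii` cycles.
std::vector<Slot> issue_schedule(int m, int n, int ii, int degree) {
    assert(ii >= 1 && degree >= 1 && degree <= ii);
    std::vector<Slot> schedule;
    for (int base = 0; base < m; base += degree) {  // group of interleaved invocations
        for (int i = 0; i < n; ++i) {               // inner iteration index
            for (int k = 0; k < ii; ++k) {          // cycle slot inside one II window
                int j = base + k;
                bool active = (k < degree) && (j < m);
                schedule.push_back({active, j, i});
            }
        }
    }
    return schedule;
}

// Formats a slot the way the table does: "(j,i)" or "---" for an idle cycle.
std::string format_slot(const Slot& s) {
    if (!s.issued) {
        return "---";
    }
    return "(" + std::to_string(s.j) + "," + std::to_string(s.i) + ")";
}
```

Under this model, issue_schedule(2, 5, 2, 1) corresponds to the pipelined column for cycles 0 through 19, and issue_schedule(4, 5, 2, 2) corresponds to the interleaved column; the vector index is the cycle number.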

Sometimes you might determine that this interleaving does not give you a performance benefit relative to the additional FPGA area needed to enable interleaving. In these cases, you can limit or restrict the amount of interleaving to reduce FPGA area utilization.

Using the max_interleaving Pragma

To limit the number of interleaved invocations of an inner loop that can be executed simultaneously, annotate the inner loop with the max_interleaving pragma. The annotated loop must be contained inside another pipelined loop.

The required parameter (n) specifies an upper bound on the degree of interleaving allowed, that is, how many invocations of the containing loop can execute the annotated loop at a given time.

Specify the max_interleaving pragma in one of the following ways:
  • #pragma max_interleaving 1

    The compiler restricts the annotated (inner) loop to be invoked only once per outer loop iteration. That is, all iterations of the inner loop travel the pipeline before the next invocation of the inner loop can occur.

  • #pragma max_interleaving 0

    The compiler allows the pipeline to contain a number of simultaneous invocations of the inner loop equal to the loop initiation interval (II) of the inner loop. For example, an inner loop with an II of 2 can have iterations from two invocations in the pipeline at a time.

    This behavior is the default behavior for the compiler if you do not specify the max_interleaving pragma.
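
To get a rough feel for the trade-off between the two settings, the number of issue slots consumed can be estimated with a simple model. This is a sketch under assumptions, not a statement of what the compiler generates: interleaving_degree and issue_cycles are hypothetical helper names, only the two settings described above (1 and 0) are handled, and pipeline latency and FPGA area are not modeled.

```cpp
#include <cassert>

// Number of inner loop invocations allowed in the pipeline at once,
// following the two documented settings:
//   max_interleaving 1 -> one invocation at a time
//   max_interleaving 0 -> up to II invocations at a time (the default)
int interleaving_degree(int max_interleaving, int ii) {
    assert(max_interleaving == 0 || max_interleaving == 1);
    assert(ii >= 1);
    return max_interleaving == 0 ? ii : 1;
}

// Issue slots (cycles) needed to start every inner loop iteration when the
// outer loop runs m times and the inner loop runs n iterations at the given II.
// Assumed model: invocations are processed in groups of `degree`, and each
// group occupies n * ii cycles.
long issue_cycles(int m, int n, int ii, int max_interleaving) {
    int degree = interleaving_degree(max_interleaving, ii);
    long groups = (m + degree - 1) / degree;  // ceil(m / degree)
    return groups * n * ii;
}
```

For the earlier example extended to four outer iterations (M=4, N=5, II=2), this model gives issue_cycles(4, 5, 2, 1) == 40 and issue_cycles(4, 5, 2, 0) == 20, which is the factor of two seen in the table.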

In the following code snippet, the max_interleaving 1 setting restricts the pipelined execution of the i loop. A new invocation of the i loop starts only in a subsequent iteration of the j loop, after all iterations of the previous invocation have traveled the pipeline.
// Loop j is pipelined with ii=1
for (int j = 0; j < M; j++) {
  int a[N];
  // Loop i is pipelined with ii=2 
  #pragma max_interleaving 1
  for (int i = 1; i < N; i++) {
      a[i] = foo(i);
  }
  …
}
For another example of the effects of using the max_interleaving pragma, refer to the following tutorial:
<quartus_installdir>/hls/examples/tutorials/loop_controls/max_interleaving