#pragma stream_unroll

Description

Breaks a stream contained in a for loop into multiple streams.

Syntax

>>-#--pragma--stream_unroll--(--+---+--)-----------------------><
                                '-n-'

where n is a loop unrolling factor. In C programs, the value of n is a positive integral constant expression. In C++ programs, the value of n is a positive scalar integer or compile-time constant initialization expression. An unroll factor of 1 disables unrolling. If n is not specified and if -qhot, -qsmp, or -O4 or higher is specified, the optimizer determines an appropriate unrolling factor for each nested loop.
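As a minimal sketch of the explicit-factor form, the two functions below place the directive immediately before a for loop, once with a factor of 4 and once with a factor of 1, which disables unrolling. The function names multiply_unroll4 and multiply_no_unroll are hypothetical, and the __IBMC__ / __IBMCPP__ guard is an assumption added here so that compilers other than XL C/C++ do not warn about an unknown pragma; it is not required by the directive itself.

```c
/* Hypothetical helper: element-wise product with a stream unroll factor of 4. */
void multiply_unroll4(int *a, const int *b, const int *c, int n)
{
    int i;
    /* Guard is an assumption: only XL C/C++ sees the directive. */
#if defined(__IBMC__) || defined(__IBMCPP__)
#pragma stream_unroll(4)
#endif
    for (i = 0; i < n; i++) {
        a[i] = b[i] * c[i];
    }
}

/* Hypothetical helper: same loop, but a factor of 1 disables unrolling. */
void multiply_no_unroll(int *a, const int *b, const int *c, int n)
{
    int i;
#if defined(__IBMC__) || defined(__IBMCPP__)
#pragma stream_unroll(1)
#endif
    for (i = 0; i < n; i++) {
        a[i] = b[i] * c[i];
    }
}
```

Both functions compute the same result; the directive is only a request to the optimizer and should not change what the loop stores in a.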

Notes

Neither -O3 nor -qipa=level=2 is sufficient to enable stream unrolling. You must additionally specify -qhot or -qsmp, or use optimization level -O4 or higher.

For stream unrolling to occur, the #pragma stream_unroll directive must be the last pragma specified preceding a for loop. Specifying #pragma stream_unroll more than once for the same for loop, or combining it with other loop unrolling pragmas (unroll, nounroll, unrollandfuse, nounrollandfuse), results in a warning from XL C; XL C++ silently ignores all but the last of multiple loop unrolling pragmas specified on the same for loop.

Stream unrolling is also suppressed by compilation under certain optimization options. If option -qstrict is in effect, no stream unrolling takes place. Therefore, if you want to enable stream unrolling with the -qhot option alone, you must also specify -qnostrict.

Examples

The following is an example of how #pragma stream_unroll can increase performance.

int i, m, n;
int a[1000];
int b[1000];
int c[1000];

....

#pragma stream_unroll(4)
for (i=0; i<n; i++) {
    a[i] = b[i] * c[i];
}

The unroll factor of 4 reduces the number of iterations from n to n/4. Conceptually, with m = n/4 and n assumed to be a multiple of 4, the loop is transformed as follows:

for (i=0; i<m; i++) {
    a[i] = b[i] * c[i];
    a[i+m] = b[i+m] * c[i+m];
    a[i+2*m] = b[i+2*m] * c[i+2*m];
    a[i+3*m] = b[i+3*m] * c[i+3*m];
}

The larger number of read and store operations in each iteration is distributed among a number of streams determined by the compiler, which reduces computation time.
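The transformation can be checked by hand. The sketch below writes out a four-stream split in plain C and compares it with the single-stream loop. It takes m to be n/4 and adds a remainder loop for the case where n is not a multiple of 4. The function names are hypothetical, and the code the compiler actually generates may differ; this only illustrates that splitting the index range into four streams leaves the result unchanged.

```c
/* Reference loop: a single stream over all n elements. */
void multiply_single_stream(int *a, const int *b, const int *c, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        a[i] = b[i] * c[i];
    }
}

/* Hand-written illustration of a four-stream split (not compiler output). */
void multiply_four_streams(int *a, const int *b, const int *c, int n)
{
    int m = n / 4;  /* assumed length of each stream */
    int i;

    /* Each iteration touches one element from each of the four streams. */
    for (i = 0; i < m; i++) {
        a[i]         = b[i]         * c[i];
        a[i + m]     = b[i + m]     * c[i + m];
        a[i + 2 * m] = b[i + 2 * m] * c[i + 2 * m];
        a[i + 3 * m] = b[i + 3 * m] * c[i + 3 * m];
    }

    /* Remainder: elements 4*m .. n-1 when n is not a multiple of 4. */
    for (i = 4 * m; i < n; i++) {
        a[i] = b[i] * c[i];
    }
}
```

For n = 10, for example, m is 2, so the streams cover indices 0-1, 2-3, 4-5, and 6-7, and the remainder loop handles indices 8 and 9.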
