Dynamic for-loop in HLSL Pixel Shader
Hi there,
does anybody know whether (and how) it is possible to use a for loop with a dynamic count in an HLSL pixel shader?
For example:
int count = 5 / a;
for (int i = 0; i < count; i++)
...
I couldn't get it to work: whatever I tried, the effect creation fails, even with shader model 3.
Nico
Suddenly it works! I don't know what I did wrong before.
Anyway, when compiling a shader with a dynamic for loop there is a lot of preprocessing. Does anybody know what is going on?
The more I limit the dynamic count, the shorter the preprocessing.
I just waited about 5 minutes for a loop with a count limit of 80 to compile (!).

So my shader code looks something like this:
int count = (some formula) % 80;
for (int i = 0; i < count; i++)
By NicoRi at 2008-2-14
Is it possible that you are exceeding the instruction count? A lot of hardware doesn't support loops and the like, but the shader will still compile because the compiler unrolls the loops, which can result in a very large number of instructions depending on what you are doing inside the loop.
The correlation between the dynamic count and the length of preprocessing makes perfect sense to me: with a smaller loop count, less unrolling happens during compilation.
Hm, how can the loop be unrolled when its count is completely dynamic?
The compiler cannot know how many times the loop will be executed for each pixel. The loop count also depends on calculations in the vertex shader, which makes this even more difficult. Or does it mean the loop is always unrolled up to the maximum possible count, with a conditional break-like statement? I'm curious to know.
Also I'm using a GeForce 6800 GT that should be fully capable of loops.
By the way in my loop there is a texture fetch and a small if-statement.
About the instruction count: the limit should be very high with SM 3.0, but yes, maybe I am exceeding it.
Nico
Well, it gets worse. Some hardware doesn't support branching at all, most supports only static branching (switching blocks of code based on a boolean constant), and on older hardware dynamic branching is generally only supported in vertex shaders. I don't know what the ASM looks like for 3. I've got a pretty good idea how ugly and huge it would be under 1_1, but not 3. Maybe run the command-line compiler and take a look at the resulting file to see what's going on in there. You may find your shader's logic paths are too complex.
Your card may support these features but the question is how many of your users will have the hardware support?
See ShaderX2: Introductions & Tutorials with DirectX 9 (Wolfgang F. Engel).
The intro section briefly touches on these areas. Other than that I don't know how to answer your question. If you find some answers please post them, I'm interested in what the 3.0 model is capable of. Also, can you post the loop?
Hi, thanks for your help.

Well, yes probably my shader is quite complex. It's just that I still don't really understand what's going on in the preprocessing.
I'm doing research for university, so at least at the moment it is not that relevant if others can use my technique. I like to use every feature that current graphics hardware offers me. :)
Here is my loop (guess what it's about? yeah, right, relief mapping... ;-) )
    for (int i = 0; i < linSearchSteps - 1; i++)
    {
        float4 t = tex2D(g_samReliefMap, p.xy);
        if (p.z < t.w)
            p += v;
    }
Nico
I just tried the following:
(linSearchSteps = 40)
    int searchSteps = (int)(linSearchSteps / 1.414 * length(v.xy) + 1) % linSearchSteps;
    v /= linSearchSteps;
    for (int i = 0; i < searchSteps - 1; i++)
    {
        float4 t = tex2D(g_samReliefMap, p.xy);
        if (p.z < t.w)
            p += v;
    }
The frame rate dropped from 50 to 20 compared to the original implementation:
    v /= linSearchSteps;
    for (int i = 0; i < linSearchSteps - 1; i++)
    {
        float4 t = tex2D(g_samReliefMap, p.xy);
        if (p.z < t.w)
            p += v;
    }
So using a dynamic loop count seems to be very inefficient.
Instead of the frame rate rising as expected, there is a heavy drop.
Probably because of the conditions introduced to evaluate the count for each pixel separately? My guess is that the loop is unrolled 40 times like this:
    count = 0;
    if (count < dynamicCount) {
        float4 t = tex2D(g_samReliefMap, p.xy);
        if (p.z < t.w)
            p += v;
        count++;
    }
    if (count < dynamicCount) {
        float4 t = tex2D(g_samReliefMap, p.xy);
        if (p.z < t.w)
            p += v;
        count++;
    }
    if (count < dynamicCount) {
        float4 t = tex2D(g_samReliefMap, p.xy);
        if (p.z < t.w)
            p += v;
        count++;
    }
    ...
Or maybe there is some kind of break statement to jump to the end of the unrolled loop. I have never programmed shaders in assembler, so it's just a guess, maybe I'm completely wrong! ;-)
Also note that the shader doesn't compile at all when the count is not limited using a modulo (%).
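A side note on limiting the count with a modulo: `%` does impose an upper bound, but it wraps, so a computed value of exactly linSearchSteps becomes 0 instead of the maximum. A clamp keeps the bound without wrapping. The following is only a sketch of that idea: MAX_STEPS and ComputeSearchSteps are hypothetical names, and whether the compiler treats a clamped count as a bound the same way it treats the modulo is an assumption that has not been tested here.

```hlsl
// Hypothetical compile-time upper bound (the thread uses linSearchSteps = 40).
static const int MAX_STEPS = 40;

// Hypothetical helper: same formula as in the post above, but the result is
// clamped to [1, MAX_STEPS] instead of being wrapped by a modulo.
int ComputeSearchSteps(float2 vxy)
{
    int raw = (int)(MAX_STEPS / 1.414 * length(vxy) + 1);
    return clamp(raw, 1, MAX_STEPS);
}
```

With the modulo, a raw value of 40 yields 0 steps and 41 yields 1 step; with the clamp both yield 40.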
The problem is probably due to what you are doing in the loop. If you use texture lookups that compute gradients, or index into the constant register set, the loop will be unrolled. Make certain that you are not indexing constants, and use tex2Dlod instead of tex2D. And when testing for this, you should enable the prefer-flow-control compile option. That way you can make sure that the compiler will only unroll when it has to, which helps in determining what is causing the loop to be unrolled.
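Applied to the relief mapping loop posted earlier, that advice could look roughly like the sketch below. It assumes a ps_3_0 target and a compiler version that accepts the `[loop]` attribute; LinearSearch is a hypothetical wrapper name, and using a fixed mip level of 0 is an assumption that changes the filtering compared to tex2D.

```hlsl
sampler2D g_samReliefMap;   // sampler name taken from the posted loop

// Hypothetical wrapper around the posted linear search.
float3 LinearSearch(float3 p, float3 v, int searchSteps)
{
    // [loop] asks the compiler to emit real flow control rather than unroll.
    [loop]
    for (int i = 0; i < searchSteps - 1; i++)
    {
        // tex2Dlod takes the mip level explicitly in .w, so no screen-space
        // gradients have to be computed inside the loop (assumed level: 0).
        float4 t = tex2Dlod(g_samReliefMap, float4(p.xy, 0.0, 0.0));
        if (p.z < t.w)
            p += v;
    }
    return p;
}
```

The prefer-flow-control option is, as far as I know, `/Gfp` on the fxc command line and `D3DXSHADER_PREFER_FLOW_CONTROL` when compiling through D3DX; adding `/Fc` to fxc writes an assembly listing in which you can check whether the loop was actually kept.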
Craig Peeper