Performance: can't this thing draw any faster?
I'm certain that my code must have some critical inefficiency somewhere; I can run my favorite 3D video games at maximum settings, but my little DX app slows down to < 90 frames per second when it tries to draw 100 little boxes. Here's the code in a nutshell:
protected D3D.Texture _texture;
protected static D3D.Sprite _sprite;
_texture = MakeTexture(); // ends up producing a 1x1 white texture
_sprite = new D3D.Sprite(graphics);
dx_graphics.BeginScene();
for (int i = 0; i < 100; ++i)
{
    _sprite.Begin(D3D.SpriteFlags.AlphaBlend);
    _sprite.Draw(_texture,
        new Rectangle(0, 0, (int)500, (int)500),
        new Vector3(0.0f, 0.0f, 0.0f),
        new Vector3((int)300, (int)200, 0.0f),
        Color.FromArgb(255, 255, 255, 255));
    _sprite.End();
}
dx_graphics.EndScene();
dx_graphics.Present();
Yeah I know, it's just drawing 100 squares on top of each other. This is a simple "stress test", to see how I can draw textured squares more efficiently. Now, I'm certain that there are a lot more than 200 triangles present on the screen when I'm playing a game, so what can I do to make this program crank out textured squares faster?
Don't call Sprite.Begin for every single quad that you are drawing with it, as that destroys its batching capabilities. Instead, only call Sprite.Begin and Sprite.End when you are switching textures, render states and the like.
You are also creating a new texture and sprite object each frame, which is unnecessary.
Ah... actually, the texture is created in a totally different method (which is only called once); I probably shouldn't have posted it with the rest of the method.
As far as Begin() and End() go, what you say makes perfect sense. I was therefore surprised to find that changing it as suggested didn't affect the framerate at all! Here's what the drawing part looks like now; it yields exactly the same framerate:
_sprite.Begin(D3D.SpriteFlags.AlphaBlend);
for (int i = 0; i < 100; ++i)
{
    _sprite.Draw(_texture,
        new Rectangle(0, 0, (int)500, (int)500),
        new Vector3(0.0f, 0.0f, 0.0f),
        new Vector3((int)300, (int)200, 0.0f),
        Color.FromArgb(255, 255, 255, 255));
}
_sprite.End();
If it matters any, this is what my graphics device looks like:
protected void InitializeGraphics()
{
    // set up a device
    D3D.PresentParameters presentParams = new D3D.PresentParameters();
    presentParams.Windowed = true;
    presentParams.SwapEffect = D3D.SwapEffect.Discard;
    presentParams.PresentationInterval = D3D.PresentInterval.Immediate;
    dx_graphics = new D3D.Device(0, D3D.DeviceType.Hardware, this,
        D3D.CreateFlags.SoftwareVertexProcessing, presentParams);
    // Setup the event handlers for the device
    dx_graphics.DeviceLost += new EventHandler(this.DeviceLost);
    dx_graphics.DeviceReset += new EventHandler(this.DeviceReset);
    dx_graphics.Disposing += new EventHandler(this.DeviceDestroy);
    dx_graphics.DeviceResizing += new CancelEventHandler(this.DeviceResize);
    // set up various drawing options
    dx_graphics.RenderState.CullMode = D3D.Cull.None;
    dx_graphics.RenderState.AlphaBlendEnable = true;
    dx_graphics.RenderState.AlphaBlendOperation = D3D.BlendOperation.Add;
    dx_graphics.RenderState.DestinationBlend = D3D.Blend.InvSourceAlpha;
    dx_graphics.RenderState.SourceBlend = D3D.Blend.SourceAlpha;
}
Surely DX can draw 100 rectangles at > 90fps, on a high-end machine?....
Edit: So far, the single greatest factor in how many fps I get is the amount of surface area my rectangles cover... nothing else seems to make any appreciable impact...
Professor Mustard wrote:
| Yeah I know, it's just drawing 100 squares on top of each other. This is a simple "stress test", to see how I can draw textured squares more efficiently. Now, I'm certain that there are a lot more than 200 triangles present on the screen when I'm playing a game, so what can I do to make this program crank out textured squares faster? |
Well, the games you're playing are probably not using MDX. Not that this should make a huge difference, but it's worth noting. Since you asked what you can do to make this particular program run faster, I've got a couple suggestions:
1) You're using software vertex processing. Any special reason? If your hardware supports hardware vertex shading, and I can't imagine it wouldn't, you should use it -- this should make a HUGE difference. If you run some of the DXSDK samples you can see the effect of switching between the two processing modes.
2) Alpha blending slows the drawing process down (I assume), since each blended pixel has to be read back from the render target as well as written.
3) Since you're only drawing rectangles and they're all oriented parallel to the view plane (right?), use Sprite.Draw2D instead of Sprite.Draw.
| 3) Since you're only drawing rectangles and they're all oriented parallel to the view plane (right?), use Sprite.Draw2D instead of Sprite.Draw. |
This shouldn't make any difference. There is no Draw2D in the native Sprite class, so I suspect Draw2D just ends up calling the Draw function after rigging up some transformation matrices.
Good catch on the software transformation though - I looked at the code a couple times and missed that one.
The crazy thing is, HardwareVertexProcessing vs SoftwareVertexProcessing seems to have no impact on my framerate, even though my understanding of DirectX says that it should make a huge difference.
The "bottom line" seems to be this... when I draw 100 500x500 rectangles, I can get about 90fps. When I draw 100 50x50 rectangles, I can get about 1150fps. I have yet to find any flags or techniques that make any real difference in my framerate, which seems very strange...
It's not the vertices that'll kill you when drawing 100 quads. But 100 500x500 rectangles is 25,000,000 pixels per frame. Multiply that by your 90 fps, and you get 2,250,000,000 pixels per second (which is probably close to your fill rate). My X1600, for example, has a theoretical fill rate of 2 gigapixels/sec.
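Written out, and repeated for the 50x50 case with the frame rates reported earlier in the thread, the estimate looks like this. It is only a rough sketch: it assumes every pixel of every rectangle is actually shaded (nothing clipped or rejected) and ignores any per-frame overhead.

```latex
\begin{align*}
\text{500x500:}\quad & 100 \times 500 \times 500 = 25{,}000{,}000 \ \text{pixels/frame} \\
                     & 25{,}000{,}000 \times 90 \ \text{fps} = 2{,}250{,}000{,}000 \approx 2.25 \ \text{Gpixels/s} \\[4pt]
\text{50x50:}\quad   & 100 \times 50 \times 50 = 250{,}000 \ \text{pixels/frame} \\
                     & 250{,}000 \times 1150 \ \text{fps} = 287{,}500{,}000 \approx 0.29 \ \text{Gpixels/s}
\end{align*}
```

If those numbers hold, the large rectangles sit right around a 2 gigapixel/sec fill rate, while the small ones come nowhere near it, which would suggest the 50x50 case is limited by something other than fill rate.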
I'm not an expert like some of the other commenters, but one thing I notice is that it looks like you're not creating a "pure" device. Here's what I have in my code:
// If the graphics card supports vertex processing check if the device can
// do rasterization, matrix transformations, and lighting and shading operations
// This combination provides the fastest game experience
if (caps.DeviceCaps.SupportsPureDevice &&
    createFlags == CreateFlags.HardwareVertexProcessing)
{
    createFlags |= CreateFlags.PureDevice;
}
I don't know if it will help your framerate issue or not, but the reading I've done indicates that creating a pure device will give you the fastest rendering performance.
Pete Nelson
Thomas Pittman wrote:
| It's not the vertices that'll kill you when drawing 100 quads. But 100 500x500 rectangles is 25,000,000 pixels per frame. Multiply that by your 90 fps, and you get 2,250,000,000 pixels per second (which is probably close to your fill rate). My X1600, for example, has a theoretical fill rate of 2 gigapixels/sec. |
Heh, I never actually thought of it that way, but that makes perfect sense.
In that case, perhaps the only way to really increase efficiency at this point is to somehow keep track of where all my rendering is going to be, so that I don't render over the same place multiple times and waste pixel bandwidth. Hm.... that sounds like a lot of work; I don't suppose there's a native way of doing this?
Professor Mustard wrote:
| Thomas Pittman wrote:
| | It's not the vertices that'll kill you when drawing 100 quads. But 100 500x500 rectangles is 25,000,000 pixels per frame. Multiply that by your 90 fps, and you get 2,250,000,000 pixels per second (which is probably close to your fill rate). My X1600, for example, has a theoretical fill rate of 2 gigapixels/sec. |
| Heh, I never actually thought of it that way, but that makes perfect sense.
| In that case, perhaps the only way to really increase efficiency at this point is to somehow keep track of where all my rendering is going to be, so that I don't render over the same place multiple times and waste pixel bandwidth. Hm.... that sounds like a lot of work; I don't suppose there's a native way of doing this? |
The concept of "overdraw" isn't particularly new in computer graphics, and it can be a pretty big problem. In a lot of cases, good use of the depth buffer and sorting into near-to-far order is sufficient to make better use of fill rate. I'm not really an MDX person and don't know whether the Sprite class makes any (or optimal) use of the depth buffer. I'm usually happy to make things difficult for myself and write the code directly against the API, where I know it does do things optimally.

hth
Jack