Question on XNA performance

Before I begin, I'd like to say I enjoy the work Microsoft has put into XNA. As I develop for DirectX I find XNA a great tool for possible projects down the road. As I was reading more into the docs, a random question popped up in my head and I thought it was worth asking about here.

Before I get to my question, though, I'd like to share how it relates to OpenGL's early days on Vista. The original plan for Vista was to ship OpenGL 1.4 layered on top of Direct3D, with Direct3D still making the calls to the hardware. Essentially, every OpenGL call would turn into a Direct3D call. We've since been told that if an ICD is installed, OpenGL will work without Direct3D underneath. Without the ICD, though, the response from developers was very negative, and performance was quite lacking during beta testing on Longhorn (now Vista). A commercial 3D game that I helped develop was tested on Vista, and the project manager said it was sluggish, running at about half its normal performance. A few months later a beta ICD was released, and it eventually cured the problem.

Now that we've seen the outcome of one API calling another, I'd like to get to my question about XNA. Microsoft estimates that XNA costs about 6-8% in performance, but in theory it does the same thing as OpenGL calling Direct3D: you make an XNA call, and it makes a Direct3D call underneath. That takes 2 calls instead of 1 (if it's a simple call), which doubles the number of calls needed to get that command to the hardware. From my understanding, all XNA does is call the native Direct3D COM components underneath, which in turn make the calls to the hardware. How can XNA be only a 6-8% decrease in performance, then, when the same "API calling another API" approach cut a commercial OpenGL application's performance in half? How are the implementations different?

By redshock at 2007-12-26
# 1

An API calling another API isn't necessarily the performance bottleneck. The cost of adding a single extra call to anything on a modern CPU is going to be tiny.

Which means that the OpenGL-on-DirectX layer must have been doing a lot more work in between. I'm not sure how it works, so you would need to ask in the main DirectX forum, though since it's been fixed I doubt you would get much of an answer.

In XNA, as in Managed DirectX, a lot of the calls are thin layers over the native DirectX code, and that's what accounts for the 6-8%. I suspect you can find some calls that are 50% slower because they do a lot of extra work, and some that are less than 1% slower because they do almost nothing. Plus, of course, we've probably not seen the final optimised builds yet.

Remember too that most of the .NET Framework is an API calling the Win32 API, and that performs very well. Check out the historic battle between Raymond and Rico: http://blogs.msdn.com/ricom/archive/2005/05/10/416151.aspx

TheZMan at 2007-9-4 > top of Msdn Tech,Game Technologies: DirectX, XNA, XACT, etc.,XNA Framework...
# 2
If your batching is good, then the overhead of getting calls to the device is but a small portion of the total cost of the system.

For example, if "getting calls to the hardware" was 5% of your total application running time, and XNA was 2x slower than native at that part, then you'd see a performance decrease of about 5%. (This is related to Amdahl's Law.)
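To make that arithmetic concrete, here is a minimal sketch in Python (chosen only for brevity; the function name `relative_run_time` and its parameters are made up for this illustration). It assumes the cost splits cleanly into an "API call" fraction that gets slower and a remainder that does not change, which is a simplification.

```python
def relative_run_time(fraction_affected: float, slowdown_factor: float) -> float:
    """Total run time relative to the original (1.0 means unchanged).

    fraction_affected: share of the original run time spent in the part
                       that gets slower (e.g. 0.05 for 5%).
    slowdown_factor:   how many times slower that part becomes (e.g. 2.0).
    """
    unaffected = 1.0 - fraction_affected
    return unaffected + fraction_affected * slowdown_factor


# The example from the post: 5% of the time in API calls, made 2x slower.
xna_case = relative_run_time(0.05, 2.0)

# A hypothetical worst case: nearly all the time is spent in the wrapper layer.
layer_bound_case = relative_run_time(1.0, 2.0)

print(f"5% in calls, 2x slower: run time x{xna_case:.2f}")
print(f"100% in calls, 2x slower: run time x{layer_bound_case:.2f}")
```

Under these assumptions, a 2x slower call path only halves overall performance when the application spends essentially all of its time in that path; when the path is a small slice of the frame, the total barely moves.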

If your game is simple (say, a typical 2D game) then chances are that you'll feed the hardware faster than it can render, or at least faster than the vblank interval, which means that there is no difference between an XNA application and a native application.

Also, XNA is similar in overhead to Managed DirectX (MDX) -- given that it's a managed library, I don't really see how you could make it better from that point of view.

Where I see the cost in using XNA is if you start going haywire on things that have real performance cost, like allocating lots of objects each processing tick instead of re-using previously allocated objects, or using events and delegates all over the place. That'll cost you! Also, it's not clear whether the CLR can optimize your code to SSE/AltiVec instructions at this time, so if you do a lot of batch geometry processing (a la particle systems), you'll get a noticeable performance hit.
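The "re-use previously allocated objects" idea can be sketched as a simple object pool. This is an illustration only, written in Python for brevity rather than the C# an XNA game would use, and the `Particle` and `ParticlePool` names are hypothetical, not part of any XNA API. The point is that every object is created once up front, so the per-tick code path allocates nothing.

```python
class Particle:
    """A tiny hypothetical particle; __slots__ keeps instances compact."""
    __slots__ = ("x", "y", "alive")

    def __init__(self) -> None:
        self.x = 0.0
        self.y = 0.0
        self.alive = False


class ParticlePool:
    """Pre-allocates a fixed number of particles and hands them out for re-use."""

    def __init__(self, capacity: int) -> None:
        # All allocation happens here, once, instead of on every tick.
        self._free = [Particle() for _ in range(capacity)]

    def acquire(self, x: float, y: float):
        """Return a recycled particle, or None if the pool is exhausted."""
        if not self._free:
            return None
        particle = self._free.pop()
        particle.x, particle.y, particle.alive = x, y, True
        return particle

    def release(self, particle: Particle) -> None:
        """Mark a particle dead and make it available again."""
        particle.alive = False
        self._free.append(particle)


pool = ParticlePool(capacity=2)
first = pool.acquire(1.0, 2.0)
pool.release(first)
second = pool.acquire(3.0, 4.0)  # hands back the same object with new values
```

Returning `None` when the pool is empty is one possible design choice; a game might instead recycle the oldest live particle or grow the pool outside the hot loop.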

For straightforward code, the run time typically depends on the amount of memory you touch, so whether it's native or managed won't matter.

JonWatte at 2007-9-4
# 3
Thanks for the helpful information.
redshock at 2007-9-4
# 4
You'd be surprised. If you want a simple benchmark, I can run a little over 200 concurrent animations of tiny.x in XNA on a 3 GHz single-core processor before I get a noticeable framerate hit in non-release mode (run without debugging, and this is before code optimization). And tiny.x is not a simple model.

With optimizations I can get 400+, and my animation controller is anything but perfect in terms of performance.
leclerc9 at 2007-9-4