bug in arithmetic conversion?

I've found what I think is a bug in how the C compiler treats binary operators when the operands are of different types.

According to the rules of 'standard arithmetic conversion', if one operand is a "float" and one is an "int", I would expect the operation to be performed with 'float' precision. The following code illustrates the bug.

The following was typed into a new project, using the "Visual C++ / Win32 Console Application" template in Visual Studio 2005 (Version 8.0.50727.762 (SP.050727-7600), Microsoft .NET Framework Version 2.0.50727, Professional, Microsoft Visual C++ 2005 77626-009-0000007-41814).

// see http://msdn2.microsoft.com/en-us/library/09ka8bxx(vs.80).aspx

long n = 4000;

float f = 10000;

float correct1 = n / f;

double correct2 = (float) ((float)n / (float)f);

double correct3 = (float) (n/f);

double correct4 = (float) ((float)n / f);

double correct5 = correct1;

double incorrect1 = n / f;

double incorrect2 = (float) n / f;

double incorrect3 = ((float)n) / f;

Can someone tell me what's wrong with my understanding, or how to get Visual C++ 2005 to behave?

[1896 byte] By [eric.slosser] at [2008-1-10]
# 1
Hi,
That's weird; I have the same version as you but I'm getting correct values. Are you sure that the code you posted has the same problem as the code you tested?
n0n4m3 at 2007-10-3 > top of Msdn Tech,Visual C++,Visual C++ Language...
# 2

Yes, I'm sure.

Every variable named "correctN" has the value 0.40000000596046448, every "incorrectN" has the value 0.40000000000000002.

Maybe there's a difference in the command line passed to the compiler? Mine says:

Code Block

/Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE"

/D "_UNICODE" /D "UNICODE"

/Gm /EHsc

/RTC1 /MDd /Yu"stdafx.h"

/Fp"Debug\ArithmeticConversion.pch"

/Fo"Debug\\" /Fd"Debug\vc80.pdb" /W3 /nologo /c

/Wp64 /ZI /TP /errorReport:prompt

eric.slosser at 2007-10-3
# 3

0.4 can't be represented exactly by either float or double. How are you displaying your values?

0.40000000596046448 is not the value in the float (i.e. those extra digits to the right of .40000000 are error); it's only the printed-out approximation of what's stored in those 32 bits, and because the printing algorithm shifted out all the mantissa digits, it continues to print out to the number of digits requested based on "carry residue."

So I don't understand why you think this is a bug.

Brian

BrianKramer at 2007-10-3
# 4
I thought you were having other kind of issue, something like getting 0 instead of ~0.4.

Just to add to what Brian said, here are some links that explain this in more detail:
http://docs.sun.com/source/806-3568/ncg_goldberg.html
http://en.wikipedia.org/wiki/Floating_point

n0n4m3 at 2007-10-3
# 5
Every variable named "correctN" has the value 0.40000000596046448, every "incorrectN" has the value 0.40000000000000002.

I'm not sure why you think a difference in meaningless digits is a problem. The VC++ 'float' has 7 digits of precision; 'double' has 15. So your 'float' values are exactly 0.4 to 7 digits, and the 'double' values are exactly 0.4 to 15 digits. You only get to compare values out to the guaranteed precision of the type; if two floating-point values differ by less than the associated "epsilon" value, they're equal.

Your value: 0.40000000596046448

"Exact" value: 0.4

Difference: 0.00000000596046448

FLT_EPSILON: 1.192092896e-07F = 0.0000001192092896

Difference < "epsilon" ==>> the values are equal. The same analysis holds for the 'double' values.
Sdi at 2007-10-3
# 6

This has nothing to do with VC++, by the way. Any language that uses the float or double format will have the same issues.

Do a google search for 'what every programmer should know about floating point' and you will get a good explanation.

BrunovanDooren at 2007-10-3
# 7
Bruno van Dooren wrote:

This has nothing to do with VC++, by the way. Any language that uses the float or double format will have the same issues.

It does concern VC++, and any other C implementation that claims to implement 'standard arithmetic conversion'.

For example, GNU's gcc doesn't have this bug; every double in the code sample I posted is given the same value.

eric.slosser at 2007-10-3
# 8

First, thanks to everyone who has taken an interest in my question, and especially those who have taken the time to try to explain things to me.

Second, I understand that a float is less precise than a double. I also understand that the mathematically correct value (4/10) can't be represented exactly by an IEEE-754 float or double (the representation used by VC++).

Let me reiterate my point/question:

I'm claiming that the right-hand-side (RHS) of every expression in my code example should be evaluated with float accuracy. The value of the RHS can only be as accurate as a float, and when that value is assigned to the LHS, it doesn't magically get more accurate. Note how 'correct5' is still just accurate to 'float' standards.

All the incorrect values should be the same as 'correct5', as per Microsoft's own claim of adherence to 'standard arithmetic conversion', but they're not. The RHS is being evaluated in 'double' land, when (I claim) it should be done with 'floats'. It's okay by me if the compiler does all the calculation in 64 bits, but it has to convert that to 32 bits to arrive at the correct value for the RHS.

At least, that's my understanding; I could be wrong. But if I am, then so is GNU's gcc suite. I understand the weakness of any argument that says "my compiler does it this way". That's why I've been relying on Microsoft's and the C standard's description of how things are supposed to work.

eric.slosser at 2007-10-3
# 9

Here are some things that I think are beside the point I'm making, but since we're all interested in precision, and some people have made some incorrect claims, let's look at the bits and do some math.

Any claims I make below are in reference to IEEE-754 floats, and may not apply to other representations.

For a general discussion of how to compute the value represented in an IEEE-754 float, refer to <http://en.wikipedia.org/wiki/Single_precision> (or the IEEE standards doc, if you have it).

The correct float result is 0x3ecccccd.

0x3ecccccd = 0011 1110 1100 1100 1100 1100 1100 1101 (base 2)

sign bit = 0
raw exponent = 011 1110 1 = 0111 1101 = 0x7D = 125
mantissa = 100 1100 1100 1100 1100 1101
= 1100 1100 1100 1100 1100 1101 (implicit 24th bit included)
decoded mantissa
= sum of 2^(-N), where N = 0,1,4,5,8,9,12,13,16,17,20,21,23

Given the following table of the powers of 2...

2^( 0) = 1
2^(-1) = 0.5
2^(-2) = 0.25
2^(-3) = 0.125
2^(-4) = 0.0625
2^(-5) = 0.03125
2^(-6) = 0.015625
2^(-7) = 0.0078125
2^(-8) = 0.00390625
2^(-9) = 0.001953125
2^(-10)= 0.0009765625
2^(-11)= 0.00048828125
2^(-12)= 0.000244140625
2^(-13)= 0.0001220703125
2^(-14)= 0.00006103515625
2^(-15)= 0.000030517578125
2^(-16)= 0.0000152587890625
2^(-17)= 0.00000762939453125
2^(-18)= 0.000003814697265625
2^(-19)= 0.0000019073486328125
2^(-20)= 0.00000095367431640625
2^(-21)= 0.000000476837158203125
2^(-22)= 0.0000002384185791015625
2^(-23)= 0.00000011920928955078125

... I calculate the mantissa as the sum of the following

2^( 0) = 1
2^(-1) = 0.5
2^(-4) = 0.0625
2^(-5) = 0.03125
2^(-8) = 0.00390625
2^(-9) = 0.001953125
2^(-12)= 0.000244140625
2^(-13)= 0.0001220703125
2^(-16)= 0.0000152587890625
2^(-17)= 0.00000762939453125
2^(-20)= 0.00000095367431640625
2^(-21)= 0.000000476837158203125
2^(-23)= 0.00000011920928955078125
= 1.600000023841857910156250

decoded exponent = 125 - 127
= -2
value = mantissa * 2 ^ (decoded exponent)
= mantissa / 4
= 0.40000000596046447753906250

That's more digits than I was getting from the VC++ debugger or from a printf statement containing "%.30f". That number (0.40000000596046448) is less precise, but is still just as accurate for as many digits as it has. Neither number has any 'garbage digits'.

Also, the number I calculate above, even though it has a whole lot of base-10 digits, has no 'approximation' or 'error' or 'carry residue'. It's what's really in the float.

- - - A (very small) note about FLT_EPSILON - - -

FLT_EPSILON the compiler macro may be defined as 1.192092896e-07F, but in mathematical terms, it's 0.00000011920928955078125 == 2^(-23).

FLT_EPSILON is not the smallest number that can be added to a 'float' and be detected. It's the smallest number that can be added to a float's mantissa. The float's value is (mantissa * 2^exponent). You have to scale the epsilon before comparing it to any given float. Just think about how a float can represent numbers in the range of 1.175494351E-38 to 3.402823466E+38, and you'll see that a straight comparison of a float to FLT_EPSILON is the wrong operation.

eric.slosser at 2007-10-3
# 10
Let me reiterate my point/question:

I'm claiming that the right-hand-side (RHS) of every expression in my code example should be evaluated with float accuracy. The value of the RHS can only be as accurate as a float, and when that value is assigned to the LHS, it doesn't magically get more accurate. Note how 'correct5' is still just accurate to 'float' standards.

What's the basis of this "claim"? Does the language standard specify that floating-point calculations must be carried out at a precision no greater than what is required by the operand types? I don't know, but I'd guess not.

Sdi at 2007-10-3
# 11

Could you give a complete program (with just one correct/incorrect example) that compiles under both VC++ and GCC, and also share the output with us? This helps me identify what your measurement really is (i.e. printf("%g"), etc.).

You haven't disproven my thesis, which is that you're seeing a difference only on the output, not the original computation and representation.


Printing values is less subject to conformance constraints across platforms than C++ floating point arithmetic.

Another thing to try is to disassemble the output, to see if in fact you are correct, i.e. whether the RHS evaluation depends on the type of the LHS.

Brian

BrianKramer at 2007-10-3
# 12

long n = 4000;

00469B9E mov dword ptr [n],0FA0h

float f = 10000.0f;

00469BA5 fld dword ptr [__real@461c4000 (4FF9F0h)]

00469BAB fstp dword ptr [f]

float correct1 = n / f;

00469BAE fild dword ptr [n]

00469BB1 fdiv dword ptr [f]

00469BB4 fstp dword ptr [correct1]

double correct2 = (float) ((float)n / (float)f);

00469BB7 fild dword ptr [n]

00469BBA fdiv dword ptr [f]

00469BBD fstp dword ptr [ebp-2D8h]

00469BC3 fld dword ptr [ebp-2D8h]

00469BC9 fstp qword ptr [correct2]

double correct3 = (float) (n/f);

00469BCC fild dword ptr [n]

00469BCF fdiv dword ptr [f]

00469BD2 fstp dword ptr [ebp-2D8h]

00469BD8 fld dword ptr [ebp-2D8h]

00469BDE fstp qword ptr [correct3]

double correct4 = (float) ((float)n / f);

00469BE1 fild dword ptr [n]

00469BE4 fdiv dword ptr [f]

00469BE7 fstp dword ptr [ebp-2D8h]

00469BED fld dword ptr [ebp-2D8h]

00469BF3 fstp qword ptr [correct4]

double correct5 = correct1;

00469BF6 fld dword ptr [correct1]

00469BF9 fstp qword ptr [correct5]

double incorrect1 = n / f;

00469BFC fild dword ptr [n]

00469BFF fdiv dword ptr [f]

00469C02 fstp qword ptr [incorrect1]

double incorrect2 = (float) n / f;

00469C05 fild dword ptr [n]

00469C08 fdiv dword ptr [f]

00469C0B fstp qword ptr [incorrect2]

double incorrect3 = ((float)n) / f;

00469C0E fild dword ptr [n]

00469C11 fdiv dword ptr [f]

00469C14 fstp qword ptr [incorrect3]

So

1) All calculations (including 'correct1') are carried out at hardware precision

2) The expressions like 'correct3' that cast the entire RHS to (float) cause the code to throw away the bottom 32 bits by doing a 32-bit store, re-loading that 32-bit value, and then doing a 64-bit store for the "=" operation

I still don't see a problem here.

Sdi at 2007-10-3
# 13

eric.slosser wrote:
According to the rules of 'standard arithmetic conversion', if one operand is a "float" and one is an "int", I would expect the operation to be performed with 'float' precision.

I think your problem lies within that expectation. The arithmetic conversion rules specify which operand is converted and how it is converted. They do not specify how the resulting calculation is performed.

If you can find a copy of the C99 standard, look for section 5.2.4.2.2 "Characteristics of floating types <float.h>", which in part says this:

The values of operations with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type. The use of evaluation formats is characterized by the implementation-defined value of FLT_EVAL_METHOD:

-1 indeterminable;

0 evaluate all operations and constants just to the range and precision of the type;

1 evaluate operations and constants of type float and double to the range and precision of the double type, evaluate long double operations and constants to the range and precision of the long double type;

2 evaluate all operations and constants to the range and precision of the long double type.

All other negative values for FLT_EVAL_METHOD characterize implementation-defined behavior.

I think you are assuming that all compilers use FLT_EVAL_METHOD = 0?

Rather annoyingly, VC++ doesn't seem to implement the FLT_EVAL_METHOD macro, so we can't really tell how the expression is being evaluated. However, the documentation for the /fp compiler switch says this about /fp:precise:

Expression evaluation will follow the C99 FLT_EVAL_METHOD=2, with one exception. When programming for x86 processors, because the FPU is set to 53-bit precision, this will be considered long double precision.

Since you are getting different behaviour from GCC I assume they implement FLT_EVAL_METHOD 0.

More discussion of Microsoft Visual C++ Floating-Point Optimisation can be found here:

http://msdn2.microsoft.com/en-us/library/aa289157(VS.71).aspx

FrankBoyne at 2007-10-3
# 14
Frank Boyne wrote:

I think your problem lies within that expectation. The arithmetic conversion rules specify which operand is converted and how it is converted. They do not specify how the resulting calculation is performed.

.....

I think you are assuming that all compilers use FLT_EVAL_METHOD = 0?

....

More discussion of Microsoft Visual C++ Floating-Point Optimisation can be found here:

http://msdn2.microsoft.com/en-us/library/aa289157(VS.71).aspx

Thanks, yes, those were the flaws in my reasoning. I didn't know about FLT_EVAL_METHOD. You're correct, gnu/gcc is using method 0.

I investigated different values of the /fp setting, and none of them yield a behavior equivalent to FLT_EVAL_METHOD==0. Since my goal is to get the same results from these two compilers, the only option I see is to stop using floats. (Or doubles, but it'll be easier to convince people to use the more precise form).

Thanks again to everyone.

eric.slosser at 2007-10-3