C++ handling of excess precision

MvG

I'm currently looking at code which does multi-precision floating-point arithmetic. To work correctly, that code requires values to be reduced to their final precision at well-defined points. So even if an intermediate result was computed in an 80-bit extended-precision floating-point register, at some point it has to be rounded to a 64-bit double for subsequent operations.

The code uses a macro INEXACT to describe this requirement, but doesn't have a perfect definition for it. The gcc manual mentions -fexcess-precision=standard as a way to force well-defined precision for casts and assignments. However, it also states:

‘-fexcess-precision=standard’ is not implemented for languages other than C

Now I'm thinking about porting those ideas to C++ (comments welcome if anyone knows of an existing implementation). So it seems I can't use that switch for C++. But what is the g++ default behavior in the absence of any switch? And are there more C++-like ways to control the handling of excess precision?

I guess that for my current use case, I'll probably use -mfpmath=sse in any case, which should not incur any excess precision as far as I know. But I'm still curious.
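For reference, the kind of rounding barrier I mean looks roughly like the sketch below. It assumes a volatile store is enough to flush a value out of an extended-precision register; the name force_double is purely illustrative and not what the code actually uses:

#include <cstdio>

// Hypothetical barrier: storing through a volatile double should force the
// value out of any extended-precision register and round it to 64 bits.
static inline double force_double(double x) {
    volatile double v = x;
    return v;
}

int main() {
    double a = 1e16, b = 1.0;
    // Without the barrier, (a + b) - a may keep excess precision and yield 1;
    // with strict 64-bit rounding of a + b the result is 0.
    double d = force_double(a + b) - a;
    std::printf("%g\n", d);
}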

Pascal Cuoq

Are there more C++-like ways to control the handling of excess precision?

The C99 standard defines FLT_EVAL_METHOD, a compiler-set macro that describes how excess precision is applied in a C program (many C compilers still behave in a way that does not exactly conform to the most reasonable interpretation of the value of FLT_EVAL_METHOD that they define: older GCC versions generating 387 code, Clang when generating 387 code, …). Subtle points in relation to the effects of FLT_EVAL_METHOD were clarified in the C11 standard.

Since the 2011 standard, C++ defers to C99 for the definition of FLT_EVAL_METHOD (header cfloat).
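As a quick check of what a given compiler claims, something like the following prints the value it defines (only a diagnostic sketch; the value you get depends on the compiler, target and flags):

#include <cfloat>
#include <cstdio>

int main() {
    // 0: each operation is evaluated in the precision of its type.
    // 1: float and double operations are evaluated as double.
    // 2: everything is evaluated as long double (typical of 387 code).
    // Negative values mean the method is indeterminable.
    std::printf("FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);
}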

So GCC should simply allow -fexcess-precision=standard for C++, and hopefully it eventually will. The same semantics as C's are already in the C++ standard; they only need to be implemented in C++ compilers.


I guess that for my current use case, I'll probably use -mfpmath=sse in any case, which should not incur any excess precision as far as I know.

That is the usual solution.

Be aware that C99 also defines FP_CONTRACT in math.h, which you may want to look at: it relates to the same problem of some expressions being computed at a higher precision, but attacks it from a completely different side (the modern fused multiply-add instruction instead of the old 387 instruction set). This pragma decides whether the compiler is allowed to replace source-level additions and multiplications with FMA instructions. The effect is that the multiplication is virtually computed at infinite precision, because that is how the instruction works, instead of being rounded to the precision of the type as it would be with separate multiplication and addition instructions. As far as I can see, this pragma has not been incorporated into the C++ standard.

The default value for this pragma is implementation-defined, and some people argue for the default to be to allow FMA instructions to be generated (for C compilers that otherwise define FLT_EVAL_METHOD as 0). In C, you should future-proof your code with:

#include <math.h>
#pragma STDC FP_CONTRACT OFF

And the equivalent incantation in C++ if your compiler documents one.
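To see concretely what contraction changes, compare an explicit fma() call with the separately rounded multiply and add; whether the plain expression behaves like the fma() line depends on the contraction setting and on the target (a sketch using only the standard fma function from <cmath>):

#include <cmath>
#include <cstdio>

int main() {
    double e = std::ldexp(1.0, -27);   // 2^-27, exactly representable
    double a = 1.0 + e;
    double b = 1.0 - e;
    double c = -1.0;

    // Without contraction, a * b rounds to 1.0 and the sum is 0; if the
    // compiler contracts this into an FMA, it matches the fma() call below.
    double separate = a * b + c;
    // The fused operation keeps the exact product 1 - 2^-54, giving -2^-54.
    double fused = std::fma(a, b, c);

    std::printf("separate = %g\nfused    = %g\n", separate, fused);
}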


what is the g++ default behavior in the absence of any switch?

I am afraid that the answer to this question is that GCC's behavior, say, when generating 387 code, is nonsensical. See the description of the problem that motivated Joseph Myers to fix it for C. If g++ does not implement -fexcess-precision=standard, it probably means that 80-bit computations are randomly rounded to the precision of the type whenever the compiler happened to have to spill some floating-point registers to memory, leading the program below to print "foo" in some circumstances outside the programmer's control:

if (x == 0.0) return;
... // code that does not modify x
if (x == 0.0) printf("foo\n");

… because the code in the ellipsis caused x, which was held in an 80-bit floating-point register, to be spilt to a 64-bit slot on the stack.
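A self-contained variant of that scenario, if you want to experiment: DBL_MIN * DBL_MIN is a nonzero value in an 80-bit register but underflows to 0.0 once rounded to a 64-bit double, so what gets printed depends on when, if ever, the compiler spills y (and on the optimization level, since the product may be folded at compile time):

#include <cfloat>
#include <cstdio>

int main() {
    double x = DBL_MIN;
    // Nonzero in an 80-bit x87 register, 0.0 once stored as a 64-bit double.
    double y = x * x;
    if (y == 0.0)
        std::printf("y compares equal to 0.0\n");
    if (y != 0.0)
        std::printf("y compares unequal to 0.0\n");
    // Depending on when y is spilt from the register, one message, the
    // other, or neither may appear on a 387 target.
}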
