Excel, data loss, IEEE754, and precision
It is often said that the world runs on Excel. Excel has helped democratise automated data processing, enabling individuals to achieve things that would once have required a team of developers. An unfortunate consequence is that people sometimes overstretch Excel—it often becomes an element in processes on a scale it wasn't designed to cope with.
For example, Public Health England infamously underreported COVID cases for a while because a data ingestion process was using an old Excel file format that couldn't cope once data volumes increased. Sometimes you can hit Excel's limits in what might seem like much smaller scenarios. I'm going to explain one such case, and then, as is my wont, go into gratuitous detail on the nature of the limitation that causes this.
We had a customer who thought that a 3rd party data source was failing to use unique identifiers—it looked like data from a few different sources was being reported under a single id. In fact, there was no such problem in the source data. Each distinct source had a unique identifier. The loss of uniqueness occurred during the import process. Specifically, this process included a step in which the data was loaded into Excel, and Excel was changing these identifiers!
You can watch Excel doing this yourself. Create a new spreadsheet and type in this number:
Excel will show it in exponential format—
5.41E+16—but you can change the cell format to "Number" and it will show it more straightforwardly as
Now try changing the number to
54100000054328401 (i.e., 1 higher than it was before). Excel will appear to ignore you. The number stays at
54100000054328400.00. The same will happen if you make any change to the final two digits—no matter what you set them to, they will remain at 0.
Unfortunately, the system I was describing had unique identifiers that ran into this problem. There were various identifiers such as these:
54100000054328412 54100000054328416 54100000054328474 54100000054328480 ...
Excel replaces every single one of these with
54100000054328400. So although these IDs were unique when they came from the data source, Excel mapped multiple different IDs onto the same value.
This behaviour is well documented—there's a whole page explaining that Excel will only store 15 decimal digits of precision. The identifiers I'm showing here are 17 digits long, so they run into this. It's frustrating that Excel doesn't tell you it is replacing several of the digits in your input with 0. There are some scenarios in which this silent loss of precision is reasonable—for example if the values are readings from sensors, the chances are that it's all noise by the time you get to the 15th digit, in which case truncation of precision wouldn't really change much. But given how often Excel is used in accounting, it's surprising how willing it is to change the effective value of the numbers you give it.
Why is Excel doing this?
The page I linked to above has a section purporting to explain why Excel truncates to 15 digits. However, it's an unsatisfactory explanation:
Excel follows the IEEE 754 specification for how to store and calculate floating-point numbers. Excel therefore stores only 15 significant digits in a number and changes digits after the fifteenth place to zeros
The fact is there's nothing in the IEEE 754 specification that requires Excel to replace everything after the 15th significant digit with a zero. It is true that the IEEE 754 specification describes formats with precision limits, but with these particular examples, although the IEEE 754 format in question lacks the precision to represent all 17 digits, it can in fact represent 16 of them. And yet Excel is cutting it down to 15. It would be more accurate to say this:
"The IEEE 754 specification cannot offer infinitely precise calculations. Excel uses the
binary64 format, for which the limits of precision cannot be described exactly in decimal terms: in most cases it can accurately represent 16-digit decimal values, but in some cases it can only precisely represent 15 decimal digits. To provide consistent decimal behaviour, Excel discards everything after the 15th decimal digit even in cases where the IEEE 754 specification would have been able to offer higher precision."
More accurate, but considerably less clear. It looks like Microsoft chose to discard some precision to be able to offer a simpler explanation. It's easier to understand "You have 15 decimal digits of precision" than it is to understand how IEEE 754 floating point actually works and what that means.
How do I fix this?
Don't let Excel interpret large numeric identifiers as numbers. It doesn't need to—it's not like you'll be applying any numeric formulae to these values. They're opaque, unique identifiers that happen to look like numbers. If you tell Excel that they're not really numbers, and that it should treat them as strings, it will no longer truncate them.
Problem solved. Now I can get back to the bit I actually wanted to talk about.
What's going on?
Why can't we have a simple description of the decimal precision available with floating-point numbers? How is it that in some cases there will be 16 digits of precision available, but in others only 15? If you look at the Wikipedia entry for IEEE 754 it states that the 64-bit binary format offers 15.95 decimal digits; what does that even mean? The root of these oddities is that IEEE 754 defines binary formats (and yes, the 2008 edition of IEEE 754 added decimal formats, but Excel doesn't appear to use those) and there isn't a straightforward relationship between binary digits and decimal digits.
For example, if you have a 3-digit binary number, what is the equivalent number of decimal digits? A single decimal digit is sufficient to capture every value that can fit in 3 binary digits (or 'bits', as they're usually called). But the converse is not true: 3 bits is insufficient to capture a single decimal digit. There are only 8 distinct values for a 3-bit number, but there are 10 distinct values that one decimal digit might have. What if we tried more bits? 4 bits is enough to hold one decimal digit, but now the converse is not true: our one decimal digit is too small to hold the information that fits in 4 bits. There are 16 possible values for 4-bit numbers, but only 10 for a single decimal digit, so we now need two decimal digits. But that allows 100 possibilities, which is way more than a 4-bit number can do. You'd need 7 binary digits to have enough space to hold all those different values. But now we're back to having excess capacity in our binary: there are 128 different 7-bit values, which means that while you need at least 7 bits to represent any 2-digit decimal, you need a 3-digit decimal to represent any 7-bit number.
This goes on forever. No matter what length of binary number you pick, there will never be a corresponding length of decimal number which has exactly the same number of distinct values. (The reason is that the number of possible distinct values for a binary number is 2 raised to the power of the number of digits, whereas for decimal, it's 10 raised to the power of the number of digits. So for decimals, the number of distinct values will always contain at least one factor of 5; raising 2 to any integer power gives you a number whose prime factors are all 2, so it will never be a multiple of 5.)
But what does this mean in practice? It's all very well to argue about prime factors and to gloss over the differences between binary and decimal with strange ideas such as 15.95 decimal digits, but what does it actually look like when we hit the limits of precision?
Let's take a number we saw earlier:
54100000054328400. This is a number that can be represented exactly in the IEEE 754 64-bit format, but before we get to that, here's what it looks like as plain old binary:
That's just a normal binary integer. That's not how it looks in IEEE 754's
binary64 format, because IEEE 754 represents numbers as three distinct pieces of information. A single bit represents the sign. Then there's the exponent and the significand. These work a bit like scientific notation in decimal: we might write 3.0×10⁸ as a compact way to denote a fairly large number. (The exponent here is 8, and the significand is 3.0.) Binary floating point works in a very similar way, although with some quirks. If you can understand the decimal expression 3.0×10⁸, then a similar binary expression of the form to 1001×10¹¹⁰ isn't a great leap. To be clear, all the numbers in my 1001×10¹¹⁰ example are in binary, so we could rewrite it as the following decimal-based expression: 9×2⁶, which is 576 in decimal. If we write that back out as binary it shows a direct relationship with the 1001×10¹¹⁰ form:
How does that relate to 1001×10¹¹⁰? Well, you can see the full binary value starts with the significand, 1001 (the part on the left of the ×). It's followed by 6 zeros which corresponds to the exponent of 110. (110 is the binary for six. So the binary expression 10¹¹⁰, written as 2⁶ in decimal, has a binary value of 1 followed by 6 zeros.) Of course, with decimal scientific notation, we conventionally have a single digit before a decimal point. We can follow suit with a single binary digit before a binary point, rewriting the example as: 1.001×10¹⁰⁰¹. (Remember, that's a binary point. So in decimal this becomes 1⅛×2⁹. And if you do that calculation you'll find it's 576, so that all checks out.)
So what would our earlier number above,
54100000054328400, look like in this binary scientific notation? This:
1.100000000110011101010011101010110010110001111000101 × 10¹¹⁰¹¹¹
And just to check the working, let's turn that back into decimal:
1.50157664231344023519909569586162 × 2⁵⁵
and if you do that calculation, you get
54100000054328400, which is the number we first thought of. Tada!
(By the way, some people find the notion of a binary point mystifying. But it works much like a decimal. With decimal, each digit position after the point has units one tenth smaller than its predecessor, so the columns are in 1/10ths, 1/100ths, 1/1000ths, and so on. So 0.352 means 3/10 + 5/100 + 2/1000. Likewise in binary each successive column has units one half smaller than its predecessor, so we have 1/2, 1/4, 1/8, and so on. So The binary value 1.001 is 1 + 1/8; the binary value 0.1 is 1/2; binary 0.10101 is 1/2 + 1/8 + 1/32.)
Having shown all that, it's not quite how IEEE 754
binary64 works. As I said, there are some quirks. One is that in most cases the leading
1 before the significand (the part on the left of the multiplication) is presumed to be present, and doesn't actually show up in the binary representation, which frees up one more bit, improving the precision. (Exceptions: the special values of 0, the infinities, the NaNs, and extremely small values.) Also, the exponent is offset by decimal 1023 because reasons, meaning that our exponent of positive 55 (binary 110111) is stored as 1078 (binary 10000110110), and so we end up with these values:
- Sign bit: 0 (positive)
- Exponent 1078 (binary 10000110110)
- Fractional part of significand: 0.50157664231344023519909569586162 (binary 0.100000000110011101010011101010110010110001111000101)
And so the full IEEE 754
binary64 floating point representation of our number is those three components concatenated, and with any spare digits in the significand padded out with 0. (In this case, our significand fraction is 51 bits, but the
binary64 format has 52 bits for that, meaning there's 1 spare digit, so we just append a 0 to our significand fraction). Here's how that looks:
You might find that slightly easier to read if I separate out the sign, exponent, and significand fraction (but in your computer's memory it's just stored as that single 64-bit value):
|S| exponent | fractional part of significand | |0|10000110110|1000000001100111010100111010101100101100011110001010|
Again, I'm just going to check my working. Putting Windows Calculator into Programmer mode, I can convert that binary to hex: 0x436806753AB2C78A. .NET's
double type uses IEEE 754
binary64, so I can get PowerShell to reinterpret that hex value as a double to check its value:
$data = [System.BitConverter]::GetBytes([long]0x436806753AB2C78A) [System.BitConverter]::ToDouble($data).ToString("n")
and it displays
54,100,000,054,328,400.00. So my attempt to build an IEEE 754
binary64 number from scratch has worked!
What does this tell us about precision? We can answer that by asking what are the nearest values above and below this one that we can represent. In this particular example, we can do that by bumping the significand up or down by the smallest amount possible. If the significand fraction had been all 1s the addition would overflow and it would be more complex (and likewise, with an all-0 significand fraction subtraction would underflow) but that doesn't happen with this particular number, so it's simple:
|S| exponent | fractional part of significand | Lower: |0|10000110110|1000000001100111010100111010101100101100011110001001| (0x436806753AB2C789) Higher: |0|10000110110|1000000001100111010100111010101100101100011110001011| (0x436806753AB2C78B)
The effective value of those is, in decimal:
If you just look at the last 3 digits, you can see that we've got 392 and 408, and our original value ended in 400.
This lets us see what it really means to say that for these examples, we have 16 digits of decimal precision. At this point on the number line, the values that can be represented in IEEE 754
binary64 occur only on every 8th integer, and that's actually a slightly higher density than we need to represent 16 digits of precision. If you pick a 17 digit number ending in 0 (so 16 significant digits) from this region of the number line, there will always be at least one IEEE 754
binary64 value which, when rounded to 16 decimal digits, will evaluate to that number. And the reason I'm saying we have a slightly higher density than necessary is that for some cases, we'll have a choice of two
binary64 values for a particular target number. For example, suppose we wanted to represent, with 16 digits of precision, the number 54100000054328420. Although that doesn't have an exact representation in IEEE 754
binary64, but there are two nearby numbers that do: 54100000054328416 and 54100000054328424, and either of those would round to the number we're looking for when rounded to 16 significant places.
I sneaked in a vague requirement there: I said that we get an effective 16 digits of precision if the numbers are "from this region of the number line." But it doesn't work everywhere, because the density of IEEE 754
binary64 values varies. There are some interesting threshold points in the IEEE 754
binary64 representation. What's the lowest number we can represent that's higher than this?
|S| exponent | fractional part of significand | |0|10000110110|1111111111111111111111111111111111111111111111111111| (0x436FFFFFFFFFFFFF)
Finding the next highest value will cause overflow in the fractional part, meaning that we need to change the exponent. Here's the sequence of adjacent IEEE 754
binary64 values around that threshold (I'm showing 3 before, then the number itself, then the 3 after):
|S| exponent | fractional part of significand | |0|10000110110|1111111111111111111111111111111111111111111111111100| (0x436FFFFFFFFFFFFC) |0|10000110110|1111111111111111111111111111111111111111111111111101| (0x436FFFFFFFFFFFFD) |0|10000110110|1111111111111111111111111111111111111111111111111110| (0x436FFFFFFFFFFFFE) |0|10000110110|1111111111111111111111111111111111111111111111111111| (0x436FFFFFFFFFFFFF) |0|10000110111|0000000000000000000000000000000000000000000000000000| (0x4370000000000000) |0|10000110111|0000000000000000000000000000000000000000000000000001| (0x4370000000000001) |0|10000110111|0000000000000000000000000000000000000000000000000010| (0x4370000000000002)
You can see very clearly the point where our fractional part changes from being full of 1s to being full of 0s. Easier to miss is that the exponent goes up by one at the same point. This table shows the corresponding decimal values, and also the difference between each value and its predecessor:
|IEEE 754 binary64 (hex)||Effective Value||Difference|
Notice that while for the first few entries, the representable values come at an interval of 8, once we go beyond the point where the the exponent had to go up, the interval between the numbers doubles. This makes sense when you think about what the exponent means: it determines the power of two by which we multiply the significand. So incrementing the exponent doubles the place value of any particular digit in the fractional part. The smallest change we can effect is to modify the least significant digit of the fractional part, and the magnitude of that change is determined by the exponent. Therefore, any time an increase in value requires an increase in the exponent (because our fractional part was full of 1s) the spacing between numbers that can be represented in IEEE 754
You might have noticed in the table above that I highlighted the the fact that the value where such a transition occurs happens to be an exact power of 2. That's no coincidence: these changes in the spacing of numbers that IEEE 754
binary64 (or its other binary representations for that matter) can represent always occur on powers of two, because they will correspond to the fractional part of the significand being 0. Since the significand has an implied 1 before the binary point, a value of 0 here represents a significand of 1.0. In other words the value of these transition points will always be of the form 1×2ⁿ.
So getting back to the question of how many decimal places of precision are available, we can look at where the transition points—the points where the density of numbers expressible as IEEE 754
binary64 values halves—occur for numbers that are 17 digits long:
|Number where density drops||as power of 2||as hex integer||Spacing of binary64 values above this point|
The eagle-eyed amongst you will have noticed that the first number in this table has only 16 digits and the last has 18. But I've put those in there so that we've covered the entire range from 10000000000000000 to 99999999999999999.
So for 17-digit numbers in the range from 10000000000000000 to 18014398509481984 inclusive, IEEE 754
binary64 offers an exact representation only for even integers (and cannot represent any non-integer values in this range). So we definitely don't have 17 digits of precision here, and it only gets worse as the numbers go above 18014398509481984, because the available representations start to spread out further. Up to 36028797018963968 only multiples of 4 are available, then above that up to 72057594037927936, we have only multiples of 8, from there all the way up to 99999999999999999, only multiples of 16.
That threshold where the gaps between representable numbers grows to 16 is the point beyond which we are no longer able to represent 16 decimal digits of precision. Suppose you wanted to represent 72057594037927960, when rounding to 16 digits. The two nearest values available with IEEE 754
binary64 are 72057594037927952 and 72057594037927968, but those round to 72057594037927950 and 72057594037927970 respectively. There is no available value that would round to
So in general, we can only count on having 15 digits of decimal precision. In a way this understates the actual precision—it averages out at 15.95 decimal digits, but if the worst case in decimal is what you care about, it's 15 digits.
And then Excel ensures that you always get the worst case even when the underlying IEEE 754 format doesn't actually require it.