Learning to Program - A Beginners Guide - Part Seven - Representing Numbers
In this section we're going to look at the Data Memory in more detail.
So far, we've assumed we know what "store 1 in the memory location at offset 4" means. If we've told it to store 1 at offset 4, when we read back the value at offset 4, we get 1 as expected.
We know, then, that a memory location can hold a whole number greater than or equal to zero (the "positive integers"). Is there a maximum number that we can hold in that location?
Let's write a short program to find out. This uses a loop like the one we saw in the previous exercise to increment r0 all the way up to 255. When r0 reaches 255, we end the loop and proceed to add another 1 to the result, and write it back to the memory at offset 0.
(Don't forget that you can get the tool we use to run these "assembly language" programs from here. The setup instructions for this F# environment are here.)
load r0 0 ; set the result to 0 add r0 1 ; add 1 to the result (<--- jump back here) compare r0 255 ; is the result 255? jumpne -2 ; if not, jump back up 2 add r0 1 ; add another 1 to the result (255 + 1=??) write 0 r0 ; write the result to the memory at offset 0 exit
If you run this, you'll see R0 increase one-by-one, until it reaches 255. Then the final few lines of the output are as follows:
compare r0 255 R0=255 R1=0 R2=0, WR0=0, WR1=0, WR2=0, PC=3, FL=0 0 0 0 0 0 0 0 0 0 0 jumpne -2 R0=255 R1=0 R2=0, WR0=0, WR1=0, WR2=0, PC=4, FL=0 0 0 0 0 0 0 0 0 0 0 add r0 1 R0=0 R1=0 R2=0, WR0=0, WR1=0, WR2=0, PC=5, FL=0 0 0 0 0 0 0 0 0 0 0 write 0 r0 R0=0 R1=0 R2=0, WR0=0, WR1=0, WR2=0, PC=6, FL=0 0 0 0 0 0 0 0 0 0 0 exit R0=0 R1=0 R2=0, WR0=0, WR1=0, WR2=0, PC=65535, FL=0 0 0 0 0 0 0 0 0 0 0
You can see that when we reached the value 255 in R0, the comparison set the FL register to 0, so the JUMPNE instruction did not decrement the program counter, and we went on to execute the ADD instruction.
When we added 1 again, R0 seemed to reset back to 0!
What's going on?
Well, it's all to do with the fact that each memory location in our computer is a fixed size called a byte (remember that the x86 MOV to memory instruction referred to a BYTE in the previous section). What's a byte? And what's the maximum number we can represent with a byte? To understand, we need to look at binary arithmetic.
Remember in the "How does a computer work" section we talked about transistors being switches that can be On and Off. It turns out that we can represent integers just using On and Off values.
Representing numbers in different bases
We're all familiar with decimal arithmetic - counting 0…1…2…3…4…5…6…7…8…9… When we run out of the symbols we use for numbers, we then add a second digit 10…11…12…13…14…15…16…17…18…19, then 20…21…22…23…24…25…26…27…28…29 and so on, until we run out of symbols again. Then we add a third digit 100…101…102…103…104… and so on. We start out by counting up digit1 (0-9). Then, when we add a second digit, the number is (ten x digit2) + digit1. When we add a third digit, the number is (ten x ten x digit3) + (ten x digit2) + digit1. And so on.
Here are a couple of examples (like you need examples of the number system you use every day!). But notice specifically that each digit represents an additional power of 10. So we call this "base 10" (or decimal). (Remember that anything to the power 0 is 1.)
|Base 10: 234||2||3||4|
|Base 10: 84||0||8||4|
This is so familiar to us, we've usually forgotten that we were once taught how to interpret numbers like this. (Often, this was done with cubes of wood or plastic in blocks of 1, 10, 100 etc. when we were very tiny.)
Imagine, instead, that we only had symbols up to 7.
We'd count 1…2…3…4…5…6…7… Then, when we ran out of symbols, we would have to add a digit and start counting again 11…12…13…14…15…16…17… and then 20…21…22…23…24…25…26…27… and so on until 100…101…102…103…104…105…106…107…
As with decimal, we start out by counting up digit1 (0-7). Then, when we add a second digit, the number is (eight x digit2) + digit1. When we add a third digit, the number is (eight x eight x digit3) + (eight x digit2) + digit1. And so on.
We call this counting system "base 8" (or octal). As you might expect, each digit represents an increasing power of 8.
Here are some examples of values in base 8
|Base 10: 256 Base 8: 400||4||0||0|
|Base 10: 84 Base 8: 124||1||2||4|
What if we had more symbols for numbers than the 10 (0-9) that we're familiar with? What if we had sixteen, say? We call this "base 16" or hexadecimal (hex for short). Rather than resort to Wingdings or Smiley Faces for the extra symbols, it is conventional to use the letters A-F as the 'numbers' greater than 9.
|Base 10: 256 Base 16: 100||1||0||0|
|Base 10: 84 Base 16: 54||0||5||4|
|Base 10: 255 Base 16: FF||0||F||F|
|Base 10: 12 Base 16: C||0||0||C|
Now, imagine we're really symbol-poor. We've only got 0 and 1 (the "off" and "on" of the transistor-switches in our processor). It works just the same. First, we count units: 0…1 We've run out of symbols, so we add a digit and count on 10…11. Out of symbols again, so add another digit: 100…101…110…111. We run out of symbols really quickly, so we need another digit: 1000…1001…1010…1011…1100…1101…1110…1111 and so on.
Here are some examples in "base 2" (or binary).
|(As power of 2)||2⁷||2⁶||2⁵||2⁴||2³||2²||2¹||2⁰|
|Base 10: 255||1||1||1||1||1||1||1||1|
|Base 10: 84||0||1||0||1||0||1||0||0|
Look at the first example in that table. The decimal (base 10) number 255 is represented by a full set of 1s in each of 8 binary digits.
(Saying "binary digit" is a bit long winded, so we've shortened it in the jargon to the word bit.)
We call a number made up of 8 bits a byte.
We number the bits from the right-most (0) to the left-most (7). Because the right-most represents the smallest power of 2, we call it the least significant bit (LSB). The left-most bit represents the largest power of 2, so we call it the most significant bit (MSB).
|Power of 2||2⁷||2⁶||2⁵||2⁴||2³||2²||2¹||2⁰|
As we've seen, the maximum (decimal) number we can store in a byte is 255 (a '1' in all 8 bits).
|Power of 2||2⁷||2⁶||2⁵||2⁴||2³||2²||2¹||2⁰|
|Base 10: 255||1||1||1||1||1||1||1||1|
The fact that something screwy happens when we exceed this 8-bit maximum value hints that our register R0 is probably 1 byte in size.
But why does something screwy happen?
To understand that, we need to learn about binary addition.
We're so used to doing decimal addition, that we probably don't even think about it. But let's remind ourselves how we add up regular decimal numbers by hand.
We write the numbers to be added in a stack, with each equivalent digit lined up in columns. Then, starting from the right hand column (which, remember, we call the least significant) we add up the total of that column and record it at the bottom. If the total is greater than 9 (the biggest number we can represent in a single column), we "carry" that number into the next column, and include it in the addition for that column.
If a particular number has no value in the column, we treat it as a 0
Here are a couple of examples
054 + 163 ----- 217 carry 1 0099 +0999 ---- 1098 carry 111
We can do exactly the same thing for binary addition. Here's an example of two bytes being added together:
00000101 +00001011 -------- 00010000 carry 1111
Now, let's see what happens when we add 1 to a byte containing the decimal value 255:
11111111 +00000001 -------- 100000000 carry 11111111
But, hang on a minute - that's 9 bits! And a byte can contain only 8 bits. We can't just make up a new '9th' bit.
What happens is that we're left with the least significant 8 bits (i.e. the right-most 8 bits) - and they are binary 00000000 - or 0 in decimal!
And that's why our program wrapped round to 0 when we added 1 to 255.
It's all very well knowing this limitation, but how can we represent larger numbers?
One way is to increase the number of bits in your storage.
It turns out that our computer has some larger registers called
WR2 that can do just that. Each of these registers can store values up to 16 bits (or two bytes) in size. We sometimes call a 16 bit value a short (and on computers with a 16-bit heritage, a word).
Spot test: What's the largest number you can represent with 16 bits?
Hint: 2⁸ = 256
Answer: If the largest decimal you can store in 8 bits is 255 (= 2⁸ - 1), then the largest you can store in 16 bits is (2¹⁶ - 1) = 65535
With n bits, you can store a positive decimal integer up to (2^(n) - 1).
Let's load up a number larger than 255 into one of these 16 bit registers. We'll create a new program to do that.
LOAD WR0 16384 EXIT
If you run that, you should see the following output. 16384 is loaded into the WR0 register.
load wr0 16384 R0=0 R1=0 R2=0, WR0=16384, WR1=0, WR2=0, PC=1, FL=0 0 0 0 0 0 0 0 0 0 0 exit R0=0 R1=0 R2=0, WR0=16384, WR1=0, WR2=0, PC=65535, FL=0 0 0 0 0 0 0 0 0 0 0
But what happens when we write the contents of that register to memory?
LOAD WR0 16384 WRITE 0 WR0 EXIT
This time the output looks like this:
load wr0 16384 R0=0 R1=0 R2=0, WR0=16384, WR1=0, WR2=0, PC=1, FL=0 0 0 0 0 0 0 0 0 0 0 write 0 wr0 R0=0 R1=0 R2=0, WR0=16384, WR1=0, WR2=0, PC=2, FL=0 0 64 0 0 0 0 0 0 0 0 exit R0=0 R1=0 R2=0, WR0=16384, WR1=0, WR2=0, PC=65535, FL=0 0 64 0 0 0 0 0 0 0 0
It didn't write 16384 to the memory location at offset 0, it wrote 64 to the memory location at offset 1?!
Try it with a different value: (16384 + 1) = 16385
LOAD WR0 16385 WRITE 0 WR0 EXIT
This is the output:
load wr0 16385 R0=0 R1=0 R2=0, WR0=16385, WR1=0, WR2=0, PC=1, FL=0 0 0 0 0 0 0 0 0 0 0 write 0 wr0 R0=0 R1=0 R2=0, WR0=16385, WR1=0, WR2=0, PC=2, FL=0 1 64 0 0 0 0 0 0 0 0 exit R0=0 R1=0 R2=0, WR0=16385, WR1=0, WR2=0, PC=65535, FL=0 1 64 0 0 0 0 0 0 0 0
This time, it has stored 1 in the memory at offset 0, and 64 at offset 1.
What will the result be if we if store 16386? Have a guess before you look at the answer.
LOAD WR0 16386 WRITE 0 WR0 EXIT
Here's the answer:
R0=0 R1=0 R2=0, WR0=16385, WR1=0, WR2=0, PC=65535, FL=0 2 64 0 0 0 0 0 0 0 0
Did you guess right? OK - how about 16640? Again, have a guess before you look at the answer. (There's a hint just below if you need it.)
Hint: 16640 = 16384 + 256
LOAD WR0 16640 WRITE 0 WR0 EXIT
And here's the answer again:
R0=0 R1=0 R2=0, WR0=16640, WR1=0, WR2=0, PC=65535, FL=0 0 65 0 0 0 0 0 0 0 0
If you haven't spotted the pattern yet, this should give you an even bigger clue - what happens if we store the number 256?
LOAD WR0 256 WRITE 0 WR0 EXIT R0=0 R1=0 R2=0, WR0=16640, WR1=0, WR2=0, PC=65535, FL=0 0 1 0 0 0 0 0 0 0 0
It is rather like these two memory locations are storing the number in base 256!
We've seen that we can hold the numbers 0-255 in 1 byte, so when we need to store a larger number, we add a second byte, and count in the usual manner (256 * second byte) + (first byte). You could imagine storing a 24 bit number by adding a third byte, and a 32 bit number by adding a fourth byte and so on. (We sometimes call a 32-bit number a dword, from 'double word' and a 64-bit number a qword, from 'quadruple word'.)
|(As power of 256)||256¹||256⁰|
|Base 10: 256||1||0|
|Base 10: 16384||64||0|
As with our Most Significant Bit and Least Significant Bit, we call the byte that stores the largest power of 256 the Most Significant Byte, and the other the Least Significant Byte - or sometimes High Byte and Low Byte.
Ordering the bytes
When we were ordering the bits in our byte, we numbered them from right-to-left, low-to-high. You'll notice that when we store the 16bit number in our 2 8-bit memory locations, we're storing the high byte in the memory location at offset 1, and the low byte in the memory location at offset 0.
|High (most-significant) byte||Low (least-significant) byte|
|Offset in memory||1||0|
|(As power of 256)||256¹||256⁰|
We could equally well have chosen to do that the other way around!
Computers that choose this particular byte ordering are called little endian memory architectures. Our computer does it this way, as does Intel's x86 series. The Z80 and PowerPC are both big endian - they store the high byte in the lower memory offset, and the low byte in the higher memory offset. Some architectures (like the ARM) let the programmer choose which to use.
(Another quick jargon update: when we send data over a network, it is often encoded in a big-endian way, so you sometimes see big-endian ordering called network ordering.)
This is an example of encoding a value into computer memory, in this case, a positive integer. Notice that even with something as simple as a positive integer, we have to think about how it is represented in the computer! We'll see this need for encoding again and again for decimals, text, images and all sorts of data. We need to encode whenever we have to represent some information in a computer. Most of the time, higher-level languages or other people's code will take care of the details for us, but if we forget about it entirely, we open ourselves up to all sorts of seemingly-mysterious behaviour and bugs.
Displaying multi-byte values
Sometimes, displaying numbers in decimal is not the most convenient way to read them - particularly when we are looking at multi-byte values.
Here's the decimal number 65534 represented as two decimal bytes
|Low byte||High byte|
Now, look what happens if we represent it hexadecimal (base 16) instead of decimal.
Here's 65534 in hex:
And here it is represented as two hexadecimal bytes in memory
|Low byte||High byte|
Notice that they are represented using exactly the same digits (barring the byte-ordering). Whereas the decimal values "255 254" look nothing like "65534".
It can be very convenient to get used to reading numbers in base 16 (hex), because each byte is always just a pair of hex digits, and their representation in memory is very similar to the way you would write them on the page.
We've got a handy switch in our program that lets us start to represent the output in hex, instead of decimal.
Just below your program, you'll see the following lines:
(* THIS IS THE START OF THE CODE THAT 'RUNS' THE COMPUTER *) let outputInHex = false
Change this second line to read:
let outputInHex = true
Now, let's update our program to load 65534 in decimal into memory:
LOAD WR0 65534 WRITE 0 WR0 EXIT
And run the program again. The output is now displayed in hex.
load wr0 65534 R0=00 R1=00 R2=00, WR0=FFFE, WR1=0000, WR2=0000, PC=0001, FL=00 00 00 00 00 00 00 00 00 00 00 write 0 wr0 R0=00 R1=00 R2=00, WR0=FFFE, WR1=0000, WR2=0000, PC=0002, FL=00 FE FF 00 00 00 00 00 00 00 00 exit R0=00 R1=00 R2=00, WR0=FFFE, WR1=0000, WR2=0000, PC=FFFF, FL=00 FE FF 00 00 00 00 00 00 00 00
What if we want to write the values in our program in hex? How do we distinguish between the decimal value 99, and the hex value 99 (=153 decimal)? Different languages use different syntax, but our model computer supports two of the most common - both of which use a prefix. You either add the prefix 0x, or use the prefix #.
(You might have seen that # prefix used when specifying colours in HTML mark-up)
LOAD WR0 #FFFE WRITE 0 WR0 EXIT
LOAD WR0 0xFFFE WRITE 0 WR0 EXIT
Try updating the program to use one of these hex representations and run it again.
load wr0 0xFFFE R0=00 R1=00 R2=00, WR0=FFFE, WR1=0000, WR2=0000, PC=0001, FL=00 00 00 00 00 00 00 00 00 00 00 write 0 wr0 R0=00 R1=00 R2=00, WR0=FFFE, WR1=0000, WR2=0000, PC=0002, FL=00 FE FF 00 00 00 00 00 00 00 00 exit R0=00 R1=00 R2=00, WR0=FFFE, WR1=0000, WR2=0000, PC=FFFF, FL=00 FE FF 00 00 00 00 00 00 00 00
If you want to, you can also use a binary notation to specify a number. For that we use the prefix 0b
(This is a less common notation, and is not supported across all languages, but it is quite useful.)
LOAD WR0 0b1111111111111110 WRITE 0 WR0 EXIT
Here's the result of running that version of the program.
load wr0 0b1111111111111110 R0=00 R1=00 R2=00, WR0=FFFE, WR1=0000, WR2=0000, PC=0001, FL=00 00 00 00 00 00 00 00 00 00 00 write 0 wr0 R0=00 R1=00 R2=00, WR0=FFFE, WR1=0000, WR2=0000, PC=0002, FL=00 FE FF 00 00 00 00 00 00 00 00 exit R0=00 R1=00 R2=00, WR0=FFFE, WR1=0000, WR2=0000, PC=FFFF, FL=00 FE FF 00 00 00 00 00 00 00 00
Representing negative numbers
So far, (almost) all of the numbers that we've represented have been the positive integers.
How do we represent negative numbers?
This is much the same question as "how do we do subtraction". Why? You'll probably recall from basic maths that subtracting the number B from the number A is equivalent to adding the negative of B to A. You might have seen that written down in this form:
(A - B) = A + (-B)
So, perhaps we can use the idea of subtraction to help us to represent a negative number?
We've already seen a sort of example of subtraction: what happened when we saw the addition of 8-bit numbers carry beyond an 8-bit value? Let's remind ourselves.
For 8-bit values, if we try to add two numbers such that the result would be greater than 255, we are left with the least significant 8-bits, and we drop the carried-over 9th bit.
Here are a few examples
255 + 1 = 0 (with carry)
255 + 2 = 1 (with carry)
255 + 3 = 2 (with carry)
(You can write a short program to test that, if you like.)
load r0 255 load r1 255 load r2 255 add r0 1 add r1 2 add r2 3 exit
What we know about basic arithmetic tells us that:
255 - 255 = 0
255 - 254 = 1
255 - 253 = 2
If we replace the right-hand-side of the first expression with the left-hand-side of the second expression, we get
255 + 1 = 255 - 255
255 + 2 = 255 - 254
255 + 3 = 255 - 253
This would seem to imply that
1 = (- 255)
2 = (-254)
3 = (-253)
How is this possible?!
Well, we've already answered that question: it is precisely because there is a maximum number we can represent in a fixed number of bits, and the way that larger numbers carry over.
Let's represent those numbers in binary:
00000001 = -(11111111)
00000010 = -(11111110)
00000011 = -(11111101)
A quick rearrangement of the expression shows us that that that is really true!
(remember that the expression a = -b is equivalent to a + b = 0)
00000001 + 11111111 = 00000000 (with carry)
00000010 + 11111110 = 00000000 (with carry)
00000011 + 11111101 = 00000000 (with carry)
In fact, we could write a little program to test that:
load r0 0b00000001 load r1 0b00000010 load r2 0b00000011 add r0 0b11111111 add r1 0b11111110 add r2 0b11111101 exit
So, for every positive number that we can represent in binary, there is a complementary negative number.
We call this representation of a negative number the two's complement representation.
It is fairly easy to calculate - you just take the binary representation of the absolute number (the number without its sign) and flip all of the 0s to 1s, and 1s to 0s (we call this the one's complement of the number), then add 1, to make the two's complement.
Let's have an example of that: what's the two's complement representation of the number -27, in an 8-bit store?
First, we need to represent 27 in binary:
Then we flip all the bits to get the one's complement representation:
Finally, we add 1 to get the two's complement representation:
So, -27 in two's complement binary notation is 11100101
We can test this
In decimal, 60 - 27 = 33
60 in binary is 00111100
-27 in two's complement binary notation is 11100101
So, 60 - 27 in binary is 00111100 + 11100101
That is 00100001 (with a carry), which is 33 in decimal, as required!
Originally, we stressed that we could store positive integers from 0 - 255 in an 8-bit value.
Now, we have added the possibility of storing both positive and negative integers in an 8-bit value, by the use of a two's complement representation.
So, every number has an equivalent two's complement "negative" number. How do we tell the difference between 11100101 when it is meant to be -27 and 11100101 when it is meant to be 229?
The short answer is that you can't - it is up to the programmer to determine when a result might be negative and when it might be positive.
We could specify (by documentation, for example) that a particular value we were storing was unsigned (i.e. a positive integer) or signed (i.e. any integer, positive or negative).
It turns out we can remove the ambiguity in the case of the signed integer by limiting the maximum and minimum values we can represent.
Think about the decimal numbers 1…127.
These can be represented in binary as 0b00000001…0b01111111.
Now, think about the decimal numbers -127…-1.
These can be represented in binary as 0b10000001…0b11111111.
Notice how the positive numbers all have a 0 in the top bit, and the negative numbers all have a 1 in the top bit. We often call this the sign bit.
If we choose to limit the numbers we represent in a single byte to numbers that can be represented in 7 bits, rather than 8, then we have room for this sign bit, and there is no ambiguity about what we mean. But how can we tell whether some memory is signed or not? The short answer is that we can't. The only way to tell is to document how the storage is being used. We're reaching the limits of what we can conveniently express in such a low-level language. We need to move to a higher-level of abstraction, and a richer language, to help us with that.
Copying a value from a smaller representation into a larger representation
There's a little wrinkle that happens when we copy a value from a smaller representation. Here's decimal 1 as a byte
0b00000001. And here's decimal 1 as a 16-bit value: 0b0000000000000001. So far so good.
What about -127? That's 0b10000001 in a byte, but it's 0b1111111110000001 when stored in a word.
Notice that in each case, the sign bit gets extended across the whole of the most-significant byte when you copy from a byte to a word (the zero for the case of a positive number, or the 1 for a negative number.)
This sign extension when you copy between storage sizes is very important: 0b10000001 is -127 decimal when stored in a byte, but naively copy that into a word and it becomes 0b00000000100000001, which is 129 decimal!
Copying a value from a larger representation into a smaller representation
There's an obvious danger with copying numbers from a larger representation (e.g. a word) to a smaller one (e.g. a byte) and that's called truncation.
There's no problem if the number can be fully represented in the number of bits of the target representation e.g. 1 is 0x0001 in 16 bits, and 0x01 in 8 bits - you just lose the high byte. Similarly, -1 is 0xFFFF in 16 bits, and 0xFF in 8 bits - you just lose the high byte again, and it remains correct. But what about 258? That's 0x0102 in 16-bits, but if you lose the high byte, it becomes 0x02 - not the number you were thinking of at all!
Most higher level languages will warn you about possible truncation (or even prevent you from doing it directly). But it is a very common source of bugs in programs.
In this section, we've looked at how we can represent numbers (specifically the integers) in data memory.
We've learned how we can convert the number between different bases - in particular base 10 (decimal), base 16 (hexadecimal) and base 2 (binary).
We then looked at binary arithmetic, and saw what happens when we add binary numbers together.
We investigated the largest value we can store in one of our registers, and determined that it represented an 8-bit value (which we call a byte).
We then looked at what happens when we try to store a number larger than that which a particular register can hold, and introduced the concept of a 16-bit number.
Next, we looked at how a 16-bit (or larger) number can be represented in memory, including the concepts of big-endian and little-endian memory layouts. We also switched our computer over to displaying values using hex instead of decimal, and looked at why we might want to do that.
Finally, we looked at how to store negative numbers in a consistent way, such that we could perform basic addition and subtraction, by using the two's complement representation, and how to constrain the range of numbers we can store, to remove the ambiguity when storing positive and negative numbers.
Next time, we'll look at how we can bring logic to bear on our problems.