Arithmetic coding (method of compression)

Suppose we have a message that only contains the characters A, B, and C, 
with the following frequencies, expressed as fractions:

   A:  0.5
   B:  0.2
   C:  0.3

To show how arithmetic compression works, we first set up a table, 
listing characters with their probabilities along with 
the cumulative sum of those probabilities. The cumulative sum 
defines "intervals", ranging from the bottom value to less than,
but not equal to, the top value. The order in which characters are 
listed in the table does not seem to be important, except
to the extent that both the coder and decoder have to know 
what the order is.

   letter   probability   interval
   ______   ___________   _________

   C:       0.3           0.0 : 0.3
   B:       0.2           0.3 : 0.5
   A:       0.5           0.5 : 1.0
   ______   ___________   _________

Now each character can be coded by the shortest binary fraction 
whose value falls in the character's probability interval:

   letter   probability   interval    binary fraction
   ______   ___________   _________   ____________________

   C:       0.3           0.0 : 0.3   0        
   B:       0.2           0.3 : 0.5   0.011 = 3/8 = 0.375
   A:       0.5           0.5 : 1.0   0.1    = 1/2 = 0.5
   ______   ___________   _________   ____________________

This shows how single characters can be assigned 
minimum-length binary codes. However, arithmetic coding 
doesn't stop there and simply translate the individual 
characters in a message as these binary codes. 
It takes a subtler approach, assigning binary 
fractions to complete messages.

To start, let's consider sending messages consisting 
of all possible permutations of two of these three 
characters. We determine the probability of 
the two-character strings by multiplying the probabilities 
of the two characters, and then set up a series of intervals 
using those probabilities.

 

  string   probability   interval       binary fraction

   ______   ___________   ____________   _______________________

   CC:      0.09          0.00 : 0.09    0.0001 = 1/16  = 0.0625
   CB:      0.06          0.09 : 0.15    0.001   = 1/8   = 0.125
   CA:      0.15          0.15 : 0.30    0.01     = 1/4   = 0.25
   BC:      0.06          0.30 : 0.36    0.0101 = 5/16  = 0.3125
   BB:      0.04          0.36 : 0.40    0.011   = 3/8   = 0.375
   BA:      0.10          0.40 : 0.50    0.0111 = 7/16  = 0.4375
   AC:      0.15          0.50 : 0.65    0.1      = 1/2    = 0.5
   AB:      0.10          0.65 : 0.75    0.1011 = 11/16 = 0.6875
   AA:      0.25          0.75 : 1.00    0.11    = 3/4     = 0.75
   ______   ___________   ____________   _______________________

The higher the probability of the string, in general the shorter 
the binary fraction needed to represent it.

Let's build a similar table for three characters now:

 

  string   probability   interval        binary fraction

______   ___________   _____________   _______________________________

   CCC      0.027         0.000 : 0.027   0.000001   =  1/64   = 0.015625
   CCB      0.018         0.027 : 0.045   0.00001    =  1/32   = 0.03125
   CCA      0.045         0.045 : 0.090   0.0001     =  1/16   = 0.0625
   CBC      0.018         0.090 : 0.108   0.00011    =  3/32   = 0.09375
   CBB      0.012         0.108 : 0.120   0.000111   =  7/64   = 0.109375
   CBA      0.03          0.120 : 0.150   0.001      =  1/8    = 0.125
   CAC      0.045         0.150 : 0.195   0.0011     =  3/16   = 0.1875
   CAB      0.03          0.195 : 0.225   0.00111    =  7/32   = 0.21875
   CAA      0.075         0.225 : 0.300   0.01       =  1/4    = 0.25

   BCC      0.018         0.300 : 0.318   0.0101     =  5/16   = 0.3125
   BCB      0.012         0.318 : 0.330   0.010101   =  21/64  = 0.328125
   BCA      0.03          0.330 : 0.360   0.01011    =  11/32  = 0.34375
   BBC      0.012         0.360 : 0.372   0.0101111  =  47/128 = 0.3671875
   BBB      0.008         0.372 : 0.380   0.011      =  3/8    = 0.375
   BBA      0.02          0.380 : 0.400   0.011001   =  25/64  = 0.390625
   BAC      0.03          0.400 : 0.430   0.01101    =  13/32  = 0.40625
   BAB      0.02          0.430 : 0.450   0.0111     =  7/16   = 0.4375
   BAA      0.05          0.450 : 0.500   0.01111    =  15/32  = 0.46875

   ACC      0.045         0.500 : 0.545   0.1        =  1/2    = 0.5
   ACB      0.03          0.545 : 0.575   0.1001     =  9/16   = 0.5625
   ACA      0.075         0.575 : 0.650   0.101      =  5/8    = 0.625
   ABC      0.03          0.650 : 0.680   0.10101    =  21/32  = 0.65625
   ABB      0.02          0.680 : 0.700   0.1011     =  11/16  = 0.6875
   ABA      0.05          0.700 : 0.750   0.10111    =  23/32  = 0.71875
   AAC      0.075         0.750 : 0.825   0.11       =  3/4    = 0.75
   AAB      0.05          0.825 : 0.875   0.11011    =  27/32  = 0.84375
   AAA      0.125         0.875 : 1.000   0.111      =  7/8    = 0.875
   ______   ___________   _____________   _______________________________


* Let's stop here and send one of the binary strings defined        (DECODING !)
in the table above to a decoder.
We'll arbitrarily select the binary string with the decimal value 
of 0.21875 from the table above.

This value was obtained using the probability values and intervals 
defined earlier:

   string   probability   interval 
   ______   ___________   _________

   C:       0.3           0.0 : 0.3
   B:       0.2           0.3 : 0.5
   A:       0.5           0.5 : 1.0
   ______   ___________   _________

The value 0.21875 clearly falls into the interval for "C", 
so "C" must be the first character. 
We can then "zoom in" on the characters that follow 
the "C" by subtracting the bottom 
value of the interval for "C", which happens 
to be 0, and dividing the result by the width 
of the probability interval for "C", which is 0.3:

   (0.21875 - 0) / 0.3  =  0.72917

This is a simple shift and scaling operation.

The result falls into the probability interval for "A", 
and so the second character must be "A". We can then 
zoom in on the next character by the same approach as 
before, subtracting the bottom value of the interval 
for "A", which is 0.5, and dividing the result
by the width of the probability interval for "A", 
which is also 0.5:

   (0.72917 - 0.5) / 0.5  = 0.4583

This clearly falls into the probability interval for "B", 
and so the string has been correctly
uncompressed to "CAB", which is the correct answer. 




    Poniżej przedstawiono schemat programu
realizującego kodowanie arytmetyczna (tzn. realizującego
kompresję tą metodą).


Set low to 0.0
Set high to 1.0
While there are still input symbols do
    get an input symbol
    range = high - low.
    high = low + range*high_range(symbol)
    low = low + range*low_range(symbol)
End of While
output low