Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud.
In this appendix, we look at the Decimal and Fraction types that can be used as replacements for the built-in float type, mitigating its imprecision. The content is put in an appendix as the data science practitioner can live without knowing about it for quite some time. Eventually, when working with financial data, for example, knowing how not to misuse the float type pays off.
Decimal Type

The decimal module in the standard library provides a Decimal type that may be used to represent any real number to a user-defined level of precision: "User-defined" does not mean an infinite or exact precision! The Decimal type merely allows us to work with a number of bits different from the 64 specified for the float type and also to customize the rounding rules and some other settings.
We import the Decimal type and the getcontext() function from the decimal module.
from decimal import Decimal, getcontext
getcontext() shows us how the decimal module is set up. By default, the precision is set to 28 significant digits, which is roughly twice as many as with float objects.
getcontext()
Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[], traps=[InvalidOperation, DivisionByZero, Overflow])
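The context is mutable. As a minimal sketch (the variable names high and low are our own), the localcontext() context manager from the decimal module lets us change the precision temporarily without touching the global context:

```python
from decimal import Decimal, localcontext

# Temporarily raise the precision to 50 significant digits; the
# global context is restored when the with-block ends.
with localcontext() as ctx:
    ctx.prec = 50
    high = Decimal(1) / Decimal(7)

# Outside the block, the default 28 significant digits apply again.
low = Decimal(1) / Decimal(7)

print(high)  # 50 significant digits of 1/7
print(low)   # 28 significant digits of 1/7
```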
The two simplest ways to create a Decimal object are to instantiate it with either an int or a str object consisting of all the significant digits. In the latter case, scientific notation is also possible.
Decimal(42)
Decimal('42')
Decimal("0.1")
Decimal('0.1')
Decimal("1e-3")
Decimal('0.001')
It is not a good idea to create a Decimal from a float object. If we did so, we would create a Decimal object that internally uses extra bits to store the "random" digits that are not stored in the float object in the first place.
Decimal(0.1) # do not do this!
Decimal('0.1000000000000000055511151231257827021181583404541015625')
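If a float has already sneaked into our program, a common workaround (a sketch, not part of the cells above) is to go through str() first, which keeps only the digits Python would display:

```python
from decimal import Decimal

x = 0.1  # carries hidden binary imprecision

good = Decimal(str(x))  # only the displayed digits, i.e., "0.1"
bad = Decimal(x)        # copies all the "random" digits; do not do this!

print(good)  # 0.1
print(bad)
```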
With the Decimal type, the imprecisions in the arithmetic and equality comparisons go away.
Decimal("0.1") + Decimal("0.2")
Decimal('0.3')
Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
True
Decimal numbers preserve the significant digits, even in cases where this is not needed.
Decimal("0.10000") + Decimal("0.20000")
Decimal('0.30000')
Decimal("0.10000") + Decimal("0.20000") == Decimal("0.3")
True
Arithmetic operations between Decimal and int objects work as the latter are inherently precise: The results are new Decimal objects.
21 + Decimal(21)
Decimal('42')
10 * Decimal("4.2")
Decimal('42.0')
Decimal(1) / 10
Decimal('0.1')
To verify the precision, we apply the built-in format() function to the result of the previous code cell and compare it with the same division resulting in a float object.
format(Decimal(1) / 10, ".50f")
'0.10000000000000000000000000000000000000000000000000'
format(1 / 10, ".50f")
'0.10000000000000000555111512312578270211815834045410'
However, mixing Decimal and float objects raises a TypeError: Python prevents us from potentially introducing imprecisions via innocent-looking arithmetic by failing loudly.
1.0 * Decimal(42)
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[16], line 1 ----> 1 1.0 * Decimal(42) TypeError: unsupported operand type(s) for *: 'float' and 'decimal.Decimal'
To preserve the precision for more advanced mathematical functions, Decimal objects come with many methods bound on them. For example, ln() and log10() take the natural and base-10 logarithms, while sqrt() calculates the square root. These methods always return a new Decimal object. We must never use the functions in the math module in the standard library with Decimal objects as they do not preserve the precision.
Decimal(100).log10()
Decimal('2')
Decimal(2).sqrt()
Decimal('1.414213562373095048801688724')
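To see why the math module must be avoided, compare its square root with the Decimal method (a sketch; the variable names are our own):

```python
import math
from decimal import Decimal

# math.sqrt() works internally with float, so the result carries only
# float's roughly 16 significant digits before being wrapped here.
via_math = Decimal(math.sqrt(2))  # do not do this!

# The sqrt() method keeps all 28 significant digits of the context.
via_method = Decimal(2).sqrt()

print(via_math)
print(via_method)
```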
The object returned by the sqrt() method is still limited in precision: This must be so as, for example, √2 is an irrational number that cannot be expressed with absolute precision using any number of bits, even in theory. We see this as raising √2 to the power of 2 results in an imprecise value, as before!
two = Decimal(2).sqrt() ** 2
two
Decimal('1.999999999999999999999999999')
However, the quantize() method allows us to quantize (i.e., "round") a Decimal number at any precision that is smaller than the set precision. It takes the number of decimals to the right of the period from the Decimal argument we pass in and rounds accordingly.

For example, as the overall imprecise value of two still has an internal precision of 28 digits, we can correctly round it to four decimals (i.e., Decimal("0.0000") has four decimals).
two.quantize(Decimal("0.0000"))
Decimal('2.0000')
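quantize() also accepts an explicit rounding mode that overrides the context's ROUND_HALF_EVEN default. A small sketch with a hypothetical price value shows how the modes resolve an exact tie differently:

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

price = Decimal("2.675")  # an exact tie at two decimals

# "School" rounding: ties go away from zero.
print(price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 2.68

# Truncation towards zero, regardless of the dropped digits.
print(price.quantize(Decimal("0.01"), rounding=ROUND_DOWN))     # 2.67
```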
We can never round a Decimal number to a greater precision than before: The InvalidOperation exception tells us so loudly.
two.quantize(Decimal("1e-28"))
--------------------------------------------------------------------------- InvalidOperation Traceback (most recent call last) Cell In[21], line 1 ----> 1 two.quantize(Decimal("1e-28")) InvalidOperation: [<class 'decimal.InvalidOperation'>]
Consequently, with this little workaround, (√2)² == 2 works, even in Python.
two.quantize(Decimal("0.0000")) == 2
True
The downside is that the entire expression is not as pretty as sqrt(2) ** 2 == 2.
(Decimal(2).sqrt() ** 2).quantize(Decimal("0.0000")) == 2
True
nan and positive and negative inf values exist as well, and the same remarks from the discussion of the float type apply.
Decimal("nan")
Decimal('NaN')
Decimal("nan") values never compare equal to anything, not even to themselves.
Decimal("nan") == Decimal("nan")
False
Infinity is larger than any concrete number.
Decimal("inf")
Decimal('Infinity')
Decimal("-inf")
Decimal('-Infinity')
Decimal("inf") + 42
Decimal('Infinity')
Decimal("inf") + 42 == Decimal("inf")
True
As with float objects, we cannot add infinities of different signs: Now, we get a module-specific InvalidOperation exception instead of a nan value. Here, failing loudly is a good thing as it prevents us from working with invalid results.
Decimal("inf") + Decimal("-inf")
--------------------------------------------------------------------------- InvalidOperation Traceback (most recent call last) Cell In[30], line 1 ----> 1 Decimal("inf") + Decimal("-inf") InvalidOperation: [<class 'decimal.InvalidOperation'>]
Decimal("inf") - Decimal("inf")
--------------------------------------------------------------------------- InvalidOperation Traceback (most recent call last) Cell In[31], line 1 ----> 1 Decimal("inf") - Decimal("inf") InvalidOperation: [<class 'decimal.InvalidOperation'>]
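The traps listed in the context above can be switched off. As a sketch (the variable name result is our own), disabling the InvalidOperation trap makes Decimal fail silently with a NaN, just like float does:

```python
from decimal import Decimal, InvalidOperation, localcontext

# Disable the InvalidOperation trap only inside this block.
with localcontext() as ctx:
    ctx.traps[InvalidOperation] = False
    result = Decimal("inf") + Decimal("-inf")

print(result)  # NaN instead of an exception
```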
For more information on the Decimal type, see the tutorial at PYMOTW or the official documentation.
Fraction Type

If the numbers in an application can be expressed as rational numbers (i.e., the set Q), we may model them with the Fraction type from the fractions module in the standard library. As any fraction can always be formulated as the division of one integer by another, Fraction objects are inherently precise, just as int objects on their own. Further, we maintain the precision as long as we do not use them in a mathematical operation that could result in an irrational number (e.g., taking the square root).
We import the Fraction type from the fractions module.
from fractions import Fraction
Among others, there are two simple ways to create a Fraction object: We either instantiate one with two int objects representing the numerator and denominator or with a str object. In the latter case, we have two options again and use either the format "numerator/denominator" (i.e., without any spaces) or the same format as for float and Decimal objects.
Fraction(1, 3) # 1 / 3 with "full" precision
Fraction(1, 3)
Fraction("1/3") # 1 / 3 with "full" precision
Fraction(1, 3)
Fraction("0.3333333333") # 1 / 3 with a precision of 10 significant digits
Fraction(3333333333, 10000000000)
Fraction("3333333333e-10") # scientific notation is also allowed
Fraction(3333333333, 10000000000)
Only the fully reduced (i.e., lowest terms) version is maintained after creation: For example, 3/2 and 6/4 are the same, and both become Fraction(3, 2).
Fraction(3, 2)
Fraction(3, 2)
Fraction(6, 4)
Fraction(3, 2)
We could also cast a Decimal object as a Fraction object: This only makes sense as Decimal objects come with a pre-defined precision.
Fraction(Decimal("0.1"))
Fraction(1, 10)
float objects may syntactically be cast as Fraction objects as well. However, then we create a Fraction object that precisely remembers the float object's imprecision: A bad idea!
Fraction(0.1)
Fraction(3602879701896397, 36028797018963968)
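If a float has already sneaked in, the limit_denominator() method finds the closest Fraction whose denominator does not exceed a given bound (one million by default), which usually recovers the intended ratio:

```python
from fractions import Fraction

# Recover the intended 1/10 from the imprecise float 0.1: among all
# fractions with a denominator up to 1_000_000, 1/10 is the closest.
print(Fraction(0.1).limit_denominator())  # Fraction(1, 10)
```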
Fraction objects follow the arithmetic rules from middle school and may be mixed with int objects without any loss of precision. The result is always a new Fraction object.
Fraction(3, 2) + Fraction(1, 4)
Fraction(7, 4)
Fraction(5, 2) - 2
Fraction(1, 2)
3 * Fraction(1, 3)
Fraction(1, 1)
Fraction(3, 2) * Fraction(2, 3)
Fraction(1, 1)
Fraction and float objects may also be mixed syntactically. However, the results may then exhibit imprecisions again, even if we do not see them at first sight! This is another example of code failing silently.
10.0 * Fraction(1, 100) # do not do this!
0.1
format(10.0 * Fraction(1, 100), ".50f")
'0.10000000000000000555111512312578270211815834045410'
For more examples and discussions, see the tutorial at PYMOTW or the official documentation.