Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud.
In this appendix, we look at the Decimal and Fraction types that can be used as replacements for the built-in float type, mitigating its imprecision. The content is put in an appendix as the data science practitioner can live without knowing about it for quite some time. Eventually, when working with financial data, for example, knowing how not to misuse the float type pays off.
Decimal Type

The decimal module in the standard library provides a Decimal type that may be used to represent any real number to a user-defined level of precision: "User-defined" does not mean an infinite or exact precision! The Decimal type merely allows us to work with a number of bits different from the 64 specified for the float type and also to customize the rounding rules and some other settings.
We import the Decimal type and the getcontext() function from the decimal module.
from decimal import Decimal, getcontext
getcontext() shows us how the decimal module is set up. By default, the precision is set to 28 significant digits, which is roughly twice as many as with float objects.
getcontext()
Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[], traps=[InvalidOperation, DivisionByZero, Overflow])
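The context is mutable. As a minimal sketch (the variable names high and low are our own), the localcontext() context manager from the decimal module lets us change the precision temporarily without touching the global context:

```python
from decimal import Decimal, localcontext

# Temporarily raise the precision to 50 significant digits; the
# global context is restored when the with-block ends.
with localcontext() as ctx:
    ctx.prec = 50
    high = Decimal(1) / Decimal(7)

# Outside the block, the default 28 significant digits apply again.
low = Decimal(1) / Decimal(7)

print(high)  # 50 significant digits of 1/7
print(low)   # 28 significant digits of 1/7
```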
The two simplest ways to create a Decimal object are to instantiate it with either an int or a str object consisting of all the significant digits. In the latter case, scientific notation is also possible.
Decimal(42)
Decimal('42')
Decimal("0.1")
Decimal('0.1')
Decimal("1e-3")
Decimal('0.001')
It is not a good idea to create a Decimal from a float object. If we did so, we would create a Decimal object that internally uses extra bits to store the "random" digits that are not stored in the float object in the first place.
Decimal(0.1) # do not do this!
Decimal('0.1000000000000000055511151231257827021181583404541015625')
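If a float has already sneaked into our program, a common workaround (a sketch, not part of the cells above) is to go through str() first, which keeps only the digits Python would display:

```python
from decimal import Decimal

x = 0.1  # carries hidden binary imprecision

good = Decimal(str(x))  # only the displayed digits, i.e., "0.1"
bad = Decimal(x)        # copies all the "random" digits; do not do this!

print(good)  # 0.1
print(bad)
```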
With the Decimal type, the imprecisions in the arithmetic and equality comparisons go away.
Decimal("0.1") + Decimal("0.2")
Decimal('0.3')
Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
True
Decimal numbers preserve the significant digits, even in cases where this is not needed.
Decimal("0.10000") + Decimal("0.20000")
Decimal('0.30000')
Decimal("0.10000") + Decimal("0.20000") == Decimal("0.3")
True
Arithmetic operations between Decimal and int objects work as the latter are inherently precise: The results are new Decimal objects.
21 + Decimal(21)
Decimal('42')
10 * Decimal("4.2")
Decimal('42.0')
Decimal(1) / 10
Decimal('0.1')
To verify the precision, we apply the built-in format() function to the result of the previous code cell and compare it with the same division resulting in a float object.
format(Decimal(1) / 10, ".50f")
'0.10000000000000000000000000000000000000000000000000'
format(1 / 10, ".50f")
'0.10000000000000000555111512312578270211815834045410'
However, mixing Decimal and float objects raises a TypeError: Python prevents us from potentially introducing imprecisions via innocent-looking arithmetic by failing loudly.
1.0 * Decimal(42)
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[16], line 1 ----> 1 1.0 * Decimal(42) TypeError: unsupported operand type(s) for *: 'float' and 'decimal.Decimal'
To preserve the precision for more advanced mathematical functions, Decimal objects come with many methods bound on them. For example, ln() and log10() take the natural and base-10 logarithms, while sqrt() calculates the square root. These methods always return a new Decimal object. We must never use the functions in the math module in the standard library with Decimal objects as they do not preserve the precision.
Decimal(100).log10()
Decimal('2')
Decimal(2).sqrt()
Decimal('1.414213562373095048801688724')
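To see why the math module must be avoided, compare its square root with the Decimal method (a sketch; the variable names are our own):

```python
import math
from decimal import Decimal

# math.sqrt() works internally with float, so the result carries only
# float's roughly 16 significant digits before being wrapped here.
via_math = Decimal(math.sqrt(2))  # do not do this!

# The sqrt() method keeps all 28 significant digits of the context.
via_method = Decimal(2).sqrt()

print(via_math)
print(via_method)
```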
The object returned by the sqrt() method is still limited in precision: This must be so as, for example, √2 is an irrational number that cannot be expressed with absolute precision using any number of bits, even in theory. We see this as raising √2 to the power of 2 results in an imprecise value, as before!
two = Decimal(2).sqrt() ** 2
two
Decimal('1.999999999999999999999999999')
However, the quantize() method allows us to quantize (i.e., "round") a Decimal number at any precision that is smaller than the set precision. It takes the number of decimals to the right of the period from the Decimal argument we pass in and rounds accordingly.

For example, as the overall imprecise value of two still has an internal precision of 28 digits, we can correctly round it to four decimals (i.e., Decimal("0.0000") has four decimals).
two.quantize(Decimal("0.0000"))
Decimal('2.0000')
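quantize() also accepts an explicit rounding mode that overrides the context's ROUND_HALF_EVEN default. A small sketch with a hypothetical price value shows how the modes resolve an exact tie differently:

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

price = Decimal("2.675")  # an exact tie at two decimals

# "School" rounding: ties go away from zero.
print(price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 2.68

# Truncation towards zero, regardless of the dropped digits.
print(price.quantize(Decimal("0.01"), rounding=ROUND_DOWN))     # 2.67
```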
We can never round a Decimal number to a greater precision than before: The InvalidOperation exception tells us so loudly.
two.quantize(Decimal("1e-28"))
--------------------------------------------------------------------------- InvalidOperation Traceback (most recent call last) Cell In[21], line 1 ----> 1 two.quantize(Decimal("1e-28")) InvalidOperation: [<class 'decimal.InvalidOperation'>]
Consequently, with this little workaround, (√2)² == 2 works, even in Python.
two.quantize(Decimal("0.0000")) == 2
True
The downside is that the entire expression is not as pretty as sqrt(2) ** 2 == 2.
(Decimal(2).sqrt() ** 2).quantize(Decimal("0.0000")) == 2
True
nan and positive and negative inf values exist as well, and the same remarks from the discussion of the float type apply.
Decimal("nan")
Decimal('NaN')
Decimal("nan") values never compare equal to anything, not even to themselves.
Decimal("nan") == Decimal("nan")
False
Infinity is larger than any concrete number.
Decimal("inf")
Decimal('Infinity')
Decimal("-inf")
Decimal('-Infinity')
Decimal("inf") + 42
Decimal('Infinity')
Decimal("inf") + 42 == Decimal("inf")
True
As with float objects, we cannot add infinities of different signs: Now, we get a module-specific InvalidOperation exception instead of a nan value. Here, failing loudly is a good thing as it prevents us from working with invalid results.
Decimal("inf") + Decimal("-inf")
--------------------------------------------------------------------------- InvalidOperation Traceback (most recent call last) Cell In[30], line 1 ----> 1 Decimal("inf") + Decimal("-inf") InvalidOperation: [<class 'decimal.InvalidOperation'>]
Decimal("inf") - Decimal("inf")
--------------------------------------------------------------------------- InvalidOperation Traceback (most recent call last) Cell In[31], line 1 ----> 1 Decimal("inf") - Decimal("inf") InvalidOperation: [<class 'decimal.InvalidOperation'>]
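The traps listed in the context above can be switched off. As a sketch (the variable name result is our own), disabling the InvalidOperation trap makes Decimal fail silently with a NaN, just like float does:

```python
from decimal import Decimal, InvalidOperation, localcontext

# Disable the InvalidOperation trap only inside this block.
with localcontext() as ctx:
    ctx.traps[InvalidOperation] = False
    result = Decimal("inf") + Decimal("-inf")

print(result)  # NaN instead of an exception
```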
For more information on the Decimal type, see the tutorial at PYMOTW or the official documentation.
Fraction Type

If the numbers in an application can be expressed as rational numbers (i.e., the set Q), we may model them with the Fraction type from the fractions module in the standard library. As any fraction can always be formulated as the division of one integer by another, Fraction objects are inherently precise, just as int objects on their own. Further, we maintain the precision as long as we do not use them in a mathematical operation that could result in an irrational number (e.g., taking the square root).
We import the Fraction type from the fractions module.
from fractions import Fraction
Among others, there are two simple ways to create a Fraction object: We either instantiate one with two int objects representing the numerator and denominator or with a str object. In the latter case, we have two options again and use either the format "numerator/denominator" (i.e., without any spaces) or the same format as for float and Decimal objects.
Fraction(1, 3) # 1 / 3 with "full" precision
Fraction(1, 3)
Fraction("1/3") # 1 / 3 with "full" precision
Fraction(1, 3)
Fraction("0.3333333333") # 1 / 3 with a precision of 10 significant digits
Fraction(3333333333, 10000000000)
Fraction("3333333333e-10") # scientific notation is also allowed
Fraction(3333333333, 10000000000)
Only the fully reduced (i.e., lowest terms) version is maintained after creation: For example, 3/2 and 6/4 are the same, and both become Fraction(3, 2).
Fraction(3, 2)
Fraction(3, 2)
Fraction(6, 4)
Fraction(3, 2)
We could also cast a Decimal object as a Fraction object: This only makes sense as Decimal objects come with a pre-defined precision.
Fraction(Decimal("0.1"))
Fraction(1, 10)
float objects may syntactically be cast as Fraction objects as well. However, then we create a Fraction object that precisely remembers the float object's imprecision: A bad idea!
Fraction(0.1)
Fraction(3602879701896397, 36028797018963968)
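If a float has already sneaked in, the limit_denominator() method finds the closest Fraction whose denominator does not exceed a given bound (one million by default), which usually recovers the intended ratio:

```python
from fractions import Fraction

# Recover the intended 1/10 from the imprecise float 0.1: among all
# fractions with a denominator up to 1_000_000, 1/10 is the closest.
print(Fraction(0.1).limit_denominator())  # Fraction(1, 10)
```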
Fraction objects follow the arithmetic rules from middle school and may be mixed with int objects without any loss of precision. The result is always a new Fraction object.
Fraction(3, 2) + Fraction(1, 4)
Fraction(7, 4)
Fraction(5, 2) - 2
Fraction(1, 2)
3 * Fraction(1, 3)
Fraction(1, 1)
Fraction(3, 2) * Fraction(2, 3)
Fraction(1, 1)
Fraction and float objects may also be mixed syntactically. However, the results may then exhibit imprecisions again, even if we do not see them at first sight! This is another example of code failing silently.
10.0 * Fraction(1, 100) # do not do this!
0.1
format(10.0 * Fraction(1, 100), ".50f")
'0.10000000000000000555111512312578270211815834045410'
For more examples and discussions, see the tutorial at PYMOTW or the official documentation.