# 3076 Assignment 1: Desigining a Float8 type¶

Please complete the following assignment in Jupyter, and email the completed .ipynb file to [email protected] by midnight on 14 April 2016.

You are strongly encouraged to work out the problems on your own. If you use existing solutions, existing code, help online or collaborate with other students, you will still receive full credit provided that you make this clear and cite your sources. If you collaborate with other students, please submit individual solutions and be specific in giving credit for what each student contributed.

In this assignment you will construct a Float8 type, containing one field data, an 8-bit UInt8:

In [6]:
import Base: show, bits, exponent, significand, +, -, *, /, Float64

type Float8
data::UInt8
end


For this types, we will interpret the 8-bits in data as: 1 sign bit, 3 exponent bits and 4 signficiand bits. That is, numbers will be represented by

$$x = \pm 2^{q-S}*(1.b_1b_2b_3b_4)_2$$

where $S = 3$, $q$ is an unsigned 3-bit integer in the 2nd through 4th bits and $b_1b_2b_3b_4$ are in the last 4 bits.

We will not implement Inf, NaN or subnormal numbers. We will however implement both $±0$.

We will use a globally defined constant S to represent $S$:

In [8]:
const S=UInt8(3)  # The shift

Out[8]:
0x03

We will use the bits to get access to the bits. The following function defines bits for a Float8:

In [11]:
function bits(x::Float8)
bits(x.data)
end

Out[11]:
bits (generic function with 6 methods)

Exercise 1

(a) Complete the following exponent function, that returns $q-S$ (as an integer).

(b) Check that

exponent(Float8(UInt8(123)))


returns 4.

In [18]:
function exponent(x::Float8)
## TODO: interpret the bits and return an integer
end

Out[18]:
exponent (generic function with 6 methods)

Exercise 2

(a) Complete the following significand function, that returns the significand of a Float8. Don't forget to incorporate the sign bit.

(c) Check that

significand(Float8(UInt8(123)))


returns 1.6875.

In [28]:
function significand(x::Float8)
if x.data==0
0.0
elseif x.data==128
-0.0
else
bts=bits(x)
#TODO: convert to 8-bits in bts to a Floating point
end
end

Out[28]:
significand (generic function with 6 methods)

Excercise 3

(a) Use exponent and significand to complete the definition of Float64(x::Float8), which converts a Float8 to a Float64

(b) Check that

Float8(UInt8(123))


now displays as 27.0f8.

In [35]:
function Float64(x::Float8)
#TODO: return a Float8 that equals the input
end

function show(io::IO,x::Float8)
print(io,Float64(x))
print(io,"f8")
end

Out[35]:
show (generic function with 106 methods)

Exercise 4

(a) Complete the following chop_to_8_bits function that returns a string for normal numbers containing the 8-bits for the Float8 representation. For this question, you can simply chop the significand bits of a Float64. (Recall that a Float64 has 1 sign bit, 11 exponent bits and 52 significand bits.)

(b) Add comments explaining the definition of Float8(::Float64).

(c) Check that

Float8(1.25)


returns 1.25f8.

(d) Explain why

Float8(1.3)


returns the same number.

In [47]:
function chop_to_8_bits(x::Float64)
#TODO: returns a string containing 8-bits for the Float8 approximation to x
end

function Float8(x::Float64)
if x===0.0
Float8(UInt8(0))
elseif x===-0.0
Float8(UInt8(128))
else
Float8(parse(UInt8,chop_to_8_bits(x),2))
end
end

Out[47]:
Float8

Exercise 5

Complete the following function that negates a Float8:

In [ ]:
function -(x::Float8)
#TODO: return the bits corresponding to -x
end


Exercise 6

(a) Complete the following algebra operations, ensuring that each one returns a Float8. You can use Float64(x::Float8) and Float8(x::Float64) to use the inbuilt Float64 arithmetic.

(b) Check that

Float8(1.25)+Float8(2.25)


returns 3.5f8

In [51]:
function +(x::Float8,y::Float8)
#TODO: return x+y as Float8
end

function *(x::Float8,y::Float8)
#TODO: return x*y as Float8
end

function /(x::Float8,y::Float8)
#TODO: return x/y as Float8
end

function -(x::Float8,y::Float8)
#TODO: return x-y as Float8
end

Out[51]:
- (generic function with 205 methods)

Exercise 7

(a) Implement the following routine round_to_8bits that rounds to the nearest Float8, rather than chops.

(b) Check that

Float8(parse(UInt8,round_to_8_bits(1.3),2))


returns 1.3125f8.

In [61]:
function round_to_8_bits(x::Float64)
# TODO: Round a Float64 to a Float8
end

Out[61]:
round_to_8_bits (generic function with 1 method)