Recommended Reading:
Please complete this notebook by filling in the cells provided. Before you begin, execute the following cell to load some necessary information for the assignment.
Homework 2 is due Thursday, 1/30 at 11:59pm. Start early so that you can come to office hours if you're stuck. Check the website for the office hours schedule. Late work will not be accepted as per the policies of this course.
# Don't change this cell; just run it.
import numpy as np
from datascience import *
Question 1.1. Make an array called weird_numbers containing the following numbers (in the given order):
Hint: sin and cos are functions in the math module.
# Our solution involved one extra line of code before creating
# weird_numbers.
...
weird_numbers = ...
weird_numbers
Question 1.2. Make an array called book_title_words containing the following three strings: "Eats", "Shoots", and "and Leaves".
book_title_words = ...
book_title_words
Strings have a method called join. join takes one argument, an array of strings. It returns a single string. Specifically, the value of a_string.join(an_array) is a single string that's the concatenation ("putting together") of all the strings in an_array, except a_string is inserted in between each string.
Question 1.3. Use the array book_title_words and the method join to make two strings:
1. "Eats, Shoots, and Leaves" (call this one with_commas)
2. "Eats Shoots and Leaves" (call this one without_commas)
Hint: If you're not sure what join does, first try just calling, for example, "foo".join(book_title_words).
with_commas = ...
without_commas = ...
# These lines are provided just to print out your answers.
print('with_commas:', with_commas)
print('without_commas:', without_commas)
These exercises give you practice accessing individual elements of arrays. In Python (and in many programming languages), elements are accessed by index, so the first element is the element at index 0.
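To make zero-based indexing concrete, the sketch below uses a NumPy array with made-up values (the name example_values is hypothetical) and the array method item, which returns the element at a given index.

```python
import numpy as np

# Hypothetical data, unrelated to the arrays used in the questions.
example_values = np.array([10, 20, 30, 40])

first = example_values.item(0)   # index 0 holds the first element
second = example_values.item(1)  # index 1 holds the second element
print(first, second)  # 10 20
```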
Question 2.1. The cell below creates an array of some numbers. Set third_element to the third element of some_numbers.
some_numbers = make_array(-1, -3, -6, -10, -15)
third_element = ...
third_element
Question 2.2. The next cell creates a table that displays some information about the elements of some_numbers and their order. Run the cell to see the partially completed table, then fill in the missing information in the cell (the strings that are currently "???") to complete the table.
elements_of_some_numbers = Table().with_columns(
    "English name for position", make_array("first", "second", "???", "???", "fifth"),
    "Index", make_array("???", "1", "2", "???", "4"),
    "Element", some_numbers)
elements_of_some_numbers
Question 2.3. You'll sometimes want to find the last element of an array. Suppose an array has 142 elements. What is the index of its last element?
index_of_last_element = ...
More often, you don't know the number of elements in an array (its length). (For example, it might be a large dataset you found on the Internet.) The function len takes a single argument, an array, and returns the length of that array (an integer).
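As a rough sketch of how len relates to indices, the example below uses a made-up NumPy array (example_values is a hypothetical name). Because indexing starts at 0, the largest valid index is one less than the length.

```python
import numpy as np

# Hypothetical data, unrelated to the arrays used in the questions.
example_values = np.array([7, 8, 9, 10, 11])

n = len(example_values)            # number of elements in the array
last = example_values.item(n - 1)  # the largest valid index is n - 1
print(n, last)  # 5 11
```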
Question 2.4. The cell below loads an array called president_birth_years. The last element in that array is the most recent birth year of any deceased president as of 2017. Assign that year to most_recent_birth_year.
president_birth_years = Table.read_table("president_births.csv").column('Birth Year')
most_recent_birth_year = ...
most_recent_birth_year
Question 2.5. Finally, assign sum_of_birth_years to the sum of the first, tenth, and last birth year in president_birth_years.
sum_of_birth_years = ...
Question 3.1. Multiply the numbers 42, 4224, 42422424, and -250 by 157. For this question, don't use arrays.
first_product = ...
second_product = ...
third_product = ...
fourth_product = ...
print(first_product, second_product, third_product, fourth_product)
Question 3.2. Now, do the same calculation, but using an array called numbers and only a single multiplication (*) operator. Store the 4 results in an array named products.
numbers = ...
products = ...
products
Question 3.3. Oops, we made a typo! Instead of 157, we wanted to multiply each number by 1577. Compute the fixed products in the cell below using array arithmetic. Notice that your job is really easy if you previously defined an array containing the 4 numbers.
fixed_products = ...
fixed_products
Question 3.4. We've loaded an array of temperatures in the next cell. Each number is the highest temperature observed on a day at a climate observation station, mostly from the US. Since they're from the US government agency NOAA, all the temperatures are in Fahrenheit. Convert them all to Celsius by first subtracting 32 from them, then multiplying the results by $\frac{5}{9}$. Make sure to ROUND each result to the nearest integer using the np.round function.
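The conversion formula can be checked on a few familiar values. The sketch below uses a small made-up array (the names fahrenheit, celsius, and rounded are hypothetical) to show that arithmetic on a NumPy array applies to every element at once, and that np.round does the same.

```python
import numpy as np

# Hypothetical readings: freezing point, a mild day, boiling point.
fahrenheit = np.array([32, 50, 212])

# Subtraction and multiplication are applied to each element.
celsius = (fahrenheit - 32) * (5 / 9)

# np.round rounds each element to the nearest integer (result stays a float array).
rounded = np.round(celsius)
print(rounded.tolist())  # [0.0, 10.0, 100.0]
```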
max_temperatures = Table.read_table("temperatures.csv").column("Daily Max Temperature")
celsius_max_temperatures = ...
celsius_max_temperatures
Question 3.5. The cell below loads all the lowest temperatures from each day (in Fahrenheit). Compute the size of the daily temperature range for each day. That is, compute the difference between each daily maximum temperature and the corresponding daily minimum temperature. Give your answer in Celsius! Make sure NOT to round your answer for this question!
min_temperatures = Table.read_table("temperatures.csv").column("Daily Min Temperature")
celsius_temperature_ranges = ...
celsius_temperature_ranges
The cell below loads a table of estimates of the world population for different years, starting in 1950. The estimates come from the US Census Bureau website.
world = Table.read_table("world_population.csv").select('Year', 'Population')
world.show(4)
The name population is assigned to an array of population estimates.
population = world.column(1)
population
In this question, you will apply some built-in Numpy functions to this array.
The difference function np.diff subtracts from each element in an array the element that precedes it. As a result, the length of the array np.diff returns will always be one less than the length of the input array.
The cumulative sum function np.cumsum outputs an array of partial sums. For example, the third element in the output array corresponds to the sum of the first, second, and third elements.
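To see the two functions side by side, here is a minimal sketch on a short made-up array (the names squares, differences, and partial_sums are hypothetical and unrelated to the population data).

```python
import numpy as np

# Hypothetical input with four elements.
squares = np.array([1, 4, 9, 16])

# Each element minus the one before it: [4-1, 9-4, 16-9].
differences = np.diff(squares)

# Running totals: [1, 1+4, 1+4+9, 1+4+9+16].
partial_sums = np.cumsum(squares)

print(differences.tolist())   # [3, 5, 7]
print(partial_sums.tolist())  # [1, 5, 14, 30]
```

The output of np.diff has three elements for a four-element input, while np.cumsum keeps the input length.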
Question 4.1. Very often in data science, we are interested in understanding how values change with time. Use np.diff and np.max (or just max) to calculate the largest annual change in population between any two consecutive years.
largest_population_change = ...
largest_population_change
Question 4.2. Describe in words the result of the following expression. What do the values in the resulting array represent (choose one)?
np.cumsum(np.diff(population))
1. The total population change between consecutive years, starting at 1951.
2. The total population change between 1950 and each later year, starting at 1951.
3. The total population change between 1950 and each later year, starting inclusively at 1950 (with a total change of 0).
# Assign cumulative_sum_answer to 1, 2, or 3
cumulative_sum_answer = ...
Old Faithful is a geyser in Yellowstone that erupts every 44 to 125 minutes (according to Wikipedia). People are often told that the geyser erupts every hour, but in fact the waiting time between eruptions is more variable. Let's take a look.
Question 5.1. The first line below assigns waiting_times to an array of 272 consecutive waiting times between eruptions, taken from a classic 1938 dataset. Assign the names shortest, longest, and average so that the print statement is correct.
waiting_times = Table.read_table('old_faithful.csv').column('waiting')
shortest = ...
longest = ...
average = ...
print("Old Faithful erupts every", shortest, "to", longest, "minutes and every", average, "minutes on average.")
Question 5.2. Assign biggest_decrease to the biggest decrease in waiting time between two consecutive eruptions. For example, the third eruption occurred after 74 minutes and the fourth after 62 minutes, so the decrease in waiting time was 74 - 62 = 12 minutes.
Hint: You'll need an array arithmetic function mentioned in the textbook.
Hint 2: The function you use may report positive or negative values. You will have to determine whether the biggest decrease corresponds to the highest or the lowest value. Ultimately, we want the absolute value of the biggest decrease, so if it is a negative number, make it positive.
biggest_decrease = ...
biggest_decrease
Question 5.3. If you expected Old Faithful to erupt every hour, you would expect to wait a total of 60 * k minutes to see k eruptions. Set difference_from_expected to an array with 272 elements, where the element at index i is the absolute difference between the expected and actual total amount of waiting time to see the first i+1 eruptions. Hint: You'll need to compare a cumulative sum to a range.
For example, since the first three waiting times are 79, 54, and 74, the total waiting time for 3 eruptions is 79 + 54 + 74 = 207. The expected waiting time for 3 eruptions is 60 * 3 = 180. Therefore, difference_from_expected.item(2) should be $|207 - 180| = 27$.
difference_from_expected = ...
difference_from_expected
Question 5.4. If instead you guess that each waiting time will be the same as the previous waiting time, by how many minutes would your guess differ from the actual time, on average over every waiting time except the first one?
For example, since the first three waiting times are 79, 54, and 74, the average difference between your guess and the actual time for just the second and third eruption would be $\frac{|79-54|+ |54-74|}{2} = 22.5$.
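The arithmetic in this worked example can be verified with plain Python. The sketch below hard-codes the three waiting times quoted above (the variable names are hypothetical) and does not touch the full waiting_times array.

```python
# The first three waiting times quoted in the example above.
first, second, third = 79, 54, 74

# Each guess is the previous waiting time, so the errors are the
# absolute differences between consecutive waiting times.
errors = [abs(first - second), abs(second - third)]  # [25, 20]

mean_error = sum(errors) / len(errors)
print(mean_error)  # 22.5
```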
average_error = ...
average_error
Question 6.1. Suppose you have 4 apples, 3 oranges, and 3 pineapples. (Perhaps you're using Python to solve a high school algebra problem.) Create a table that contains this information. It should have two columns: "fruit name" and "count". Give it the name fruits.
Note: Use lower-case and singular words for the name of each fruit, like "apple".
# Our solution uses 1 statement split over 3 lines.
fruits = ...
...
...
fruits
Question 6.2. The file inventory.csv contains information about the inventory at a fruit stand. Each row represents the contents of one box of fruit. Load it as a table named inventory.
inventory = ...
inventory
Question 6.3. Does each box at the fruit stand contain a different fruit?
# Set all_different to "Yes" if each box contains a different fruit or
# to "No" if multiple boxes contain the same fruit
all_different = ...
all_different
Question 6.4. The file sales.csv contains the number of fruits sold from each box last Saturday. It has an extra column called "price per fruit ($)" that gives the price per item of fruit in that box. The rows are in the same order as the inventory table. Load these data into a table called sales.
sales = ...
sales
Question 6.5. How many fruits did the store sell in total on that day?
total_fruits_sold = ...
total_fruits_sold
Question 6.6. What was the store's total revenue (the total price of all fruits sold) on that day?
Hint: If you're stuck, think first about how you would compute the total revenue from just the grape sales.
total_revenue = ...
total_revenue
Question 6.7. Make a new table called remaining_inventory. It should have the same rows and columns as inventory, except that the amount of fruit sold from each box should be subtracted from that box's count, so that the "count" is the amount of fruit remaining after Saturday.
remaining_inventory = ...
...
...
...
remaining_inventory
Once you're finished, submit your assignment as a .ipynb (Jupyter Notebook) and .pdf (download as .html, then print to save as a .pdf) on the class Canvas site.