Built-in functions: https://docs.python.org/3/library/functions.html
Jupyter tutorial: https://www.dataquest.io/blog/jupyter-notebook-tutorial/
Pandas on Python: https://pandas.pydata.org/
Seaborn on Python: https://seaborn.pydata.org/
NumPy on Python: https://numpy.org/install/
Free datasets for machine learning projects: https://www.dataquest.io/blog/free-datasets-for-projects/
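from csv import reader
import pandas as pd
import numpy as np

# Read the raw `artworks.csv` file as a list of rows so the loops below can index each row.
# An explicit encoding avoids a UnicodeDecodeError where the default codec is cp1252 (e.g. Windows).
with open('artworks.csv', encoding='utf-8', newline='') as opened_file:
    moma = list(reader(opened_file))
moma = moma[1:]  # drop the header row

# Preview rows 1-5 of the same file as a DataFrame
pd.read_csv('artworks.csv').iloc[1:6]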
 | Title | Artist | Nationality | BeginDate | EndDate | Gender | Date | Department |
---|---|---|---|---|---|---|---|---|
1 | Duplicate of plate from folio 11 verso (supple... | Pablo Palazuelo | (Spanish) | (1916) | (2007) | (Male) | 1978 | Prints & Illustrated Books |
2 | Tailpiece (page 55) from SAGESSE | Maurice Denis | (French) | (1870) | (1943) | (Male) | 1889-1911 | Prints & Illustrated Books |
3 | Headpiece (page 129) from LIVRET DE FOLASTRIES... | Aristide Maillol | (French) | (1861) | (1944) | (Male) | 1927-1940 | Prints & Illustrated Books |
4 | 97 rue du Bac | Eugène Atget | (French) | (1857) | (1927) | (Male) | 1903 | Photography |
5 | Pictorial ornament (folio 11) from WOODCUTS | Antonio Frasconi | (American) | (1919) | (2013) | (Male) | 1957 | Prints & Illustrated Books |
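The loops in this notebook index each row by position (row[2], row[5], and so on), which works on a list of lists but not on a DataFrame, because iterating over a DataFrame yields its column names as strings. A minimal sketch of loading rows as a list of lists with csv.reader follows. It writes a small temporary file whose single data row is copied from the preview above; the names sample_path and sample_rows are illustrative only and do not appear elsewhere in the notebook.

```python
import csv
import os
import tempfile

# Write a tiny stand-in file (three columns only, not the real MoMA file)
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 encoding="utf-8", newline="") as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["Title", "Artist", "Nationality"])
    writer.writerow(["97 rue du Bac", "Eugène Atget", "(French)"])
    sample_path = tmp.name

# Read it back as a list of lists; the explicit encoding avoids
# depending on the operating system's default codec
with open(sample_path, encoding="utf-8", newline="") as opened_file:
    sample_rows = list(csv.reader(opened_file))
os.remove(sample_path)

# Separate the header from the data rows
header, sample_rows = sample_rows[0], sample_rows[1:]
print(header)
print(sample_rows[0][2])
```

Each element of sample_rows is a plain list, so an assignment such as row[2] = value updates the row in place.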
for row in moma:
    # remove parentheses from the nationality column
    nationality = row[2]
    nationality = nationality.replace("(", "")
    nationality = nationality.replace(")", "")
    row[2] = nationality

    # remove parentheses from the gender column
    gender = row[5]
    gender = gender.replace("(", "")
    gender = gender.replace(")", "")
    row[5] = gender
for row in moma:
    gender = row[5]
    gender = gender.title()
    if gender == "":
        gender = "Gender Unknown/Other"
    row[5] = gender

    nationality = row[2]
    nationality = nationality.title()
    if not nationality:
        nationality = "Nationality Unknown"
    row[2] = nationality
def clean_and_convert(date):
    # check that we don't have an empty string
    if date != "":
        # strip the parentheses, then convert to an integer
        date = date.replace("(", "")
        date = date.replace(")", "")
        date = int(date)
    return date

for row in moma:
    begin_date = row[3]
    end_date = row[4]
    begin_date = clean_and_convert(begin_date)
    end_date = clean_and_convert(end_date)
    row[3] = begin_date
    row[4] = end_date
# Convert the birthdate values
for row in moma:
    birth_date = row[3]
    if birth_date != "":
        birth_date = int(birth_date)
    row[3] = birth_date

# Convert the death date values
for row in moma:
    death_date = row[4]
    if death_date != "":
        death_date = int(death_date)
    row[4] = death_date
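# Convert the artwork date values
for row in moma:
    date = row[6]
    # convert only plain years; ranges such as "1889-1911" would make int() fail
    if isinstance(date, str) and date.isdigit():
        date = int(date)
    row[6] = date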
The ages list built below contains the artist's age at the time each artwork was produced. Because there are many unique ages, we'll record only the decade of the artist's life in which each work was created. For instance, if we calculate that the artist was 24, we'll record that as the artist being in their "20s."
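ages = []
for row in moma:
    birth = row[3]
    date = row[6]
    # both values must be integers; anything else counts as an unknown age
    if isinstance(birth, int) and isinstance(date, int):
        age = date - birth
    else:
        age = 0
    ages.append(age)

final_ages = []
for age in ages:
    # ages of 20 or under are treated as unknown
    if age > 20:
        final_age = age
    else:
        final_age = "Unknown"
    final_ages.append(final_age)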
decades = []
for age in final_ages:
    if age == "Unknown":
        decade = age
    else:
        # drop the last digit and append "0s", e.g. 24 -> "20s"
        decade = str(age)
        decade = decade[:-1]
        decade = decade + "0s"
    decades.append(decade)
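The decade label can also be computed arithmetically rather than by slicing the string. The sketch below uses a hypothetical helper, age_to_decade, which is not part of the original notebook, and assumes each age is either an integer or the string "Unknown":

```python
def age_to_decade(age):
    """Return a label such as "20s" for an integer age; pass "Unknown" through."""
    if age == "Unknown":
        return age
    # Floor-divide by 10 to drop the last digit, then scale back up
    return str((age // 10) * 10) + "s"


# Illustrative values only
sample_ages = [24, 37, "Unknown", 105]
sample_decades = [age_to_decade(age) for age in sample_ages]
print(sample_decades)
```

For integer ages this gives the same labels as the string-slicing approach.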
decade_frequency = {}
for f in decades:
    if f not in decade_frequency:
        decade_frequency[f] = 1
    else:
        decade_frequency[f] += 1

artist_freq = {}
for row in moma:
    artist = row[1]
    if artist not in artist_freq:
        artist_freq[artist] = 1
    else:
        artist_freq[artist] += 1
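The same kind of frequency table can be built with collections.Counter from the standard library. This is an alternative to the explicit if/else counting, not the approach the notebook uses, and the rows below are hypothetical sample data with the artist name at index 1:

```python
from collections import Counter

# Hypothetical sample rows: [title, artist]
sample_rows = [
    ["Title A", "Artist One"],
    ["Title B", "Artist Two"],
    ["Title C", "Artist One"],
]

# Counter tallies each artist name (column index 1) in a single pass
sample_artist_freq = Counter(row[1] for row in sample_rows)
print(sample_artist_freq["Artist One"])
```

A Counter behaves like a dictionary, except that looking up a missing key returns 0 instead of raising a KeyError.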
Inside the function, we'll need to:
- Retrieve the number of artworks by the artist from the artist_freq dictionary.
- Define a template for our output.
- Use str.format() to insert the artist's name and number of artworks into our template.
- Use the print() function to display the output.

The artist_freq dictionary you made in the previous screen will be available to you.
def artist_summary(artist):
    num_artworks = artist_freq[artist]
    template = "There are {num} artworks by {name} in the data set"
    output = template.format(name=artist, num=num_artworks)
    print(output)

artist_summary("Henri Matisse")
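Looking up an artist who is not in the dictionary with artist_freq[artist] raises a KeyError. One way to handle that case, sketched here with a hypothetical function artist_summary_safe and made-up counts, is dict.get() with a default of 0. The function returns the string instead of printing it so the result is easy to check:

```python
# Made-up counts, for illustration only
sample_artist_freq = {"Artist One": 2, "Artist Two": 1}


def artist_summary_safe(artist, freq):
    """Build the summary string; artists missing from freq count as 0 artworks."""
    num_artworks = freq.get(artist, 0)
    template = "There are {num} artworks by {name} in the data set"
    return template.format(name=artist, num=num_artworks)


print(artist_summary_safe("Artist One", sample_artist_freq))
print(artist_summary_safe("Artist Three", sample_artist_freq))
```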
Create a template string that will insert the country name and population. The country population should be formatted with two decimal places and a comma as the thousands separator.
Use a for loop to iterate over the pop_millions list of lists and in each iteration:
- Assign the country name and population to two variables.
- Use str.format() to insert the two variables into your template string.
- Use the print() function to display the result of your str.format() call.
pop_millions = [
    ["China", 1379.302771],
    ["India", 1281.935991],
    ["USA", 326.625791],
    ["Indonesia", 260.580739],
    ["Brazil", 207.353391],
]

template = "The population of {} is {:,.2f} million"
for country in pop_millions:
    name = country[0]
    pop = country[1]
    output = template.format(name, pop)
    print(output)
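The format specification {:,.2f} combines two options: the comma adds a thousands separator, and .2f rounds to two decimal places. A short sketch of each option on its own, using the China figure from pop_millions (the variable names are illustrative):

```python
population = 1379.302771  # China, in millions (from pop_millions)

two_places = "{:.2f}".format(population)   # two decimal places only
with_comma = "{:,.2f}".format(population)  # two decimal places plus a comma separator
big_integer = "{:,}".format(1379302771)    # comma separator on an integer

print(two_places)   # 1379.30
print(with_comma)   # 1,379.30
print(big_integer)  # 1,379,302,771
```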
You'll create a frequency table dictionary containing counts of the values in the Gender column. You'll then loop over the dictionary and use str.format() to print a formatted line of output summarizing each key-value pair. Each line of output follows the template There are {n:,} artworks by {g} artists, where n is the count and g is the gender value.
gender_freq = {}
for row in moma:
    gender = row[5]
    if gender not in gender_freq:
        gender_freq[gender] = 1
    else:
        gender_freq[gender] += 1

for gender, num in gender_freq.items():
    template = "There are {n:,} artworks by {g} artists"
    print(template.format(g=gender, n=num))