This notebook showcases methods to extract data from CSVs:
# import required libraries
import csv
import pandas as pd
from pprint import pprint
def print_basic_csv(file_name, delimiter=','):
"""This function extracts and prints csv content from given filename
Details: https://docs.python.org/2/library/csv.html
Args:
file_name (str): file path to be read
delimiter (str): delimiter used in csv. Default is comma (',')
Returns:
None
"""
csv_rows = list()
csv_attr_dict = dict()
csv_reader = None
# read csv
csv_reader = csv.reader(open(file_name, 'r'), delimiter=delimiter)
# iterate and extract data
for row in csv_reader:
print(row)
csv_rows.append(row)
# prepare attribute lists
for col in csv_rows[0]:
csv_attr_dict[col]=list()
# iterate and add data to attribute lists
for row in csv_rows[1:]:
csv_attr_dict['sno'].append(row[0])
csv_attr_dict['fruit'].append(row[1])
csv_attr_dict['color'].append(row[2])
csv_attr_dict['price'].append(row[3])
# print the result
print("\n\n")
print("CSV Attributes::")
pprint(csv_attr_dict)
def print_tabular_data(file_name,delimiter=","):
"""This function extracts and prints tabular csv content from given filename
Details: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
Args:
file_name (str): file path to be read
delimiter (str): delimiter used in csv. Default is comma ('\t')
Returns:
None
"""
df = pd.read_csv(file_name,sep=delimiter)
print(df)
The print_basic_csv() function takes the input file name along with delimiter as input parameters.
print_basic_csv(r'tabular_csv.csv')
['sno', 'fruit', 'color', 'price'] ['1', 'apple', 'red', '110.85'] ['2', 'banana', 'yellow', '50.12'] ['3', 'mango', 'yellow', '70.29'] ['4', 'orange', 'orange', '80.00'] ['5', 'kiwi', 'green', '150.00'] ['6', 'pineapple', 'yellow', '90.00'] ['7', 'guava', 'green', '20.00'] CSV Attributes:: {'color': ['red', 'yellow', 'yellow', 'orange', 'green', 'yellow', 'green'], 'fruit': ['apple', 'banana', 'mango', 'orange', 'kiwi', 'pineapple', 'guava'], 'price': ['110.85', '50.12', '70.29', '80.00', '150.00', '90.00', '20.00'], 'sno': ['1', '2', '3', '4', '5', '6', '7']}
The first output in the above cell shows the data in the csv as-is. The second one is the parsed output showcasing the contents of the csv as key-value pairs
The print_tabular_data() function takes the input file name along with delimiter as input parameters. It uses pandas to do the heavy lifting
print_tabular_data(r'tabular_csv.csv')
sno fruit color price 0 1 apple red 110.85 1 2 banana yellow 50.12 2 3 mango yellow 70.29 3 4 orange orange 80.00 4 5 kiwi green 150.00 5 6 pineapple yellow 90.00 6 7 guava green 20.00
The output in the above cell shows how pandas reads a csv and prepares a tabular dataframe