Handling Comma Separated Value Files

This notebook showcases methods to extract data from CSVs:

  • csv containing delimiter separated values
  • csv containing tabular data
In [1]:
# import required libraries
import csv
import pandas as pd
from pprint import pprint

Utilities

In [2]:
def print_basic_csv(file_name, delimiter=','):
    """This function extracts and prints csv content from given filename
       Details: https://docs.python.org/2/library/csv.html
    Args:
        file_name (str): file path to be read
        delimiter (str): delimiter used in csv. Default is comma (',')

    Returns:
        None

    """
    csv_rows = list()
    csv_attr_dict = dict()
    csv_reader = None

    # read csv
    csv_reader = csv.reader(open(file_name, 'r'), delimiter=delimiter)
        
    # iterate and extract data    
    for row in csv_reader:
        print(row)
        csv_rows.append(row)
    
    # prepare attribute lists
    for col in csv_rows[0]:
        csv_attr_dict[col]=list()
    
    # iterate and add data to attribute lists
    for row in csv_rows[1:]:
        csv_attr_dict['sno'].append(row[0])
        csv_attr_dict['fruit'].append(row[1])
        csv_attr_dict['color'].append(row[2])
        csv_attr_dict['price'].append(row[3])
    
    # print the result
    print("\n\n")
    print("CSV Attributes::")
    pprint(csv_attr_dict)
            


def print_tabular_data(file_name,delimiter=","):
    """This function extracts and prints tabular csv content from given filename
       Details: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
    Args:
        file_name (str): file path to be read
        delimiter (str): delimiter used in csv. Default is comma ('\t')

    Returns:
        None

    """
    df = pd.read_csv(file_name,sep=delimiter)
    print(df)

Parse using CSV module

The print_basic_csv() function takes the input file name along with delimiter as input parameters.

In [3]:
print_basic_csv(r'tabular_csv.csv')
['sno', 'fruit', 'color', 'price']
['1', 'apple', 'red', '110.85']
['2', 'banana', 'yellow', '50.12']
['3', 'mango', 'yellow', '70.29']
['4', 'orange', 'orange', '80.00']
['5', 'kiwi', 'green', '150.00']
['6', 'pineapple', 'yellow', '90.00']
['7', 'guava', 'green', '20.00']



CSV Attributes::
{'color': ['red', 'yellow', 'yellow', 'orange', 'green', 'yellow', 'green'],
 'fruit': ['apple', 'banana', 'mango', 'orange', 'kiwi', 'pineapple', 'guava'],
 'price': ['110.85', '50.12', '70.29', '80.00', '150.00', '90.00', '20.00'],
 'sno': ['1', '2', '3', '4', '5', '6', '7']}

The first output in the above cell shows the data in the csv as-is. The second one is the parsed output showcasing the contents of the csv as key-value pairs


Parse using pandas

The print_tabular_data() function takes the input file name along with delimiter as input parameters. It uses pandas to do the heavy lifting

In [4]:
print_tabular_data(r'tabular_csv.csv')
   sno      fruit   color   price
0    1      apple     red  110.85
1    2     banana  yellow   50.12
2    3      mango  yellow   70.29
3    4     orange  orange   80.00
4    5       kiwi   green  150.00
5    6  pineapple  yellow   90.00
6    7      guava   green   20.00

The output in the above cell shows how pandas reads a csv and prepares a tabular dataframe