Sebastian Raschka 05/09/2014

Fixing CSV files

We have a directory ../CSV_files_raw/ containing CSV files, some with tab-separated and some with comma-separated columns.
Here, we will 'fix' them, i.e., make them all comma-separated, and save the results to a new directory ../CSV_fixed.

First, we create a dictionary with the file basenames as keys. Each value is a list holding the path to the raw CSV file and the path to the new, fixed CSV file, e.g.,

{
'abc.csv': ['../CSV_files_raw/abc.csv', '../CSV_fixed/abc.csv'], 
'def.csv': ['../CSV_files_raw/def.csv', '../CSV_fixed/def.csv'], 
...
}
In [8]:
import os

raw_dir = '../CSV_files_raw/'
fixed_dir = '../CSV_fixed/'

# create the output directory if it does not exist yet
if not os.path.exists(fixed_dir):
    os.mkdir(fixed_dir)

# map each CSV basename to [path of raw file, path of fixed file]
f_dict = {f: [os.path.join(raw_dir, f), os.path.join(fixed_dir, f)]
          for f in os.listdir(raw_dir) if f.endswith('.csv')}

Now, we can write the new files, replacing the tabs with commas. Lines that are already comma-separated contain no tabs and are written out unchanged:

In [11]:
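for raw_path, fixed_path in f_dict.values():
    with open(raw_path, 'r') as raw, open(fixed_path, 'w') as fixed:
        for line in raw:
            # remove only the line ending; strip() would also drop
            # leading/trailing tabs, i.e., empty first or last columns
            fields = line.rstrip('\r\n').split('\t')
            fixed.write(','.join(fields) + '\n')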
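One caveat of the plain string replacement above: if a tab-separated file has a field that itself contains a comma, the fixed file ends up with an extra column. A minimal sketch of a more defensive variant, using the csv module from Python's standard library, might look as follows. It assumes Python 3; the function name convert_to_comma_separated is hypothetical, and so is the heuristic that a tab in the first line marks a file as tab-separated.

```python
import csv

def convert_to_comma_separated(src_path, dst_path):
    """Rewrite the file at src_path as a comma-separated file at dst_path.

    Hypothetical helper; assumes that a tab in the first line means
    the whole file is tab-separated, otherwise comma-separated.
    """
    with open(src_path, 'r', newline='') as src:
        first_line = src.readline()
        delimiter = '\t' if '\t' in first_line else ','
        src.seek(0)  # rewind so the first line is parsed as well
        rows = list(csv.reader(src, delimiter=delimiter))

    with open(dst_path, 'w', newline='') as dst:
        # csv.writer quotes fields that contain a comma
        writer = csv.writer(dst, lineterminator='\n')
        writer.writerows(rows)
```

With the dictionary from above, this could be applied as convert_to_comma_separated(*f_dict['abc.csv']). Reading all rows into a list keeps the sketch short; for very large files, writing row by row would be the more memory-friendly choice.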