Sebastian Raschka 05/09/2014
We have a directory ../CSV_files_raw/ containing CSV files, some with tab-separated and some with comma-separated columns.
Here, we will 'fix' them, i.e., make them all comma-separated, and save them to a new directory ../CSV_fixed.
First, we create a dictionary with the file basenames as keys. Each value is a list holding the path to the raw file and the path to the new, fixed file, e.g.,
{
    'abc.csv': ['../CSV_files_raw/abc.csv', '../CSV_fixed/abc.csv'],
    'def.csv': ['../CSV_files_raw/def.csv', '../CSV_fixed/def.csv'],
    ...
}
import os

raw_dir = '../CSV_files_raw/'
fixed_dir = '../CSV_fixed'

# create the output directory if it does not exist yet
if not os.path.exists(fixed_dir):
    os.mkdir(fixed_dir)

# map each CSV basename to [path of raw file, path of fixed file]
f_dict = {os.path.basename(f): [os.path.join(raw_dir, f),
                                os.path.join(fixed_dir, f)]
          for f in os.listdir(raw_dir) if f.endswith('.csv')}
Now we can replace the tabs with commas while writing the new files:
for f in f_dict:
    with open(f_dict[f][0], 'r') as raw, open(f_dict[f][1], 'w') as fixed:
        for line in raw:
            # remove only the line ending, so that empty first or last fields are kept
            fields = line.rstrip('\r\n').split('\t')
            fixed.write(','.join(fields) + '\n')
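The plain string replacement above assumes that no field in a tab-separated file contains a comma itself; such a field would be split into two columns in the fixed file. If that can happen, a possible alternative is Python's csv module, which quotes those fields on output. The following is only a minimal sketch: the function name to_comma_separated and the sample data are hypothetical, and the delimiter is guessed simply by checking whether the first line contains a tab.

```python
import csv
import os
import tempfile


def to_comma_separated(raw_path, fixed_path):
    """Write a comma-separated copy of a tab- or comma-separated file."""
    with open(raw_path, 'r', newline='') as raw:
        # assumption: a tab in the first line means the file is tab-separated
        first_line = raw.readline()
        raw.seek(0)
        delimiter = '\t' if '\t' in first_line else ','
        rows = list(csv.reader(raw, delimiter=delimiter))
    with open(fixed_path, 'w', newline='') as fixed:
        # csv.writer quotes fields that contain commas (QUOTE_MINIMAL default)
        csv.writer(fixed, lineterminator='\n').writerows(rows)


# small demonstration on a throwaway file with made-up content
with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, 'raw.csv')
    fixed_path = os.path.join(tmp, 'fixed.csv')
    with open(raw_path, 'w', newline='') as f:
        f.write('name\tnote\nabc\thello, world\n')
    to_comma_separated(raw_path, fixed_path)
    with open(fixed_path, newline='') as f:
        result = f.read()

print(result)
```

In the loop over f_dict, the body could then be reduced to a single call such as to_comma_separated(f_dict[f][0], f_dict[f][1]). Files that are already comma-separated are read with a comma delimiter and written back with the same columns.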