Read and write files

Several ways to read and write files are introduced in this section. Please see the I/O and Network part of the official documentation for more details.

1. <font color="blue">readdlm</font> is used to read a whole file.

Read a matrix from a file that has a header line and whose elements are separated by a delimiter (here a comma).

In [2]:
;cat data.txt
age,weight
12,110
54,165
26,131
In [3]:
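using DelimitedFiles  # required in Julia 0.7 and later, where readdlm lives in the standard library
d=readdlm("data.txt",',',header=true)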
Out[3]:
(
3x2 Array{Float64,2}:
 12.0  110.0
 54.0  165.0
 26.0  131.0,

1x2 Array{String,2}:
 "age"  "weight")

The element type of the array in d is inferred by Julia. It can also be specified explicitly with an extra argument. For example, the elements can be read as String.

In [4]:
d=readdlm("data.txt",',',String,header=true)
Out[4]:
(
3x2 Array{String,2}:
 "12"  "110"
 "54"  "165"
 "26"  "131",

1x2 Array{String,2}:
 "age"  "weight")
2. <font color="blue">readline</font> is used to read files line by line.

The function readline is more flexible: processing one line at a time uses less memory than loading the whole file at once and may be much faster.

Sometimes it's impossible to read a big file with readdlm. A function that reads a big genotype file using readline is shown below. The function open is used to open a file (read-only by default when no mode is specified). The function readline reads one line from the file as a string. split breaks that line into tokens, which are then converted from strings to integers. The function close is used to close the stream.

In [5]:
function read_genotypes(file,nrow,ncol,header=true)
    f=open(file)

    if header==true
        readline(f)
        nrow=nrow-1
    end

    mat = zeros(Int64,nrow,ncol)

    for i=1:nrow
        mat[i,:]=parse.(Int,split(readline(f)))  # int(...) only exists in Julia 0.3 and earlier

        if(i%1000==0)
            println("This is line ",i)
        end
    end

    close(f)
    return mat
end
Out[5]:
read_genotypes (generic function with 2 methods)
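
When the number of rows is not known in advance, or the rows do not need to be kept, the file can also be streamed with eachline. The following is a minimal sketch, assuming Julia 1.x and a whitespace-separated file of integers; the helper name column_sums and the temporary file are hypothetical and only serve as an illustration.

```julia
# Hypothetical helper: accumulate column sums line by line,
# so that only one line is held in memory at a time.
function column_sums(file; header=true)
    sums = Int[]
    open(file) do f                # the do-block closes the file, even on error
        header && readline(f)      # skip the header line
        for line in eachline(f)
            isempty(strip(line)) && continue          # ignore blank lines
            vals = parse.(Int, split(line))           # tokens -> integers
            isempty(sums) && append!(sums, zeros(Int, length(vals)))
            sums .+= vals                             # update in place
        end
    end
    return sums
end

# Small temporary file: one header line and two rows of genotypes
path, io = mktemp()
write(io, "m1 m2 m3\n0 1 2\n2 1 0\n")
close(io)

result = column_sums(path)   # [2, 2, 2]
```

The do-block form of open is used here instead of an explicit close, so the stream is released even if parsing fails partway through.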
3. <font color="blue">writedlm</font> is used to write a file.

The matrix in d is modified and written to another file using writedlm(filename, matrix, delimiter).

In [6]:
newdata=d[1]
Out[6]:
3x2 Array{String,2}:
 "12"  "110"
 "54"  "165"
 "26"  "131"
In [8]:
newdata[1,2]="314"
Out[8]:
"314"
In [16]:
writedlm("datanew.txt",newdata," ")
In [18]:
;cat datanew.txt
12 314
54 165
26 131

Notice that there is no header in datanew.txt. To include one, write the header and then the data to the same stream:

In [23]:
myheader=d[2]
f = open("datanew2.txt", "w")
writedlm(f, myheader)
writedlm(f, newdata)
close(f)
In [25]:
;cat datanew2.txt
age	weight
12	314
54	165
26	131
4. read and write <font color="blue">binary</font> files

If you need to work with a file several times, it can be better to save it as a binary file: the values are stored in their in-memory representation, so they do not have to be parsed from text again, which usually makes reading faster.

To write variables to a binary file:

In [38]:
x=123
y=314
myfile=open("file.bin","w")
write(myfile,x)
write(myfile,y)
close(myfile)

To read them back from the binary file, in the same order and with the same types:

In [40]:
myfile2=open("file.bin")
x2=read(myfile2,Int64)
y2=read(myfile2,Int64)
close(myfile2)
In [45]:
println(x2," ",y2)
123 314
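
The same approach extends to whole arrays. The following is a minimal sketch, assuming Julia 1.x; the file name matrix.bin and the convention of storing the two dimensions before the data are illustrative choices, not a standard format.

```julia
# Store a matrix in a binary file together with its dimensions,
# then read it back into a preallocated array.
A = Int64[12 110; 54 165; 26 131]
binfile = joinpath(mktempdir(), "matrix.bin")   # hypothetical file name

open(binfile, "w") do io
    write(io, Int64(size(A, 1)))   # number of rows
    write(io, Int64(size(A, 2)))   # number of columns
    write(io, A)                   # raw elements, in column-major order
end

B = open(binfile) do io
    nrow = read(io, Int64)
    ncol = read(io, Int64)
    read!(io, Array{Int64}(undef, nrow, ncol))   # fill the array from the stream
end
```

A binary file carries no type or shape information of its own, which is why the dimensions are written first and the element type must be known when reading.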

<font color="blue">write</font> can also be used to write formatted numbers.

In [ ]:
using Printf  # required for @sprintf in Julia 0.7 and later
outfile = open("new.txt", "w")
write(outfile,@sprintf("%0.3f", .9999))
close(outfile)

To learn more about reading and writing files, please see the official documentation. The DataFrames package may be helpful for working with tabular data, and the ODBC package can be used to work with a database.