Overview

In general, converting photographic images to vector images in order to answer biological questions requires at least three steps:

  1. Making maps: process the image into a schematic of the image that recognizes and retains the parts of the photo that are informational.

  2. Measuring raw data from from this map: calling zones, counting spots, etc.

  3. Statistical analyses on these data

All three of these steps require a lot of decision-making and coding. What follows is a review of the path we've taken to get to statistical analysis for our images.

If the progamming code is confusing to read at any point, skip it. Read the human-language stuff.

Working environment

Due to the history of the project, the first bit of code below is for matlab/octave. After that, we'll use mostly BASH and python(>3.6). The custom modules used here, contained in the makeFlowerPolygons package, are written for unix-like environments, so windows won't work at this point, but MacOS and linux systems should be compatible.

To start, clone the github repo at https://github.com/danchurch/mimulusSpeckling.git. If you don't already have git on your computer, I would recommend it for this project.

git clone https://github.com/danchurch/mimulusSpeckling.git

I will try to remember to make all paths used below relative to the main directory of the is repo, which is called "mimulusSpeckling".

After this install the custom modules we use to phenotype our flowers, in the package "makeFlowerPolygons". It is available in pypi, install it using pip, something like:

pip3 install makeFlowerPolygons-dcthom

You may want superuser privileges for that:

sudo pip3 install makeFlowerPolygons-dcthom

You may find that you are missing some dependencies for the modules, you'll have to install them as they come up. Apologies, in the future if I have time I'll define the requirements in the setup files.

You may find it useful to make a shortcut to your repo directory. I put this in my ~/.bashrc (linux) or ~/.bash_profile for mac OS.

In [2]:
alias speck="cd /home/daniel/Documents/cooley_lab/mimulusSpeckling"

Python setup

We do a lot of this work in python. Here is what you need when in the python environment:

In [1]:
import os, pickle, sys, pathlib
## have to do makeFlower import before our matplotlib import, 
## because makeFlowerPolygons sets the proper backend for matplotlib:
from makeFlowerPolygons import flowerPetal, geojsonIO
## but if we want to use tkagg inside the notebook, we have to do this?:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
import shapely.geometry as sg
import shapely.affinity as sa
import matplotlib.image as mpimg
from skimage import measure
from descartes import PolygonPatch
from scipy.spatial import distance

Make sure that you're in the home directory of the repo. This will vary depending on where you put it, of course, but mine looks like this:

In [2]:
os.chdir("/home/daniel/Documents/cooley_lab/mimulusSpeckling")

Getting images out of Doug's raster pipeline

Flower photos were broken into their various petals of interest, and color differences were categorized, by forcing all pixels into three color poles using kmeans clustering. Doug coded this pipeline, and Melia implemented it, also doing manual corrections where needed. A "bottom" petal, for instance looks like this, after Doug's pipeline (converted to grayscale):

In [9]:
## stay in python for a sec, before we go to matlab/octave
exFlowerImage="dougRaster/Rotated_and_Cropped/plate1/P431F1.JPG"

plt.rcParams['figure.figsize'] = [15, 5]

img=mpimg.imread(exFlowerImage)
plt.imshow(img)
Out[9]:
<matplotlib.image.AxesImage at 0x7f1f02f582b0>

In matlab or octave, we can see how Doug's pipeline sees these petals:

In [13]:
octave --no-gui
In [1]:
load("dougRaster/Rotated_and_Cropped/plate1/P431F1.mat")
pkg load image
aa = mat2gray(Petals.Clusters.right);
imshow(aa) %% that works.

We want to make polygons of spots and their petals. We can begin by "peeling" these two apart into separate, solid-black images:

In [2]:
bb = aa < 1;
imshow(bb);
In [3]:
cc = aa == 0;
imshow(cc);

We can generalize the above process, to get all of the available photos from Doug's efforts into a form ready to digitized. All of our image processing was originally done in chunks, named by the plates in which flower DNA was placed and thus data was outputted from the sequencers. This is code for plate 4 for instance:

In [ ]:
wd='/home/daniel/Documents/cooley_lab/mimulusSpeckling/make_polygons/polygons/plate4'

dougRasterDir='/home/daniel/Documents/cooley_lab/mimulusSpeckling/dougRaster/Rotated_and_Cropped/plate4'

cd(dougRasterDir)

files = dir('P*.mat');

for file = files';
    im = file.name;
    imName = regexprep(im,'\.mat', ''); 
    %% go get our file, come back
    cd(dougRasterDir);
    rast=load(im);
    cd(wd);
    %% make a spot for our image, go to it:
    mkdir(imName);
    cd (imName);
    %% get our petal names (left, right mid)
    petNames = fieldnames(rast.Petals.Clusters);
    %% split images into petal and spot, export, for each of the three petals:
    for i = 1:length(petNames);
        pet = rast.Petals.(petNames{i}).data; %petal at hand
        mkdir(petNames{i});
        cd(petNames{i});
        %% for octave
        fileNamePetal = [imName "_" char(petNames(i)) "_" 'melted.csv'];
        %% for matlab, this may work better?
        %%fileNamePetal = imName + "_" +  petNames(i) + "_" + 'melted.csv';
        csvwrite(fileNamePetal,pet);
        cd ..;
    end;

(This is brittle code, if the file architectures or Doug's matlab structures changes, it will break. But for the moment it works. And it gets us out of matlab real quick-like.)

It is important to note what is going on here, because this may be one of the more universal entry points in to the pipeline. At this step, we are going into the matlab objects that are created by Doug's color simplification process. These are stored as matlab-type ".mat" files, in the following directory of our repo, in doug's folder, by plate. The jpg images in this folder are also stored here:

In [9]:
ls dougRaster/Rotated_and_Cropped
plate1  plate2  plate3  plate3Additional  plate4  plate4Additional
In [10]:
ls dougRaster/Rotated_and_Cropped/plate3
P773F1.JPG  P788F2.JPG  P814F3.JPG  P828F1.JPG  P842F1.JPG  P861F1.JPG
P773F1.mat  P788F2.mat  P814F3.mat  P828F1.mat  P842F1.mat  P861F1.mat
P773F2.JPG  P789F1.JPG  P815F2.JPG  P828F2.JPG  P842F2.JPG  P862F1.JPG
P773F2.mat  P789F1.mat  P815F2.mat  P828F2.mat  P842F2.mat  P862F2.JPG
P774F2.JPG  P789F2.JPG  P815F3.JPG  P829F1.JPG  P843F1.JPG  P863F1.JPG
P774F2.mat  P789F2.mat  P815F3.mat  P829F1.mat  P843F1.mat  P863F1.mat
P774F3.JPG  P791F1.JPG  P816F1.JPG  P829F2.JPG  P843F2.JPG  P863F2.JPG
P774F3.mat  P791F1.mat  P816F1.mat  P829F2.mat  P843F2.mat  P863F2.mat
P775F2.JPG  P791F2.JPG  P816F2.JPG  P830F1.JPG  P844F1.JPG  P864F1.JPG
P775F2.mat  P791F2.mat  P816F2.mat  P830F1.mat  P844F1.mat  P864F2.JPG
P775F3.JPG  P792F4.JPG  P817F1.JPG  P830F2.JPG  P844F2.JPG  P865F1.JPG
P775F3.mat  P792F4.mat  P817F1.mat  P830F2.mat  P844F2.mat  P865F1.mat
P776F1.JPG  P793F1.JPG  P817F2.JPG  P831F1.JPG  P845F1.JPG  P865F2.JPG
P776F1.mat  P793F1.mat  P817F2.mat  P831F1.mat  P845F1.mat  P865F2.mat
P776F2.JPG  P793F2.JPG  P818F1.JPG  P831F2.JPG  P845F2.JPG  P866F1.JPG
P776F2.mat  P793F2.mat  P818F1.mat  P831F2.mat  P845F2.mat  P866F1.mat
P777F3.JPG  P804F1.JPG  P818F2.JPG  P832F1.JPG  P846F1.JPG  P866F2.JPG
P777F3.mat  P804F1.mat  P818F2.mat  P832F1.mat  P846F1.mat  P866F2.mat
P777F4.JPG  P804F2.JPG  P819F1.JPG  P832F2.JPG  P846F2.JPG  P867F1.JPG
P777F4.mat  P804F2.mat  P819F1.mat  P832F2.mat  P846F2.mat  P867F1.mat
P778F1.JPG  P805F1.JPG  P819F2.JPG  P833F1.JPG  P847F1.JPG  P867F2.JPG
P778F1.mat  P805F1.mat  P819F2.mat  P833F1.mat  P847F1.mat  P867F2.mat
P778F2.JPG  P805F3.JPG  P820F1.JPG  P833F2.JPG  P847F2.JPG  P868F1.JPG
P778F2.mat  P805F3.mat  P820F1.mat  P833F2.mat  P847F2.mat  P868F1.mat
P779F2.JPG  P806F1.JPG  P820F2.JPG  P834F1.JPG  P848F1.JPG  P868F2.JPG
P779F2.mat  P806F1.mat  P820F2.mat  P834F1.mat  P848F1.mat  P868F2.mat
P779F3.JPG  P806F2.JPG  P821F1.JPG  P834F2.JPG  P848F2.JPG  P869F1.JPG
P779F3.mat  P806F2.mat  P821F1.mat  P834F2.mat  P848F2.mat  P869F1.mat
P782F1.JPG  P807F1.JPG  P821F2.JPG  P835F1.JPG  P849F1.JPG  P869F2.JPG
P782F1.mat  P807F1.mat  P821F2.mat  P835F1.mat  P849F1.mat  P869F2.mat
P782F2.JPG  P807F2.JPG  P822F1.JPG  P835F2.JPG  P849F2.JPG  P870F1.JPG
P782F2.mat  P807F2.mat  P822F1.mat  P835F2.mat  P849F2.mat  P870F1.mat
P783F1.JPG  P808F1.JPG  P822F2.JPG  P836F1.JPG  P850F1.JPG  P870F2.JPG
P783F1.mat  P808F1.mat  P822F2.mat  P836F1.mat  P850F1.mat  P870F2.mat
P783F2.JPG  P808F2.JPG  P823F1.JPG  P836F2.JPG  P850F2.JPG  P871F1.JPG
P783F2.mat  P808F2.mat  P823F1.mat  P836F2.mat  P850F2.mat  P871F1.mat
P784F1.JPG  P810F1.JPG  P823F2.JPG  P837F1.JPG  P852F1.JPG  P871F2.JPG
P784F1.mat  P810F1.mat  P823F2.mat  P837F1.mat  P852F1.mat  P871F2.mat
P784F2.JPG  P810F2.JPG  P824F1.JPG  P837F2.JPG  P852F2.JPG  P872F1.JPG
P784F2.mat  P810F2.mat  P824F1.mat  P837F2.mat  P852F2.mat  P872F2.JPG
P785F1.JPG  P811F1.JPG  P824F2.JPG  P838F1.JPG  P853F1.JPG  P873F1.JPG
P785F1.mat  P811F1.mat  P824F2.mat  P838F1.mat  P853F1.mat  P873F1.mat
P785F2.JPG  P811F2.JPG  P825F1.JPG  P838F2.JPG  P855F1.JPG  P873F2.JPG
P785F2.mat  P811F2.mat  P825F1.mat  P838F2.mat  P855F2.JPG  P873F2.mat
P786F1.JPG  P812F1.JPG  P825F2.JPG  P839F1.JPG  P856F1.JPG  P875F1.JPG
P786F1.mat  P812F1.mat  P825F2.mat  P839F1.mat  P857F1.JPG  P875F1.mat
P786F2.JPG  P812F2.JPG  P826F1.JPG  P839F2.JPG  P859F1.JPG  P875F2.JPG
P786F2.mat  P812F2.mat  P826F1.mat  P839F2.mat  P859F1.mat  P875F2.mat
P787F2.JPG  P813F1.JPG  P826F2.JPG  P840F1.JPG  P859F2.JPG  P876F2.JPG
P787F2.mat  P813F1.mat  P826F2.mat  P840F1.mat  P859F2.mat  P876F3.JPG
P787F3.JPG  P813F2.JPG  P827F1.JPG  P841F1.JPG  P860F1.JPG  P877F1.JPG
P787F3.mat  P813F2.mat  P827F1.mat  P841F1.mat  P860F1.mat  P877F1.mat
P788F1.JPG  P814F2.JPG  P827F2.JPG  P841F2.JPG  P860F2.JPG  P877F2.JPG
P788F1.mat  P814F2.mat  P827F2.mat  P841F2.mat  P860F2.mat  P877F2.mat

In the matlab/octave script above we grab a particular component of these matlab object files, a matrix that contains the pixel color and location info, and export it as a csv. Each petal is labeled by Doug by the order they appear on the photos: "left", "mid", and "right". These names have nothing to do with the position of the petals on the flowers.

The script above then makes a directory for each flower, then subdirectories for each petal, where the CSV files are placed. Later, we will also store our geojson files here, see below. The file names for these new CSVs are:

[Plant-number] + [Flower-number] + "_" + [left/mid/right] + "_melted.csv"

For instance: P773F1_left_melted.csv

Each of these matrix-turned-CSVs have four columns, and a row for each pixel. There is no header for these files, so here is an explanation:

  • The first and second columns are X and Y coordinates.
  • The third column cells should all contain the same value, either all 1, 2, or 3. 1 is a 'left' petal, 2 is a 'mid' petal, and 3 is a 'right' petal.
  • The fourth column contains the color information. Doug's formatting only retains non-background (non-white-tape) pixels. 2 indicates petal color, 3 is spot color. Thus this column should contain only 2's and 3's.

Looking at a few lines of one of these files:

In [3]:
sed -n '4310,4320'p make_polygons/polygons/plate3/P773F1/left/P773F1_left_melted.csv
120,631,1,2
120,632,1,2
120,633,1,2
120,634,1,2
120,635,1,2
120,636,1,2
120,637,1,3
120,638,1,3
120,639,1,3
120,640,1,3
120,641,1,3

This is showing pixels with an X of 120, a y-positions from 631 to 641. In the third column, we can see this is a 1='left' petal, and in the fourth column we see some pixels are petal ('2') and some are spots ('3').

Some notes for future development

These pixel position data get inverted often because coordinate systems vary in their origins when dealing with pixels and rasters. Raster/graphical coordinate systems often have the origin in the upperleft corner and increase in number as they go down the screen, we'll see that this flips the images when we go from viewing our petals as a raster to viewing their polygons in pyplot. I tried not to worry about these inversions too much, and did not mess with the original x, y positions at each step, just to try to retain the integrity of the data as much as possible. This means, however, that we end up looking at and working with upsidedown images quite often.

Perhaps more important here is that if this place is to be used as a starting point for image inputs to our pipeline, we may want to simplify here for others' ease of use. This pipeline is written specifically to deal with the quirks of the outputs from Doug's matlab scripts (2 and 3 for colors, no 1's, an unnecessary third column, etc etc.). If other labs were to use this, they would probably find these requirements annoying,

Digitizing petal outlines

Now we create map-like features out of our images, a process in GIS known as digitizing. We'll be using our custom python package for this project, flowerPetal, which can be found at https://pypi.org/project/makeFlowerPolygons-dcthom/. The github repo for this package is currently here . flowerPetal is built mostly using tools from scikit-image, and shapely. These are among the standard packages for manipulating images and polygons in python. We'll also import the other packages as necessary for this analysis.

Calling spots and petals from images

Above we dug into Doug's matlab pipeline, and extracted the data we needed from his matlab data objects into a more universal format, a CSV. As our next step, we need find our petals and spots in these CSVs, and put their information into a format that is useful to us. The module get_spots of our makeFlowerPolygons package tries to do this.

As an example, we'll assemble a toy directory with a few flowers images in it, and run our modules as scripts on these:

In [9]:
toy=$PWD"/make_polygons/toy"
mkdir $toy
newSpots=$toy"/newSpots"
mkdir $newSpots

Get some sample CSVs to play with:

In [10]:
cp -r ./make_polygons/polygons/plate3/P77{3,4}* $toy

Just for continuity, I'm also going to bring in this CSV, which is from one of the petals that we looked at above. It's an image from plate 1. Plate 1 and 2 geojsons and CSVs are in our google drive of finished images, to keep the repo from being so bloated. I've put a copy back in the repo:

In [11]:
mkdir -p $toy/P431F1/right
cp ./make_polygons/notebooks/P431F1_right_melted.csv $toy/P431F1/right

Our modules are available to be used as scripts in the repo itself. The get_spots module is here:

In [12]:
get_spots='make_polygons/package/makeFlowerPolygons/get_spots.py'

The get_spots module acts on a single CSV file at a time, in the format mention above from Doug's pipeline. For one petal csv file at a time, the syntax for the get_spots script is:

In [13]:
$get_spots -h
usage: get_spots.py [-h] [--destination DESTINATION] [--skipFinalClean] file

positional arguments:
  file                  Name of file that contains CSV versions of doug's
                        "melted" matrix, the result of manually choosing color
                        centers from his matlab color-categorization scripts.

optional arguments:
  -h, --help            show this help message and exit
  --destination DESTINATION, -d DESTINATION
                        Folder where you would like the outputted geojson.
                        Default is same directory.
  --skipFinalClean, -s  Use this flag to skip the clean step that occasionally
                        disappears polygons, if it can't fix them any other
                        way.

The last optional argument, --skipFinalClean, is necessary sometimes because our spot "cleaning" process, which attempts to resolve invalidities in the polygons, also sometimes deletes entire spots when it cannot resolve the errors in them any other way. When this happens, it's sometimes useful to retain the invalid polygon and fix it manually, with QGIS or with a text editor.

Another developer issue here is that error logging for polygon calls from images is really bad right now. There should be better tracking of which polygons worked, which ones needed fixing, which ones were thrown out, etc.

We'll get the list of the CSVs and run the script on each:

In [14]:
csvs=$(find $toy -name "*_melted.csv") 

for i in ${csvs[@]}; do
    echo $i
    $get_spots $i -d $newSpots 
done
/home/daniel/Documents/cooley_lab/mimulusSpeckling/make_polygons/toy/P774F2/mid/P774F2_mid_melted.csv
P774F2_mid_polys
P774F2_mid_polys petal is invalid! Opening bag of tricks
Trying buffering
Buffering complete. P774F2_mid_polys petal seems better.
This spot multipolygon seems okay without buffering.
/home/daniel/Documents/cooley_lab/mimulusSpeckling/make_polygons/toy/P774F2/left/P774F2_left_melted.csv
P774F2_left_polys
P774F2_left_polys petal is invalid! Opening bag of tricks
Trying buffering
Buffering complete. P774F2_left_polys petal seems better.
This spot multipolygon seems okay without buffering.
/home/daniel/Documents/cooley_lab/mimulusSpeckling/make_polygons/toy/P774F2/right/P774F2_right_melted.csv
P774F2_right_polys
This spot multipolygon seems okay without buffering.
/home/daniel/Documents/cooley_lab/mimulusSpeckling/make_polygons/toy/P774F3/mid/P774F3_mid_melted.csv
P774F3_mid_polys
This spot multipolygon seems okay without buffering.
/home/daniel/Documents/cooley_lab/mimulusSpeckling/make_polygons/toy/P774F3/left/P774F3_left_melted.csv
P774F3_left_polys
This spot multipolygon seems okay without buffering.
/home/daniel/Documents/cooley_lab/mimulusSpeckling/make_polygons/toy/P774F3/right/P774F3_right_melted.csv
P774F3_right_polys
P774F3_right_polys petal is invalid! Opening bag of tricks
Trying buffering
multiple polygons detected where there should only be one
Buffering complete. P774F3_right_polys petal seems better.
P774F3_right_polys spots is invalid! Opening bag of tricks
Trying buffering
multiple polygons detected where there should only be one
Buffering complete. P774F3_right_polys spots seems better.
This spot multipolygon seems okay without buffering.
/home/daniel/Documents/cooley_lab/mimulusSpeckling/make_polygons/toy/P773F2/mid/P773F2_mid_melted.csv
P773F2_mid_polys
P773F2_mid_polys petal is invalid! Opening bag of tricks
Trying buffering
multiple polygons detected where there should only be one
Buffering complete. P773F2_mid_polys petal seems better.
P773F2_mid_polys spots is invalid! Opening bag of tricks
Trying buffering
multiple polygons detected where there should only be one
Buffering complete. P773F2_mid_polys spots seems better.
Buffering spot multipolygon.
/home/daniel/Documents/cooley_lab/mimulusSpeckling/make_polygons/toy/P773F2/left/P773F2_left_melted.csv
P773F2_left_polys
Buffering spot multipolygon.
/home/daniel/Documents/cooley_lab/mimulusSpeckling/make_polygons/toy/P773F2/right/P773F2_right_melted.csv
P773F2_right_polys
This spot multipolygon seems okay without buffering.
/home/daniel/Documents/cooley_lab/mimulusSpeckling/make_polygons/toy/P431F1/right/P431F1_right_melted.csv
P431F1_right_polys
P431F1_right_polys spots is invalid! Opening bag of tricks
Trying buffering
multiple polygons detected where there should only be one
Buffering complete. P431F1_right_polys spots seems better.
P431F1_right_polys spots is invalid! Opening bag of tricks
Trying buffering
Buffering complete. P431F1_right_polys spots seems better.
This spot multipolygon seems okay without buffering.
/home/daniel/Documents/cooley_lab/mimulusSpeckling/make_polygons/toy/P773F1/mid/P773F1_mid_melted.csv
P773F1_mid_polys
P773F1_mid_polys spots is invalid! Opening bag of tricks
Trying buffering
Buffering complete. P773F1_mid_polys spots seems better.
P773F1_mid_polys spots is invalid! Opening bag of tricks
Trying buffering
Buffering complete. P773F1_mid_polys spots seems better.
This spot multipolygon seems okay without buffering.
/home/daniel/Documents/cooley_lab/mimulusSpeckling/make_polygons/toy/P773F1/left/P773F1_left_melted.csv
P773F1_left_polys
P773F1_left_polys petal is invalid! Opening bag of tricks
Trying buffering
Buffering complete. P773F1_left_polys petal seems better.
P773F1_left_polys spots is invalid! Opening bag of tricks
Trying buffering
multiple polygons detected where there should only be one
Buffering complete. P773F1_left_polys spots seems better.
Buffering spot multipolygon.
/home/daniel/Documents/cooley_lab/mimulusSpeckling/make_polygons/toy/P773F1/right/P773F1_right_melted.csv
P773F1_right_polys
P773F1_right_polys petal is invalid! Opening bag of tricks
Trying buffering
Buffering complete. P773F1_right_polys petal seems better.
P773F1_right_polys spots is invalid! Opening bag of tricks
Trying buffering
Buffering complete. P773F1_right_polys spots seems better.
This spot multipolygon seems okay without buffering.

Using matplotlib (so back to python), we can see the original image:

In [1]:
exFlowerImage="./dougRaster/Rotated_and_Cropped/plate1/P431F1.JPG"
In [5]:
plt.rcParams['figure.figsize'] = [12, 5]
img=mpimg.imread(exFlowerImage)
plt.imshow(img)