#!/usr/bin/env python # coding: utf-8 # # Color quantization with scikit-learn # # This notebook is part of a blog post on [Geophysics Labs](http://geophysicslabs.com/2016/12/15/how-to-add-maps-to-opendtect/). # # Here I demonstrate how to convert a 3-channel RGB picture into an indexed-color one-band grid. This step is essential to be able to import coloured images into OpendTect. # # The example shown here makes use of the Kevitsa dataset that was made freely available by the [Frank Arnott Award](https://www.frankarnottaward.com/Home.aspx). #
# Let's start by loading the Numpy and matplotlib libraries. # In[1]: import numpy as np import matplotlib.pyplot as plt import matplotlib.colors as mcolors get_ipython().run_line_magic('matplotlib', 'inline') # ## Loading the image # I have an image of the geological map of the Kevitsa area in Finland. This was exported as a PNG image from QGIS. # # The image file is read with the matplotlib function `imread`, which converts the image into a Numpy array. # In[2]: inFile = r'..\data\Kevitsa_geology_noframe.png' imRGB = plt.imread(inFile) # plot fig,ax = plt.subplots(figsize=(6,6)) ax.imshow(imRGB) plt.title('Original RGB image') # Let's check the dimensions of the array: # In[3]: imRGB.shape # Although this is not always the case, this PNG file contains four channels: the three colour bands (red, blue and green) and the alpha channel that stores the transparency information. We need to get rid of it with: # In[4]: imRGB = imRGB[:,:,:3] # ## Color Quantization # We need first to load the palette, which is a list of colours that we consider as fixed. # In[5]: inFile = r'..\data\Windows_256_color_palette_RGB.csv' win256 = np.loadtxt(inFile,delimiter=',') # In[6]: win256[:5] # Note that the colours are defined with integers ranging from 0 to 255. # # *See the end of this notebook in the appendix for an image of the colours present in the palette.* # # Next, we have to reshape the array of our RGB image to make sure it fits the same format with one column for each channel. # In[7]: nrows,ncols,d = imRGB.shape flat_array = np.reshape(imRGB, (nrows*ncols, 3)) flat_array[:5] # Note that in this case, the colours are defined with floats ranging from 0 to 1. Something to keep in mind for the next step. # # Now we can compute the colours in the palette that are the closest to the colour of each pixel in our RGB image. This can be done easily using the `pairwise_distances_argmin` function available in the [scikit-learn](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances_argmin.html#sklearn-metrics-pairwise-distances-argmin) library. # In[8]: # import function from sklearn.metrics import pairwise_distances_argmin # run function, making sure the palette data is normalised to the 0-1 interval indices = pairwise_distances_argmin(flat_array,win256/255) # reshape the indices to the shape of the initial image indexedImage = indices.reshape((nrows,ncols)) # If we now display our result with a "normal" sequential colormap like *viridis*, we will get a strange image. This is because the plotting function is missing a crucial bit of information, which is the palette that was used to perform the quantization. # In[9]: fig,ax = plt.subplots(figsize=(6,6)) ax.imshow(indexedImage,cmap='viridis') plt.title('Quantization with nearest distance to win256') # To display our indexed-color image properly with matplotlib, we need first to create the appropriate colormap with the colours of the palette. This is done with a function of the `colors` sub-module in [matplotlib](http://matplotlib.org/api/colors_api.html). # In[10]: new_cm = mcolors.LinearSegmentedColormap.from_list('win256', win256/255) plt.register_cmap(cmap=new_cm) # optional but useful to be able to call the colormap by its name. # Let's call `imshow` again with our new colormap. We also need to add the `norm` parameter to prevent `imshow` from normalizing our indices. # In[11]: fig,ax = plt.subplots(figsize=(6,6)) ax.imshow(indexedImage,cmap='win256',norm=mcolors.NoNorm()) plt.title('Quantization with nearest distance to win256') # That's it! We can now save the resulting grid to a text file and import it into OpendTect as a 3D horizon. # # However, in our example, there is one more necesssary step. The rotated red square on the image tells us there is a mismatch between the grid of the image and the grid defined by the 3D survey. This can be corrected by interpolation and this is the subject of the next notebook. # # For now, we can simply save the array in a NPY file. # In[12]: outFile = r'..\data\Kevitsa_geology_indexed.npy' np.save(outFile,indexedImage) # # Appendix: how to display the colors in a colormap # # Here is a handy piece of code to display the 256 colours of the palette as individual squares. # In[13]: fig, ax = plt.subplots(figsize=(4,4)) ax.imshow(np.arange(256).reshape(16, 16), cmap = 'win256', interpolation="nearest", aspect="equal") ax.set_xticklabels([]) ax.set_yticklabels([]) ax.grid(False) ax.set_title('win256: Windows 8-bit palette')