author: Kirill Panin

Diabetic Retinopathy Detection

1. Feature and data explanation (2 points)

(+) The process of collecting data is described, a detailed explanation of the task is provided, its value is explained, target features are given;


Hi friends. I'm starting this project without prior practical experience in computer vision. I want to help my friends working in that field: they build ML models for detecting diabetic retinopathy at an early stage.
I have merged some of the evaluation plan topics into single categories, because approaches to image processing differ from those used for tabular data.

Diabetic retinopathy is the leading cause of blindness in the working-age population of the developed world. It is estimated to affect over 93 million people.
The main aim of this task is to develop robust algorithms that can detect diabetic retinopathy at an early stage and cope with the noise and variation present in the data.

An extremely large dataset (82 GB) is provided (link: data), including a training set with labels (35 GB), although some of the labels are incorrect. It consists of high-resolution retina images taken under a variety of imaging conditions. The images come from different models and types of cameras, which can affect the visual appearance of left vs. right eyes. A left and a right eye image is provided for every patient. Like any real-world dataset, it contains noise in both the images and the labels: images may contain artifacts, be out of focus, underexposed, or overexposed.
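Since each file name encodes the patient id and the eye (as in `10210_right.jpeg`, used later in this notebook), a small helper can recover that per-patient grouping. This is a sketch; `parse_fundus_filename` is a hypothetical helper name, not part of the dataset tooling:

```python
import os

def parse_fundus_filename(path):
    """Split a fundus image filename like '10210_right.jpeg' into
    (patient_id, eye) so left/right pairs can be grouped per patient."""
    stem = os.path.splitext(os.path.basename(path))[0]
    patient_id, eye = stem.rsplit("_", 1)
    return int(patient_id), eye

print(parse_fundus_filename("train/10210_right.jpeg"))  # (10210, 'right')
```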

Clinicians marked up the data using a special web tool.

A clinician has rated the presence of diabetic retinopathy in each image on a scale of 0 to 4 (a multiclass classification target):

0 - No DR
1 - Mild
2 - Moderate
3 - Severe
4 - Proliferative DR
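The 0–4 grades can be mapped to readable names for exploration. A minimal sketch, assuming the labels arrive in a table with an image id and a `level` column; the small DataFrame below is stand-in data, not the real label file:

```python
import pandas as pd

# The 0-4 severity scale from the annotation protocol above.
LEVEL_NAMES = {0: "No DR", 1: "Mild", 2: "Moderate",
               3: "Severe", 4: "Proliferative DR"}

# Hypothetical in-memory stand-in for the real labels table
# (columns assumed to be an image id and an integer `level`).
labels = pd.DataFrame({
    "image": ["10_left", "10_right", "13_left", "13_right"],
    "level": [0, 0, 2, 4],
})
labels["level_name"] = labels["level"].map(LEVEL_NAMES)
print(labels["level_name"].value_counts())
```

In practice this `value_counts()` is worth checking early: the classes are heavily imbalanced, which affects sampling and evaluation.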

Methods for solving this task:

The goal is to build a retinopathy model using pretrained models: Inception V3 (the base model, retraining some modified final layers with attention), VGG-16, and a Keras model trained from scratch.
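A minimal sketch of the Inception V3 transfer-learning setup described above: the pretrained base is frozen and only a new classification head is trained (the attention variant is not shown). `build_dr_model` is a hypothetical helper; in practice you would pass `weights='imagenet'`, and `None` here only avoids the download:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_dr_model(input_shape=(299, 299, 3), n_classes=5, weights=None):
    """Frozen InceptionV3 base plus a small trainable classification
    head for the 5-class DR severity target."""
    base = tf.keras.applications.InceptionV3(
        include_top=False, weights=weights, input_shape=input_shape)
    base.trainable = False  # retrain only the new final layers at first
    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.5),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_dr_model()
```

After the head converges, a common second stage is to unfreeze the top of the base and fine-tune with a lower learning rate.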

This can be massively improved with

  • high-resolution images
  • better data sampling
  • ensuring there is no leakage between training and validation sets; `sample(replace=True)` is really dangerous here
  • better normalization of the target variable
  • pretrained models
  • attention/related techniques to focus on areas, segmentation
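The leakage point above matters especially because both eyes of a patient are strongly correlated, so the split should be made per patient, not per image. A sketch using scikit-learn's `GroupShuffleSplit` with made-up filenames:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical image list: both eyes of a patient share a patient id,
# so grouping by patient keeps each patient entirely on one side.
images = np.array(["10_left", "10_right", "13_left", "13_right",
                   "15_left", "15_right"])
patients = np.array([s.split("_")[0] for s in images])

splitter = GroupShuffleSplit(n_splits=1, test_size=0.34, random_state=0)
train_idx, val_idx = next(splitter.split(images, groups=patients))

# No patient appears on both sides of the split.
assert set(patients[train_idx]).isdisjoint(patients[val_idx])
```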
In [9]:
import numpy as np
import pandas as pd
import seaborn as sns
import os
import tensorflow as tf
from PIL import Image
from tensorflow.python.client import device_lib
import matplotlib.pyplot as plt
In [5]:
#cd /media/gismart/sda/kirill_data/deepdee/
#cd ../../../workdir/Work/deepDee/
In [7]:
# Example image.
im1 = Image.open('train/10210_right.jpeg')
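Because the images vary in resolution and exposure, a common first step after loading is to resize them and scale pixels to a fixed range before feeding the network. A minimal sketch; `preprocess_fundus` is a hypothetical helper, and the dummy image only stands in for a real file like the one opened above:

```python
import numpy as np
from PIL import Image

def preprocess_fundus(img, size=(299, 299)):
    """Resize a fundus image and scale pixels to [0, 1] so images taken
    under different conditions enter the network in a common range."""
    img = img.convert("RGB").resize(size)
    return np.asarray(img, dtype=np.float32) / 255.0

# Dummy image stands in for e.g. Image.open('train/10210_right.jpeg').
dummy = Image.new("RGB", (640, 480), color=(120, 60, 30))
x = preprocess_fundus(dummy)
print(x.shape)  # (299, 299, 3), values within [0, 1]
```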