Blood Cell Classification | Devnith Wijesinghe

Originally published as a Kaggle Notebook, where it won a bronze medal.

About the dataset

The dataset consists of 17,092 jpg images of individual normal cells, captured using the CellaVision DM96 analyzer at the Hospital Clinic of Barcelona. These images are categorized into eight groups: neutrophils, eosinophils, basophils, lymphocytes, monocytes, immature granulocytes, erythroblasts, and platelets. The images are 360 x 363 pixels in size and have been annotated by clinical pathologists.
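With eight cell types, the classifier needs eight outputs (`num_classes = 8` in the model code). The sketch below shows how integer labels can be derived from one folder per class. It assumes the images are loaded with Keras' `image_dataset_from_directory` using `labels="inferred"`, which assigns labels in alphanumeric order of the subdirectory names. The folder names here are hypothetical; the actual names in the Kaggle dataset may differ.

```python
# Hypothetical folder names, one per cell type (the real dataset's names may differ).
class_dirs = [
    "neutrophil", "eosinophil", "basophil", "lymphocyte",
    "monocyte", "immature_granulocyte", "erythroblast", "platelet",
]

# With labels="inferred", Keras orders classes alphanumerically by folder name,
# so sorting the names reproduces the integer label assigned to each class.
class_names = sorted(class_dirs)
class_to_index = {name: i for i, name in enumerate(class_names)}
num_classes = len(class_names)

print(num_classes)  # 8
print(class_to_index)
```

Passing an explicit `class_names` list to the loader would fix a different order if one is preferred.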

Model structure

The model is a simple CNN with two convolutional layers. It could likely be improved with transfer learning, building on a pretrained model.

The code for the model is as follows:

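import tensorflow as tf
from tensorflow import keras

num_classes = 8  # the eight cell types listed above

model = tf.keras.Sequential([
    # The 360 x 363 images are assumed to be resized to 300 x 300 during preprocessing
    keras.Input(shape=(300, 300, 3)),
    # Rescaling layer: normalizes pixel values from [0, 255] to [0, 1]
    keras.layers.Rescaling(1./255),
    keras.layers.Conv2D(32, 3, activation='relu'),
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(32, 3, activation='relu'),
    keras.layers.MaxPooling2D(),
    keras.layers.Dropout(0.2),
    # Flatten the feature maps into a vector before the fully connected layers
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    # No activation: this layer outputs logits, so the loss should use from_logits=True
    keras.layers.Dense(num_classes)
])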
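To see where the model's capacity sits, the following pure-Python sketch reproduces the output-shape and parameter arithmetic for this architecture. It assumes Keras' defaults ('valid' padding and stride 1 for Conv2D, a 2 x 2 window with stride 2 for MaxPooling2D), eight classes, and the Flatten layer placed before the 128-unit Dense layer. The helper names `valid_out` and `cnn_summary` are hypothetical, not part of the notebook.

```python
def valid_out(size, window, stride=1):
    # Output length along one axis for 'valid' padding (no zero-padding).
    return (size - window) // stride + 1


def cnn_summary(input_size=300, channels=3, filters=32, dense_units=128, num_classes=8):
    params = {}
    size = input_size
    # Conv2D(32, 3): weights = kernel_h * kernel_w * in_channels * filters, plus one bias per filter
    size = valid_out(size, 3)
    params["conv1"] = 3 * 3 * channels * filters + filters
    size = valid_out(size, 2, 2)  # MaxPooling2D(): no weights
    size = valid_out(size, 3)
    params["conv2"] = 3 * 3 * filters * filters + filters
    size = valid_out(size, 2, 2)
    # Dropout has no weights; Flatten turns (size, size, filters) into one vector
    flat = size * size * filters
    params["dense1"] = flat * dense_units + dense_units
    params["output"] = dense_units * num_classes + num_classes
    return size, flat, params


size, flat, params = cnn_summary()
print(size, flat)             # 73 170528
print(params["dense1"])       # 21827712
print(sum(params.values()))   # 21838888
```

Under these assumptions, the 128-unit Dense layer holds 21,827,712 of the 21,838,888 parameters, more than 99.9% of the total, while the two convolutional layers hold about 10,000.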

The full code for the preprocessing, model training and evaluation process is included in the Jupyter Notebook.

Results

This simple model achieved an accuracy of over 80%, with a validation accuracy of 87.5%. As shown in the image below, the model predicted the labels of most images correctly.
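Because the final Dense layer has no activation, the model outputs raw logits. The minimal NumPy sketch below shows how predicted labels and an accuracy figure can be derived from such logits, assuming integer class labels. The logits and labels are made-up values for illustration, not outputs of this notebook.

```python
import numpy as np


def softmax(logits):
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def predict_labels(logits):
    # Softmax is monotonic, so argmax over logits equals argmax over probabilities.
    return np.argmax(logits, axis=-1)


def accuracy(logits, labels):
    # Fraction of images whose predicted label matches the true label.
    return float(np.mean(predict_labels(logits) == labels))


# Made-up logits for 4 images and 3 classes (illustrative only).
logits = np.array([[2.0, 0.1, -1.0],
                   [0.3, 1.5, 0.2],
                   [-0.5, 0.0, 3.0],
                   [1.0, 2.0, 0.5]])
labels = np.array([0, 1, 2, 0])

print(predict_labels(logits))    # [0 1 2 1]
print(accuracy(logits, labels))  # 0.75
```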

Technologies used

Kaggle
TensorFlow and Keras
NumPy
Matplotlib