This notebook is an exercise in getting to know the fastai APIs better and in seeing how model tuning works when it is done incrementally.

Problem Statement - Classify cloud images into one of ten categories

The categories are Altocumulus, Altostratus, Cirroculumulus, Cirrostratus, Cirrus, Cumulonimbus, Cumulus, Nimbostratus, Stratocumulus, Stratus. ("Cirroculumulus" is the dataset's folder-name spelling of Cirrocumulus; it is kept as-is wherever it appears as a label.)

Data - The data analysed in the present work is present here

Some information about the clouds
Clouds are classified according to their height above the ground and their appearance (texture) as seen from the ground.
The following cloud roots and their translations summarize the components of this classification system:

- Cirro-: curl of hair, high
- Alto-: mid
- Strato-: layer
- Nimbo-: rain, precipitation
- Cumulo-: heap

Imports and Downloads

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os

input_dir = "/kaggle/input/"
dirs = ["howard-cloudx", "clouds", "cloudtest"]
for d in dirs:
    # os.walk visits every sub-directory, so one line is printed per directory
    for dirname, _, filenames in os.walk(os.path.join(input_dir, d)):
        print(f"Number of files loaded:{len(filenames)}")

Number of files loaded:0
Number of files loaded:0
Number of files loaded:0
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:0
Number of files loaded:106
Number of files loaded:112
Number of files loaded:107
Number of files loaded:100
Number of files loaded:162
Number of files loaded:104
Number of files loaded:106
Number of files loaded:163
Number of files loaded:100
Number of files loaded:111
Number of files loaded:0
Number of files loaded:0
Number of files loaded:0
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:0
Number of files loaded:106
Number of files loaded:112
Number of files loaded:107
Number of files loaded:100
Number of files loaded:162
Number of files loaded:106
Number of files loaded:163
Number of files loaded:100
Number of files loaded:3

Install timm to get access to additional PyTorch models

! pip install timm

Collecting timm
  Downloading timm-0.6.7-py3-none-any.whl (509 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 510.0/510.0 kB 2.1 MB/s eta 0:00:00a 0:00:01
Requirement already satisfied: torchvision in /opt/conda/lib/python3.7/site-packages (from timm) (0.12.0)
Requirement already satisfied: torch>=1.4 in /opt/conda/lib/python3.7/site-packages (from timm) (1.11.0)
Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.7/site-packages (from torch>=1.4->timm) (4.3.0)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /opt/conda/lib/python3.7/site-packages (from torchvision->timm) (9.1.1)
Requirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from torchvision->timm) (1.21.6)
Requirement already satisfied: requests in /opt/conda/lib/python3.7/site-packages (from torchvision->timm) (2.28.1)
Requirement already satisfied: charset-normalizer<3,>=2 in /opt/conda/lib/python3.7/site-packages (from requests->torchvision->timm) (2.1.0)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests->torchvision->timm) (3.3)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests->torchvision->timm) (1.26.12)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests->torchvision->timm) (2022.6.15)
Installing collected packages: timm
Successfully installed timm-0.6.7
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

from fastai.vision.all import *      # DataBlock, ImageDataLoaders, vision_learner, aug_transforms, ...
from fastai.vision.widgets import *  # notebook widgets
import timm                          # lets vision_learner build timm models from a name string

Set the relevant paths

path = Path('../input/howard-cloudx/Howard-Cloud-X/')
path.ls()

(#2) [Path('../input/howard-cloudx/Howard-Cloud-X/test'),Path('../input/howard-cloudx/Howard-Cloud-X/train')]

train = (path/"train")
valid = (path/"test")

len(train.ls())

10

fnames = get_image_files(path)

fnames

(#1420) [Path('../input/howard-cloudx/Howard-Cloud-X/test/Cirroculumulus/7bc6ae9d-9f32-4e42-8927-5561fa5d6cf5.jpg'),Path('../input/howard-cloudx/Howard-Cloud-X/test/Cirroculumulus/7aa35bb4-4779-4d28-8bc7-f6ebde621dc1.jpg'),Path('../input/howard-cloudx/Howard-Cloud-X/test/Cirroculumulus/0b7f52be-982c-4d98-b8f8-2630dbd77472.jpg'),Path('../input/howard-cloudx/Howard-Cloud-X/test/Cirroculumulus/8bd1cfbb-663e-4760-ae12-dc4a35f84648.jpg'),Path('../input/howard-cloudx/Howard-Cloud-X/test/Cirroculumulus/584d01a2-a9c8-4160-8186-434b617d7c3f.jpg'),Path('../input/howard-cloudx/Howard-Cloud-X/test/Cirroculumulus/5d144b98-9413-49e5-8cd6-ff866b83eec5.jpg'),Path('../input/howard-cloudx/Howard-Cloud-X/test/Cirroculumulus/5bde98f2-49c0-476f-be6c-52172ecb120c.jpg'),Path('../input/howard-cloudx/Howard-Cloud-X/test/Cirroculumulus/3eb68ded-346c-4003-b02f-615e37e100e9.jpg'),Path('../input/howard-cloudx/Howard-Cloud-X/test/Cirroculumulus/00a82e59-a751-4668-81fe-1b1ad638749f.jpg'),Path('../input/howard-cloudx/Howard-Cloud-X/test/Cirroculumulus/1b7312b1-d774-4a03-8fa3-25773ac6f358.jpg')...]

dblock = DataBlock()
dsets = dblock.datasets(fnames)
dsets.train[0]
len(dsets.train), len(dsets.valid)

(1136, 284)
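The 1136/284 split above is what a random 80/20 split of the 1420 files produces. Below is a minimal pure-Python sketch of such a split, on the assumption that an empty DataBlock falls back to a random splitter with a 20% validation share (the counts are consistent with that). The function name `random_split` and the seed are made up for illustration and are not part of fastai.

```python
import random

def random_split(n_items, valid_pct=0.2, seed=None):
    """Shuffle the item indices and hold out the first `valid_pct` share for validation."""
    rng = random.Random(seed)
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)      # number of validation items
    return idxs[cut:], idxs[:cut]       # (train indices, validation indices)

# 1420 is the number of image files found under `path` above
train_idx, valid_idx = random_split(1420, valid_pct=0.2, seed=42)
print(len(train_idx), len(valid_idx))   # 1136 284
```

Because the split is random over all files, images from both the train and test folders can end up on either side of it.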

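Attempt 1
Use a vanilla DataBlock, following the standard tutorials, and see what result it gives. Note that get_image_files(path) picks up both the train and test folders and RandomSplitter holds out a random 20% of them, so the validation set here is not the test folder used in the later attempts.
Model used - ResNet-34
Error - 45% (about 55% accuracy)

Item transforms are needed because the images come in different sizes. A batch is built by stacking the images into a single tensor, which only works if they all have the same dimensions, so every image is resized first.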

dblock = DataBlock(blocks = (ImageBlock, CategoryBlock),
    get_items = get_image_files, get_y = parent_label, splitter  = RandomSplitter(), item_tfms = Resize(300, method = ResizeMethod.Squish),
                  batch_tfms=aug_transforms(size= 224))

We also use batch transforms, which perform the data augmentation here. Each batch of images is augmented (flipped, rotated, randomly cropped, darkened, lightened, etc.) and then resized to the size we supplied.

dsets = dblock.datasets(path)

dsets.vocab

['Altocumulus', 'Altostratus', 'Cirroculumulus', 'Cirrostratus', 'Cirrus', 'Cumulonimbus', 'Cumulus', 'Nimbostratus', 'Stratocumulus', 'Stratus']

dls = dblock.dataloaders(path)
dls.show_batch()

learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.lr_find()

Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to /root/.cache/torch/hub/checkpoints/resnet34-b627a593.pth

SuggestedLRs(valley=0.0005754399462603033)

learn.fine_tune(5, 3e-3)

A 45% error rate is high

Call the garbage collector between attempts, as these models are large

import gc
_ = gc.collect()

Attempt 2
Use ImageDataLoaders and pass the train and test folders separately, so that the test folder acts as the validation set.
Model used - convnext_small_in22k
Accuracy - 59-60%

dls_2 = ImageDataLoaders.from_folder(path, train = "train", valid = "test",vocab = dsets.vocab, bs = 64, item_tfms = Resize(300, method = ResizeMethod.Squish),
                  batch_tfms=aug_transforms(size= 224))
                  #batch_tfms=[*aug_transforms(size=224),Normalize.from_stats(*imagenet_stats)])

dls_2.show_batch()

learn_2 = vision_learner(dls_2, "convnext_small_in22k", model_dir="/tmp/model/", metrics=accuracy)

Downloading: "https://dl.fbaipublicfiles.com/convnext/convnext_small_22k_224.pth" to /root/.cache/torch/hub/checkpoints/convnext_small_22k_224.pth

learn_2.fine_tune(10, 3e-3)

learn_2.fine_tune(10, 3e-3)

learn_2.fine_tune(5, 3e-3)

Validation loss is very bumpy

learn_2.show_results()

Attempt 3
Similar to Attempt 2, but with a smaller batch size (32), larger image sizes, and a different model
Model used - Convnext_tiny_hnf
Accuracy - 59%

dls_3 = ImageDataLoaders.from_folder(path, train = "train", valid = "test",vocab = dsets.vocab, bs = 32, item_tfms = Resize(400, method = ResizeMethod.Squish),
                  batch_tfms=aug_transforms(size= 300))
                  #batch_tfms=[*aug_transforms(size=224),Normalize.from_stats(*imagenet_stats)])

learn_3 = vision_learner(dls_3, "convnext_tiny_hnf", model_dir="/tmp/model/", metrics=accuracy)

Downloading: "https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-rsb-weights/convnext_tiny_hnf_a2h-ab7e9df2.pth" to /root/.cache/torch/hub/checkpoints/convnext_tiny_hnf_a2h-ab7e9df2.pth

learn_3.fine_tune(10, 3e-3)

learn_3.fine_tune(5, 3e-3)

Not much improvement here

learn_3.show_results()

Many classes get confused with one another; ClassificationInterpretation shows which ones

interp = ClassificationInterpretation.from_learner(learn_3)
interp.most_confused(min_val=3)

[('Altocumulus', 'Cirroculumulus', 7),
 ('Stratus', 'Stratocumulus', 7),
 ('Altocumulus', 'Altostratus', 6),
 ('Stratocumulus', 'Stratus', 6),
 ('Cirrostratus', 'Altostratus', 5),
 ('Cirrostratus', 'Cirrus', 5),
 ('Cirrus', 'Cirrostratus', 5),
 ('Cumulus', 'Cumulonimbus', 4),
 ('Cirrostratus', 'Altocumulus', 3),
 ('Cirrostratus', 'Cirroculumulus', 3),
 ('Cumulonimbus', 'Cumulus', 3)]

Attempt 4
Train more, use default data augmentation and change model to convnext base
Model used - Convnext_base
Accuracy - 61%

dls_4 = ImageDataLoaders.from_folder(path, train = "train", valid = "test",vocab = dsets.vocab, bs = 32, item_tfms = Resize(128),
                  batch_tfms=aug_transforms())
                  #batch_tfms=[*aug_transforms(size=224),Normalize.from_stats(*imagenet_stats)])

learn_4 = vision_learner(dls_4, "convnext_base", model_dir="/tmp/model/", metrics=accuracy)

Downloading: "https://dl.fbaipublicfiles.com/convnext/convnext_base_1k_224_ema.pth" to /root/.cache/torch/hub/checkpoints/convnext_base_1k_224_ema.pth

Train once, then train for two more rounds

learn_4.fine_tune(10, 3e-3)

turns = 2
for i in range(turns):
    learn_4.fine_tune(10, 3e-3)

learn_4.show_results()

interp = ClassificationInterpretation.from_learner(learn_4)
interp.plot_confusion_matrix()

The confusion matrix shows that certain classes are prone to being misclassified. Let's make one more attempt and then see what is happening.
Altocumulus, Stratus, Stratocumulus, and Cirrostratus don't produce good results

Attempt 5
Add normalization with ImageNet statistics on top of the data augmentation
Model used - Convnext_small_in22k
Accuracy - 59%

Empty the torch cache to avoid the CUDA out of memory error

torch.cuda.empty_cache()

dls_5 = ImageDataLoaders.from_folder(path, train = "train", valid = "test",vocab = dsets.vocab, bs = 64, item_tfms = Resize(300, method = ResizeMethod.Squish),
                  #batch_tfms=aug_transforms(size= 224))
                  batch_tfms=[*aug_transforms(size=224),Normalize.from_stats(*imagenet_stats)])

learn_5 = vision_learner(dls_5, "convnext_small_in22k", model_dir="/tmp/model/", metrics=[accuracy])

turns = 2
for i in range(turns):
    learn_5.fine_tune(10, 3e-3)

learn_5.show_results()

interp = ClassificationInterpretation.from_learner(learn_5)
interp.plot_confusion_matrix()

interp.most_confused(min_val =4)

[('Cirrostratus', 'Altostratus', 7),
 ('Altocumulus', 'Cirroculumulus', 6),
 ('Stratocumulus', 'Stratus', 6),
 ('Stratus', 'Stratocumulus', 6),
 ('Altocumulus', 'Altostratus', 5),
 ('Cumulonimbus', 'Cumulus', 5),
 ('Stratocumulus', 'Nimbostratus', 5),
 ('Cumulus', 'Cumulonimbus', 4),
 ('Nimbostratus', 'Cumulus', 4)]

It turns out that even in this iteration Altocumulus, Cirrostratus, and Stratocumulus are misclassified a lot.

I am no cloud expert, so I googled the differences between these clouds.

This primer provides a quick overview.

Cirrostratus tends to be misclassified as one of the other Cirro- clouds, viz. Cirrus and Cirrocumulus. It also gets classified as Altostratus.

Similarly, Altocumulus looks a lot like Altostratus and Cirrocumulus, hence the many misclassifications.

There are other misclassifications too, but for now let's see how the model behaves when we remove two of the three noisy classes. I picked Altocumulus and Cirrostratus arbitrarily.
There should be better ways to improve performance:

- Remove noisy classes
- Balance the classes
- Get more samples per class
- Data preprocessing

For now, let's remove the noisy classes and see how the model performs

Changed Data
The clouds directory has the updated data

Attempt 6
Build the model on the updated dataset, reusing the settings from the best attempt so far (Attempt 4)
Model used - Convnext_small_in22k
Accuracy - 70%

for dirname, _, filenames in os.walk('/kaggle/input/clouds/Clouds'):
    counter = 0
    for filename in filenames:
        counter = counter + 1
        #print(os.path.join(dirname, filename))
    print(f"Number of files loaded:{counter}")

Number of files loaded:0
Number of files loaded:0
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:25
Number of files loaded:0
Number of files loaded:106
Number of files loaded:112
Number of files loaded:107
Number of files loaded:100
Number of files loaded:162
Number of files loaded:106
Number of files loaded:163
Number of files loaded:100

path_n = Path('../input/clouds/Clouds')
path_n.ls()

(#2) [Path('../input/clouds/Clouds/test'),Path('../input/clouds/Clouds/train')]

This part isn't strictly needed; one could write out the list of classes by hand instead of building dsets_n

dblock_n = DataBlock(blocks = (ImageBlock, CategoryBlock),
    get_items = get_image_files, get_y = parent_label, splitter  = RandomSplitter(), item_tfms = Resize(300, method = ResizeMethod.Squish),
                  batch_tfms=aug_transforms(size= 224))

dsets_n = dblock_n.datasets(path_n)
dsets_n.vocab

['Altostratus', 'Cirroculumulus', 'Cirrus', 'Cumulonimbus', 'Cumulus', 'Nimbostratus', 'Stratocumulus', 'Stratus']

Use the same configuration of batch size 32, image resize, and batch transformations

dls_6 = ImageDataLoaders.from_folder(path_n, train = "train", valid = "test",vocab = dsets_n.vocab, bs = 32, item_tfms = Resize(128),
                  batch_tfms=aug_transforms())

learn_6 = vision_learner(dls_6, "convnext_small_in22k", model_dir="/tmp/model/", metrics=accuracy)

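Instead of calling fine_tune, run its two stages by hand: train the new head for a few epochs with the pretrained body frozen, then unfreeze and train the entire model. The pretrained weights are still the starting point, so this is not training from scratch.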

learn_6.fit_one_cycle(3, 3e-3)

learn_6.unfreeze()

learn_6.fit_one_cycle(10)

Not much improvement, fine tune to see if there is any change.

learn_6.fine_tune(10, 3e-3)

learn_6.fine_tune(10, 3e-3)

Validation loss is very bumpy
It can mean that the learning rate isn't set correctly.

learn_6.recorder.plot_loss()

learn_6.show_results()

The Confusion Matrix looks a little better than before

interp = ClassificationInterpretation.from_learner(learn_6)
interp.plot_confusion_matrix()

Attempt 7
Use the same ConvNeXt-small architecture with ImageNet-1k pretrained weights (instead of the ImageNet-22k ones) and use different learning rates
Model used - Convnext_small
Accuracy - 72.5%

dls_7 = ImageDataLoaders.from_folder(path_n, train = "train", valid = "test",vocab = dsets_n.vocab, bs = 32, item_tfms = Resize(224),
                  batch_tfms=aug_transforms())

learn_7 = vision_learner(dls_7, "convnext_small", model_dir="/tmp/model/", metrics=accuracy)

Downloading: "https://dl.fbaipublicfiles.com/convnext/convnext_small_1k_224_ema.pth" to /root/.cache/torch/hub/checkpoints/convnext_small_1k_224_ema.pth

learn_7.lr_find()

SuggestedLRs(valley=0.0014454397605732083)

This has been a good lesson: use a larger learning rate at the start of training and decrease it progressively.
fastai supports discriminative learning rates (a different rate for each layer group) through slice, but here we simply lower a single learning rate by hand between training rounds

lr = [2e-2, 3e-3]
for rate in lr:
    learn_7.fine_tune(10, rate)
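For comparison with the manual schedule above, here is a rough pure-Python sketch of what a discriminative learning rate does when it is passed as slice(low, high): the earliest layer group gets the lowest rate, the head gets the highest, and the groups in between are spaced evenly on a log scale. This mirrors my understanding of fastai's behaviour and should be read as an assumption, not as the library's code. `spread_lr` is a hypothetical helper name, and the three layer groups are only an example.

```python
def even_mults(start, stop, n):
    """Return n values from start to stop, evenly spaced on a log scale."""
    if n == 1:
        return [stop]
    step = (stop / start) ** (1 / (n - 1))
    return [start * step ** i for i in range(n)]

def spread_lr(lr, n_groups):
    """Expand a learning rate (float or slice) into one rate per layer group."""
    if isinstance(lr, slice):
        if lr.start:
            return even_mults(lr.start, lr.stop, n_groups)
        # slice(None, high): assumed to give every earlier group high / 10
        return [lr.stop / 10] * (n_groups - 1) + [lr.stop]
    return [lr] * n_groups

print(spread_lr(slice(1e-5, 1e-3), 3))   # lowest rate for the earliest layers
print(spread_lr(3e-3, 3))                # a plain float: same rate everywhere
```

With fastai itself this would be requested with a call such as learn_7.fit_one_cycle(10, lr_max=slice(1e-5, 1e-3)); the values are illustrative and were not tried in this notebook.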

Still a little bumpy validation loss but accuracy seems stable

learn_7.recorder.plot_loss()

learn_7.show_results()

interp = ClassificationInterpretation.from_learner(learn_7)
interp.plot_confusion_matrix()

learn_7.loss_func

FlattenedLoss of CrossEntropyLoss()

learn_7.opt

<fastai.optimizer.Optimizer at 0x7f3511212d10>

Attempt 8 - Final attempt
Change the loss function; Decrease the size of the model
Model used - Convnext_tiny
Accuracy - 71.5%

dls_8 = ImageDataLoaders.from_folder(path_n, train = "train", valid = "test",vocab = dsets_n.vocab, bs = 32, item_tfms = Resize(224),
                  batch_tfms=aug_transforms(size= 128))

Cross-entropy loss (the default used so far) penalises predictions that are confident but wrong.

Focal loss, discussed in this paper https://arxiv.org/pdf/1708.02002.pdf, gives more control over the penalisation through a gamma parameter. With gamma = 0 it reduces to cross-entropy loss; raising gamma (values between 0 and 2 are considered here) changes this behaviour.

Higher values of gamma down-weight the contribution of easy, well-classified examples to the loss, so training concentrates on the hard ones
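To make the effect of gamma concrete, the sketch below evaluates the focal-loss formula from the paper for a single example, where p_true is the probability the model gives to the correct class. It is a simplified illustration in plain Python: it leaves out the optional alpha class weighting and it is not fastai's FocalLossFlat implementation.

```python
import math

def cross_entropy(p_true):
    """Cross-entropy for one example, given the probability of the true class."""
    return -math.log(p_true)

def focal_loss(p_true, gamma=1.5):
    """Focal loss: cross-entropy scaled by (1 - p_true) ** gamma."""
    return (1 - p_true) ** gamma * cross_entropy(p_true)

# An easy example (p_true = 0.9) is shrunk far more than a hard one (p_true = 0.1)
for p in (0.9, 0.6, 0.1):
    ce = cross_entropy(p)
    fl = focal_loss(p, gamma=1.5)
    print(f"p_true={p:.1f}  CE={ce:.4f}  focal={fl:.4f}  focal/CE={fl / ce:.3f}")
```

The focal/CE column is simply (1 - p_true) ** gamma, so with gamma = 0 it is 1 for every example and the loss is plain cross-entropy.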

learn_8 = vision_learner(dls_8, "convnext_tiny", model_dir="/tmp/model/", loss_func= FocalLossFlat(gamma = 1.5),metrics=accuracy)

Just checking whether loss function has updated or not!!! (Trust issues) :P

learn_8.loss_func

FlattenedLoss of FocalLoss()

learn_8.lr_find()

SuggestedLRs(valley=0.0004786300996784121)

Lower the learning rate between rounds again, and this time also train for fewer epochs at the lower rates

lr = [2e-2, 3e-3, 4e-4]
epoch_lst = [10, 8, 7]
for epochs, rate in zip(epoch_lst, lr):
    learn_8.fine_tune(epochs, rate)

learn_8.show_results()

Hmmm...okayish results here, still many misclassifications in Stratus and Stratocumulus.

interp = ClassificationInterpretation.from_learner(learn_8)
interp.plot_confusion_matrix()

Of all the classes, Nimbostratus should not have any misclassifications: Nimbostratus are the dark clouds seen on rainy days!

interp.most_confused(min_val =4)

[('Cumulonimbus', 'Cumulus', 5),
 ('Stratocumulus', 'Stratus', 5),
 ('Cumulus', 'Cumulonimbus', 4),
 ('Stratus', 'Cumulonimbus', 4),
 ('Stratus', 'Stratocumulus', 4)]
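The list above can be reproduced by hand from the confusion matrix. The sketch below is a plain-Python stand-in for most_confused, not fastai's code: it assumes rows are the actual class and columns the predicted class, and it copies the matrix that is printed for learn_8 further down in this notebook together with the eight-class vocab.

```python
vocab = ['Altostratus', 'Cirroculumulus', 'Cirrus', 'Cumulonimbus',
         'Cumulus', 'Nimbostratus', 'Stratocumulus', 'Stratus']

# Confusion matrix of learn_8 (rows: actual class, columns: predicted class)
cm = [[22,  0,  2,  0,  0,  0,  0,  1],
      [ 2, 20,  2,  0,  1,  0,  0,  0],
      [ 1,  2, 22,  0,  0,  0,  0,  0],
      [ 0,  0,  0, 17,  5,  1,  2,  0],
      [ 0,  0,  0,  4, 20,  1,  0,  0],
      [ 0,  0,  1,  1,  2, 17,  2,  2],
      [ 1,  0,  1,  0,  3,  2, 13,  5],
      [ 0,  1,  0,  4,  2,  2,  4, 12]]

def most_confused(cm, vocab, min_val=1):
    """List (actual, predicted, count) for off-diagonal cells, largest count first."""
    pairs = [
        (vocab[actual], vocab[predicted], count)
        for actual, row in enumerate(cm)
        for predicted, count in enumerate(row)
        if actual != predicted and count >= min_val
    ]
    return sorted(pairs, key=lambda item: item[2], reverse=True)

for item in most_confused(cm, vocab, min_val=4):
    print(item)
```

Each tuple reads as (actual class, predicted class, number of validation images), so ('Cumulonimbus', 'Cumulus', 5) means five Cumulonimbus images were predicted as Cumulus.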

Testing Phase

Let's test on unseen data. All these images have been taken from the internet.
Please upload more images in the cloudtest directory for testing purpose

A Cirrus Cloud

img_2 = PILImage.create("../input/cloudtest/Screenshot 2022-09-02 at 10.44.33 PM.png")
img_2

learn_8.predict(img_2)

('Cirrus',
 TensorBase(2),
 TensorBase([1.7253e-07, 3.9383e-04, 9.9961e-01, 1.5218e-07, 5.4315e-07, 2.1076e-08,
         1.6901e-07, 2.4720e-07]))

Another Cirrus

img_3 = PILImage.create("../input/cloudtest/Screenshot 2022-09-02 at 11.35.30 PM.png")
img_3

learn_8.predict(img_3)

('Cirrus',
 TensorBase(2),
 TensorBase([6.0119e-04, 3.1065e-01, 6.7279e-01, 1.9115e-04, 1.5691e-04, 5.5428e-03,
         4.2583e-03, 5.8062e-03]))

A Cumulus Cloud

img_4 = PILImage.create("../input/cloudtest/Screenshot 2022-09-02 at 11.36.15 PM.png")
img_4

learn_8.predict(img_4)

('Cumulus',
 TensorBase(4),
 TensorBase([1.1306e-05, 1.1890e-06, 2.5026e-06, 1.8337e-03, 9.9480e-01, 3.3508e-03,
         1.7057e-06, 2.3660e-07]))

The model classified all three unseen images correctly.
It is only 71.5% accurate on the validation set, though, so it is bound to fail on roughly 30% of images unless we make improvements.
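The three values returned by predict are the label, its index in the vocab, and one probability per class in vocab order. As a small illustration in plain Python, the probabilities printed for the second Cirrus image can be paired with the class names to see the runner-up as well; `top_k` is a made-up helper name.

```python
vocab = ['Altostratus', 'Cirroculumulus', 'Cirrus', 'Cumulonimbus',
         'Cumulus', 'Nimbostratus', 'Stratocumulus', 'Stratus']

# Probabilities copied from the learn_8.predict(img_3) output above
probs = [6.0119e-04, 3.1065e-01, 6.7279e-01, 1.9115e-04,
         1.5691e-04, 5.5428e-03, 4.2583e-03, 5.8062e-03]

def top_k(probs, vocab, k=2):
    """Return the k most probable (label, probability) pairs."""
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

for label, p in top_k(probs, vocab):
    print(f"{label}: {p:.1%}")
# Cirrus: 67.3%
# Cirroculumulus: 31.1%
```

So the second image was a much closer call than the first one, with Cirroculumulus taking almost a third of the probability mass.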

Let's check how the validation loss is behaving in the last model

learn_8.recorder.plot_loss()

It's empty because the recorder object is somehow empty.
Strange!!

learn_8.recorder.losses

[]

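Dirty work: manually build a dataframe of train_loss, valid_loss and accuracy from the training output above and plot it. The rows below are the ten epochs of the first training round followed by the last five epochs of the third round; the second round is left out.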

data = {
    'Epochs': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    'train_loss': [1.365729, 1.218585, 1.123894, 1.093431, 0.954352,
                   0.795425, 0.675656, 0.543923, 0.431559, 0.366880,
                   0.168368, 0.161951, 0.163574, 0.162362, 0.164124],
    'valid_loss': [1.222900, 0.952413, 1.245164, 1.219178, 1.016823,
                   0.980452, 0.715877, 0.732847, 0.700989, 0.680698,
                   0.690924, 0.702923, 0.694562, 0.700117, 0.698838],
    'accuracy': [0.505000, 0.570000, 0.545000, 0.585000, 0.605000,
                 0.615000, 0.660000, 0.685000, 0.705000, 0.700000,
                 0.710000, 0.710000, 0.710000, 0.705000, 0.715000],
}

df = pd.DataFrame(data)

df
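Typing the numbers in by hand is error-prone. If the printed training table is copied out as text, it could be parsed instead. The sketch below assumes the copied log is tab-separated with the same column names as the fastai output in this notebook; `parse_log` is a hypothetical helper, and only the first three rows of the Attempt 8 log are included to keep it short.

```python
import csv
import io

# First rows of the Attempt 8 log, as tab-separated text
log_text = (
    "epoch\ttrain_loss\tvalid_loss\taccuracy\ttime\n"
    "0\t1.365729\t1.222900\t0.505000\t00:43\n"
    "1\t1.218585\t0.952413\t0.570000\t00:44\n"
    "2\t1.123894\t1.245164\t0.545000\t00:44\n"
)

def parse_log(text):
    """Turn a tab-separated training log into a dict of columns (time is dropped)."""
    history = {"Epochs": [], "train_loss": [], "valid_loss": [], "accuracy": []}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        history["Epochs"].append(int(row["epoch"]))
        for name in ("train_loss", "valid_loss", "accuracy"):
            history[name].append(float(row[name]))
    return history

history = parse_log(log_text)
print(history["valid_loss"])   # [1.2229, 0.952413, 1.245164]
```

The resulting dict has the same layout as data above, so it can be passed to pd.DataFrame in the same way.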

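This gives a decent picture of how the validation loss decreased and the accuracy increased along with it.

I prefer this version to Attempt 7: the validation loss is much lower and accuracy doesn't suffer greatly. Bear in mind, though, that the two runs use different loss functions (focal loss here, cross-entropy in Attempt 7), and focal loss is never larger than cross-entropy for the same prediction, so the loss values are not directly comparable.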

plt.figure(figsize=(10, 10))
plt.xlabel("Epochs")
plt.ylabel("Loss and Accuracy")
plt.plot(df['train_loss'], label = "train_loss")
plt.plot(df['valid_loss'], label = "valid_loss")
plt.plot(df['accuracy'], label = "accuracy")
plt.legend()
plt.show()

learn_8.recorder.metric_names

(#5) ['epoch','train_loss','valid_loss','accuracy','time']

cm = interp.confusion_matrix()
interp.plot_confusion_matrix()
cm

array([[22,  0,  2,  0,  0,  0,  0,  1],
       [ 2, 20,  2,  0,  1,  0,  0,  0],
       [ 1,  2, 22,  0,  0,  0,  0,  0],
       [ 0,  0,  0, 17,  5,  1,  2,  0],
       [ 0,  0,  0,  4, 20,  1,  0,  0],
       [ 0,  0,  1,  1,  2, 17,  2,  2],
       [ 1,  0,  1,  0,  3,  2, 13,  5],
       [ 0,  1,  0,  4,  2,  2,  4, 12]])

As this is a classification problem, let's check the precision, recall, and F1 score

tp = cm.diagonal()

fn = cm.sum(1) - tp
fp = cm.sum(0) - tp

precision = tp / (tp + fp)
recall = tp / (tp + fn)

recall

array([0.88, 0.8 , 0.88, 0.68, 0.8 , 0.68, 0.52, 0.48])

Precision: of all the images the model assigned to a class, the fraction that truly belong to it, tp / (tp + fp). Recall: of all the images that truly belong to a class, the fraction the model assigned to it, tp / (tp + fn).

precision

array([0.84615385, 0.86956522, 0.78571429, 0.65384615, 0.60606061,
       0.73913043, 0.61904762, 0.6       ])

An ideal system with high precision and high recall will return many results, with all results labeled correctly.

np.sum(precision), np.sum(recall)

(5.719518162996424, 5.720000000000001)

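# Macro average: the unweighted mean of the per-class values (the sums above divided by the 8 classes)
print("Precision for the Model : ", round(np.mean(precision), 3))
print("Recall for the Model :", round(np.mean(recall), 3))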

Precision for the Model :  0.715
Recall for the Model : 0.715

The F1 score is the harmonic mean of precision and recall, so it is high only when both are high; a value close to 1 is desirable

f1 = (2* precision * recall)/(precision+recall)
f1

array([0.8627451 , 0.83333333, 0.83018868, 0.66666667, 0.68965517,
       0.70833333, 0.56521739, 0.53333333])

np.mean(f1)

0.7111841259586632

A macro-averaged F1 score of 0.711. It's not great, but OK!
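As a cross-check of the numbers above, the same metrics can be recomputed from the confusion matrix without NumPy. This is only a sketch of the standard definitions (rows are actual classes, columns are predicted classes), with the macro averages taken as the plain mean over the eight classes.

```python
# Confusion matrix of learn_8 (rows: actual class, columns: predicted class)
cm = [[22,  0,  2,  0,  0,  0,  0,  1],
      [ 2, 20,  2,  0,  1,  0,  0,  0],
      [ 1,  2, 22,  0,  0,  0,  0,  0],
      [ 0,  0,  0, 17,  5,  1,  2,  0],
      [ 0,  0,  0,  4, 20,  1,  0,  0],
      [ 0,  0,  1,  1,  2, 17,  2,  2],
      [ 1,  0,  1,  0,  3,  2, 13,  5],
      [ 0,  1,  0,  4,  2,  2,  4, 12]]

n = len(cm)
tp = [cm[i][i] for i in range(n)]                                  # correct per class
actual = [sum(row) for row in cm]                                  # tp + fn
predicted = [sum(cm[r][c] for r in range(n)) for c in range(n)]    # tp + fp

precision = [tp[i] / predicted[i] for i in range(n)]
recall = [tp[i] / actual[i] for i in range(n)]
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

def macro(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)

accuracy = sum(tp) / sum(actual)
print(f"accuracy        {accuracy:.3f}")           # 0.715
print(f"macro precision {macro(precision):.3f}")   # 0.715
print(f"macro recall    {macro(recall):.3f}")      # 0.715
print(f"macro F1        {macro(f1):.3f}")          # 0.711
```

Macro recall equals accuracy here only because every class has the same number of validation images (25); with unbalanced classes the two would differ.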

There are many things that can be done to improve the model but would require a long time and most of it would be in data prep. For now, I am moving on and might come back to check on it again.

Training logs
The tables below are the per-epoch outputs of the training calls above, in notebook order, followed by the printout of df.

Attempt 1: learn.fine_tune(5, 3e-3)

epoch	train_loss	valid_loss	error_rate	time
0	2.064874	1.747166	0.609155	00:48
1	1.752833	1.704080	0.510563	00:48
2	1.523126	1.648633	0.482394	00:47
3	1.290462	1.527772	0.461268	00:46
4	1.103239	1.472800	0.450704	00:48

Attempt 2: two rounds of learn_2.fine_tune(10, 3e-3), then learn_2.fine_tune(5, 3e-3)

epoch	train_loss	valid_loss	accuracy	time
0	1.785524	1.654420	0.496000	01:16
1	1.653368	1.497555	0.532000	01:15
2	1.462549	1.604444	0.532000	01:12
3	1.318390	1.526556	0.508000	00:56
4	1.201218	1.520712	0.536000	00:58
5	1.087901	1.508507	0.560000	00:57
6	0.971522	1.487124	0.544000	00:57
7	0.869471	1.502575	0.544000	01:00
8	0.774482	1.483035	0.556000	01:00
9	0.720841	1.477164	0.556000	00:59

epoch	train_loss	valid_loss	accuracy	time
0	0.611424	1.463446	0.584000	00:57
1	0.557592	1.490726	0.584000	00:57
2	0.535933	1.540918	0.580000	00:56
3	0.532068	1.563381	0.552000	00:56
4	0.505497	1.657638	0.536000	00:57
5	0.488271	1.666002	0.564000	00:56
6	0.449273	1.650724	0.560000	00:56
7	0.415201	1.592716	0.556000	00:57
8	0.380137	1.578643	0.556000	00:57
9	0.358816	1.575698	0.552000	00:56

epoch	train_loss	valid_loss	accuracy	time
0	0.370500	1.589147	0.556000	01:14
1	0.364840	1.605959	0.568000	01:14
2	0.338760	1.648296	0.580000	01:16
3	0.315394	1.643008	0.600000	01:16
4	0.281450	1.620948	0.592000	01:16

Attempt 3: learn_3.fine_tune(10, 3e-3), then learn_3.fine_tune(5, 3e-3)

epoch	train_loss	valid_loss	accuracy	time
0	1.705282	1.659915	0.464000	01:03
1	1.510822	1.677615	0.500000	01:02
2	1.389804	1.828296	0.484000	01:05
3	1.243197	1.696750	0.548000	01:02
4	1.028381	1.620666	0.536000	01:04
5	0.865795	1.609255	0.596000	01:04
6	0.722874	1.573360	0.572000	01:05
7	0.598872	1.562924	0.576000	01:05
8	0.526616	1.588943	0.596000	01:05
9	0.483461	1.561903	0.572000	01:02

epoch	train_loss	valid_loss	accuracy	time
0	0.511615	1.587777	0.580000	01:03
1	0.579787	1.852812	0.564000	01:05
2	0.533436	1.827137	0.576000	01:03
3	0.445922	1.816562	0.576000	01:03
4	0.378245	1.790648	0.596000	01:05

Attempt 4: three rounds of learn_4.fine_tune(10, 3e-3)

epoch	train_loss	valid_loss	accuracy	time
0	2.147752	1.833840	0.404000	00:42
1	1.912063	1.745342	0.456000	00:42
2	1.813161	1.781601	0.488000	00:43
3	1.630206	1.714715	0.476000	00:41
4	1.467049	1.503493	0.528000	00:42
5	1.309382	1.521237	0.560000	00:42
6	1.134238	1.565486	0.556000	00:42
7	0.990034	1.484479	0.544000	00:42
8	0.902086	1.504959	0.536000	00:42
9	0.880721	1.513757	0.540000	00:41

epoch	train_loss	valid_loss	accuracy	time
0	0.912602	1.510895	0.536000	00:42
1	0.865395	1.549080	0.576000	00:42
2	0.875826	1.717728	0.512000	00:42
3	0.862483	1.573677	0.548000	00:41
4	0.786304	1.639315	0.556000	00:42
5	0.735020	1.523650	0.596000	00:42
6	0.648714	1.536606	0.560000	00:42
7	0.574546	1.554603	0.568000	00:43
8	0.524813	1.570686	0.576000	00:44
9	0.495079	1.571282	0.576000	00:43

epoch	train_loss	valid_loss	accuracy	time
0	0.547203	1.604538	0.572000	00:43
1	0.494175	1.651787	0.580000	00:42
2	0.553949	1.665809	0.580000	00:42
3	0.550795	1.639198	0.588000	00:44
4	0.509917	1.724905	0.564000	00:42
5	0.497022	1.643369	0.580000	00:42
6	0.448560	1.632840	0.596000	00:43
7	0.384618	1.619914	0.604000	00:42
8	0.346280	1.624194	0.600000	00:43
9	0.329392	1.632581	0.612000	00:42

Attempt 5: two rounds of learn_5.fine_tune(10, 3e-3)

epoch	train_loss	valid_loss	accuracy	time
0	1.765770	1.708579	0.516000	00:59
1	1.627048	1.535210	0.588000	01:00
2	1.466308	1.609761	0.532000	01:01
3	1.335094	1.660312	0.560000	01:01
4	1.179072	1.543942	0.544000	00:59
5	1.062035	1.544576	0.568000	00:59
6	0.941348	1.508574	0.556000	00:59
7	0.847385	1.469271	0.576000	00:59
8	0.760762	1.449720	0.584000	01:00
9	0.686221	1.446685	0.584000	00:59

epoch	train_loss	valid_loss	accuracy	time
0	0.647399	1.456817	0.584000	01:00
1	0.600847	1.467131	0.580000	00:58
2	0.592070	1.505024	0.588000	00:59
3	0.560953	1.572787	0.608000	00:59
4	0.530090	1.550739	0.588000	00:59
5	0.501039	1.511042	0.612000	01:00
6	0.453741	1.523643	0.580000	01:01
7	0.410653	1.509357	0.596000	00:59
8	0.377629	1.516859	0.584000	01:00
9	0.350783	1.506683	0.588000	00:59

Attempt 6: learn_6.fit_one_cycle(3, 3e-3), learn_6.fit_one_cycle(10) after unfreezing, then two rounds of learn_6.fine_tune(10, 3e-3)

epoch	train_loss	valid_loss	accuracy	time
0	2.518435	1.781701	0.555000	00:30
1	1.950543	1.266906	0.595000	00:28
2	1.560208	1.127361	0.620000	00:26

epoch	train_loss	valid_loss	accuracy	time
0	1.218273	1.534749	0.550000	00:29
1	1.637923	3.227510	0.345000	00:29
2	1.940466	4.200366	0.140000	00:29
3	1.838173	11.386209	0.215000	00:29
4	1.644612	5.846445	0.240000	00:29
5	1.447882	2.120700	0.415000	00:29
6	1.259362	2.177001	0.495000	00:28
7	1.076406	1.193845	0.620000	00:29
8	0.922181	1.021270	0.680000	00:29
9	0.804969	1.049691	0.670000	00:30

epoch	train_loss	valid_loss	accuracy	time
0	0.701466	1.232059	0.645000	00:29
1	0.694632	1.230741	0.665000	00:29
2	0.689335	1.308101	0.650000	00:29
3	0.656844	1.264420	0.720000	00:30
4	0.618628	1.193656	0.715000	00:29
5	0.592718	1.226670	0.680000	00:29
6	0.567708	1.216985	0.695000	00:29
7	0.532835	1.179013	0.700000	00:29
8	0.509911	1.218369	0.670000	00:29
9	0.479909	1.211526	0.680000	00:29

epoch	train_loss	valid_loss	accuracy	time
0	0.552285	1.275757	0.685000	00:28
1	0.485817	1.299778	0.675000	00:29
2	0.491668	1.238084	0.690000	00:29
3	0.511293	1.329848	0.680000	00:30
4	0.494083	1.386478	0.700000	00:29
5	0.464426	1.351498	0.685000	00:29
6	0.455210	1.327508	0.685000	00:29
7	0.435786	1.300146	0.665000	00:29
8	0.406114	1.265782	0.695000	00:29
9	0.387949	1.260656	0.700000	00:29

Attempt 7: learn_7.fine_tune(10, 2e-2), then learn_7.fine_tune(10, 3e-3)

epoch	train_loss	valid_loss	accuracy	time
0	1.701880	1.546097	0.565000	00:37
1	1.466775	1.456164	0.595000	00:38
2	1.434179	1.275465	0.620000	00:37
3	1.335906	1.562193	0.580000	00:39
4	1.186197	1.245194	0.655000	00:38
5	1.023701	1.149056	0.660000	00:38
6	0.841730	1.275522	0.680000	00:37
7	0.666139	1.119972	0.700000	00:38
8	0.524536	1.110425	0.710000	00:38
9	0.438384	1.070021	0.715000	00:38

epoch	train_loss	valid_loss	accuracy	time
0	0.339435	1.150641	0.715000	00:38
1	0.332937	1.089203	0.715000	00:38
2	0.351498	1.137119	0.690000	00:38
3	0.324245	1.143893	0.720000	00:38
4	0.302303	1.170959	0.705000	00:38
5	0.279240	1.209391	0.700000	00:38
6	0.272465	1.129670	0.720000	00:38
7	0.260656	1.139702	0.720000	00:39
8	0.245559	1.133754	0.720000	00:39
9	0.235748	1.133464	0.725000	00:38

Attempt 8: learn_8.fine_tune with 10, 8 and 7 epochs at learning rates 2e-2, 3e-3 and 4e-4

epoch	train_loss	valid_loss	accuracy	time
0	1.365729	1.222900	0.505000	00:43
1	1.218585	0.952413	0.570000	00:44
2	1.123894	1.245164	0.545000	00:44
3	1.093431	1.219178	0.585000	00:42
4	0.954352	1.016823	0.605000	00:44
5	0.795425	0.980452	0.615000	00:43
6	0.675656	0.715877	0.660000	00:43
7	0.543923	0.732847	0.685000	00:43
8	0.431559	0.700989	0.705000	00:41
9	0.366880	0.680698	0.700000	00:42

epoch	train_loss	valid_loss	accuracy	time
0	0.262735	0.715863	0.700000	00:42
1	0.291509	0.663961	0.695000	00:42
2	0.267504	0.725574	0.710000	00:44
3	0.229500	0.753732	0.710000	00:42
4	0.229024	0.719795	0.710000	00:44
5	0.211879	0.703274	0.710000	00:43
6	0.201298	0.708483	0.710000	00:42
7	0.192585	0.680750	0.700000	00:43

epoch	train_loss	valid_loss	accuracy	time
0	0.183451	0.683985	0.715000	00:42
1	0.160688	0.700210	0.715000	00:43
2	0.168368	0.690924	0.710000	00:43
3	0.161951	0.702923	0.710000	00:42
4	0.163574	0.694562	0.710000	00:43
5	0.162362	0.700117	0.705000	00:42
6	0.164124	0.698838	0.715000	00:44

Output of df (the manually assembled Attempt 8 history)

	Epochs	train_loss	valid_loss	accuracy
0	0	1.365729	1.222900	0.505
1	1	1.218585	0.952413	0.570
2	2	1.123894	1.245164	0.545
3	3	1.093431	1.219178	0.585
4	4	0.954352	1.016823	0.605
5	5	0.795425	0.980452	0.615
6	6	0.675656	0.715877	0.660
7	7	0.543923	0.732847	0.685
8	8	0.431559	0.700989	0.705
9	9	0.366880	0.680698	0.700
10	10	0.168368	0.690924	0.710
11	11	0.161951	0.702923	0.710
12	12	0.163574	0.694562	0.710
13	13	0.162362	0.700117	0.705
14	14	0.164124	0.698838	0.715
