This notebook is based on the FastAI course that ran from April to May 2022; it is written to explore the "further research" questions of lesson 3 of the course, which deals with neural net foundations.

Problem statement - Classify handwritten digits with the help of FastAI's libraries.

The methodology remains the same as discussed in the chapter:

  • Define the baseline model first.
  • Define the dataloaders and other parameters required for implementing Stochastic Gradient Descent.
  • Define the loss function and the accuracy metric.
  • Fit the model using FastAI's libraries and check whether it beats the baseline model.
  • Make improvements.

Resources:

  • Chapter link: https://course.fast.ai/Lessons/lesson3.html
  • Video based on the 2020 course, where this part of the problem is discussed: https://www.youtube.com/watch?v=p50s63nPq9I&t=6605s

Imports and Downloads

from fastai.vision.all import *
from fastbook import *

matplotlib.rc('image', cmap='Greys')

Read the Data


import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# The loop below counts the files under each directory in /kaggle/input

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    # the full file listing is long, so just count the files in each directory
    print(f"Read {len(filenames)} files from the directory- {dirname}")

Read 0 files from the directory- /kaggle/input
Read 0 files from the directory- /kaggle/input/hindi-mnist
Read 0 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST
Read 0 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/test
Read 300 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/test/7
Read 300 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/test/2
Read 300 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/test/5
Read 300 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/test/8
Read 300 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/test/0
Read 300 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/test/3
Read 300 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/test/1
Read 300 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/test/4
Read 300 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/test/9
Read 300 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/test/6
Read 0 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/train
Read 1700 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/train/7
Read 1700 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/train/2
Read 1700 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/train/5
Read 1700 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/train/8
Read 1700 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/train/0
Read 1700 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/train/3
Read 1700 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/train/1
Read 1700 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/train/4
Read 1700 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/train/9
Read 1700 files from the directory- /kaggle/input/hindi-mnist/Hindi-MNIST/train/6

Define train and validation paths

train_dir = "/kaggle/input/hindi-mnist/Hindi-MNIST/train"
train_path = Path(train_dir)
valid_dir = "/kaggle/input/hindi-mnist/Hindi-MNIST/test"
valid_path = Path(valid_dir)
digit_dirs = train_path.ls().sorted()
zeroes, ones, twos, threes, fours, fives, sixes, sevens, eights, nines = [d.ls() for d in digit_dirs]
im = Image.open(sixes[0])
im
tensor(im)[4:10,4:10]
tensor([[  4,  28,  97, 185, 236, 254],
        [ 50, 164, 245, 255, 255, 255],
        [184, 251, 255, 255, 254, 236],
        [253, 255, 255, 253, 191,  79],
        [255, 255, 255, 190,  54,   7],
        [255, 255, 254, 139,  22,   1]], dtype=torch.uint8)
im.shape
(32, 32)

The size of the images is 32 x 32.

Pandas has a nice background gradient feature

im_t = tensor(im)
df = pd.DataFrame(im_t[4:15,4:22])
df.style.set_properties(**{'font-size':'6pt'}).background_gradient('Greys')
  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
0 4 28 97 185 236 254 255 255 255 255 255 255 255 255 255 255 255 233
1 50 164 245 255 255 255 255 252 234 208 181 159 154 183 235 254 255 255
2 184 251 255 255 254 236 177 107 56 29 17 11 10 21 75 185 245 254
3 253 255 255 253 191 79 23 6 1 0 0 0 0 0 7 40 98 130
4 255 255 255 190 54 7 0 0 0 0 0 0 0 0 0 1 4 6
5 255 255 254 139 22 1 0 0 0 0 0 0 0 0 0 0 0 0
6 255 255 255 213 87 25 10 5 3 4 7 8 7 4 2 0 0 0
7 248 255 255 255 239 195 156 128 111 117 138 150 140 98 42 9 1 0
8 161 245 255 255 255 255 255 254 252 253 255 255 254 247 154 34 3 0
9 45 143 245 255 255 255 255 255 255 255 255 255 255 250 149 32 2 0
10 135 216 253 255 255 255 255 255 255 255 255 244 208 123 40 7 0 0

Baseline Model

Create a model that is simple and works well enough. Ideally, it should beat chance; with 10 classes, random guessing gives only about 10% accuracy.

Distance from the mean - Create a mean image for each digit from 0 to 9, then classify an image by the mean it is closest to.
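
In miniature, the idea is the following (a sketch of my own with hypothetical names; the cells below build the real mean images and prediction function step by step):

# classify an image as the digit whose mean image it is closest to (lowest RMS distance)
# nearest_mean_class and mean_images are hypothetical, just to illustrate the idea
def nearest_mean_class(img, mean_images):
    dists = torch.stack([((img - m)**2).mean().sqrt() for m in mean_images])
    return int(torch.argmin(dists))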

Dict = {0: zeroes, 1: ones, 2: twos, 3: threes, 4: fours, 5: fives, 6: sixes, 7: sevens, 8: eights, 9: nines}
train_tensors = []
valid_tensors = []

for key in Dict:
    train_tensors.append([tensor(Image.open(o)) for o in Dict[key]])
    valid_inf =  valid_path.ls().sorted()[key].ls()
    valid_tensors.append([tensor(Image.open(o)) for o in valid_inf])
    

Keep checking the sizes of the variables involved; it is one of the best practices.

len(train_tensors), len(valid_tensors), len(train_tensors[0]), len(valid_tensors[0])
(10, 10, 1700, 300)

Stack the images for each digit class and find the mean digit (the average of all the images for that digit).

Each image is 2D with size 32x32; by stacking all 1700 images (for one digit) in the train set, we create a 3D tensor of shape 1700x32x32.

stacked_train_tensors = []
for i in range(len(train_tensors)):
    stacked_train_tensors.append((torch.stack(train_tensors[i]).float()/255))
    #print(i)
                                 
print(len(stacked_train_tensors)), print(stacked_train_tensors[0].shape)                                
10
torch.Size([1700, 32, 32])
(None, None)
stacked_train_tensors_mean = []
for i in range(len(train_tensors)):
    stacked_train_tensors_mean.append((torch.stack(train_tensors[i]).float()/255).mean(0))
    #print(i)
print(len(stacked_train_tensors_mean))
10
stacked_valid_tensors = []
for i in range(len(valid_tensors)):
    stacked_valid_tensors.append((torch.stack(valid_tensors[i]).float()/255))
    #print(i)
print(len(stacked_valid_tensors)), print(stacked_valid_tensors[0].shape)
10
torch.Size([300, 32, 32])
(None, None)
stacked_train_tensors_mean[0].shape
torch.Size([32, 32])

Plot one of the mean images; it will be blurry since it is an average.

show_image(stacked_train_tensors_mean[4])
<AxesSubplot:>
show_image(train_tensors[4][0])
<AxesSubplot:>

Take a sample image and find the distance between the image and its respective mean image, i.e. compare a 4 with the mean 4.

Here, the root mean square and mean absolute errors are calculated.
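
For reference, for two images $a$ and $b$ with $n$ pixels each:

$$\mathrm{MAE}(a,b)=\frac{1}{n}\sum_i \lvert a_i-b_i\rvert, \qquad \mathrm{RMSE}(a,b)=\sqrt{\frac{1}{n}\sum_i (a_i-b_i)^2}$$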

dist_4_abs = (train_tensors[4][0] - stacked_train_tensors_mean[4]).abs().mean()
dist_4_sqr = ((train_tensors[4][0] - stacked_train_tensors_mean[4])**2).mean().sqrt()

dist_4_abs, dist_4_sqr
(tensor(49.9770), tensor(104.4630))
dist_3_abs = (train_tensors[3][0] - stacked_train_tensors_mean[4]).abs().mean()
dist_3_sqr = ((train_tensors[3][0] - stacked_train_tensors_mean[4])**2).mean().sqrt()

dist_3_abs, dist_3_sqr
(tensor(67.0818), tensor(122.1938))

The errors check out, i.e. the distance between the mean 4 and a 4 is less than the distance between the mean 4 and a 3 (or any other digit). Let's investigate it more.

Let's define error functions.

def rms_error(a, b):
    # average over the last two (pixel) dimensions, so broadcasting leaves one error per image
    return ((a - b)**2).mean((-1, -2)).sqrt()

RMS error

for i in range(10):
    err =  rms_error(train_tensors[4][0],stacked_train_tensors_mean[i])
    print(err)
        
tensor(104.5548)
tensor(104.4847)
tensor(104.5112)
tensor(104.4911)
tensor(104.4630)
tensor(104.4989)
tensor(104.5112)
tensor(104.4960)
tensor(104.5195)
tensor(104.4974)

L1 error

for i in range(10):
    print(F.l1_loss(train_tensors[4][0].float(),stacked_train_tensors_mean[i]))
tensor(50.0465)
tensor(50.0393)
tensor(50.0182)
tensor(50.0276)
tensor(49.9770)
tensor(50.0150)
tensor(50.0189)
tensor(50.0297)
tensor(50.0313)
tensor(50.0543)

MSE/L2 error

for i in range(10):
    print(F.mse_loss(train_tensors[4][0].float(),stacked_train_tensors_mean[i]))
tensor(10931.7139)
tensor(10917.0586)
tensor(10922.5928)
tensor(10918.3809)
tensor(10912.5215)
tensor(10920.0205)
tensor(10922.5967)
tensor(10919.4229)
tensor(10924.3340)
tensor(10919.7129)

All three error functions are lowest when the sample image train_tensors[4][0] (which is a 4) is compared with the mean 4.
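
The differences are tiny, though (e.g. 104.46 vs 104.55 for the RMS error). That is because train_tensors[4][0] is still a raw uint8 image with values 0 to 255, while the mean images were divided by 255, so the distance is dominated by the un-normalised sample. Below is a quick extra check of my own, putting the sample on the same scale first; the real comparison further down uses the already-normalised stacked_valid_tensors, so it does not have this problem.

sample = train_tensors[4][0].float() / 255  # same 0-1 scale as the mean images
for i in range(10):
    print(i, rms_error(sample, stacked_train_tensors_mean[i]))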

Broadcasting happens here: despite the different shapes of the two tensors, the results are calculated.

All tensors in the validation set for a particular digit will be compared against the mean digit

print(stacked_valid_tensors[4].shape), print(stacked_train_tensors_mean[4].shape)
error = rms_error(stacked_valid_tensors[4], stacked_train_tensors_mean[4])
error.shape, error[0:15]
torch.Size([300, 32, 32])
torch.Size([32, 32])
(torch.Size([300]),
 tensor([0.3021, 0.3300, 0.2755, 0.2987, 0.3047, 0.3128, 0.3670, 0.3253, 0.3164, 0.3172, 0.3009, 0.3079, 0.3204, 0.2997, 0.2655]))
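
A quick illustration of the broadcasting rule with random stand-in tensors (an extra cell of my own):

a = torch.rand(300, 32, 32)  # stand-in for stacked_valid_tensors[4]
b = torch.rand(32, 32)       # stand-in for stacked_train_tensors_mean[4]
# b is treated as if it were expanded to (300, 32, 32), without copying any memory
(a - b).shape                               # torch.Size([300, 32, 32])
# rms_error then averages over the last two dims, leaving one error per image
((a - b)**2).mean((-1, -2)).sqrt().shape    # torch.Size([300])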
def predict_input(input_tensor):
    # error of every input image against each of the 10 mean digits
    errors_in_pred = []
    for i in range(10):
        errors = rms_error(input_tensor, stacked_train_tensors_mean[i])
        errors_in_pred.append(errors)
    # stack to shape (10, n_images) and take the index of the smallest error
    # across the first axis (0 specifies the axis), i.e. the closest mean digit
    return torch.argmin(torch.stack(errors_in_pred), 0)
y = predict_input(stacked_valid_tensors[9])
y, y.shape
(tensor([9, 9, 9, 9, 6, 9, 9, 9, 9, 9, 9, 9, 9, 6, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 1, 9, 0, 9, 9, 9, 6, 9, 9, 9, 9, 9, 9, 8, 9, 9, 9,
         9, 9, 9, 9, 8, 9, 9, 9, 9, 9, 9, 6, 6, 9, 9, 9, 9, 9, 9, 9, 6, 9, 9, 9, 1, 9, 9, 9, 9, 9, 8, 9, 9, 6, 9, 9, 9, 9, 9, 9, 1, 9, 9, 0, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 9, 9, 9, 9, 9, 9, 9, 9, 1, 9,
         9, 9, 9, 1, 9, 9, 9, 9, 9, 9, 9, 4, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 2, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
         9, 6, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 9, 9, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 4, 9, 4, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 5, 9, 9, 9, 1, 9, 9, 9, 9,
         6, 8, 0, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 4, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 9, 9, 9, 9, 1, 9]),
 torch.Size([300]))
(y == 9).float().mean()
tensor(0.8833)
accuracies = []
for i in range(10):
    # classify every validation image of digit i and record how often the prediction equals i
    preds = predict_input(stacked_valid_tensors[i])
    acc = (preds == i).float().mean()
    accuracies.append(acc)
accuracies
[tensor(0.9667),
 tensor(0.9033),
 tensor(0.7300),
 tensor(0.5867),
 tensor(0.9267),
 tensor(0.7433),
 tensor(0.8567),
 tensor(0.7900),
 tensor(0.8767),
 tensor(0.8833)]
print('baseline model accuracy:', torch.stack(accuracies).mean())
baseline model accuracy: tensor(0.8263)

About 82% baseline accuracy; let's try to beat that.

Prepare for Stochastic gradient descent

stacked_train_tensors[0][0].shape # one image from digit 0
torch.Size([32, 32])

The entire dataset in row-column format

lst = [stacked_train_tensors[i] for i in range(10)]
# one row represents one image. image is flattened to 32*32 = 1024 pixels
train_x = torch.cat(lst).view(-1, 32*32) 
train_x.shape
torch.Size([17000, 1024])
y_tensor = torch.tensor([])
for i in range(10):
    a = tensor(np.full(len(stacked_train_tensors[i]),i))
    y_tensor = torch.cat([y_tensor, a])    
    
y_tensor = y_tensor.unsqueeze(1)  

PyTorch won't accept a FloatTensor as a categorical target, so you have to cast your tensor to a LongTensor.

y_tensor
tensor([[0.],
        [0.],
        [0.],
        ...,
        [9.],
        [9.],
        [9.]])
y_tensor = y_tensor.type(torch.LongTensor)
y_tensor.shape
torch.Size([17000, 1])

This is an important step: it creates (input, output) tuples.

dset = list(zip(train_x,y_tensor))
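
A quick sanity check (an extra cell of my own, not in the original run): each element of dset is an (input, target) tuple, with the input flattened to 1024 pixels.

x0, y0 = dset[0]
x0.shape, y0   # expected: (torch.Size([1024]), tensor([0]))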

Same processing for validation set

valid_lst = [stacked_valid_tensors[i] for i in range(10)]
# one row represents one image. image is flattened to 32*32 = 1024 pixels
valid_x = torch.cat(valid_lst).view(-1, 32*32) 
valid_x.shape


valid_y_tensor = torch.tensor([])
for i in range(10):
    a = tensor(np.full(len(stacked_valid_tensors[i]),i))
    valid_y_tensor = torch.cat([valid_y_tensor, a])    
    
valid_y_tensor = valid_y_tensor.unsqueeze(1) 

valid_dset = list(zip(valid_x,valid_y_tensor))
valid_y_tensor.shape
torch.Size([3000, 1])

Testing a few things here; you can ignore this section.

def init_params(size, std=1.0): 
    return (torch.randn(size)*std).requires_grad_()

weights = init_params((32*32,1))
bias = init_params(1)
weights.shape, bias.shape
(torch.Size([1024, 1]), torch.Size([1]))
(train_x[0]*weights.T).sum() + bias
tensor([14.7308], grad_fn=<AddBackward0>)
def linear1(xb): 
    return xb@weights + bias

preds = linear1(train_x)
preds
tensor([[ 14.7308],
        [ 15.2412],
        [ 12.4542],
        ...,
        [  7.2494],
        [-17.0967],
        [  6.3935]], grad_fn=<AddBackward0>)
train_x.shape
torch.Size([17000, 1024])
preds.shape
torch.Size([17000, 1])

Testing ends here

Let's build the Neural Net model

Dataloaders

dl_train = DataLoader(dset, batch_size=256, shuffle=True)
dl_valid = DataLoader(valid_dset, batch_size=256)
dls = DataLoaders(dl_train, dl_valid)

17,000 train samples divided into batches of 256 gives 17000/256 ≈ 67 batches.

len(dls.train)
67

Loss functions and Accuracy metric

Sigmoid squashes everything to between 0 and 1, which helps when interpreting the outputs as probabilities.

def loss_func(predictions, targets):
    # the binary loss from the chapter: it assumes the targets are 0 or 1
    predictions = predictions.sigmoid()
    return torch.where(targets==1, 1-predictions, predictions).mean()

def accuracy_metric(prediction, y):
    idx = torch.argmax(prediction, axis=1)  # returns the index of the highest value per row
    return (idx==y.T).float().mean()
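
A tiny toy check of accuracy_metric (toy values of my own): with two samples and three classes, argmax picks classes 1 and 0, so only the first prediction matches its target and the accuracy is 0.5.

preds_toy = tensor([[0.1, 2.0, -1.0],
                    [3.0, 0.0,  0.5]])  # 2 samples, 3 "classes"
ys_toy = tensor([[1], [2]])             # targets in the same (n, 1) shape as y_tensor
accuracy_metric(preds_toy, ys_toy)      # tensor(0.5000)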

Define the model. We'll use 30 neurons in the hidden layer: 1024 inputs, 30 hidden and 10 outputs (because there are 10 classes, 0 to 9).

model = nn.Sequential(
    nn.Linear(32*32, 30), # 1024 input features and 30 output features
    nn.ReLU(),
    nn.Linear(30,10),
)
learn_loss_func = Learner(dls, model, loss_func=loss_func, opt_func=SGD, metrics=accuracy_metric)
learn_loss_func.fit(n_epoch=10, lr=0.1)
epoch train_loss valid_loss accuracy_metric time
0 0.214083 0.123915 0.091667 00:00
1 0.133767 0.108301 0.091000 00:00
2 0.113340 0.104889 0.090667 00:00
3 0.108106 0.103433 0.090667 00:00
4 0.103318 0.102633 0.090333 00:00
5 0.102949 0.102128 0.089667 00:00
6 0.102660 0.101779 0.090667 00:00
7 0.102763 0.101527 0.092333 00:00
8 0.102156 0.101335 0.095333 00:00
9 0.101422 0.101184 0.097000 00:00

This model didn't learn at all, did it?

def softmax_loss(prediction, y):
    # intended as the mean softmax probability assigned to the correct class
    soft_m = torch.softmax(prediction, dim=1)
    index = tensor(range(len(y)))
    return soft_m[index.long(), y.long()].mean()
learn_softmax = Learner(dls, model, loss_func=softmax_loss, opt_func=SGD, metrics=accuracy_metric)
learn_softmax.fit(n_epoch=10, lr=0.1)
epoch train_loss valid_loss accuracy_metric time
0 0.100436 0.099736 0.098333 00:00
1 0.100521 0.099690 0.096000 00:00
2 0.100557 0.099638 0.100000 00:00
3 0.100653 0.099551 0.098667 00:01
4 0.100606 0.099459 0.100667 00:00
5 0.100483 0.099377 0.100333 00:00
6 0.100640 0.099186 0.099333 00:00
7 0.100577 0.099071 0.099000 00:00
8 0.100639 0.098888 0.097000 00:00
9 0.100628 0.098917 0.096333 00:00

This doesn't work either; it's not learning.

I cheated and looked at the forums; people said there can be precision issues, so use something logarithmic.
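
For reference, here is a minimal sketch (my own, not part of the original run) of what "something logarithmic" amounts to. softmax_loss above is meant to return the probability the model assigns to the correct class, so minimising it actually pushes the model away from the right answer (and, because y arrives with shape (batch, 1), the fancy indexing broadcasts to a (batch, batch) matrix rather than one value per sample). Taking the negative log of the correct-class probability, with the targets squeezed to a 1D LongTensor, is exactly cross entropy, which is what the next cell uses via F.cross_entropy.

def nll_softmax_loss(prediction, y):
    y = y.long().squeeze()                    # class indices as a 1D LongTensor
    log_probs = torch.log_softmax(prediction, dim=1)
    idx = torch.arange(len(y))
    # negative log probability of the correct class, averaged over the batch
    return -log_probs[idx, y].mean()          # equivalent to F.cross_entropy(prediction, y)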

def loss_entropy(pred, y):
    # F.cross_entropy expects the targets as a 1D LongTensor of class indices
    y = y.long()
    if y.ndim > 1:
        y = y.squeeze()
    return F.cross_entropy(pred, y)
learn_entropy = Learner(dls, model, loss_func=loss_entropy, opt_func=SGD, metrics=accuracy_metric)
learn_entropy.fit(n_epoch=30, lr=0.1)
epoch train_loss valid_loss accuracy_metric time
0 0.989585 0.521992 0.858000 00:00
1 0.518371 0.347884 0.895667 00:00
2 0.346704 0.284983 0.909667 00:00
3 0.272468 0.256439 0.920333 00:00
4 0.230199 0.212597 0.945333 00:00
5 0.207443 0.198280 0.949000 00:00
6 0.190783 0.185562 0.947667 00:00
7 0.174779 0.179351 0.951333 00:00
8 0.163493 0.172312 0.955000 00:00
9 0.155247 0.166811 0.956000 00:00
10 0.148723 0.162777 0.953333 00:00
11 0.142068 0.157631 0.957000 00:00
12 0.136054 0.156244 0.954667 00:00
13 0.133357 0.153135 0.958000 00:00
14 0.127693 0.165472 0.954667 00:00
15 0.125129 0.144914 0.959667 00:00
16 0.118554 0.145416 0.962333 00:00
17 0.114223 0.141651 0.960333 00:00
18 0.112119 0.148653 0.956333 00:00
19 0.108679 0.138428 0.963000 00:00
20 0.104451 0.136940 0.964667 00:00
21 0.104604 0.151451 0.960333 00:00
22 0.100410 0.141625 0.960333 00:00
23 0.096883 0.132164 0.963667 00:00
24 0.093484 0.129140 0.965667 00:00
25 0.092321 0.130651 0.962000 00:00
26 0.089158 0.128698 0.964667 00:01
27 0.087250 0.128824 0.964000 00:00
28 0.083911 0.130542 0.964667 00:00
29 0.081479 0.127400 0.964000 00:00

It learns now

plt.plot(L(learn_loss_func.recorder.values).itemgot(2), label='w/ simple_loss');
plt.plot(L(learn_entropy.recorder.values).itemgot(2), label='w/ entropy');
plt.plot(L(learn_softmax.recorder.values).itemgot(2), label='w/ softmax');

plt.title('accuracy');
plt.legend(loc='best');
plt.xlabel('epoch');

The FastAI model has beaten the baseline model.

I am still not entirely sure why the softmax or the simple loss didn't work, though (the note before the cross-entropy cell above sketches a likely reason).

Just checking what's happening inside the model.

y_tensor[10000:10010]
tensor([[5],
        [5],
        [5],
        [5],
        [5],
        [5],
        [5],
        [5],
        [5],
        [5]])

Ideally, the index corresponding to the y_tensor value should have the highest value in the predictions.

model(train_x)[10000:10010]
tensor([[-12.5915,  -5.5043,  -8.8870,  -3.4508,  -3.3452,   5.5960,  -1.0710,  -1.8192, -10.9644, -10.7473],
        [-18.2513, -23.7355,  -2.3928,  -1.6395,  -5.8924,   5.2856, -11.4164,  -9.5124,  -9.5765, -15.0326],
        [-19.6813, -13.3664,   0.4013,   0.3663,  -1.9135,   8.4236,   0.1444,  -0.3372, -20.2419, -11.6217],
        [ -9.4473,  -9.3769,  -1.5958,  -1.1041,  -5.1527,   3.5932,  -4.2382,  -3.6048,  -5.3509, -12.6652],
        [-15.1676,  -7.6468,  -1.3655,  -4.8876,  -5.6747,   5.9361,  -6.5804,  -2.6483,  -7.4650, -12.2578],
        [ -8.7004,  -6.9439,  -2.0877,  -1.5366,  -2.4985,   2.0714,  -7.3040,  -7.5453,  -5.1091,  -2.5851],
        [ -9.2084,  -6.5548,  -2.2084,  -0.6765,   0.5429,   5.1066,  -5.8337,  -5.4852,  -6.9861,  -5.3916],
        [-18.3216,  -9.8647,  -0.6680,  -2.3356,  -5.4501,   6.5298,  -3.1101,  -2.1203, -12.1218, -13.7853],
        [ -7.9667, -18.4219,  -8.8755,  -7.8094,  -1.1771,   2.8631,  -9.9230,  -2.7488,  -3.8866, -11.4819],
        [-18.9722, -17.4550,  -6.5690,  -3.4893,   0.5602,   6.4109, -10.1749,  -3.2340, -13.9268, -12.4225]], grad_fn=<SliceBackward0>)
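
Taking the argmax of each row confirms it (an extra cell of my own; in the output above the largest value of every row sits at index 5):

preds_slice = model(train_x)[10000:10010]
torch.argmax(preds_slice, dim=1)   # all 5s, matching y_tensor[10000:10010]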
m = learn_entropy.model
m
Sequential(
  (0): Linear(in_features=1024, out_features=30, bias=True)
  (1): ReLU()
  (2): Linear(in_features=30, out_features=10, bias=True)
)
w, b = m[0].parameters()

The first layer learns specific features and patterns from the data, but here I am not sure what is being learned. Maybe the computer knows better.

for i in range(w.shape[0]):
    show_image(w[i].view(32,32))
/opt/conda/lib/python3.7/site-packages/fastai/torch_core.py:77: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
  if ax is None: _,ax = plt.subplots(figsize=figsize)
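
To avoid the open-figures warning and see all 30 first-layer filters at once, one option (a sketch of my own using plain matplotlib, not part of the original run) is to draw them into a single grid of subplots:

import matplotlib.pyplot as plt

# 30 weight vectors of length 1024, reshaped to 32x32 and drawn in a 5x6 grid
fig, axes = plt.subplots(5, 6, figsize=(9, 8))
for weight, ax in zip(w, axes.flat):
    ax.imshow(weight.detach().view(32, 32), cmap='Greys')
    ax.axis('off')
plt.tight_layout()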

End of the notebook