from fastai.data.all import *
from fastai.vision.all import *
# Download and extract the Oxford-IIIT Pets dataset (cached after the first run).
path = untar_data(URLs.PETS)
# Display paths relative to the dataset root when printing.
Path.BASE_PATH = path
path.ls()
# Recursively collect every image file path under images/.
fnames = get_image_files(path/"images")
# An empty DataBlock: no blocks, getters, or splitter configured yet.
dblock = DataBlock()
By itself, a DataBlock is just a blueprint for how to assemble your data. It does not do anything until you pass it a source. You can then choose to convert that source into a Datasets or a DataLoaders by using the DataBlock.datasets or DataBlock.dataloaders method. Since we haven't done anything to get our data ready for batches, the dataloaders method will fail here, but we can have a look at how it gets converted in Datasets. This is where we pass the source of our data, here all our filenames.
# Materialize the blueprint by passing a source (here a list of filenames);
# this produces Datasets with a train/valid split.
dsets = dblock.datasets(fnames)
dsets.train[0]
dsets
By default, the data block API assumes we have an input and a target, which is why we see our filename repeated twice.
The first thing we can do is use a `get_items` function to actually assemble our items inside the data block.
# get_items tells the DataBlock how to collect items from the source,
# so we can now pass a directory instead of a ready-made list of files.
dblock = DataBlock(get_items = get_image_files)
get_image_files
dsets = dblock.datasets(path/"images")
dsets.valid[0]
def label_func(fname):
    """Label a pet image as "cat" or "dog" from its file name.

    In the Oxford-IIIT Pets dataset, cat-breed file names are capitalized
    while dog-breed file names are lowercase.
    """
    first_char = fname.name[0]
    return "cat" if first_char.isupper() else "dog"
# get_y maps each item (a filename) to its target; the item itself is still
# used as the input, which is why only the second element changes below.
dblock = DataBlock(get_items = get_image_files,
get_y = label_func)
dsets = dblock.datasets(path/"images")
dsets.train[0]
# blocks declares the types: the input is an image, the target a category.
# ImageBlock opens each file as an image; CategoryBlock builds a vocabulary
# from the labels and encodes them as integers.
dblock = DataBlock(blocks = (ImageBlock, CategoryBlock),
get_items = get_image_files,
get_y = label_func)
dsets = dblock.datasets(path/"images")
dsets.train[0]
# The category vocabulary inferred from the labels.
dsets.vocab
# splitter controls how items are divided into training and validation sets;
# RandomSplitter() holds out a random subset (20% by default).
dblock = DataBlock(blocks = (ImageBlock, CategoryBlock),
get_items = get_image_files,
get_y = label_func,
splitter = RandomSplitter())
dsets = dblock.datasets(path/"images")
dsets.train[0]
# item_tfms are applied to each item before batching; resizing to a fixed
# size makes the images collatable into batches, so dataloaders() now works.
dblock = DataBlock(blocks = (ImageBlock, CategoryBlock),
get_items = get_image_files,
get_y = label_func,
splitter = RandomSplitter(),
item_tfms = Resize(224))
dls = dblock.dataloaders(path/"images")
dls.show_batch()
The way we usually build the data block in one go is by answering a list of questions:
- what are the types of your inputs/targets? Here images and categories
- where is your data? Here in filenames in subfolders
- does something need to be applied to inputs? Here no
- does something need to be applied to the target? Here the label_func function
- how to split the data? Here randomly
- do we need to apply something on formed items? Here a resize
- do we need to apply something on formed batches? Here no
# MNIST: open images as grayscale (PILImageBW), label each file by its
# parent folder name, and split by grandparent folder (train/ vs valid/).
mnist = DataBlock(blocks=(ImageBlock(cls=PILImageBW), CategoryBlock),
get_items=get_image_files,
splitter=GrandparentSplitter(),
get_y=parent_label)
dls = mnist.dataloaders(untar_data(URLs.MNIST_TINY))
dls.show_batch(max_n=9, figsize=(4,4))
# Pets: label each image through a Pipeline that takes the Path's name and
# applies a regex to capture the breed portion of the filename.
# FIX: the dot before "jpg" is escaped (\.) so it matches a literal '.';
# the unescaped form matched any character (e.g. "name_1Xjpg" would pass).
pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=Pipeline([attrgetter("name"), RegexLabeller(pat = r'^(.*)_\d+\.jpg$')]),
                 item_tfms=Resize(128),
                 batch_tfms=aug_transforms())
dls = pets.dataloaders(untar_data(URLs.PETS)/"images")
dls.show_batch(max_n=9)
# PASCAL 2007 multi-label data: a CSV mapping filenames to space-separated
# label lists (plus a split column — inspect df.head() to confirm).
pascal_source = untar_data(URLs.PASCAL_2007)
df = pd.read_csv(pascal_source/"train.csv")
df.head(5)
# The source is a DataFrame, so no get_items is needed: get_x reads column 0
# (filename) prefixed with the train/ folder, and get_y reads column 1,
# splitting on spaces into multiple labels (MultiCategoryBlock).
# ColSplitter() splits rows using the DataFrame's validation column.
pascal = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
splitter=ColSplitter(),
get_x=ColReader(0, pref=pascal_source/"train"),
get_y=ColReader(1, label_delim=' '),
item_tfms=Resize(224),
batch_tfms=aug_transforms())
dls = pascal.dataloaders(df)
dls.show_batch()
# The same pipeline written with plain lambdas instead of ColReader.
# NOTE(review): lambdas cannot be pickled, so a DataBlock built this way
# cannot be exported — the ColReader version is preferable in practice.
pascal = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
splitter=ColSplitter(),
get_x=lambda x:pascal_source/"train"/f'{x[0]}',
get_y=lambda x:x[1].split(' '),
item_tfms=Resize(224),
batch_tfms=aug_transforms())
dls = pascal.dataloaders(df)
dls.show_batch()
There are various problems that fall in the image localization category: image segmentation (a task where you have to predict the class of each pixel of an image), coordinate prediction (predict one or several key points on an image), and object detection (draw a box around objects to detect).
# CamVid segmentation: the target is a mask image (one class per pixel).
path = untar_data(URLs.CAMVID_TINY)
path.ls()
# MaskBlock is given the class names loaded from codes.txt; get_y maps an
# image path to its mask file at labels/<stem>_P<suffix>.
camvid = DataBlock(blocks=(ImageBlock, MaskBlock(codes = np.loadtxt(path/'codes.txt', dtype=str))),
get_items=get_image_files,
splitter=RandomSplitter(),
get_y=lambda o: path/'labels'/f'{o.stem}_P{o.suffix}',
batch_tfms=aug_transforms())
dls = camvid.dataloaders(path/"images")
dls.show_batch()
# BIWI head pose: the target is a single point per image, looked up by
# filename in a precomputed pickle of centers.
biwi_source = untar_data(URLs.BIWI_SAMPLE)
fn2ctr = load_pickle(biwi_source/'centers.pkl')
# flip(0) reverses the stored coordinate pair — presumably to match the
# coordinate order PointBlock expects; confirm against centers.pkl.
biwi = DataBlock(blocks=(ImageBlock, PointBlock),
get_items=get_image_files,
splitter=RandomSplitter(),
get_y=lambda o:fn2ctr[o.name].flip(0),
batch_tfms=aug_transforms())
dls = biwi.dataloaders(biwi_source)
dls.show_batch(max_n=9)
# COCO object detection: the original cell repeated the three setup lines
# verbatim (duplicate download/parse); the redundant copy is removed.
coco_source = untar_data(URLs.COCO_TINY)
images, lbl_bbox = get_annotations(coco_source/'train.json')
# Map each image file name to its (bounding boxes, labels) annotation pair.
img2bbox = dict(zip(images, lbl_bbox))
# Three blocks: image input plus two targets (boxes and box labels).
# n_inp=1 marks only the first block as input, so get_y must supply one
# function per remaining block.
coco = DataBlock(blocks=(ImageBlock, BBoxBlock, BBoxLblBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=[lambda o: img2bbox[o.name][0], lambda o: img2bbox[o.name][1]],
                 item_tfms=Resize(128),
                 batch_tfms=aug_transforms(),
                 n_inp=1)
dls = coco.dataloaders(coco_source)
dls.show_batch(max_n=9)
from fastai.text.all import *
path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')
df.head()
# Language-model data: with is_lm=True the targets are derived from the
# inputs themselves (next-token prediction), so no get_y is needed.
imdb_lm = DataBlock(blocks=TextBlock.from_df('text', is_lm=True),
get_x=ColReader('text'),
splitter=ColSplitter())
dls = imdb_lm.dataloaders(df, bs=64, seq_len=72)
dls.show_batch(max_n=6)
from fastai.tabular.core import *
adult_source = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(adult_source/'adult.csv')
df.head()
# Declare which columns are categorical and which are continuous.
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
# Preprocessing steps applied in order: Categorify encodes categorical
# columns, FillMissing imputes, Normalize standardizes continuous columns.
procs = [Categorify, FillMissing, Normalize]
splits = RandomSplitter()(range_of(df))
# Tabular data bypasses the DataBlock API: TabularPandas assembles the data
# directly, with the target declared via y_names and typed via y_block.
to = TabularPandas(df, procs, cat_names, cont_names, y_names="salary", splits=splits, y_block=CategoryBlock)
dls = to.dataloaders()
dls.show_batch()