
Training results

Hi,
I have reviewed the concerns about the augmentation techniques and modified the code to
include the original images as well when augmentation=True is set.
I also make sure that the same augmentation is applied to the corresponding labels, i.e.
table_mask and column_mask.
I think it would be helpful to lay out the preprocessing steps, so that if there are any
discrepancies they will be easy for you to spot and correct.
Mask Generation:
1. To create the labels I initialize the array with np.zeros and dtype np.uint8, so that the
range of values is [0, 255].
2. Using the coordinates of every table in the document (x_min, x_max, y_min, y_max),
extracted from the .xml files, I fill the corresponding region of the array with 255.
3. Hence, for the image segmentation task this becomes a two-class classification problem
per pixel.
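The mask-generation steps above can be sketched as follows (a minimal sketch; names such as `make_table_mask` and `table_boxes` are illustrative, not from the actual code):

```python
import numpy as np

def make_table_mask(height, width, table_boxes):
    """Build a binary table mask from (x_min, x_max, y_min, y_max) boxes.

    np.uint8 keeps pixel values in [0, 255]; table pixels become 255,
    background stays 0, i.e. a two-class label per pixel.
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    for x_min, x_max, y_min, y_max in table_boxes:
        # Fill the table region (rows are y, columns are x) with 255.
        mask[y_min:y_max, x_min:x_max] = 255
    return mask

# Example: one table box on a 100x100 page.
mask = make_table_mask(100, 100, [(10, 60, 20, 80)])
```

The same routine would be called once for table boxes and once for column boxes to produce table_mask and column_mask.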
Preprocessing:
1. For table detection, and for detecting the structure of the table, the colors of the
image can be treated as noise.
2. To eliminate the color, we convert the input image to grayscale and then back to RGB.
3. To improve the contrast of the image, we apply histogram equalization.
4. Finally, we resize the image to (1024, 1024, 3).
All images are stored in .jpeg format before being fed into the network.
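The preprocessing pipeline above can be sketched in pure NumPy (the real code likely uses OpenCV or PIL for these steps; the function name and the nearest-neighbour resize are illustrative):

```python
import numpy as np

def preprocess(image_rgb):
    """Grayscale -> histogram equalization -> back to RGB -> (1024, 1024, 3)."""
    # 1-2. Remove color: luminance grayscale, later replicated into 3 channels.
    gray = (0.299 * image_rgb[..., 0]
            + 0.587 * image_rgb[..., 1]
            + 0.114 * image_rgb[..., 2]).astype(np.uint8)
    # 3. Histogram equalization: map each intensity through the normalized CDF.
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    eq = lut[gray]
    # 4. Resize to 1024x1024 via nearest-neighbour index sampling.
    h, w = eq.shape
    rows = np.arange(1024) * h // 1024
    cols = np.arange(1024) * w // 1024
    resized = eq[rows][:, cols]
    # Back to a 3-channel RGB image.
    return np.stack([resized] * 3, axis=-1)

out = preprocess(np.random.randint(0, 256, (600, 400, 3), dtype=np.uint8))
```

Because the grayscale image is replicated across channels, all three output channels are identical, which is exactly the "color treated as noise" behaviour described above.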
Architecture:
1. There are three upsampling stages in each of the decoder branches.
2. Except for the last Conv2DTranspose, I use 128 filters with no activation.
3. For the last Conv2DTranspose, which is the output of the branch, I use 1 filter with
sigmoid activation, since we want the probability of every pixel belonging to class 1 in
order to compute binary_crossentropy. Hence the output shape is (None, 1024, 1024, 1).
For calculating the loss of each branch I have used binary_crossentropy, but you suggested
trying out sparse_categorical_crossentropy.
1. binary_crossentropy: analogous to log loss. The labels must be either 0 or 1, and
the final output must be the probability of each pixel belonging to class 1. Hence
the activation is sigmoid with 1 filter.
2. sparse_categorical_crossentropy: the same as categorical_crossentropy, except that
the labels must be in integer format, i.e. 0, 1, 2, and so on. The final output must be
the probability of each of the two classes separately for every pixel. Hence the
activation is softmax with 2 filters.
Since this is not a multi-class classification task, I am sticking with binary_crossentropy.
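The equivalence of the two setups for the binary case can be checked numerically; a small sketch (hand-rolled loss functions for illustration, not the Keras implementations):

```python
import numpy as np

def binary_crossentropy(y_true, p1):
    """Log loss against P(class 1) from a sigmoid output (1 filter)."""
    return -np.mean(y_true * np.log(p1) + (1 - y_true) * np.log(1 - p1))

def sparse_categorical_crossentropy(y_true, probs):
    """Log loss against per-class probabilities from a softmax output
    (2 filters); labels are integers 0 or 1."""
    return -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))

y = np.array([1, 0, 1, 1])           # binary pixel labels
p1 = np.array([0.9, 0.2, 0.8, 0.6])  # P(class 1) from sigmoid
probs = np.stack([1 - p1, p1], axis=1)  # equivalent 2-class softmax output

bce = binary_crossentropy(y, p1)
scce = sparse_categorical_crossentropy(y, probs)
# The two losses coincide when the softmax output is (1 - p1, p1).
```

So for two classes the choice mainly affects the output-layer shape (1 filter vs 2), not the loss value itself, which supports sticking with binary_crossentropy here.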
Extensions:
In order to extend the case study I have also used DenseNet121, ResNet50 and MobileNetV2
as the feature_extractor.
Among them, DenseNet121 proved best.
I took two sample examples and plotted the prediction of each feature_extractor;
DenseNet121 seems to work best even for documents containing a larger number of tables.
In the research paper, when TableNet was fine-tuned on the ICDAR dataset, the f1_score is
0.95 for table_mask detection and 0.90 for column_mask detection.
With DenseNet121 I got 0.7885 and 0.6547 respectively on the Marmot validation data.
Based on this, I am going to continue the case study with the DenseNet121 feature_extractor alone.
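Swapping the feature_extractor can be sketched as below, assuming the tf.keras.applications backbones (`build_encoder` is an illustrative name; the actual case study likely loads "imagenet" weights and taps intermediate layers for skip connections, which this sketch omits):

```python
import tensorflow as tf

def build_encoder(name="densenet121", input_shape=(1024, 1024, 3)):
    """Return a backbone to use as TableNet's feature_extractor."""
    backbones = {
        "densenet121": tf.keras.applications.DenseNet121,
        "resnet50": tf.keras.applications.ResNet50,
        "mobilenet_v2": tf.keras.applications.MobileNetV2,
    }
    # include_top=False drops the classifier head so the decoder branches
    # can consume the spatial feature map directly. weights=None here only
    # to keep the sketch offline; training would use weights="imagenet".
    return backbones[name](include_top=False, weights=None,
                           input_shape=input_shape)

encoder = build_encoder("densenet121")
```

With a 1024x1024 input and an overall stride of 32, DenseNet121 yields a (None, 32, 32, 1024) feature map that the two decoder branches then upsample back to full resolution.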