MTL-TabNet: Multi-task Learning based Model for Image-based Table Recognition

About The Project

This project is an implementation of MTL-TabNet (Multi-task Learning based Model for Image-based Table Recognition), built on the TableMASTER-mmocr repository. Many thanks to its authors for their excellent work.

Method Description

The proposed model consists of one shared encoder, one shared decoder, and three separate decoders for the three sub-tasks of table recognition, as shown in Fig. 1. The shared encoder encodes the input table image as a sequence of features. This sequence is passed to the shared decoder and then to the structure decoder, which predicts a sequence of HTML tags representing the structure of the table. When the structure decoder produces an HTML tag that opens a new cell ('<td>' or '<td ...'), the shared decoder output corresponding to that cell and the shared encoder output are passed to the cell-bbox decoder and the cell-content decoder, which predict the bounding box coordinates and the text content of that cell. Finally, the text content of each cell is inserted into the corresponding HTML structure tags to produce the final HTML code of the input table image.
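The final assembly step described above can be sketched as follows. This is a minimal illustration and not the repository's own code: the function name `insert_cell_contents` and the token format (a cell opens either as a single `<td>` token or as `<td` followed by attribute tokens and a closing `>`) are assumptions.

```python
from typing import List


def insert_cell_contents(structure_tokens: List[str], cell_texts: List[str]) -> str:
    """Place the i-th cell text inside the i-th cell of the structure sequence.

    Assumed token format (hypothetical): a cell opens either with one
    complete token such as '<td>' or with '<td' followed by attribute
    tokens and a closing '>' token.
    """
    parts: List[str] = []
    texts = iter(cell_texts)
    open_tag_pending = False  # True while inside an unfinished '<td ...' tag

    for token in structure_tokens:
        parts.append(token)
        if token.startswith("<td") and token.endswith(">"):
            # Complete opening tag: the cell content goes right after it.
            parts.append(next(texts, ""))
        elif token.startswith("<td"):
            # Opening tag with attributes still to come.
            open_tag_pending = True
        elif open_tag_pending and token == ">":
            # The attribute list is closed: the cell content goes here.
            parts.append(next(texts, ""))
            open_tag_pending = False

    return "".join(parts)


example_tokens = ["<tr>", "<td>", "</td>", "<td", ' colspan="2"', ">", "</td>", "</tr>"]
example_html = insert_cell_contents(example_tokens, ["a", "b"])
```

Cells for which no text is available are left empty rather than raising an error, which keeps the HTML well-formed even when the content decoder returns fewer strings than there are cells.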

Dependency

Getting Started

Prerequisites

PubTabNet: click here to download the dataset, and see its GitHub repository and paper for details.
FinTabNet: click here to download the dataset, then use extract_table_images_FinTabNet.py to extract the table images and the annotation file (in the same format as PubTabNet).
TEDS metric: see GitHub.

Installation

Install mmdetection. Click here for details.

# The mmdetection-2.11.0 source code is embedded in this project.
# You can cd into it and install it from there (recommended).
cd ./mmdetection-2.11.0
pip install -v -e .

Install mmocr. Click here for details.

# install mmocr
cd {Path to MTL-TabNet}
pip install -v -e .

Install mmcv-full-1.3.4. Click here for details.

pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html

# install mmcv-full-1.3.4 with torch 1.8.0 and CUDA 10.2
pip install mmcv-full==1.3.4 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html
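If your torch or CUDA version differs, the placeholders in the command template can be filled in programmatically. The helper below is a small sketch; the function names are illustrative, and it only covers CUDA builds (it does not handle CPU-only wheels or check that a wheel actually exists for a given combination).

```python
def mmcv_find_links_url(torch_version: str, cuda_version: str) -> str:
    """Build the mmcv wheel index URL for a torch/CUDA version pair."""
    cu_tag = "cu" + cuda_version.replace(".", "")  # "10.2" -> "cu102"
    torch_tag = "torch" + torch_version            # "1.8.0" -> "torch1.8.0"
    return f"https://download.openmmlab.com/mmcv/dist/{cu_tag}/{torch_tag}/index.html"


def mmcv_install_command(mmcv_version: str, torch_version: str, cuda_version: str) -> str:
    """Return the pip command string for the requested mmcv-full version."""
    url = mmcv_find_links_url(torch_version, cuda_version)
    return f"pip install mmcv-full=={mmcv_version} -f {url}"


# Reproduces the example for torch 1.8.0 and CUDA 10.2.
print(mmcv_install_command("1.3.4", "1.8.0", "10.2"))
```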

Usage

Data preprocess

Run data_preprocess.py to generate valid training data. Remember to change the 'raw_img_root' and 'save_root' properties of PubtabnetParser to your own paths.

python ./table_recognition/data_preprocess.py

Parsing the 500,777 training files takes about 8 hours. After the training set has been parsed, change the 'split' property of PubtabnetParser to 'val' and run the script again to get the formatted validation data.

The directory structure of the parsed training data is:

.
├── StructureLabelAddEmptyBbox_train
│   ├── PMC1064074_007_00.txt
│   ├── PMC1064076_003_00.txt
│   ├── PMC1064076_004_00.txt
│   └── ...
├── recognition_train_img
│   ├── 0
│   │   ├── PMC1064100_007_00_0.png
│   │   ├── PMC1064100_007_00_10.png
│   │   ├── ...
│   │   └── PMC1064100_007_00_108.png
│   ├── 1
│   ├── ...
│   └── 15
├── recognition_train_txt
│   ├── 0.txt
│   ├── 1.txt
│   ├── ...
│   └── 15.txt
├── structure_alphabet.txt
└── textline_recognition_alphabet.txt
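After the long preprocessing run it can be useful to check that the output matches the layout above. The following is a minimal sketch, not part of the repository: the function name `summarize_parsed_data` is hypothetical, and it assumes that the validation split uses the same folder names with `val` in place of `train`.

```python
from pathlib import Path
from typing import Dict


def summarize_parsed_data(save_root: str, split: str = "train") -> Dict[str, int]:
    """Count the files produced by the preprocessing step for one split."""
    root = Path(save_root)
    structure_dir = root / f"StructureLabelAddEmptyBbox_{split}"
    img_dir = root / f"recognition_{split}_img"
    txt_dir = root / f"recognition_{split}_txt"

    return {
        # One structure label file per table image.
        "structure_labels": sum(1 for _ in structure_dir.glob("*.txt")),
        # Numbered sub-folders holding the cropped cell images.
        "cell_image_shards": sum(1 for p in img_dir.glob("*") if p.is_dir()),
        "cell_images": sum(1 for _ in img_dir.glob("*/*.png")),
        # One text label file per numbered shard.
        "cell_text_shards": sum(1 for _ in txt_dir.glob("*.txt")),
    }
```

For a complete training set, `cell_image_shards` and `cell_text_shards` should both be 16 (folders `0` to `15`), as in the tree above.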

Train

Train the multi-task learning based table recognition model, MTL-TabNet:

sh ./table_recognition/expr/table_recognition_dist_train.sh

Inference

To get the final results:

python ./table_recognition/run_table_inference.py

run_table_inference.py calls table_inference.py and runs model inference on multiple GPU devices. Before running this script, change the value of cfg in table_inference.py.

The directory structure of the table recognition results is:

# If you use 8 GPU devices for inference, you will get 8 detection result pickle files, one end2end_result pickle file and 8 structure recognition result pickle files.
.
├── structure_master_caches
│   ├── structure_master_results_0.pkl
│   ├── structure_master_results_1.pkl
│   ├── ...
│   └── structure_master_results_7.pkl
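Since each GPU writes its own result file, you may want to combine them for inspection. The sketch below is an illustration only: the function name `merge_structure_results` is hypothetical, and it assumes that every pickle file holds a dict keyed by image file name, which you should verify against your own output before relying on it.

```python
import pickle
from pathlib import Path
from typing import Any, Dict


def merge_structure_results(cache_dir: str) -> Dict[str, Any]:
    """Merge the per-GPU structure result pickles into one dict.

    Assumption (not confirmed by the repository): each file contains a
    dict that maps an image file name to its prediction.
    """
    merged: Dict[str, Any] = {}
    pattern = "structure_master_results_*.pkl"
    for pkl_path in sorted(Path(cache_dir).glob(pattern)):
        with open(pkl_path, "rb") as f:
            part = pickle.load(f)
        if not isinstance(part, dict):
            raise TypeError(f"{pkl_path.name} does not contain a dict")
        merged.update(part)
    return merged
```

Only load pickle files that you produced yourself, because unpickling untrusted data can execute arbitrary code.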

Get TEDS score

Installation.

pip install -r ./table_recognition/PubTabNet-master/src/requirements.txt

Get gtVal.json.

python ./table_recognition/get_val_gt.py

Calculate the TEDS score. Before running this script, modify the prediction file path and the ground-truth file path in mmocr_teds_acc_mp.py.
```
python ./table_recognition/PubTabNet-master/src/mmocr_teds_acc_mp.py
```
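For reference, TEDS (tree-edit-distance-based similarity), as defined by the PubTabNet authors, normalizes the tree edit distance between the predicted and ground-truth HTML trees by the number of nodes in the larger tree. The sketch below shows only this normalization; the tree edit distance itself is computed by the PubTabNet script, and the function and argument names here are illustrative.

```python
def teds(edit_distance: float, n_nodes_pred: int, n_nodes_gt: int) -> float:
    """TEDS = 1 - EditDist(T_pred, T_gt) / max(|T_pred|, |T_gt|)."""
    largest = max(n_nodes_pred, n_nodes_gt)
    if largest <= 0:
        raise ValueError("at least one tree must contain a node")
    return 1.0 - edit_distance / largest


# Identical trees have an edit distance of 0 and therefore a score of 1.
perfect_score = teds(0, 10, 10)
```

The scores reported under Result are per-table values of this kind averaged over the dataset and expressed as percentages.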

Result

TEDS score

Datasets	TEDS (%)	TEDS-struct. (%)
FinTabNet	-	98.79
PubTabNet	96.67	97.88

Pretrained Model

Pretrained models can be downloaded for PubTabNet and FinTabNet.

Demo

To run the demo on a single table image (you can change the input file and checkpoint file in demo.py):

python ./table_recognition/demo/demo.py

License

This project is licensed under the MIT License. See LICENSE for more details.

Citations

@inproceedings{visapp23namly,
   title={An End-to-End Multi-Task Learning Model for Image-based Table Recognition},
   author={Nam Tuan Ly and Atsuhiro Takasu},
   booktitle={Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP},
   year={2023},
   pages={626-634},
   publisher={SciTePress},
   doi={10.5220/0011685000003417},
}

Contact

Nam Ly ([email protected], [email protected])
Atsuhiro Takasu ([email protected])
