This file gives a brief introduction to:
- How to prepare your own dataset for training with the provided methods.
- How to implement your own fascinating method.
We first introduce some notation:
- Prefix: a string used to identify your dataset. Suppose it is `vid_custom` here.
- Split: we need to prepare a `train` and a `val` split for training and evaluating our method. When combined with the prefix, they become `vid_custom_train` and `vid_custom_val`.
Then we will go through the whole pipeline.
We suggest organizing your data in the same structure as the ImageNet VID dataset. It should look like:
```
datasets
├── vid_custom
|   ├── train
|   |   ├── video_snippet_1
|   |   |   ├── 000000.JPEG
|   |   |   ├── 000001.JPEG
|   |   |   ├── 000002.JPEG
|   |   |   ...
|   |   ├── video_snippet_2
|   |   |   ├── 000000.JPEG
|   |   |   ├── 000001.JPEG
|   |   |   ├── 000002.JPEG
|   |   |   ...
|   |   ...
|   ├── val
|   |   ├── video_snippet_1
|   |   |   ├── 000000.JPEG
|   |   |   ├── 000001.JPEG
|   |   |   ├── 000002.JPEG
|   |   |   ...
|   |   ├── video_snippet_2
|   |   |   ├── 000000.JPEG
|   |   |   ├── 000001.JPEG
|   |   |   ├── 000002.JPEG
|   |   |   ...
|   |   ...
|   ├── annotation
|   |   ├── train
|   |   ├── val
```
Following this structure, you can directly use all provided methods with only minor modifications to the dataloader, which will be introduced later.
After structuring the dataset, we need to provide an indexing file that tells the dataloader which frames are used to train the model. We suggest preparing this file in the same format as `VID_train_15frames.txt`: each line of this txt file contains four strings, `video folder`, `no meaning` (just ignore it), `frame number` and `video length`. That should be enough.
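For illustration, a few lines of such an index file could look like the following (the snippet names, frame numbers and video lengths are hypothetical; the second field is the ignored placeholder):

```
train/video_snippet_1 1 0 300
train/video_snippet_1 1 20 300
train/video_snippet_2 1 10 240
```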
Once you have prepared your dataset, we need to assign a dataloader to load images and annotations. Note that different methods use different dataloaders. Take the single-frame baseline as an example: we use the base dataloader to load the data. To make it compatible with your dataset, you should:
- Modify the attribute `classes` to the categories in your dataset.
- Modify the `load_annos()` and `_preprocess_annotation()` methods to make them compatible with your annotation style. Make sure the returned fields are the same as in the base dataloader (a minimal sketch follows this list).
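A minimal sketch of such a subclass, assuming the base dataloader is called `VIDDataset`, lives under `mega_core/data/datasets/vid.py`, and receives VID-style XML annotations; the import path, the category names and the returned fields are assumptions to adapt:

```python
# Sketch only: the import path, base-class API and returned fields are
# assumptions; keep them consistent with the actual base dataloader.
from mega_core.data.datasets.vid import VIDDataset  # assumed import path


class VIDCustomDataset(VIDDataset):
    # categories of your dataset; "__background__" conventionally stays at index 0
    classes = ["__background__", "cat", "dog", "bird"]  # hypothetical categories

    def _preprocess_annotation(self, target):
        # `target` is assumed to be a parsed xml.etree.ElementTree element
        boxes, labels = [], []
        class_to_ind = {name: i for i, name in enumerate(self.classes)}
        for obj in target.iter("object"):
            bbox = obj.find("bndbox")
            boxes.append([
                int(bbox.find("xmin").text),
                int(bbox.find("ymin").text),
                int(bbox.find("xmax").text),
                int(bbox.find("ymax").text),
            ])
            labels.append(class_to_ind[obj.find("name").text])
        # return the same fields as the base dataloader expects
        return {"boxes": boxes, "labels": labels}
```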
Then the preparation is done. Very easy, right? If you want to use other methods, you can directly inherit their dataloaders from your newly created base dataloader. No additional changes are needed.
Once you have done the above steps, the dataset and the dataloader need to be added in a couple of places:
- `mega_core/data/datasets/__init__.py`: add the dataloader to `__all__`.
- `mega_core/config/paths_catalog.py`: add `vid_custom_train` and `vid_custom_val` as dictionary names with the fields `img_dir`, `anno_path` and `img_index` in `DatasetCatalog.DATASETS`, and add the corresponding `if` clause in `DatasetCatalog.get()` (a sketch follows this list).
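A hedged sketch of what these additions could look like; the relative paths, the index-file names and the factory name below are illustrative guesses, so mirror the existing VID entries in `mega_core/config/paths_catalog.py`:

```python
# Sketch of the additions to mega_core/config/paths_catalog.py; only the
# new parts are shown, and paths/factory name are assumptions.
import os


class DatasetCatalog(object):
    DATA_DIR = "datasets"
    DATASETS = {
        # ... existing entries ...
        "vid_custom_train": {
            "img_dir": "vid_custom/train",
            "anno_path": "vid_custom/annotation/train",
            "img_index": "vid_custom/vid_custom_train.txt",  # hypothetical index file
        },
        "vid_custom_val": {
            "img_dir": "vid_custom/val",
            "anno_path": "vid_custom/annotation/val",
            "img_index": "vid_custom/vid_custom_val.txt",  # hypothetical index file
        },
    }

    @staticmethod
    def get(name):
        # ... existing clauses for the built-in datasets ...
        if "vid_custom" in name:
            data_dir = DatasetCatalog.DATA_DIR
            attrs = DatasetCatalog.DATASETS[name]
            args = dict(
                img_dir=os.path.join(data_dir, attrs["img_dir"]),
                anno_path=os.path.join(data_dir, attrs["anno_path"]),
                img_index=os.path.join(data_dir, attrs["img_index"]),
            )
            return dict(factory="VIDCustomDataset", args=args)  # assumed factory name
        raise RuntimeError("Dataset not available: {}".format(name))
```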
Modify `configs/BASE_RCNN_Xgpu.yaml` to make it compatible with the statistics of your dataset, e.g., `NUM_CLASSES`.
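For example, `NUM_CLASSES` should count your categories plus the background class; the exact key path below is an assumption, so follow wherever `NUM_CLASSES` already appears in `configs/BASE_RCNN_Xgpu.yaml`:

```yaml
MODEL:
  ROI_BOX_HEAD:       # key path assumed; mirror the existing config
    NUM_CLASSES: 31   # e.g., 30 custom categories + 1 background
```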
Now let us turn to implementing your own method. First give your method a fancy name; suppose it is `fancy` here.
Create a new dataloader under the folder `data/dataset`; it should inherit from the `VIDEODataset` class. You only need to make a minor modification to the `__init__()` method (see `vid_fgfa.py` for an example) and implement the `_get_train()` and `_get_test()` methods.
As video object detection methods usually require some reference frames to assist detection on the current frame, we recommend storing the current frame in `images["cur"]` and all reference frames in `images["ref"]` as a list; this will make the subsequent batch-collating procedure easier. But it all depends on you. See `vid_fgfa.py` for an example, and the sketch below.
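A minimal sketch of such a dataloader, assuming the `VIDEODataset` base class and the `images["cur"]`/`images["ref"]` convention described above; the constructor signature and the `_load_frame()`/`_sample_ref()` helpers are hypothetical placeholders:

```python
# Sketch only: the base-class import path, the constructor signature and
# the helper methods below are hypothetical; see vid_fgfa.py for the
# repository's actual pattern.
from mega_core.data.datasets.vid import VIDEODataset  # assumed import path


class VIDFancyDataset(VIDEODataset):
    def __init__(self, *args, num_ref=2, **kwargs):
        super(VIDFancyDataset, self).__init__(*args, **kwargs)
        self.num_ref = num_ref  # hypothetical: number of reference frames

    def _get_train(self, idx):
        # load the current frame and its annotation (hypothetical helper)
        cur_img, target = self._load_frame(idx)
        # sample reference frames from the same snippet (hypothetical helper)
        ref_imgs = [self._load_frame(self._sample_ref(idx))[0]
                    for _ in range(self.num_ref)]
        images = {"cur": cur_img, "ref": ref_imgs}
        return images, target, idx

    def _get_test(self, idx):
        # at test time, reference frames are usually drawn deterministically
        return self._get_train(idx)
```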
Once you have created your dataloader, it needs to be added in a couple of places:
- `mega_core/data/collate_batch.py`: add your method name `fancy` in the `if` clause in `BatchCollator.__call__()`, and modify the processing step to make it compatible with your dataloader's behavior (a sketch follows this list).
- `mega_core/data/datasets/__init__.py`: add the dataloader to `__all__`.
- `mega_core/config/paths_catalog.py`: add the corresponding `if` clause in `DatasetCatalog.get()` to access your method.
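A hedged sketch of the `fancy` branch in `BatchCollator.__call__()`; the constructor signature and the `to_image_list` import path are simplified assumptions about the actual class:

```python
# Sketch only: the surrounding class is simplified and the import path
# of to_image_list is an assumption.
from mega_core.structures.image_list import to_image_list


class BatchCollator(object):
    def __init__(self, size_divisible=0, method="base"):
        self.size_divisible = size_divisible
        self.method = method

    def __call__(self, batch):
        images, targets, img_ids = list(zip(*batch))
        if self.method == "fancy":
            # collate the current frames into one ImageList and keep each
            # sample's reference frames together as their own ImageList
            images = {
                "cur": to_image_list([img["cur"] for img in images],
                                     self.size_divisible),
                "ref": [to_image_list(img["ref"], self.size_divisible)
                        for img in images],
            }
        else:
            images = to_image_list(images, self.size_divisible)
        return images, targets, img_ids
```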
Create your model under the directory `mega_core/modeling/detector` and register it in `mega_core/modeling/detector/detectors.py`. Take `mega_core/modeling/detector/generalized_rcnn_mega.py` and the corresponding config files as a reference if you use new submodules in your model.
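Registration typically amounts to mapping the meta-architecture name used in the config to your model class; the sketch below assumes a dictionary-based registry and a hypothetical model file `generalized_rcnn_fancy.py`, so check the existing content of `detectors.py` first:

```python
# Sketch of mega_core/modeling/detector/detectors.py after registering a
# new model; the registry layout and existing entries are assumptions.
from .generalized_rcnn import GeneralizedRCNN
from .generalized_rcnn_fancy import GeneralizedRCNNFANCY  # hypothetical new model

_DETECTION_META_ARCHITECTURES = {
    "GeneralizedRCNN": GeneralizedRCNN,
    "GeneralizedRCNNFANCY": GeneralizedRCNNFANCY,  # name referenced by the config
}


def build_detection_model(cfg):
    meta_arch = _DETECTION_META_ARCHITECTURES[cfg.MODEL.META_ARCHITECTURE]
    return meta_arch(cfg)
```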
Your method name `fancy` also needs to be handled in a few more places:
- `mega_core/engine/trainer.py`: Line 87
- `mega_core/engine/inference.py`: Line 31
- `mega_core/modeling/rpn/rpn.py`: Line 254
- `mega_core/data/build.py`: Line 66
Finally, write a config file for your method:
- The field `MODEL.VID.METHOD` should be specified as `fancy`.
- The field `MODEL.META_ARCHITECTURE` should be your model name.
- If you feel confused, take a look at the other configs (a minimal sketch follows this list).
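A minimal sketch of such a config: the two fields come from the list above, the meta-architecture name reuses the hypothetical one from the registration sketch, and all other keys should be copied from the existing configs:

```yaml
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNNFANCY"  # hypothetical model class name
  VID:
    METHOD: "fancy"
```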
If you are still confused by some of the steps above, or some instructions are wrong, please contact me to fix them or make them clearer.