This repository presents a deep learning framework for recognizing abnormal activity in video data. Using optical flow and temporal feature learning, the model classifies video clips as violent ("Fight") or non-violent ("Non-Fight").
- Fusion Model: Combines 3D Convolutional Neural Networks (3D CNN) and optical flow for robust spatial-temporal pattern learning.
- Optical Flow Implementation: Includes custom implementations of Lucas-Kanade and Farneback algorithms for motion estimation.
- Dataset Utilization: Validated using the mini-RWF2000 dataset, with an ablation study highlighting the significance of optical flow integration.
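As a rough illustration of the Lucas-Kanade idea mentioned above, the sketch below estimates a single flow vector for one window by solving the brightness-constancy equations in the least-squares sense. It is not the repository's implementation: the function name `lucas_kanade_window`, the window size, and the use of NumPy are assumptions made for this example.

```python
import numpy as np


def lucas_kanade_window(frame1, frame2, center, half_win=7):
    """Estimate one (u, v) flow vector for the window around center=(row, col).

    Hypothetical helper for illustration; assumes small motion between frames.
    """
    f1 = frame1.astype(np.float64)
    f2 = frame2.astype(np.float64)

    # Spatial gradients via central differences; np.gradient returns the
    # derivative along rows (y) first, then along columns (x).
    iy, ix = np.gradient(f1)
    # Temporal derivative approximated by the frame difference.
    it = f2 - f1

    r, c = center
    win = (slice(r - half_win, r + half_win + 1),
           slice(c - half_win, c + half_win + 1))

    # Each pixel contributes one equation: Ix * u + Iy * v = -It.
    a = np.stack([ix[win].ravel(), iy[win].ravel()], axis=1)
    b = -it[win].ravel()

    flow, *_ = np.linalg.lstsq(a, b, rcond=None)
    return flow  # u is motion along columns (x), v along rows (y)
```

A dense flow field would repeat this solve for every pixel (or grid point), and a practical version would typically add image pyramids to handle larger motions.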
The model integrates optical flow and 3D CNNs to learn spatial and temporal features from video frames:
- Optical Flow: Implements the Lucas-Kanade and Farneback algorithms for motion estimation.
- 3D CNN: Processes both RGB channels and optical flow data for time-sequential learning.
- Fusion Mechanism: Combines outputs from RGB and optical flow channels for classification using a Multilayer Perceptron (MLP).
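The architecture described above could be sketched in PyTorch roughly as follows. This is a minimal sketch under assumptions, not the repository's model: the class name `FusionNet`, the helper `conv_branch`, the layer widths, and the choice of concatenation before the MLP are all hypothetical.

```python
import torch
from torch import nn


def conv_branch(in_channels):
    """Small 3D CNN that maps a clip (N, C, T, H, W) to a 32-d feature vector."""
    return nn.Sequential(
        nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool3d(kernel_size=(1, 2, 2)),  # downsample space, keep time
        nn.Conv3d(16, 32, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool3d(1),  # global average over time and space
        nn.Flatten(),
    )


class FusionNet(nn.Module):
    """Two-branch sketch: RGB and optical flow features fused by an MLP."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.rgb_branch = conv_branch(3)   # RGB frames: 3 channels
        self.flow_branch = conv_branch(2)  # flow fields: (u, v) channels
        self.mlp = nn.Sequential(
            nn.Linear(32 + 32, 32),
            nn.ReLU(),
            nn.Linear(32, num_classes),
        )

    def forward(self, rgb, flow):
        # Concatenate the per-branch features, then classify.
        feats = torch.cat([self.rgb_branch(rgb), self.flow_branch(flow)], dim=1)
        return self.mlp(feats)
```

With this layout, `FusionNet()(rgb, flow)` takes an RGB clip of shape `(N, 3, T, H, W)` and a flow clip of shape `(N, 2, T, H, W)` and returns one pair of "Fight" / "Non-Fight" logits per clip.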
- Dataset: Mini-RWF2000 dataset with pre-labeled "Fight" and "Non-Fight" scenarios.
- Hardware: Trained on NVIDIA A100 GPU.
- Optimizer: SGD with learning rate `0.003` and weight decay `1e-6`.
- Loss Function: Cross Entropy Loss.
- Learning Rate Scheduler: Cosine Annealing.
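The training settings listed above map onto standard PyTorch components. The sketch below wires them together on a placeholder model with random data. The epoch count, batch shape, and use of `T_max = num_epochs` are assumptions, and momentum is left at the PyTorch default because the source does not state it.

```python
import torch
from torch import nn

num_epochs = 10  # assumed value for illustration only

model = nn.Linear(8, 2)  # placeholder standing in for the fusion model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.003, weight_decay=1e-6)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    # Random stand-in batch: 4 samples, 2 classes ("Fight" / "Non-Fight").
    inputs = torch.randn(4, 8)
    targets = torch.randint(0, 2, (4,))

    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()

    # Cosine annealing lowers the learning rate once per epoch.
    scheduler.step()
```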
- Color Jittering: Applies random changes to brightness, contrast, and saturation.
- Flipping: Horizontal flipping for variability.
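One way to apply these augmentations to a whole clip is sketched below in NumPy. The function names, array layouts, and jitter strength are assumptions, and whether the repository also flips its flow fields is not stated. If a flow field is flipped horizontally, its horizontal component must change sign so the motion stays consistent with the mirrored frames.

```python
import numpy as np


def flip_clip(rgb, flow):
    """Horizontally flip a clip.

    rgb:  (T, H, W, 3) frames
    flow: (T, H, W, 2) flow fields, with flow[..., 0] the horizontal component
    """
    rgb_flipped = rgb[:, :, ::-1, :].copy()
    flow_flipped = flow[:, :, ::-1, :].copy()
    flow_flipped[..., 0] *= -1  # mirrored scene means mirrored x-motion
    return rgb_flipped, flow_flipped


def color_jitter(rgb, rng, strength=0.2):
    """Jitter brightness, contrast, and saturation of float frames in [0, 1].

    One random factor per property is drawn for the whole clip so that all
    frames change consistently.
    """
    brightness, contrast, saturation = rng.uniform(
        1.0 - strength, 1.0 + strength, size=3
    )
    out = rgb * brightness
    mean = out.mean()
    out = (out - mean) * contrast + mean
    # Simplification: gray level is the plain channel mean, not a weighted luma.
    gray = out.mean(axis=-1, keepdims=True)
    out = (out - gray) * saturation + gray
    return np.clip(out, 0.0, 1.0)
```

A call such as `color_jitter(rgb, np.random.default_rng())` followed by a random choice of whether to call `flip_clip` would give each training clip a different appearance.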
- Qualitative evaluation demonstrates effective motion tracking using Lucas-Kanade and Farneback algorithms.
- Training accuracy: ~75%
- Validation accuracy: ~75%
- Ablation study shows improved performance with optical flow integration.
- Without Optical Flow: Higher training accuracy but reduced generalization.
- With Optical Flow: Improved validation accuracy and robust detection.
- Mini-RWF2000: Contains 200 videos (160 training, 40 validation) pre-labeled with "Fight" and "Non-Fight".
- SCVD Dataset: Used for inference and validation in a real-world setting.
@INPROCEEDINGS{9412502,
  author={Cheng, Ming and Cai, Kunjing and Li, Ming},
  booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
  title={RWF-2000: An Open Large Scale Video Database for Violence Detection},
  year={2021},
  pages={4183-4190},
  doi={10.1109/ICPR48806.2021.9412502}
}