
This repository contains the official implementation of the CVPR 2023 paper "Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos".

[Paper](https://arxiv.org/abs/2303.12370)
## 🌱 News
- 2023-03-24: The code has been released (revisions pending).
- 2023-03-22: The preprint of the paper is available. [[Paper](https://arxiv.org/abs/2303.12370)]
- 2023-02-28: This paper has been accepted by **`CVPR 2023`**.

## Introduction
Sequential video understanding, as an emerging video understanding task, has attracted growing attention because of its goal-oriented nature. This paper studies weakly supervised sequential video understanding, where accurate timestamp-level text-video alignment is not provided. We solve this task by borrowing ideas from CLIP. Specifically, we use a transformer to aggregate frame-level features into a video representation, and use a pre-trained text encoder to encode the texts corresponding to each action and to the whole video, respectively. To model the correspondence between text and video, we propose a multiple granularity loss, where a video-paragraph contrastive loss enforces matching between the whole video and the complete script, and a fine-grained frame-sentence contrastive loss enforces matching between each action and its description. As the frame-sentence correspondence is not available, we exploit the fact that video actions happen sequentially in the temporal domain to generate pseudo frame-sentence correspondence, and supervise network training with these pseudo labels. Extensive experiments on video sequence verification and text-to-video matching show that our method outperforms baselines by a large margin, which validates the effectiveness of the proposed approach.
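
Below is a minimal, hypothetical PyTorch sketch of the multiple granularity loss described above, combining a video-paragraph contrastive term with a frame-sentence term supervised by pseudo labels derived from the sequential ordering of actions. All function and tensor names here are illustrative assumptions for exposition, not the actual implementation in this repository.

```python
# Hedged sketch of the multiple-granularity contrastive objective.
# Feature extractors (video transformer, text encoder) are assumed to exist
# elsewhere; only the loss computation is illustrated here.
import torch
import torch.nn.functional as F

def info_nce(query, keys, temperature=0.07):
    """Symmetric InfoNCE over a batch: query[i] should match keys[i]."""
    query = F.normalize(query, dim=-1)
    keys = F.normalize(keys, dim=-1)
    logits = query @ keys.t() / temperature                     # (B, B)
    targets = torch.arange(query.size(0), device=query.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def sequential_pseudo_alignment(num_frames, num_sentences):
    """Pseudo frame-to-sentence labels: split the timeline evenly, relying on
    actions occurring in the same order as the script sentences."""
    return torch.linspace(0, num_sentences - 1e-6, num_frames).long()  # (T,)

def multi_granularity_loss(frame_feats, video_feat, sent_feats, para_feat,
                           temperature=0.07, alpha=1.0):
    """frame_feats: (B, T, D), video_feat: (B, D),
       sent_feats:  (B, S, D), para_feat:  (B, D)."""
    # Coarse granularity: whole video vs. complete script (paragraph).
    loss_vp = info_nce(video_feat, para_feat, temperature)

    # Fine granularity: each frame vs. its pseudo-aligned sentence.
    B, T, _ = frame_feats.shape
    S = sent_feats.size(1)
    idx = sequential_pseudo_alignment(T, S).to(frame_feats.device)  # (T,)
    f = F.normalize(frame_feats, dim=-1)
    s = F.normalize(sent_feats, dim=-1)
    logits = torch.einsum('btd,bsd->bts', f, s) / temperature       # (B, T, S)
    loss_fs = F.cross_entropy(logits.reshape(B * T, S), idx.repeat(B))

    return loss_vp + alpha * loss_fs
```

The even-split pseudo alignment is only one plausible way to realize the sequential prior; the weighting `alpha` between the two granularities is likewise an assumed hyperparameter.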

## Usage
Usage instructions are in preparation and will be added soon.

## Acknowledgement
The codebase is built upon [SVIP](https://github.com/svip-lab/SVIP-Sequence-VerIfication-for-Procedures-in-Videos).

## Citation
If you find the project or our method useful, please consider citing the paper:
```bibtex
@article{dong2023weakly,
  title   = {Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos},
  author  = {Dong, Sixun and Hu, Huazhang and Lian, Dongze and Luo, Weixin and Qian, Yicheng and Gao, Shenghua},
  journal = {arXiv preprint arXiv:2303.12370},
  year    = {2023}
}
```
