When using the complete data set (about 800K objects), MiB Mem usage keeps increasing, resulting in OOM. Is there any solution? #42
Comments
Hi, the training is based on pytorch_lightning, which is supposed to manage resources correctly. The dataset class (SyncDreamer/ldm/data/sync_dreamer.py, line 57 at commit eb41a0c) simply loads data and is not supposed to cause increasing memory usage. You could check whether memory usage still grows when running the dataset on its own.
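One way to run that check is to iterate the data without any training and sample memory every few hundred batches. The following is only a minimal sketch, not code from the repository: `peak_rss_mib`, `probe_memory`, and `grows_steadily` are hypothetical helper names, and it assumes Linux, where `ru_maxrss` is reported in KiB. It only measures the current process, so it covers the dataset when the loader uses `num_workers=0`; worker subprocesses would need to be watched separately (for example with `top`).

```python
import resource
from typing import Callable, Iterable, List, Tuple


def peak_rss_mib() -> float:
    """Peak resident set size of this process in MiB."""
    # Assumption: Linux, where ru_maxrss is in KiB (macOS reports bytes).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def probe_memory(
    batches: Iterable,
    every: int = 100,
    read_mib: Callable[[], float] = peak_rss_mib,
) -> List[Tuple[int, float]]:
    """Iterate `batches` with no training and sample memory every `every` steps."""
    samples = []
    for step, _batch in enumerate(batches, start=1):
        if step % every == 0:
            samples.append((step, read_mib()))
    return samples


def grows_steadily(samples, tolerance_mib: float = 1.0) -> bool:
    """True if each sample exceeds the previous one by more than the tolerance."""
    values = [mib for _, mib in samples]
    return len(values) >= 2 and all(
        later - earlier > tolerance_mib
        for earlier, later in zip(values, values[1:])
    )
```

To use it, pass any iterable of batches, for instance a `torch.utils.data.DataLoader` built over the dataset with `num_workers=0`, and print the returned samples. If the readings level off after the first few hundred batches, the dataset itself is probably not the source of the growth.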
Thank you very much. We think this is a problem with the configured environment. I am checking with the Docker environment and will report back if there is any result.
I have the same problem. Have you found a solution yet? How quickly OOM is reached is proportional to the number of workers (num_workers).
When I run the code in the Docker image provided by the author, this problem is solved. You can try running with the author's Docker image.
Thank you for the information. I am also using exactly the Docker environment, so this may indeed be a Docker environment problem.
When I train with the data set of about 800K objects, the number circled in the graph keeps increasing as the number of training steps increases.
My configs/syncdreamer-train.yaml is the same as the one provided by the author, except for the data path:
https://github.com/liuyuan-pal/SyncDreamer/blob/main/configs/syncdreamer-train.yaml