🐛 Bug: Investigate "No space left on device" in our pipelines #1067

Closed
DhanshreeA opened this issue Mar 13, 2024 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@DhanshreeA
Member

DhanshreeA commented Mar 13, 2024

Describe the bug.

A number of our model pipelines fail with "No space left on device", typically in the "upload model to dockerhub" stage.
Exhibits:

Describe the steps to reproduce the behavior

Go to any one of the jobs above and re-run it. I re-ran these jobs when no other jobs were running on our runners, and they still failed.

Expected behavior.

These jobs should pass.

Screenshots.

No response

Operating environment

Runner OS

Additional context

Potentially useful resources:

@DhanshreeA DhanshreeA added the bug Something isn't working label Mar 13, 2024
@DhanshreeA DhanshreeA self-assigned this Mar 13, 2024
@GemmaTuron
Member

Hi @DhanshreeA
I am highlighting this model: https://github.com/ersilia-os/eos9taz/actions, as I need it for chemsampler, but it is not passing either :)

@DhanshreeA
Member Author

DhanshreeA commented Mar 14, 2024

@GemmaTuron, I'm aware of this (ersilia-os/eos9taz#11). For now, I've pushed it manually to unblock you. However, there's a related issue that I'm on top of: #1068, which I opened specifically with this model in mind.

@DhanshreeA
Member Author

Root cause: One of the guarantees of GitHub-hosted runners is that they receive software updates. In practice this means frequent updates to any or all of the following: the runner itself, the runner provisioner, patches to the OS on the runner, and the software bundled with the OS. Every such update is quite likely to eat into the available disk space. For example, here's the list of all the tools installed on the runner OS that we do not need for our builds. So far there is no straightforward solution other than an aggressive disk cleanup, which has been implemented at the level of the eos-template repository. While this works for now, there is no guarantee that the issue will not come up again.
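
For anyone who wants to reproduce the idea locally, here is a minimal sketch of what an aggressive cleanup step could look like. It is not the code used in eos-template; the directory list is an assumption (paths that are often reported as large, unneeded toolchains on Ubuntu GitHub-hosted runners), and the helper names `free_gib` and `clean_up` are hypothetical. It defaults to a dry run so nothing is deleted by accident.

```python
import shutil
from pathlib import Path

# ASSUMPTION: illustrative candidates only, not taken from the eos-template
# workflow. Check what actually exists on your runner image before deleting.
CANDIDATE_DIRS = [
    "/usr/share/dotnet",
    "/usr/local/lib/android",
    "/opt/ghc",
]


def free_gib(path="/"):
    """Return free space on the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def clean_up(dirs, dry_run=True):
    """Remove every existing directory in `dirs`.

    With dry_run=True nothing is deleted. Returns the list of directories
    that exist and were (or would have been) removed.
    """
    handled = []
    for d in dirs:
        p = Path(d)
        if not p.is_dir():
            continue  # skip paths that are absent on this image
        if not dry_run:
            shutil.rmtree(p, ignore_errors=True)
        handled.append(str(p))
    return handled


if __name__ == "__main__":
    print(f"free before: {free_gib():.1f} GiB")
    for d in clean_up(CANDIDATE_DIRS, dry_run=True):
        print("would remove:", d)
    print(f"free after:  {free_gib():.1f} GiB")
```

Printing the free space before and after makes it easy to see in the job log whether the cleanup recovered enough room for the Docker build, and to notice when a later runner image update shrinks the margin again.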
