How to retrieve a running job and cancel it #1766
Comments
Similar issue. I'm using As |
The local executor uses the process PID as the job ID. You should be able to send a termination signal with Look for Two more things:
Thank you so much for your reply! It's pretty helpful. But there is one minor issue for me here: when I use
Hello, if you are sure that you will always use the local executor (not SLURM), you can reset the signal handler for SIGTERM to the default instead of the bypass that is configured by submitit:

    import signal
    signal.signal(signal.SIGTERM, signal.SIG_DFL)

Alternatively, you can mimic the signal sequence that SLURM would send, which is a SIGTERM followed by a SIGKILL after a small delay. In bash you can use:

    function my_scancel {
        kill $1
        sleep 10
        kill -9 $1
    }
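
For anyone who would rather do this from Python than from bash, here is a rough sketch of the same SIGTERM-then-SIGKILL sequence. It assumes, as stated above, that the local executor's job ID is the process PID. The name `cancel_local_job` and the `grace_seconds` parameter are made up for illustration and are not part of submitit. It only uses `os.kill` from the standard library and is POSIX-only.

```python
import os
import signal
import time


def cancel_local_job(pid: int, grace_seconds: float = 10.0) -> bool:
    """Send SIGTERM, wait up to grace_seconds, then send SIGKILL if the PID still exists.

    Hypothetical helper, not a submitit API. Returns True if SIGKILL was sent,
    False if the PID disappeared before that.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return False  # no such process: nothing to cancel

    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)  # signal 0 sends nothing, it only checks that the PID exists
        except ProcessLookupError:
            return False  # the process exited after SIGTERM
        time.sleep(0.1)

    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return False
    return True
```

You would call it as `cancel_local_job(12345)`, with the PID taken from the job ID. Two caveats: if the job is a child of the calling process and has not been waited on yet, its PID still exists after it exits, so the function may send SIGKILL anyway (which is harmless in that case); and `os.kill` raises `PermissionError` if the process belongs to another user.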
Since SLURM is not installed at our supercomputing center, I cannot use commands like scancel. I can use submitit to run a job and monitor its progress from the .out/.err files, but what command should I use to cancel a wrong job given its job ID? Thanks.