import submitit
# Define where we'd like submitit to place our logs
executor = submitit.AutoExecutor(folder='~/submitit_logs')
# Define the parameters of our slurm job
# Just like Dask's job_extra_directives, slurm_additional_parameters allows us to specify things that submitit doesn't support directly
executor.update_parameters(timeout_min=30, mem_gb=128, cpus_per_task=16, slurm_partition="BigCats", slurm_additional_parameters={"gres": "gpu:1"})
SubmitIt Offloading
SubmitIt is a lower-level library than Dask which you can also use to offload parts of your notebook to the SLURM queue. Rather than managing a cluster, you submit Python functions directly to the SLURM queue, which gives you more control. For more information, have a read of their PyPI page.
We can submit our function to the cluster with the executor.submit method. This returns a future, and calling future.result() waits for the job and returns the function's result, just like when we were working with Dask. Because we are offloading to the SLURM queue, print statements will not be visible, just like with Dask SLURMClusters. However, the full stack trace is still visible when an error or assertion is raised within the function.
def client_test(input1, input2, error=False, test=False):
    # Force an error
    if error:
        assert 0 == 1
    # Print a message when testing (hidden when the job runs on SLURM)
    if test:
        print("When running in a local cluster you can see print statements!")
    return input1, input2
future = executor.submit(client_test, "input1", "input2", test=True)
future.result()
('input1', 'input2')
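Before sending a function to the queue, it can be useful to run it once locally through the same submit/result shape, where print statements are visible. The following is a minimal sketch that swaps in Python's standard-library ThreadPoolExecutor as a local stand-in. It is not part of submitit and nothing is sent to SLURM; the name local_executor is an assumption made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def client_test(input1, input2, error=False, test=False):
    # Same function as above, repeated so this sketch is self-contained
    if error:
        assert 0 == 1
    if test:
        print("When running in a local cluster you can see print statements!")
    return input1, input2


# ThreadPoolExecutor is a local stand-in here, not part of submitit
with ThreadPoolExecutor(max_workers=1) as local_executor:
    local_future = local_executor.submit(client_test, "input1", "input2", test=True)
    local_result = local_future.result()  # blocks until the function returns
```

Once the function behaves as expected locally, the same call can be made on the submitit executor instead.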
future = executor.submit(client_test, "input1", "input2", error=True)
future.result()
FailedJobError: Job (task=0) failed during processing with trace:
----------------------
Traceback (most recent call last):
File "/apps/mambaforge/envs/dsks_2024.06/lib/python3.10/site-packages/submitit/core/submission.py", line 55, in process_job
result = delayed.result()
File "/apps/mambaforge/envs/dsks_2024.06/lib/python3.10/site-packages/submitit/core/utils.py", line 133, in result
self._result = self.function(*self.args, **self.kwargs)
File "/tmp/ipykernel_1235436/858968069.py", line 4, in client_test
AssertionError
----------------------
You can check full logs with 'job.stderr(0)' and 'job.stdout(0)'or at paths:
- /home/mhar0048/submitit_logs/6952_0_log.err
- /home/mhar0048/submitit_logs/6952_0_log.out
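If you would rather not have a failed job interrupt the notebook, one option is to wrap the call to result() in a try/except. The sketch below uses a hypothetical helper, result_or_error, and again uses a local ThreadPoolExecutor as a stand-in so it runs without SLURM. With submitit, the exception caught would be the FailedJobError shown above rather than the raw AssertionError.

```python
from concurrent.futures import ThreadPoolExecutor


def result_or_error(future):
    """Return (value, None) on success or (None, exception) on failure."""
    try:
        return future.result(), None
    except Exception as exc:  # broad on purpose: the failure type depends on the executor
        return None, exc


def always_fails():
    # Hypothetical function that fails the same way client_test(error=True) does
    assert 0 == 1


# Local stand-in for the SLURM queue, used only to make the sketch runnable
with ThreadPoolExecutor(max_workers=1) as local_executor:
    value, error = result_or_error(local_executor.submit(always_fails))
    good_value, good_error = result_or_error(local_executor.submit(pow, 2, 3))
```

Returning the exception instead of raising it lets you inspect the failure and decide whether to resubmit.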
Note that since we are interacting directly with the queue, we don’t need to clean up and shut down our cluster when using SubmitIt.
If we have more complex requirements, we can also request a particular GPU type or QoS.
executor.update_parameters(timeout_min=30, mem_gb=128, cpus_per_task=16, slurm_partition="BigCats", slurm_additional_parameters={"gres": "gpu:3g.20gb:1", "partition": "BigCats"})
executor.submit(client_test, "input1", "input2", test=True).result()
/apps/mambaforge/envs/dsks_2024.06/lib/python3.10/site-packages/submitit/auto/auto.py:23: UserWarning: Setting 'additional_parameters' is deprecated. Use 'slurm_additional_parameters' instead.
warnings.warn(f"Setting '{arg}' is deprecated. Use '{new_arg}' instead.")
('input1', 'input2')
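If you switch between GPU types or QoS levels often, the extra parameters can be assembled in one place. This is a minimal sketch: build_slurm_extras is a hypothetical helper, "example_qos" is a placeholder name, and the "qos" key is assumed to be passed through to sbatch's --qos option in the same way "gres" is above. Check which QoS names exist on your own cluster.

```python
def build_slurm_extras(gpu_type=None, gpu_count=1, qos=None):
    """Build a dict for slurm_additional_parameters (hypothetical helper)."""
    # gres follows the "gpu[:type]:count" form used in the examples above
    gres = f"gpu:{gpu_type}:{gpu_count}" if gpu_type else f"gpu:{gpu_count}"
    extras = {"gres": gres}
    if qos is not None:
        extras["qos"] = qos  # assumed to map to sbatch's --qos option
    return extras


extras = build_slurm_extras(gpu_type="3g.20gb", qos="example_qos")
print(extras)
# Intended use (not run here):
# executor.update_parameters(slurm_partition="BigCats", slurm_additional_parameters=extras)
```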
Comparison with Dask
As you can see, we’ve implemented the same use case with both Dask and SubmitIt, which raises the question: which should you use for your research?
Both packages have pros and cons. On the whole, Dask is better suited to work that can be broken into many small tasks, such as preprocessing your data. SubmitIt, on the other hand, is better suited to offloading one larger job at a time, such as a training run.
Of the two, Dask is the more mature package, with more flexibility and more complete documentation. If you are only looking for a simple offloading package, however, it is often far more complex than you need.