Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make job_file_dir a parameter of HTCondorWorkflow #180

Closed
EthanMarx opened this issue Jul 15, 2024 · 6 comments · Fixed by #185
Closed

Make job_file_dir a parameter of HTCondorWorkflow #180

EthanMarx opened this issue Jul 15, 2024 · 6 comments · Fixed by #185
Assignees
Labels

Comments

@EthanMarx
Copy link
Contributor

Description

Currently, the job_file_dir is set from the law config. I have a use case where multiple HTCondorWorkflows may be running simultaneously, in which case I noticed that job files were being overwritten by one another.

The workaround I found was to hack in to the htcondor_create_job_file_factory hook:

class MyHTCWorkflow(HTCondorWorkflow):

def htcondor_create_job_file_factory(self, **kwargs):
        # set the job file dir to proper location
        kwargs["dir"] = self.job_file_dir
        return super().htcondor_create_job_file_factory(**kwargs) 

So that I can set the job_file_dir in a task dependent way.

This solution works fine, but this seems like a common enough use case that it should be more obvious how / where this can be set. I propose making this a parameter of the HTCondorWorkflow that can default to reading from the config if left unset.

Can put together a PR if this is something that is desirable.

@riga
Copy link
Owner

riga commented Jul 15, 2024

Hi @EthanMarx ,

thanks for opening the issue!

What do your job file directories look like? Usually, there is a temporary directory that is created per job submission inside that job_file_dir. It's controlled by the job:: job_file_dir_mkdtemp setting which should default to True (see https://law.readthedocs.io/en/latest/config.html#job). Is that the case for you?

@EthanMarx
Copy link
Contributor Author

Ah! I've been setting that to False, but only b/c I was misunderstanding what it does. Thanks for pointing that out - should resolve my issue!

@EthanMarx
Copy link
Contributor Author

EthanMarx commented Jul 15, 2024

Can I ask about the naming convention for the sub directories?

@riga
Copy link
Owner

riga commented Jul 15, 2024

Can I ask about the naming convention for the sub directories?

It's using the default of tempfile.mkdtemp.

@EthanMarx
Copy link
Contributor Author

Thanks. Would it make sense to name these directories based off something like the task name?

@riga
Copy link
Owner

riga commented Jul 15, 2024

Yes, that would be good! I will change the option to also accept strings that will support a couple template variables like tmp_XXXX_{task_id}.

  • More than three X's will be replaced by random characters (just like the standard mkdtemp)
  • {{task_id}} -> task.live_task_id
  • {{task_family}} -> task.task_family

I should have time to work on this tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants