Recommend DiracFile over MassStorageFile #162
Comments
I think Chris Jones's response was pretty good:
I think there are two main reasons why people want their files on EOS:
For 1, I think as long as there's an obvious way to get the list of file paths you need, it's not too bad. I think you'd need a little loop in Ganga for this. For 2, I don't know if having the files at CERN is actually any faster; maybe the internet connection to the grid sites is so 🔥 fast that there's some other bottleneck. If people want, they could replicate the file to CERN anyway. And: if you can easily get a list of files, you could also I dunno. I think the most important thing is that you want the most automation possible; you don't want to be handing out scripts or functions for |
For For
Do we know what those numbers correspond to? You can look them up and generate a file list, but I fear the self-made paths from |
For sure 😞 I think the number is the ID of the grid job, which should be unique across All Grid Jobs Ever. It might be stored in the job's If you have a job and want the list of LFNs, this should do it:
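A minimal sketch of such a loop (not the code from the original comment, which is missing here). It assumes Ganga-style job objects exposing `subjobs` and `outputfiles`, DiracFile-like outputs with an `lfn` attribute, and, for wildcard patterns such as '*.root', a `subfiles` list filled in after the job completes. `collect_lfns` is a hypothetical helper name.

```python
def collect_lfns(job):
    """Return the LFNs of all DiracFile-like outputs of a (possibly split) job.

    Assumes Ganga-style attributes: job.subjobs, job.outputfiles, file.lfn,
    and file.subfiles for wildcard patterns (an assumption, not verified here).
    """
    # A split job keeps its outputs on the subjobs; an unsplit job on itself.
    jobs_to_scan = list(job.subjobs) if job.subjobs else [job]
    lfns = []
    for sj in jobs_to_scan:
        for f in sj.outputfiles:
            # Wildcard patterns are assumed to expand into 'subfiles';
            # otherwise treat the file object itself as the only candidate.
            candidates = list(getattr(f, "subfiles", None) or []) or [f]
            for c in candidates:
                lfn = getattr(c, "lfn", "")
                if lfn:  # skip outputs that were never uploaded
                    lfns.append(lfn)
    return lfns
```

Inside a Ganga session this might be called as `collect_lfns(jobs(42))`, where 42 is a hypothetical job ID.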
If the file is replicated at CERN-USER, the LFN can be mapped to an XRootD path I think (I don't have an example job to play around with). |
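If that mapping holds, it amounts to prefixing the LFN with a site-specific XRootD URL. A minimal sketch, assuming the prefix is known; the thread does not give the real CERN-USER prefix, so the one in the usage line is purely hypothetical.

```python
def lfn_to_xrootd(lfn, prefix):
    """Join an LFN onto a site-specific XRootD URL prefix.

    The prefix (host plus storage path) is an assumption the caller supplies;
    it should already contain the '//' that separates host and path.
    """
    # Avoid doubled or missing slashes at the join point.
    return prefix.rstrip("/") + "/" + lfn.lstrip("/")
```

For example, `lfn_to_xrootd("/lhcb/user/a/auser/1/1.root", "root://eos.example.ch//eos/prefix")` gives `root://eos.example.ch//eos/prefix/lhcb/user/a/auser/1/1.root` (hypothetical host and path).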
Yeah, for me it basically boils down to the fact that I want to be able to look into my ntuple within seconds. And I really cannot be bothered to find out some obscure LFNs every time :(. |
Yeah, people want to be able to do With a file on the CERN grid site/EOS you can do
which isn't awful. |
No, it is ok. But you have to remember the path :-). |
Two comments.
The above syntax also works if the file is not at CERN. The name to use can be obtained either from the The second comment is that there is a long-standing request to allow the user to decide on the directory structure of |
Does someone know the state of the |
You mean the user-side choice of the directory that Dirac stores the file in? There is a missing feature in the LHCbDirac API. Until that is available, there is nothing that can be done from the Ganga side. |
Jupp. Do the Dirac guys use GitHub to track progress on this, or is there an issue in the Ganga repository we can track to stay informed? |
Tracing this further, the ball is in the Ganga camp now (where it has been forgotten). I have created a new issue on GitHub to follow this: ganga-devs/ganga#201 |
There is now a method on Ganga
In [7]: df.accessURL?
Type: function
String Form:<function accessURL at 0x7f68f7183320>
File: /afs/cern.ch/lhcb/software/releases/GANGA/GANGA_v602r2/install/ganga/python/Ganga/GPIDev/Base/Proxy.py
Definition: df.accessURL(*args, **kwargs)
Docstring:
Attempt to find an accessURL which corresponds to the specified SE. If no SE is specified then
return a random one from all the replicas. For example;
I think we should advise people to use this, rather than copying everything to 'CERN-USER' and only using |
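A minimal sketch of how users might gather URLs with `accessURL` instead of copying to 'CERN-USER'. The method name comes from the help output above; everything else is an assumption: that the SE name can be passed as the first positional argument, and that the return value is either a single URL string or a list of URLs. `access_urls` is a hypothetical helper name.

```python
def access_urls(files, se=None):
    """Collect access URLs from DiracFile-like objects via accessURL.

    Assumes accessURL accepts an optional SE name as its first argument and
    returns either one URL string or an iterable of URL strings.
    """
    urls = []
    for f in files:
        # Without an SE, accessURL is documented to pick a random replica.
        result = f.accessURL(se) if se else f.accessURL()
        # Normalise: accept a single string or any iterable of strings.
        if isinstance(result, str):
            urls.append(result)
        else:
            urls.extend(result)
    return urls
```

In a Ganga session, something like `access_urls([f for sj in jobs(42).subjobs for f in sj.outputfiles])` (hypothetical job ID) would give a list that can be opened directly with ROOT, provided a valid grid proxy is available.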
@alexpearce Yes, I think that is a good idea - at least if performance is not harmed. The most important thing is to get rid of the recommendation to use |
OK, thanks. We'll push to get this done before the next workshop. |
Should some comment be made that this in effect makes the analysis chain less "CERN-centric" - if you no longer copy files to CERN, there is nothing special about lxplus. |
Indeed. We should keep in mind that AFS will soon be 💀. I don't know if there's a known replacement for the interactive environment, so if users are already able to do things without lxplus (on their local cluster, or on their laptops) that will make the transition easier. |
I have been playing around with this workflow in the last few days and it is sometimes a bit tricky. One problem I encountered is that you have to have a valid grid proxy to use these files. This makes it complicated to use them on your own PC or in a batch job. |
@saschastahl Your comment that it takes a bit more effort to get read access to these files is a valid one. However, you can obtain a long life proxy that should more or less get rid of that. Placing the files on EOS will not make your life easier unless you do subsequent analysis inside CERN which I thought we were in general discouraging. |
We do indeed want to discourage location-specific analysis. People should be able to do their work on their own machines wherever they are. Do you know how to generate this "long life proxy" @egede? That might make things slightly easier, although there are still hoops to jump through when running on the batch system or on your local machine (which doesn't have the usual Grid machinery on it). Perhaps we could also provide instructions on how to access these non-CERN Grid files 'locally'? (For me, some sites let me access my Grid files without a proxy, others did not. It seems the access policy isn't uniform, so we just need a solution that works everywhere.) |
Yes, I was specifically referring to jobs on a batch system. It involved several steps to transfer the grid proxy to my jobs. I can provide the instructions; I found a TWiki page, but it is a bit cumbersome. |
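One common way to hand a proxy to batch jobs, sketched here under stated assumptions and not the TWiki recipe itself: standard grid tools read the proxy from the file named by the X509_USER_PROXY environment variable, falling back to /tmp/x509up_u<uid>. The proxy can therefore be copied to a directory the batch nodes can read, with X509_USER_PROXY pointing at that copy in the job's environment. The helper names are hypothetical, and the destination directory is whatever private shared filesystem your cluster offers.

```python
import os
import shutil


def proxy_path():
    """Return the conventional location of the current user's grid proxy."""
    env = os.environ.get("X509_USER_PROXY")
    if env:
        return env
    # Default location used by Globus/VOMS tools when the variable is unset.
    return "/tmp/x509up_u{}".format(os.getuid())


def stage_proxy(dest_dir):
    """Copy the proxy into dest_dir (assumed visible to batch nodes).

    The proxy is a credential, so dest_dir should be readable only by you.
    Returns the path to set as X509_USER_PROXY inside the batch job.
    """
    src = proxy_path()
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.copyfile(src, dest)
    # Grid tools typically expect owner-only permissions on proxy files.
    os.chmod(dest, 0o600)
    return dest
```

The job script would then export X509_USER_PROXY set to the returned path before touching any grid files. Note that the copy expires together with the original proxy.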
With Ganga 6.3, each object that needs a GridProxy will contain information about it. This does not solve the problem in itself, but paves the way for a job sent to Batch to automatically forward the proxy to it (and to fail to submit if no valid proxy is available). |
In order to update the lesson, should we delete the MassStorage setup and use completely, or just leave a side note on it? |
This should have been solved by #209. Any comments? |
Related to a discussion on failed large file downloads from the grid when using the MassStorageFile setup from 11-eos-storage.md.
For a MassStorageFile, ganga will download the file to the local machine and then copy it to EOS. This is quite fragile, especially as the error messages you get don't really help in diagnosing what went wrong. We should switch to recommending DiracFiles instead. These can be created directly by the worker node instead of having to round-trip to the machine where ganga is running.