Prolonged running time for PEER #162
Comments
As discussed, let's try to hack APEX into taking a fake VCF file ... but I'm still thinking about whether we want to run PEER separately by asking for more CPU hours. There are a few other methods out there for this remove-unwanted-variation (RUV) type of analysis (see here on page 2 for a review). I don't feel motivated to compare these methods by the number of QTLs discovered, as in the APEX paper (where the bi-cv method seems to outperform PEER). But at least we should try to provide this data from PEER for the analysis, because it is "GTEx endorsed"?
It is less about the walltime, and more about the fact that PEER stops at iteration ~400 every time it runs, which I have no way to debug. From the PEER repo, the developers do suggest using an updated version of PEER: mz2/peer#16, which may be better supported in terms of efficiency.
Thanks @hsun3163. I had a long offline discussion with the developers. We'll implement a version of it based on our discussions. I'll document it in more detail on the PEER and APEX module pages.
At the moment, a PEER analysis on log2cpm expression data in Ast, with a maximum of 1000 iterations and 60 factors, has not completed after 24 hours.
The main toll on time is the PEER_update function, which runs its variational updates for up to 1000 iterations.
For this particular dataset, it takes ~15 hours to complete ~400 iterations, and the stdout file fails to update afterward.
The R wrapper for PEER does not seem to offer any option to use more resources to speed up the analysis, possibly because R uses only a single core.
Therefore, I wonder whether there is any way we could optimize this process, or make a compromise to speed it up, e.g. by lowering the number of factors to be estimated? A sketch of the knobs involved follows below.
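For concreteness, here is a minimal sketch of where those compromises would be made through the PEER R wrapper. The `expr` matrix name and the specific factor count, iteration cap, and tolerance values are illustrative assumptions, not a tested configuration:

```r
# Minimal sketch (assumptions: `expr` is a samples-by-genes log2cpm matrix;
# the numeric values below are illustrative, not tuned).
library(peer)

model <- PEER()
PEER_setPhenoMean(model, as.matrix(expr))

# Compromise 1: estimate fewer hidden factors (e.g. 30 instead of 60).
PEER_setNk(model, 30)

# Compromise 2: cap the iterations well below 1000.
PEER_setNmax_iterations(model, 500)

# Compromise 3: loosen the convergence tolerances so the run can stop
# early once the bound / residual variance changes become small.
PEER_setTolerance(model, 0.01)
PEER_setVarTolerance(model, 0.001)

# The expensive step: a single call that runs all variational updates
# until convergence or the iteration cap is hit.
PEER_update(model)

factors   <- PEER_getX(model)          # inferred factors (samples x Nk)
residuals <- PEER_getResiduals(model)  # expression with factor effects removed
```

Since the wrapper runs single-threaded, the only other obvious lever would be launching independent PEER runs (e.g. per tissue) as separate jobs rather than expecting parallelism within one run.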