BLAST requests often fail, and overusing the public BLAST service can get you banned from it. Right now the pipeline fails (in query mode) if it receives no BLAST input. It would be helpful to be able to skip BLAST for certain jobs. More generally, it would be nice to move away from the public NCBI BLAST resource, but that's probably a bigger change!
As far as I can see, the BLAST results are only used to make a list of RefSeq sequence accessions, which are then mapped to UniProt identifiers for use in the next steps of the pipeline. I can think of a couple of ways of getting such a list of identifiers without having to use BLAST:
Allow users to supply their own MSA file or BLAST/MMseqs2 hits table, and read the sequence accessions from it. We'd have to specify that users only supply RefSeq or UniProt accessions.
Use the ColabFold API to make requests to MMseqs2 to get back hits. This is fast and seems to be much more reliable than BLAST requests. See the run_mmseqs2() function from this file in the ColabFold GitHub repository.
I've been playing around with the latter and it might work well, although we'd need to rerun the previous analyses to check that the results are reproducible with MMseqs2 instead of BLAST (which I would expect them to be).
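For the first option, here is a rough sketch of how accessions could be read from a user-supplied hits table. It assumes a tab-separated file in the BLAST/MMseqs2 tabular layout with the subject ID in the second column, and that IDs may be wrapped in pipe-delimited fields such as `ref|...|` or `sp|...|`. The function names, the RefSeq protein prefix pattern, and the column choice are my own assumptions for illustration, not anything the pipeline currently has. The UniProt pattern follows the accession format published by UniProt.

```python
import csv
import io
import re

# Assumption: RefSeq protein accessions look like WP_012345678.1 (prefix, underscore,
# digits, optional version). Only protein prefixes are covered here.
REFSEQ_RE = re.compile(r"^[ANXYWZ]P_\d+(?:\.\d+)?$")

# UniProt accession format (6 or 10 characters), e.g. P12345 or A0A024R161.
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def classify_accession(token):
    """Return 'refseq', 'uniprot', or None for a single ID token."""
    if REFSEQ_RE.match(token):
        return "refseq"
    if UNIPROT_RE.match(token):
        return "uniprot"
    return None


def read_hit_accessions(handle, subject_column=1):
    """Collect unique RefSeq and UniProt accessions from a tabular hits file.

    handle: an open text file (or any iterable of lines).
    subject_column: zero-based index of the subject ID column (assumed to be 1).
    Order of first appearance is preserved; unrecognised IDs are skipped.
    """
    found = {"refseq": [], "uniprot": []}
    seen = set()
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines, comment lines, and rows that are too short.
        if not row or row[0].startswith("#") or len(row) <= subject_column:
            continue
        # Subject IDs may be bare ("P12345") or wrapped ("sp|P12345|AATM_RABIT").
        for token in row[subject_column].split("|"):
            kind = classify_accession(token)
            if kind and token not in seen:
                seen.add(token)
                found[kind].append(token)
    return found
```

Calling `read_hit_accessions(open("hits.tsv"))` (with `hits.tsv` standing in for whatever the user provides) would give back two lists, so the RefSeq ones could go through the existing RefSeq to UniProt mapping step and the UniProt ones could skip it. Rejecting the file outright when no recognised accessions are found would probably be kinder than failing later in the pipeline.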