Unable to select the ligand with Entity selection node or Ligand node #701

Open
SamuelHomberg opened this issue Jan 3, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@SamuelHomberg

Describe the bug
Differentiating between entities using a Separate Atoms node together with a Select Entity node does not work. The Select Ligand node is always greyed out.

To Reproduce

  • 5dwr
  1. Import from PDB as Sticks
  2. Add Separate Atoms node (I put it right after Group Input)
  3. Add Select Entity node and connect it to Separate Atoms -> nothing is displayed
  4. Ticking the box for the protein entity (Serine/threonine-protein kinase...) shows the protein but also the ligand
  5. Ticking just the box for the ligand (N-{4-(1R,3S,...) again displays nothing
  • 3uvx
  1. Import from PDB as Sticks
  2. Add Separate Atoms node (I put it right after Group Input)
  3. Add Select Entity node and connect it to Separate Atoms -> nothing is displayed
  4. Ticking through the boxes, the first box displays everything (including sodium ions) except for a histone chain; that chain can be selected successfully using the second box

Expected behavior
The Select Entity node should allow selecting the different entities separately, and/or the Select Ligand node should allow selecting the ligand.

Error Codes
I think this is probably unrelated?

AttributeError: 'NoneType' object has no attribute 'node_tree'
Traceback (most recent call last):
  File "C:\Program Files\Blender Foundation\Blender 4.2\4.2\scripts\modules\bpy_types.py", line 1034, in draw_ls
    func(self, context)
  File "C:\Users\Samuel\AppData\Roaming\Blender Foundation\Blender\4.2\extensions\blender_org\molecularnodes\ui\panel.py", line 63, in change_style_node_menu
    prefix = node.node_tree.name.split(" ")[0].lower()
             ^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'node_tree'
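
The traceback shows that `node` itself is `None` when `panel.py` line 63 reads `node.node_tree`. A minimal sketch of a guard for that situation follows; `style_prefix` is a hypothetical helper, not a function from MolecularNodes, and the node objects are simulated with `SimpleNamespace` so the example runs outside Blender.

```python
from types import SimpleNamespace


def style_prefix(node):
    """Return the lower-cased first word of the node tree name, or None.

    Hypothetical helper: returns None instead of raising when there is
    no node or the node has no node tree attached.
    """
    node_tree = getattr(node, "node_tree", None)
    if node_tree is None:
        return None
    return node_tree.name.split(" ")[0].lower()


# Simulated stand-ins for Blender nodes (assumption, for illustration only)
no_node = None
style_node = SimpleNamespace(node_tree=SimpleNamespace(name="Style Sticks"))

prefix_missing = style_prefix(no_node)
prefix_present = style_prefix(style_node)
```

Whether returning early is the right behaviour for the menu draw function is a separate question; the sketch only shows how the `AttributeError` could be avoided.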

Desktop (please complete the following information):

  • OS: Windows 10 / Rocky Linux 9.5
  • Hardware: Intel Core i5 6600K, GTX 1060 (Windows), Threadripper Pro 5975wx 32-cores, RTX 3090 (Linux)
  • Blender Version: 4.3 (Linux), 4.2 (Linux, Windows)
  • MolecularNodes Version: 4.2.10

Additional context
I also tried importing the different filetypes, which did not seem to make a difference. I tried it on Linux, Windows and with different Blender versions.

From the Small Molecules side of things, selecting the ligand itself would be really nice, and a workaround would be appreciated. Just using the inverse of an Is Peptide node does not work in the presence of cofactors (they also get selected), but it might in combination with other selectors. I'll try it out some more.

@SamuelHomberg SamuelHomberg added the bug Something isn't working label Jan 3, 2025
@BradyAJohnston
Owner

As a workaround for now, if your molecules have bonds imported (which the example structure has), you can use Mesh Island: Island Index to select ligands etc.

image
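
To illustrate why this workaround depends on bonds being imported: an island is a set of atoms connected through bonds, so a ligand that is not covalently bonded to the protein ends up in its own island. The sketch below computes such labels outside Blender with a small union-find; `island_index` is a hypothetical helper, and the numbering Blender assigns to islands may differ from the one produced here.

```python
import numpy as np


def island_index(n_atoms, bonds):
    """Label each atom with the index of its bonded component."""
    parent = list(range(n_atoms))

    def find(i):
        # Walk up to the root, shortening the path along the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller index as the root of the merged component
            parent[max(root_a, root_b)] = min(root_a, root_b)

    roots = np.array([find(i) for i in range(n_atoms)])
    # Relabel the roots as consecutive integers 0, 1, 2, ...
    _, labels = np.unique(roots, return_inverse=True)
    return labels


# Atoms 0-2 form one molecule, atoms 3-4 another, atom 5 is a lone ion
bonds = [(0, 1), (1, 2), (3, 4)]
labels = island_index(6, bonds)
# labels is [0, 0, 0, 1, 1, 2]
```

Without bonds every atom would be its own island, which is why the workaround only helps for structures imported with bonds.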

Detection and selection of ligands is something that I have been meaning to return to for some time. The way they are authored in structure files does not seem to be entirely standardised. Sometimes they are a separate chain, sometimes they are in the same chain but with different residue names, etc. I came up with a solution years ago that mostly kind of works and haven't had another go since.

I think I could probably just compare to non-standard residue names and use that for ligands / non-protein or nucleic molecules.

Open to suggestions, but I'll have a think about how to try and implement this.
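
A minimal sketch of the residue-name comparison idea, assuming the residue names are available as a NumPy string array. The name sets and the `ligand_mask` helper are illustrative assumptions, not MolecularNodes code, and the solvent/ion set is deliberately incomplete.

```python
import numpy as np

STANDARD_AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
STANDARD_NUCLEOTIDES = {"A", "C", "G", "U", "DA", "DC", "DG", "DT"}
# Illustrative only: a real list of solvent and ion names would be longer
SOLVENT_AND_IONS = {"HOH", "NA", "CL", "K", "MG", "CA", "ZN"}


def ligand_mask(res_names):
    """Return True for atoms whose residue name is in none of the sets above."""
    excluded = STANDARD_AMINO_ACIDS | STANDARD_NUCLEOTIDES | SOLVENT_AND_IONS
    return ~np.isin(res_names, sorted(excluded))


# "LIG" stands in for an arbitrary ligand residue name
res_names = np.array(["ALA", "GLY", "LIG", "HOH", "NA", "LIG", "DA"])
mask = ligand_mask(res_names)
# mask is [False, False, True, False, False, True, False]
```

A comparison like this would also flag modified or non-proteinogenic amino acids (for example `MSE`) as ligands, so those would need separate handling.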

@SamuelHomberg
Author

SamuelHomberg commented Jan 3, 2025

Thanks for the workaround!

@SamuelHomberg
Author

I have created a workaround that allows the use of the Select Entity node, with some caveats:

  • Non-proteinogenic amino acids are also counted as hetero atoms (e.g. PDB 5dwr), so they will not be included in the entity selection.
  • As they get their own entity_id, the node itself does not have enough tick boxes to select these 'entities'.
  • The order of the entities no longer matches the order of the tick boxes, as np.unique sorts the different heteroatom types.

I'll try to solve these issues. I still need to test this and check whether file types other than .bcif also work. If it does work, I think it will be a nice solution, and a dedicated Select Ligand node might not be necessary (I have no idea how that could be implemented, as AlphaFold 3 can now generate .cif files with arbitrary ligands).

In molecularnodes/entities/molecule/pdbx.py:82

    @staticmethod
    def _get_entity_id(array, file):
        chain_ids = file.block["entity_poly"]["pdbx_strand_id"].as_array(str)

        # chain_ids is an array of comma-separated strings, e.g.
        # np.array(['A,B', 'C', 'D,E,F']); each chain is mapped to the zero-based
        # index of its entity, giving [0, 0, 1, 2, 2, 2]

        chains = []
        idx = []
        for i, chain_str in enumerate(chain_ids):
            for chain in chain_str.split(","):
                chains.append(chain)
                idx.append(i)

        entity_lookup = dict(zip(chains, idx))
        chain_id_int = np.array(
            [entity_lookup.get(chain, -1) for chain in array.chain_id], int
        )
        # NEW CODE:
        # overwrite the chain-derived entity value for hetero atoms, giving each
        # hetero residue name its own id after the polymer entities

        hetero_idxs = np.nonzero(array.hetero)[0]
        # TODO: keep from also including non-proteinogenic amino acids
        hetero_types = array.res_name[hetero_idxs]
        # return_inverse gives, per atom, the position of its residue name in the
        # sorted unique names; default=-1 avoids an error when there are no polymers
        hetero_types_numeric = (
            max(idx, default=-1)
            + 1
            + np.unique(hetero_types, return_inverse=True)[1]
        )
        # TODO: np.unique sorts the hetero_types and so the order of entity_id gets changed
        entity_id_int = chain_id_int
        entity_id_int[hetero_idxs] = hetero_types_numeric
        return entity_id_int

Please let me know what you think.
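
One possible way to address the `np.unique` ordering TODO is to number the residue names by first appearance rather than alphabetically, using `return_index`. The sketch below is an assumption about how that could look; `ordered_codes` is a hypothetical helper and has not been tested against MolecularNodes.

```python
import numpy as np


def ordered_codes(names, offset=0):
    """Number each distinct name by order of first appearance, starting at offset."""
    _, first_idx, inverse = np.unique(
        names, return_index=True, return_inverse=True
    )
    # order[k] is the sorted-unique position of the k-th name to appear
    order = np.argsort(first_idx)
    # rank[j] is the appearance rank of the j-th sorted-unique name
    rank = np.empty_like(order)
    rank[order] = np.arange(order.size)
    return offset + rank[inverse]


# Three polymer entities (ids 0-2) already exist, so hetero ids start at 3
names = np.array(["ZN", "ATP", "ZN", "HOH", "ATP"])
codes = ordered_codes(names, offset=3)
# codes is [3, 4, 3, 5, 4]; plain return_inverse would give ZN the highest id
```

With this numbering the ids follow the order in which the hetero residues occur in the file, which may or may not match the order of the tick boxes in the node.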

@BradyAJohnston
Owner

I've had a bit of a play around and I think this works OK, but we might need a different approach than using the entity_id, as that is something that is authored in the file. So while we can assign a different ID to the ligand, the authors specifically assigned that protein + ligand to a single entity, and us splitting it up maybe isn't the best approach.

We could add another attribute (ligand_id). What I did previously was to use the residue_name attribute, but start the ligand numbering from 1000 or some other number to ensure it doesn't clash with the 'canonical' numbered residues.

Ultimately it's just a problem because Geometry Nodes doesn't support string attributes, otherwise we could just match against the actual residue names rather than having to convert them to some numerical representation.
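
A rough sketch of the numbering scheme described above, where canonical residues keep small fixed numbers and every other residue name is numbered from 1000 upwards. The `CANONICAL` ordering, `LIGAND_OFFSET`, and `encode_res_names` are assumptions for illustration and are not the lookup MolecularNodes actually uses.

```python
import numpy as np

# Illustrative ordering only; the real canonical numbering may differ
CANONICAL = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]
LIGAND_OFFSET = 1000


def encode_res_names(res_names):
    """Map residue names to integers; non-canonical names start at LIGAND_OFFSET."""
    canonical_lookup = {name: i for i, name in enumerate(CANONICAL)}
    ligand_lookup = {}
    codes = np.empty(len(res_names), dtype=int)
    for i, name in enumerate(res_names):
        name = str(name)
        if name in canonical_lookup:
            codes[i] = canonical_lookup[name]
        else:
            # A new ligand name gets the next free number at or above the offset
            codes[i] = ligand_lookup.setdefault(
                name, LIGAND_OFFSET + len(ligand_lookup)
            )
    return codes, ligand_lookup


res_names = np.array(["ALA", "ATP", "GLY", "ZN", "ATP"])
codes, ligand_lookup = encode_res_names(res_names)
# codes is [0, 1000, 7, 1001, 1000]
# ligand_lookup is {"ATP": 1000, "ZN": 1001}
```

The returned `ligand_lookup` would need to be stored somewhere outside the geometry (for example alongside the object) so that a numeric value can be mapped back to its residue name.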
