Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PPR for Bipartite graph in recommendation #25

Open
riyaj8888 opened this issue Aug 24, 2023 · 3 comments
Open

PPR for Bipartite graph in recommendation #25

riyaj8888 opened this issue Aug 24, 2023 · 3 comments

Comments

@riyaj8888
Copy link

can u provide one simple example , how to use it for bipartite graph for recommending items to for users using this PPR code.

thanks

@mberr
Copy link
Owner

mberr commented Aug 24, 2023

Hi @riyaj8888 ,

I am not very familiar with the latest research on recommender systems, so you might want to take a look at that first. One particular question would be whether you need to pre-process your data to account for the sparsity of user-item interactions (e.g., cluster users or items into groups, and then operate on a densified user-cluster-item-cluster interaction matrix).

That said, this library only gives you a (fast) way to compute the (personalized) page rank, which can be interpreted as a relevance score. A quick application I can think of is to calculate PPR scores for a given user, and then recommend items that have a high score. To do this, you need to

  1. construct the adjacency matrix; there are several possible formats, but the simplest might be an edge index, see https://torch-ppr.readthedocs.io/en/stable/usage.html#torch_ppr.api.personalized_page_rank
import torch
edge_index = torch.as_tensor(data=[
  (user_id, item_id + num_users)
  for user_id, item_id in user_item_interactions
]).t()

Note that you have to make sure that the node IDs are unique, i.e., a user with id i does not get the same node id as the item i.

  1. calculate the personalized page rank scores for a user of interest
from torch_ppr import personalized_page_rank
scores = personalized_page_rank(edge_index=edge_index, indices=[user_id_of_interest])

Note that this will be a vector of size num_users + num_items, i.e., you also get scores for other users and not just items.

  1. e.g., look at the top-k items
best_item_ids = scores[num_users:].topk(k=10, sorted=True)

@riyaj8888
Copy link
Author

Thanks,
in step 3. it should be scores[num_items:] ri8?

@mberr
Copy link
Owner

mberr commented Aug 25, 2023

in step 3. it should be scores[num_items:] ri8?

No. With the code above, you use the node ids [0, ..., num_users) for user nodes, and [num_users, ..., num_users + num_items) for item nodes. Thus, we look at the scores for nodes starting at num_users until the end.

You could also switch that, but then you would need to adjust step 2, too, since indices has to be a node index, i.e., you need to translate the user index to a node index first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants