GRAPH-OF-WORDS file with edges and nodes labels #3

Open
Matt-81 opened this issue Apr 5, 2023 · 2 comments

Comments

@Matt-81

Matt-81 commented Apr 5, 2023

Dear @GuillaumeDD,
thank you for this great work! I was trying gowpy.gow.miner and gowpy.gow.io to convert a corpus into a collection of graphs of words. I noticed that the graphs in the exported file do not include the input text (e.g., a node like "foo" becomes "v 0 0").

I was wondering if it is possible to export it as "v 0 foo".

Thanks in advance for your help!

@GuillaumeDD
Owner

Hi @Matt-81,
Thank you for your positive feedback on gowpy 🙏

Unfortunately, it is not possible to export a node as "v 0 foo". The reason is that frequent subgraph mining algorithms expect node and edge labels to be non-negative integers; see https://github.com/Jokeren/gBolt#input-specification for instance.
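To illustrate the constraint, here is a minimal sketch, independent of gowpy's internals, of how token labels can be encoded as non-negative integers in a gSpan-style text format (`t # <graph id>`, `v <node id> <label>`, `e <from> <to> <label>`). The function name `encode_graph`, the `label_to_id` dictionary, and the constant edge label `0` are assumptions made for illustration only; they are not part of gowpy.

```python
def encode_graph(graph_id, tokens, edges, label_to_id):
    """Serialise one graph-of-words in a gSpan-style text format.

    tokens: list of node labels (strings); node i carries tokens[i].
    edges: list of (i, j) node index pairs.
    label_to_id: dict shared across graphs, grown as new tokens appear.
    """
    lines = [f"t # {graph_id}"]
    for node_id, token in enumerate(tokens):
        # Assign the next free integer to tokens not seen before.
        code = label_to_id.setdefault(token, len(label_to_id))
        lines.append(f"v {node_id} {code}")
    for src, dst in edges:
        # Hypothetical choice: every edge gets the same label 0.
        lines.append(f"e {src} {dst} 0")
    return "\n".join(lines)


label_to_id = {}
encoded = encode_graph(0, ["foo", "bar"], [(0, 1)], label_to_id)
print(encoded)
```

In this sketch the token "foo" ends up as the line "v 0 0", and the only place the string survives is the `label_to_id` dictionary kept alongside the file.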

However, the GoWMiner class keeps the mapping between these integers and their corresponding labels.
The easiest way to get back to the tokens is via a GoWVectorizer initialized from the GoWMiner. The notebook examples/classification-r8-frequent_subgraphs.ipynb shows how to get the feature names back, in the section "GoW Vectorizer Example".
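As a rough illustration of the idea, and not of gowpy's actual API, the sketch below rewrites the `v` lines of an exported file using an integer-to-token mapping. The `relabel_nodes` helper and the `id_to_label` dictionary are hypothetical; in practice the mapping would have to be obtained from the GoWMiner, for example through the feature names shown in the notebook.

```python
def relabel_nodes(exported_text, id_to_label):
    """Replace integer node labels with tokens in 'v <id> <label>' lines."""
    out = []
    for line in exported_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "v":
            node_id, code = parts[1], int(parts[2])
            # Fall back to the integer when the mapping has no entry.
            token = id_to_label.get(code, str(code))
            out.append(f"v {node_id} {token}")
        else:
            # Graph headers and edge lines are left untouched.
            out.append(line)
    return "\n".join(out)


id_to_label = {0: "foo", 1: "bar"}  # hypothetical mapping
exported = "t # 0\nv 0 0\nv 1 1\ne 0 1 0"
readable = relabel_nodes(exported, id_to_label)
print(readable)
```

A file rewritten this way is meant for human inspection only; the mining tools still need the integer-labelled version as input.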

Hope this helps!

@Matt-81
Author

Matt-81 commented Apr 11, 2023

Hi @GuillaumeDD, great, thanks for the feedback! 🙏
