Relax continuity constraints on Annotation #362

Open
jerinphilip opened this issue Feb 26, 2022 · 7 comments
Labels
enhancement New feature or request low-priority Things work, this could make things better.

Comments

@jerinphilip
Contributor

Related: #355 (comment), #298

I have proposed jelmervdl/translatelocally-web-ext#5 for the experimental extension. The next feature on my wishlist is an explanation like the one below. It is a little far-fetched, but someday I'd like the extension to show the visualization usually used to depict attention as an explanation of the translation.

image

(Screenshot taken from https://distill.pub/2016/augmented-rnns/, so we already have JS available under a permissive license, hopefully).

#298 indicates that we are editing the annotation to get HTML in, but the subword tokens now include tag information. This is not ideal when we want to build things like the above. One solution is to relax the continuity constraints, which are imposed to tie the annotation closely to SentencePiece, to just a constraint of monotonic byte ranges.
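To make the difference between the two constraints concrete, here is a minimal Python sketch. It is not the bergamot-translator code (which is C++); the `ByteRange` fields and the two check functions are names assumed for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ByteRange:
    # Hypothetical layout: half-open interval [begin, end) of byte offsets.
    begin: int
    end: int


def is_contiguous(ranges: List[ByteRange]) -> bool:
    """Strict constraint: each range starts exactly where the previous one ended."""
    return all(a.end == b.begin for a, b in zip(ranges, ranges[1:]))


def is_monotonic(ranges: List[ByteRange]) -> bool:
    """Relaxed constraint: ranges are ordered and non-overlapping; gaps are allowed."""
    well_formed = all(r.begin <= r.end for r in ranges)
    ordered = all(a.end <= b.begin for a, b in zip(ranges, ranges[1:]))
    return well_formed and ordered


text = b"Hello <b>world</b>"
# Tokens cover only the visible text; the bytes of <b> and </b> fall into gaps.
tokens = [ByteRange(0, 5), ByteRange(5, 6), ByteRange(9, 14)]

print(is_contiguous(tokens))  # the gap at bytes 6..9 breaks strict continuity
print(is_monotonic(tokens))   # the relaxed check still accepts these ranges
```

Under the relaxed check, tag bytes can live in the gaps between tokens, so the subword tokens themselves would not need to carry tag information.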

We could look at adding methods to Annotation that insert markup between tokens, rather than doing it externally, so that the whole data structure stays consistent. This would also make it simpler to support other markups when we get to building those.
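A rough Python sketch of what such a method might look like follows. The `Annotation` class here is a hypothetical stand-in holding only text bytes and token ranges, and `insert_markup` is an assumed name, not an existing bergamot-translator API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Annotation:
    """Hypothetical stand-in: text bytes plus monotonic (begin, end) token ranges."""
    text: bytes
    tokens: List[Tuple[int, int]] = field(default_factory=list)

    def insert_markup(self, offset: int, markup: bytes) -> None:
        """Insert markup at a byte offset and shift every later token range."""
        if any(begin < offset < end for begin, end in self.tokens):
            raise ValueError("markup would split a token")
        self.text = self.text[:offset] + markup + self.text[offset:]
        shift = len(markup)
        # Tokens starting at or after the insertion point move right by len(markup).
        self.tokens = [
            (b + shift, e + shift) if b >= offset else (b, e)
            for b, e in self.tokens
        ]

    def token(self, i: int) -> bytes:
        begin, end = self.tokens[i]
        return self.text[begin:end]


ann = Annotation(b"Hello world", [(0, 5), (5, 6), (6, 11)])
# Insert from right to left so the earlier offset is still valid.
ann.insert_markup(11, b"</b>")
ann.insert_markup(6, b"<b>")
print(ann.text)      # markup is now part of the text
print(ann.token(2))  # the token range still points at the word itself
```

Because the text and the ranges are updated together inside one method, callers cannot leave the two out of sync, which is the consistency argument made above. Note that the resulting ranges are monotonic but no longer contiguous.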

Opening this issue to discuss.

@jerinphilip jerinphilip added enhancement New feature or request low-priority Things work, this could make things better. labels Feb 26, 2022
@kpu
Member

kpu commented Feb 26, 2022

"Attention is not not Explanation" https://aclanthology.org/D19-1002.pdf
"Attention is not Explanation" https://aclanthology.org/N19-1357.pdf

@jelmervdl
Member

One difficulty I noticed is that HTML is not just text with tags added in between. Some characters, like & and < need to be replaced with &amp; and &lt;.
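As an illustration of why this matters for byte ranges, here is a small Python sketch (not code from the project) that escapes a string and remaps token byte ranges onto the escaped bytes. The function name `escape_with_ranges` and the sample ranges are made up for the example; only `html.escape` is a real standard-library call.

```python
import html
from typing import List, Tuple


def escape_with_ranges(
    text: str, ranges: List[Tuple[int, int]]
) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Escape text for HTML and remap UTF-8 byte ranges onto the escaped bytes.

    Ranges are assumed to begin and end on character boundaries.
    """
    offset_map = {}  # original byte offset -> escaped byte offset
    escaped = bytearray()
    position = 0
    for ch in text:
        offset_map[position] = len(escaped)
        # quote=False escapes only &, < and >.
        escaped += html.escape(ch, quote=False).encode("utf-8")
        position += len(ch.encode("utf-8"))
    offset_map[position] = len(escaped)
    remapped = [(offset_map[b], offset_map[e]) for b, e in ranges]
    return bytes(escaped), remapped


source = "AT&T < IBM"
source_ranges = [(0, 4), (5, 6), (7, 10)]  # "AT&T", "<", "IBM"
escaped, escaped_ranges = escape_with_ranges(source, source_ranges)
for begin, end in escaped_ranges:
    print(escaped[begin:end])
```

Every escaped character makes the text longer, so all ranges after it shift. Any markup insertion therefore has to adjust offsets for entity replacement as well as for the tags themselves.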

@jerinphilip
Contributor Author

Good enough for HTML replacement, good enough for the visualization. Attention is all we need 🤗. Besides, we can build the UI etc. with the existing ByteRange-derived Annotation and later replace attention with whatever future mechanism becomes the "explanation" in a similar setting.

Some characters, like & and < need to be replaced with &amp; and &lt;.

We don't need to relax the continuity constraints for this, but supporting such operations in the Annotation class itself could be useful for a wider range of applications. Could you point me to where these edits happen? That would help me study which operations the HTML code currently performs on Annotation, so they can be pushed down into Annotation and reused across other markups as well.

No hurry, though; we can incubate this idea slowly.

@kpu
Member

kpu commented Feb 27, 2022

The alignments come from guided alignment trained from fastalign, not from attention. These alignments are what drive the HTML alignment.

jerinphilip added a commit to jerinphilip/bergamot-translator that referenced this issue Mar 21, 2022
@jerinphilip
Contributor Author

jerinphilip commented Mar 22, 2022

image

NB: The continuity constraints are not relaxed; I just got the visualization from the screenshot in the first comment working. Looks pretty. There is some resizing ugliness.

@lagleki

lagleki commented Jun 2, 2022

image

NB: The continuity constraints are not relaxed; I just got the visualization from the screenshot in the first comment working. Looks pretty. There is some resizing ugliness.

Will you share the code of your improved version?

@jerinphilip
Contributor Author

Will you share the code of your improved version?

jerinphilip#88 (this is early experimental code; it will take a while to merge to main).
