Commit

add this to machine translation,. Is it okay? (#625)
* Update machine_translation.md

* Update machine_translation.md

* Update paraphrase-generation.md

* Update paraphrase-generation.md

* Update paraphrase-generation.md
adrienpayong authored Jun 23, 2024
1 parent 29dc695 commit ca43714
Showing 2 changed files with 23 additions and 0 deletions.
7 changes: 7 additions & 0 deletions english/machine_translation.md
@@ -46,4 +46,11 @@ on BLEU.
| ConvS2S (Gehring et al., 2017) | 40.46 | [Convolutional Sequence to Sequence Learning](https://arxiv.org/abs/1705.03122) |
| Transformer Base (Vaswani et al., 2017) | 38.1 | [Attention Is All You Need](https://arxiv.org/abs/1706.03762) |

### WMT22 shared task on Large-Scale Machine Translation Evaluation for African Languages

| Model | BLEU | Paper / Source |
| ------------- | :-----:| --- |
| Vanilla MNMT models | 17.95 | [Tencent’s Multilingual Machine Translation System for WMT22 Large-Scale African Languages](https://arxiv.org/pdf/2210.09644v1.pdf) |


[Go back to the README](../README.md)
16 changes: 16 additions & 0 deletions english/paraphrase-generation.md
@@ -20,3 +20,19 @@ The [QQP-POS dataset](https://www.kaggle.com/c/quora-question-pairs/overview) is
| ------------- | :-----:| --- | --- |
| Unsupervised BART w/ Dynamic Blocking | 26.76 | [Niu et al., 2020](https://arxiv.org/pdf/2010.12885v1.pdf)| Unavailable|
| ParafraGPT-UC| 35.9| [Bui et al., 2020](https://arxiv.org/pdf/2011.14344v1.pdf)| [Code](https://github.com/BH-So/unsupervised-paraphrase-generation)|

### MULTIPIT, MULTIPITCROWD and MULTIPITEXPERT

Past efforts to create paraphrase corpora consider only a single paraphrase criterion, without accounting for the fact that the desired “strictness” of semantic equivalence in paraphrases varies from task to task (Bhagat and Hovy, 2013; Liu and Soh, 2022). For example, for the purpose of tracking unfolding events, “A tsunami hit Haiti.” and “303 people died because of the tsunami in Haiti” are sufficiently close to be considered paraphrases, whereas for paraphrase generation the extra information “303 people died” in the latter sentence may lead models to learn to hallucinate and generate more unfaithful content. In the paper cited in the tables below, the authors present an effective data collection and annotation method to address these issues.

MULTIPIT is a Multi-Topic Paraphrase in Twitter corpus that consists of a total of 130K sentence pairs with crowdsourcing (MULTIPITCROWD) and expert (MULTIPITEXPERT) annotations. MULTIPITCROWD is a large crowdsourced set of 125K sentence pairs that is useful for tracking information on Twitter.
| Model | F1 | Paper / Source | Code |
| ------------- | :-----:| --- | --- |
| DeBERTaV3-large | 92.00 | [Improving Large-scale Paraphrase Acquisition and Generation](https://arxiv.org/pdf/2210.03235v2.pdf) | Unavailable |


MULTIPITEXPERT is an expert-annotated set of 5.5K sentence pairs that uses a stricter definition, which makes it more suitable for acquiring paraphrases for
generation purposes.
| Model | F1 | Paper / Source | Code |
| ------------- | :-----:| --- | --- |
| DeBERTaV3-large | 83.20 | [Improving Large-scale Paraphrase Acquisition and Generation](https://arxiv.org/pdf/2210.03235v2.pdf) | Unavailable |
