Evaluation of metrics for Speech Translation
To install from source:
git clone https://github.com/alan-turing-institute/ARC-M4ST
cd ARC-M4ST
python -m pip install .
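The notes are written in Typst. To compile them into notes.pdf (the brew command assumes macOS with Homebrew):
brew install typst
typst compile notes.typ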
Go to https://ca.talkbank.org/access/CallHome, select the conversation language and create an account. You can then download the "media folder", which contains the .cha files with the transcriptions.
To load the transcriptions as a bag of sentences, use m4st.parse.TranscriptParser.from_folder, which loads every conversation line. This class does not group lines by participant or conversation; it loads each line as a separate entry in a list (with some pre-processing).
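To show roughly what loading a folder "as a bag of sentences" involves, here is a minimal standalone sketch. It is not the m4st implementation: parse_cha_text and load_folder are hypothetical names. It assumes standard CHAT conventions (utterance lines start with *SPEAKER:, continuation lines start with a tab, headers start with @, dependent tiers start with %) and assumes media time bullets are delimited by the \x15 character.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical helper, NOT the m4st API: flattens CHAT (.cha) transcripts into
# a single list of utterances, ignoring speakers and conversation boundaries.

# Assumed encoding of media time bullets, e.g. "\x15123_4567\x15".
_BULLET = re.compile(r"\x15[^\x15]*\x15")


def parse_cha_text(text: str) -> list[str]:
    """Return the text of every main-tier ('*SPK:') utterance."""
    utterances: list[str] = []
    current: list[str] | None = None
    for line in text.splitlines():
        if line.startswith("\t") and current is not None:
            # Tab-indented line continues the previous utterance.
            current.append(line.strip())
            continue
        if current is not None:
            # Any other line ends the utterance being collected.
            utterances.append(" ".join(current))
            current = None
        if line.startswith("*") and ":" in line:
            # Drop the speaker code ("*A:") and keep the utterance text.
            current = [line.split(":", 1)[1].strip()]
    if current is not None:
        utterances.append(" ".join(current))
    # Light pre-processing: remove time bullets and collapse whitespace.
    cleaned = [" ".join(_BULLET.sub(" ", u).split()) for u in utterances]
    return [u for u in cleaned if u]


def load_folder(folder: str | Path) -> list[str]:
    """Load all .cha files under `folder` into one flat list of sentences."""
    sentences: list[str] = []
    for path in sorted(Path(folder).rglob("*.cha")):
        sentences.extend(parse_cha_text(path.read_text(encoding="utf-8")))
    return sentences
```

Dependent tiers such as %mor and their tab-indented continuations are skipped, so only what the speakers said ends up in the list.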
Sentences can be corrupted randomly in more than one way; one option is the Ollama client. To use it, install Ollama (https://ollama.com) and start its local server (ollama serve).
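As an illustration of the general idea only, the sketch below asks a local Ollama server to corrupt one sentence through Ollama's REST endpoint POST /api/generate. It is not the m4st Ollama client. The model name "llama3", the prompt wording and the function names build_request and corrupt_sentence are assumptions; the server is assumed to listen on Ollama's default port 11434 with the model already pulled.

```python
import json
import urllib.request

# Minimal sketch, NOT the m4st client: calls Ollama's /api/generate endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port
MODEL = "llama3"  # assumption: any model fetched with `ollama pull`

# Hypothetical prompt; the wording is illustrative only.
PROMPT = (
    "Rewrite the following sentence, introducing one small random error "
    "(e.g. a wrong word or a dropped word). Reply with the sentence only.\n\n"
    "{sentence}"
)


def build_request(sentence: str, model: str = MODEL) -> urllib.request.Request:
    """Build the JSON POST request for a single, non-streamed generation."""
    payload = {
        "model": model,
        "prompt": PROMPT.format(sentence=sentence),
        "stream": False,  # ask for one JSON object instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def corrupt_sentence(sentence: str, model: str = MODEL, timeout: float = 60.0) -> str:
    """Send the request to the running server and return the corrupted text."""
    with urllib.request.urlopen(build_request(sentence, model), timeout=timeout) as resp:
        body = json.load(resp)
    # With stream=False, the generated text is in the "response" field.
    return body["response"].strip()
```

With the server running, corrupt_sentence("I will call you tomorrow.") returns whatever the model generates, so the output differs from run to run.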
Distributed under the terms of the MIT license.