Metrics for Speech Translation (M4ST)

Evaluation of metrics for Speech Translation

Installation

From source:

git clone https://github.com/alan-turing-institute/ARC-M4ST
cd ARC-M4ST
python -m pip install .

Usage

Compiling notes

brew install typst

typst compile notes.typ

CallHome Dataset

Go to https://ca.talkbank.org/access/CallHome, select the conversation language, and create an account. You can then download the "media folder". The .cha files found there contain the transcriptions.

To load the transcriptions as a bag of sentences, use m4st.parse.TranscriptParser.from_folder. It does not group lines by participant or by conversation; it loads every conversation line as one entry in a flat list, after some pre-processing.
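
To illustrate what a "bag of sentences" means for .cha files, here is a minimal standalone sketch using only the Python standard library. It is not the m4st implementation, and its pre-processing may differ from what TranscriptParser does. The function name load_utterances is hypothetical. The sketch assumes the usual CHAT layout: utterance lines start with `*SPEAKER:`, continuation lines start with a tab, and timing marks are wrapped in U+0015 control characters.

```python
import re
from pathlib import Path

# Assumption: CHAT timing marks look like "\x15<start>_<end>\x15".
_BULLET = re.compile("\x15[^\x15]*\x15")


def load_utterances(folder):
    """Hypothetical helper: return every utterance from every .cha file
    in `folder` as one flat list, ignoring speaker and conversation."""
    utterances = []
    for path in sorted(Path(folder).glob("*.cha")):
        parts = []  # pieces of the utterance currently being read
        # The trailing "" is a sentinel that flushes the last utterance.
        for raw in path.read_text(encoding="utf-8").splitlines() + [""]:
            if raw.startswith("\t") and parts:
                parts.append(raw)  # continuation of the current utterance
                continue
            if parts:
                text = " ".join(_BULLET.sub("", p).strip() for p in parts)
                if text:
                    utterances.append(text)
                parts = []
            if raw.startswith("*"):
                # Drop the "*SPEAKER" prefix, keep what follows the colon.
                parts.append(raw.partition(":")[2])
    return utterances
```

Header lines (starting with `@`) and dependent tiers (starting with `%`) are skipped, so the result contains only the spoken lines.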

Ollama

The Ollama client is one way to corrupt sentences randomly. To use it, install Ollama from https://ollama.com and run the server.
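
As a rough illustration of what talking to a running Ollama server involves, the sketch below sends one sentence to Ollama's local HTTP API (the /api/generate endpoint on the default port 11434) and returns the model's reply. It is not the m4st client. The function names, the prompt wording, and the model name "llama3" are all assumptions made for the example, and the model must already have been pulled locally.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(sentence, model="llama3"):
    """Build the HTTP request. The prompt wording and the default model
    name are illustrative assumptions, not taken from m4st."""
    prompt = (
        "Rewrite the following sentence with one small random error. "
        "Return only the rewritten sentence.\n\n" + sentence
    )
    # stream=False asks for a single JSON object instead of a token stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def corrupt(sentence, model="llama3"):
    """Send the sentence to the server and return the generated text.
    Requires the Ollama server to be running."""
    with urllib.request.urlopen(build_request(sentence, model)) as resp:
        return json.loads(resp.read())["response"].strip()
```

Calling corrupt("The weather is nice today.") fails with a connection error if the server is not running, which is a quick way to check the setup.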

License

Distributed under the terms of the MIT license.
