Align word sequences and calculate metrics like word error rate (WER)

romanows/WordSequenceAligner

Overview

WordSequenceAligner is a Java class that aligns two string sequences
and calculates metrics such as word error rate (WER). Pretty-printing
enables human-readable logging of alignments and metrics.

This class is intended to reproduce the main functionality of the
NIST sclite tool. The Sphinx 4 source for the class
edu.cmu.sphinx.util.NISTAlign was referenced when writing the
WordSequenceAligner code.
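
As a rough illustration of what aligning two word sequences involves, the sketch below counts substitutions, insertions, and deletions with a standard word-level edit-distance (Levenshtein) table and a backtrace. It is not the WordSequenceAligner source: the class name WerSketch, the method countErrors, and the unit cost for every error type are assumptions made for this example only.

```java
class WerSketch {
    /**
     * Returns {substitutions, insertions, deletions} for one minimum-cost
     * alignment of hyp against ref. All three error types cost 1 (an assumption).
     */
    static int[] countErrors(String[] ref, String[] hyp) {
        int n = ref.length, m = hyp.length;
        int[][] cost = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) cost[i][0] = i; // delete every reference word
        for (int j = 0; j <= m; j++) cost[0][j] = j; // insert every hypothesis word
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = cost[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int del = cost[i - 1][j] + 1;
                int ins = cost[i][j - 1] + 1;
                cost[i][j] = Math.min(diag, Math.min(del, ins));
            }
        }
        // Walk back from the bottom-right cell, classifying each step.
        int sub = 0, ins = 0, del = 0;
        int i = n, j = m;
        while (i > 0 || j > 0) {
            boolean same = i > 0 && j > 0 && ref[i - 1].equals(hyp[j - 1]);
            if (i > 0 && j > 0 && cost[i][j] == cost[i - 1][j - 1] + (same ? 0 : 1)) {
                if (!same) sub++;   // mismatched pair: substitution
                i--; j--;
            } else if (i > 0 && cost[i][j] == cost[i - 1][j] + 1) {
                del++;              // reference word with no hypothesis partner
                i--;
            } else {
                ins++;              // hypothesis word with no reference partner
                j--;
            }
        }
        return new int[] {sub, ins, del};
    }
}
```

When several alignments share the minimum cost, this backtrace prefers the diagonal step; a different tie-breaking rule can yield a different but equally cheap alignment.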

Feedback and bugfixes are welcome.

Brian Romanowski
[email protected]

Details

This code is licensed under one of the BSD variants. Please see
LICENSE.txt for full details.

Example

WordSequenceAligner werEval = new WordSequenceAligner();
String[] ref = "the quick brown cow jumped over the moon".split(" ");
String[] hyp = "quick brown cows jumped way over the moon dude".split(" ");
Alignment a = werEval.align(ref, hyp);
System.out.println(a);

Produces the output:

        # seq  # ref   # hyp   # cor   # sub   # ins   # del   acc     WER     # seq cor
STATS:  1      8       9       6       1       2       1       0.75    0.5     0
-----   -----  -----   -----   -----   -----   -----   -----   -----   -----   -----
REF:    THE    quick   brown   COW     jumped  ***     over    the     moon    ****
HYP:    ***    quick   brown   COWS    jumped  WAY     over    the     moon    DUDE

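The top portion of the output shows the statistics for the given
pair of reference/hypothesis sentences, and the lower portion
displays the alignment visually. In the alignment, words that differ
are printed in uppercase, and asterisks mark the gap left by an
inserted or deleted word.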
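
The acc and WER columns can be checked by hand from the counts in the STATS row. The sketch below uses the usual WER definition (substitutions + insertions + deletions, divided by the reference length) and takes accuracy to be correct words divided by the reference length. The accuracy formula is an assumption that happens to match the sample output; the class and method names are hypothetical and not part of WordSequenceAligner.

```java
class WerMetricsSketch {
    /** Word error rate: all errors divided by the number of reference words. */
    static double wer(int sub, int ins, int del, int refLength) {
        return (double) (sub + ins + del) / refLength;
    }

    /** Accuracy as correct words over reference words (assumed definition). */
    static double accuracy(int correct, int refLength) {
        return (double) correct / refLength;
    }

    public static void main(String[] args) {
        // Counts taken from the STATS row above: 8 ref words, 6 cor, 1 sub, 2 ins, 1 del.
        System.out.println(wer(1, 2, 1, 8));   // (1 + 2 + 1) / 8 = 0.5
        System.out.println(accuracy(6, 8));    // 6 / 8 = 0.75
    }
}
```

Because insertions count as errors but do not lengthen the reference, WER can exceed 1.0 when the hypothesis is much longer than the reference.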
