Package featuregenerator :: Package bleu :: Module bleu

Module bleu



Author: Eleftherios Avramidis

Note: Modified copy of Hieu Hoang's code from the Moses project

Provides:

  • cook_refs(refs, n=4): Transform a list of reference sentences as strings into a form usable by cook_test().
  • cook_test(test, refs, n=4): Transform a test sentence as a string (together with the cooked reference sentences) into a form usable by score_cooked().
  • score_cooked(alltest, n=4): Score a list of cooked test sentences.
  • score_set(s, testid, refids, n=4): Interface with dataset.py; calculate the BLEU score of testid against refids.

The BLEU computation is broken into the three phases cook_refs(), cook_test(), and score_cooked() so that the caller can calculate BLEU scores for multiple test sets as efficiently as possible: references are cooked once and then reused for every test sentence scored against them.
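As an illustration, the three-phase pattern can be sketched in a minimal standalone form. The function names and the shortest-reference length policy mirror this module (cf. eff_ref_len = 'shortest' below), but the real implementation may differ in details:

```python
import math
from collections import Counter

def count_ngrams(words, n=4):
    """Count all n-grams up to order n in a token list."""
    counts = Counter()
    for k in range(1, n + 1):
        for i in range(len(words) - k + 1):
            counts[tuple(words[i:i + k])] += 1
    return counts

def cook_refs(refs, n=4):
    """Phase 1: precompute reference lengths and clipped max n-gram counts."""
    reflens = [len(r) for r in refs]
    maxcounts = Counter()
    for ref in refs:
        for ngram, cnt in count_ngrams(ref, n).items():
            maxcounts[ngram] = max(maxcounts[ngram], cnt)
    return reflens, maxcounts

def cook_test(test, cooked, n=4):
    """Phase 2: gather per-sentence statistics against the cooked references."""
    reflens, maxcounts = cooked
    comps = {
        "reflen": min(reflens),  # shortest-reference length policy
        "testlen": len(test),
        "guess": [max(len(test) - k + 1, 0) for k in range(1, n + 1)],
        "correct": [0] * n,
    }
    for ngram, cnt in count_ngrams(test, n).items():
        comps["correct"][len(ngram) - 1] += min(cnt, maxcounts.get(ngram, 0))
    return comps

def score_cooked(allcomps, n=4):
    """Phase 3: combine the cooked statistics into a corpus-level BLEU score."""
    totals = {"reflen": 0, "testlen": 0, "guess": [0] * n, "correct": [0] * n}
    for comps in allcomps:
        totals["reflen"] += comps["reflen"]
        totals["testlen"] += comps["testlen"]
        for k in range(n):
            totals["guess"][k] += comps["guess"][k]
            totals["correct"][k] += comps["correct"][k]
    logbleu = 0.0
    for k in range(n):
        if totals["correct"][k] == 0:
            return 0.0
        logbleu += math.log(totals["correct"][k]) - math.log(totals["guess"][k])
    logbleu /= n
    # brevity penalty
    logbleu += min(0.0, 1.0 - totals["reflen"] / totals["testlen"])
    return math.exp(logbleu)

# Cook the references once, then reuse them for every test sentence.
refs = [["the", "cat", "sat", "on", "the", "mat"]]
cooked = cook_refs(refs)
comps = cook_test(["the", "cat", "sat", "on", "the", "mat"], cooked)
print(score_cooked([comps]))  # an identical hypothesis scores 1.0
```

Note that the unsmoothed geometric mean is zero whenever any n-gram order has no match, which is the motivation for the smoothed variants listed below.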

Functions
 
normalize(s)
Normalize and tokenize text.
 
count_ngrams(words, n=4)
 
cook_refs(refs, n=4)
Takes a list of reference sentences for a single segment and returns an object that encapsulates everything that BLEU needs to know about them.
 
cook_test(test, (reflens, refmaxcounts), n=4)
Takes a test sentence and returns an object that encapsulates everything that BLEU needs to know about it.
 
score_cooked(allcomps, n=4)
 
smoothed_score_cooked(allcomps, n=4)
 
smoothed_score_sentence(translation, references, n=4)
Provides the smoothed single-sentence BLEU score for one sentence, given a list of references.
 
score_sentence(translation, references, n=4)
Provides the single-sentence BLEU score for one sentence, given a list of references.
 
score_sentences(sentence_tuples, n=4)
Provides BLEU calculation for many sentences.
 
score_multitarget_sentences(sentence_tuples, n=4)
Variables
  nonorm = 0
  preserve_case = False
  eff_ref_len = 'shortest'
  normalize1 = [(re.compile(r'<skipped>'), ''), (re.compile(r'-\...
  normalize2 = [(re.compile(r'([\{-~\[-` -&\(-\+:-@/])'), ' \\1 ...
  __package__ = 'featuregenerator.bleu'
  pattern = '([0-9])(-)'
  replace = '\\1 \\2 '
Function Details

normalize(s)


Normalize and tokenize text. This is lifted from NIST mteval-v11a.pl.

smoothed_score_sentence(translation, references, n=4)


Provides the smoothed single-sentence BLEU score for one sentence, given a list of references.

Parameters:
  • translation (str) - Translation text that needs to be evaluated
  • references ([str, ...]) - List of reference translations to be used for the evaluation
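A standalone sketch of a smoothed sentence-level BLEU is shown below. The add-one smoothing on each n-gram precision is one common scheme and an assumption here; the smoothing actually used by smoothed_score_cooked() may differ:

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Counts of all n-grams up to order n."""
    return Counter(tuple(tokens[i:i + k])
                   for k in range(1, n + 1)
                   for i in range(len(tokens) - k + 1))

def smoothed_sentence_bleu(translation, references, n=4):
    """Sentence BLEU with add-one smoothing on each n-gram precision,
    so a missing higher-order match no longer zeroes the whole score."""
    hyp = translation.split()
    refs = [r.split() for r in references]
    hyp_counts = ngram_counts(hyp, n)
    max_ref = Counter()
    for ref in refs:
        for ng, c in ngram_counts(ref, n).items():
            max_ref[ng] = max(max_ref[ng], c)
    logbleu = 0.0
    for k in range(1, n + 1):
        guess = max(len(hyp) - k + 1, 0)
        correct = sum(min(c, max_ref.get(ng, 0))
                      for ng, c in hyp_counts.items() if len(ng) == k)
        logbleu += math.log((correct + 1.0) / (guess + 1.0))  # add-one smoothing
    logbleu /= n
    reflen = min(len(r) for r in refs)  # shortest-reference policy
    logbleu += min(0.0, 1.0 - reflen / len(hyp))  # brevity penalty
    return math.exp(logbleu)

print(smoothed_sentence_bleu("the cat sat on the mat",
                             ["the cat sat on the mat"]))  # exact match scores 1.0
```

Unlike the unsmoothed score, this stays strictly positive even when the hypothesis shares no 4-gram with any reference.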

score_sentence(translation, references, n=4)


Provides the single-sentence BLEU score for one sentence, given a list of references.

Parameters:
  • translation (str) - Translation text that needs to be evaluated
  • references ([str, ...]) - List of reference translations to be used for the evaluation

score_sentences(sentence_tuples, n=4)


Provides BLEU calculation for many sentences.

Parameters:
  • sentence_tuples ([tuple(str(translation), [str(reference), ...]), ...]) - a list of tuples generated from the translated sentences. Each tuple contains one translated sentence and its list of references.
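The expected shape of sentence_tuples can be illustrated with a small self-contained sketch; clipped_unigram_precision is a hypothetical stand-in for the module's per-sentence BLEU:

```python
from collections import Counter

# One (translation, [reference, ...]) tuple per segment.
sentence_tuples = [
    ("the cat sat on the mat", ["the cat sat on the mat"]),
    ("a dog barked", ["the dog barked loudly"]),
]

def clipped_unigram_precision(translation, references):
    """Stand-in per-sentence metric: clipped unigram precision
    against the per-word maximum counts over all references."""
    hyp = Counter(translation.split())
    max_ref = Counter()
    for ref in references:
        for w, c in Counter(ref.split()).items():
            max_ref[w] = max(max_ref[w], c)
    correct = sum(min(c, max_ref[w]) for w, c in hyp.items())
    return correct / sum(hyp.values())

scores = [clipped_unigram_precision(t, refs) for t, refs in sentence_tuples]
print(scores)  # one score per input tuple
```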

Variables Details

normalize1

Value:
[(re.compile(r'<skipped>'), ''),
 (re.compile(r'-\n'), ''),
 (re.compile(r'\n'), ' ')]

normalize2

Value:
[(re.compile(r'([\{-~\[-` -&\(-\+:-@/])'), ' \\1 '),
 (re.compile(r'([^0-9])([\.,])'), '\\1 \\2 '),
 (re.compile(r'([\.,])([^0-9])'), ' \\1 \\2'),
 (re.compile(r'([0-9])(-)'), '\\1 \\2 ')]
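As a sketch, the two substitution lists above can be applied in sequence to reproduce the tokenization; the final lowercasing step is an assumption based on the preserve_case = False default, and the real normalize() also honours the nonorm flag:

```python
import re

# The module's two substitution passes, copied from the values above.
normalize1 = [(re.compile(r'<skipped>'), ''),     # strip <skipped> markers
              (re.compile(r'-\n'), ''),           # join hyphenated line breaks
              (re.compile(r'\n'), ' ')]           # flatten remaining newlines
normalize2 = [(re.compile(r'([\{-~\[-` -&\(-\+:-@/])'), ' \\1 '),  # pad punctuation
              (re.compile(r'([^0-9])([\.,])'), '\\1 \\2 '),        # . and , after non-digit
              (re.compile(r'([\.,])([^0-9])'), ' \\1 \\2'),        # . and , before non-digit
              (re.compile(r'([0-9])(-)'), '\\1 \\2 ')]             # dash after digit

def normalize(s):
    """Sketch of the normalize() pipeline: apply both substitution
    passes, then lowercase and split on whitespace."""
    for pattern, replacement in normalize1:
        s = pattern.sub(replacement, s)
    for pattern, replacement in normalize2:
        s = pattern.sub(replacement, s)
    return s.lower().split()

print(normalize("Hello, world!<skipped>"))  # → ['hello', ',', 'world', '!']
```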