Package dataprocessor :: Package input :: Module linereader :: Class LineReader

Class LineReader

genericreader.GenericReader --+
                              |
                             LineReader

Reads and combines strings from data stored one sentence per line.
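The one-sentence-per-line convention can be illustrated with a minimal sketch. It assumes that line i of the source file and line i of every submission file belong to the same sentence. It is not the LineReader implementation: the function align_lines is hypothetical, plain dicts stand in for ParallelSentence objects, and the inputs are any iterables of lines (open file objects work).

```python
from typing import Dict, Iterable, List


def align_lines(source_lines: Iterable[str],
                submissions: Dict[str, Iterable[str]]) -> List[dict]:
    """Combine line i of the source with line i of every system output.

    Hypothetical helper: `submissions` maps a system name to its lines.
    """
    source = [line.rstrip("\n") for line in source_lines]
    outputs = {system: [line.rstrip("\n") for line in lines]
               for system, lines in submissions.items()}

    # Differing line counts mean the inputs are not sentence-aligned.
    for system, lines in outputs.items():
        if len(lines) != len(source):
            raise ValueError(
                f"{system}: {len(lines)} lines, expected {len(source)}")

    # One record per sentence; the 1-based "id" is an assumption of this sketch.
    return [
        {"id": i + 1,
         "source": src,
         "translations": {system: lines[i] for system, lines in outputs.items()}}
        for i, src in enumerate(source)
    ]
```

Usage with files could look like: with open("source.de") as src, open("sysA.en") as out: align_lines(src, {"sysA": out}) (filenames hypothetical). Raising on a line-count mismatch is a design choice of this sketch: a missing line anywhere in a file shifts every later sentence, and comparing counts is a cheap way to catch that.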

Instance Methods
 
__init__(self, source_filename, submission_filenames, langpair, testset, pattern_name='')
Constructor.
 
get_parallelsentences(self)
Returns the contents of the parsed file as a list of ParallelSentence objects.

Inherited from genericreader.GenericReader: get_dataset, get_parallelsentence, load, unload

Method Details

__init__(self, source_filename, submission_filenames, langpair, testset, pattern_name='')
(Constructor)

Constructor. Creates an in-memory object that handles the file data.

Parameters:
  • source_filename (str) - Name of the file containing the source sentences, one sentence per line
  • submission_filenames (list of str) - List of files containing MT system output corresponding to the source file, one sentence per line. The filename of each file is used to extract the 'system' attribute of the sentences imported from it (see pattern_name below)
  • langpair (str) - The language codes of the language pair in the form source-target, e.g. de-en or en-fr
  • testset (str) - The name of the data set, e.g. testset2011
  • pattern_name (str) - A regular expression containing a bracketed group that extracts the system name from the filename. If empty, the entire filename is used as the system name
Overrides: genericreader.GenericReader.__init__
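The pattern_name and langpair conventions described above can be illustrated as follows. This is a sketch under stated assumptions, not the library's code: system_name and split_langpair are hypothetical helpers, the regular expression is matched with re.search against the file's basename, and a pattern that does not match falls back to the whole filename.

```python
import os
import re
from typing import Tuple


def system_name(filename: str, pattern_name: str = "") -> str:
    """Hypothetical helper: derive a system name from a submission filename.

    The first bracketed group of pattern_name is returned; an empty pattern
    returns the whole (base) filename, as the parameter description states.
    """
    # Assumption: only the basename is matched, not the directory part.
    basename = os.path.basename(filename)
    if not pattern_name:
        return basename
    match = re.search(pattern_name, basename)
    if match is None:
        # Assumption: fall back to the whole filename if the pattern fails.
        return basename
    return match.group(1)


def split_langpair(langpair: str) -> Tuple[str, str]:
    """Hypothetical helper: split 'de-en' into ('de', 'en').

    Raises ValueError unless there is exactly one hyphen.
    """
    source, target = langpair.split("-")
    return source, target
```

For example, system_name("/data/newstest2011.de-en.uedin.txt", r"newstest2011\.de-en\.(.+)\.txt") returns "uedin", while system_name("/data/sysA.txt") returns "sysA.txt".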

get_parallelsentences(self)

Returns the contents of the parsed file as a list of ParallelSentence objects. Note that this loads all of the file's data into system memory at once. For big data sets this may not be optimal, so consider sentence-by-sentence reading with SAX or cElementTree (e.g. saxjcml.py).

Returns: [ParallelSentence, ...]
  the list of parallel sentences

Overrides: genericreader.GenericReader.get_parallelsentences
(inherited documentation)
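The memory concern noted for get_parallelsentences also applies to line-based input. One way to address it, sketched here as an assumption rather than as part of the LineReader API, is a generator that reads the source and all submission files in lockstep and yields one sentence at a time. The name iter_aligned is hypothetical, and plain tuples stand in for ParallelSentence objects.

```python
from typing import Dict, Iterable, Iterator, Tuple


def iter_aligned(source_lines: Iterable[str],
                 submissions: Dict[str, Iterable[str]]
                 ) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Hypothetical helper: yield (source, {system: translation}) lazily."""
    systems = list(submissions)
    iterators = [iter(submissions[system]) for system in systems]
    for src in source_lines:
        translations = {}
        for system, it in zip(systems, iterators):
            line = next(it, None)
            if line is None:
                raise ValueError(f"{system} has fewer lines than the source")
            translations[system] = line.rstrip("\n")
        # Only the current line of each input is held in memory here.
        yield src.rstrip("\n"), translations
    # Leftover lines mean a submission is longer than the source.
    for system, it in zip(systems, iterators):
        if next(it, None) is not None:
            raise ValueError(f"{system} has more lines than the source")
```

Because open file objects iterate line by line, passing them directly keeps memory use independent of file size; the trade-off is that a misalignment is only detected when the generator reaches it.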