Package dataprocessor :: Package input :: Module taraxureader :: Class TaraXUReader

Class TaraXUReader

  genericreader.GenericReader --+        
                                |        
genericxmlreader.GenericXmlReader --+    
                                    |    
                  xmlreader.XmlReader --+
                                        |
                                       TaraXUReader

classdocs

Instance Methods
 
__init__(self, inputFilename)
Constructor.
 
get_parallelsentences(self, start=None, end=None)
Returns: a list of ParallelSentence objects

Inherited from xmlreader.XmlReader: get_tags

Inherited from genericxmlreader.GenericXmlReader: get_attributes, get_parallelsentence, length, load, load_str, split_and_write

Inherited from genericreader.GenericReader: get_dataset, unload

Method Details

__init__(self, inputFilename)
(Constructor)

Constructor. Creates an XML object that handles the XML

Parameters:
  • input_xml_filename - the name of the XML file
  • load - if set to False, the instance is initialized without loading everything into memory
Overrides: genericreader.GenericReader.__init__

get_parallelsentences(self, start=None, end=None)

Returns the contents of the parsed file as a list of ParallelSentence objects. Note that this loads all the data of the file into system memory at once. For big data sets this may not be optimal, so consider sentence-by-sentence reading with SAX or cElementTree (e.g. saxjcml.py).

Returns:
a list of ParallelSentence objects
Overrides: genericreader.GenericReader.get_parallelsentences
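
The sentence-by-sentence alternative mentioned above could look roughly like the sketch below. It does not use TaraXUReader or saxjcml.py; it only illustrates incremental parsing with the standard library's xml.etree.ElementTree.iterparse (the modern replacement for cElementTree). The element names (jcml, judgedsentence, src, tgt), the attributes, and the helper iter_sentences are assumptions made for illustration and may not match the actual input format.

```python
import io
import xml.etree.ElementTree as ET


def iter_sentences(source, tag="judgedsentence"):
    """Yield (attributes, source text) for one sentence element at a time.

    `tag` and the child element name "src" are hypothetical; adjust them
    to the real schema of the input file.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # Copy what we need before clearing the element.
            attributes = dict(elem.attrib)
            src_text = elem.findtext("src", default="").strip()
            yield attributes, src_text
            # Drop the element's children and text so memory use stays flat.
            elem.clear()


# Small in-memory sample standing in for a file on disk (invented data).
SAMPLE = b"""<jcml>
  <judgedsentence id="1" langsrc="de" langtgt="en">
    <src>Guten Morgen</src>
    <tgt system="a">Good morning</tgt>
  </judgedsentence>
  <judgedsentence id="2" langsrc="de" langtgt="en">
    <src>Danke</src>
    <tgt system="a">Thanks</tgt>
  </judgedsentence>
</jcml>"""

sentences = list(iter_sentences(io.BytesIO(SAMPLE)))
for attributes, src_text in sentences:
    print(attributes["id"], src_text)
```

Because the generator yields each sentence as soon as its closing tag is seen, a caller can process and discard sentences one by one instead of holding the whole list that get_parallelsentences returns.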