Package dataprocessor :: Package input :: Module genericxmlreader :: Class GenericXmlReader

Class GenericXmlReader

genericreader.GenericReader --+
                              |
                             GenericXmlReader
classdocs

Instance Methods
 
__init__(self, input_xml_filename, load=True, stringmode=False, **kwargs)
Constructor.
 
get_tags(self)
 
load_str(self, input)
 
load(self)
Loads the data of the file into memory.
 
split_and_write(self, parts, re_split)
Convenience function that splits an XML file into parts and writes them directly to disk as .part files with similar filenames.
 
get_attributes(self)
Returns: a list of the names of the attributes contained in the XML file
 
length(self)
 
get_parallelsentence(self, xml_entry)
 
get_parallelsentences(self, start=None, end=None)
Returns: a list of ParallelSentence objects
 
_read_simplesentence(self, xml_entry)
 
_read_string(self, xml_entry)
 
_read_attributes(self, xml_entry)
Returns: a dictionary of the attributes of the current sentence {name:value}

Inherited from genericreader.GenericReader: get_dataset, unload

Method Details

__init__(self, input_xml_filename, load=True, stringmode=False, **kwargs)
(Constructor)

Constructor. Creates an XML reader object that handles ranking file data.

Parameters:
  • input_xml_filename (string) - the name of the XML file
  • load (boolean) - if set to False, the instance is initialized without loading the file contents into memory
Overrides: genericreader.GenericReader.__init__

load(self)

Loads the data of the file into memory. This is useful if the class was asked not to load the file upon initialization.

Overrides: genericreader.GenericReader.load
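
The deferred-loading pattern described for load=False can be sketched as follows. This is only an illustration: LazyXmlReader is a hypothetical stand-in written for this example, not the real GenericXmlReader, and its attributes (loaded, root) and the use of ElementTree are assumptions.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


class LazyXmlReader:
    """Hypothetical stand-in for illustration; NOT the real GenericXmlReader."""

    def __init__(self, input_xml_filename, load=True):
        self.input_xml_filename = input_xml_filename
        self.loaded = False
        self.root = None
        if load:
            self.load()

    def load(self):
        # Parse the whole file into memory (assumed behaviour).
        self.root = ET.parse(self.input_xml_filename).getroot()
        self.loaded = True


# Write a tiny XML file to demonstrate with.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as handle:
    handle.write("<set><entry id='1'/><entry id='2'/></set>")
    demo_path = handle.name

reader = LazyXmlReader(demo_path, load=False)
loaded_before = reader.loaded  # nothing has been parsed yet
reader.load()                  # explicit, deferred load
loaded_after = reader.loaded
entry_count = len(reader.root)

os.remove(demo_path)
```

With load=False the constructor only records the filename; the parsing cost is paid when load() is called explicitly.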

split_and_write(self, parts, re_split)

Convenience function that splits an XML file into parts and writes them directly to disk as .part files with similar filenames. The resulting filenames are constructed from the parameters below.

Parameters:
  • parts (int) - number of parts to split into
  • re_split - regular expression that should define two (bracketed) groups over the filename. The resulting files have the part number inserted in the filename between these two groups
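
The role of the two bracketed groups in re_split can be sketched as follows. The helper part_filenames is hypothetical, and the exact naming scheme (numbering from 1, a trailing .part suffix) is an assumption; the source only states that the part number is inserted between the two groups.

```python
import re


def part_filenames(filename, parts, re_split):
    """Hypothetical helper: build one output name per part."""
    match = re.match(re_split, filename)
    if match is None or len(match.groups()) < 2:
        raise ValueError("re_split must match the filename with two groups")
    head, tail = match.group(1), match.group(2)
    # Assumed naming scheme: <head><part number><tail>.part
    return ["{}{}{}.part".format(head, i, tail) for i in range(1, parts + 1)]


# Example pattern: group 1 is the stem, group 2 is the extension.
names = part_filenames("train.jcml", 3, r"(.*)(\.jcml)$")
```

Under these assumptions, names holds train1.jcml.part, train2.jcml.part and train3.jcml.part.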

get_parallelsentence(self, xml_entry)

Overrides: genericreader.GenericReader.get_parallelsentence

get_parallelsentences(self, start=None, end=None)

Returns the contents of the parsed file as a list of ParallelSentence objects. Note that this loads all the data of the file into system memory at once. For big data sets this may not be optimal, so consider sentence-by-sentence reading with SAX or cElementTree (e.g. saxjcml.py).

Returns: [ParallelSentence, ...]
a list of ParallelSentence objects
Overrides: genericreader.GenericReader.get_parallelsentences
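
The sentence-by-sentence alternative mentioned above can be sketched with the standard library's ElementTree iterparse. This is not the code in saxjcml.py; the element names (jcml, judgedsentence, src), the attributes, and the helper iter_entries are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

# Assumed sample layout; the real file format may differ.
SAMPLE = b"""<jcml>
  <judgedsentence id="1" langsrc="de"><src>Hallo</src></judgedsentence>
  <judgedsentence id="2" langsrc="de"><src>Welt</src></judgedsentence>
</jcml>"""


def iter_entries(source, tag="judgedsentence"):
    """Yield (attributes, source text) for each entry as it is parsed."""
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == tag:
            yield dict(element.attrib), element.findtext("src")
            element.clear()  # drop the children that were just processed


# list() is used only to inspect the result; a real consumer would
# iterate over the generator to keep memory use low.
entries = list(iter_entries(io.BytesIO(SAMPLE)))
```

Each entry is handed to the caller as soon as its closing tag is read, so the full list of sentences never has to be held in memory at once.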

_read_attributes(self, xml_entry)

Returns:
a dictionary of the attributes of the current sentence {name:value}
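
A minimal sketch of building such a {name:value} dictionary is shown below. It assumes that xml_entry is an xml.dom.minidom element, which the source does not confirm; read_attributes is a hypothetical helper and the sample element is invented for the example.

```python
from xml.dom import minidom


def read_attributes(xml_entry):
    """Hypothetical helper: collect an element's attributes as {name: value}."""
    # Assumption: xml_entry is a minidom Element with an attributes map.
    return {name: value for name, value in xml_entry.attributes.items()}


document = minidom.parseString('<judgedsentence id="1" langsrc="de"/>')
attributes = read_attributes(document.documentElement)
```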