[mary-dev] testing a NLP component

Tue May 3 08:49:52 CEST 2011

Hi Florent,

good start, but a few things need fixing.

First, to test-run most MARY modules you need to start up the entire 
MARY system first. Also, log4j must be configured before it will work 
properly. So you need something like:

@BeforeClass
public static void startMARY() throws Exception {
     if (Mary.currentState() == Mary.STATE_OFF) {
         Mary.startup();
     }
     if (!MaryUtils.isLog4jConfigured()) {
         BasicConfigurator.configure();
     }
}

This will avoid the warning message you got from log4j:

 > log4j:WARN No appenders could be found for logger (marytts.IO).
 > log4j:WARN Please initialize the log4j system properly.

Second, you are trying to set plain text for input format TOKENS. But 
TOKENS is a RAWMARYXML format, so when the XML parser tries to make 
sense of your data, it fails:

 > [Fatal Error] :1:1: Content is not allowed in prolog.

The way to avoid this is to enter proper XML data, either using 
md.setData() or using md.readFrom(). I'll describe below how I usually 
do this.
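
To see where that parser message comes from without involving MARY at 
all, here is a minimal sketch using only the JDK's XML parser. The names 
PrologCheck and isWellFormedXml are made up for illustration. The point 
is only that plain text is rejected before the parser ever reaches a 
root element, whereas even a tiny XML document is accepted:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of MARY.
final class PrologCheck {

    /** Returns true if the given text parses as well-formed XML. */
    static boolean isWellFormedXml(String text) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default error handler, which prints "[Fatal Error] ..."
            // to stderr. DefaultHandler rethrows fatal errors, so parse() still fails.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input, e.g. because it does not start with markup.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser available", e);
        }
    }

    public static void main(String[] args) {
        // Plain text, as passed to md.setData() in the failing test.
        System.out.println(isWellFormedXml("M., 06.67.21.05.41, #, 423 Km"));
        // A small XML document.
        System.out.println(isWellFormedXml("<maryxml><p><s><t>M.</t></s></p></maryxml>"));
    }
}
```

The exact wording of the parser's message depends on the XML 
implementation in use, so the sketch deliberately does not compare it.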

Third, you are blinding yourself by discarding the helpful exception. 
NEVER EVER do something like this:

 >          try
 >          {
 >              pr.process(md);
 >          }
 >          catch (Exception e)
 >          {
 >              return;
 >          }

Take a look at some slides where I put down what I think are good 
practices regarding exception handling:
http://mary.opendfki.de/repos/trunk/doc/ErrorHandling.pdf
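
As a rough illustration (this is not taken from the slides, and 
process() below is a hypothetical stand-in, not the MARY method), 
compare a version that swallows the exception with one that keeps it. In 
a JUnit test the simplest option is usually to declare "throws 
Exception" on the test method and let JUnit report the failure. If you 
must catch, wrap the exception and pass the original along as the cause:

```java
// Plain-Java sketch; process() is a hypothetical stand-in for a module call.
final class ExceptionDemo {

    /** Fails with a descriptive exception when the input is not XML-like. */
    static String process(String input) throws Exception {
        if (!input.trim().startsWith("<")) {
            throw new Exception("Input is not XML: " + input);
        }
        return input;
    }

    /** Anti-pattern: the reason for the failure is thrown away. */
    static String swallow(String input) {
        try {
            return process(input);
        } catch (Exception e) {
            return null; // the caller cannot tell what went wrong
        }
    }

    /** Alternative: add context and keep the original exception as the cause. */
    static String wrap(String input) {
        try {
            return process(input);
        } catch (Exception e) {
            throw new IllegalStateException("Cannot process test input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(swallow("plain text"));
        try {
            wrap("plain text");
        } catch (IllegalStateException e) {
            // Both the context and the root cause are still available.
            System.out.println(e.getMessage() + " <- " + e.getCause().getMessage());
        }
    }
}
```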

Fourth, the test should verify that the processing result matches 
expectations. A complex example like "M., 06.67.21.05.41, #, 423 Km, 30 
€. 20h14." will be difficult for this purpose; maybe here the expected 
state is not a specific outcome, but rather the fact that the module can 
process this without crashing at all!

One way to write a clean, readable test is to write it backwards, 
starting with the verification... for example:

@Test
public void canProcessWildStuff() throws Exception {

   ...
   // verify expected result:
   assertNotNull(result);
}

Then, how did we get there:

@Test
public void canProcessWildStuff() throws Exception {

...
   // exercise system under test:
   MaryData result = preprocessor.process(input);
   // verify expected result:
   assertNotNull(result);
}

and finally, set up the system under test:

@Test
public void canProcessWildStuff() throws Exception {
   // Set up system under test:
   MaryModule preprocessor =
       ModuleRegistry.getModule(marytts.language.de.Preprocess.class);
   MaryData input = new MaryData(MaryDataType.TOKENS, Locale.GERMAN);
   input.readFrom(this.getClass().getResourceAsStream("wildStuff.tokens"));
   // exercise system under test:
   MaryData result = preprocessor.process(input);
   // verify expected result:
   assertNotNull(result);
}

In this example, you see that I am reading the XML document to test from 
a classpath resource, in the same package as the test class, called 
"wildStuff.tokens". How to create that document? Well, assume we have 
components that can create this input format for the target language. 
For German, this is the case for the public demo, so we can go to 
http://mary.dfki.de:59125/documentation.html#synthesis and enter into 
the GET example:
INPUT_TEXT: M., 06.67.21.05.41, #, 423 Km, 30 €. 20h14.
INPUT_TYPE: TEXT
OUTPUT_TYPE: TOKENS
LOCALE: de
... and click "Submit Query". If that works, you should get a response 
containing your test document (see below); copy that into file 
"wildStuff.tokens" in the test package.
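
If you prefer to build the request URL yourself instead of using the 
form, the following sketch shows one way to do it. It assumes the 
/process path and the parameter names that appear in the request URL 
quoted further down in this mail; MaryRequestUrl and build() are 
hypothetical helper names. The code only constructs the URL string, it 
does not contact the server:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of MARY.
final class MaryRequestUrl {

    /** Appends the parameters, form-encoded as UTF-8, to the base URL. */
    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl).append('?');
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            first = false;
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        // \u20AC is the euro sign, escaped so the source file encoding does not matter.
        params.put("INPUT_TEXT", "M., 06.67.21.05.41, #, 423 Km, 30 \u20AC. 20h14.");
        params.put("INPUT_TYPE", "TEXT");
        params.put("OUTPUT_TYPE", "TOKENS");
        params.put("LOCALE", "de");
        System.out.println(build("http://mary.dfki.de:59125/process", params));
    }
}
```

With the four parameters listed above, build() should produce the same 
query string as the request URL quoted below, minus its AUDIO parameter.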

Hope this helps. I know it's complex, but I think it's worth doing these 
things properly; it will pay off after the initial investment.

Good luck!

Best,
Marc

So here is what I get from the above request, i.e. from
http://mary.dfki.de:59125/process?INPUT_TEXT=M.%2C+06.67.21.05.41%2C+%23%2C+423+Km%2C+30+%E2%82%AC.+20h14.&INPUT_TYPE=TEXT&OUTPUT_TYPE=TOKENS&LOCALE=de&AUDIO=WAVE_FILE

<?xml version="1.0" encoding="UTF-8"?>
<maryxml xmlns="http://mary.dfki.de/2002/MaryXML" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="0.5" 
xml:lang="de">
<p>
<s>
<t>
M.
</t>
<t>
,
</t>
<t>
06.67.21.05.41
</t>
<t>
,
</t>
<t>
#
</t>
<t>
,
</t>
<t>
423
</t>
<t>
Km
</t>
<t>
,
</t>
<t>
30
</t>
<t>
€
</t>
<t>
.
</t>
</s>
<s>
<t>
20h14
</t>
<t>
.
</t>
</s>
</p>
</maryxml>

This is roughly what TOKENS input data should look like.
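
A quick way to check such a file before using it in a test is to parse 
it and list the <t> elements. The sketch below uses only the JDK's DOM 
parser; TokenLister and tokens() are hypothetical names, and the 
namespace URI is the one from the document above:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of MARY.
final class TokenLister {

    private static final String MARYXML_NS = "http://mary.dfki.de/2002/MaryXML";

    /** Returns the trimmed text of every <t> element, in document order. */
    static List<String> tokens(String maryXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Needed so that elements can be looked up by namespace URI.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(maryXml)));
            NodeList nodes = doc.getElementsByTagNameNS(MARYXML_NS, "t");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // The server output puts each token on its own line, hence trim().
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (SAXException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser available", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<maryxml xmlns=\"" + MARYXML_NS + "\" version=\"0.5\" xml:lang=\"de\">\n"
                + "<p>\n<s>\n<t>\n20h14\n</t>\n<t>\n.\n</t>\n</s>\n</p>\n</maryxml>";
        System.out.println(tokens(xml));
    }
}
```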

On 02.05.11 16:00, fxavier at ircam.fr wrote:
> Hi everybody,
>
> First of all, thanks Marc, great explanation, very useful. So, for
> example, if I want to test the method "process" in the class "Preprocess",
> package "marytts.language.de", it should be something like this:
>
>
>
> package marytts.tests.junit4;
>
> import static org.junit.Assert.*;
>
> import java.io.IOException;
> import java.util.Locale;
>
> import javax.xml.parsers.ParserConfigurationException;
>
> import marytts.datatypes.MaryData;
> import marytts.datatypes.MaryDataType;
> import marytts.language.de.*;
>
> import org.junit.Test;
> import org.xml.sax.SAXException;
>
>
> public class preprocessTest {
>
>      @Test
>      public void testProcess() throws ParserConfigurationException,
> SAXException, IOException
>      {
>          //Text to pre-process
>          String plainText = new String("M., 06.67.21.05.41, #, 423 Km, 30 €. 20h14.");
>
>          //Input and output declarations
>          MaryData md = new MaryData (MaryDataType.TOKENS, Locale.GERMAN);
>          MaryData result = new MaryData(MaryDataType.WORDS, Locale.GERMAN);
>          //end declarations
>
>          md.setData(plainText);
>          Preprocess pr = new Preprocess();
>
>          try
>          {
>              pr.process(md);
>          }
>          catch (Exception e)
>          {
>              return;
>          }
>
>          fail("Exception, ain'it?");
>      }
>
>
>
> }
>
>
>
> Basically, what this code should do is to give as input a MaryData to the
> method. The aim is to preprocess the string "plainText", though I don't
> know how to tell this to the MaryData, since it doesn't take Strings as
> arguments... Anyway, the junit test fails and I get this message:
>
> log4j:WARN No appenders could be found for logger (marytts.IO).
> log4j:WARN Please initialize the log4j system properly.
> [Fatal Error] :1:1: Content is not allowed in prolog.
>
> ...which I don't understand!
>
> any ideas?
>
>
> Thanks,
> Florent
>
>
>> Hi Florent,
>>
>> good point, thanks for asking it. Since this is about developing, let's
>> move this discussion to the mary-dev list.
>>
>> First of all, I now think as much testing as possible should be done
>> automatically, and on a continuous basis. This way you can let the
>> machine verify, from now on until the end of time, that what was working
>> once is still working.
>>
>> Conceptually one can distinguish two types of testing:
>>
>> - "unit" testing, which automatically exercises a small piece of code
>> and asserts that, e.g., a method behaves as expected -- reacts to the
>> different kinds of possible input in the expected ways, throws
>> exceptions as promised in the javadoc, etc.
>>
>> - "integration" testing, which automatically verifies whether the
>> processing carried out by a subsystem yields the expected result.
>>
>>
>> My "rule of thumb" test to distinguish one from the other is, do I need
>> to start up the MARY system (Mary.startup()) in order to run the test?
>> If so, I think it is an integration test, otherwise I treat it as a unit
>> test. It's a simplifying approach, but useful.
>>
>> In practice, the difference between the two types may not matter much
>> to you at first; the key issue is to get started writing tests at
>> all.
>>
>> The tool we use in MARY is junit 4. You can find some examples of tests
>> (not many yet, but that is going to change over the next few years I
>> hope) here:
>>
>> - example of a unit test:
>> http://mary.opendfki.de/browser/branches/fr-branch/java/marytts/tests/junit4/ByteStringTranslatorTest.java
>>
>> - example of an integration test:
>> http://mary.opendfki.de/browser/branches/fr-branch/java/marytts/tests/junit4/RequestTest.java
>>
>> You can run all tests using "ant test" from the command line; to run a
>> single test, right-click in Eclipse on the class and select "run as
>> Junit test". If it is an integration test (i.e. needs to start up mary),
>> it will fail until you have provided -Dmary.base=... and probably -Xmx1g
>> or so in the VM arguments of the run target.
>>
>>
>>
>> Now, to test your own code, all you need to do is to instantiate your
>> module, send it data from the JUnit test method, and automatically
>> compare the result with the expected result. I have tried to simplify
>> this step for MaryModules somewhat by providing a base class,
>> marytts.tests.modules.MaryModuleTestCase which you can extend.
>>
>> See java/marytts/tests/junit4/language/de/JTokeniserTest.java for an
>> example (I confess it fails, which should never happen; I will fix this
>> but not now).
>>
>> I hope this can get you started.
>>
>>
>> Best regards,
>> Marc
>>
>>
>>
>> On 27.04.11 16:22, fxavier at ircam.fr wrote:
>>> Hi all,
>>>
>>> I'm trying to build NLP components for French.
>>>
>>> Is there a way to test my .java (preprocessing) without coding all the
>>> NLP components, and of course without going through the new-language
>>> support procedure (which requires all the NLP components to be ready and
>>> takes quite long)?
>>>
>>> By testing, I mean giving a simple text as input, and see the output if
>>> the preprocessing part is good. I would like to test whether my code is
>>> correct or not before going any further.
>>>
>>> Thanks in advance,
>>>
>>>
>>>
>>>
>>
>> --
>> Dr. Marc Schröder, Senior Researcher at DFKI GmbH
>> Project leader for DFKI in SSPNet http://sspnet.eu
>> Team Leader DFKI TTS Group http://mary.dfki.de
>> Editor W3C EmotionML Working Draft http://www.w3.org/TR/emotionml/
>> Portal Editor http://emotion-research.net
>>
>> Homepage: http://www.dfki.de/~schroed
>> Email: marc.schroeder at dfki.de
>> Phone: +49-681-85775-5303
>> Postal address: DFKI GmbH, Campus D3_2, Stuhlsatzenhausweg 3, D-66123
>> Saarbrücken, Germany
>> --
>> Official DFKI coordinates:
>> Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
>> Trippstadter Strasse 122, D-67663 Kaiserslautern, Germany
>> Geschaeftsfuehrung:
>> Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
>> Dr. Walter Olthoff
>> Vorsitzender des Aufsichtsrats: Prof. Dr. h.c. Hans A. Aukes
>> Amtsgericht Kaiserslautern, HRB 2313
>>
>>
>
