[mary-dev] testing a NLP component

Thu Apr 28 08:53:52 CEST 2011

Hi Florent,

Good point, thanks for asking. Since this is about development, let's 
move this discussion to the mary-dev list.

First of all, I now think as much testing as possible should be done 
automatically, and on a continuous basis. This way you can let the 
machine verify, from now on until the end of time, that what was working 
once is still working.

Conceptually one can distinguish two types of testing:

- "unit" testing, which automatically exercises a small piece of code 
and asserts that, e.g., a method behaves as expected -- reacts to the 
different kinds of possible input in the expected ways, throws 
exceptions as promised in the javadoc, etc.

- "integration" testing, which automatically verifies whether the 
processing carried out by a subsystem yields the expected result.

My "rule of thumb" test to distinguish one from the other is, do I need 
to start up the MARY system (Mary.startup()) in order to run the test? 
If so, I think it is an integration test, otherwise I treat it as a unit 
test. It's a simplifying approach, but useful.
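
As an illustration of the "unit" side of that distinction, here is a 
minimal JUnit 4 sketch. TokenCounter is a hypothetical helper invented 
for this example, not a MARY class; the point is only that nothing here 
needs Mary.startup(), and that both normal input and the exception 
promised in the javadoc are checked. It assumes JUnit 4 is on the 
classpath.

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class TokenCounterTest {

    /** Hypothetical code under test; not part of MARY. */
    static final class TokenCounter {
        /**
         * Counts whitespace-separated tokens.
         *
         * @throws IllegalArgumentException if text is null
         */
        static int countTokens(String text) {
            if (text == null) {
                throw new IllegalArgumentException("text must not be null");
            }
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    // Normal input: runs of whitespace count as a single separator.
    @Test
    public void countsWordsSeparatedByWhitespace() {
        assertEquals(3, TokenCounter.countTokens("Bonjour  le monde"));
    }

    // Edge cases: empty and blank strings contain no tokens.
    @Test
    public void emptyOrBlankInputHasNoTokens() {
        assertEquals(0, TokenCounter.countTokens(""));
        assertEquals(0, TokenCounter.countTokens("   "));
    }

    // The exception promised in the javadoc must actually be thrown.
    @Test(expected = IllegalArgumentException.class)
    public void nullInputIsRejected() {
        TokenCounter.countTokens(null);
    }
}
```

By the rule of thumb above, a test like this would become an 
integration test as soon as it had to call Mary.startup() before 
running.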

In practice, the difference between the two types may not matter much 
when you are getting started; the key thing is to start writing tests 
at all.

The tool we use in MARY is JUnit 4. You can find some examples of tests 
(not many yet, but I hope that is going to change over the next few 
years) here:

- example of a unit test:
http://mary.opendfki.de/browser/branches/fr-branch/java/marytts/tests/junit4/ByteStringTranslatorTest.java

- example of an integration test:
http://mary.opendfki.de/browser/branches/fr-branch/java/marytts/tests/junit4/RequestTest.java

You can run all tests using "ant test" from the command line; to run a 
single test in Eclipse, right-click the class and select "Run As > 
JUnit Test". If it is an integration test (i.e. it needs to start up 
MARY), it will fail until you have provided -Dmary.base=... and probably 
-Xmx1g or so in the VM arguments of the run configuration.
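
If an integration test still fails after setting the VM arguments, it 
can help to confirm that they actually reached the JVM. The following 
is a small diagnostic sketch, not part of MARY; the class name 
VmArgumentsCheck is made up for illustration. It uses only standard 
Java calls (System.getProperty and Runtime.maxMemory).

```java
public class VmArgumentsCheck {

    /** Formats the two settings an integration test depends on. */
    static String describe(String maryBase, long maxHeapBytes) {
        long megabytes = maxHeapBytes / (1024L * 1024L);
        String base = (maryBase == null || maryBase.isEmpty())
                ? "(not set)"
                : maryBase;
        return "mary.base=" + base + ", max heap=" + megabytes + " MB";
    }

    public static void main(String[] args) {
        // Reads the value passed via -Dmary.base=... and the heap limit
        // that results from -Xmx.
        System.out.println(describe(
                System.getProperty("mary.base"),
                Runtime.getRuntime().maxMemory()));
    }
}
```

Run it with the same VM arguments as the test. Depending on the JVM, 
the reported heap size may be somewhat lower than the -Xmx value.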

Now, to test your own code, all you need to do is to instantiate your 
module, send it data from the JUnit test method, and automatically 
compare the result with the expected result. I have tried to simplify 
this step for MaryModules somewhat by providing a base class, 
marytts.tests.modules.MaryModuleTestCase, which you can extend.

See java/marytts/tests/junit4/language/de/JTokeniserTest.java for an 
example (I confess it fails, which should never happen; I will fix this 
but not now).
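
The instantiate / feed input / compare pattern can be sketched roughly 
as follows. This is an illustration under stated assumptions: 
FrenchPreprocessor and its process(String) method are hypothetical 
stand-ins for your own preprocessing code, and the sketch deliberately 
does not use MaryModuleTestCase, whose methods are not described here. 
It assumes JUnit 4 is on the classpath.

```java
import static org.junit.Assert.assertEquals;

import java.util.LinkedHashMap;
import java.util.Map;

import org.junit.Before;
import org.junit.Test;

public class FrenchPreprocessorTest {

    /** Hypothetical stand-in for a preprocessing module; not a MARY class. */
    static final class FrenchPreprocessor {
        private final Map<String, String> expansions =
                new LinkedHashMap<String, String>();

        FrenchPreprocessor() {
            expansions.put("M.", "Monsieur");
            expansions.put("Mme", "Madame");
        }

        /** Replaces each known abbreviation token by its expansion. */
        String process(String text) {
            StringBuilder out = new StringBuilder();
            for (String token : text.trim().split("\\s+")) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                String expansion = expansions.get(token);
                out.append(expansion != null ? expansion : token);
            }
            return out.toString();
        }
    }

    private FrenchPreprocessor module;

    // Step 1: instantiate a fresh module before every test method.
    @Before
    public void setUp() {
        module = new FrenchPreprocessor();
    }

    // Steps 2 and 3: send input, compare actual output with the expected one.
    @Test
    public void expandsKnownAbbreviations() {
        assertEquals("Monsieur Dupont et Madame Martin",
                module.process("M. Dupont et Mme Martin"));
    }

    // Text without abbreviations should pass through untouched.
    @Test
    public void leavesOtherTextUnchanged() {
        assertEquals("Bonjour le monde", module.process("Bonjour le monde"));
    }
}
```

A test of this shape exercises only the preprocessing step, so it can 
be written and run before the other NLP components for the language 
exist.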

I hope this can get you started.

Best regards,
Marc

On 27.04.11 16:22, fxavier at ircam.fr wrote:
> Hi all,
>
> I'm trying to build NLP for french.
>
> Is there a way to test my .java (preprocessing) without coding all the
> NLPs, and of course without following the support for new language (that
> requires all the NLP ready and is pretty long)?
>
> By testing, I mean giving a simple text as input, and see the output if
> the preprocessing part is good. I would like to test whether my code is
> correct or not before going any further.
>
> Thanks in advance,
>

-- 
Dr. Marc Schröder, Senior Researcher at DFKI GmbH
Project leader for DFKI in SSPNet http://sspnet.eu
Team Leader DFKI TTS Group http://mary.dfki.de
Editor W3C EmotionML Working Draft http://www.w3.org/TR/emotionml/
Portal Editor http://emotion-research.net

Homepage: http://www.dfki.de/~schroed
Email: marc.schroeder at dfki.de
Phone: +49-681-85775-5303
Postal address: DFKI GmbH, Campus D3_2, Stuhlsatzenhausweg 3, D-66123 
Saarbrücken, Germany