[mary-users] Voice-Import F0PolynomialFeatureFileWriter Errors

Ingmar Steiner ingmar.steiner at dfki.de
Mon Oct 25 11:40:58 CEST 2010


Dear Margot,

thank you for the lab files! Overall, I can see a number of things wrong with your files; in no particular order:

* your text files contain non-breaking spaces. These would have caused issues with Mary versions earlier than r2595 (pre-4.1.0).

* your label files have significant problems:

  - they contain numerous labels not in the "de" allophone set. It is as if you had created them with some autolabeler which mapped phones like "n" to "n1" to avoid filesystem issues with case folding (i.e. file "n" would have clashed with file "N"). Additionally, the phone sequence "p f" should be collapsed into "pf" for Mary.

  - rec_065.lab and rec_138.lab seem to correspond to completely different prompts than rec_065.txt and rec_138.txt, respectively.

And most significantly:

  - in all of your lab files, large portions of the label timings are completely wrong! =(
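
The simpler points in the list above (non-breaking spaces, labels outside the allophone set, and the "p f" sequence) can be checked mechanically. What follows is only a rough sketch: the function names are made up for illustration, ALLOWED is a toy subset and not the real "de" allophone set, and the labels are assumed to have been extracted from a lab file already.

```python
NBSP = "\u00a0"  # Unicode NO-BREAK SPACE


def count_nbsp(text):
    """Count the non-breaking spaces in a prompt text."""
    return text.count(NBSP)


def unknown_labels(labels, allowed):
    """Return the labels that are not in the allowed set, sorted."""
    return sorted(set(labels) - set(allowed))


def collapse_pf(labels):
    """Merge every adjacent pair "p", "f" into the single label "pf"."""
    merged = []
    for label in labels:
        if label == "f" and merged and merged[-1] == "p":
            merged[-1] = "pf"
        else:
            merged.append(label)
    return merged


# Toy data; ALLOWED is a hypothetical subset, NOT the real "de" allophone set.
ALLOWED = {"_", "a", "f", "n", "N", "p", "pf", "t"}
labels = ["_", "p", "f", "a", "n1", "t", "_"]

print(count_nbsp("Guten\u00a0Tag"))     # number of non-breaking spaces
print(unknown_labels(labels, ALLOWED))  # labels that need attention
print(collapse_pf(labels))              # label sequence with "p f" merged
```

Note that collapse_pf merges every "p f" pair blindly and only touches the label sequence; in a real lab file the merged segment would also have to keep the end time of the "f".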

If I try to build a voice with your data, I can indeed reproduce your issue with the NPE in F0PolynomialFeatureFileWriter#getInterpolatedLogF0Contour. But the cause of this is much further upstream, in that the timing of your labels is 75% (arbitrary value) garbage.

I'm afraid we don't have any automatic warnings for this kind of fundamental problem in place, so for now, it is entirely the voice-builder's responsibility to inspect and ensure the correctness of all label files before proceeding.
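
Until such warnings exist, a small script can at least catch the grossest timing errors before a voice build. The following is a minimal sketch, not part of Mary: it assumes an xwaves-style lab file (an optional header terminated by a line containing only "#", then lines of the form "<end time> <number> <label>"), and the function names and the 5 ms threshold are arbitrary choices for illustration.

```python
def parse_lab(text):
    """Parse label lines of the ASSUMED form "<end time> <number> <label>".

    Everything up to and including a line consisting of "#" is skipped
    as header. Returns a list of (end_time, label) tuples.
    """
    lines = [line.strip() for line in text.splitlines()]
    if "#" in lines:
        lines = lines[lines.index("#") + 1:]
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            entries.append((float(fields[0]), fields[2]))
    return entries


def timing_problems(entries, min_dur=0.005):
    """Describe segments whose end time does not increase or that are very short."""
    problems = []
    previous_end = 0.0
    for index, (end, label) in enumerate(entries):
        duration = end - previous_end
        if duration <= 0:
            problems.append(f"entry {index} ({label}): end time {end} does not increase")
        elif duration < min_dur:
            problems.append(f"entry {index} ({label}): only {duration:.3f} s long")
        previous_end = max(previous_end, end)
    return problems


# Invented example data, not taken from any real recording.
SAMPLE = """separator ;
nfields 1
#
0.100 125 _
0.250 125 g
0.240 125 u:
0.252 125 t
0.600 125 _
"""

for problem in timing_problems(parse_lab(SAMPLE)):
    print(problem)
```

A check like this only finds structurally broken timings. Labels that are monotonic but simply misaligned with the audio will pass, so listening to and looking at a sample of the files in a label viewer remains necessary.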

Just out of curiosity, how did you obtain these lab files?

Best wishes,

/**
 * Ingmar Steiner
 * Researcher, Language Technology
 * German Research Center for Artificial Intelligence
 *
 * Campus D3 1 +1.18
 * D-66123 Saarbrücken
 * Germany
 * Phone: ++49-681-857-75-5263 (NEW!)
 * Email: ingmar.steiner at dfki.de
 *
 * Deutsches Forschungszentrum für Künstliche Intelligenz GmbH
 * Trippstadter Straße 122, D-67663 Kaiserslautern, Germany
 * Geschäftsführung:
 * Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
 * Dr. Walter Olthoff
 * Vorsitzender des Aufsichtsrats:
 * Prof. Dr. h.c. Hans A. Aukes
 * Amtsgericht Kaiserslautern, HRB 2313
 */

On 22 Oct 2010, at 16:35, Margot Mieskes wrote:

> Dear Ingmar,
> 
> please find attached the labfiles for the files I already sent to you. 
> 
> I did not have time to check the encoding issue yet, but I will do so as soon as possible. 
> 
> Please be aware that I will not be in the office next week, so I will most likely not be able to respond to any messages. I will be back on 2 November.
> 
> Greetings,
> Margot.
> -------- Original Message --------
>> Date: Wed, 20 Oct 2010 14:33:04 +0200
>> From: Ingmar Steiner <ingmar.steiner at dfki.de>
>> To: Margot Mieskes <Margot.Mieskes at gmx.net>
>> Subject: Re: [mary-users] Voice-Import F0PolynomialFeatureFileWriter Errors
> 
>> Dear Margot,
>> 
>> thank you for supplying some of your data. I've identified one bug, which
>> may be responsible for some or all of your problems:
>> 
>> http://mary.opendfki.de/ticket/349
>> 
>> By recoding your text files from cp1252 to utf8, umlauts and suchlike are
>> properly handled; previously, the prompt_allophones/*.xml were created with
>> missing segments (and malformed syllables).
>> 
>> Could you please check whether this might have occurred in your voice
>> build as well?
>> 
>> Apart from that, I will need your lab/*.lab files to do further testing...
>> Could you please send me those as well?
>> 
>> Thanks, and best wishes,
>> 
>> On 19 Oct 2010, at 16:09, Margot Mieskes wrote:
>> 
>>> Dear Ingmar,
>>> 
>>> sorry for not reacting earlier, but I had been busy these past few days.
>>> 
>>> I'll attach two archives to this e-Mail. FilesOK contains files that work ok in the voice import. FilesNok contains files that did not work in the voice import. Maybe you can find any difference in them or any other reason why the F0-step did not work with them. Please do not distribute them or do anything else - I don't know the exact legal phrase for this issue, but as far as I know I'm not allowed to give these files away.
>>> I found out about the problematic files by removing them from the basenames.lst files and adding them one by one. If the step failed, I removed that file again - as mentioned in my previous message.
>>> 
>>> Concerning the Mary version I'm using: I downloaded Mary 4.1.1 and only used that for starting voiceimport, mary, component installer etc. I used the link provided here:
>>> 
>>> http://www.dfki.de/pipermail/mary-users/2010-September/000576.html
>>> 
>>> Btw. I managed to solve the sample rate problem, the way you suggested to do.
>>> 
>>> I hope this will help you to reproduce the problems I have been encountering and if there are more things I can help you with, just let me know.
>>> 
>>> Greetings,
>>> Margot.
>>> 
>>> -------- Original Message --------
>>>> Date: Fri, 8 Oct 2010 11:16:37 +0200
>>>> From: Ingmar Steiner <ingmar.steiner at dfki.de>
>>>> To: Margot Mieskes <Margot.Mieskes at gmx.net>
>>>> CC: mary-users <mary-users at dfki.de>
>>>> Subject: Re: [mary-users] Voice-Import F0PolynomialFeatureFileWriter Errors
>>> 
>>>> Dear Margot,
>>>> 
>>>> On 8 Oct 2010, at 10:46, Margot Mieskes wrote:
>>>> 
>>>>> Hi Ingmar,
>>>>> 
>>>>> first of all, I managed to eliminate the faulty files and successfully continued with the VoiceImport procedure.
>>>> 
>>>> Glad you were able to work around the previous problem. If you've identified the utterance that triggered the issue, would it be possible to provide it to us, so that we can reproduce the errors and ensure that our bugfix solves them?
>>>> 
>>>>> 
>>>>> When I tried to run the JoinCostFileMaker I get the following error message:
>>>>> 
>>>>> Cannot currently deal with different sample rates in unit and mcep files.
>>>>> 
>>>>> So I checked the files which are compared (MCEP-timeline and halfphone-features) and both had been created using the reduced dataset. So I'm a bit confused that there should be different sample rates as they had been produced in one "session".
>>>> 
>>>> This seems to be a bug in MCepTimelineMaker; that component does not honor the sampling rate set in the database.config and instead takes it directly from the first wav file. Before I fix this, could you please confirm whether your wav files are indeed sampled at a different rate than the value of the "db.samplingrate" property in your database.config (which should be 16000)?
>>>> 
>>>>> 
>>>>> Nevertheless, I continued with the procedure - I just tried to find out about subsequent errors.
>>>>> 
>>>>> At the end I get the message to run the voice-component installer to install the voice, which I did. At the top of the GUI of the voice-component installer it says
>>>>> 
>>>>> Download languages and voices from http://mary.dfki.de/download/4.1.1/mary-components.xml
>>>>> 
>>>>> and also lists my voice. But it has not been uploaded in any way to your servers, right?
>>>> 
>>>> No, the VoicePackager simply creates the zip file containing all of the voice data files, and the component description file, in your MARYBASE/download directory. The InstallerGUI does not distinguish between components that were downloaded from our servers and others that were put into the download directory through other means (e.g. VoicePackager, manual copying).
>>>> 
>>>>> 
>>>>> So when I start the maryserver (I'm fully aware that the JoinCostFeatures are not written), I get the following error message:
>>>>> 
>>>>> Dependency problem - Component 'marybase' is required by 'voice'm, but version number 4.1.1 is required, and component 'marybase' provides version 4.1.0. Try running the MARY component installer to resolve this problem.
>>>>> 
>>>>> But actually the de language component is version 4.1.1 - at least the details in the voice component installer says so.
>>>>> 
>>>>> So I am somewhat confused by the various errors. I hope you can help me. If I'd better address this issue to the list, just let me know.
>>>> 
>>>> This is an unfortunate source of user frustration. Please make sure that you are running the same Mary version that you used to build the voice. (Note that the "db.marybaseversion" property in database.config is *not* automatically updated if you switch Mary versions.) You can also "hack" the dependency by editing the voice config file, please see http://www.dfki.de/pipermail/mary-users/2010-September/000621.html for details.
>>>> 
>>>> Best wishes,
>>>> 
>>>>> 
>>>>> Many thanks and sorry for bothering you personally,
>>>>> Margot.
>>>> 
>>>> 
>>>> 
>>> 
>>> <FilesOK.tar.gz><FilesNok.tar.gz>
>> 
> 
> <labFiles.tar.gz>


