[mary-users] HMMVoiceMakeData - PhoneFeatureLabelAligner

Sathish Chandra Pammi satish.iiit at gmail.com
Wed Aug 3 12:11:42 CEST 2011


Dear Florent,

Have you seen any command-line output showing exactly where the misalignment
happens? I mean a phone-level mismatch.
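
One way to look for such a mismatch outside the voice import tools is to compare
the phone sequence in the allophones XML with the phone column of the lab file.
The following is only a minimal sketch in Python, not part of MaryTTS: the
function names are hypothetical, and pauses ("_") are ignored on purpose because
I make no assumption here about how boundaries map to pause units.

```python
import difflib
import xml.etree.ElementTree as ET

# Namespace taken from the allophones XML quoted in this thread.
MARY_NS = "{http://mary.dfki.de/2002/MaryXML}"


def phones_from_allophones(xml_text):
    """Return the p attribute of every <ph> element, in document order."""
    root = ET.fromstring(xml_text)
    return [ph.get("p") for ph in root.iter(MARY_NS + "ph")]


def phones_from_lab(lab_text):
    """Return the phone column of a lab file (the lines after the '#' marker)."""
    phones = []
    in_body = False
    for line in lab_text.splitlines():
        line = line.strip()
        if not in_body:
            in_body = (line == "#")
            continue
        fields = line.split()
        if len(fields) >= 3:  # end time, unit index, phone
            phones.append(fields[2])
    return phones


def phone_mismatches(xml_phones, lab_phones, pause="_"):
    """List the differences between both sequences, pauses excluded."""
    a = [p for p in xml_phones if p != pause]
    b = [p for p in lab_phones if p != pause]
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

An empty result means the non-pause phones agree, in which case the difference
would have to come from the pause units.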

There is one quick alternative, though I know it is a bad solution:
remove those 8 basenames from the basenames list, so that these utterances
are discarded from the voice-building process.
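
If you want to script that, here is a minimal sketch in Python. It assumes the
basenames list is a plain text file with one basename per line; the function
names and the ".filtered" output suffix are hypothetical. It writes a new file
instead of overwriting the original list.

```python
def drop_basenames(basenames, to_discard):
    """Return basenames in their original order, minus the discarded ones."""
    discard = set(to_discard)
    return [name for name in basenames if name not in discard]


def write_filtered_list(list_path, to_discard):
    """Write a filtered copy of the list next to the original; return its path."""
    with open(list_path, encoding="utf-8") as f:
        names = [line.strip() for line in f if line.strip()]
    kept = drop_basenames(names, to_discard)
    out_path = list_path + ".filtered"  # hypothetical suffix, original untouched
    with open(out_path, "w", encoding="utf-8") as f:
        for name in kept:
            f.write(name + "\n")
    return out_path
```

For example, drop_basenames(["utt_1", "utt_2", "utt_3"], ["utt_2"]) keeps
utt_1 and utt_3.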

Best regards,
Sathish

On Tue, Aug 2, 2011 at 1:54 PM, <fxavier at ircam.fr> wrote:

> Dear all,
>
> I've corrected the French phonemiser, so there is no longer a silence at
> the beginning of the transcription. Unfortunately, I still have the
> 2991/2999 misalignment problem. As an example, here are the allophone XML
> and the lab file for the first sentence:
>
>
>
> <?xml version="1.0" encoding="UTF-8" standalone="no"?><maryxml
> xmlns="http://mary.dfki.de/2002/MaryXML"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="0.4"
> xml:lang="fr">
> <p>
> <s>
> <prosody pitch="+5%" range="+20%">
> <phrase>
> <boundary breakindex="3"/><t ph="i l" pos="[PPER3MS]">
> il
> <syllable ph="i l">
> <ph p="i"/>
> <ph p="l"/>
> </syllable>
> </t>
> <t ph="s a v E" pos="[V3S]">
> savait
> <syllable ph="s a v E">
> <ph p="s"/>
> <ph p="a"/>
> <ph p="v"/>
> <ph p="E"/>
> </syllable>
> </t>
> <t ph="k @" pos="[COSUB]">
> que
> <syllable ph="k @">
> <ph p="k"/>
> <ph p="@"/>
> </syllable>
> </t>
> <t ph="l @" pos="[DETMS]">
> le
> <syllable ph="l @">
> <ph p="l"/>
> <ph p="@"/>
> </syllable>
> </t>
> <t ph="p 9 p l @" pos="[NMS]">
> peuple
> <syllable ph="p 9 p l @">
> <ph p="p"/>
> <ph p="9"/>
> <ph p="p"/>
> <ph p="l"/>
> <ph p="@"/>
> </syllable>
> </t>
> <t ph="l @" pos="[PPOBJMS]">
> le
> <syllable ph="l @">
> <ph p="l"/>
> <ph p="@"/>
> </syllable>
> </t>
> <t ph="s u t @ n E" pos="[V3S]">
> soutenait
> <syllable ph="s u t @ n E">
> <ph p="s"/>
> <ph p="u"/>
> <ph p="t"/>
> <ph p="@"/>
> <ph p="n"/>
> <ph p="E"/>
> </syllable>
> </t>
> <boundary breakindex="3"/><t ph="_" pos=",">
> ,
>
> </t>
>
> </phrase>
> </prosody>
> <prosody pitch="-5%" range="-20%">
> <phrase>
> <t ph="s @" pos="[PDEMFS]">
> ce
> <syllable ph="s @">
> <ph p="s"/>
> <ph p="@"/>
> </syllable>
> </t>
> <t ph="k i" pos="[PRELFS]">
> qui
> <syllable ph="k i">
> <ph p="k"/>
> <ph p="i"/>
> </syllable>
> </t>
> <t ph="l H i" pos="[PPOBJMS]">
> lui
> <syllable ph="l H i">
> <ph p="l"/>
> <ph p="H"/>
> <ph p="i"/>
> </syllable>
> </t>
> <t ph="a" pos="[VA3S]">
> a
> <syllable ph="a">
> <ph p="a"/>
> </syllable>
> </t>
> <t ph="d 0 n e" pos="[VPPMS]">
> donné
> <syllable ph="d 0 n e">
> <ph p="d"/>
> <ph p="0"/>
> <ph p="n"/>
> <ph p="e"/>
> </syllable>
> </t>
> <t ph="k o~ f j a~ s" pos="[NFS]">
> confiance
> <syllable ph="k o~ f j a~ s">
> <ph p="k"/>
> <ph p="o~"/>
> <ph p="f"/>
> <ph p="j"/>
> <ph p="a~"/>
> <ph p="s"/>
> </syllable>
> </t>
> <boundary breakindex="3"/><t ph="_" pos=".">
> .
> <syllable ph="_">
> <ph p="_"/>
> </syllable>
> </t>
>
> </phrase>
> </prosody>
> </s>
> </p>
> </maryxml>
>
>
>
>
> format: end time. unit index. phone
> #
> 1.040000 0 _
> 1.175000 1 i
> 1.190000 2 l
> 1.320000 3 s
> 1.400000 4 a
> 1.435000 5 v
> 1.520000 6 E
> 1.595000 7 k
> 1.610000 8 @
> 1.665000 9 l
> 1.705000 10 @
> 1.800000 11 p
> 1.885000 12 9
> 1.990000 13 p
> 2.115000 14 l
> 2.130000 15 @
> 2.150000 16 l
> 2.165000 17 @
> 2.275000 18 s
> 2.305000 19 u
> 2.400000 20 t
> 2.415000 21 @
> 2.475000 22 n
> 2.570000 23 E
> 2.835000 24 _
> 2.915000 25 s
> 2.940000 26 @
> 3.025000 27 k
> 3.040000 28 i
> 3.055000 29 l
> 3.125000 30 H
> 3.140000 31 i
> 3.205000 32 a
> 3.285000 33 d
> 3.305000 34 0
> 3.380000 35 n
> 3.450000 36 e
> 3.525000 37 k
> 3.585000 38 o~
> 3.650000 39 f
> 3.755000 40 j
> 3.890000 41 a~
> 4.120000 42 s
> 4.720000 43 _
>
>
>
>
> If the acoustics are not the problem, what should I do? Maybe the feature
> extractor is responsible for the misalignment? I've selected all the
> features in the FeaturesSelector. I have no clue what's going on... I
> would like to avoid running the EHMMLabeller again, since it takes 11h to
> process.
>
>
>
> I really need your help,
> Florent
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> > Dear Florent,
> >
> > On 29.07.2011 01:46, fxavier at ircam.fr wrote:
> >> Dear all,
> >>
> >> Again I'm encountering some problems. This time with the
> >> HMMVoiceMakeData.
> >> It says:
> > [...]
> >> FeatureDefinition extracted from context file:
> >> /home/florent/FlorentVoice//phonefeatures/utt_2513.pfeats
> >> The following are other context features used for training Hmms:
> >>    pos_in_syl  f0
> >>    position_type  f1
> >>    prev_syl_break  f2
> >>    syl_break  f3
> >> The previous context features were extracted from file:
> >> /home/florent/FlorentVoice/mary/hmmFeatures.txt
> >> Extracting monophone and context features (1): utt_2513.pfeats and
> >> utt_2513.lab
> >> java.lang.Exception: The component HMMVoiceMakeData produced the following exception:
> >>      at marytts.tools.voiceimport.DatabaseImportMain$8.run(DatabaseImportMain.java:294)
> >> Caused by: java.lang.Exception: Error: Number of context features in: /home/florent/FlorentVoice//phonefeatures/utt_2513.pfeats is not the same as the number of labels in: /home/florent/FlorentVoice//phonelab/utt_2513.lab
> >>      at marytts.tools.voiceimport.HMMVoiceMakeData.extractMonophoneAndFullContextLabels(HMMVoiceMakeData.java:988)
> >>      at marytts.tools.voiceimport.HMMVoiceMakeData.makeLabels(HMMVoiceMakeData.java:733)
> >>      at marytts.tools.voiceimport.HMMVoiceMakeData.compute(HMMVoiceMakeData.java:233)
> >>      at marytts.tools.voiceimport.DatabaseImportMain$8.run(DatabaseImportMain.java:291)
> >>
> >>
> >>
> >>
> >>
> >> The fact is that PhoneLabelFeatureAligner failed. For 2992 out of 3000
> >> sentences there's an alignment problem:
> >
> > Not good. There must be something systematically wrong with the labeling.
> >
> >>
> >>
> >>
> >> (...)
> >>     utt_999 Adding pause unit in labels before unit 1
> >>   Feature file is longer than label file:  unit 41 and greater do not exist in label file
> >> Remaining problems: 2991
> >>      utt_1:  Feature file is longer than label file:  unit 45 and greater do not exist in label file
> >>   ->  Skipping all utterances ! The problems remain.
> >> Removed [0/2999] utterances from the list, [2999] utterances remain, among which [2991/2999] still have problems.
> >>
> >>
> >>
> >>
> >> Is this the reason why the HMMVoiceMakeData fails?
> >
> > More than likely.
> >
> >> What do you think is the
> >> problem, the corpus? Should I record it again?
> >
> > Absolutely not! The acoustics should not be a problem (unless it's
> > unusually bad quality or lots of mistakes from the speaker, of course).
> >
> >> When I recorded it, I tried to
> >> be very natural (schwa elision, mandatory and non-mandatory liaisons, a
> >> fairly fast flow), but the sound quality is very good (mono 16 kHz; I
> >> used AudioConverterGUI to trim the silences and normalize). Maybe the
> >> phonetic transcriptions didn't take all these typical issues into
> >> account exactly as they should, so the EHMMLabeler didn't label it
> >> right...
> >
> > Indeed, the automatic labeling sadly cannot be trusted to produce what
> > you'd expect.
> >
> >> And for every sentence, when checking the RAWMARYXML, there is a
> >> \n at the beginning of the utterance; I don't know why. For example:
> >>
> >>
> >>
> >>
> >> <?xml version="1.0" encoding="UTF-8" ?>
> >> <maryxml version="0.4"
> >> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >> xmlns="http://mary.dfki.de/2002/MaryXML"
> >> xml:lang="fr">
> >> <boundary  breakindex="2" duration="100"/>
> >>
> >> il savait que le peuple le soutenait, ce qui lui a donné confiance.
> >> </maryxml>
> >>
> >
> > Shouldn't make a difference, IMHO. But you could try and see if it does.
> >
> >>
> >>
> >>
> >> I don't know if it's a problem though; when I correct this, it says the
> >> problem still remains. The lab files look pretty OK compared with what I
> >> hear in the wav file:
> >>
> >>
> >>
> >>
> >> format: end time. unit index. phone
> >> #
> >> 1.120000 0 _
> >> 1.120000 1 _
> >
> > That looks a bit suspicious. Normally, zero-length intervals should be
> > deleted. Might be a problem with PhoneLabelFeatureAligner. Did you run
> > the PhoneUnitLabelComputer?
> >
> > Perhaps you could try and see if you can remove those zero-length
> > intervals, and whether it magically works afterwards?
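
A minimal sketch of Ingmar's suggestion, in Python and outside the voice import
tools. It assumes the lab format shown in this thread (header lines up to "#",
then "end time, unit index, phone") and treats an entry as zero-length when its
end time is not greater than the previous end time. The function name is
hypothetical, and renumbering the unit index after the removal is my assumption.

```python
def remove_zero_length_intervals(lab_text):
    """Drop zero-length entries from a lab file and renumber the units."""
    header, entries = [], []
    in_body = False
    for line in lab_text.splitlines():
        if not in_body:
            header.append(line)
            in_body = (line.strip() == "#")
            continue
        fields = line.split()
        if len(fields) >= 3:  # end time, unit index, phone
            entries.append((float(fields[0]), fields[2]))

    kept = []
    prev_end = 0.0  # the first interval is assumed to start at time 0
    for end, phone in entries:
        if end > prev_end:  # keep only intervals with a positive duration
            kept.append((end, phone))
            prev_end = end

    body = ["%f %d %s" % (end, i, phone) for i, (end, phone) in enumerate(kept)]
    return "\n".join(header + body) + "\n"
```

On the two leading pause lines quoted here (both ending at 1.120000), this keeps
the first and drops the second.
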
> >
> >> 1.175000 2 i
> >> 1.190000 3 l
> >> 1.325000 4 s
> >> 1.400000 5 a
> >> 1.435000 6 v
> >> 1.520000 7 E
> >> 1.600000 8 k
> >> 1.615000 9 @
> >> 1.670000 10 l
> >> 1.705000 11 @
> >> 1.800000 12 p
> >> 1.885000 13 9
> >> 1.995000 14 p
> >> 2.010000 15 l
> >> 2.030000 16 @
> >> 2.130000 17 l
> >> 2.155000 18 @
> >> 2.230000 19 s
> >> 2.305000 20 u
> >> 2.400000 21 t
> >> 2.415000 22 @
> >> 2.475000 23 n
> >> 2.575000 24 E
> >> 2.835000 25 _
> >> 2.910000 26 s
> >> 2.935000 27 @
> >> 3.025000 28 k
> >> 3.040000 29 i
> >> 3.055000 30 l
> >> 3.150000 31 H
> >> 3.165000 32 i
> >> 3.210000 33 a
> >> 3.285000 34 d
> >> 3.305000 35 0
> >> 3.370000 36 n
> >> 3.450000 37 e
> >> 3.525000 38 k
> >> 3.585000 39 o~
> >> 3.655000 40 f
> >> 3.755000 41 j
> >> 3.885000 42 a~
> >> 4.110000 43 s
> >> 4.720000 44 _
> >>
> >>
> >>
> >>
> >>
> >> Is there a way to fix this without having to record the 3000-sentence
> >> corpus again? I think the quality won't be as good as it should be, but I
> >> need a result in the next few days; I would obviously record it again later.
> >
> > As I said, the problem is unlikely to be caused by the recordings,
> > unless the audio or speech quality is exceptionally bad. It's the
> > labels, and the features extracted using them.
> >
> > Best wishes,
> >
> > -Ingmar
> >
> >>
> >> Thanks in advance,
> >>
> >>
> >>
> >>
> >> Florent
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Mary-users mailing list
> >> Mary-users at dfki.de
> >> http://www.dfki.de/mailman/cgi-bin/listinfo/mary-users
> >
> > --
> > Ingmar Steiner
> > Postdoctoral Researcher
> >
> > LORIA Speech Group, Nancy, France
> > National Institute for Research in
> > Computer Science and Control (INRIA)
> >
>
>



-- 
------------------------
Sathish Chandra Pammi, Researcher
Web: http://www.dfki.de/~chandra/
DFKI GmbH,
Saarbrücken, Germany
Tel: +49-17624869114