[mary-dev] TDPSOLA

Sat Nov 24 10:09:34 CET 2012

Dear Ingmar,

I am still stuck... you said that PSOLA is not used for concatenation.

I read through a Blizzard paper which states:

"
a)
However serious differences between selected units and duration
model sometimes occurs. To handle this we used time-scale modification
algorithm as a part of USLTM. This method works in time
domain, in pitch synchronous way and modifies speech without
any contaminations.
b)
Selected and modified units are then concatenated in time domain
in pitch synchronous way. Overlap and Add (OLA) method is
used.
"

If I summarize the above statement, it reads:

a) First we stretch/shrink units to make sure they have the "correct" duration
b) Then TDPSOLA is used for concatenating the units. 
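
Step a) of this summary can be pictured with a small sketch. The quoted passage does not describe the actual time-scale modification algorithm, so this is only an illustration under a strong assumption: the unit is already cut into pitch periods, and duration is changed by repeating or dropping whole periods. The function name `stretch_period_indices` and its `factor` parameter are hypothetical.

```python
def stretch_period_indices(n_periods, factor):
    """Pick which source pitch periods to use for a time-scaled unit.

    factor > 1 lengthens the unit (some periods are repeated),
    factor < 1 shortens it (some periods are dropped).
    This is a simplification: a real pitch-synchronous method would
    also window and overlap-add the selected periods.
    """
    if n_periods <= 0 or factor <= 0:
        raise ValueError("n_periods and factor must be positive")
    n_out = max(1, round(n_periods * factor))
    # Map every output period back to a source period index.
    return [min(n_periods - 1, int(i / factor)) for i in range(n_out)]


# A unit of 4 pitch periods stretched to 1.5 times its duration:
print(stretch_period_indices(4, 1.5))  # [0, 0, 1, 2, 2, 3]
```

Because whole periods are repeated or dropped, the local pitch stays the same and only the duration changes.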

But b) would contradict what you said about PSOLA. You said it was used for cosmetics AFTER concatenation.

Jerome

> Date: Wed, 3 Oct 2012 17:21:43 +0100
> From: ingmar.steiner at ucd.ie
> To: jerome.perri at hotmail.com
> CC: mary-users at dfki.de
> Subject: Re: [mary-users] [mary-dev] Labelling gaps
> 
> Dear Jerome,
> 
> On 03/10/2012 17:04, Jerome Perri wrote:
> > Dear Ingmar,
> >
> > is PSOLA not used to provide a smoother joining between units during
> > concatenation?
> 
> No.
> 
> >
> > You talk about it as if it was only for cosmetics in the end.
> 
> Yes, and even then only if explicitly requested.
> 
> >
> > I thought it was used to compensate for F0 jumps or for unfitting
> > durations during unit concatenation, not to force a great prosody.
> 
> With ideal voice data, the unit-selection algorithm will be able to find 
> the perfect units. No modification needed. =)
> 
> Best wishes,
> 
> -Ingmar
> 
> > Sorry for the newbie question, this time for real.
> >
> > Greetings,
> > Jerome
> >
> >
> >  > Date: Wed, 3 Oct 2012 16:12:44 +0100
> >  > From: ingmar.steiner at ucd.ie
> >  > To: jerome.perri at hotmail.com
> >  > CC: mary-users at dfki.de
> >  > Subject: Re: [mary-users] [mary-dev] Labelling gaps
> >  >
> >  > Dear Jerome,
> >  >
> >  > I suspect you may be conflating two distinct concepts here.
> >  >
> >  > 1) diphone concatenation with overlap: the units are joined using one or
> >  > more frames which overlap in the output. This can reduce discontinuities
> >  > at unit joins, and in Mary, this is done pitch-synchronously.
> >  >
> >  > 2) prosody modification with PSOLA: a source signal is modified by
> >  > adding or subtracting pitch periods, and compressing or expanding them,
> >  > to match a target prosody (this is a really simplified description!).
> >  > This can be done in Mary as an optional processing step after
> >  > unit-selection, but this degrades the quality of the signal and should
> >  > be considered an experimental feature.
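
Concept 1) above, joining units with an overlap, can be sketched as a simple cross-fade. This is not Mary's code: the function name `overlap_add_join`, the linear weights, and the assumption that `overlap` equals one pitch period in samples are all illustrative choices.

```python
def overlap_add_join(left, right, overlap):
    """Join two lists of samples, cross-fading the last `overlap` samples
    of `left` with the first `overlap` samples of `right`."""
    if overlap < 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must fit inside both units")
    if overlap == 0:
        return list(left) + list(right)
    head = list(left[:len(left) - overlap])
    tail = list(right[overlap:])
    mixed = []
    for i in range(overlap):
        # Weight moves gradually from the left unit to the right unit.
        w = (i + 1) / (overlap + 1)
        mixed.append((1 - w) * left[len(left) - overlap + i] + w * right[i])
    return head + mixed + tail


# Two toy "units" with different levels, overlapped by 3 samples:
print(overlap_add_join([0.0] * 4, [1.0] * 4, 3))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Note that nothing here changes f0 or duration targets; the overlap only smooths the transition at the join, which is the distinction drawn from concept 2).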
> >  >
> >  > Note that in your example, the durations of the pitchmarks do not match
> >  > the pitch-synchronous processing in Mary (e.g., a unit with 200Hz f0
> >  > would have pitchmarks exactly 5ms apart).
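
The relation behind this remark is that the spacing between pitchmarks is the pitch period, T = 1 / f0. A minimal check, with hypothetical helper names `pitch_period_ms` and `matches_f0` and an arbitrary tolerance, could look like this:

```python
def pitch_period_ms(f0_hz):
    """Return the pitch period in milliseconds for a given f0 in Hz."""
    if f0_hz <= 0:
        raise ValueError("f0 must be positive")
    return 1000.0 / f0_hz


def matches_f0(f0_hz, duration_ms, tol_ms=0.5):
    """True if a pitchmark spacing is consistent with the stated f0.
    The tolerance is an arbitrary value chosen for illustration."""
    return abs(pitch_period_ms(f0_hz) - duration_ms) <= tol_ms


print(pitch_period_ms(200.0))   # 5.0
print(matches_f0(200.0, 10.0))  # False: 200 Hz implies 5 ms, not 10 ms
```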
> >  >
> >  > Best wishes,
> >  >
> >  > -Ingmar
> >  >
> >  > On 03/10/2012 15:44, Jerome Perri wrote:
> >  > > Thank you.
> >  > >
> >  > > May I ask a real newbie question here anyway before I dig into this
> >  > > topic in the praat ng?
> >  > >
> >  > > At first I thought that TD-PSOLA would work the following way:
> >  > > I know (via a model) which duration and pitch I need for 2 diphones,
> >  > > and I realise this through TD-PSOLA.
> >  > >
> >  > > But now I think that TD-PSOLA works like this ->
> >  > >
> >  > > I have 2 discontiguous diphone units and I want to join them.
> >  > > Let's say Mary/Praat found 3 pitchmarks for the right half of diphone A
> >  > > and 3 pitchmarks for the left half of diphone B.
> >  > > TD-PSOLA will put the audio bytes of the 3 pitchmarks over each other
> >  > > and will manipulate the duration and pitch of all audio bytes in such a
> >  > > way that both will be changed to the average of both.
> >  > >
> >  > > For example:
> >  > >
> >  > > t_R + a:_L a:_R + b_L (from file 1)
> >  > > a:_R + e_L e_R + k_R (from file 2)
> >  > >
> >  > > I want to have "t_R + a:_L a:_R + e_L e_R + k_R"
> >  > >
> >  > > The joint will be between " a:_R + b_L" and "a:_R + e_L"
> >  > > Let's say "a:_R" from file 1 has the pitchmarks
> >  > > 1) f0 = 200, duration: 10 ms
> >  > > 2) f0 = 210, duration: 5 ms
> >  > > 3) f0 = 220, duration 20 ms
> >  > >
> >  > > Let's say "a:_R" from file 2 has the pitchmarks
> >  > > 1) f0 = 150, duration: 5 ms
> >  > > 2) f0 = 160, duration: 15 ms
> >  > > 3) f0 = 170, duration 10 ms
> >  > >
> >  > > TDPSOLA will modify the pitchmarks in the following way:
> >  > >
> >  > > 1) New f0 for both = (200+150)/2 = 175, new duration for both = (10+5)/2
> >  > > = 7.5 ms
> >  > > 2) New f0 for both = (210+160)/2 = 185, new duration for both = (5+15)/2
> >  > > = 10 ms
> >  > > 3) New f0 for both = (220+170)/2 = 195, new duration for both = (20+10)/2
> >  > > = 15 ms
> >  > >
> >  > > Is this basically the way it works?
> >  > > A reply would help me so much!
> >  > >
> >  > > Thank you!
> >  > >
> >  > > Jerome
> >  > >
> >  > > > Date: Wed, 3 Oct 2012 11:15:08 +0100
> >  > > > From: ingmar.steiner at ucd.ie
> >  > > > To: jerome.perri at hotmail.com
> >  > > > CC: mary-users at dfki.de
> >  > > > Subject: Re: [mary-users] [mary-dev] Labelling gaps
> >  > > >
> >  > > > Dear Jerome,
> >  > > >
> >  > > > Mary contains an FD implementation of PSOLA, not TD. For what it's
> >  > > > worth, Praat features a TD-PSOLA implementation, and the Praat user
> >  > > > list (http://groups.yahoo.com/group/praat-users) might provide some
> >  > > > accessible insight on its use. And don't worry, there are a lot of
> >  > > > "newbies" on that list. =)
> >  > > >
> >  > > > Best wishes,
> >  > > >
> >  > > > -Ingmar
> >  > > >
> >  > > > On 03/10/2012 07:58, Jerome Perri wrote:
> >  > > > > Thank you for confirming this!
> >  > > > >
> >  > > > > I have another question, please:
> >  > > > >
> >  > > > > Can anyone tell me a good place for discussing TDPSOLA?
> >  > > > > I would very much like to experiment with it, but the example in
> >  > > > > Mary is - as I understood it - just a starting point.
> >  > > > >
> >  > > > > I would like to be able to ask real newbie questions without being
> >  > > > > punished or ignored because my questions are just too newbie.
> >  > > > >
> >  > > > > Thank you for any hints.
> >  > > > >
> >  > > > > Jerome
> >  > > > >
> >  > > > > > Date: Tue, 2 Oct 2012 15:45:55 +0100
> >  > > > > > From: ingmar.steiner at ucd.ie
> >  > > > > > To: jerome.perri at hotmail.com
> >  > > > > > CC: bizpole at hotmail.ca; mary-users at dfki.de
> >  > > > > > Subject: Re: [mary-users] [mary-dev] Labelling gaps
> >  > > > > >
> >  > > > > > Dear Jerome and Asif,
> >  > > > > >
> >  > > > > > the "pause" under discussion does indeed occur between syllables,
> >  > > > > > but not all syllables exhibit this. In fact, what you observe is
> >  > > > > > the occlusion phase of the [t], characteristic for all plosive
> >  > > > > > phonemes (or "stops"), during which intraoral pressure builds up.
> >  > > > > > It is followed by the burst (or release) phase, and the two phases
> >  > > > > > comprise the [t] as it is spoken in e.g., English.
> >  > > > > >
> >  > > > > > Your conclusion to label the occlusion phase as part of the [t] is
> >  > > > > > correct.
> >  > > > > >
> >  > > > > > Best wishes,
> >  > > > > >
> >  > > > > > -Ingmar
> >  > > > > >
> >  > > > > > On 01/10/2012 11:44, Jerome Perri wrote:
> >  > > > > > > Hi Asif,
> >  > > > > > >
> >  > > > > > > that is a very good explanation, I agree with it.
> >  > > > > > > Thank you.
> >  > > > > > >
> >  > > > > > > Jerome
> >  > > > > > >
> >  > > > > > >
> >  > > > > > > ------------------------------------------------------------------------
> >  > > > > > > From: bizpole at hotmail.ca
> >  > > > > > > To: jerome.perri at hotmail.com; ingmar.steiner at ucd.ie;
> >  > > > > > > mary-dev at dfki.de; mary-users at dfki.de
> >  > > > > > > Subject: Re: [mary-users] [mary-dev] Labelling gaps
> >  > > > > > > Date: Sun, 30 Sep 2012 09:22:39 -0400
> >  > > > > > >
> >  > > > > > > Hello Jerome,
> >  > > > > > > I am not an expert, but I will try to explain it by logic.
> >  > > > > > > I will assume it is a syllable pause (phonetic pause).
> >  > > > > > > The gap or pause between _HO_ and _TEL_ is due to the syllables
> >  > > > > > > of this word. _HO_ is the 1st block and _TEL_ is the 2nd block
> >  > > > > > > of this disyllabic word. _HO_ is said completely first, as one
> >  > > > > > > block, and a pause comes before _TEL_ to complete the word.
> >  > > > > > > _TEL_ causes the gap between the two blocks, so in my opinion
> >  > > > > > > the gap/pause should be labeled as a part of T.
> >  > > > > > > Thanks
> >  > > > > > > Asif Mir
> >  > > > > > > From: Jerome Perri <jerome.perri at hotmail.com>
> >  > > > > > > Sent: Friday, September 28, 2012 6:49 AM
> >  > > > > > > To: ingmar.steiner at ucd.ie; mary-dev at dfki.de;
> >  > > > > > > mary-users at dfki.de
> >  > > > > > > Subject: [mary-users] [mary-dev] Labelling gaps
> >  > > > > > >
> >  > > > > > > Hello!
> >  > > > > > >
> >  > > > > > > I would like to ask what the rules for labelling are for cases
> >  > > > > > > where there is a gap.
> >  > > > > > >
> >  > > > > > > For example in a word like "hotel":
> >  > > > > > > It is likely that the speaker made a small pause between the "o"
> >  > > > > > > and the "t".
> >  > > > > > > Should the gap/pause be labelled as a part of the "o" or as a
> >  > > > > > > part of "t"?
> >  > > > > > >
> >  > > > > > > Thank you!
> >  > > > > > >
> >  > > > > > > Jerome
> >  > > > > > >
> >  > > > > > >
> >  > > > > > > ------------------------------------------------------------------------
> >  > > > > > >
> >  > > > > > > _______________________________________________
> >  > > > > > > Mary-users mailing list
> >  > > > > > > Mary-users at dfki.de <mailto:Mary-users at dfki.de>
> >  > > > > > > http://www.dfki.de/mailman/cgi-bin/listinfo/mary-users
> >  > > > > >
> >  > > > > > --
> >  > > > > > Ingmar Steiner
> >  > > > > > Postdoctoral Research Fellow
> >  > > > > > Centre for Next Generation Localisation
> >  > > > > >
> >  > > > > > Multilingual Ubiquitous Speech Technology (MUSTER)
> >  > > > > > Computer Science and Informatics
> >  > > > > > University College Dublin
> >  > > > > >
> >  > > > > > Speech Communication Laboratory
> >  > > > > > Centre for Language and Communication Studies
> >  > > > > > Trinity College Dublin
