[mary-dev] TDPSOLA

Jerome Perri jerome.perri at hotmail.com
Sat Nov 24 23:05:29 CET 2012


It had nothing to do with Mary; it was a general question, because I do not yet fully understand TDPSOLA.

Jerome

> Date: Sat, 24 Nov 2012 20:08:06 +0000
> From: ingmar.steiner at ucd.ie
> To: jerome.perri at hotmail.com
> CC: mary-dev at dfki.de
> Subject: Re: TDPSOLA
> 
> Dear Jerome,
> 
> you do not say which paper you refer to, but I'm quite sure it has 
> nothing to do with MARY. In previous correspondence, you asked me about 
> MARY, and I explained these details with respect to MARY.
> 
> Best wishes,
> 
> -Ingmar
> 
> On 11/24/12 09:09, Jerome Perri wrote:
> > Dear Ingmar,
> >
> > I am still stuck... you said that PSOLA is not used for concatenation.
> >
> > I read through a Blizzard paper which states:
> >
> > "
> > a)
> > However serious differences between selected units and duration
> > model sometimes occurs. To handle this we used time-scale modification
> > algorithm as a part of USLTM. This method works in time
> > domain, in pitch synchronous way and modifies speech without
> > any contaminations.
> > b)
> > Selected and modified units are then concatenated in time domain
> > in pitch synchronous way. Overlap and Add (OLA) method is
> > used.
> > "
> >
> > If I summarise the above statement, it reads:
> >
> > a) First we stretch/shrink units to make sure they have the "correct"
> > duration
> > b) Then TDPSOLA is used for concatenating the units.
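
Step a) could look roughly like the following minimal sketch. It is not the method of the quoted paper (USLTM is not described here); it only shows the period-selection idea behind pitch-synchronous time-scale modification, where whole pitch periods are repeated to lengthen a unit or skipped to shorten it. The windowed overlap-add of the selected periods is left out, and the name `select_periods` is hypothetical.

```python
def select_periods(n_periods: int, factor: float) -> list[int]:
    """Pick source pitch-period indices for a unit whose duration is scaled by `factor`.

    factor > 1 repeats some periods (longer unit),
    factor < 1 skips some periods (shorter unit).
    Illustrative only: a real TD-PSOLA implementation would also window
    and overlap-add the selected periods.
    """
    if n_periods <= 0 or factor <= 0:
        raise ValueError("n_periods and factor must be positive")
    n_out = max(1, round(n_periods * factor))
    # Map each output period back to the source period it should copy.
    return [min(n_periods - 1, int(j / factor)) for j in range(n_out)]


# A unit with 4 pitch periods, stretched to 150 % and shrunk to 50 %.
print(select_periods(4, 1.5))
print(select_periods(4, 0.5))
```

Because only whole periods are repeated or dropped, the f0 of each period stays as it was; only the duration of the unit changes.
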
> >
> > But b) would contradict what you said about PSOLA. You said it was used
> > for cosmetics AFTER concatenation.
> >
> > Jerome
> >
> >
> >  > Date: Wed, 3 Oct 2012 17:21:43 +0100
> >  > From: ingmar.steiner at ucd.ie
> >  > To: jerome.perri at hotmail.com
> >  > CC: mary-users at dfki.de
> >  > Subject: Re: [mary-users] [mary-dev] Labelling gaps
> >  >
> >  > Dear Jerome,
> >  >
> >  > On 03/10/2012 17:04, Jerome Perri wrote:
> >  > > Dear Ingmar,
> >  > >
> >  > > is PSOLA not used to provide a smoother joining between units during
> >  > > concatenation?
> >  >
> >  > No.
> >  >
> >  > >
> >  > > You talk about it as if it was only for cosmetics in the end.
> >  >
> >  > Yes, and even then only if explicitly requested.
> >  >
> >  > >
> >  > > I thought it was used to compensate for F0 jumps or for unfitting
> >  > > durations during unit concatenation, not to force a great prosody.
> >  >
> >  > With ideal voice data, the unit-selection algorithm will be able to find
> >  > the perfect units. No modification needed. =)
> >  >
> >  > Best wishes,
> >  >
> >  > -Ingmar
> >  >
> >  > > Sorry for the newbie question, this time for real.
> >  > >
> >  > > Greetings,
> >  > > Jerome
> >  > >
> >  > >
> >  > > > Date: Wed, 3 Oct 2012 16:12:44 +0100
> >  > > > From: ingmar.steiner at ucd.ie
> >  > > > To: jerome.perri at hotmail.com
> >  > > > CC: mary-users at dfki.de
> >  > > > Subject: Re: [mary-users] [mary-dev] Labelling gaps
> >  > > >
> >  > > > Dear Jerome,
> >  > > >
> >  > > > I suspect you may be conflating two distinct concepts here.
> >  > > >
> >  > > > 1) diphone concatenation with overlap: the units are joined using one or
> >  > > > more frames which overlap in the output. This can reduce discontinuities
> >  > > > at unit joins, and in Mary, this is done pitch-synchronously.
> >  > > >
> >  > > > 2) prosody modification with PSOLA: a source signal is modified by
> >  > > > adding or subtracting pitch periods, and compressing or expanding them,
> >  > > > to match a target prosody (this is a really simplified description!).
> >  > > > This can be done in Mary as an optional processing step after
> >  > > > unit-selection, but this degrades the quality of the signal and should
> >  > > > be considered an experimental feature.
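
The overlap join in point 1) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Mary's actual code: the linear cross-fade, the choice of one pitch period as the overlap length, and the names `period_samples` and `overlap_join` are all hypothetical.

```python
def period_samples(sample_rate: int, f0_hz: float) -> int:
    """Length of one pitch period in samples (assumed overlap length)."""
    if sample_rate <= 0 or f0_hz <= 0:
        raise ValueError("sample_rate and f0 must be positive")
    return round(sample_rate / f0_hz)


def overlap_join(left: list[float], right: list[float], overlap: int) -> list[float]:
    """Join two units, cross-fading the last `overlap` samples of `left`
    with the first `overlap` samples of `right` using linear weights."""
    if overlap < 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must fit inside both units")
    if overlap == 0:
        return list(left) + list(right)
    start = len(left) - overlap
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of the right unit, rising towards 1
        mixed.append((1.0 - w) * left[start + i] + w * right[i])
    return list(left[:start]) + mixed + list(right[overlap:])


# Toy signals: the join ramps from the left level to the right level.
joined = overlap_join([0.0] * 4, [3.0] * 4, overlap=2)
print(joined)
print(period_samples(16000, 200.0))
```

The output is shorter than the two units laid end to end, because the overlapping samples are shared. No pitch or duration of the units is changed here, which is the difference from the prosody modification in point 2).
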
> >  > > >
> >  > > > Note that in your example, the durations of the pitchmarks do not match
> >  > > > the pitch-synchronous processing in Mary (e.g., a unit with 200 Hz f0
> >  > > > would have pitchmarks exactly 5 ms apart).
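
The 200 Hz figure follows from the fact that, in voiced speech, consecutive pitchmarks are one pitch period apart, i.e. 1/f0. A minimal sketch of that arithmetic (the function name `pitch_period_ms` is made up for illustration and is not part of Mary or Praat):

```python
def pitch_period_ms(f0_hz: float) -> float:
    """Return the length of one pitch period in milliseconds (1000 / f0)."""
    if f0_hz <= 0:
        raise ValueError("f0 must be positive for voiced speech")
    return 1000.0 / f0_hz


# Pitchmark spacing implied by the f0 values used in the example.
for f0 in (200, 210, 220):
    print(f0, "Hz ->", round(pitch_period_ms(f0), 2), "ms")
```

By this relation, pitchmark durations of 10 ms, 5 ms and 20 ms cannot belong to f0 values of 200, 210 and 220 Hz, which is the mismatch pointed out above.
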
> >  > > >
> >  > > > Best wishes,
> >  > > >
> >  > > > -Ingmar
> >  > > >
> >  > > > On 03/10/2012 15:44, Jerome Perri wrote:
> >  > > > > Thank you.
> >  > > > >
> >  > > > > May I ask a real newbie question here anyway before I dig into this
> >  > > > > topic in the praat ng?
> >  > > > >
> >  > > > > At first I thought that TD-PSOLA would work the following way:
> >  > > > > I know (via a model) which duration and pitch I need for 2 diphones, and
> >  > > > > I realise this through TD-PSOLA.
> >  > > > >
> >  > > > > But now I think that TD-PSOLA works like this ->
> >  > > > >
> >  > > > > I have 2 discontiguous diphone units and I want to join them.
> >  > > > > Let's say Mary/Praat found 3 pitchmarks for the right half of diphone A
> >  > > > > and 3 pitchmarks for the left half of diphone B.
> >  > > > > TD-PSOLA will put the audio bytes of the 3 pitchmarks over each other
> >  > > > > and will manipulate the duration and pitch of all audio bytes in such a
> >  > > > > way that both will be changed to the average of both.
> >  > > > >
> >  > > > > For example:
> >  > > > >
> >  > > > > t_R + a:_L a:_R + b_L (from file 1)
> >  > > > > a:_R + e_L e_R + k_R (from file 2)
> >  > > > >
> >  > > > > I want to have "t_R + a:_L a:_R + e_L e_R + k_R"
> >  > > > >
> >  > > > > The joint will be between "a:_R + b_L" and "a:_R + e_L".
> >  > > > > Let's say "a:_R" from file 1 has the pitchmarks
> >  > > > > 1) f0 = 200, duration: 10 ms
> >  > > > > 2) f0 = 210, duration: 5 ms
> >  > > > > 3) f0 = 220, duration: 20 ms
> >  > > > >
> >  > > > > Let's say "a:_R" from file 2 has the pitchmarks
> >  > > > > 1) f0 = 150, duration: 5 ms
> >  > > > > 2) f0 = 160, duration: 15 ms
> >  > > > > 3) f0 = 170, duration: 10 ms
> >  > > > >
> >  > > > > TDPSOLA will modify the pitchmarks in the following way:
> >  > > > >
> >  > > > > 1) New f0 for both = (200+150)/2 = 175, new duration for both = (10+5)/2 = 7.5 ms
> >  > > > > 2) New f0 for both = (210+160)/2 = 185, new duration for both = (5+15)/2 = 10 ms
> >  > > > > 3) New f0 for both = (220+170)/2 = 195, new duration for both = (20+10)/2 = 15 ms
> >  > > > >
> >  > > > > Is this basically the way it works?
> >  > > > > A reply would help me so much!
> >  > > > >
> >  > > > > Thank you!
> >  > > > >
> >  > > > > Jerome
> >  > > > >
> >  > > > > > Date: Wed, 3 Oct 2012 11:15:08 +0100
> >  > > > > > From: ingmar.steiner at ucd.ie
> >  > > > > > To: jerome.perri at hotmail.com
> >  > > > > > CC: mary-users at dfki.de
> >  > > > > > Subject: Re: [mary-users] [mary-dev] Labelling gaps
> >  > > > > >
> >  > > > > > Dear Jerome,
> >  > > > > >
> >  > > > > > Mary contains an FD implementation of PSOLA, not TD. For what it's
> >  > > > > > worth, Praat features a TD-PSOLA implementation, and the Praat user list
> >  > > > > > (http://groups.yahoo.com/group/praat-users) might provide some
> >  > > > > > accessible insight on its use. And don't worry, there are a lot of
> >  > > > > > "newbies" on that list. =)
> >  > > > > >
> >  > > > > > Best wishes,
> >  > > > > >
> >  > > > > > -Ingmar
> >  > > > > >
> >  > > > > > On 03/10/2012 07:58, Jerome Perri wrote:
> >  > > > > > > Thank you for confirming this!
> >  > > > > > >
> >  > > > > > > I have another question, please:
> >  > > > > > >
> >  > > > > > > Can anyone tell me a good place for discussing TDPSOLA?
> >  > > > > > > I would very much like to experiment with it, but the example in Mary is
> >  > > > > > > - as I understood it - just a starting point.
> >  > > > > > >
> >  > > > > > > I would like to be able to ask real newbie questions without being
> >  > > > > > > punished or ignored because my questions are just too newbie.
> >  > > > > > >
> >  > > > > > > Thank you for any hints.
> >  > > > > > >
> >  > > > > > > Jerome
> >  > > > > > >
> >  > > > > > > > Date: Tue, 2 Oct 2012 15:45:55 +0100
> >  > > > > > > > From: ingmar.steiner at ucd.ie
> >  > > > > > > > To: jerome.perri at hotmail.com
> >  > > > > > > > CC: bizpole at hotmail.ca; mary-users at dfki.de
> >  > > > > > > > Subject: Re: [mary-users] [mary-dev] Labelling gaps
> >  > > > > > > >
> >  > > > > > > > Dear Jerome and Asif,
> >  > > > > > > >
> >  > > > > > > > the "pause" under discussion does indeed occur between syllables, but
> >  > > > > > > > not all syllables exhibit this. In fact, what you observe is the
> >  > > > > > > > occlusion phase of the [t], characteristic for all plosive phonemes (or
> >  > > > > > > > "stops"), during which intraoral pressure builds up. It is followed by
> >  > > > > > > > the burst (or release) phase, and the two phases comprise the [t] as it
> >  > > > > > > > is spoken in e.g., English.
> >  > > > > > > >
> >  > > > > > > > Your conclusion to label the occlusion phase as part of the [t] is correct.
> >  > > > > > > >
> >  > > > > > > > Best wishes,
> >  > > > > > > >
> >  > > > > > > > -Ingmar
> >  > > > > > > >
> >  > > > > > > > On 01/10/2012 11:44, Jerome Perri wrote:
> >  > > > > > > > > Hi Asif,
> >  > > > > > > > >
> >  > > > > > > > > that is a very good explanation, I agree with it.
> >  > > > > > > > > Thank you.
> >  > > > > > > > >
> >  > > > > > > > > Jerome
> >  > > > > > > > >
> >  > > > > > > > >
> >  > > > > > > > > ------------------------------------------------------------------------
> >  > > > > > > > > From: bizpole at hotmail.ca
> >  > > > > > > > > To: jerome.perri at hotmail.com; ingmar.steiner at ucd.ie; mary-dev at dfki.de;
> >  > > > > > > > > mary-users at dfki.de
> >  > > > > > > > > Subject: Re: [mary-users] [mary-dev] Labelling gaps
> >  > > > > > > > > Date: Sun, 30 Sep 2012 09:22:39 -0400
> >  > > > > > > > >
> >  > > > > > > > > Hello Jerome,
> >  > > > > > > > > I am not an expert, but I will try to explain it by logic.
> >  > > > > > > > > I will assume it is a syllable pause (phonetic pause).
> >  > > > > > > > > The gap or pause between _HO_ and _TEL_ is due to the syllables of this
> >  > > > > > > > > word. _HO_ is the 1st block and _TEL_ is the 2nd block of this disyllabic word.
> >  > > > > > > > > _HO_ is said completely first, as one block, and a pause comes before
> >  > > > > > > > > _TEL_ to complete the word. _TEL_ causes the gap between both blocks,
> >  > > > > > > > > so in my opinion the gap/pause should be labeled
> >  > > > > > > > > as a part of T.
> >  > > > > > > > > Thanks
> >  > > > > > > > > Asif Mir
> >  > > > > > > > > *From:* Jerome Perri <mailto:jerome.perri at hotmail.com>
> >  > > > > > > > > *Sent:* Friday, September 28, 2012 6:49 AM
> >  > > > > > > > > *To:* ingmar.steiner at ucd.ie <mailto:ingmar.steiner at ucd.ie> ;
> >  > > > > > > > > mary-dev at dfki.de <mailto:mary-dev at dfki.de> ; mary-users at dfki.de
> >  > > > > > > > > <mailto:mary-users at dfki.de>
> >  > > > > > > > > *Subject:* [mary-users] [mary-dev] Labelling gaps
> >  > > > > > > > >
> >  > > > > > > > > Hello!
> >  > > > > > > > >
> >  > > > > > > > > I would like to ask what the rules for labelling are for cases where
> >  > > > > > > > > there is a gap.
> >  > > > > > > > >
> >  > > > > > > > > For example in a word like "hotel":
> >  > > > > > > > > It is likely that the speaker made a small pause between the "o" and the "t".
> >  > > > > > > > > Should the gap/pause be labelled as a part of the "o" or as a part of "t"?
> >  > > > > > > > >
> >  > > > > > > > > Thank you!
> >  > > > > > > > >
> >  > > > > > > > > Jerome
> >  > > > > > > > >
> >  > > > > > > > >
> >  > > > > > > >
> >  > > > > > > > --
> >  > > > > > > > Ingmar Steiner
> >  > > > > > > > Postdoctoral Research Fellow
> >  > > > > > > > Centre for Next Generation Localisation
> >  > > > > > > >
> >  > > > > > > > Multilingual Ubiquitous Speech Technology (MUSTER)
> >  > > > > > > > Computer Science and Informatics
> >  > > > > > > > University College Dublin
> >  > > > > > > >
> >  > > > > > > > Speech Communication Laboratory
> >  > > > > > > > Centre for Language and Communication Studies
> >  > > > > > > > Trinity College Dublin