[mary-dev] TDPSOLA
Jerome Perri
jerome.perri at hotmail.com
Sat Nov 24 23:05:29 CET 2012
It had nothing to do with Mary; it was a general question, because I have not fully understood TDPSOLA yet.
Jerome
> Date: Sat, 24 Nov 2012 20:08:06 +0000
> From: ingmar.steiner at ucd.ie
> To: jerome.perri at hotmail.com
> CC: mary-dev at dfki.de
> Subject: Re: TDPSOLA
>
> Dear Jerome,
>
> you do not say which paper you refer to, but I'm quite sure it has
> nothing to do with MARY. In previous correspondence, you asked me about
> MARY, and I explained these details wrt MARY.
>
> Best wishes,
>
> -Ingmar
>
> On 11/24/12 09:09, Jerome Perri wrote:
> > Dear Ingmar,
> >
> > I am still stuck... you said that PSOLA is not used for concatenation.
> >
> > I read through a Blizzard paper which states:
> >
> > "
> > a)
> > However serious differences between selected units and duration
> > model sometimes occurs. To handle this we used time-scale modification
> > algorithm as a part of USLTM. This method works in time
> > domain, in pitch synchronous way and modifies speech without
> > any contaminations.
> > b)
> > Selected and modified units are then concatenated in time domain
> > in pitch synchronous way. Overlap and Add (OLA) method is
> > used.
> > "
> >
> > If I summarise the above statement, it reads:
> >
> > a) First we stretch/shrink units to make sure they have the "correct"
> > duration
> > b) Then TDPSOLA is used for concatenating the units.
> >
> > But b) would contradict what you said about PSOLA. You said it was used
> > for cosmetics AFTER concatenation.
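
Step a) of the quoted passage (time-scale modification in the time domain, pitch-synchronously) can be pictured roughly as repeating or dropping whole pitch periods. The following is only a minimal sketch of that period-selection idea, not the method of the quoted paper and not MARY code. The function names `select_periods` and `change_duration` are made up for illustration, and the sketch assumes the unit has already been cut into one list of samples per pitch period. A full TD-PSOLA implementation would additionally window and overlap-add the frames.

```python
def select_periods(n_source: int, factor: float) -> list[int]:
    """Pick which source pitch periods to emit for a duration change.

    factor > 1 lengthens the unit (some periods are repeated),
    factor < 1 shortens it (some periods are dropped).
    """
    if n_source <= 0 or factor <= 0:
        raise ValueError("need at least one period and a positive factor")
    n_target = max(1, round(n_source * factor))
    # Map each output slot back to the nearest earlier source period.
    return [min(n_source - 1, int(j * n_source / n_target))
            for j in range(n_target)]


def change_duration(periods: list[list[float]], factor: float) -> list[float]:
    """Concatenate the selected periods (hypothetical helper, no windowing)."""
    out: list[float] = []
    for i in select_periods(len(periods), factor):
        out.extend(periods[i])
    return out


if __name__ == "__main__":
    print(select_periods(4, 2.0))  # every period is used twice
    print(select_periods(4, 0.5))  # every second period is dropped
```

Because whole periods are repeated or dropped, the local f0 stays the same while the duration changes, which is why such a step can run before the units are joined.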
> >
> > Jerome
> >
> >
> > > Date: Wed, 3 Oct 2012 17:21:43 +0100
> > > From: ingmar.steiner at ucd.ie
> > > To: jerome.perri at hotmail.com
> > > CC: mary-users at dfki.de
> > > Subject: Re: [mary-users] [mary-dev] Labelling gaps
> > >
> > > Dear Jerome,
> > >
> > > On 03/10/2012 17:04, Jerome Perri wrote:
> > > > Dear Ingmar,
> > > >
> > > > is PSOLA not used to provide a smoother joining between units during
> > > > concatenation?
> > >
> > > No.
> > >
> > > >
> > > > You talk about it as if it was only for cosmetics in the end.
> > >
> > > Yes, and even then only if explicitly requested.
> > >
> > > >
> > > > I thought it was used to compensate for F0 jumps or for unfitting
> > > > durations during unit concatenation, not to force a great prosody.
> > >
> > > With ideal voice data, the unit-selection algorithm will be able to find
> > > the perfect units. No modification needed. =)
> > >
> > > Best wishes,
> > >
> > > -Ingmar
> > >
> > > > Sorry for the newbie question, this time for real.
> > > >
> > > > Greetings,
> > > > Jerome
> > > >
> > > >
> > > > > Date: Wed, 3 Oct 2012 16:12:44 +0100
> > > > > From: ingmar.steiner at ucd.ie
> > > > > To: jerome.perri at hotmail.com
> > > > > CC: mary-users at dfki.de
> > > > > Subject: Re: [mary-users] [mary-dev] Labelling gaps
> > > > >
> > > > > Dear Jerome,
> > > > >
> > > > > I suspect you may be conflating two distinct concepts here.
> > > > >
> > > > > > 1) diphone concatenation with overlap: the units are joined using one or
> > > > > > more frames which overlap in the output. This can reduce discontinuities
> > > > > > at unit joins, and in Mary, this is done pitch-synchronously.
> > > > >
> > > > > > 2) prosody modification with PSOLA: a source signal is modified by
> > > > > > adding or subtracting pitch periods, and compressing or expanding them,
> > > > > > to match a target prosody (this is a really simplified description!).
> > > > > > This can be done in Mary as an optional processing step after
> > > > > > unit-selection, but this degrades the quality of the signal and should
> > > > > > be considered an experimental feature.
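
To make point 1) concrete, here is a minimal sketch of joining two units with an overlapping region. It is an illustration only and not MARY's implementation. The name `join_with_overlap` is hypothetical, and the sketch assumes both units were cut at pitchmarks and that `overlap` is the length of one pitch period in samples. Neither f0 nor duration of the units is changed; only the samples inside the overlap are cross-faded.

```python
import math


def join_with_overlap(left: list[float], right: list[float],
                      overlap: int) -> list[float]:
    """Cross-fade the last `overlap` samples of `left` into the first
    `overlap` samples of `right` and return the joined signal."""
    if overlap <= 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must fit inside both units")
    # Raised-cosine ramp; fade-in and fade-out weights always sum to 1.
    fade_in = [0.5 - 0.5 * math.cos(math.pi * (i + 0.5) / overlap)
               for i in range(overlap)]
    start = len(left) - overlap
    mixed = [left[start + i] * (1.0 - fade_in[i]) + right[i] * fade_in[i]
             for i in range(overlap)]
    return left[:start] + mixed + right[overlap:]


if __name__ == "__main__":
    joined = join_with_overlap([1.0] * 10, [1.0] * 8, 4)
    print(len(joined))  # total length shrinks by the overlap
```

The output is shorter than the two inputs together by exactly the overlap, which is the only change in timing this kind of join introduces.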
> > > > >
> > > > > > Note that in your example, the durations of the pitchmarks do not match
> > > > > > the pitch-synchronous processing in Mary (e.g., a unit with 200Hz f0
> > > > > > would have pitchmarks exactly 5ms apart).
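
The 200 Hz / 5 ms relation is simply period = 1000 / f0 (in ms). As a small check, the sketch below compares the stated durations from the example further down in the thread with the period implied by each f0 value. The helper names `pitch_period_ms` and `mismatches` are made up for this illustration.

```python
def pitch_period_ms(f0_hz: float) -> float:
    """Length of one pitch period in milliseconds for a given f0."""
    if f0_hz <= 0:
        raise ValueError("f0 must be positive for voiced speech")
    return 1000.0 / f0_hz


# (f0 in Hz, stated duration in ms) pairs taken from the quoted example.
STATED = [(200, 10), (210, 5), (220, 20), (150, 5), (160, 15), (170, 10)]


def mismatches(pairs, tolerance_ms: float = 0.01):
    """Return the pairs whose stated duration is not one pitch period."""
    return [(f0, dur) for f0, dur in pairs
            if abs(pitch_period_ms(f0) - dur) > tolerance_ms]


if __name__ == "__main__":
    for f0, dur in STATED:
        print(f"f0={f0} Hz: period={pitch_period_ms(f0):.2f} ms, "
              f"stated={dur} ms")
```

None of the six stated durations equals the period implied by its f0, which is the mismatch pointed out above.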
> > > > >
> > > > > Best wishes,
> > > > >
> > > > > -Ingmar
> > > > >
> > > > > On 03/10/2012 15:44, Jerome Perri wrote:
> > > > > > Thank you.
> > > > > >
> > > > > > May I ask a real newbie question here anyway before I dig into this
> > > > > > topic in the praat ng?
> > > > > >
> > > > > > At first I thought that TD-PSOLA would work the following way:
> > > > > > > I know (via a model) which duration and pitch I need for 2 diphones, and
> > > > > > > I realise this through TD-PSOLA.
> > > > > >
> > > > > > But now I think that TD-PSOLA works like this ->
> > > > > >
> > > > > > > I have 2 non-contiguous diphone units and I want to join them.
> > > > > > > Let's say Mary/Praat found 3 pitchmarks for the right half of diphone A
> > > > > > > and 3 pitchmarks for the left half of diphone B.
> > > > > > > TD-PSOLA will put the audio bytes of the 3 pitchmarks over each other
> > > > > > > and will manipulate the duration and pitch of all audio bytes in such a
> > > > > > > way that both will be changed to the average of both.
> > > > > >
> > > > > > For example:
> > > > > >
> > > > > > t_R + a:_L a:_R + b_L (from file 1)
> > > > > > a:_R + e_L e_R + k_R (from file 2)
> > > > > >
> > > > > > I want to have "t_R + a:_L a:_R + e_L e_R + k_R"
> > > > > >
> > > > > > > The join will be between "a:_R + b_L" and "a:_R + e_L".
> > > > > > > Let's say "a:_R" from file 1 has the pitchmarks
> > > > > > > 1) f0 = 200, duration: 10 ms
> > > > > > > 2) f0 = 210, duration: 5 ms
> > > > > > > 3) f0 = 220, duration: 20 ms
> > > > > > >
> > > > > > > Let's say "a:_R" from file 2 has the pitchmarks
> > > > > > > 1) f0 = 150, duration: 5 ms
> > > > > > > 2) f0 = 160, duration: 15 ms
> > > > > > > 3) f0 = 170, duration: 10 ms
> > > > > > >
> > > > > > > TDPSOLA will modify the pitchmarks in the following way:
> > > > > > >
> > > > > > > 1) New f0 for both = (200+150)/2 = 175, new duration for both = (10+5)/2 = 7.5 ms
> > > > > > > 2) New f0 for both = (210+160)/2 = 185, new duration for both = (5+15)/2 = 10 ms
> > > > > > > 3) New f0 for both = (220+170)/2 = 195, new duration for both = (20+10)/2 = 15 ms
> > > > > >
> > > > > > Is this basically the way it works?
> > > > > > > A reply would help me so much!
> > > > > >
> > > > > > Thank you!
> > > > > >
> > > > > > Jerome
> > > > > >
> > > > > > > Date: Wed, 3 Oct 2012 11:15:08 +0100
> > > > > > > From: ingmar.steiner at ucd.ie
> > > > > > > To: jerome.perri at hotmail.com
> > > > > > > CC: mary-users at dfki.de
> > > > > > > Subject: Re: [mary-users] [mary-dev] Labelling gaps
> > > > > > >
> > > > > > > Dear Jerome,
> > > > > > >
> > > > > > > > Mary contains an FD implementation of PSOLA, not TD. For what it's
> > > > > > > > worth, Praat features a TD-PSOLA implementation, and the Praat user list
> > > > > > > > (http://groups.yahoo.com/group/praat-users) might provide some
> > > > > > > > accessible insight on its use. And don't worry, there are a lot of
> > > > > > > > "newbies" on that list. =)
> > > > > > >
> > > > > > > Best wishes,
> > > > > > >
> > > > > > > -Ingmar
> > > > > > >
> > > > > > > On 03/10/2012 07:58, Jerome Perri wrote:
> > > > > > > > Thank you for confirming this!
> > > > > > > >
> > > > > > > > I have another question, please:
> > > > > > > >
> > > > > > > > Can anyone tell me a good place for discussing TDPSOLA?
> > > > > > > > > I would very much like to experiment with it, but the example in Mary is
> > > > > > > > > - as I understood it - just a starting point.
> > > > > > > > >
> > > > > > > > > I would like to be able to ask real newbie questions without being
> > > > > > > > > punished or ignored because my questions are just too newbie.
> > > > > > > >
> > > > > > > > Thank you for any hints.
> > > > > > > >
> > > > > > > > Jerome
> > > > > > > >
> > > > > > > > > Date: Tue, 2 Oct 2012 15:45:55 +0100
> > > > > > > > > From: ingmar.steiner at ucd.ie
> > > > > > > > > To: jerome.perri at hotmail.com
> > > > > > > > > CC: bizpole at hotmail.ca; mary-users at dfki.de
> > > > > > > > > Subject: Re: [mary-users] [mary-dev] Labelling gaps
> > > > > > > > >
> > > > > > > > > Dear Jerome and Asif,
> > > > > > > > >
> > > > > > > > > > the "pause" under discussion does indeed occur between syllables, but
> > > > > > > > > > not all syllables exhibit this. In fact, what you observe is the
> > > > > > > > > > occlusion phase of the [t], characteristic for all plosive phonemes (or
> > > > > > > > > > "stops"), during which intraoral pressure builds up. It is followed by
> > > > > > > > > > the burst (or release) phase, and the two phases comprise the [t] as it
> > > > > > > > > > is spoken in e.g., English.
> > > > > > > > > >
> > > > > > > > > > Your conclusion to label the occlusion phase as part of the [t] is correct.
> > > > > > > > >
> > > > > > > > > Best wishes,
> > > > > > > > >
> > > > > > > > > -Ingmar
> > > > > > > > >
> > > > > > > > > On 01/10/2012 11:44, Jerome Perri wrote:
> > > > > > > > > > Hi Asif,
> > > > > > > > > >
> > > > > > > > > > that is a very good explanation, I agree with it.
> > > > > > > > > > Thank you.
> > > > > > > > > >
> > > > > > > > > > Jerome
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > ------------------------------------------------------------------------
> > > > > > > > > > From: bizpole at hotmail.ca
> > > > > > > > > > To: jerome.perri at hotmail.com; ingmar.steiner at ucd.ie;
> > > > > > mary-dev at dfki.de;
> > > > > > > > > > mary-users at dfki.de
> > > > > > > > > > Subject: Re: [mary-users] [mary-dev] Labelling gaps
> > > > > > > > > > Date: Sun, 30 Sep 2012 09:22:39 -0400
> > > > > > > > > >
> > > > > > > > > > Hello Jerome,
> > > > > > > > > > > I am not an expert, but I will try to explain it by logic.
> > > > > > > > > > > I will assume it is a syllable pause (phonetic pause).
> > > > > > > > > > > The gap or pause between _HO_ and _TEL_ is due to the syllables of this
> > > > > > > > > > > word. _HO_ is the 1st block and _TEL_ is the 2nd block of this disyllabic
> > > > > > > > > > > word. _HO_ is said completely first, as one block, while a pause comes
> > > > > > > > > > > before _TEL_ to make a complete word. _TEL_ causes the gap between both
> > > > > > > > > > > blocks, so in my opinion the gap/pause should be labelled as a part of T.
> > > > > > > > > > Thanks
> > > > > > > > > > Asif Mir
> > > > > > > > > > > From: Jerome Perri <jerome.perri at hotmail.com>
> > > > > > > > > > > Sent: Friday, September 28, 2012 6:49 AM
> > > > > > > > > > > To: ingmar.steiner at ucd.ie; mary-dev at dfki.de; mary-users at dfki.de
> > > > > > > > > > > Subject: [mary-users] [mary-dev] Labelling gaps
> > > > > > > > > >
> > > > > > > > > > Hello!
> > > > > > > > > >
> > > > > > > > > > > I would like to ask what the rules for labelling are for cases where
> > > > > > > > > > > there is a gap.
> > > > > > > > > > >
> > > > > > > > > > > For example in a word like "hotel":
> > > > > > > > > > > It is likely that the speaker made a small pause between the "o" and the "t".
> > > > > > > > > > > Should the gap/pause be labelled as a part of the "o" or as a part of "t"?
> > > > > > > > > >
> > > > > > > > > > Thank you!
> > > > > > > > > >
> > > > > > > > > > Jerome
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > ------------------------------------------------------------------------
> > > > > > > > > >
> > > > > > > > > > _______________________________________________
> > > > > > > > > > Mary-users mailing list
> > > > > > > > > > Mary-users at dfki.de <mailto:Mary-users at dfki.de>
> > > > > > > > > > http://www.dfki.de/mailman/cgi-bin/listinfo/mary-users
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Ingmar Steiner
> > > > > > > > > Postdoctoral Research Fellow
> > > > > > > > > Centre for Next Generation Localisation
> > > > > > > > >
> > > > > > > > > Multilingual Ubiquitous Speech Technology (MUSTER)
> > > > > > > > > Computer Science and Informatics
> > > > > > > > > University College Dublin
> > > > > > > > >
> > > > > > > > > Speech Communication Laboratory
> > > > > > > > > Centre for Language and Communication Studies
> > > > > > > > > Trinity College Dublin
> > > > > > >
> > > > > > > --
> > > > > > > Ingmar Steiner
> > > > > > > Postdoctoral Research Fellow
> > > > > > > Centre for Next Generation Localisation
> > > > > > >
> > > > > > > Multilingual Ubiquitous Speech Technology (MUSTER)
> > > > > > > Computer Science and Informatics
> > > > > > > University College Dublin
> > > > > > >
> > > > > > > Speech Communication Laboratory
> > > > > > > Centre for Language and Communication Studies
> > > > > > > Trinity College Dublin
> > > > >
> > > > > --
> > > > > Ingmar Steiner
> > > > > Postdoctoral Research Fellow
> > > > > Centre for Next Generation Localisation
> > > > >
> > > > > Multilingual Ubiquitous Speech Technology (MUSTER)
> > > > > Computer Science and Informatics
> > > > > University College Dublin
> > > > >
> > > > > Speech Communication Laboratory
> > > > > Centre for Language and Communication Studies
> > > > > Trinity College Dublin
> > >
> > > --
> > > Ingmar Steiner
> > > Postdoctoral Research Fellow
> > > Centre for Next Generation Localisation
> > >
> > > Multilingual Ubiquitous Speech Technology (MUSTER)
> > > Computer Science and Informatics
> > > University College Dublin
> > >
> > > Speech Communication Laboratory
> > > Centre for Language and Communication Studies
> > > Trinity College Dublin