[mary-dev] TDPSOLA

Sat Nov 24 21:08:06 CET 2012

Dear Jerome,

You do not say which paper you refer to, but I'm quite sure it has
nothing to do with MARY. In previous correspondence, you asked me about
MARY, and my explanations applied to MARY specifically.

Best wishes,

-Ingmar

On 11/24/12 09:09, Jerome Perri wrote:
> Dear Ingmar,
>
> I am still stuck... you said that PSOLA is not used for concatenation.
>
> I read through a blizzard paper in which is stated:
>
> "
> a)
> However serious differences between selected units and duration
> model sometimes occurs. To handle this we used time-scale modification
> algorithm as a part of USLTM. This method works in time
> domain, in pitch synchronous way and modifies speech without
> any contaminations.
> b)
> Selected and modified units are then concatenated in time domain
> in pitch synchronous way. Overlap and Add (OLA) method is
> used.
> "
>
> If I summarize the above statement, it reads:
>
> a) First we stretch/shrink units to make sure they have the "correct"
> duration
> b) Then TDPSOLA is used for concatenating the units.
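
One way to picture step a) is as repeating or dropping whole pitch periods until the unit has the wanted length. The sketch below is only an illustration under that assumption. The quoted paper does not describe its algorithm in this detail, the name `retime_periods` is made up here, and a real TD-PSOLA implementation would additionally window and overlap-add the periods instead of simply copying them.

```python
def retime_periods(periods, factor):
    """Return a new list of pitch periods whose count is scaled by `factor`.

    `periods` is a list of per-period sample chunks (any objects will do).
    factor > 1 lengthens the unit by repeating periods,
    factor < 1 shortens it by skipping periods.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not periods:
        return []
    n_out = max(1, round(len(periods) * factor))
    last = len(periods) - 1
    # Map each output slot back to the source period it should copy.
    return [periods[min(last, int(i / factor))] for i in range(n_out)]


unit = ["p0", "p1", "p2", "p3"]
print(retime_periods(unit, 1.5))  # stretched: some periods appear twice
print(retime_periods(unit, 0.5))  # shrunk: some periods are dropped
```

Note that this only changes duration. The pitch of each copied period stays as it was recorded.
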
>
> But b) would contradict what you said about PSOLA. You said it was used
> for cosmetics AFTER concatenation.
>
> Jerome
>
>
>  > Date: Wed, 3 Oct 2012 17:21:43 +0100
>  > From: ingmar.steiner at ucd.ie
>  > To: jerome.perri at hotmail.com
>  > CC: mary-users at dfki.de
>  > Subject: Re: [mary-users] [mary-dev] Labelling gaps
>  >
>  > Dear Jerome,
>  >
>  > On 03/10/2012 17:04, Jerome Perri wrote:
>  > > Dear Ingmar,
>  > >
>  > > is PSOLA not used to provide a smoother joining between units during
>  > > concatenation?
>  >
>  > No.
>  >
>  > >
>  > > You talk about it as if it was only for cosmetics in the end.
>  >
>  > Yes, and even then only if explicitly requested.
>  >
>  > >
>  > > I thought it was used to compensate for F0 jumps or for unfitting
>  > > durations during unit concatenation, not to force a great prosody.
>  >
>  > With ideal voice data, the unit-selection algorithm will be able to find
>  > the perfect units. No modification needed. =)
>  >
>  > Best wishes,
>  >
>  > -Ingmar
>  >
>  > > Sorry for the newbie question, this time for real.
>  > >
>  > > Greetings,
>  > > Jerome
>  > >
>  > >
>  > > > Date: Wed, 3 Oct 2012 16:12:44 +0100
>  > > > From: ingmar.steiner at ucd.ie
>  > > > To: jerome.perri at hotmail.com
>  > > > CC: mary-users at dfki.de
>  > > > Subject: Re: [mary-users] [mary-dev] Labelling gaps
>  > > >
>  > > > Dear Jerome,
>  > > >
>  > > > I suspect you may be conflating two distinct concepts here.
>  > > >
>  > > > 1) diphone concatenation with overlap: the units are joined using
>  > > > one or more frames which overlap in the output. This can reduce
>  > > > discontinuities at unit joins, and in Mary, this is done
>  > > > pitch-synchronously.
>  > > >
>  > > > 2) prosody modification with PSOLA: a source signal is modified by
>  > > > adding or subtracting pitch periods, and compressing or expanding
>  > > > them, to match a target prosody (this is a really simplified
>  > > > description!). This can be done in Mary as an optional processing
>  > > > step after unit-selection, but this degrades the quality of the
>  > > > signal and should be considered an experimental feature.
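
To make concept 1) more concrete, here is a minimal sketch of joining two units with an overlapping region that is cross-faded. It is not MARY's code. The function name `crossfade_join`, the linear fade, and the assumption that the caller picks `overlap` to span roughly one pitch period are all illustrative choices.

```python
def crossfade_join(left, right, overlap):
    """Join two sample sequences, mixing `overlap` samples with a linear fade.

    The last `overlap` samples of `left` and the first `overlap` samples of
    `right` occupy the same positions in the output.
    """
    if overlap < 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must fit inside both units")
    split = len(left) - overlap
    head = list(left[:split])     # part of the left unit that is kept as-is
    tail = list(right[overlap:])  # part of the right unit that is kept as-is
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of the right unit ramps up
        mixed.append((1.0 - w) * left[split + i] + w * right[i])
    return head + mixed + tail


joined = crossfade_join([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], 1)
print(joined)
```

The output is shorter than the two inputs together by exactly `overlap` samples, because the overlapping region is shared. Nothing here changes f0 or duration targets, which is the point of keeping it separate from concept 2).
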
>  > > >
>  > > > Note that in your example, the durations of the pitchmarks do not
>  > > > match the pitch-synchronous processing in Mary (e.g., a unit with
>  > > > 200 Hz f0 would have pitchmarks exactly 5 ms apart).
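
The 5 ms figure follows from the fact that consecutive pitchmarks are one pitch period apart, and the period is 1/f0. A small sketch to check this for the f0 values in the example (the helper name `pitch_period_ms` is made up for illustration and is not part of MARY or Praat):

```python
def pitch_period_ms(f0_hz):
    """Return the pitch period in milliseconds for a given f0 in Hz."""
    if f0_hz <= 0:
        raise ValueError("f0 must be positive for voiced speech")
    return 1000.0 / f0_hz


# f0 values taken from the example earlier in the thread
for f0 in (200, 210, 220, 150, 160, 170):
    print(f"{f0} Hz -> {pitch_period_ms(f0):.2f} ms")
```

None of the f0 values in the example gives a spacing of 10, 15 or 20 ms, which is why f0 and duration cannot be listed as independent properties of a single pitchmark.
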
>  > > >
>  > > > Best wishes,
>  > > >
>  > > > -Ingmar
>  > > >
>  > > > On 03/10/2012 15:44, Jerome Perri wrote:
>  > > > > Thank you.
>  > > > >
>  > > > > May I ask a real newbie question here anyway before I dig into this
>  > > > > topic in the praat ng?
>  > > > >
>  > > > > At first I thought that TD-PSOLA would work the following way:
>  > > > > I know (via a model) which duration and pitch I need for 2
>  > > > > diphones, and I realise this through TD-PSOLA.
>  > > > >
>  > > > > But now I think that TD-PSOLA works like this ->
>  > > > >
>  > > > > I have 2 discontiguous diphone units and I want to join them.
>  > > > > Let's say Mary/Praat found 3 pitchmarks for the right half of
>  > > > > diphone A and 3 pitchmarks for the left half of diphone B.
>  > > > > TD-PSOLA will put the audio bytes of the 3 pitchmarks over each
>  > > > > other and will manipulate the duration and pitch of all audio
>  > > > > bytes in such a way that both will be changed to the average of
>  > > > > both.
>  > > > >
>  > > > > For example:
>  > > > >
>  > > > > t_R + a:_L a:_R + b_L (from file 1)
>  > > > > a:_R + e_L e_R + k_R (from file 2)
>  > > > >
>  > > > > I want to have "t_R + a:_L a:_R + e_L e_R + k_R"
>  > > > >
>  > > > > The joint will be between " a:_R + b_L" and "a:_R + e_L"
>  > > > > Let's say "a:_R" from file 1 has the pitchmarks
>  > > > > 1) f0 = 200, duration: 10 ms
>  > > > > 2) f0 = 210, duration: 5 ms
>  > > > > 3) f0 = 220, duration: 20 ms
>  > > > >
>  > > > > Let's say "a:_R" from file 2 has the pitchmarks
>  > > > > 1) f0 = 150, duration: 5 ms
>  > > > > 2) f0 = 160, duration: 15 ms
>  > > > > 3) f0 = 170, duration: 10 ms
>  > > > >
>  > > > > TDPSOLA will modify the pitchmarks in the following way:
>  > > > >
>  > > > > 1) New f0 for both = (200+150)/2 = 175, new duration for both
>  > > > > = (10+5)/2 = 7.5 ms
>  > > > > 2) New f0 for both = (210+160)/2 = 185, new duration for both
>  > > > > = (5+15)/2 = 10 ms
>  > > > > 3) New f0 for both = (220+170)/2 = 195, new duration for both
>  > > > > = (20+10)/2 = 15 ms
>  > > > >
>  > > > > Is this basically the way it works?
>  > > > > A reply would help me so much!
>  > > > >
>  > > > > Thank you!
>  > > > >
>  > > > > Jerome
>  > > > >
>  > > > > > Date: Wed, 3 Oct 2012 11:15:08 +0100
>  > > > > > From: ingmar.steiner at ucd.ie
>  > > > > > To: jerome.perri at hotmail.com
>  > > > > > CC: mary-users at dfki.de
>  > > > > > Subject: Re: [mary-users] [mary-dev] Labelling gaps
>  > > > > >
>  > > > > > Dear Jerome,
>  > > > > >
>  > > > > > Mary contains an FD implementation of PSOLA, not TD. For what
>  > > > > > it's worth, Praat features a TD-PSOLA implementation, and the
>  > > > > > Praat user list (http://groups.yahoo.com/group/praat-users)
>  > > > > > might provide some accessible insight on its use. And don't
>  > > > > > worry, there are a lot of "newbies" on that list. =)
>  > > > > >
>  > > > > > Best wishes,
>  > > > > >
>  > > > > > -Ingmar
>  > > > > >
>  > > > > > On 03/10/2012 07:58, Jerome Perri wrote:
>  > > > > > > Thank you for confirming this!
>  > > > > > >
>  > > > > > > I have another question, please:
>  > > > > > >
>  > > > > > > Can anyone tell me a good place for discussing TDPSOLA?
>  > > > > > > I would very much like to experiment with it, but the example
>  > > > > > > in Mary is - as I understood it - just a starting point.
>  > > > > > >
>  > > > > > > I would like to be able to ask real newbie questions without
>  > > > > > > being punished or ignored because my questions are just too
>  > > > > > > newbie.
>  > > > > > >
>  > > > > > > Thank you for any hints.
>  > > > > > >
>  > > > > > > Jerome
>  > > > > > >
>  > > > > > > > Date: Tue, 2 Oct 2012 15:45:55 +0100
>  > > > > > > > From: ingmar.steiner at ucd.ie
>  > > > > > > > To: jerome.perri at hotmail.com
>  > > > > > > > CC: bizpole at hotmail.ca; mary-users at dfki.de
>  > > > > > > > Subject: Re: [mary-users] [mary-dev] Labelling gaps
>  > > > > > > >
>  > > > > > > > Dear Jerome and Asif,
>  > > > > > > >
>  > > > > > > > the "pause" under discussion does indeed occur between
>  > > > > > > > syllables, but not all syllables exhibit this. In fact, what
>  > > > > > > > you observe is the occlusion phase of the [t], characteristic
>  > > > > > > > of all plosive phonemes (or "stops"), during which intraoral
>  > > > > > > > pressure builds up. It is followed by the burst (or release)
>  > > > > > > > phase, and the two phases comprise the [t] as it is spoken
>  > > > > > > > in, e.g., English.
>  > > > > > > >
>  > > > > > > > Your conclusion to label the occlusion phase as part of the
>  > > > > > > > [t] is correct.
>  > > > > > > >
>  > > > > > > > Best wishes,
>  > > > > > > >
>  > > > > > > > -Ingmar
>  > > > > > > >
>  > > > > > > > On 01/10/2012 11:44, Jerome Perri wrote:
>  > > > > > > > > Hi Asif,
>  > > > > > > > >
>  > > > > > > > > that is a very good explanation, I agree with it.
>  > > > > > > > > Thank you.
>  > > > > > > > >
>  > > > > > > > > Jerome
>  > > > > > > > >
>  > > > > > > > >
>  > > > > > > > > ------------------------------------------------------------------------
>  > > > > > > > > From: bizpole at hotmail.ca
>  > > > > > > > > To: jerome.perri at hotmail.com; ingmar.steiner at ucd.ie;
>  > > > > > > > > mary-dev at dfki.de; mary-users at dfki.de
>  > > > > > > > > Subject: Re: [mary-users] [mary-dev] Labelling gaps
>  > > > > > > > > Date: Sun, 30 Sep 2012 09:22:39 -0400
>  > > > > > > > >
>  > > > > > > > > Hello Jerome,
>  > > > > > > > > I am not an expert, but I will try to explain it by logic.
>  > > > > > > > > I will treat it as a syllable pause (phonetic pause).
>  > > > > > > > > The gap or pause between _HO_ and _TEL_ is due to the
>  > > > > > > > > syllables of this word. _HO_ is the 1st block and _TEL_ is
>  > > > > > > > > the 2nd block of this disyllabic word. _HO_ is said
>  > > > > > > > > completely first and as one block, while a pause comes
>  > > > > > > > > before _TEL_ to make a complete word. _TEL_ causes the gap
>  > > > > > > > > between both blocks, so in my opinion the gap/pause should
>  > > > > > > > > be labeled as a part of T.
>  > > > > > > > > Thanks
>  > > > > > > > > Asif Mir
>  > > > > > > > > *From:* Jerome Perri <jerome.perri at hotmail.com>
>  > > > > > > > > *Sent:* Friday, September 28, 2012 6:49 AM
>  > > > > > > > > *To:* ingmar.steiner at ucd.ie; mary-dev at dfki.de;
>  > > > > > > > > mary-users at dfki.de
>  > > > > > > > > *Subject:* [mary-users] [mary-dev] Labelling gaps
>  > > > > > > > >
>  > > > > > > > > Hello!
>  > > > > > > > >
>  > > > > > > > > I would like to ask what the rules for labelling are for
>  > > > > > > > > cases where there is a gap.
>  > > > > > > > >
>  > > > > > > > > For example in a word like "hotel":
>  > > > > > > > > It is likely that the speaker made a small pause between
>  > > > > > > > > the "o" and the "t".
>  > > > > > > > > Should the gap/pause be labelled as a part of the "o" or
>  > > > > > > > > as a part of "t"?
>  > > > > > > > >
>  > > > > > > > > Thank you!
>  > > > > > > > >
>  > > > > > > > > Jerome
>  > > > > > > > >
>  > > > > > > >
>  > > > > > > > --
>  > > > > > > > Ingmar Steiner
>  > > > > > > > Postdoctoral Research Fellow
>  > > > > > > > Centre for Next Generation Localisation
>  > > > > > > >
>  > > > > > > > Multilingual Ubiquitous Speech Technology (MUSTER)
>  > > > > > > > Computer Science and Informatics
>  > > > > > > > University College Dublin
>  > > > > > > >
>  > > > > > > > Speech Communication Laboratory
>  > > > > > > > Centre for Language and Communication Studies
>  > > > > > > > Trinity College Dublin

-- 
Ingmar Steiner
Postdoctoral Research Fellow
Centre for Next Generation Localisation

Multilingual Ubiquitous Speech Technology (MUSTER)
Computer Science and Informatics
University College Dublin

Speech Communication Laboratory
Centre for Language and Communication Studies
Trinity College Dublin