[Rock-dev] Questions Regarding Typelib

Mon Jun 9 16:47:03 CEST 2014

On 09.06.2014 16:24, Sylvain Joyeux wrote:
> On Mon, Jun 9, 2014 at 12:04 PM, Janosch Machowinski 
> <Janosch.Machowinski at dfki.de> wrote:
>
>     On 08.06.2014 23:16, Sylvain Joyeux wrote:
>
>         Oh ... I would also need the size of each stream's sample
>         (including the size of vectors if there are any) ...
>
>         About index loading: the way the index was marshalled needed
>         to be changed (but was not) after the change you made to
>         indexing (i.e. making indexes dense). A 3-line patch improves
>         performance quite a lot already. Alignment is already pretty
>         good on my test file (~4s).
>
>     It gets worse with the number of streams. Try a test case with ~60
>     streams. There the performance really drops, and this is the
>     'reality' test case...
>
> Created a dataset of one minute with 100 streams. Each stream is at 
> 100Hz, so that's 600k samples. It took 4.6 seconds to generate the 
> index and 0.8 seconds to load the file index (from warm cache, so with 
> probably little I/O overhead).
How long did the stream alignment take? That is usually where the problem
lies, as you can't get better than O((log n)*s) there, where n is the
number of streams and s the number of samples.
>
> C++ *is* faster. Of course it is. From what I see, not fast enough to 
> justify the refactoring that you are proposing.
Oh yes, it does. Recently I did a lot of localization debugging. You
can't skip ahead in the data in this case (nor in many other use cases),
which means you have to replay the whole log stream. If the replay is
twice as fast, you need half the time for debugging. So in my eyes it is
100% worth the effort.
>
> Would be a lot more interesting to find out why using Vizkit and log 
> control kills performance so much and how we could optimize the 
> typelib parts (which are C++ already !)
One step after another...
>
> Again, you are *not* giving the right measurements. Speed factors and 
> durations are meaningless if we don't know how many samples each 
> stream has, and how long each stream lasts. Just "it is 24x
> faster" means nothing.
You have the C++ implementation; just run multiIndexTester on your
test data and compare the results.
Greetings
     Janosch