Bug 267925 - laog data structure quadratic for a single sequence of lock
Summary: laog data structure quadratic for a single sequence of lock
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: helgrind
Version: 3.7 SVN
Platform: Unlisted Binaries Linux
Priority: NOR    Severity: normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-08 00:18 UTC by Philippe Waroquiers
Modified: 2011-10-22 19:35 UTC
CC List: 1 user

See Also:
Latest Commit:
Version Fixed In:


Attachments
test program reproducing quadratic behaviour (1.75 KB, application/octet-stream)
2011-03-08 00:18 UTC, Philippe Waroquiers
Details
implement garbage collection of the laog data structure (24.03 KB, text/plain)
2011-05-15 18:12 UTC, Philippe Waroquiers
Details
updated to rev 11860: implement garbage collection of the laog data structure (24.14 KB, text/plain)
2011-07-07 01:03 UTC, Philippe Waroquiers
Details
laog garbage collect: upgraded to 11969, added assert on index (24.33 KB, text/plain)
2011-08-13 07:03 UTC, Philippe Waroquiers
Details

Description Philippe Waroquiers 2011-03-08 00:18:39 UTC
Created attachment 57758 [details]
test program reproducing quadratic behaviour

Version:           3.7 SVN
OS:                Linux

In the program I will attach, locks are taken in only one single
sequence, yet we still obtain a huge number of locksets in univ_laog.
There is a quadratic aspect in what should be a "simple/easy" case.
For a single "sequence" of locks, I was expecting a number of locksets
that would be O(nr_of_locks), but we obtain O(nr_of_locks * nr_of_locks).
I have not understood where this quadratic aspect is coming from.

Regarding the deletion of locks:
for a given total number of locks, the number of entries in univ_laog
changes if the lock deletion is done in a different order.
There is an additional quadratic aspect in laog__handle_one_lock_deletion().
However, I am quite sure this is *not* the problem in the real
application: commenting out the laog__handle_one_lock_deletion code
does not solve the problem of the real application (which crashes
once the laog data structure reaches almost 4 GB).

With the above program, a big value (e.g. many = 10000) brought even
my (powerful) workstation almost to a standstill :).

Regarding the real application: I believe a library (we do not have
the sources) maintains an in-memory database of records, each
protected by a lock. A "table scan locking all records" behaves like
the program above. Note that I am not completely sure this program
reproduces the exact behaviour of the (real) application.


for level in 1 2 3 4
do
   for many in 1 2 3 4 5 6 7 8 9 10 20
   do
     valgrind --tool=helgrind --profile-heap=yes  /cm/local_build/weekly/valgrind/t2t $many $level  2>&1 | grep -e univ_laog -e many
   done
done

many 1 level 1  total_locks: 1
        1,992 in         5: hg.ids.5 (univ_laog)
many 2 level 1  total_locks: 2
        2,096 in        11: hg.ids.5 (univ_laog)
many 3 level 1  total_locks: 3
        2,256 in        20: hg.ids.5 (univ_laog)
many 4 level 1  total_locks: 4
        2,504 in        32: hg.ids.5 (univ_laog)
many 5 level 1  total_locks: 5
        2,768 in        47: hg.ids.5 (univ_laog)
many 6 level 1  total_locks: 6
        3,168 in        65: hg.ids.5 (univ_laog)
many 7 level 1  total_locks: 7
        3,568 in        86: hg.ids.5 (univ_laog)
many 8 level 1  total_locks: 8
        4,176 in       110: hg.ids.5 (univ_laog)
many 9 level 1  total_locks: 9
        4,728 in       137: hg.ids.5 (univ_laog)
many 10 level 1  total_locks: 10
        5,368 in       167: hg.ids.5 (univ_laog)
many 20 level 1  total_locks: 20
       17,896 in       632: hg.ids.5 (univ_laog)
many 1 level 2  total_locks: 4
        2,552 in        35: hg.ids.5 (univ_laog)
many 2 level 2  total_locks: 6
        3,440 in        80: hg.ids.5 (univ_laog)
many 3 level 2  total_locks: 8
        4,824 in       143: hg.ids.5 (univ_laog)
many 4 level 2  total_locks: 10
        6,856 in       224: hg.ids.5 (univ_laog)
many 5 level 2  total_locks: 12
        9,136 in       323: hg.ids.5 (univ_laog)
many 6 level 2  total_locks: 14
       12,560 in       440: hg.ids.5 (univ_laog)
many 7 level 2  total_locks: 16
       16,128 in       575: hg.ids.5 (univ_laog)
many 8 level 2  total_locks: 18
       20,400 in       728: hg.ids.5 (univ_laog)
many 9 level 2  total_locks: 20
       26,456 in       899: hg.ids.5 (univ_laog)
many 10 level 2  total_locks: 22
       32,304 in     1,088: hg.ids.5 (univ_laog)
many 20 level 2  total_locks: 42
      154,672 in     3,968: hg.ids.5 (univ_laog)
many 1 level 3  total_locks: 6
        3,392 in        77: hg.ids.5 (univ_laog)
many 2 level 3  total_locks: 9
        5,696 in       185: hg.ids.5 (univ_laog)
many 3 level 3  total_locks: 12
        9,352 in       338: hg.ids.5 (univ_laog)
many 4 level 3  total_locks: 15
       14,792 in       536: hg.ids.5 (univ_laog)
many 5 level 3  total_locks: 18
       22,384 in       779: hg.ids.5 (univ_laog)
many 6 level 3  total_locks: 21
       30,848 in     1,067: hg.ids.5 (univ_laog)
many 7 level 3  total_locks: 24
       41,360 in     1,400: hg.ids.5 (univ_laog)
many 8 level 3  total_locks: 27
       56,296 in     1,778: hg.ids.5 (univ_laog)
many 9 level 3  total_locks: 30
       71,528 in     2,201: hg.ids.5 (univ_laog)
many 10 level 3  total_locks: 33
       89,496 in     2,669: hg.ids.5 (univ_laog)
many 20 level 3  total_locks: 63
      473,896 in     9,824: hg.ids.5 (univ_laog)
many 1 level 4  total_locks: 8
        4,672 in       137: hg.ids.5 (univ_laog)
many 2 level 4  total_locks: 12
        9,272 in       335: hg.ids.5 (univ_laog)
many 3 level 4  total_locks: 16
       16,784 in       617: hg.ids.5 (univ_laog)
many 4 level 4  total_locks: 20
       28,008 in       983: hg.ids.5 (univ_laog)
many 5 level 4  total_locks: 24
       41,832 in     1,433: hg.ids.5 (univ_laog)
many 6 level 4  total_locks: 28
       61,864 in     1,967: hg.ids.5 (univ_laog)
many 7 level 4  total_locks: 32
       84,520 in     2,585: hg.ids.5 (univ_laog)
many 8 level 4  total_locks: 36
      116,512 in     3,287: hg.ids.5 (univ_laog)
many 9 level 4  total_locks: 40
      149,936 in     4,073: hg.ids.5 (univ_laog)
many 10 level 4  total_locks: 44
      189,544 in     4,943: hg.ids.5 (univ_laog)
many 20 level 4  total_locks: 84
    1,056,992 in    18,263: hg.ids.5 (univ_laog)



Reproducible: Always

Steps to Reproduce:
run the attached program with different values showing the quadratic
aspect, e.g.
valgrind --tool=helgrind t2t 10 1
                         t2t 20 1
                         t2t 30 1


Actual Results:  
Many more entries and much more memory in univ_laog than expected.

Expected Results:  
Fewer entries and less memory (not quadratic)
Comment 1 Philippe Waroquiers 2011-05-15 18:12:29 UTC
Created attachment 60035 [details]
implement garbage collection of the laog data structure

With the attached patch, helgrind can successfully run a big
application at work that uses many locks.
Without the patch, memory very rapidly reaches many GBs and helgrind
crashes out of memory.

* hg_main.c : add static void univ_laog_do_GC ( void )
              add next_gc_univ_laog variable telling when to do the next GC
              laog__add_edge : check next_gc_univ_laog and call univ_laog_do_GC
              laog__del_edge : remove src,dst exposition if present
                               check next_gc_univ_laog and call univ_laog_do_GC
              UWordV_dup : new function duplicating a UWordV
              laog__handle_one_lock_deletion : use UWordV_dup to duplicate
                 succ_words and preds_words, to allow proper GC.
                 cleaned up laog links of the deleted lock

* hg_wordset.h: added HG_(dieWS) to indicate a ws can be cleaned up
* hg_wordset.c: implemented HG_(dieWS)
                added some debug tracing, all of it conditionally
                enabled via the HG_DEBUG symbol
* helgrind/tests/t2t.c & t2t_laog.vgtest: new test
Comment 2 Philippe Waroquiers 2011-07-07 01:03:02 UTC
Created attachment 61660 [details]
updated to rev 11860: implement garbage collection of the laog data structure

implement garbage collection of the laog data structure.


With the attached patch, helgrind can successfully run a big
application at work that uses many locks.
Without the patch, memory very rapidly reaches many GBs and helgrind
crashes out of memory.

* hg_main.c : add static void univ_laog_do_GC ( void )
              add next_gc_univ_laog variable telling when to do the next GC
              laog__add_edge : check next_gc_univ_laog and call univ_laog_do_GC
              laog__del_edge : remove src,dst exposition if present
                               check next_gc_univ_laog and call univ_laog_do_GC
              UWordV_dup : new function duplicating a UWordV
              laog__handle_one_lock_deletion : use UWordV_dup to duplicate
                 succ_words and preds_words, to allow proper GC.
                 cleaned up laog links of the deleted lock

* hg_wordset.h: added HG_(dieWS) to indicate a ws can be cleaned up
* hg_wordset.c: implemented HG_(dieWS)
                added some debug tracing, all of it conditionally
                enabled via the HG_DEBUG symbol
* helgrind/tests/t2t.c & t2t_laog.vgtest: new test
Comment 3 Julian Seward 2011-08-10 09:52:46 UTC
(In reply to comment #2)
> Created an attachment (id=61660) [details]
> (updated to rev 11860 implement garbage collection of the laog data structure

Nice patch, generally no problem.  One small comment:

  +static void univ_laog_do_GC ( void ) {
  ...
  +   while (VG_(nextIterFM)( laog, NULL, (UWord*)&links )) {
  +      tl_assert(links);
  +      univ_laog_seen[links->inns] = True;
  +      univ_laog_seen[links->outs] = True;
  +      links = NULL;
  +   }

  For the two writes to univ_laog_seen[], can we pls have an assertion
  that the index values are in range
  ( >= 0, <= HG_(cardinalityWSU)(univ_laog))

My only other comment is related to the potential dangling reference
problem created by HG_(dieWS), as you also noted.  It unfortunately
breaks the previous nice property that all WordSetIDs were valid
forever.

I see that you added some assertions against this where you could.
I can't think of any solution that doesn't involve a lot of extra
complexity (reference counted WordSetIDs?  sigh)

Maybe I shouldn't worry about this.  Or maybe there is some way to
delay longer, on average, before a dead WordSetID is re-used, so as
to increase the chance that one of your assertions detects it?
If not, let's just commit it and I'll stop worrying.
Comment 4 Philippe Waroquiers 2011-08-10 21:52:36 UTC
(In reply to comment #3)

>   For the two writes to univ_laog_seen[], can we pls have an assertion
>   that the index values are in range
>   ( >= 0, <= HG_(cardinalityWSU)(univ_laog))
Will do.

> 
> My only other comment is related to the potential dangling reference
> problem created by HG_(dieWS), as you also noted.  It unfortunately
> breaks the previous nice property that all WordSetIDs were valid
> forever.
> 
> I see that you added some assertions against this where you could.
> I can't think of any solution that doesn't involve a lot of extra
> complexity (reference counted WordSetIDs?  sigh)
Indeed, ref counting or something similar would have quite an impact
on the code and its complexity.

> 
> Maybe I shouldn't worry about this.  Or maybe there is some way to
> delay longer, on average, before a dead WordSetID is re-used, so as
> to increase the chance that one of your assertions detects it?
> If not, let's just commit it and I'll stop worrying.

Losing the 'valid forever' property while having little chance to
discover that an invalid (i.e. recycled) ws is used is indeed
annoying. We could first allocate new ws up to the next GC limit
before starting to recycle. But currently the increase of the next GC
limit (i.e. the number of newly allocatable ws) is just 1 more than
the previously reached max => this would bring almost no additional
protection.
Changing the logic for the next GC limit is easy, but very easily
re-introduces huge memory usage.

One possible solution is to use a part of the bit range of a ws
to maintain a generation number.
E.g. let's assume the upper 8 bits are reserved for the ws generation.

A wsu would then maintain, in parallel with ix2vec, an array of
generations (i.e. each ws has its own generation).
Each time a ws is marked as dead, its generation is incremented.
After 256 generations, we wrap back to generation 0.
When returning a wordset outside of hg_wordset.c, the upper bits
of the ws are set to its generation.

Before using a ws (I believe this happens only inside do_ix2vec),
the generation of the ws is compared with its expected generation
=> assert if the generations differ.
If the generations are equal, the indexing can be done after masking
off the generation bits.

I believe this would be relatively simple to implement.
In terms of memory, it is reasonable (an array of bytes with the
same number of elements as ix2vec).
In terms of cpu: since I believe the "cache" lookups do not need to
check the generation (they can/should cache the results with the
generations included), cpu usage should not be influenced much by the
generation check and masking (needed only in do_ix2vec, IIUC).
The level of protection against re-using an old (dangling but
recycled) ws depends on the number of bits used for the generation.

I see two possible problems:
1. the max number of wordsets in a wsu is reduced from 2^32 to
   2^(32-GENBITS).
   Taking too many GENBITS implies a low max number of wordsets.
   Taking too few  GENBITS implies a low level of protection.
   => It is not clear how many bits we should/could take.
2. hg_wordset.h indicates that a ws is an 'opaque, small int index'.
   If a client of hg_wordset.h has used the knowledge that this is a
   'small int' to index one of its own arrays, then that kind of usage
   is broken. I do not believe there is such nasty usage, but I have
   not scanned the whole helgrind code.
Comment 5 Julian Seward 2011-08-12 07:59:22 UTC
(In reply to comment #4)
> One possible solution is to use a part of the bit range of a ws
> to maintain a generation number.
> [...]
> I see two possible problems:
> [...]

Yes .. good idea .. we could do that, and it would give more
protection.  I don't think either of the possible problems would be a
problem in practice.

But .. too much effort and complexity for this one case, IMO.  Let's
do the patch as-is.  Can you prepare a final version?
Comment 6 Philippe Waroquiers 2011-08-13 07:03:26 UTC
Created attachment 62792 [details]
laog garbage collect: upgraded to 11969, added assert on index

(In reply to comment #5)
> But .. too much effort and complexity for this one case, IMO.  Let's
> do the patch as-is.  Can you prepare a final version?
Attached new version, upgraded to revision 11969.
Added the assertions:
+      tl_assert(links->inns >= 0 && links->inns < univ_laog_cardinality);
...
+      tl_assert(links->outs >= 0 && links->outs < univ_laog_cardinality);
Comment 7 Julian Seward 2011-10-22 19:35:11 UTC
(In reply to comment #6)
> Created an attachment (id=62792) [details]
> laog garbage collect: upgraded to 11969, added assert on index

Committed verbatim; r12201.  Thanks Philippe!