Created attachment 57758 [details]
test program reproducing quadratic behaviour

Version: 3.7 SVN
OS: Linux

In the program I will attach, there is only one single sequence in which locks are taken, but we still obtain a huge number of locksets in univ_laog. There is some quadratic aspect in what should be a "simple/easy" case: for a single "sequence" of locks, I was expecting a number of locksets that would be O(nr_of_locks), but we obtain O(nr_of_locks * nr_of_locks). I have not understood where this quadratic aspect is coming from.

Regarding the deletion of locks: for a given total number of locks, the number of entries in univ_laog changes if the lock deletion is done in a different order. There is an additional quadratic aspect in laog__handle_one_lock_deletion(). However, I am quite sure this is *not* the problem in the real application, as I have commented out the laog__handle_one_lock_deletion code and that does not solve the problem of the real application (which crashes once the laog data structure reaches almost 4GB).

With the above program, if you use a big number (e.g. many = 10000), then at least my (powerful) workstation was almost dead :).

Regarding the real application: I believe a library (we do not have the sources) maintains an in-memory database of records, each protected by a lock. A "table scan locking all records" behaves like the above. Note that I am not completely sure to have reproduced the exact behaviour of the (real) application with this program.
for level in 1 2 3 4
do
  for many in 1 2 3 4 5 6 7 8 9 10 20
  do
    valgrind --tool=helgrind --profile-heap=yes \
      /cm/local_build/weekly/valgrind/t2t $many $level 2>&1 \
      | grep -e univ_laog -e many
  done
done

level many total_locks  hg.ids.5 (univ_laog)
  1     1       1             1,992 in 5
  1     2       2             2,096 in 11
  1     3       3             2,256 in 20
  1     4       4             2,504 in 32
  1     5       5             2,768 in 47
  1     6       6             3,168 in 65
  1     7       7             3,568 in 86
  1     8       8             4,176 in 110
  1     9       9             4,728 in 137
  1    10      10             5,368 in 167
  1    20      20            17,896 in 632
  2     1       4             2,552 in 35
  2     2       6             3,440 in 80
  2     3       8             4,824 in 143
  2     4      10             6,856 in 224
  2     5      12             9,136 in 323
  2     6      14            12,560 in 440
  2     7      16            16,128 in 575
  2     8      18            20,400 in 728
  2     9      20            26,456 in 899
  2    10      22            32,304 in 1,088
  2    20      42           154,672 in 3,968
  3     1       6             3,392 in 77
  3     2       9             5,696 in 185
  3     3      12             9,352 in 338
  3     4      15            14,792 in 536
  3     5      18            22,384 in 779
  3     6      21            30,848 in 1,067
  3     7      24            41,360 in 1,400
  3     8      27            56,296 in 1,778
  3     9      30            71,528 in 2,201
  3    10      33            89,496 in 2,669
  3    20      63           473,896 in 9,824
  4     1       8             4,672 in 137
  4     2      12             9,272 in 335
  4     3      16            16,784 in 617
  4     4      20            28,008 in 983
  4     5      24            41,832 in 1,433
  4     6      28            61,864 in 1,967
  4     7      32            84,520 in 2,585
  4     8      36           116,512 in 3,287
  4     9      40           149,936 in 4,073
  4    10      44           189,544 in 4,943
  4    20      84         1,056,992 in 18,263

Reproducible: Always

Steps to Reproduce:
Run the attached program with different values showing the quadratic aspect, e.g.
  valgrind --tool=helgrind t2t 10 1
  valgrind --tool=helgrind t2t 20 1
  valgrind --tool=helgrind t2t 30 1

Actual Results:
Many more entries and much more memory in univ_laog than expected.

Expected Results:
Fewer entries / less memory (not quadratic).
Created attachment 60035 [details]
implement garbage collection of the laog data structure

With the attached patch, at work, helgrind can successfully run a big application using many locks. Without the patch, memory reaches many GBs very rapidly, and then helgrind crashes out of memory.

* hg_main.c:
  - add static void univ_laog_do_GC ( void )
  - add a next_gc_univ_laog variable telling when to do the next GC
  - laog__add_edge: check next_gc_univ_laog and call univ_laog_do_GC
  - laog__del_edge: remove the src,dst exposition if present; check next_gc_univ_laog and call univ_laog_do_GC
  - UWordV_dup: new function duplicating a UWordV
  - laog__handle_one_lock_deletion: use UWordV_dup to duplicate succ_words and preds_words, to allow proper GC; clean up the laog links of the deleted lock
* hg_wordset.h: added HG_(dieWS) to indicate a ws can be cleaned up
* hg_wordset.c: implemented HG_(dieWS); added some debug trace, all conditionally compiled under the symbol HG_DEBUG
* helgrind/tests/t2t.c & t2t_laog.vgtest: new test
Created attachment 61660 [details]
(updated to rev 11860) implement garbage collection of the laog data structure

With the attached patch, at work, helgrind can successfully run a big application using many locks. Without the patch, memory reaches many GBs very rapidly, and then helgrind crashes out of memory.

* hg_main.c:
  - add static void univ_laog_do_GC ( void )
  - add a next_gc_univ_laog variable telling when to do the next GC
  - laog__add_edge: check next_gc_univ_laog and call univ_laog_do_GC
  - laog__del_edge: remove the src,dst exposition if present; check next_gc_univ_laog and call univ_laog_do_GC
  - UWordV_dup: new function duplicating a UWordV
  - laog__handle_one_lock_deletion: use UWordV_dup to duplicate succ_words and preds_words, to allow proper GC; clean up the laog links of the deleted lock
* hg_wordset.h: added HG_(dieWS) to indicate a ws can be cleaned up
* hg_wordset.c: implemented HG_(dieWS); added some debug trace, all conditionally compiled under the symbol HG_DEBUG
* helgrind/tests/t2t.c & t2t_laog.vgtest: new test
(In reply to comment #2)
> Created an attachment (id=61660) [details]
> (updated to rev 11860) implement garbage collection of the laog data structure

Nice patch, generally no problem. One small comment:

+static void univ_laog_do_GC ( void ) {
...
+   while (VG_(nextIterFM)( laog, NULL, (UWord*)&links )) {
+      tl_assert(links);
+      univ_laog_seen[links->inns] = True;
+      univ_laog_seen[links->outs] = True;
+      links = NULL;
+   }

For the two writes to univ_laog_seen[], can we pls have an assertion that the index values are in range ( >= 0, <= HG_(cardinalityWSU)(univ_laog) )?

My only other comment is related to the potential dangling-reference problem created by HG_(dieWS), as you also noted. It unfortunately breaks the previous nice property that all WordSetIDs were valid forever.

I see that you added some assertions against this where you could. I can't think of any solution that doesn't involve a lot of extra complexity (reference-counted WordSetIDs? sigh). Maybe I shouldn't worry about this. Or maybe there is some way to delay longer, on average, before a dead WordSetID is re-used, so as to increase the chance that one of your assertions detects it? If not, let's just commit it and I'll stop worrying.
(In reply to comment #3)
> For the two writes to univ_laog_seen[], can we pls have an assertion
> that the index values are in range
> ( >= 0, <= HG_(cardinalityWSU)(univ_laog))

Will do.

> My only other comment is related to the potential dangling reference
> problem created by HG_(dieWS), as you also noted. It unfortunately
> breaks the previous nice property that all WordSetIDs were valid
> forever.
>
> I see that you added some assertions against this where you could.
> I can't think of any solution that doesn't involve a lot of extra
> complexity (reference counted WordSetIDs? sigh)

Effectively, ref counting or something similar would have quite an impact on the code and its complexity.

> Maybe I shouldn't worry about this. Or maybe there is some way to
> delay longer, on average, before a dead WordSetID is re-used, so as
> to increase the chance that one of your assertions detects it?
> If not, let's just commit it and I'll stop worrying.

Losing the 'valid forever' property, while having little chance to discover that an invalid (i.e. recycled) ws is being used, is effectively annoying. We could first allocate new ws up to the next GC limit before starting to recycle. But currently the increase of the next GC limit (i.e. the number of newly allocatable ws) is just 1 more than the previously reached max, so this would bring almost no additional protection. Changing the logic that sets the next GC limit is easy, but would very easily re-introduce huge memory usage.

One possible solution is to use part of the bit range of a ws to maintain a generation number. E.g. let's assume the upper 8 bits are reserved for the ws generation. A wsu would then maintain, in parallel with ix2vec, an array of generations (i.e. each ws has its generation). Each time a ws is marked as dead, its generation is incremented; after 256 generations, we would wrap back to generation 0. When returning a wordset outside of hg_wordset.c, the upper bits of the ws would be set to its generation.
Before using a ws (I believe this would only be inside do_ix2vec), the generation of the ws is compared with its expected generation, and we assert if the generations differ. If the generations are equal, the indexing can be done after masking off the generation bits.

I believe this would be relatively simple to implement. In terms of memory, it is reasonable (an array of bytes, with the same number of elements as ix2vec). In terms of cpu: as I believe the "cache" lookups do not need to check the generation (they can/should cache the results together with the generations), the cpu cost of the generation check and masking should be small (it should only be in do_ix2vec, IIUC). The level of protection against re-using an old (dangling but recycled) ws depends on the number of bits used for the generation.

I see two possible problems:
1. The max number of wordsets in a wsu will be reduced from 2^32 to 2^(32-GENBITS). Taking too many GENBITS implies a low max on the number of wordsets; taking too few GENBITS implies a low level of protection. It is not clear how many bits we should/could take.
2. hg_wordset.h indicates that a ws is an 'opaque, small int index'. If a client of hg_wordset.h has used the knowledge that this is a 'small int' to index another of its own arrays, then that kind of usage is broken. I do not believe there is such a nasty usage, but I have not scanned the whole helgrind code.
(In reply to comment #4)
> One possible solution is to use a part of the bit range of a ws
> to maintain a generation number.
> [...]
> I see two possible problems:
> [...]

Yes .. good idea .. we could do that, and it would give more protection. I don't think either of the possible problems would be a problem in practice. But .. too much effort and complexity for this one case, IMO. Let's do the patch as-is. Can you prepare a final version?
Created attachment 62792 [details]
laog garbage collect: upgraded to 11969, added assert on index

(In reply to comment #5)
> But .. too much effort and complexity for this one case, IMO. Let's
> do the patch as-is. Can you prepare a final version?

Attached new version, upgraded to revision 11969. Added the assertions:

+   tl_assert(links->inns >= 0 && links->inns < univ_laog_cardinality);
...
+   tl_assert(links->outs >= 0 && links->outs < univ_laog_cardinality);
(In reply to comment #6)
> Created an attachment (id=62792) [details]
> laog garbage collect: upgraded to 11969, added assert on index

Committed verbatim; r12201. Thanks Philippe!