Bug 351475

Summary:	Optimization of an SQL query
Product:	[Frameworks and Libraries] Akonadi	Reporter:	Marc Cousin <cousinmarc>
Component:	server	Assignee:	kdepim bugs <pim-bugs-null>
Status:	RESOLVED FIXED
Severity:	normal	CC:	cousinmarc, dvratil, winter
Priority:	NOR
Version First Reported In:	1.13.0
Target Milestone:	---
Platform:	Arch Linux
OS:	Linux
Latest Commit:		Version Fixed/Implemented In:
Sentry Crash Report:

Description Marc Cousin 2015-08-19 10:07:59 UTC

I gave a look at the queries run by akonadi (on PostgreSQL, as it is the database with which I am most proficient), as I see a load that is quite heavy on my laptop (not that powerful, but very large number of emails).

I found out that 90-95% of the CPU time used on my database is running this query (thanks to pg_stat_statements):

SELECT count(DISTINCT PimItemTable.id) FROM PimItemTable INNER JOIN PimItemFlagRelation ON ( PimItemTable.id = PimItemFlagRelation.PimItem_id ) WHERE ( PimItemTable.collectionId = $1 AND ( PimItemFlagRelation.Flag_id = $2 OR PimItemFlagRelation.Flag_id = $3 ) );


Here is an example:

akonadi=# explain analyze SELECT count(DISTINCT PimItemTable.id) FROM PimItemTable INNER JOIN PimItemFlagRelation ON ( PimItemTable.id = PimItemFlagRelation.PimItem_id ) WHERE ( PimItemTable.collectionId = 54 AND ( PimItemFlagRelation.Flag_id = 1 OR PimItemFlagRelation.Flag_id = -1 ) )
;
                                                                          QUERY PLAN                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=11821.98..11821.99 rows=1 width=4) (actual time=92.960..92.960 rows=1 loops=1)
   ->  Hash Join  (cost=3862.72..11817.67 rows=1725 width=4) (actual time=73.820..92.745 rows=1719 loops=1)
         Hash Cond: (pimitemflagrelation.pimitem_id = pimitemtable.id)
         ->  Seq Scan on pimitemflagrelation  (cost=0.00..6717.29 rows=244082 width=8) (actual time=0.014..59.284 rows=243780 loops=1)
               Filter: ((flag_id = 1) OR (flag_id = (-1)))
               Rows Removed by Filter: 84707
         ->  Hash  (cost=3841.10..3841.10 rows=1730 width=4) (actual time=1.217..1.217 rows=1719 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 61kB
               ->  Bitmap Heap Scan on pimitemtable  (cost=69.83..3841.10 rows=1730 width=4) (actual time=0.254..0.813 rows=1719 loops=1)
                     Recheck Cond: (collectionid = 54)
                     Heap Blocks: exact=91
                     ->  Bitmap Index Scan on pimitemtable_collectionindex  (cost=0.00..69.39 rows=1730 width=0) (actual time=0.231..0.231 rows=1719 loops=1)
                           Index Cond: (collectionid = 54)
 Planning time: 0.758 ms
 Execution time: 93.024 ms


Notice that this takes 93ms. This query wants to count each PimItemTable record from the collection 54 (that's one of my IMAP folders) that has a flag to -1 or 1. As we are joining, the join may create several records, hence the DISTINCT.

I guess you already know that, sorry for the long introduction.

The thing is, what we want, is just to check if, for each record of  PimItemTable, there is a record in PimItemFlagRelation with a flag to -1 or 1 that matches. It's a semi-join, not a full-fledged join. It's less costly to perform, and doesn't create duplicates.

This can be written through an EXISTS

akonadi=# explain analyze SELECT count(PimItemTable.id) FROM PimItemTable where PimItemTable.collectionId = 54 and EXISTS (SELECT 1 FROM PimItemFlagRelation WHERE pimItemTable.id = PimItemFlagRelation.PimItem_id AND (PimItemFlagRelation.Flag_id = 1 OR PimItemFlagRelation.Flag_id = -1));
                                                                            QUERY PLAN                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=13935.92..13935.93 rows=1 width=4) (actual time=11.151..11.151 rows=1 loops=1)
   ->  Nested Loop Semi Join  (cost=70.25..13932.13 rows=1515 width=4) (actual time=0.406..10.689 rows=1719 loops=1)
         ->  Bitmap Heap Scan on pimitemtable  (cost=69.83..3841.10 rows=1730 width=4) (actual time=0.381..1.174 rows=1719 loops=1)
               Recheck Cond: (collectionid = 54)
               Heap Blocks: exact=91
               ->  Bitmap Index Scan on pimitemtable_collectionindex  (cost=0.00..69.39 rows=1730 width=0) (actual time=0.347..0.347 rows=1719 loops=1)
                     Index Cond: (collectionid = 54)
         ->  Index Only Scan using pimitemflagrelation_pkey on pimitemflagrelation  (cost=0.42..6.15 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=1719)
               Index Cond: (pimitem_id = pimitemtable.id)
               Filter: ((flag_id = 1) OR (flag_id = (-1)))
               Heap Fetches: 1719
 Planning time: 0.768 ms
 Execution time: 11.240 ms


Same result (1719). Much less work for the database (semi-join, no need for deduplication afterwards).

This query should work on the 3 databases engines, as EXISTS is supported everywhere.

Reproducible: Always

Steps to Reproduce:
1. Use akonadi
2. Profile the database

Comment 1 Marc Cousin 2015-08-19 14:36:07 UTC

I forgot to mention that this query alone used 1hour of cpu time this morning on my laptop…

Comment 2 Daniel Vrátil 2015-08-21 18:10:10 UTC

Hi,

this is a query to update collection statistics (i.e. number of read/unread emails in a folder). In Akonadi 1.13.0 (and before) we were calculating this expensive statistics all the time. Not just whenever something changed, but we calculated it again for each client that asked for it, which was can be over 10 clients at once.

I actually fixed this in the 1.13 branch after release but we never made 1.13.1 release with this fix. The fix is also included in current 15.08.0 release. We now have a centralized cache for the statistics, which is only updated when necessary and caches the results until next change, so we don't run the SQL query each time for every client.

The query now looks slightly different, in addition to the flag count it also calculates total size (as in bytes) of all items in the collection. Originally it were two queries, I wrote this ironically because I found that the JOIN is slow and wanted to avoid doing to JOINs. Maybe with your solution we could afford running two queries to get flags count and size while still being faster than this.

This is the current query we have:

SELECT count(DISTINCT PimItemTable.id), sum(PimItemTable.size), sum(CASE WHEN PimItemFlagRelation.rightId = $1 OR PimItemFlagRelation.rightId = $2 THEN 1 ELSE 0 END) FROM PimItemTable LEFT JOIN PimItemFlagRelation ON PimItemTable.id = PimItemFlagRelation.leftId
WHERE PimItemTable.collectionId = $3

Comment 3 Daniel Vrátil 2015-08-21 18:11:38 UTC

Reconstructing queries from SQL builder is hard :-) This is the correct query:

SELECT count(DISTINCT PimItemTable.id), sum(PimItemTable.size), sum(CASE WHEN PimItemFlagRelation.flag_id = $1 OR PimItemFlagRelation.flag_id = $2 THEN 1 ELSE 0 END) FROM PimItemTable LEFT JOIN PimItemFlagRelation ON PimItemTable.id = PimItemFlagRelation.pimitem_id WHERE PimItemTable.collectionId = $3

Comment 4 Marc Cousin 2015-08-22 04:48:23 UTC

Ok that's great news :)

I just got the 1.13 upgrade in Arch. I'll give it a look.

Comment 5 Marc Cousin 2015-08-22 05:20:42 UTC

The 1.13 still exhibits the problem.

Anyway, if you need the query in the comment 3, yeah, caching it is the best thing to do.

The problem is that we are joining «a lot» (thousands) of records. It consumes either CPU (if done through hash join or merge join) or IO (if the planner choses a nested loop). For people with SSDs, nested loop is the way to go. But as most PostgreSQL and MySQL instances are embedded with akonadi, they have the «basic» tuning in place, and don't know that :)

The solutions that are usually applied when a query like this really has to be made fast no matter what are all through denormalization:

* get rid of the PimItemFlagRelation table, and store the pimflags directly in PimItemTable (using an array, or a json for instance). I don't think mysql has arrays anyway. It often makes the code complicated...
* simply maintain the totals in the database (with triggers). Same here, as far as I remember, MySQL's trigger aren't very advanced

Usually, these are bad ideas. Denormalization is really a last-resort solution, as it makes code more complicated.

akonadi 15.08 is in arch testing. I will give it a look too.

Comment 6 Marc Cousin 2015-08-22 05:40:49 UTC

15.08 is much faster. I see 1s of culumated query time for what took like 100 before. Case closed as far as I'm concerned :)