Summary: | Optimization of an SQL query | ||
---|---|---|---|
Product: | [Frameworks and Libraries] Akonadi | Reporter: | Marc Cousin <cousinmarc> |
Component: | server | Assignee: | kdepim bugs <kdepim-bugs> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | cousinmarc, dvratil, winter |
Priority: | NOR | ||
Version: | 1.13.0 | ||
Target Milestone: | --- | ||
Platform: | Arch Linux | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: |
Description
Marc Cousin
2015-08-19 10:07:59 UTC
I forgot to mention that this query alone used 1hour of cpu time this morning on my laptop… Hi, this is a query to update collection statistics (i.e. number of read/unread emails in a folder). In Akonadi 1.13.0 (and before) we were calculating this expensive statistics all the time. Not just whenever something changed, but we calculated it again for each client that asked for it, which was can be over 10 clients at once. I actually fixed this in the 1.13 branch after release but we never made 1.13.1 release with this fix. The fix is also included in current 15.08.0 release. We now have a centralized cache for the statistics, which is only updated when necessary and caches the results until next change, so we don't run the SQL query each time for every client. The query now looks slightly different, in addition to the flag count it also calculates total size (as in bytes) of all items in the collection. Originally it were two queries, I wrote this ironically because I found that the JOIN is slow and wanted to avoid doing to JOINs. Maybe with your solution we could afford running two queries to get flags count and size while still being faster than this. This is the current query we have: SELECT count(DISTINCT PimItemTable.id), sum(PimItemTable.size), sum(CASE WHEN PimItemFlagRelation.rightId = $1 OR PimItemFlagRelation.rightId = $2 THEN 1 ELSE 0 END) FROM PimItemTable LEFT JOIN PimItemFlagRelation ON PimItemTable.id = PimItemFlagRelation.leftId WHERE PimItemTable.collectionId = $3 Reconstructing queries from SQL builder is hard :-) This is the correct query: SELECT count(DISTINCT PimItemTable.id), sum(PimItemTable.size), sum(CASE WHEN PimItemFlagRelation.flag_id = $1 OR PimItemFlagRelation.flag_id = $2 THEN 1 ELSE 0 END) FROM PimItemTable LEFT JOIN PimItemFlagRelation ON PimItemTable.id = PimItemFlagRelation.pimitem_id WHERE PimItemTable.collectionId = $3 Ok that's great news :) I just got the 1.13 upgrade in Arch. I'll give it a look. The 1.13 still exhibits the problem. Anyway, if you need the query in the comment 3, yeah, caching it is the best thing to do. The problem is that we are joining «a lot» (thousands) of records. It consumes either CPU (if done through hash join or merge join) or IO (if the planner choses a nested loop). For people with SSDs, nested loop is the way to go. But as most PostgreSQL and MySQL instances are embedded with akonadi, they have the «basic» tuning in place, and don't know that :) The solutions that are usually applied when a query like this really has to be made fast no matter what are all through denormalization: * get rid of the PimItemFlagRelation table, and store the pimflags directly in PimItemTable (using an array, or a json for instance). I don't think mysql has arrays anyway. It often makes the code complicated... * simply maintain the totals in the database (with triggers). Same here, as far as I remember, MySQL's trigger aren't very advanced Usually, these are bad ideas. Denormalization is really a last-resort solution, as it makes code more complicated. akonadi 15.08 is in arch testing. I will give it a look too. 15.08 is much faster. I see 1s of culumated query time for what took like 100 before. Case closed as far as I'm concerned :) |