Bug 399167

Summary: Akonadi enters deadlock state which makes it completely unuseable
Product: [Frameworks and Libraries] Akonadi Reporter: Andreas Schneider <asn>
Component: IMAP resourceAssignee: kdepim bugs <kdepim-bugs>
Status: RESOLVED FIXED    
Severity: normal CC: dschridde+kde, dvratil, faure, jens-bugs.kde.org, kde-bugs, kfunk, me, miso, vinz, vkrause
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In:
Attachments: Akonadi error log
full akonadi error log
mariadb status

Description Andreas Schneider 2018-09-28 07:01:56 UTC
SUMMARY

Every few weeks, akonadi runs into a deadlock state, the console output is:


org.kde.pim.akonadicore: Creating/updating items from the akonadi database failed: "Multiple merge candidates, aborting"
org.kde.pim.akonadicore: Error during ItemSync:  "Multiple merge candidates, aborting"


It doesn't tell me in which mail account nor folder this is happening. To get my mails back into a working state, I have to completely remove the database any synchronize all my mails.


Sometimes after some time I see:

org.kde.pim.akonadiserver: DATABASE ERROR:
org.kde.pim.akonadiserver: DATABASE ERROR:
org.kde.pim.akonadiserver:   Error code: "1205"
org.kde.pim.akonadiserver:   Error code: "1205"
org.kde.pim.akonadiserver:   DB error:  "Lock wait timeout exceeded; try restarting transaction"
org.kde.pim.akonadiserver:   DB error:  "Lock wait timeout exceeded; try restarting transaction"
org.kde.pim.akonadiserver:   Error text: "Lock wait timeout exceeded; try restarting transaction QMYSQL3: Unable to execute statement"
org.kde.pim.akonadiserver:   Error text: "Lock wait timeout exceeded; try restarting transaction QMYSQL3: Unable to execute statement"
org.kde.pim.akonadiserver:   Query: "UPDATE PimItemTable SET atime = ? WHERE ( ( PimItemTable.id = ? ) )"
org.kde.pim.akonadiserver:   Query: "UPDATE PimItemTable SET atime = ? WHERE ( ( PimItemTable.id = ? ) )"


There is no way to solve this without deep akonadi knowledge.


rpm -q akonadi-server
akonadi-server-18.08.1-1.1.x86_64
Comment 1 Andreas Schneider 2018-10-02 08:49:22 UTC
Please print the "folder name" in the error/debug message:

org.kde.pim.akonadicore: Creating/updating items from the akonadi database failed: "Multiple merge candidates, aborting"

and/or

org.kde.pim.akonadicore: Error during ItemSync:  "Multiple merge candidates, aborting"
Comment 2 Michal Hlavac 2018-10-11 09:42:14 UTC
Created attachment 115560 [details]
Akonadi error log

Same issue here. I get this on daily basis. It's realy unusable. Full akonadi log in attachment.
Comment 3 Michal Hlavac 2018-11-16 13:46:42 UTC
I fixed this by completely reinitialization of akonadi.

somethink like:
akonadictl stop
remove directories ~/.local/share/akonadi*
remove files and directories ~/.config/akonadi*
akonadictl start

and setup all IMAP accounts again.
Comment 4 Michal Hlavac 2018-11-19 15:10:47 UTC
It didn't help. After few days errors are back. I think it may have some connection with suspend to RAM.
Comment 5 Daniel Vrátil 2018-11-27 10:14:28 UTC
The Multiple Merge Candidates issue and the transaction deadlock are not exactly the same issue, but I believe they are somewhat connected.

I don't have a plan yet how to solve this (a workaround for the transaction deadlock is to use Postgres :)), if anyone has a simple set of steps how to reliably reproduce this issue, it would be a tremendous help.
Comment 6 Andreas Schneider 2018-11-27 11:11:28 UTC
I can more or less reproduce the multiple merge canditates.

I'm connected to an imap server over a slow VPN connection. I have "Download all messages for offline use" selected.

When I read the Inbox and delete a mail, I press 'Del' and the message is gone. Then a sync with the imap it triggered automatically. The sync makes the message appear again and them it is removed again.

If I read the next mail and want to delete it. I press 'Del' however the sync to imap was trigger and I hit 'Del' while the old message reappears I send a delete for that message twice! Now the there is a 'multiple merge canditates' issue for Trash.
Comment 7 Andreas Schneider 2018-11-27 11:13:38 UTC
I press 'Del' the message is remove from the message list. The message reappears  because of the imap sync in the message list. Now if you 'Del' the reappeared message again, you create a 'Multiple merge candidates' error for that message in Trash ...
Comment 8 Vinzenz Vietzke [TUXEDO] 2018-12-01 21:51:30 UTC
Created attachment 116616 [details]
full akonadi error log

(In reply to Michal Hlavac from comment #4)
> It didn't help. After few days errors are back. I think it may have some
> connection with suspend to RAM.

I can somehow confirm that as I suspend my machines to RAM as well. Kmail worked fine for a while then suddenly stopped. Restarting akonadi helps for manually fetching emails once. Full error log attached.
Comment 9 Kevin Funk 2019-01-08 10:31:00 UTC
Running into this issue here after completely cleaning ~/.local/share/akonadi*, trying to recover from a different issue... :(

Akonadi 5.8.3.
Comment 10 Volker Krause 2019-02-07 08:57:17 UTC
Hit this too. Unclear what caused it, but the workaround from #338658 seems to fix the damage:
DELETE pimitemtable FROM pimitemtable
LEFT JOIN parttable ON pimitemtable.id = parttable.pimItemId
WHERE parttable.id IS NULL

That would suggest that a part-less item can fairly reliably trigger the transaction deadlock.
Comment 11 Milian Wolff 2019-02-11 08:33:03 UTC
I had 1007 part-less items, removing them seems to have helped for now. Could we add that cleanup routine to `akonadictl fsck`? Also, any idea how could prevent this from happening again - I am willing to apply patches in the hunt for why this happens in the first place.
Comment 12 Milian Wolff 2019-02-11 08:40:52 UTC
ok, that cleanup didn't help for long... First I got spammed by hundreds of lines of:

  1281.968 critical: org.kde.pim.akonadiserver/Akonadi::Server::PartHelper::translateData[/home/milian/projects/kf5/src/kde/pim/akonadi/src/server/storage/parthelper.cpp:152]: Error:  "No such file or directory"
  1281.970 critical: org.kde.pim.akonadiserver/Akonadi::Server::PartHelper::translateData[/home/milian/projects/kf5/src/kde/pim/akonadi/src/server/storage/parthelper.cpp:151]: Payload file  "/home/milian/.local/share/akonadi/file_db_data/94/1670294_r1"  could not be open for reading!
  1281.970 critical: org.kde.pim.akonadiserver/Akonadi::Server::PartHelper::translateData[/home/milian/projects/kf5/src/kde/pim/akonadi/src/server/storage/parthelper.cpp:152]: Error:  "No such file or directory"
  1281.971 warning: org.kde.pim.akonadiprivate/Akonadi::ExternalPartStorage::replayTransaction[/home/milian/projects/kf5/src/kde/pim/akonadi/src/private/externalpartstorage.cpp:326]: Warning: failed to remove "/home/milian/.local/share/akonadi/file_db_data/93/1670293_r1" while committing a transaction
  1281.971 warning: org.kde.pim.akonadiprivate/Akonadi::ExternalPartStorage::replayTransaction[/home/milian/projects/kf5/src/kde/pim/akonadi/src/private/externalpartstorage.cpp:326]: Warning: failed to remove "/home/milian/.local/share/akonadi/file_db_data/94/1670294_r1" while committing a transaction
  1281.979 critical: org.kde.pim.akonadiserver/Akonadi::Server::PartHelper::translateData[/home/milian/projects/kf5/src/kde/pim/akonadi/src/server/storage/parthelper.cpp:151]: Payload file  "/home/milian/.local/share/akonadi/file_db_data/96/1670296_r1"  could not be open for reading!
  1281.979 critical: org.kde.pim.akonadiserver/Akonadi::Server::PartHelper::translateData[/home/milian/projects/kf5/src/kde/pim/akonadi/src/server/storage/parthelper.cpp:152]: Error:  "No such file or directory"
  1281.981 critical: org.kde.pim.akonadiserver/Akonadi::Server::PartHelper::translateData[/home/milian/projects/kf5/src/kde/pim/akonadi/src/server/storage/parthelper.cpp:151]: Payload file  "/home/milian/.local/share/akonadi/file_db_data/97/1670297_r1"  could not be open for reading!
  1281.981 critical: org.kde.pim.akonadiserver/Akonadi::Server::PartHelper::translateData[/home/milian/projects/kf5/src/kde/pim/akonadi/src/server/storage/parthelper.cpp:152]: Error:  "No such file or directory"
  1281.982 warning: org.kde.pim.akonadiprivate/Akonadi::ExternalPartStorage::replayTransaction[/home/milian/projects/kf5/src/kde/pim/akonadi/src/private/externalpartstorage.cpp:326]: Warning: failed to remove "/home/milian/.local/share/akonadi/file_db_data/96/1670296_r1" while committing a transaction
  1281.982 warning: org.kde.pim.akonadiprivate/Akonadi::ExternalPartStorage::replayTransaction[/home/milian/projects/kf5/src/kde/pim/akonadi/src/private/externalpartstorage.cpp:326]: Warning: failed to remove "/home/milian/.local/share/akonadi/file_db_data/97/1670297_r1" while committing a transaction

and finally I reached:

  1331.367 warning: org.kde.pim.akonadiserver/Akonadi::Server::QueryBuilder::exec[/home/milian/projects/kf5/src/kde/pim/akonadi/src/server/storage/querybuilder.cpp:435]: "Deadlock found when trying to get lock; try restarting transaction QMYSQL3: Unable to store statement results"
  1331.515 critical: org.kde.pim.akonadiserver/Akonadi::Server::DataStore::retryLastTransaction[/home/milian/projects/kf5/src/kde/pim/akonadi/src/server/storage/datastore.cpp:1426]: DATABASE ERROR when retrying transaction
  1331.515 critical: org.kde.pim.akonadiserver/Akonadi::Server::DataStore::retryLastTransaction[/home/milian/projects/kf5/src/kde/pim/akonadi/src/server/storage/datastore.cpp:1427]:   Error code: "1062"
  1331.515 critical: org.kde.pim.akonadiserver/Akonadi::Server::DataStore::retryLastTransaction[/home/milian/projects/kf5/src/kde/pim/akonadi/src/server/storage/datastore.cpp:1428]:   DB error:  "Duplicate entry '615495-5' for key 'PartTable_pimItemIdTypeIndex'"
  1331.515 critical: org.kde.pim.akonadiserver/Akonadi::Server::DataStore::retryLastTransaction[/home/milian/projects/kf5/src/kde/pim/akonadi/src/server/storage/datastore.cpp:1429]:   Error text: "Duplicate entry '615495-5' for key 'PartTable_pimItemIdTypeIndex' QMYSQL3: Unable to execute statement"
  1331.516 critical: org.kde.pim.akonadiserver/Akonadi::Server::DataStore::retryLastTransaction[/home/milian/projects/kf5/src/kde/pim/akonadi/src/server/storage/datastore.cpp:1430]:   Query: "INSERT INTO PartTable (pimItemId, partTypeId, data, datasize, version, storage) VALUES (?, ?, ?, ?, ?, ?)"
  1331.516 warning: org.kde.pim.akonadiserver/Akonadi::Server::Connection::handleIncomingData[/home/milian/projects/kf5/src/kde/pim/akonadi/src/server/connection.cpp:283]: Error while handling command CreateItem on connection akonadi_imap_resource_3 (0x557ffe691720)
  1205.189 warning: org.kde.pim.akonadicore/Akonadi::ItemSyncPrivate::slotLocalChangeDone[/home/milian/projects/kf5/src/kde/pim/akonadi/src/core/itemsync.cpp:443]: Creating/updating items from the akonadi database failed: "Failed to query database for item"
  1205.189 warning: org.kde.pim.akonadicore/Akonadi::ItemSync::slotResult[/home/milian/projects/kf5/src/kde/pim/akonadi/src/core/itemsync.cpp:499]: Error during ItemSync:  "Failed to query database for item"
  1205.280 warning: org.kde.pim.imapresource/RetrieveItemsTask::prepareRetrieval[/home/milian/projects/kf5/src/kde/pim/kdepim-runtime/resources/imap/retrieveitemstask.cpp:474]: Detected inconsistency in local cache, we're missing some messages. Server:  4014  Local:  3935
  1205.281 warning: org.kde.pim.imapresource/RetrieveItemsTask::prepareRetrieval[/home/milian/projects/kf5/src/kde/pim/kdepim-runtime/resources/imap/retrieveitemstask.cpp:475]: Refetching complete mailbox.

looking at the item table, I once again have 46 items without a part in them... So I seem to be able to trigger this quite reliably, I'm afraid :(
Comment 13 Michal Hlavac 2019-02-27 10:31:23 UTC
Created attachment 118398 [details]
mariadb status

I run `SHOW ENGINE INNODB STATUS` query in akonadiconsole.
It looks like 2 transactions with same update query lock each other.

UPDATE PimItemTable SET atime = ? WHERE ( ( PimItemTable.id = ? ) )

Output is in attachment.
Comment 14 Christoph Feck 2019-03-08 19:48:50 UTC
*** Bug 404538 has been marked as a duplicate of this bug. ***
Comment 15 Kevin Funk 2019-03-11 13:47:26 UTC
*** Bug 402526 has been marked as a duplicate of this bug. ***
Comment 16 David Faure 2019-03-11 14:01:22 UTC
I debugged this and came up with https://phabricator.kde.org/D19487

It breaks canceling (as a user, clicking on the progress item to cancel), but if you stay away from canceling, you can try that patch to make akonadi more useable. I'm working on fixing canceling....
Comment 17 Michal Hlavac 2019-03-14 17:56:37 UTC
Can confirm that temporary fix works. Didn't try to cancel fetch process.

Patched packages form openSUSE Tumbleweed:
https://build.opensuse.org/package/show/home:hlavki/akonadi-server
Comment 18 David Faure 2019-03-21 15:49:52 UTC
Git commit f1281cf18f40fd69acd61c31b48f5ce43e138eea by David Faure.
Committed on 21/03/2019 at 15:49.
Pushed by dfaure into branch 'master'.

Akonadi: fix dangling transaction after itemsync failure

Summary:
TransactionSequence was emitting result() twice when rolling back.

* How did this happen?
The TransactionRollbackJob is (automatically) added as a subjob of the
TransactionSequence, so when it finishes, slotResult is called (like for
all subjobs), as well as rollbackResult().
Since the latter emits result() already [mostly for symmetry with
commitResult()], we don't need to do that in slotResult (which doesn't do
it for the case of committing, either).

* Why is it a problem to emit result() twice?
Well, first, it's against the law in KJob world. In practice,
ItemSyncPrivate::slotTransactionResult was called twice (for the same
TransactionSequence job) which made it decrement mTransactionJobs one
time too many.
As a result, checkDone() finished too early and didn't go into the
"commit transaction" branch for other transactions.
Leaving a transaction "open" is a good recipe for database deadlocks further
down the line.

* Why did the TransactionSequence roll back in the first place?
In my case because of the infamous and not-yet fixed "Multiple merge
candidates" problem, but it seems that it can also happen when having
items without a part, according to Volker's investigations.
All of these issues still need to be fixed, but at least akonadi seems
to be still usable after they happen.

Test Plan: Ctrl+L in kmail, with a folder having multiple items for the same RID

Reviewers: dvratil, vkrause

Reviewed By: dvratil

Subscribers: kfunk, kde-pim

Tags: #kde_pim

Differential Revision: https://phabricator.kde.org/D19487

M  +0    -2    src/core/jobs/transactionsequence.cpp

https://commits.kde.org/akonadi/f1281cf18f40fd69acd61c31b48f5ce43e138eea
Comment 19 David Faure 2019-03-21 15:50:24 UTC
Git commit 8ff596c4fe15199b66262c624d8b7c8d8ec7368f by David Faure.
Committed on 21/03/2019 at 15:50.
Pushed by dfaure into branch 'Applications/19.04'.

Akonadi: fix dangling transaction after itemsync failure

Summary:
TransactionSequence was emitting result() twice when rolling back.

* How did this happen?
The TransactionRollbackJob is (automatically) added as a subjob of the
TransactionSequence, so when it finishes, slotResult is called (like for
all subjobs), as well as rollbackResult().
Since the latter emits result() already [mostly for symmetry with
commitResult()], we don't need to do that in slotResult (which doesn't do
it for the case of committing, either).

* Why is it a problem to emit result() twice?
Well, first, it's against the law in KJob world. In practice,
ItemSyncPrivate::slotTransactionResult was called twice (for the same
TransactionSequence job) which made it decrement mTransactionJobs one
time too many.
As a result, checkDone() finished too early and didn't go into the
"commit transaction" branch for other transactions.
Leaving a transaction "open" is a good recipe for database deadlocks further
down the line.

* Why did the TransactionSequence roll back in the first place?
In my case because of the infamous and not-yet fixed "Multiple merge
candidates" problem, but it seems that it can also happen when having
items without a part, according to Volker's investigations.
All of these issues still need to be fixed, but at least akonadi seems
to be still usable after they happen.

Test Plan: Ctrl+L in kmail, with a folder having multiple items for the same RID

Reviewers: dvratil, vkrause

Reviewed By: dvratil

Subscribers: kfunk, kde-pim

Tags: #kde_pim

Differential Revision: https://phabricator.kde.org/D19487

(cherry picked from commit f1281cf18f40fd69acd61c31b48f5ce43e138eea)

M  +0    -2    src/core/jobs/transactionsequence.cpp

https://commits.kde.org/akonadi/8ff596c4fe15199b66262c624d8b7c8d8ec7368f
Comment 20 David Faure 2019-03-21 22:17:20 UTC
Both fixes have been pushed in Applications/19.04 branch.

(The fix for canceling is https://phabricator.kde.org/D19488)
Comment 21 David Faure 2019-04-04 06:59:57 UTC
Found one more problem.... https://phabricator.kde.org/D20243
Comment 22 David Faure 2019-04-07 13:23:41 UTC
Additional fix pushed in https://cgit.kde.org/akonadi.git/commit/?id=257598195d09b1022ebf08cd58245cd3493f5a2e for 19.04

Note: all this is about akonadi not getting stuck when the "multiple merge candidates" problem happens. The problem itself isn't fixed yet, for lack of a reproducible testcase.
Comment 23 David Faure 2019-04-07 13:46:52 UTC
Oh, and about part-less items, I just noticed that I had a bunch in a spam folder, and after syncing that folder, the same query returns nothing. So it seems to me that a successful sync clears that up. In other words, it would be a consequence of a stuck sync -- not the cause for it.
Comment 24 David Faure 2019-08-16 11:04:42 UTC
Git commit 15c91a0ac93051465b37807efceb6e9fd36cb73b by David Faure.
Committed on 16/08/2019 at 09:27.
Pushed by dfaure into branch 'Applications/18.12'.

Akonadi: fix dangling transaction after itemsync failure

Summary:
TransactionSequence was emitting result() twice when rolling back.

* How did this happen?
The TransactionRollbackJob is (automatically) added as a subjob of the
TransactionSequence, so when it finishes, slotResult is called (like for
all subjobs), as well as rollbackResult().
Since the latter emits result() already [mostly for symmetry with
commitResult()], we don't need to do that in slotResult (which doesn't do
it for the case of committing, either).

* Why is it a problem to emit result() twice?
Well, first, it's against the law in KJob world. In practice,
ItemSyncPrivate::slotTransactionResult was called twice (for the same
TransactionSequence job) which made it decrement mTransactionJobs one
time too many.
As a result, checkDone() finished too early and didn't go into the
"commit transaction" branch for other transactions.
Leaving a transaction "open" is a good recipe for database deadlocks further
down the line.

* Why did the TransactionSequence roll back in the first place?
In my case because of the infamous and not-yet fixed "Multiple merge
candidates" problem, but it seems that it can also happen when having
items without a part, according to Volker's investigations.
All of these issues still need to be fixed, but at least akonadi seems
to be still usable after they happen.

Test Plan: Ctrl+L in kmail, with a folder having multiple items for the same RID

Reviewers: dvratil, vkrause

Reviewed By: dvratil

Subscribers: kfunk, kde-pim

Tags: #kde_pim

Differential Revision: https://phabricator.kde.org/D19487

(cherry picked from commit f1281cf18f40fd69acd61c31b48f5ce43e138eea)
(cherry picked from commit 8ff596c4fe15199b66262c624d8b7c8d8ec7368f)

M  +0    -2    src/core/jobs/transactionsequence.cpp

https://commits.kde.org/akonadi/15c91a0ac93051465b37807efceb6e9fd36cb73b
Comment 25 Sandro Knauß 2019-08-28 16:36:36 UTC
Git commit ca67354dcc5b4640f26de0b3e46c79cf1e50bc32 by Sandro Knauß, on behalf of David Faure.
Committed on 28/08/2019 at 16:36.
Pushed by knauss into branch 'Applications/18.08'.

Akonadi: fix dangling transaction after itemsync failure

Summary:
TransactionSequence was emitting result() twice when rolling back.

* How did this happen?
The TransactionRollbackJob is (automatically) added as a subjob of the
TransactionSequence, so when it finishes, slotResult is called (like for
all subjobs), as well as rollbackResult().
Since the latter emits result() already [mostly for symmetry with
commitResult()], we don't need to do that in slotResult (which doesn't do
it for the case of committing, either).

* Why is it a problem to emit result() twice?
Well, first, it's against the law in KJob world. In practice,
ItemSyncPrivate::slotTransactionResult was called twice (for the same
TransactionSequence job) which made it decrement mTransactionJobs one
time too many.
As a result, checkDone() finished too early and didn't go into the
"commit transaction" branch for other transactions.
Leaving a transaction "open" is a good recipe for database deadlocks further
down the line.

* Why did the TransactionSequence roll back in the first place?
In my case because of the infamous and not-yet fixed "Multiple merge
candidates" problem, but it seems that it can also happen when having
items without a part, according to Volker's investigations.
All of these issues still need to be fixed, but at least akonadi seems
to be still usable after they happen.

Test Plan: Ctrl+L in kmail, with a folder having multiple items for the same RID

Reviewers: dvratil, vkrause

Reviewed By: dvratil

Subscribers: kfunk, kde-pim

Tags: #kde_pim

Differential Revision: https://phabricator.kde.org/D19487

(cherry picked from commit f1281cf18f40fd69acd61c31b48f5ce43e138eea)
(cherry picked from commit 8ff596c4fe15199b66262c624d8b7c8d8ec7368f)
(cherry picked from commit 15c91a0ac93051465b37807efceb6e9fd36cb73b)

M  +0    -2    src/core/jobs/transactionsequence.cpp

https://commits.kde.org/akonadi/ca67354dcc5b4640f26de0b3e46c79cf1e50bc32