336582 – Sudden write interruption during akonadictl vacuum causes database corruption

Status:	RESOLVED NOT A BUG

Alias:	None

Product:	Akonadi
Classification:	Frameworks and Libraries
Component:	general
Version:	GIT (master)
Platform:	Compiled Sources Linux

Importance:	NOR critical
Target Milestone:	---
Assignee:	kdepim bugs

Reported:	2014-06-22 13:22 UTC by Martin Steigerwald
Modified:	2014-06-22 13:41 UTC
CC List:	1 user

Description Martin Steigerwald 2014-06-22 13:22:17 UTC

Akonadi got slower and slower again despite my changes to stop the sorting of filenames in Maildir resources, so I thought I would try akonadictl vacuum. I had a kernel compile running at the time and, as sometimes happens during heavy I/O, BTRFS seemed to lock up. So I rebooted.

On reboot the Akonadi database was corrupted. parttable.ibd and another file were zero bytes long.

Reproducible: Didn't try

Steps to Reproduce:
1. Run akonadictl vacuum
2. Switch off the machine
3. Restart
Actual Results:
Database corrupted. Some files are zero bytes long.

Expected Results:  
Database is intact.

Why do I think this is a bug?

Even with delayed allocation, if an application does proper renaming and fsync(), the file is either renamed and fully written or not touched at all. I consider a cache for PIM data important, and thus it has to use fsync(). This may well be something in MySQL at play here.
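
To make the "proper renaming and fsync()" point concrete, here is a minimal sketch of that pattern in Python, assuming a POSIX system. The function name atomic_write is made up for illustration. This is not how InnoDB manages its .ibd files; it only shows what an application writing whole files can do so that a crash leaves either the old or the new content, never a zero-byte file.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes) via temp file + fsync + rename.

    Hypothetical helper for illustration; assumes a POSIX filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem),
    # otherwise the rename below is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Ask the kernel to push the file contents to stable storage
            # before the new name becomes visible.
            os.fsync(tmp.fileno())
        # Atomically swap the new file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling atomic_write("/some/dir/state.bin", b"...") twice leaves only the final file in the directory. Whether the data really is on the platter after fsync() still depends on the filesystem and the drive honouring the flush.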

Together with

Bug 336581 New: accidental database loss causes Akonadi / KMail breaks correct folder assignments

it's even more important to never, ever lose the database.

Potential culprits in default mysql.conf for Akonadi:

# Write out the log buffer to the log file at each commit (default:1)
innodb_flush_log_at_trx_commit=2

And, if the default is 0:

#sync_bin_log=0
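
A quick way to spot such settings is to scan the config for values that differ from the ones the MySQL documentation gives as safest. The following is only a sketch: the function name relaxed_settings, the [mysqld] section header and the sample text are assumptions for illustration, and the sketch uses the option name sync_binlog from the MySQL documentation, so the commented-out line above is simply ignored.

```python
import configparser

# Values the MySQL documentation describes as the safest ones.
DURABLE = {"innodb_flush_log_at_trx_commit": "1", "sync_binlog": "1"}


def relaxed_settings(conf_text):
    """Return the durability options in conf_text that are set to a
    value other than the safest one (hypothetical helper)."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf-style files may contain bare flags
        strict=False,          # tolerate repeated sections or options
        interpolation=None,    # do not treat '%' specially
    )
    parser.read_string(conf_text)
    found = {}
    for section in parser.sections():
        for key, want in DURABLE.items():
            value = parser.get(section, key, fallback=None)
            if value is not None and value.strip() != want:
                found[key] = value.strip()
    return found


# Sample text assumed for illustration, modelled on the lines quoted above.
sample = """
[mysqld]
# Write out the log buffer to the log file at each commit (default:1)
innodb_flush_log_at_trx_commit=2
#sync_bin_log=0
"""

print(relaxed_settings(sample))
```

Options that are commented out or absent are not reported, since the sketch does not know the server's built-in defaults.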

It would be nice to have separate Bugzilla entries for the different DB backends.

Comment 1 Daniel Vrátil 2014-06-22 13:41:19 UTC

There's very little we can do in case of a power loss or kernel panic. Also, this is obviously not an Akonadi problem but a MySQL and Btrfs one, as you explicitly mentioned database corruption while running Btrfs.

The reality is that even when the kernel requests an fsync, many modern HDDs (especially in laptops) will report the data as written while in fact keeping it in internal memory and scheduling it for writeback later, potentially leading to data corruption on power loss.

Per documentation of innodb_flush_log_at_trx_commit:
....
A value of 1 is required for ACID compliance. You can achieve better performance by setting the value different from 1, but then you can lose at most one second worth of transactions in a crash. With a value of 0, any mysqld process crash can erase the last second of transactions. With a value of 2, then only an operating system crash or a power outage can erase the last second of transactions. However, InnoDB's crash recovery is not affected and thus crash recovery does work regardless of the value. 

Losing "up to the last second of transactions" can only result in the Akonadi cache being inconsistent with the server data, which should be fixed automatically during the next sync. This does not lead to database (table datafile) corruption.

The same applies to sync_binlog: it does not affect the table datafiles, only the journals.