Bug 322946

Summary: MYSQL : picture collection and database on a server can handle only one user
Product: [Applications] digikam Reporter: Joerg Beschorner <joerg.beschorner>
Component: Database-MultiusersAssignee: Digikam Developers <digikam-bugs-null>
Status: REPORTED ---    
Severity: wishlist CC: caulier.gilles, dev, metzpinguin, richm+kde, swatilodha27
Priority: NOR    
Version: 3.2.0   
Target Milestone: ---   
Platform: openSUSE   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: attachment-19626-0.html

Description Joerg Beschorner 2013-07-29 13:01:53 UTC
I collect the pictures on a NFS server and installed a MySQL database.
The connection works and after some time of scanning the collection I can access the pictures.
Then I install a second PC and connect in the same way. After scanning I can also access the pictures.
But when I later try to access with the PC No. one it doesn't show the picture collection, even though PC No. two is already switched off.
At the terminal I can see the following:

QSqlDatabasePrivate::removeDatabase: connection 'ConnectionTest' is still in use, all queries will cease to work.
"/org/freedesktop/UDisks2/drives/ST3500418AS_9VM6A3FP" : property "DeviceNumber" does not exist 
"/org/freedesktop/UDisks2/drives/ST3500418AS_9VM6A3FP" : property "Device" does not exist 
"/org/freedesktop/UDisks2/drives/ST3500418AS_9VM6GVEF" : property "DeviceNumber" does not exist 
"/org/freedesktop/UDisks2/drives/ST3500418AS_9VM6GVEF" : property "Device" does not exist 
libv4l2: error getting pixformat: Unpassender IOCTL (I/O-Control) für das Gerät
libv4l2: error getting pixformat: Das Argument ist ungültig

I have to delete the album path and reenter again. Than the scanning starts again.
After this I cant access with PC two.
And so on.

Reproducible: Always

Steps to Reproduce:
1.Connect digikam on PC 1 with  to the server and MySQL
2.Connect digikam on PC 2 with  to the server and MySQL
3.PC 1 lost connection
Actual Results:  
Only one PC is able to access the pictures at the server

Expected Results:  
More then one PC should be able to access the same picture collection (at the same time maybe, but one after the other in every case)

I tried this with the same SQL user at both PCs and with different users. Always the same result.
Comment 1 Joerg Beschorner 2013-11-10 18:42:56 UTC
Hi there, is digikam still alive?

Joerg
Comment 2 swatilodha27 2016-07-06 20:18:02 UTC
Richard,

Could you please tell if it would be helpful to use "MAX_USER_CONNECTIONS" set to 1, while creating users? This would limit the simultaneous connections by an account. And, it would ensure that only "different" accounts access the DB at a time?

Would it be helpful in achieving concurrency? 

Thanks
Comment 3 Richard Mortimer 2016-07-06 23:06:05 UTC
Swati,

That will make no difference at all. Different accounts will still be able to access the same database and it will create a lot of difficult to debug failure modes.
Comment 4 caulier.gilles 2016-07-07 04:01:54 UTC
Richard,

Seriously, Mysql do not have a mechanism to lock database to prevent concurrency access  to the data on the same time ? How it's possible (:=))). Or i miss understand something in database server...

After all, Mysql is not Sqlite. A real server through the network must be able to wrap this king of situation, else it's not a shared data server. Or some logic must be defined in schema to prevent corrupted data in database. I'm lost....

Gilles Caulier
Comment 5 Richard Mortimer 2016-07-07 09:00:27 UTC
Before I answer the direct question let me make it clear that a "user" in this context is actually a "connection" to the shared MySQL database from a running digikam instance. It does not matter whether that connection uses the same login details as another connection or not. It is still operating on the same data.

MySQL has many methods to handle concurrency but locking the database is generally considered a last case scenario and even then it should only be used for short periods of time (sub-second) because any longer and other connections trying to access/change the database would just wait (hang) until the lock was released.

It is difficult to provide a general solution description but there are a number of tools that can be used to build up a solutions:

- first off the database has to be considered the master copy of the data. This is especially important if/when multiple instances of digikam are accessing the database simultaneously. At that point the internal digikam data structures have to be considered a cache (local copy) of the data that may be out of date. Really digikam has to be prepared to accept/work around SQL commands that fail because someone else has changed the shared database state.

- this also means that the referential integrity constraints/checks are vital to the correct operation of the database because they help to ensure that no-one add duplicates or add references to tags/images that may have been added/deleted by someone else.

- to my mind this is where the most important preparations for a multi-digikam shared operation mode come in. The database access routines in digikam should (and many are) be arranged as logical operations, e.g. add a tag and give me the database id of the tag. Those operations have to be well tested and able to work around changes made by other users. For instance the add a tag operation SQL might fail because another digikam has already added that tag. In that instance it should find the requested tag and load the data/return the tag id from the other record.

- related to that digikam would need the ability to "rescan" the database in a similar manner to the way that it can rescan a filesystem. This allows it to easily pick up changes made by other users and integrate them into the Tags/Albums trees. I suspect that much of the images operation is already working in that way because digikam generally queries the database to find images etc.

- the next thing related to concurrency is to consider the use of transactions to ensure that a group of operations either completes or does not. This may be of use when adding something like an image and there are a number of different tables that need updating with the properties of that image. This may not be strictly necessary because it might be enough that each table entry is considered on its own merits. But without it some image metadata may not get saved to the database if digikam crashes halfway through an operation.

There is nothing stopping digikam allowing multiple users to have concurrent access to the database. I've used it myself but at the moment care has to be taken by the user to make it a mostly read-only sharing otherwise strange behaviour may occur. But there is a lot of preparation that could be made to prepare for more multi-user operation. Possibly the best way to do this is to write unit tests of the individual operations on something like the tags tree and within those tests explicitly inject changes as though they were from another user to make sure that the code handles them correctly.
Comment 6 swatilodha27 2016-07-07 10:42:54 UTC
(In reply to Richard Mortimer from comment #5)
> Before I answer the direct question let me make it clear that a "user" in
> this context is actually a "connection" to the shared MySQL database from a
> running digikam instance. It does not matter whether that connection uses
> the same login details as another connection or not. It is still operating
> on the same data.
> 
> MySQL has many methods to handle concurrency but locking the database is
> generally considered a last case scenario and even then it should only be
> used for short periods of time (sub-second) because any longer and other
> connections trying to access/change the database would just wait (hang)
> until the lock was released.
> 
> It is difficult to provide a general solution description but there are a
> number of tools that can be used to build up a solutions:
> 
> - first off the database has to be considered the master copy of the data.
> This is especially important if/when multiple instances of digikam are
> accessing the database simultaneously. At that point the internal digikam
> data structures have to be considered a cache (local copy) of the data that
> may be out of date. Really digikam has to be prepared to accept/work around
> SQL commands that fail because someone else has changed the shared database
> state.
> 
> - this also means that the referential integrity constraints/checks are
> vital to the correct operation of the database because they help to ensure
> that no-one add duplicates or add references to tags/images that may have
> been added/deleted by someone else.

After your yesterday's patch on this https://bugs.kde.org/show_bug.cgi?id=355831, I think referential integrity is mostly done, so duplicates could now be avoided.

> 
> - to my mind this is where the most important preparations for a
> multi-digikam shared operation mode come in. The database access routines in
> digikam should (and many are) be arranged as logical operations, e.g. add a
> tag and give me the database id of the tag. Those operations have to be well
> tested and able to work around changes made by other users. For instance the
> add a tag operation SQL might fail because another digikam has already added
> that tag. In that instance it should find the requested tag and load the
> data/return the tag id from the other record.

For 1) and 3) point you wrote:
In order to ensure that SQL query made by a user doesn't fail (if another user has just made a change in the DB), I think we could use INSERT IGNORE INTO or REPLACE statements, instead of simply INSERT query?
If there's no duplicate, INSERT IGNORE INTO will INSERT as general, else IGNORE will discard silently without an error. Thus, preventing duplicate records.
Or REPLACE could be better, in order to get the latest change in the DB.

> - related to that digikam would need the ability to "rescan" the database in
> a similar manner to the way that it can rescan a filesystem. This allows it
> to easily pick up changes made by other users and integrate them into the
> Tags/Albums trees. I suspect that much of the images operation is already
> working in that way because digikam generally queries the database to find
> images etc.

How to go with the "re-scanning" of DB, so it's updated for all users, if changes are made by another user?
 
> - the next thing related to concurrency is to consider the use of
> transactions to ensure that a group of operations either completes or does
> not. This may be of use when adding something like an image and there are a
> number of different tables that need updating with the properties of that
> image. This may not be strictly necessary because it might be enough that
> each table entry is considered on its own merits. But without it some image
> metadata may not get saved to the database if digikam crashes halfway
> through an operation.

Using TRANSACTIONS for all SQL queries? Or with add/delete/update a Tag/image/album would suffice? 

> There is nothing stopping digikam allowing multiple users to have concurrent
> access to the database. I've used it myself but at the moment care has to be
> taken by the user to make it a mostly read-only sharing otherwise strange
> behaviour may occur. But there is a lot of preparation that could be made to
> prepare for more multi-user operation. Possibly the best way to do this is
> to write unit tests of the individual operations on something like the tags
> tree and within those tests explicitly inject changes as though they were
> from another user to make sure that the code handles them correctly.

If I'm right, making DB "read only" for users could definitely prevent lot of confusion that could occur in DB.
Comment 7 Richard Mortimer 2016-07-07 12:40:30 UTC
(In reply to swatilodha27 from comment #6)
> (In reply to Richard Mortimer from comment #5)
... snip ...
> > 
> > - this also means that the referential integrity constraints/checks are
> > vital to the correct operation of the database because they help to ensure
> > that no-one add duplicates or add references to tags/images that may have
> > been added/deleted by someone else.
> 
> After your yesterday's patch on this
> https://bugs.kde.org/show_bug.cgi?id=355831, I think referential integrity
> is mostly done, so duplicates could now be avoided.
yes that is correct. However the consequence is that any attempt to add a duplicate will fail when executing the SQL. Hence we should be vigilant for this and properly investigate any failures because they may point to a bug somewhere in the code.

> 
> > 
> > - to my mind this is where the most important preparations for a
> > multi-digikam shared operation mode come in. The database access routines in
> > digikam should (and many are) be arranged as logical operations, e.g. add a
> > tag and give me the database id of the tag. Those operations have to be well
> > tested and able to work around changes made by other users. For instance the
> > add a tag operation SQL might fail because another digikam has already added
> > that tag. In that instance it should find the requested tag and load the
> > data/return the tag id from the other record.
> 
> For 1) and 3) point you wrote:
> In order to ensure that SQL query made by a user doesn't fail (if another
> user has just made a change in the DB), I think we could use INSERT IGNORE
> INTO or REPLACE statements, instead of simply INSERT query?
> If there's no duplicate, INSERT IGNORE INTO will INSERT as general, else
> IGNORE will discard silently without an error. Thus, preventing duplicate
> records.
> Or REPLACE could be better, in order to get the latest change in the DB.
Yes. Those do work although be mindful that many of these are not in the standard SQL syntax and hence are less portable when it comes to adding additional databases in the future.

Any use of these also needs to be careful to ensure that it receives the primary key for any added/replaced rows as appropriate so that the digikam code can add that to internal data structures.

> 
> > - related to that digikam would need the ability to "rescan" the database in
> > a similar manner to the way that it can rescan a filesystem. This allows it
> > to easily pick up changes made by other users and integrate them into the
> > Tags/Albums trees. I suspect that much of the images operation is already
> > working in that way because digikam generally queries the database to find
> > images etc.
> 
> How to go with the "re-scanning" of DB, so it's updated for all users, if
> changes are made by another user?
>  
It is each digikam instance that needs to re-scan the DB. I am not familiar with the implementation of the internals of digikam to comment about how best to do it at the moment. But it could be quite complex trying to merge with the existing structure or it could be fairly simple by rebuilding (although this would have problems with keeping selections, open branches and similar.

> > - the next thing related to concurrency is to consider the use of
> > transactions to ensure that a group of operations either completes or does
> > not. This may be of use when adding something like an image and there are a
> > number of different tables that need updating with the properties of that
> > image. This may not be strictly necessary because it might be enough that
> > each table entry is considered on its own merits. But without it some image
> > metadata may not get saved to the database if digikam crashes halfway
> > through an operation.
> 
> Using TRANSACTIONS for all SQL queries? Or with add/delete/update a
> Tag/image/album would suffice? 
All queries do actually use transactions. But by default each statement is executed in a single transaction (auto commit mode). There should be no need to change the majority of statements unless there are any sequences that should be done as a single unit or not done at all.

> 
> > There is nothing stopping digikam allowing multiple users to have concurrent
> > access to the database. I've used it myself but at the moment care has to be
> > taken by the user to make it a mostly read-only sharing otherwise strange
> > behaviour may occur. But there is a lot of preparation that could be made to
> > prepare for more multi-user operation. Possibly the best way to do this is
> > to write unit tests of the individual operations on something like the tags
> > tree and within those tests explicitly inject changes as though they were
> > from another user to make sure that the code handles them correctly.
> 
> If I'm right, making DB "read only" for users could definitely prevent lot
> of confusion that could occur in DB.

Yes that is certainly true and would be a good experiment to try.

It does have some potential problems and they may prove to be harder to overcome than fixing up things to cope with changes in other areas. A few problems I can think of are:

- the readonly user will not be able to change the database version so this means that read-only users need to use the same version as any read-write user.

- thumbnail generation would be difficult for readonly.

- any attempts to tag, rate, tweak images would need to be gracefully declined.
Comment 8 swatilodha27 2016-07-07 15:26:23 UTC
Another thought, we could implement the internal locking methods. Since there're already InnoDB tables present, we could use "row level locking". This would allow multiple write access, making it suitable for multiple users.

Thus, "SELECT ... FOR UPDATE" (type of InnoDB lock) could be used. 
http://dev.mysql.com/doc/refman/5.7/en/innodb-locking-reads.html

Please let me know if it could be done? 

Thanks
Comment 9 Richard Mortimer 2016-07-12 12:15:52 UTC
(In reply to swatilodha27 from comment #8)
> Another thought, we could implement the internal locking methods. Since
> there're already InnoDB tables present, we could use "row level locking".
> This would allow multiple write access, making it suitable for multiple
> users.
> 
> Thus, "SELECT ... FOR UPDATE" (type of InnoDB lock) could be used. 
> http://dev.mysql.com/doc/refman/5.7/en/innodb-locking-reads.html
> 
> Please let me know if it could be done? 
> 
The problem is not locking within the database that all works fine.

The problem is ensuring that digikam will cope with any changes made by other digikam instances. What happens if another digikam adds a tag and then tags some images with that. Or what happens if another digikam deletes an album or a tag. These, plus other things, all have the potential to cause undesirable behaviour in digikam.
Comment 10 Thomas Debesse 2017-07-15 06:13:05 UTC
Here I'm using digiKam 5.6.0 on Win10 workstation having pictures on Samba CIFS network share and database on MySQL (Linux server), I usually have 3 users using the same database and same network share at the same time without having anyone blocking someone else.

The only issue I see is tag tree not being updated while multiple digiKam are opened at the same time, meaning each user have to close and reopen digiKam to see the new created and applied label other ones have done, but that issues is not a blocking issue of any type: here multiple people can work on the same database at the same time without locking.

digiKam not being able to updates the tag in real time while someone else is editing them on another instance is a real issue, though.
Comment 11 Justin Zobel 2022-11-06 09:25:18 UTC
Thank you for reporting this issue in KDE software. As it has been a while since this issue was reported, can we please ask you to see if you can reproduce the issue with a recent software version?

If you can reproduce the issue, please change the status to "REPORTED" when replying. Thank you!
Comment 12 Joerg Beschorner 2022-11-16 13:38:51 UTC
Created attachment 153809 [details]
attachment-19626-0.html

sorry but I didn't continue to explore this issue.
In case I try it again I will report the result.

Regards Joerg

Am 06.11.22 um 10:25 schrieb Justin Zobel:
> https://bugs.kde.org/show_bug.cgi?id=322946
>
> Justin Zobel<justin.zobel@gmail.com>  changed:
>
>             What    |Removed                     |Added
> ----------------------------------------------------------------------------
>           Resolution|---                         |WAITINGFORINFO
>               Status|REPORTED                    |NEEDSINFO
>
> --- Comment #11 from Justin Zobel<justin.zobel@gmail.com>  ---
> Thank you for reporting this issue in KDE software. As it has been a while
> since this issue was reported, can we please ask you to see if you can
> reproduce the issue with a recent software version?
>
> If you can reproduce the issue, please change the status to "REPORTED" when
> replying. Thank you!
>
Comment 13 Maik Qualmann 2022-11-16 17:48:31 UTC
Based on the error description, I don't know whether you still had a problem setting up the database. But managing 2 different network paths will only be possible with digiKam-8.0.0. If so, you must add your image collection as a network collection. If the mount paths differ, it will be possible with digiKam-8.0.0 to "attach" further alternative paths to an existing network collection path.

Maik