Bug 145293 - Speed up scan for new images in kphotoalbum
Summary: Speed up scan for new images in kphotoalbum
Status: RESOLVED FIXED
Alias: None
Product: kphotoalbum
Classification: Applications
Component: Backend (show other bugs)
Version: unspecified
Platform: openSUSE Linux
: NOR wishlist
Target Milestone: ---
Assignee: KPhotoAlbum Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-05-11 00:58 UTC by rlk
Modified: 2007-07-22 23:58 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Patch to greatly speed up scanning for new images (5.73 KB, patch)
2007-05-11 00:59 UTC, rlk
Details
Patch to greatly speed up scanning for new images (6.05 KB, patch)
2007-05-12 01:31 UTC, rlk
Details
Patch to greatly speed up scanning for new images (11.55 KB, patch)
2007-07-22 23:07 UTC, rlk
Details

Note You need to log in before you can comment on or make changes to this bug.
Description rlk 2007-05-11 00:58:29 UTC
Version:            (using KDE KDE 3.5.6)
Installed from:    SuSE RPMs

Scanning for new images is very slow if the filesystem metadata isn't currently in memory.  This is because (among other things) the QDir class always stat's every file in the directory, which forces the inode to be pulled in off disk.  With 20000 files in my database, it takes a minute or two just to complete the scan.

In addition, there are certain file types we can ignore, because the raw decoder already ignores them in some cases.

With this patch, scanning for new images takes about 3 seconds on a cold system.
Comment 1 rlk 2007-05-11 00:59:03 UTC
Created attachment 20532 [details]
Patch to greatly speed up scanning for new images
Comment 2 rlk 2007-05-12 01:31:44 UTC
Created attachment 20544 [details]
Patch to greatly speed up scanning for new images

Updated per review comments.
Comment 3 rlk 2007-07-22 17:22:15 UTC
Just a reminder that this patch is still in the queue -- it makes a tremendous difference to startup speed.
Comment 4 Jan Kundrát 2007-07-22 19:00:24 UTC
OK, some comments:

a) it isn't common in KPA's sources to use void for specifying "no arguments"
b) "extensions" in RAWImageDecoder::_skipThisFile is a duplicate of stuff from RAWImageDecoder::_mightDecode. I think it should be merged somewhere... (as already suggested by Shawn Wilden on the ML)
c) using an array of QString doesn't look C++-ish to me. I see it's already done in KPA's sources, but I wonder why? Is it for performance reasons? Why don't use QStringList?
d) I'd use Set<QString> instead of QDict<void>
e) could you please sign the comment in FastDir with your name?
f) KPA uses private variable names prefixed by underscore, you use suffix (name_)
g) I'll try to make some Gentoo security dev look at the readdir calls
Comment 5 Jan Kundrát 2007-07-22 20:14:53 UTC
20:07 <+p-y> but since here you're using MAXNAMELEN, I think it's ok

So g) is solved.
Comment 6 rlk 2007-07-22 22:14:24 UTC
a) I see.  I don't agree with this practice but I'll make the change to be consistent.

b) I agree, I'm working on factoring out the common bits.  It can't be entirely merged, but there's a lot that can be factored out.

c) I'm going to fix this up in the process of doing (b) in this place.  I'm not going to fix it elsewhere.

d) I agree, I'm using QDict<void> because the caller (and its caller) already uses QDict<void> for the list of loaded files and I'd prefer not to touch code outside of this change.

e) OK

f) OK
Comment 7 rlk 2007-07-22 23:07:39 UTC
Created attachment 21223 [details]
Patch to greatly speed up scanning for new images

This addresses comments a-f above.

In addition, I've added some additional suffixes to the list to be ignored (by
the raw decoder, at any rate) unconditionally, including HTML/XML files and
compressed files.
Comment 8 Jan Kundrát 2007-07-22 23:58:03 UTC
SVN commit 691078 by jkt:

Speed up scan for new images by using readdir_r() instead of QDir. Patch by
Robert Krawitz, I've just made three RAWImageDecoder::_fileSOMETHING() functions
private.

BUG: 145293


 M  +67 -6     DB/NewImageFinder.cpp  
 M  +120 -35   ImageManager/RawImageDecoder.cpp  
 M  +13 -1     ImageManager/RawImageDecoder.h  


--- branches/extragear/kde3/graphics/kphotoalbum/DB/NewImageFinder.cpp #691077:691078
@@ -20,7 +20,9 @@
 #include <qfileinfo.h>
 #include "Settings/SettingsData.h"
 #include "Browser/BrowserWidget.h"
-#include <qdir.h>
+#include "ImageManager/RawImageDecoder.h"
+#include <sys/types.h>
+#include <dirent.h>
 #include "Utilities/Util.h"
 #include <qprogressdialog.h>
 #include <klocale.h>
@@ -56,6 +58,59 @@
     return (!_pendingLoad.isEmpty()); // returns if new images was found.
 }
 
+// FastDir is used in place of QDir because QDir stat()s every file in
+// the directory, even if we tell it not to restrict anything.  When
+// scanning for new images, we don't want to look at files we already
+// have in our database, and we also don't want to look at files whose
+// names indicate that we don't care about them.  So what we do is
+// simply read the names from the directory and let the higher layers
+// decide what to do with them.
+//
+// On my sample database with ~20,000 images, this improves the time
+// to rescan for images on a cold system from about 100 seconds to
+// about 3 seconds.
+//
+// -- Robert Krawitz, rlk@alum.mit.edu 2007-07-22
+class FastDir
+{
+public:
+    FastDir(const QString &path);
+    QStringList entryList() const;
+private:
+    const QString _path;
+};
+
+FastDir::FastDir(const QString &path)
+  : _path(path)
+{
+}
+
+QStringList FastDir::entryList() const
+{
+    QStringList answer;
+    DIR *dir;
+    dirent *file;
+    dir = opendir( QFile::encodeName(_path) );
+    if ( !dir )
+	return answer; // cannot read the directory
+
+#if defined(QT_THREAD_SUPPORT) && defined(_POSIX_THREAD_SAFE_FUNCTIONS) && !defined(Q_OS_CYGWIN)
+    union dirent_buf {
+	struct dirent mt_file;
+	char b[sizeof(struct dirent) + MAXNAMLEN + 1];
+    } *u = new union dirent_buf;
+    while ( readdir_r(dir, &(u->mt_file), &file ) == 0 && file )
+#else
+    while ( (file = readdir(dir)) )
+#endif // QT_THREAD_SUPPORT && _POSIX_THREAD_SAFE_FUNCTIONS
+	answer.append(QFile::decodeName(file->d_name));
+#if defined(QT_THREAD_SUPPORT) && defined(_POSIX_THREAD_SAFE_FUNCTIONS) && !defined(Q_OS_CYGWIN)
+    delete u;
+#endif
+    (void) closedir(dir);
+    return answer;
+}
+
 void NewImageFinder::searchForNewFiles( const QDict<void>& loadedFiles, QString directory )
 {
     if ( directory.endsWith( QString::fromLatin1("/") ) )
@@ -65,18 +120,24 @@
     if ( imageDir.endsWith( QString::fromLatin1("/") ) )
         imageDir = imageDir.mid( 0, imageDir.length()-1 );
 
-    QDir dir( directory );
-    QStringList dirList = dir.entryList( QDir::All );
+    FastDir dir( directory );
+    QStringList dirList = dir.entryList( );
+    ImageManager::RAWImageDecoder dec;
     for( QStringList::Iterator it = dirList.begin(); it != dirList.end(); ++it ) {
         QString file = directory + QString::fromLatin1("/") + *it;
-        QFileInfo fi( file );
         if ( (*it) == QString::fromLatin1(".") || (*it) == QString::fromLatin1("..") ||
              (*it) == QString::fromLatin1("ThumbNails") ||
              (*it) == QString::fromLatin1("CategoryImages") ||
-             !fi.isReadable() )
+	     loadedFiles.find( file ) ||
+	     dec._skipThisFile(loadedFiles, file) )
             continue;
 
-        if ( fi.isFile() && loadedFiles.find( file ) == 0) {
+        QFileInfo fi( file );
+
+	if ( !fi.isReadable() )
+	    continue;
+
+        if ( fi.isFile() ) {
             QString baseName = file.mid( imageDir.length()+1 );
             if ( ! DB::ImageDB::instance()->isBlocking( baseName ) ) {
                 if ( Utilities::canReadImage(file) )
--- branches/extragear/kde3/graphics/kphotoalbum/ImageManager/RawImageDecoder.cpp #691077:691078
@@ -21,6 +21,7 @@
 #include <qfile.h>
 #include <qimage.h>
 #include <qwmatrix.h>
+#include <qstringlist.h>
 #include "Settings/SettingsData.h"
 
 /* Main entry point into raw parser */
@@ -54,46 +55,130 @@
   return true;
 }
 
-bool RAWImageDecoder::_mightDecode( const QString& imageFile )
+QStringList RAWImageDecoder::_rawExtensions;
+QStringList RAWImageDecoder::_standardExtensions;
+QStringList RAWImageDecoder::_ignoredExtensions;
+
+void RAWImageDecoder::_initializeExtensionLists()
 {
-	/* Known RAW file extensions. TODO: Complete */
-	static const QString extensions[] = { QString::fromLatin1("crw"),
-										  QString::fromLatin1("cr2"),
-										  QString::fromLatin1("nef"),
-										  QString::fromLatin1("bay"),
-										  QString::fromLatin1("mos"),
-										  QString::fromLatin1("mrw"),
-										  QString::fromLatin1("orf"),
-										  QString::fromLatin1("cs1"),
-										  QString::fromLatin1("dc2"),
-										  QString::fromLatin1("kdc"),
-										  QString::fromLatin1("raf"),
-										  QString::fromLatin1("rdc"),
-										  QString::fromLatin1("x3f"),
-										  QString::fromLatin1("erf"),
-										  QString::fromLatin1("pef"),
-										  QString::null };
-	if (Settings::SettingsData::instance()->dontReadRawFilesWithOtherMatchingFile()) {
-            static const QString standardExtensions[] = {
-                QString::fromLatin1("jpg"),
-                QString::fromLatin1("JPG"),
-                QString::fromLatin1("tif"),
-                QString::fromLatin1("TIF"),
-                QString::fromLatin1("png"),
-                QString::fromLatin1("PNG"),
-                QString::null };
-            QString baseFileName = imageFile;
-            baseFileName.remove(baseFileName.length() - 3, 3);
-            for (int i = 0; !standardExtensions[i].isNull(); ++i) {
-                if (QFile::exists(baseFileName + standardExtensions[i]))
-                    return false;
-            }
+  static bool extensionListsInitialized = 0;
+  if (! extensionListsInitialized) {
+    /* Known RAW file extensions. TODO: Complete */
+    _rawExtensions << QString::fromLatin1("crw")
+		   << QString::fromLatin1("cr2")
+		   << QString::fromLatin1("nef")
+		   << QString::fromLatin1("bay")
+		   << QString::fromLatin1("mos")
+		   << QString::fromLatin1("mrw")
+		   << QString::fromLatin1("orf")
+		   << QString::fromLatin1("cs1")
+		   << QString::fromLatin1("dc2")
+		   << QString::fromLatin1("kdc")
+		   << QString::fromLatin1("raf")
+		   << QString::fromLatin1("rdc")
+		   << QString::fromLatin1("x3f")
+		   << QString::fromLatin1("erf")
+		   << QString::fromLatin1("pef");
+    _standardExtensions << QString::fromLatin1("jpg")
+			<< QString::fromLatin1("JPG")
+			<< QString::fromLatin1("jpeg")
+			<< QString::fromLatin1("JPEG")
+			<< QString::fromLatin1("tif")
+			<< QString::fromLatin1("TIF")
+			<< QString::fromLatin1("tiff")
+			<< QString::fromLatin1("TIFF")
+			<< QString::fromLatin1("png")
+			<< QString::fromLatin1("PNG");
+    _ignoredExtensions << QString::fromLatin1("thm") // Thumbnails
+		       << QString::fromLatin1("THM")
+		       << QString::fromLatin1("thumb") // thumbnail files
+						       // from dcraw
+		       << QString::fromLatin1("ctg") // Catalog files
+		       << QString::fromLatin1("gz") // Compressed files
+		       << QString::fromLatin1("Z")
+		       << QString::fromLatin1("bz2")
+		       << QString::fromLatin1("zip")
+		       << QString::fromLatin1("xml")
+		       << QString::fromLatin1("XML")
+		       << QString::fromLatin1("html")
+		       << QString::fromLatin1("HTML")
+		       << QString::fromLatin1("htm")
+		       << QString::fromLatin1("HTM");
+    extensionListsInitialized = 1;
+  }
+}
+
+bool RAWImageDecoder::_fileExistsWithExtensions( const QString& fileName,
+						const QStringList& extensionList) const
+{
+	QString baseFileName = fileName;
+	int extStart = fileName.findRev('.') + 1;
+	// We're interested in xxx.yyy, not .yyy
+	if (extStart <= 1) return false;
+	baseFileName.remove(extStart, baseFileName.length() - extStart);
+	for ( QStringList::ConstIterator it = extensionList.begin();
+	      it != extensionList.end(); ++it ) {
+	    if (QFile::exists(baseFileName + *it)) return true;
 	}
+	return false;
+}
 
-	for( int i = 0; !extensions[i].isNull(); ++i ) {
-		if( imageFile.endsWith( extensions[i], false ) ) return true;
+bool RAWImageDecoder::_fileIsKnownWithExtensions( const QDict<void>& files,
+						 const QString& fileName,
+						 const QStringList& extensionList) const
+{
+	QString baseFileName = fileName;
+	int extStart = fileName.findRev('.') + 1;
+	if (extStart <= 1) return false;
+	baseFileName.remove(extStart, baseFileName.length() - extStart);
+	for ( QStringList::ConstIterator it = extensionList.begin();
+	      it != extensionList.end(); ++it ) {
+	    if (files.find(baseFileName + *it) ) return true;
 	}
 	return false;
 }
 
+bool RAWImageDecoder::_fileEndsWithExtensions( const QString& fileName,
+					      const QStringList& extensionList) const
+{
+	for ( QStringList::ConstIterator it = extensionList.begin();
+	      it != extensionList.end(); ++it ) {
+	    if( fileName.endsWith( *it, false ) ) return true;
+	}
+	return false;
 }
+
+bool RAWImageDecoder::_mightDecode( const QString& imageFile )
+{
+	if (Settings::SettingsData::instance()->dontReadRawFilesWithOtherMatchingFile() &&
+	    _fileExistsWithExtensions(imageFile, _standardExtensions)) return false;
+	if (_fileEndsWithExtensions(imageFile, _rawExtensions)) return true;
+	return false;
+}
+
+bool RAWImageDecoder::_skipThisFile( const QDict<void>& loadedFiles, const QString& imageFile )
+{
+	// We're not interested in thumbnail and other files.
+	if (_fileEndsWithExtensions(imageFile, _ignoredExtensions)) return true;
+
+	// If we *are* interested in raw files even when other equivalent
+	// non-raw files are available, then we're interested in this file.
+	if (! (Settings::SettingsData::instance()->dontReadRawFilesWithOtherMatchingFile())) return false;
+
+	// If the file ends with something other than a known raw extension,
+	// we're interested in it.
+	if (! _fileEndsWithExtensions(imageFile, _rawExtensions)) return false;
+
+	// At this point, the file ends with a known raw extension, and we're
+	// not interested in raw files when other non-raw files are available.
+	// So search for an existing file with one of the standard
+	// extensions.
+	//
+	// This may not be the best way to do this, but it's using the
+	// same algorithm as _mightDecode above.
+	// -- Robert Krawitz rlk@alum.mit.edu 2007-07-22
+
+	return _fileIsKnownWithExtensions(loadedFiles, imageFile, _standardExtensions);
+}
+
+}
--- branches/extragear/kde3/graphics/kphotoalbum/ImageManager/RawImageDecoder.h #691077:691078
@@ -19,16 +19,28 @@
 #define RAWIMAGEDECODER_H
 
 #include "ImageDecoder.h"
+#include <qdict.h>
 
 namespace ImageManager
 {
 
 class RAWImageDecoder : public ImageDecoder {
 public:
-	RAWImageDecoder() {}
+	RAWImageDecoder() { _initializeExtensionLists(); }
 
 	virtual bool _decode(QImage *img, const QString& imageFile, QSize* fullSize, int dim=-1);
 	virtual bool _mightDecode( const QString& imageFile );
+	virtual bool _skipThisFile( const QDict<void>& loadedFiles, const QString& imageFile );
+
+private:
+	bool _fileExistsWithExtensions( const QString& fileName, const QStringList& extensionList ) const;
+	bool _fileEndsWithExtensions( const QString& fileName, const QStringList& extensionList ) const;
+	bool _fileIsKnownWithExtensions( const QDict<void>& files, const QString& fileName, const QStringList& extensionList ) const;
+
+	static QStringList _rawExtensions;
+	static QStringList _standardExtensions;
+	static QStringList _ignoredExtensions;
+	static void _initializeExtensionLists();
 };
 
 }