Bug 126228

Summary: wrong context browser sorting with postgresql
Product: [Applications] amarok Reporter: simon <s7mon>
Component: generalAssignee: Amarok Developers <amarok-bugs-dist>
Status: RESOLVED WORKSFORME    
Severity: normal CC: pndiku
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Gentoo Packages   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: Fix for NULL values

Description simon 2006-04-25 17:44:38 UTC
Version:           1.4 SVN (using KDE KDE 3.4.3)
Installed from:    Gentoo Packages
Compiler:          gcc 3.4.5-r1 gcc version 3.4.5 (Gentoo 3.4.5-r1, ssp-3.4.5-1.0, pie-8.7.9)
OS:                Linux

The statistics in the contextbrowser where representing score=0 in the suggested field and favourite songs by artist first, when using postgresql. So the order is "never heared"  > 80%. 
This is because in postgresql ORDER BY :
"The null value sorts higher than any other value. In other words, with ascending sort order, null values sort at the end, and with descending sort order, null values sort at the beginning."
So when generating those stats amaroK does an left join of  tags with statistics, where every tags with no matching statistic data hast NULL values then. Which leads to the described behaviour.

To get a sorting like 90% > 60% >0  > NULL one could write:
ORDER BY coalesce(stats.percentage,-1) DESC;
or
ORDER BY stats.percentage is NULL, stats.percentage desc;

The options i see are to catch this everytime in QueryBuilder::sortBy when sorting with respect to score, or have an extra sortyBy function.

Here is the patch i use currently:

--- org_collectiondb.cpp	2006-04-25 12:10:12.000000000 +0200
+++ collectiondb.cpp	2006-04-25 12:12:37.000000000 +0200
@@ -4897,9 +4897,20 @@ QueryBuilder::sortBy( int table, Q_INT64
     if ( !m_sort.isEmpty() ) m_sort += ",";
     if ( b ) m_sort += "LOWER( ";
     if ( table & tabYear ) m_sort += "(";
-
-    m_sort += tableName( table ) + ".";
-    m_sort += valueName( value );
+    // null values to the back
+    if ( CollectionDB::instance()->getType() == DbConnection::postgresql 
+	 && ( descending ) && ( value & valScore )
+       ){ 
+	//only adapt at score	
+	m_sort += "coalesce(";
+	m_sort += tableName( table ) + ".";
+    	m_sort += valueName( value );
+	m_sort += ",0)";	
+	//order by coalesce(stats.percentage,0) 	
+    } else {
+     m_sort += tableName( table ) + ".";
+     m_sort += valueName( value );
+    }
 
     if (CollectionDB::instance()->getType() == DbConnection::postgresql)
     {
Comment 1 Jaison Lee 2006-04-25 18:21:45 UTC
*** Bug 126229 has been marked as a duplicate of this bug. ***
Comment 2 Peter C. Ndikuwera 2006-04-26 15:56:46 UTC
I'd thought of using the coalesce function but I didn't want to introduce more IF statements into that function (setOptions). At the moment, the latest SVN ignores NULLs. 

The result of this is:
1) we get a similar result set to earlier versions where we didn't use a LEFT JOIN.
2) unfortunately, like before, we don't get a _real_ result set, because we ignore the NULLs.

I'll take a look at your patch and make sure it doesn't break MySQL et al.
Comment 3 Peter C. Ndikuwera 2006-04-26 16:05:52 UTC
To expand on my previous comments.

Point #2 is actually redundant, because postgres's AVG function _ignores_ NULLs, so an 
    AVG (1, 2, NULL) returns 3/2=1.5, instead of 3/3=1.

To make matters worse, so far, the potential NULLs which would need to be 'coalesced' occur for:
- score
- playcounter
- rating

If this increases in the future, we have to expand our IF condition.... :-(

I think it's best if we wait until postgres accept the SQL-standard ORDER BY x NULLS LAST syntax before we apply any patches to the sorting code.
Comment 4 simon 2006-04-26 16:29:06 UTC
Ok i just checked on mysql the coalesce function seems to be the same there and also the
syntax  of writing statistics.percentage is NULL, statistics.percentage DESC
gives the same result (null at last).

The following worked on my db using either mysql or postgresql giving the same result.

SELECT statistics.url, statistics.percentage, statistics.rating, tags.sampler
FROM tags LEFT JOIN statistics on tags.url=statistics.url
--ORDER BY statistics.percentage is NULL, statistics.percentage DESC
ORDER BY coalesce(statistics.percentage,0) DESC
LIMIT 20;

If i understand the coalesce function right,  all it does is give back the virst value not NULL, and by setting the second value to 0 all NULL should become 0.
So we have two aspects 
1. the cost of this coalsce 
2. and is there a problem with replacing NULL with 0 in our queries

One question though, when talking about latest svn, ignoring NULLs. I've  had this with svn (1 max 2 days old) where there changes with respect to NULLs or am i misunderstanding you?
Comment 5 Peter C. Ndikuwera 2006-04-26 16:44:33 UTC
I made the changes to ignore NULLs in latest SVN. 

At the moment, we filter out all NULL percentages, so the query looks something like:

SELECT statistics.url, statistics.percentage, statistics.rating, tags.sampler 
 FROM tags LEFT JOIN statistics on tags.url=statistics.url 
WHERE statistics.percentage >= 0
ORDER BY statistics.percentage DESC LIMIT 20;

which IMHO returns the same results as what you're specifying.
Comment 6 simon 2006-04-26 16:52:33 UTC
Ah i see (haven't checked out today) thanks.
I see no problem with filtering out NULL  values and therefore tracks i haven't played yet, because 
showing those as favourite doesn't seem right to me either. So i think this is not a bad solution.

P.S.: ignoring NULL values in aggregation functions as mentioned above is afaik sql standard
P.P.S i see you didn#t meen this ^ when talking about ignoring NULLs here :P
Comment 7 Peter C. Ndikuwera 2006-04-26 17:14:05 UTC
Created attachment 15783 [details]
Fix for NULL values

Simon,

I've gone thru MySQL, SQLite & PostgreSQL docs and it seems you're right about
the global use of the coalesce function.

Please test this patch (against latest SVN)

Peter
Comment 8 simon 2006-04-26 17:58:18 UTC
ok it seems to work at the first look, although i see things like this:
So there is definitively the need to filter for which columns it's used.

 [CollectionDB] POSTGRESQL QUERY FAILED: ERROR:  COALESCE types text and integer cannot be matched
amarok: 
amarok: FAILED QUERY: SELECT DISTINCT album.name,album.id,year.name as __discard ,LOWER( album.name) as __discard  FROM tags LEFT JOIN album ON album.id=tags.album LEFT JOIN year ON year.id=tags.year  WHERE true AND ( false OR tags.artist  ILIKE '1862' ESCAPE '/'  ) AND tags.sampler = false  ORDER BY COALESCE (year.name, 0) DESC ,LOWER( album.name ) ;
amarok:     [CollectionDB] POSTGRESQL QUERY FAILED: ERROR:  COALESCE types text and integer cannot be matched
amarok: 
amarok: FAILED QUERY: SELECT DISTINCT album.name,album.id,year.name as __discard ,LOWER( album.name) as __discard  FROM tags LEFT JOIN album ON album.id=tags.album LEFT JOIN year ON year.id=tags.year  WHERE true AND ( false OR tags.artist  ILIKE '1862' ESCAPE '/'  ) AND tags.sampler = true  ORDER BY COALESCE (year.name, 0) DESC ,LOWER( album.name ) ;
Comment 9 Peter C. Ndikuwera 2006-04-26 18:00:45 UTC
Yep, doing that now.
Comment 10 Peter C. Ndikuwera 2006-04-26 18:31:01 UTC
SVN commit 534217 by pndiku:

CCBUG: 126228
CCMAIL: s7mon@web.de

Use the coalesce function when sorting. NULLs will thus be sorted last.
More intelligent than simply filtering out NULL values on a case-by-case
basis.

Thanks to Simon <s7mon AT web DOT de> for pointing me in the right
direction.



 M  +21 -19    collectiondb.cpp  
 M  +0 -2      contextbrowser.cpp  
 M  +0 -2      playlistbrowser.cpp  
 M  +0 -5      statistics.cpp  


--- trunk/extragear/multimedia/amarok/src/collectiondb.cpp #534216:534217
@@ -4911,21 +4911,19 @@
          table & tabYear )
         b = false;
 
+	// only coalesce for certain columns
+	bool c = false;
+    if ( value & valScore || value & valRating || value & valPlayCounter || value & valPercentage )
+		c = true;
+
     if ( !m_sort.isEmpty() ) m_sort += ",";
     if ( b ) m_sort += "LOWER( ";
-    if ( table & tabYear ) m_sort += "(";
+    if ( c ) m_sort += "COALESCE( ";
 
     m_sort += tableName( table ) + ".";
     m_sort += valueName( value );
 
-    if (CollectionDB::instance()->getType() == DbConnection::postgresql)
-    {
-        if ( table & tabYear ) m_sort += ")";
-    }
-    else
-    {
-        if ( table & tabYear ) m_sort += "+0)";
-    }
+    if ( c ) m_sort += ", 0 )";
 
     if ( b ) m_sort += " ) ";
     if ( descending ) m_sort += " DESC ";
@@ -4956,23 +4954,27 @@
          table & tabYear )
         b = false;
 
+	// only coalesce for certain columns
+	bool c = false;
+    if ( value & valScore || value & valRating || value & valPlayCounter || value & valPercentage )
+		c = true;
+
     if ( !m_sort.isEmpty() ) m_sort += ",";
     //m_sort += functionName( function ) + "(";
     if ( b ) m_sort += "LOWER( ";
-    if ( table & tabYear ) m_sort += "(";
+    if ( c && CollectionDB::instance()->getType() != DbConnection::mysql) m_sort += "COALESCE( ";
 
-    QString columnName = functionName( function )+tableName( table )+valueName( value );
-    m_sort += columnName;
+	QString columnName;
 
     if (CollectionDB::instance()->getType() == DbConnection::postgresql)
-    {
-        if ( table & tabYear ) m_sort += ")";
-    }
-    else
-    {
-        if ( table & tabYear ) m_sort += "+0)";
-    }
+		columnName = functionName( function )+"("+tableName( table )+"."+valueName( value )+")";
+	else
+		columnName = functionName( function )+tableName( table )+valueName( value );
 
+    m_sort += columnName;
+
+	if ( c && CollectionDB::instance()->getType() != DbConnection::mysql) m_sort += ", 0 )";
+
     if ( b ) m_sort += " ) ";
     //m_sort += " ) ";
     if ( descending ) m_sort += " DESC ";
--- trunk/extragear/multimedia/amarok/src/contextbrowser.cpp #534216:534217
@@ -1375,7 +1375,6 @@
     qb.addReturnFunctionValue( QueryBuilder::funcAvg, QueryBuilder::tabStats, QueryBuilder::valPercentage );
     qb.addReturnValue( QueryBuilder::tabArtist, QueryBuilder::valID );
     qb.sortByFunction( QueryBuilder::funcAvg, QueryBuilder::tabStats, QueryBuilder::valPercentage, true );
-    qb.addFilter( QueryBuilder::tabStats, QueryBuilder::valPercentage, QString::number( -1 ), QueryBuilder::modeGreater, true);
     qb.excludeMatch( QueryBuilder::tabAlbum, i18n( "Unknown" ) );
     qb.groupBy( QueryBuilder::tabAlbum, QueryBuilder::valID );
     qb.groupBy( QueryBuilder::tabArtist, QueryBuilder::valID );
@@ -2001,7 +2000,6 @@
     qb.addReturnValue( QueryBuilder::tabStats, QueryBuilder::valScore );
     qb.addReturnValue( QueryBuilder::tabStats, QueryBuilder::valRating );
     qb.addMatch( QueryBuilder::tabSong, QueryBuilder::valArtistID, QString::number( artist_id ) );
-    qb.addFilter( QueryBuilder::tabStats, QueryBuilder::valPercentage, QString::number( -1 ), QueryBuilder::modeGreater, true);
     qb.sortBy( QueryBuilder::tabStats, QueryBuilder::valPercentage, true );
     qb.setLimit( 0, 10 );
     values = qb.run();
--- trunk/extragear/multimedia/amarok/src/playlistbrowser.cpp #534216:534217
@@ -584,7 +584,6 @@
 
     /********** Favorite Tracks **************/
     qb.initSQLDrag();
-    qb.addFilter( QueryBuilder::tabStats, QueryBuilder::valPercentage, QString::number( -1 ), QueryBuilder::modeGreater, true);
     qb.sortBy( QueryBuilder::tabStats, QueryBuilder::valPercentage, true );
     qb.setLimit( 0, 15 );
 
@@ -601,7 +600,6 @@
 
     /********** Most Played **************/
     qb.initSQLDrag();
-    qb.addFilter( QueryBuilder::tabStats, QueryBuilder::valPlayCounter, QString::number( -1 ), QueryBuilder::modeGreater, true);
     qb.sortBy( QueryBuilder::tabStats, QueryBuilder::valPlayCounter, true );
     qb.setLimit( 0, 15 );
 
--- trunk/extragear/multimedia/amarok/src/statistics.cpp #534216:534217
@@ -320,7 +320,6 @@
         qb.addReturnValue( QueryBuilder::tabSong, QueryBuilder::valURL );
         qb.addReturnValue( QueryBuilder::tabStats, QueryBuilder::valScore );
         qb.addReturnValue( QueryBuilder::tabStats, QueryBuilder::valRating );
-        qb.addFilter( QueryBuilder::tabStats, QueryBuilder::valScore, QString::number( -1 ), QueryBuilder::modeGreater, true);
         qb.setGoogleFilter( QueryBuilder::tabSong | QueryBuilder::tabArtist, m_filter );
         qb.sortBy( QueryBuilder::tabStats, QueryBuilder::valPercentage, true );
         qb.setLimit( 0, 50 );
@@ -354,7 +353,6 @@
         qb.addReturnValue( QueryBuilder::tabSong, QueryBuilder::valURL );
         qb.addReturnValue( QueryBuilder::tabStats, QueryBuilder::valPlayCounter );
         qb.addReturnValue( QueryBuilder::tabStats, QueryBuilder::valRating );
-        qb.addFilter( QueryBuilder::tabStats, QueryBuilder::valPlayCounter, QString::number( -1 ), QueryBuilder::modeGreater, true);
         qb.setGoogleFilter( QueryBuilder::tabSong | QueryBuilder::tabArtist, m_filter );
         qb.sortBy( QueryBuilder::tabStats, QueryBuilder::valPlayCounter, true );
         qb.setLimit( 0, 50 );
@@ -387,7 +385,6 @@
         qb.addReturnValue( QueryBuilder::tabArtist, QueryBuilder::valName );
         qb.addReturnFunctionValue( QueryBuilder::funcAvg, QueryBuilder::tabStats, QueryBuilder::valPercentage );
         qb.addReturnFunctionValue( QueryBuilder::funcAvg, QueryBuilder::tabStats, QueryBuilder::valRating );
-        qb.addFilter( QueryBuilder::tabStats, QueryBuilder::valPercentage, QString::number( -1 ), QueryBuilder::modeGreater, true);
         qb.sortByFunction( QueryBuilder::funcAvg, QueryBuilder::tabStats, QueryBuilder::valPercentage, true );
         qb.setGoogleFilter( QueryBuilder::tabArtist, m_filter );
         qb.groupBy( QueryBuilder::tabArtist, QueryBuilder::valName);
@@ -425,7 +422,6 @@
         qb.addReturnFunctionValue( QueryBuilder::funcAvg, QueryBuilder::tabStats, QueryBuilder::valPercentage );
         qb.addReturnFunctionValue( QueryBuilder::funcAvg, QueryBuilder::tabStats, QueryBuilder::valRating );
         qb.addReturnValue( QueryBuilder::tabSong, QueryBuilder::valIsCompilation );
-        qb.addFilter( QueryBuilder::tabStats, QueryBuilder::valPercentage, QString::number( -1 ), QueryBuilder::modeGreater, true);
         qb.setGoogleFilter( QueryBuilder::tabAlbum | QueryBuilder::tabArtist, m_filter );
         qb.sortByFunction( QueryBuilder::funcAvg, QueryBuilder::tabStats, QueryBuilder::valPercentage, true );
         qb.excludeMatch( QueryBuilder::tabAlbum, i18n( "Unknown" ) );
@@ -469,7 +465,6 @@
         qb.addReturnValue( QueryBuilder::tabGenre, QueryBuilder::valName );
         qb.addReturnFunctionValue( QueryBuilder::funcAvg, QueryBuilder::tabStats, QueryBuilder::valPercentage );
         qb.addReturnFunctionValue( QueryBuilder::funcAvg, QueryBuilder::tabStats, QueryBuilder::valRating );
-        qb.addFilter( QueryBuilder::tabStats, QueryBuilder::valPercentage, QString::number( -1 ), QueryBuilder::modeGreater, true);
         qb.setGoogleFilter( QueryBuilder::tabGenre, m_filter );
         qb.sortByFunction( QueryBuilder::funcAvg, QueryBuilder::tabStats, QueryBuilder::valScore, true );
         qb.groupBy( QueryBuilder::tabGenre, QueryBuilder::valName);
Comment 11 simon 2006-04-26 18:38:25 UTC
Just saw that in collectionbrowser  with the first patch: i couldn't open an artist (speaking + > albums and titles from this artist showing up)

 [CollectionDB] POSTGRESQL QUERY FAILED: ERROR:  for SELECT DISTINCT, ORDER BY expressions must appear in select list
amarok: 
amarok: FAILED QUERY: SELECT DISTINCT tags.title,tags.url,LOWER( tags.title) as __discard ,LOWER( tags.discnumber) as __d                                   iscard ,tags.track as __discard ,LOWER( tags.url) as __discard  FROM tags LEFT JOIN artist ON artist.id=tags.artist  WHERE true AND ( false OR artist.name  ILIKE '4Lyn' ESCAPE '/'  )  ORDER BY LOWER( tags.title ) ,LOWER( tags.discnumber ) ,CO                                   ALESCE (tags.track, 0),LOWER( tags.url ) ;

Comment 12 simon 2006-04-30 17:44:26 UTC
I think we can mark this solved now - everything is now working as it should (with respect to this bug)