Bug 361691

Summary: Suggestion to turn string into QStringLiteral instead of QLatin1String
Product: [Developer tools] clazy Reporter: Stephen Kelly <steveire>
Component: generalAssignee: Unassigned bugs mailing-list <unassigned-bugs>
Status: RESOLVED FIXED    
Severity: normal CC: smartins
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:

Description Stephen Kelly 2016-04-12 20:40:29 UTC
The qstring-allocations unit test contains

    s15.contains("ff"); // Warning, should be QStringLiteral
    s15.indexOf("ff"); // Warning, should be QStringLiteral

However, both `contains()` and `indexOf()` have overloads taking a QLatin1String. Shouldn't clazy change these to QLatin1String instead of QStringLiteral?
Comment 1 Sergio Martins 2016-04-12 21:37:44 UTC
contains(QL1S) and indexOf( QL1S) are 30% slower than their QString counterparts, last time I measured.
It was caused by some fromLatin1_helper() calls, or something like that (I don't recall the exact name)
Comment 2 Stephen Kelly 2016-04-13 20:57:16 UTC
That sounds like a bug in QString, right?
Comment 3 Stephen Kelly 2016-04-13 21:10:07 UTC
I tried to replicate your result unsuccessfully. Do you still have your benchmark? I tried Qt 5.4.2 and some recent git branch.

#include <QtTest/QTest>

class StringBenchmark : public QObject
{
  Q_OBJECT
public:
  StringBenchmark(QObject* parent = nullptr) : QObject(parent)
  {
    for (auto i = 0; i < 10000; ++i)
    {
      mTestString += QStringLiteral("Long string part %1 --").arg(i);
    }
  }

private slots:
  void containsQStringLiteral() {
    QBENCHMARK {
      auto result = mTestString.contains(QStringLiteral("Long string part 5555"));
      QVERIFY(result);
    }
  }

  void containsQLatin1String() {
    QBENCHMARK {
      auto result = mTestString.contains(QLatin1String("Long string part 5555"));
      QVERIFY(result);
    }
  }

  void indexOfQStringLiteral() {
    QBENCHMARK {
      auto result = mTestString.indexOf(QStringLiteral("Long string part 5555"));
      QCOMPARE(result, 132210);
    }
  }

  void indexOfQLatin1String() {
    QBENCHMARK {
      auto result = mTestString.indexOf(QLatin1String("Long string part 5555"));
      QCOMPARE(result, 132210);
    }
  }

private:
  QString mTestString;
};

QTEST_MAIN(StringBenchmark)

#include "tester.moc"

********* Start testing of StringBenchmark *********
Config: Using QtTest library 5.8.0, Qt 5.8.0 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 5.2.1 20151010)
PASS   : StringBenchmark::initTestCase()
PASS   : StringBenchmark::containsQStringLiteral()
RESULT : StringBenchmark::containsQStringLiteral():
     0.038 msecs per iteration (total: 79, iterations: 2048)
PASS   : StringBenchmark::containsQLatin1String()
RESULT : StringBenchmark::containsQLatin1String():
     0.040 msecs per iteration (total: 82, iterations: 2048)
PASS   : StringBenchmark::indexOfQStringLiteral()
RESULT : StringBenchmark::indexOfQStringLiteral():
     0.038 msecs per iteration (total: 78, iterations: 2048)
PASS   : StringBenchmark::indexOfQLatin1String()
RESULT : StringBenchmark::indexOfQLatin1String():
     0.038 msecs per iteration (total: 78, iterations: 2048)
PASS   : StringBenchmark::cleanupTestCase()
Totals: 6 passed, 0 failed, 0 skipped, 0 blacklisted, 1301ms
********* Finished testing of StringBenchmark *********
Comment 4 Stephen Kelly 2016-04-13 21:20:46 UTC
I also tried with larger strings. The fact that the QLatin1String overload is slower is suspect. It could be due to the test string being very small, but that's what I expect is the primary use-case of QLatin1String.


#include <QtTest/QTest>
#include <QtCore/QDebug>

class StringBenchmark : public QObject
{
  Q_OBJECT
public:
  StringBenchmark(QObject* parent = nullptr) : QObject(parent)
  {
    for (auto i = 0; i < 1000000; ++i)
    {
      mTestString += QStringLiteral("Long string part %1 --").arg(i);
    }
  }

private slots:
  void containsQStringLiteral() {
    QBENCHMARK {
      auto result = mTestString.contains(QStringLiteral("Long string part 557755"));
      QVERIFY(result);
    }
  }

  void containsQLatin1String() {
    QBENCHMARK {
      auto result = mTestString.contains(QLatin1String("Long string part 557755"));
      QVERIFY(result);
    }
  }

  void indexOfQStringLiteral() {
    QBENCHMARK {
      auto result = mTestString.indexOf(QStringLiteral("Long string part 557755"));
      QCOMPARE(result, 14390520);
    }
  }

  void indexOfQLatin1String() {
    QBENCHMARK {
      auto result = mTestString.indexOf(QLatin1String("Long string part 557755"));
      QCOMPARE(result, 14390520);
    }
  }

private:
  QString mTestString;
};

QTEST_MAIN(StringBenchmark)

#include "tester.moc"


********* Start testing of StringBenchmark *********
Config: Using QtTest library 5.4.2, Qt 5.4.2 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 5.2.1 20151010)
PASS   : StringBenchmark::initTestCase()
PASS   : StringBenchmark::containsQStringLiteral()
RESULT : StringBenchmark::containsQStringLiteral():
     5.1 msecs per iteration (total: 83, iterations: 16)
PASS   : StringBenchmark::containsQLatin1String()
RESULT : StringBenchmark::containsQLatin1String():
     5.2 msecs per iteration (total: 84, iterations: 16)
PASS   : StringBenchmark::indexOfQStringLiteral()
RESULT : StringBenchmark::indexOfQStringLiteral():
     5.2 msecs per iteration (total: 84, iterations: 16)
PASS   : StringBenchmark::indexOfQLatin1String()
RESULT : StringBenchmark::indexOfQLatin1String():
     5.2 msecs per iteration (total: 84, iterations: 16)
PASS   : StringBenchmark::cleanupTestCase()
Totals: 6 passed, 0 failed, 0 skipped, 0 blacklisted
********* Finished testing of StringBenchmark *********
Comment 5 Sergio Martins 2016-04-13 21:32:22 UTC
I don't have the original, but just wrote this:

#include <QString>
#include <QDebug>

#define N 100000000

bool testQStringLiteral()
{
    qDebug() << "QStringLiteral";
    bool result = false;
    QString s;
    for (int i = 0; i < N; ++i)
    {
        result |= s.contains(QStringLiteral("foo"));
    }
    return result;
}

bool testQLatin1String()
{
    qDebug() << "QLatin1String";
    bool result = false;
    QString s;
    for (int i = 0; i < N; ++i)
    {
        result |= s.contains(QLatin1String("foo"));
    }
    return result;
}

int main(int argv, char **)
{
    if (argv == 1) {
        return testQLatin1String();
    } else {
        return testQStringLiteral();
    }
}

serj ~/lixo/st $ perf stat ./st 1
QStringLiteral

 Performance counter stats for './st 1':

        574.619866      task-clock (msec)         #    0.999 CPUs utilized          
                 2      context-switches          #    0.003 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               301      page-faults               #    0.524 K/sec                  
        1818862143      cycles                    #    3.165 GHz                    
   <not supported>      stalled-cycles-frontend  
   <not supported>      stalled-cycles-backend   
        5707478658      instructions              #    3.14  insns per cycle        
        1001376209      branches                  # 1742.676 M/sec                  
             35138      branch-misses             #    0.00% of all branches        

       0.575097370 seconds time elapsed

serj ~/lixo/st $ perf stat ./st  
QLatin1String

 Performance counter stats for './st':

       1175.225780      task-clock (msec)         #    1.000 CPUs utilized          
                 2      context-switches          #    0.002 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               300      page-faults               #    0.255 K/sec                  
        3733101298      cycles                    #    3.176 GHz                    
   <not supported>      stalled-cycles-frontend  
   <not supported>      stalled-cycles-backend   
       12508151290      instructions              #    3.35  insns per cycle        
        2101498161      branches                  # 1788.165 M/sec                  
             38406      branch-misses             #    0.00% of all branches        

       1.175587617 seconds time elapsed
Comment 6 Sergio Martins 2016-04-13 21:33:34 UTC
Almost 50% of difference (looking at task-clock or cycles), make sure to build in release mode.
Used Qt 5.6.
Comment 7 Stephen Kelly 2016-04-13 22:16:18 UTC
Or a 100% difference, depending how you see it :).
Comment 8 Sergio Martins 2016-04-15 18:37:23 UTC
I think that settles it then
Comment 9 Stephen Kelly 2016-04-16 16:12:41 UTC
Upstreamed: https://bugreports.qt.io/browse/QTBUG-52617
Comment 10 Stephen Kelly 2016-04-16 16:35:13 UTC
I modified the testcase to test a non-empty string:

 
#include <QString>
#include <QDebug>

#define N 100000

bool testQStringLiteral()
{
    qDebug() << "QStringLiteral";
    bool result = false;
    QString s;
    for (auto i = 0; i < 1000; ++i)
    {
      s += QStringLiteral("Long string part %1 --").arg(i);
    }
    for (int i = 0; i < N; ++i)
    {
        result |= s.contains(QStringLiteral("foo"));
    }
    return result;
}

bool testQLatin1String()
{
    qDebug() << "QLatin1String";
    bool result = false;
    QString s;
    for (auto i = 0; i < 1000; ++i)
    {
      s += QStringLiteral("Long string part %1 --").arg(i);
    }
    for (int i = 0; i < N; ++i)
    {
        result |= s.contains(QLatin1String("foo"));
    }
    return result;
}

int main(int argv, char **)
{
    if (argv == 1) {
        return testQLatin1String();
    } else {
        return testQStringLiteral();
    }
}


Here's my result
 
$ perf stat ./qttester 
QLatin1String

 Performance counter stats for './qttester':

       2780.971861      task-clock (msec)         #    1.001 CPUs utilized          
                 4      context-switches          #    0.001 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               287      page-faults               #    0.103 K/sec                  
     8,751,360,206      cycles                    #    3.147 GHz                      (83.30%)
     1,093,537,940      stalled-cycles-frontend   #   12.50% frontend cycles idle     (83.30%)
        18,016,598      stalled-cycles-backend    #    0.21% backend  cycles idle     (66.60%)
    25,205,593,541      instructions              #    2.88  insns per cycle        
                                                  #    0.04  stalled cycles per insn  (83.30%)
     4,581,167,754      branches                  # 1647.326 M/sec                    (83.44%)
           132,291      branch-misses             #    0.00% of all branches          (83.39%)

       2.778918892 seconds time elapsed

$ perf stat ./qttester 1
QStringLiteral

 Performance counter stats for './qttester 1':

       3136.221850      task-clock (msec)         #    1.001 CPUs utilized          
                 5      context-switches          #    0.002 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               281      page-faults               #    0.090 K/sec                  
     8,869,692,354      cycles                    #    2.828 GHz                      (83.28%)
     1,988,983,951      stalled-cycles-frontend   #   22.42% frontend cycles idle     (83.28%)
        20,247,953      stalled-cycles-backend    #    0.23% backend  cycles idle     (66.67%)
    25,190,950,431      instructions              #    2.84  insns per cycle        
                                                  #    0.08  stalled cycles per insn  (83.40%)
     4,580,392,600      branches                  # 1460.481 M/sec                    (83.40%)
           138,789      branch-misses             #    0.00% of all branches          (83.40%)

       3.133671112 seconds time elapsed



I don't think clazy is right here. I think clazy is recommending something which is against intuition, and I don't think the benchmarks here justify that. Optimizing for an empty string doesn't make sense.
Comment 11 Sergio Martins 2016-04-18 19:51:16 UTC
That's cheating, that benchmark doesn't compare QLatin1String vs QStringLiteral, well, maybe it does, for 0.1 % of the code, the remaining 99.9% it's running the common implementation between both contains() overloads.

For me, both runs give me the same time, can you reproduce that consistently ?

I don't think it's bad to optimize the small string case if it doesn't make the big string worse.
Comment 12 Stephen Kelly 2016-04-18 21:53:54 UTC
Your benchmark is only for the case of the searched string being empty. In other benchmarks here, the two perform the same.

That isn't a good reason for clazy to recommend something that 1) goes against the general recommendation, 2) we don't have strong benchmarks in favor of.

After https://codereview.qt-project.org/156181 I get results for your benchmark which are the same within experimental error. Please apply it and confirm.

If you think clazy should continue to recommend non-use of QLatin1String for these methods, please provide a better benchmark than either of the ones we've posted so far.
Comment 13 Sergio Martins 2016-04-23 00:02:45 UTC
Git commit 46cfa31f4345d55d12e9697673315ddef6e0b5fc by Sergio Martins.
Committed on 22/04/2016 at 23:50.
Pushed by smartins into branch 'master'.

qstring-allocations: Also use QL1S for QString::{contains, indexOf}

If the haystack is empty passing a QSL will be faster, but as soon
as the haystack gets bigger that advantage is gone.

M  +2    -2    checks/qstring-allocations.cpp
M  +5    -5    tests/qstring-allocations/main.cpp_fixed.cpp.expected

http://commits.kde.org/clazy/46cfa31f4345d55d12e9697673315ddef6e0b5fc