Summary: | Suggestion to turn string into QStringLiteral instead of QLatin1String | ||
---|---|---|---|
Product: | [Developer tools] clazy | Reporter: | Stephen Kelly <steveire> |
Component: | general | Assignee: | Unassigned bugs mailing-list <unassigned-bugs> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | smartins |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Other | ||
OS: | Linux | ||
Latest Commit: | http://commits.kde.org/clazy/46cfa31f4345d55d12e9697673315ddef6e0b5fc | Version Fixed In: | |
Sentry Crash Report: |
Description
Stephen Kelly
2016-04-12 20:40:29 UTC
contains(QL1S) and indexOf( QL1S) are 30% slower than their QString counterparts, last time I measured. It was caused by some fromLatin1_helper() calls, or something like that (I don't recall the exact name) That sounds like a bug in QString, right? I tried to replicate your result unsuccessfully. Do you still have your benchmark? I tried Qt 5.4.2 and some recent git branch. #include <QtTest/QTest> class StringBenchmark : public QObject { Q_OBJECT public: StringBenchmark(QObject* parent = nullptr) : QObject(parent) { for (auto i = 0; i < 10000; ++i) { mTestString += QStringLiteral("Long string part %1 --").arg(i); } } private slots: void containsQStringLiteral() { QBENCHMARK { auto result = mTestString.contains(QStringLiteral("Long string part 5555")); QVERIFY(result); } } void containsQLatin1String() { QBENCHMARK { auto result = mTestString.contains(QLatin1String("Long string part 5555")); QVERIFY(result); } } void indexOfQStringLiteral() { QBENCHMARK { auto result = mTestString.indexOf(QStringLiteral("Long string part 5555")); QCOMPARE(result, 132210); } } void indexOfQLatin1String() { QBENCHMARK { auto result = mTestString.indexOf(QLatin1String("Long string part 5555")); QCOMPARE(result, 132210); } } private: QString mTestString; }; QTEST_MAIN(StringBenchmark) #include "tester.moc" ********* Start testing of StringBenchmark ********* Config: Using QtTest library 5.8.0, Qt 5.8.0 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 5.2.1 20151010) PASS : StringBenchmark::initTestCase() PASS : StringBenchmark::containsQStringLiteral() RESULT : StringBenchmark::containsQStringLiteral(): 0.038 msecs per iteration (total: 79, iterations: 2048) PASS : StringBenchmark::containsQLatin1String() RESULT : StringBenchmark::containsQLatin1String(): 0.040 msecs per iteration (total: 82, iterations: 2048) PASS : StringBenchmark::indexOfQStringLiteral() RESULT : StringBenchmark::indexOfQStringLiteral(): 0.038 msecs per iteration (total: 78, iterations: 2048) PASS : StringBenchmark::indexOfQLatin1String() RESULT : StringBenchmark::indexOfQLatin1String(): 0.038 msecs per iteration (total: 78, iterations: 2048) PASS : StringBenchmark::cleanupTestCase() Totals: 6 passed, 0 failed, 0 skipped, 0 blacklisted, 1301ms ********* Finished testing of StringBenchmark ********* I also tried with larger strings. The fact that the QLatin1String overload is slower is suspect. It could be due to the test string being very small, but that's what I expect is the primary use-case of QLatin1String. #include <QtTest/QTest> #include <QtCore/QDebug> class StringBenchmark : public QObject { Q_OBJECT public: StringBenchmark(QObject* parent = nullptr) : QObject(parent) { for (auto i = 0; i < 1000000; ++i) { mTestString += QStringLiteral("Long string part %1 --").arg(i); } } private slots: void containsQStringLiteral() { QBENCHMARK { auto result = mTestString.contains(QStringLiteral("Long string part 557755")); QVERIFY(result); } } void containsQLatin1String() { QBENCHMARK { auto result = mTestString.contains(QLatin1String("Long string part 557755")); QVERIFY(result); } } void indexOfQStringLiteral() { QBENCHMARK { auto result = mTestString.indexOf(QStringLiteral("Long string part 557755")); QCOMPARE(result, 14390520); } } void indexOfQLatin1String() { QBENCHMARK { auto result = mTestString.indexOf(QLatin1String("Long string part 557755")); QCOMPARE(result, 14390520); } } private: QString mTestString; }; QTEST_MAIN(StringBenchmark) #include "tester.moc" ********* Start testing of StringBenchmark ********* Config: Using QtTest library 5.4.2, Qt 5.4.2 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 5.2.1 20151010) PASS : StringBenchmark::initTestCase() PASS : StringBenchmark::containsQStringLiteral() RESULT : StringBenchmark::containsQStringLiteral(): 5.1 msecs per iteration (total: 83, iterations: 16) PASS : StringBenchmark::containsQLatin1String() RESULT : StringBenchmark::containsQLatin1String(): 5.2 msecs per iteration (total: 84, iterations: 16) PASS : StringBenchmark::indexOfQStringLiteral() RESULT : StringBenchmark::indexOfQStringLiteral(): 5.2 msecs per iteration (total: 84, iterations: 16) PASS : StringBenchmark::indexOfQLatin1String() RESULT : StringBenchmark::indexOfQLatin1String(): 5.2 msecs per iteration (total: 84, iterations: 16) PASS : StringBenchmark::cleanupTestCase() Totals: 6 passed, 0 failed, 0 skipped, 0 blacklisted ********* Finished testing of StringBenchmark ********* I don't have the original, but just wrote this: #include <QString> #include <QDebug> #define N 100000000 bool testQStringLiteral() { qDebug() << "QStringLiteral"; bool result = false; QString s; for (int i = 0; i < N; ++i) { result |= s.contains(QStringLiteral("foo")); } return result; } bool testQLatin1String() { qDebug() << "QLatin1String"; bool result = false; QString s; for (int i = 0; i < N; ++i) { result |= s.contains(QLatin1String("foo")); } return result; } int main(int argv, char **) { if (argv == 1) { return testQLatin1String(); } else { return testQStringLiteral(); } } serj ~/lixo/st $ perf stat ./st 1 QStringLiteral Performance counter stats for './st 1': 574.619866 task-clock (msec) # 0.999 CPUs utilized 2 context-switches # 0.003 K/sec 0 cpu-migrations # 0.000 K/sec 301 page-faults # 0.524 K/sec 1818862143 cycles # 3.165 GHz <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 5707478658 instructions # 3.14 insns per cycle 1001376209 branches # 1742.676 M/sec 35138 branch-misses # 0.00% of all branches 0.575097370 seconds time elapsed serj ~/lixo/st $ perf stat ./st QLatin1String Performance counter stats for './st': 1175.225780 task-clock (msec) # 1.000 CPUs utilized 2 context-switches # 0.002 K/sec 0 cpu-migrations # 0.000 K/sec 300 page-faults # 0.255 K/sec 3733101298 cycles # 3.176 GHz <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 12508151290 instructions # 3.35 insns per cycle 2101498161 branches # 1788.165 M/sec 38406 branch-misses # 0.00% of all branches 1.175587617 seconds time elapsed Almost 50% of difference (looking at task-clock or cycles), make sure to build in release mode. Used Qt 5.6. Or a 100% difference, depending how you see it :). I think that settles it then Upstreamed: https://bugreports.qt.io/browse/QTBUG-52617 I modified the testcase to test a non-empty string: #include <QString> #include <QDebug> #define N 100000 bool testQStringLiteral() { qDebug() << "QStringLiteral"; bool result = false; QString s; for (auto i = 0; i < 1000; ++i) { s += QStringLiteral("Long string part %1 --").arg(i); } for (int i = 0; i < N; ++i) { result |= s.contains(QStringLiteral("foo")); } return result; } bool testQLatin1String() { qDebug() << "QLatin1String"; bool result = false; QString s; for (auto i = 0; i < 1000; ++i) { s += QStringLiteral("Long string part %1 --").arg(i); } for (int i = 0; i < N; ++i) { result |= s.contains(QLatin1String("foo")); } return result; } int main(int argv, char **) { if (argv == 1) { return testQLatin1String(); } else { return testQStringLiteral(); } } Here's my result $ perf stat ./qttester QLatin1String Performance counter stats for './qttester': 2780.971861 task-clock (msec) # 1.001 CPUs utilized 4 context-switches # 0.001 K/sec 0 cpu-migrations # 0.000 K/sec 287 page-faults # 0.103 K/sec 8,751,360,206 cycles # 3.147 GHz (83.30%) 1,093,537,940 stalled-cycles-frontend # 12.50% frontend cycles idle (83.30%) 18,016,598 stalled-cycles-backend # 0.21% backend cycles idle (66.60%) 25,205,593,541 instructions # 2.88 insns per cycle # 0.04 stalled cycles per insn (83.30%) 4,581,167,754 branches # 1647.326 M/sec (83.44%) 132,291 branch-misses # 0.00% of all branches (83.39%) 2.778918892 seconds time elapsed $ perf stat ./qttester 1 QStringLiteral Performance counter stats for './qttester 1': 3136.221850 task-clock (msec) # 1.001 CPUs utilized 5 context-switches # 0.002 K/sec 0 cpu-migrations # 0.000 K/sec 281 page-faults # 0.090 K/sec 8,869,692,354 cycles # 2.828 GHz (83.28%) 1,988,983,951 stalled-cycles-frontend # 22.42% frontend cycles idle (83.28%) 20,247,953 stalled-cycles-backend # 0.23% backend cycles idle (66.67%) 25,190,950,431 instructions # 2.84 insns per cycle # 0.08 stalled cycles per insn (83.40%) 4,580,392,600 branches # 1460.481 M/sec (83.40%) 138,789 branch-misses # 0.00% of all branches (83.40%) 3.133671112 seconds time elapsed I don't think clazy is right here. I think clazy is recommending something which is against intuition, and I don't think the benchmarks here justify that. Optimizing for an empty string doesn't make sense. That's cheating, that benchmark doesn't compare QLatin1String vs QStringLiteral, well, maybe it does, for 0.1 % of the code, the remaining 99.9% it's running the common implementation between both contains() overloads. For me, both runs give me the same time, can you reproduce that consistently ? I don't think it's bad to optimize the small string case if it doesn't make the big string worse. Your benchmark is only for the case of the searched string being empty. In other benchmarks here, the two perform the same. That isn't a good reason for clazy to recommend something that 1) goes against the general recommendation, 2) we don't have strong benchmarks in favor of. After https://codereview.qt-project.org/156181 I get results for your benchmark which are the same within experimental error. Please apply it and confirm. If you think clazy should continue to recommend non-use of QLatin1String for these methods, please provide a better benchmark than either of the ones we've posted so far. Git commit 46cfa31f4345d55d12e9697673315ddef6e0b5fc by Sergio Martins. Committed on 22/04/2016 at 23:50. Pushed by smartins into branch 'master'. qstring-allocations: Also use QL1S for QString::{contains, indexOf} If the haystack is empty passing a QSL will be faster, but as soon as the haystack gets bigger that advantage is gone. M +2 -2 checks/qstring-allocations.cpp M +5 -5 tests/qstring-allocations/main.cpp_fixed.cpp.expected http://commits.kde.org/clazy/46cfa31f4345d55d12e9697673315ddef6e0b5fc |