Version: (using KDE 3.3.2)
Installed from: Debian testing/unstable Packages
OS: Linux

According to section 15.5.4.14 of http://www.ecma-international.org/publications/files/ecma-st/ECMA-262.pdf, String.split, if given a RegExp with parentheses, should intersperse the split parts of the string with the matches corresponding to each pair of parens. Konqueror doesn't do this.

For example, if you type these into Konqueror's Javascript debugger, then
    'abc'.split(/b/)
should (and does) give
    a,c
but
    'abc'.split(/(b)/)
should give
    a,b,c
but it gives
    a,c
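For reference, the two cases above can be written as a small self-check. This is only a sketch: the variable names are made up for illustration, and the values in the comments are what the ECMA-262 algorithm prescribes, not what the affected Konqueror build returns.

```javascript
// Split without capturing parentheses: only the pieces between matches.
var plain = 'abc'.split(/b/);       // per the spec: ['a', 'c']

// Split with capturing parentheses: the text matched by each pair of
// parens is inserted between the pieces.
var captured = 'abc'.split(/(b)/);  // per the spec: ['a', 'b', 'c']

console.log(plain.join(','));
console.log(captured.join(','));
```

On an engine with this bug, the second line of output is a,c rather than a,b,c.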
Confirmed. I hope this issue can be resolved from within kjs itself. The regular expressions themselves are handled by a 3rd party library.
There is a workaround for this bug, but I would not call it a fix. If you use the following code, then you can use 'abc'.parenSplit(/b/) instead of 'abc'.split(/b/).

    // String.prototype.parenSplit should do what ECMAscript says
    // String.prototype.split does, interspersing paren matches between
    // the split elements
    if (String('abc'.split(/(b)/))!='a,b,c') {
      // broken String.split, e.g. konq, IE
      String.prototype.parenSplit=function (re) {
        var m=re.exec(this);
        if (!m) return [this];
        // without this, we have
        // 'ab'.parenSplit(/a|(b)/) != 'ab'.split(/a|(b)/)
        for(var i=0; i<m.length; ++i) {
          if (typeof m[i]=='undefined') m[i]='';
        }
        return [this.substring(0,m.index)]
          .concat(m.slice(1))
          .concat(this.substring(m.index+m[0].length).parenSplit(re));
      };
    } else {
      String.prototype.parenSplit=function (re) {return this.split(re);};
    }
Confirmed still a problem.
Created attachment 17512
Improve standards compliance of string.split() function + fixes for Regex when using POSIX regex.h

Hi, please consider this patch for fixing bug 110597 (for 3.5 branch, though it should be possible to do something similar in trunk).

With the patch, the string.split function inserts matched subpatterns into the result array as per the ECMA standard. This is easy with PCRE, but I had to change the regex code when compiled using regex.h.

Firstly, the nrSubPatterns variable should equal the number of subpatterns, not including the +1 for the whole pattern (as it did before).

Also, counting the subpatterns by checking pmatch[i].rm_so != -1 does not work, because subpatterns that did not take part in the match will be ignored. E.g. 'abcd'.split(/(x)?(b)/) goes horribly wrong, as the current method of counting subpatterns will decide there are zero, because the (x)? pattern does not match (thus pmatch[1].rm_so == -1). Note the difference between that and /(x?)(b)/, in which case the first subpattern would match, it would just be an empty string.

The only way I could find to get round this was to write some code to count the number of patterns by looking for '(' characters that are not escaped or inside character classes. Not ideal but I cannot see an alternative. Hopefully everyone uses PCRE anyway - it doesn't seem that regex.h supports utf-8 either.

James
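To illustrate the counting approach described above, here is a rough sketch in JavaScript. It is not the code from the attached patch (which changes the C++ regex wrapper); countSubPatterns is a hypothetical name. It assumes ECMAScript-style syntax, so the first unescaped ']' closes a character class and any group starting with '(?' is treated as non-capturing, which is a simplification.

```javascript
// Hypothetical helper: count capturing groups in a pattern string by
// scanning for '(' that is neither escaped nor inside a character class.
function countSubPatterns(pattern) {
  var count = 0;
  var inClass = false;
  for (var i = 0; i < pattern.length; ++i) {
    var c = pattern.charAt(i);
    if (c === '\\') {
      ++i;                      // skip the escaped character entirely
    } else if (inClass) {
      if (c === ']') inClass = false;
    } else if (c === '[') {
      inClass = true;
    } else if (c === '(' && pattern.charAt(i + 1) !== '?') {
      ++count;                  // assumption: '(?...' never captures
    }
  }
  return count;
}

console.log(countSubPatterns('(x)?(b)'));
console.log(countSubPatterns('(x?)(b)'));
```

Both patterns contain two subpatterns regardless of whether (x)? takes part in a particular match, which is the property that checking pmatch[i].rm_so cannot provide.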
Hi, I'll try to take a look at your patch in a week or so, but I can't guarantee it; I am just pretty sure I won't be able to do it over the next week (but maybe someone else will). Yeah, it's pretty much expected that anyone not doing an embedded build would use libpcre. Thank you for your contribution.
Message from the Bugsquad and Konqueror teams: This bug is closed as outdated, as we do not have the manpower to maintain the KDE3 version anymore. If you still can reproduce this issue with Konqueror 4.8.4 or later, please open a new report. Thank you for your understanding.