Summary: | String.split(RegExp) with parens is not ECMA standards-conformant | ||
---|---|---|---|
Product: | [Applications] konqueror | Reporter: | lupin <lupin.wp> |
Component: | kjs | Assignee: | Konqueror Developers <konq-bugs> |
Status: | RESOLVED UNMAINTAINED | ||
Severity: | normal | CC: | maksim |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Debian testing | ||
OS: | Linux | ||
Latest Commit: | | Version Fixed In: | |
Attachments: | Improve standards compliance of string.split() function + fixes for Regex when using POSIX regex.h |
Description
lupin
2005-08-12 00:31:20 UTC
Confirmed. I hope this issue can be resolved from within kjs itself. The regular expressions themselves are handled by a 3rd party library.

There is a workaround for this bug, but I would not call it a fix. If you use the following code, then you can use 'abc'.parenSplit(/b/) instead of 'abc'.split(/b/).

    // String.prototype.parenSplit should do what ECMAscript says
    // String.prototype.split does, interspersing paren matches between
    // the split elements
    if (String('abc'.split(/(b)/))!='a,b,c') {
      // broken String.split, e.g. konq, IE
      String.prototype.parenSplit=function (re) {
        var m=re.exec(this);
        if (!m) return [this];
        // without this, we have
        // 'ab'.parenSplit(/a|(b)/) != 'ab'.split(/a|(b)/)
        for(var i=0; i<m.length; ++i) {
          if (typeof m[i]=='undefined') m[i]='';
        }
        return [this.substring(0,m.index)]
          .concat(m.slice(1))
          .concat(this.substring(m.index+m[0].length).parenSplit(re));
      };
    } else {
      String.prototype.parenSplit=function (re) {return this.split(re);};
    }

Confirmed still a problem.

Created attachment 17512
Improve standards compliance of string.split() function + fixes for Regex when using POSIX regex.h

Hi, please consider this patch for fixing bug 110597 (for 3.5 branch, though it should be possible to do something similar in trunk).

With this patch, the string.split function inserts matched subpatterns into the result array as per the ECMA standard. This is easy with PCRE, but I had to change the regex code when compiled using regex.h. Firstly, the nrSubPatterns variable should equal the number of subpatterns, not including the +1 for the whole pattern (as it did before). Also, counting the subpatterns by checking pmatch[i].rm_so != -1 does not work, because subpatterns that did not take part in the match will be ignored. E.g. 'abcd'.split(/(x)?(b)/) goes horribly wrong, as the current method of counting subpatterns will decide there are zero, because the (x)? pattern does not match (thus pmatch[1].rm_so == -1).
Note the difference between that and /(x?)(b)/, in which case the first subpattern would match; it would just be an empty string. The only way I could find to get round this was to write some code to count the number of subpatterns by looking for '(' characters that are not escaped or inside character classes. Not ideal, but I cannot see an alternative. Hopefully everyone uses PCRE anyway; it doesn't seem that regex.h supports utf-8 either.

James

Hi, I'll try to take a look at your patch in a week or so, but can't guarantee it. I am just pretty sure I wouldn't be able to do it over the next week (but maybe someone else will). Yeah, it's pretty much expected that anyone not doing an embedded build would use libpcre. Thank you for your contribution.

Message from the Bugsquad and Konqueror teams:
This bug is closed as outdated, as we do not have the manpower to maintain the KDE3 version anymore. If you still can reproduce this issue with Konqueror 4.8.4 or later, please open a new report. Thank you for your understanding.
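
For anyone reading this report later, the group-counting idea from the patch comment above can be sketched roughly as follows. This is only an illustration in JavaScript, not the code from attachment 17512 (which changes the kjs regex code itself). The helper name countSubPatterns is made up here, and the sketch assumes ES3-style pattern syntax, where "(?" introduces a non-capturing group or lookahead; POSIX regex.h bracket rules differ slightly and are not modelled.

```javascript
// Hypothetical helper, not taken from the attached patch: count capturing
// groups by scanning the pattern source for '(' characters that are neither
// escaped nor inside a character class.
function countSubPatterns(source) {
  var count = 0;
  var inClass = false;
  for (var i = 0; i < source.length; ++i) {
    var c = source.charAt(i);
    if (c === '\\') {
      ++i; // skip the escaped character, whatever it is
    } else if (inClass) {
      if (c === ']') inClass = false; // end of character class
    } else if (c === '[') {
      inClass = true; // start of character class
    } else if (c === '(' && source.charAt(i + 1) !== '?') {
      // Assumption: "(?:", "(?=" and "(?!" do not capture (ES3 syntax)
      ++count;
    }
  }
  return count;
}

// Both patterns contain two subpatterns, whether or not (x)? takes part
// in a particular match.
console.log(countSubPatterns(/(x)?(b)/.source)); // 2
console.log(countSubPatterns(/(x?)(b)/.source)); // 2

// What a conforming String.prototype.split returns for the two cases:
//   'abcd'.split(/(x)?(b)/)  gives  ['a', undefined, 'b', 'cd']
//   'abcd'.split(/(x?)(b)/)  gives  ['a', '', 'b', 'cd']
```

Counting from the pattern text rather than from the match offsets means an unmatched optional group such as (x)? is still counted, which is the case that broke the pmatch[i].rm_so != -1 check.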