Skip to content Skip to sidebar Skip to footer

Which Regular Expression Algorithm Does Javascript Use For Regex?

I was reading this article today on two different regular expression algorithms. According to the article old Unix tools like ed, sed, grep, egrep, awk, and lex, all use what's c

Solution 1:

The Javascript ECMA language description doesn't impose a requirement for the particular implementation of regular expressions, so that part of the question isn't well-formed. You're really wondering about the particular implementation in a particular browser.

The reason Perl/Python etc use a slower algorithm, though, is that the regex language defined isn't really regular expressions. A real regular expression can be expressed as a finite state machine, but the language of regex is context free. That's why the fashion is to just call it "regex" instead of talking about regular expressions.

Update

Yes, in fact javascript regex isn't content free regular. Consider the syntax using `{n,m}', that is, matches from n to m accepted regexs. Let d the difference d=|n-m|. The syntax means there exists a string uxw that is acceptable, but a string uxw that is not. It follows via the pumping lemma for regular languages that this is not a regular language.

(augh. Thinko corrected.)

Solution 2:

Though the ECMA standard does not specify the algorithm an ECMAScript implementation should use, the fact that the standard mandates that ECMAScript regular expressions must support backreferences (\1, \2, etc.) rules out the DFA and "Thompson NFA" implementations.

Solution 3:

Perl uses a memoized recursive backtracking search and, as of some improvements in 5.10, no longer blows up on perl -e '("a" x 100000) =~ /^(ab?)*$/;'. In recent tests I performed on an OS X box, Perl 5.10 outperformed awk, even in the cases where awk's algorithm was supposed to be better.

Post a Comment for "Which Regular Expression Algorithm Does Javascript Use For Regex?"