Regexp-ERE for Delphi and FreePascal

Provided to you by Amine Moulay Ramdane

Email: aminer68@gmail.com

Here is the Perl module Regexp::ERE, which is used from Delphi and FreePascal through PLDelphi:

https://metacpan.org/pod/Regexp::ERE

I have provided you with this module so that you can convert POSIX Extended Regular Expressions to a DFA, minimize the DFA with Hopcroft's algorithm (which runs in O(n log n) time), and convert the minimized DFA back to a POSIX Extended Regular Expression. Please look at my demo called test.pas inside my zip file to see how to use it: just call the Delphi and FreePascal Regex_Hopcroft() function, which converts a POSIX Extended Regular Expression to a DFA, minimizes that DFA with Hopcroft's algorithm, converts it back to a POSIX Extended Regular Expression and returns it.
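As an illustration only, a minimal calling sketch might look like the following; the unit to use and the exact signature of Regex_Hopcroft() are assumptions here, not taken from the module itself, and test.pas inside the zip file shows the real calling convention:

{ hypothetical sketch: assumes Regex_Hopcroft() takes a POSIX ERE as a
  string and returns the equivalent ERE rebuilt from the minimized DFA;
  see test.pas in the zip file for the real signature and the unit to use }
var
  Minimized: string;
begin
  Minimized := Regex_Hopcroft('(ab|ab)c*');  { assumed signature }
  WriteLn(Minimized);
end.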

You can download Strawberry Perl 5.32 (64bit) from here:

http://strawberryperl.com/

And after installing Strawberry Perl 5.32 (64-bit), you have to install this module by running install.cmd; to remove this module, please run remove.cmd.

And for the documentation, read here:

https://metacpan.org/pod/Regexp::ERE

About regular expressions:

1) A regex can be converted to a deterministic finite automaton (DFA) that can be easily used to find matches within any string. See https://en.wikipedia.org/wiki/Deterministic_finite_automaton.

The process involves first converting the regex to a nondeterministic finite automaton (NFA, https://en.wikipedia.org/wiki/Nondeterministic_finite_automaton) using Thompson's construction (https://en.wikipedia.org/wiki/Thompson%27s_construction), then converting that NFA to a DFA using the subset construction (https://en.wikipedia.org/wiki/Powerset_construction), and then minimizing the DFA with Hopcroft's algorithm or a similar method (https://en.wikipedia.org/wiki/DFA_minimization).
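To make the end result concrete, here is a small illustrative sketch in Pascal (Delphi/FreePascal style) of what such a compiled matcher boils down to: a DFA for the ERE "a*b", encoded as a transition function and driven in a single left-to-right pass over the input. A real engine would generate this transition table from the regex via the NFA/DFA steps above; here it is written by hand purely for illustration.

program DfaMatchDemo;
{ Hand-built DFA for the ERE "a*b": state 0 is the start state,
  state 1 is the only accepting state, state 2 is a dead state.
  Matching is one pass over the input, so it takes time proportional
  to the length of the input. }

type
  TState = 0..2;

const
  Accepting: array[TState] of Boolean = (False, True, False);

{ The transition function: next state from state s on character c. }
function Step(s: TState; c: Char): TState;
begin
  case s of
    0: if c = 'a' then Step := 0
       else if c = 'b' then Step := 1
       else Step := 2;
  else
    Step := 2;   { anything after the final 'b' (or a bad char) is a dead end }
  end;
end;

function DfaMatch(const Input: string): Boolean;
var
  s: TState;
  i: Integer;
begin
  s := 0;
  for i := 1 to Length(Input) do
    s := Step(s, Input[i]);
  DfaMatch := Accepting[s];
end;

begin
  WriteLn(DfaMatch('aaab'));   { TRUE:  matches a*b }
  WriteLn(DfaMatch('aabba'));  { FALSE: does not match a*b }
end.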

This method has the advantage of producing matchers that are very fast (matching takes time proportional to the length of the input) no matter how complicated the original regex was. It has some disadvantages, including the inability to handle backreferences.

Compilers use this technique to tokenize programs for parsing.

2) The other method, usually referred to as the 'backtracking' method, converts the regex into a tree of possibilities. When you match a string against this tree, it tries the first possibility first, and then if that fails it goes back and tries the next possibility against the same part of the string. This is easier to implement, but is not particularly fast.
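As an illustration of this approach only, here is a deliberately tiny backtracking matcher in Pascal. It supports just literal characters, '.' and '*', matches at the start of the text, and is not the algorithm of any particular library; it only shows the "try one possibility, and on failure go back and try the next one" behavior.

program BacktrackDemo;
{ Toy backtracking matcher: supports literal characters, '.' (any char)
  and '*' (zero or more of the preceding character), anchored at the
  start of the text. }

function MatchHere(const Pat, Txt: string): Boolean; forward;

{ Match C repeated zero or more times, then the rest of the pattern.
  The rest of the pattern is retried at each possible position: when it
  fails, one more repetition of C is consumed and we try again. }
function MatchStar(C: Char; const Pat, Txt: string): Boolean;
var
  i: Integer;
  Done: Boolean;
begin
  MatchStar := False;
  i := 1;
  Done := False;
  while not Done do
  begin
    if MatchHere(Pat, Copy(Txt, i, MaxInt)) then
    begin
      MatchStar := True;
      Done := True;
    end
    else if (i <= Length(Txt)) and ((C = '.') or (Txt[i] = C)) then
      Inc(i)
    else
      Done := True;
  end;
end;

function MatchHere(const Pat, Txt: string): Boolean;
begin
  if Pat = '' then
    MatchHere := True
  else if (Length(Pat) >= 2) and (Pat[2] = '*') then
    MatchHere := MatchStar(Pat[1], Copy(Pat, 3, MaxInt), Txt)
  else if (Txt <> '') and ((Pat[1] = '.') or (Pat[1] = Txt[1])) then
    MatchHere := MatchHere(Copy(Pat, 2, MaxInt), Copy(Txt, 2, MaxInt))
  else
    MatchHere := False;
end;

begin
  WriteLn(MatchHere('a*b', 'aaab'));  { TRUE  }
  WriteLn(MatchHere('a.c', 'abd'));   { FALSE }
end.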

In some cases matching strings can take an extremely long time (this is known as catastrophic backtracking). See, for example, http://www.regular-expressions.info/catastrophic.html

Nonetheless, this is the way it's done in most language-provided libraries, including Java, C#, Perl, etc. The closest thing to a standard implementation is PCRE, or "Perl-compatible regular expressions": http://www.pcre.org/

3) There are also a lot of less common ways. Some packages are like (1), but do the subset construction on the fly and only build the states that are actually required. Some execute the NFA directly (https://swtch.com/~rsc/regexp/regexp1.html). Some are like (2), but try multiple branches at once. Both of these methods avoid the catastrophic cases. Some parsing techniques like GLR parsing (https://en.wikipedia.org/wiki/GLR_parser) can also be used to implement pretty fast regex engines that always execute in a reasonable way.
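As a sketch of the "execute the NFA directly" idea: instead of recursing over alternatives, the matcher keeps the set of all states the NFA could currently be in and advances that whole set one input character at a time, so the work per character is bounded by the number of NFA states and the catastrophic cases cannot occur. The NFA below is hand-built for illustration (it accepts strings over {a,b} whose third-to-last character is 'a'); a real engine would build the NFA from the regex.

program NfaSimDemo;
{ Direct NFA simulation: track the set of states the automaton could be
  in after each input character. The hand-built NFA accepts strings over
  {a,b} whose third-to-last character is 'a'. }

type
  TStateSet = set of 0..3;

{ All states reachable from the states in S on input character C. }
function Step(S: TStateSet; C: Char): TStateSet;
var
  Next: TStateSet;
begin
  Next := [];
  if (C = 'a') or (C = 'b') then
  begin
    if 0 in S then
    begin
      Next := Next + [0];                  { state 0 loops on 'a' and 'b' }
      if C = 'a' then Next := Next + [1];  { guess: this 'a' is third-to-last }
    end;
    if 1 in S then Next := Next + [2];
    if 2 in S then Next := Next + [3];
  end;
  Step := Next;
end;

function NfaMatch(const Input: string): Boolean;
var
  S: TStateSet;
  i: Integer;
begin
  S := [0];                                { start in the initial state only }
  for i := 1 to Length(Input) do
    S := Step(S, Input[i]);
  NfaMatch := 3 in S;                      { accept if state 3 was reached }
end;

begin
  WriteLn(NfaMatch('bbabb'));   { TRUE:  third-to-last character is 'a' }
  WriteLn(NfaMatch('ababba'));  { FALSE: third-to-last character is 'b' }
end.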

You can read more about regular expression performance here:

https://towardsdatascience.com/regex-performance-in-python-873bd948a0ea

You can download the zip files by clicking on the following web link:

https://drive.google.com/drive/folders/1aFzPiVNnvLWOb1ssDeQSIIl7wlpodE5N?usp=sharing

Platform: Windows.