Regular Expression: Help on creating appropriate expression needed...
Hi there!
I've been trying for hours to construct a regex that fits my needs. Here is the starting situation:
I've got a string like
"abc F123[F456[F789]] 123"
In this string I want to find the following expressions:
- F123[F456[F789]]
- F456[F789]
- F789
An expression is defined as follows:
- starting with an 'F'
- followed by at least one digit
- optionally followed by an expression enclosed in brackets '[...]', which may itself (recursively) contain the kind of expression I'm searching for.
I tried this using the following regex:
[F][0-9]+[\[]{1,}.*?[\]]{1,}|[F][0-9]+
But this expression only finds the first needed match (F123[F456[F789]]). What do I have to do to also find the other needed matches (recursively)?
Greets,
ReneMT
[974 byte] By [renemt] at [2008-2-19]
Overview
Hey ReneMT... I know you have been patiently waiting for an answer to this and have not given up. <G> You still post to the forum and I like challenges, so I gave it a shot. The bad news is that, because the data is nested and regex matches do not overlap, a single pass cannot return a match that sits inside another match. But that leaves two options which can be done. I wonder how you handled the situation and hope you post what you found.
Option 1
As you mentioned, you were able to capture the whole thing but unable to get any submatches. This almost borders on lexical analysis, which is done in two phases: a scanner phase followed by an evaluator phase. For this problem, the first phase would be to use your regex, and the second would be to programmatically evaluate its output in succeeding regular expression operations. I had to do something similar when I had to scrape screen data (no, this wasn't in the 80's but in the 21st century), and I devised a system which I called Cascading Regular Expressions. I set up multiple expressions to parse out and weed the text until the final expression actually captured the target data.
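To make the two-phase idea of Option 1 concrete, here is a minimal sketch. It is not the cascade I used back then. It swaps your original pattern for one built on .NET balancing groups, so that the outer brackets are matched as a balanced pair, and then re-scans the bracket contents. The names CascadingScan, FindAll, Collect, Inner and Depth are made up for this illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CascadingScan
{
    // Phase 1 pattern: 'F', one or more digits, then optionally one balanced
    // [...] block. "Depth" is a counter group: pushed on '[', popped on ']'.
    // "Inner" captures the text between the outermost brackets.
    private static readonly Regex Outer = new Regex(
        @"F\d+(\[(?<Inner>(?>[^\[\]]+|\[(?<Depth>)|\](?<-Depth>))*)(?(Depth)(?!))\])?",
        RegexOptions.ExplicitCapture | RegexOptions.CultureInvariant);

    // Returns every expression in the text, outer expressions first.
    public static List<string> FindAll(string text)
    {
        var found = new List<string>();
        Collect(text, found);
        return found;
    }

    // Phase 2: record each match, then run the same scan on the bracket
    // contents so that nested expressions are reported as matches too.
    private static void Collect(string text, List<string> found)
    {
        foreach (Match m in Outer.Matches(text))
        {
            found.Add(m.Value);
            Group inner = m.Groups["Inner"];
            if (inner.Success && inner.Length > 0)
            {
                Collect(inner.Value, found);
            }
        }
    }
}
```

Calling CascadingScan.FindAll("abc F123[F456[F789]] 123") should give F123[F456[F789]], F456[F789] and F789, in that order. If the brackets are not balanced, the optional block simply does not match and only the F-plus-digits part is reported.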
Option 2
I can't speak to your data needs, but I have come up with a regex that gives an indicator telling the consumer that the following match is a subset of the current one. Let me show you the regex:
// using System.Text.RegularExpressions;

/// <summary>
/// Regular expression built for C# on: Sun, Sep 10, 2006, 03:04:28 PM
/// Using Expresso Version: 2.1.2150, http://www.ultrapico.com
///
/// Match a pattern with subpatterns within it such as X123[X456[X789]]
///
/// A description of the regular expression:
///
/// Match a prefix but exclude it from the capture. [F]
///     F
/// [Data]: A named capture group. [\d{1,3}]
///     Any digit, between 1 and 3 repetitions
/// [SubMatch]: A named capture group. [\[], zero or one repetitions
///     [
/// </summary>
public Regex XPattern = new Regex(
    @"(?<=F)(?<Data> \d{1,3})(?<SubMatch> \[)? ",
    RegexOptions.IgnoreCase
    | RegexOptions.ExplicitCapture
    | RegexOptions.CultureInvariant
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
    );
Here is what happens: with ExplicitCapture, each match (listed as a "Group" in the results) exposes only the two named captures. The first, named Data, is the numeric portion of the FXXX pattern and returns the XXX. The second, named SubMatch, is optional, so it needs to be checked for existence; when it is present it holds a [.
This gives the consumer of the code all of the data broken out into matches. The consumer then has to check, for each match, whether SubMatch exists and has data. If it does, the next match actually belongs to the current one as a submatch. By keeping track of the recursion, one can figure out how to present the data.
Example
F123[F456[F789]]
Results in
Group 1    Data 123    SubMatch [
Group 2    Data 456    SubMatch [
Group 3    Data 789    SubMatch
Therefore possibly answering the need of the situation.
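As a rough illustration of how a consumer might keep track of the recursion in Option 2, the sketch below walks the matches, collects a chain of Data values for as long as SubMatch is present, and rebuilds the expression strings from the innermost one outwards. It assumes the input is well formed (every opened bracket is closed, and each bracket holds exactly one expression, as in your definition). The names SubMatchWalker and Expressions are invented for the example, and the pattern is the one above written without the whitespace and IgnoreCase options.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SubMatchWalker
{
    // Pattern from the post, minus IgnorePatternWhitespace and IgnoreCase.
    // It keeps the post's limit of one to three digits.
    private static readonly Regex XPattern = new Regex(
        @"(?<=F)(?<Data>\d{1,3})(?<SubMatch>\[)?",
        RegexOptions.ExplicitCapture | RegexOptions.CultureInvariant);

    // Rebuilds every expression from the Data / SubMatch pairs.
    // Assumption: brackets are balanced and nesting is a simple chain.
    public static List<string> Expressions(string input)
    {
        var result = new List<string>();
        var chain = new List<string>(); // Data values of the current chain

        foreach (Match m in XPattern.Matches(input))
        {
            chain.Add(m.Groups["Data"].Value);

            // SubMatch present: the next match is a child of this one.
            if (m.Groups["SubMatch"].Success)
            {
                continue;
            }

            // Chain ended: build the strings from the innermost outwards.
            var built = new string[chain.Count];
            string expr = "";
            for (int i = chain.Count - 1; i >= 0; i--)
            {
                expr = expr.Length == 0
                    ? "F" + chain[i]
                    : "F" + chain[i] + "[" + expr + "]";
                built[i] = expr;
            }
            result.AddRange(built);
            chain.Clear();
        }

        // A chain left open at the end of the input is dropped.
        return result;
    }
}
```

For the example string, SubMatchWalker.Expressions("abc F123[F456[F789]] 123") should return F123[F456[F789]], F456[F789] and F789. Because the strings are rebuilt rather than copied from the input, anything else inside the brackets would be lost, so treat this as a starting point only.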