Dual MatchCollection on "A B|C D|E F|G H" input
I have an input file (this is simplified; the real input file is very large):
A 1\n
B 2\n
C 3\n
D 4\n
I am using the following code:
Regex first_part = new Regex("[A-Z]* ");
Regex second_part = new Regex("[A-Z]*\r\n");
MatchCollection firsts = first_part.Matches(input_file);
MatchCollection seconds = second_part.Matches(input_file);
for (int i = 0; i < firsts.Count; i++)
{
Console.WriteLine(firsts[ i ].ToString() + " " + seconds[ i ].ToString());
}
Does the MatchCollection implementation guarantee that the output is still matched up correctly? By this I mean that A and 1 are still printed on the same line, and B and 2 as well (although on another line).
Thanks!
Your intention is perhaps to print "A 1" in the console after the matches occur. If so, first make sure that the match count for 'firsts' equals the match count for 'seconds'. Consider the following code:
string input_file = string.Empty;
try
{
using (TextReader reader = new StreamReader("C:\\MyRegxData.txt"))
{
input_file = reader.ReadToEnd(); // the using block closes the file once it has been read
}
Regex first_part = new Regex("[A-Z]* ");
Regex second_part = new Regex("[0-9]*\r\n");
MatchCollection firsts = first_part.Matches(input_file);
MatchCollection seconds = second_part.Matches(input_file);
MessageBox.Show("first " + firsts.Count + " second " + seconds.Count);
for (int i = 0; i < firsts.Count; i++)
{
Console.WriteLine(firsts[ i ].ToString()
+ " " + seconds[ i ].ToString());
}
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
In the above code, I wanted to check whether the match counts for 'firsts' and 'seconds' are equal (by calling MessageBox.Show()). I programmatically generated a file of 18278 lines (file size = 185 KB), and the program still produced the correct output!
So my opinion is that if your input file has no malformed lines (which could make a match fail), MatchCollection can guarantee the correct output, because each collection lists its matches in the order they occur in the input.
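To make the "counts must be equal" check explicit, here is a minimal sketch that refuses to pair the two collections when their sizes differ. The class name PairedMatches and the method Pair are hypothetical, and I am assuming + instead of * in both patterns so that a line with a missing number does not still produce an (empty) match:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class PairedMatches
{
    // Hypothetical helper: pairs the i-th letter match with the i-th number match.
    // Assumption: "+" is used instead of "*" so a missing field yields no match at all.
    public static List<string> Pair(string input)
    {
        MatchCollection firsts = Regex.Matches(input, "[A-Z]+ ");
        MatchCollection seconds = Regex.Matches(input, "[0-9]+\r\n");

        // Index-based pairing is only meaningful when both collections have the same size.
        if (firsts.Count != seconds.Count)
        {
            throw new FormatException(
                "first " + firsts.Count + " second " + seconds.Count + ": counts differ");
        }

        var result = new List<string>();
        for (int i = 0; i < firsts.Count; i++)
        {
            // Trim removes the trailing space and the trailing "\r\n" of each match.
            result.Add(firsts[i].Value.Trim() + " " + seconds[i].Value.Trim());
        }
        return result;
    }

    static void Main()
    {
        // Well-formed input: prints "A 1, B 2"
        Console.WriteLine(string.Join(", ", Pair("A 1\r\nB 2\r\n")));

        try
        {
            // The second line has no number, so the counts are 3 and 2.
            Pair("A 1\r\nB \r\nC 3\r\n");
        }
        catch (FormatException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

Note that equal counts are necessary but not sufficient: if one line lacks its letter and another lacks its number, the counts still agree while the pairs are shifted.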
Notice that I have also changed the regular expression for the second_part.
Hope this will help
Cheers
Why don't you use only one Regex? It guarantees that the output is still matched up correctly, and it also improves performance.
Here is an example.
TextReader reader = new StreamReader(@"C:\data.txt");
string input_file = reader.ReadToEnd(); // ^([A-Z]*)\s(\d*)
string pattern = @"
^ # start of each line (because of RegexOptions.Multiline)
([A-Z]*?) # zero or more A-Z (capture to Group[1])
\s # white space
(\d*) # zero or more digit (capture to Group[2])
";
Regex reg = new Regex(pattern, RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
MatchCollection col = reg.Matches(input_file);
foreach (Match m in col)
{
Console.WriteLine(@"{0}::{1}", m.Groups[1].Value, m.Groups[2].Value); // A-Z::0-9
}
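As a variation on the single-regex idea, the sketch below uses named groups and anchors both ends of the line, so a malformed line is skipped as a whole instead of shifting the pairs. The group names key and value, the class name SingleRegexDemo, and the stricter pattern (+ instead of *, a literal space, an optional \r before the line end) are my assumptions, not part of the code above:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class SingleRegexDemo
{
    // One letter run, one space, one digit run, alone on a line.
    // With RegexOptions.Multiline, "$" matches before "\n", so "\r?" absorbs a Windows "\r".
    static readonly Regex LinePattern = new Regex(
        @"^(?<key>[A-Z]+) (?<value>[0-9]+)\r?$",
        RegexOptions.Multiline);

    // Hypothetical helper: returns one "key::value" line per well-formed input line.
    public static string Format(string input)
    {
        var sb = new StringBuilder();
        foreach (Match m in LinePattern.Matches(input))
        {
            // Both groups come from the same match, so they cannot drift apart.
            sb.Append(m.Groups["key"].Value)
              .Append("::")
              .Append(m.Groups["value"].Value)
              .Append('\n');
        }
        return sb.ToString();
    }

    static void Main()
    {
        // The middle line has no number and is skipped; the other two lines stay paired.
        Console.Write(Format("A 1\r\nB \r\nC 3\r\n"));
    }
}
```

Because the letter and the number are captured by the same match, there is no second collection to keep in step.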
cheers,
soemoe