Regex on Non-US Address

Hi,

If I had a textbox that contained the following:

--

Mr FirstName LastName

ABCD Ltd, Address 1, Address 2, Town/City, PostCode,

Notes1, Tel: 0171 800900, Fax: 0171 800909

--

how can I parse this to

--

Mr, FirstName, LastName, ABCD Ltd, Address 1, Address 2, Town/City, PostCode, Notes1, 0171 800900, 0171 800909

--

I know this can be done using Linux grep, but I am new to regular expressions and their options.

Any assistance would be greatly appreciated.

Thank you

Jason.

(Moderator: This is a user request that went unnoticed because it languished on another thread that had already been marked as answered. It has been split from the original thread into its own thread and moved to the Regular Expressions forum.)

[974 byte] By [Jae] at [2008-3-6]
# 1
The trick is to place the items into capturing groups ( ) and non-capturing groups (?: ), that is, the stuff you want and the stuff to ignore. I have chosen to use named groups and created a regex that places each data item into a named group such as FirstName, LastName and City, to name a few. To do that I use this construct: (?<GroupNameHere> MatchPatternHere )

That allows me to access the data from a regex match, for example m.Groups["City"].Value to retrieve an actual value. The pattern I have used throughout is [^,]* which means, loosely, match any run of characters that are not in this set [ ]. So we match until we hit a comma. That is the secret to regexes: find a pattern to match upon. Here it is in code; notice that in the regex we define the group we want, followed by what we don't want.


using System;
using System.Text.RegularExpressions;

string address = @"Mr. Joe Blow

ABCD Ltd, Main Street, Suite 108, Chicago, 80112,

Good Customer, Tel: 707 5556708, Fax: 707 5556709

";

// String concatenated only for the forum post; do not do this in real code.
// Each line is a named group (what we want) followed by a
// non-capturing group (the separator we want to skip).
string pattern =
    @"(?<Salutation>[^\s]*)(?:\s+)" +
    @"(?<FirstName>[^\s]*)(?:\s+)" +
    @"(?<LastName>[^\s]*)(?:[\s\r\n]*)" +
    @"(?<Company>[^,]*)(?:,\s?)" +
    @"(?<Street>[^,]*)(?:,\s?)" +
    @"(?<Suite>[^,]*)(?:,\s?)" +
    @"(?<City>[^,]*)(?:,\s?)" +
    @"(?<Zip>[^,]*)(?:[,\s\r\n]*)" +
    @"(?<Notes>[^,]*)(?:,\s?Tel:\s)" +
    // Stop the last group at the end of the line so it does not
    // swallow the trailing newlines of the input.
    @"(?<Fax>[^,\r\n]*)".Insert(0, @"(?<Phone>[^,]*)(?:,\s?Fax:\s)");

Regex rgx = new Regex(pattern);

// Group "0" (the whole match) comes first, then the named groups in pattern order.
string[] groupNames = rgx.GetGroupNames();
Console.WriteLine("Groups: ({0}){1}", string.Join(") (", groupNames), System.Environment.NewLine);

Match m = rgx.Match(address);

if (m.Success)
{
    foreach (string name in groupNames)
    {
        Console.WriteLine("{0,10} : {1}", name, m.Groups[name]);
    }
}

Console Output:

Groups: (0) (Salutation) (FirstName) (LastName) (Company) (Street) (Suite) (City) (Zip) (Notes) (Phone) (Fax)

         0 : Mr. Joe Blow

ABCD Ltd, Main Street, Suite 108, Chicago, 80112,

Good Customer, Tel: 707 5556708, Fax: 707 5556709

Salutation : Mr.
 FirstName : Joe
  LastName : Blow
   Company : ABCD Ltd
    Street : Main Street
     Suite : Suite 108
      City : Chicago
       Zip : 80112
     Notes : Good Customer
     Phone : 707 5556708
       Fax : 707 5556709


What that shows is the match groups. Notice there is a "0" group which we did not specify; this is the whole match that was found. We don't need it for this example, so it can be ignored and we can extract just what we want, such as Zip or Notes.
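
To get from the groups to the single comma-separated line asked for in the question, something along these lines might work. This is only a sketch: the class name AddressFlattener and the method ToCommaSeparated are made up for illustration, the pattern is a slightly condensed variant of the one above (written as one string with RegexOptions.IgnorePatternWhitespace), and it assumes the input always has exactly this layout, including the "Tel:" and "Fax:" labels.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class AddressFlattener
{
    // IgnorePatternWhitespace lets the pattern span several lines,
    // so no string concatenation is needed.
    private static readonly Regex AddressRegex = new Regex(@"
        (?<Salutation>\S+)\s+
        (?<FirstName>\S+)\s+
        (?<LastName>\S+)\s+
        (?<Company>[^,]*),\s*
        (?<Street>[^,]*),\s*
        (?<Suite>[^,]*),\s*
        (?<City>[^,]*),\s*
        (?<Zip>[^,]*),\s*
        (?<Notes>[^,]*),\s*Tel:\s*
        (?<Phone>[^,]*),\s*Fax:\s*
        (?<Fax>[^,\r\n]*)",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the captured values joined with ", ",
    // or an empty string when the input does not match.
    public static string ToCommaSeparated(string address)
    {
        Match m = AddressRegex.Match(address);
        if (!m.Success)
        {
            return string.Empty;
        }

        var values = new List<string>();
        foreach (string name in AddressRegex.GetGroupNames())
        {
            if (name == "0")
            {
                continue; // skip the whole-match group
            }
            values.Add(m.Groups[name].Value.Trim());
        }
        return string.Join(", ", values.ToArray());
    }
}
```

Calling AddressFlattener.ToCommaSeparated with the text from the question should give back the line with the "Tel:" and "Fax:" labels dropped, since those sit outside the named groups. Real addresses with fewer or more comma-separated parts would need a looser pattern.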
OmegaMan at 2007-9-4
