Regular expression to replace commas for a CSV file import

Hi,

I am having trouble writing my own regular expression. I have tried a couple and am not even sure they are correct. If you know of any articles that explain regular expressions well, please let me know.

Here is my regular expression, in case you are able to fix it for me.

Regex.Replace(line, "([\w]+),([\w]+)", "$1<comma>$2")

I am trying to replace the commas inside the quoted text below with <comma>, and later restore them after splitting each column in the CSV file.

"Company","Job Title","First Name","Middle Name","Last Name","Business Street","Business Street 2","Business City","Business State","Business Postal Code","Business Country","Business Phone","Mobile Phone","Business Fax","E-mail Address","Web Page","Notes:

Dear customer,

I want to thank you for your purchase.

Sincerly,

Me.
"

I am able to delete all the line breaks but cannot tell which commas are delimiters. Is there a way to replace a character and a line break together?
string.replace(",vbcrlf", "<comma><br>") << maybe?

"Company","Job Title","First Name","Middle Name","Last Name","Business

Street","Business Street 2","Business City","Business State","Business

Postal Code","Business Country","Business Phone","Mobile

Phone","Business Fax","E-mail Address","Web Page","Notes:Dear customer,I want to thank you for your purchase.Sincerly,Me."

[1915 byte] By [doank] at [2007-12-25]
# 1

You can use REs, but that is probably overkill for your needs. String.Replace should work for you. The only issue with it (and with your RE) is that it will pick up all commas, delimiters included.

String.Replace(",", "<comma>")

String.Replace cannot replace two different tokens (commas and line breaks) in one call. You'll have to call it twice, once per token.

However even this may be overkill for you. I assume you are reading a CSV file into memory and then processing it. It would probably be easier to use code similar to the following.

using (StreamReader rdr = new StreamReader(filename))
{
    string strLine = rdr.ReadLine();
    while (strLine != null)
    {
        //Break it up
        string[] tokens = strLine.Split(',');
        string strNewLine = String.Join("<comma>", tokens);

        //Do work

        strLine = rdr.ReadLine(); //Next
    }
}

The above code relies on the framework to deal with line endings and commas. It takes up a little more memory but is really efficient. Even better, you can preprocess each token before combining them back together. For example, if you need to support quoted commas, you can preprocess the returned tokens and rejoin the ones that start and end a quoted string.
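The token preprocessing mentioned above could look roughly like the following. This is only a sketch, not code from the thread: the class and method names are made up for illustration, and it assumes every field is wrapped in double quotes and contains no embedded quote characters.

```csharp
using System;
using System.Collections.Generic;

public static class QuotedTokenMerger
{
    // Hypothetical helper: rejoins pieces of a quoted field that Split(',') broke apart.
    // Assumes every field is wrapped in double quotes with no embedded quotes.
    public static List<string> Merge(string[] tokens)
    {
        var fields = new List<string>();
        string pending = null; // holds a quoted field that has not been closed yet

        foreach (string token in tokens)
        {
            // When continuing a field, put back the comma that Split removed.
            string current = pending == null ? token : pending + "," + token;

            // A field is complete once it both starts and ends with a quote.
            bool complete = current.Length >= 2
                && current[0] == '"'
                && current[current.Length - 1] == '"';

            if (complete)
            {
                // Strip the surrounding quotes and store the field.
                fields.Add(current.Substring(1, current.Length - 2));
                pending = null;
            }
            else
            {
                pending = current;
            }
        }

        // Unterminated quote: keep whatever text was collected.
        if (pending != null)
            fields.Add(pending);

        return fields;
    }
}
```

It would be called as QuotedTokenMerger.Merge(strLine.Split(',')) inside the loop. Note that ReadLine still splits a record at every line break, so a multi-line field such as the Notes column would need the whole record in one string first.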

Michael Taylor - 8/30/06

TaylorMichaelL at 2007-8-31 > top of Msdn Tech,.NET Development,.NET Base Class Library...
# 2
I only wanted to replace the commas inside the double quotes so that I can use the String.Split(",") method. Before calling Split, I wanted to turn every comma inside double quotes into <comma> so that those commas are not split on. I believe an RE is the only way to tell them apart. Or am I wrong? Either way I am still stuck, hehe. Thanks Taylor.
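One way to mask only the commas inside double quotes is to match each quoted field and run a plain string replace on the matched text. The sketch below is an illustration rather than a fix confirmed in this thread: the class and method names are hypothetical, and it assumes fields never contain an embedded quote character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedCommaMasker
{
    // Matches one double-quoted field; assumes no embedded quotes inside a field.
    // [^"]* also spans line breaks, so multi-line fields are covered.
    private static readonly Regex QuotedField = new Regex("\"[^\"]*\"");

    // Replaces commas inside quoted fields with "<comma>" and
    // leaves the delimiter commas between fields untouched.
    public static string Mask(string line)
    {
        return QuotedField.Replace(line, m => m.Value.Replace(",", "<comma>"));
    }
}
```

After QuotedCommaMasker.Mask(line).Split(','), each column can have <comma> turned back into a comma with String.Replace.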

Are all linebreaks denoted as vbCrLf in ASP.NET?

line = line.Replace(vbCrLf, "<br>") ' Is there a reason why this line of code does not work?

It should replace this sentence:

"Hello World
Good Morning World
Good Afternoon World
Good Night World
"

Into:

"Hello World<br>Good Morning World<br>Good Afternoon World<br>Good Night World<br>"

But it doesn't, and I am not sure why.
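One possible cause, and this is an assumption since the thread does not show how the text is read, is that the data uses a bare line feed (\n) instead of the carriage return plus line feed pair that vbCrLf stands for. Another is that the text was read with ReadLine, which already strips the line terminator. A sketch that handles all three common line break styles, with hypothetical names:

```csharp
using System;

public static class LineBreakConverter
{
    // Converts Windows (\r\n), Unix (\n) and old Mac (\r) line breaks to "<br>".
    public static string ToHtmlBreaks(string text)
    {
        return text
            .Replace("\r\n", "<br>") // handle the two-character pair first
            .Replace("\n", "<br>")
            .Replace("\r", "<br>");
    }
}
```

The two-character pair is replaced first so that a Windows line break does not turn into two <br> tags.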

doank at 2007-8-31
# 3
You can skip String.Split entirely if you soup up your regular expression. Try this for CSV parsing. Note that it assumes the input is a single line, just as in your first post.

Example

string example = "\"Company\",\"Job Title\",\"First Name\",\"Middle Name\",\"Last Name\",\"Business Street\",\"Business Street 2\",\"Business City\",\"Business State\",\"Business Postal Code\",\"Business Country\",\"Business Phone\",\"Mobile Phone\",\"Business Fax\",\"E-mail Address\",\"Web Page\",\"Notes:Dear customer,I want to thank you for your purchase.Sincerly,Me.\"";

Regex parse = new Regex("(?<=,|^)(?:\")?(?<Column>[\\w\\.\\s,:-]*)(?:\")?(?=,|$)",
    RegexOptions.Singleline |
    RegexOptions.ExplicitCapture);

MatchCollection mc = parse.Matches(example);

if (mc.Count > 0)
{
    foreach (Match current in mc)
        if (current.Success)
            Console.WriteLine(current.Groups["Column"].Value);
}
else
    Console.WriteLine("No Matches");

Notes
  • To make it work in C# I used a regular string rather than a verbatim one (@"XXX"), so the regex escapes \w, \. and \s had to be written as \\w, \\. and \\s respectively. If you paste the regex into an external regex tester, change them back accordingly.
  • Assumes all CSV data conforms to QUOTE <any text> QUOTE COMMA.
  • All data is placed into the named capture group "Column".
  • Does not handle all special characters in strings; add any that are needed to the character class.
Output
  • Company
  • Job Title
  • First Name
  • Middle Name
  • Last Name
  • Business Street
  • Business Street 2
  • Business City
  • Business State
  • Business Postal Code
  • Business Country
  • Business Phone
  • Mobile Phone
  • Business Fax
  • E-mail Address
  • Web Page
  • Notes:Dear customer,I want to thank you for your purchase.Sincerly,Me.
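For comparison, the same pattern can also be written as a C# verbatim string, where each embedded quote is doubled ("") and the regex escapes keep their single backslash. The wrapper class and method below are hypothetical names added for illustration. The pattern itself is unchanged from the post above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvLineParser
{
    // Same pattern as in the post, written as a verbatim string:
    // "" stands for one quote character, and \w, \. and \s need no doubling.
    private static readonly Regex Parse = new Regex(
        @"(?<=,|^)(?:"")?(?<Column>[\w\.\s,:-]*)(?:"")?(?=,|$)",
        RegexOptions.Singleline | RegexOptions.ExplicitCapture);

    // Returns the contents of the "Column" group for every match in the line.
    public static List<string> Columns(string line)
    {
        var columns = new List<string>();
        foreach (Match m in Parse.Matches(line))
            columns.Add(m.Groups["Column"].Value);
        return columns;
    }
}
```

As with the original, this assumes one record per string and fields limited to the characters listed in the character class.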
OmegaMan at 2007-8-31
