Remove certain code and text with regular expressions

Hi

I have an HTML file that was generated by MS Word, and I'm removing all of the unnecessary code. One of the things I want to remove is a certain span tag that contains the text "mso-hide" (without the quotes).

If my program runs into this code:

Code Snippet

<span
style='color:windowtext;display:none;mso-hide:screen;text-decoration:none;
text-underline:none'>3</span>

then I want to remove the whole span tag, including its contents (in this case, 3). The contents of this span tag will always be a one- to three-digit number.

I've tried many patterns without success. Any ideas out there?

Thanks

(Moderator: Code can be placed via the code snippet button on the editor { } )

[1035 byte] By [sensfan] at [2008-1-4]
# 1

Assuming you're using the .NET Regex class, this should match your span tags:

(?si)<span[^>]*?mso-hide[^>]*?>\d{1,3}</span>

Then replace each match with an empty string.

Tested OK in Expresso v2.1.

using System.Text.RegularExpressions;

/// <summary>
/// Regular expression built for C# on: Thu, May 24, 2007, 07:08:31 PM
/// Using Expresso Version: 2.1.2150, http://www.ultrapico.com
/// </summary>
//
// A description of the regular expression:
//
// (?si)      Change options within the enclosing group [si]:
//            turn ON Ignore Case option, turn ON Single Line option
// <span      Literal "<span"
// [^>]*?     Any character other than >, any number of repetitions, as few as possible
// mso-hide   Literal "mso-hide"
// [^>]*?     Any character other than >, any number of repetitions, as few as possible
// >          Literal ">"
// \d{1,3}    Any digit, between 1 and 3 repetitions
// </span>    Literal "</span>"
public static Regex regex = new Regex(
    @"(?si)<span[^>]*?mso-hide[^>]*?>\d{1,3}</span>");
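
The replacement step itself could look something like the sketch below. It is only an illustration: the class name MsoHideCleaner, the method RemoveHiddenSpans, and the sample HTML string are made up for this example, and it assumes the whole file has already been read into a string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, named here only for illustration.
public static class MsoHideCleaner
{
    // Same pattern as above: a <span> whose attributes mention mso-hide
    // and whose content is a one- to three-digit number.
    private static readonly Regex HiddenSpan = new Regex(
        @"(?si)<span[^>]*?mso-hide[^>]*?>\d{1,3}</span>");

    // Returns the HTML with every matching span (tag and digits) removed.
    public static string RemoveHiddenSpans(string html)
    {
        return HiddenSpan.Replace(html, string.Empty);
    }

    public static void Main()
    {
        // Sample input modeled on the span in the question;
        // the surrounding <p> element is an assumption.
        string html =
            "<p>Chapter<span\n" +
            "style='color:windowtext;display:none;mso-hide:screen;text-decoration:none;\n" +
            "text-underline:none'>3</span> One</p>";

        Console.WriteLine(RemoveHiddenSpans(html)); // prints: <p>Chapter One</p>
    }
}
```

Because [^>] is a negated character class, it also matches line breaks, so the pattern still works when Word wraps the style attribute across several lines, as in the snippet from the question. Spans without mso-hide in the opening tag are left alone.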

SergeiZ at 2007-9-25
# 2
Thanks Sergei, works great.
sensfan at 2007-9-25
