RegEx and URLs

I need a regex pattern that can be used to extract URLs from the clipboard. The clipboard data could be text, HTML, or even RTF.

Possibly three different patterns?

Any ideas?

Anyone done this sort of thing before?

There are many pitfalls, of course: invalid characters in a URL, for example, or the fact that a URL can't start with a hyphen but can contain one.

All of these need to be matched!

http://www.url.com

http://url.com

url.com

www.url.com

(and each one could have a path such as /file/file.html appended)

[753 byte] By [AndrewVos] at [2008-2-18]
# 1

Try this:

Regex theRegexURL = new Regex("(([a-zA-Z][0-9a-zA-Z+\\-\\.]*:)?/{0,2}[0-9a-zA-Z;/?:@&=+$\\.\\-_!~*'()%]+)?(#[0-9a-zA-Z;/?:@&=+$\\.\\-_!~*'()%]+)?");

if (theRegexURL.IsMatch(theUrlInput))
{
    // URL matches
}
else
{
    // URL does not match
}

ahmedilyas at 2007-8-31 > top of Msdn Tech,.NET Development,Regular Expressions...
# 2
That only checks whether the given string is a URL. How would I get these strings out of a pile of HTML source?

AndrewVos at 2007-8-31
# 3

Hi,

There are many ways to do this. Try the function below. You need to pass it a suitable regular expression, e.g.

http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)? or

([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?

using System.Text;
using System.Text.RegularExpressions;

public string Locate(string pattern, string textFromClipboard)
{
    StringBuilder sb = new StringBuilder();

    // Build the regular expression; IgnoreCase lets "HTTP://" match too.
    Regex r = new Regex(pattern, RegexOptions.IgnoreCase);

    // Walk every match in the text. m.Value is the whole matched URL;
    // the inner groups only hold fragments such as a path, so they are
    // not appended.
    for (Match m = r.Match(textFromClipboard); m.Success; m = m.NextMatch())
    {
        // One URL per line so the results can be told apart.
        sb.AppendLine("URL:" + m.Value);
    }

    return sb.ToString();
}
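
For what it's worth, the same idea can be written more compactly with Regex.Matches, which returns every hit in one call. This is only a sketch under a few assumptions: the names UrlFinder and FindUrls are made up for illustration, and the pattern is the first one above with the space taken out of the path character class (my change, so that a match stops at the first whitespace).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlFinder
{
    // Assumption: the http(s) pattern from this post, minus the space
    // in the path character class so a URL ends at whitespace.
    private static readonly Regex UrlPattern = new Regex(
        @"https?://([\w-]+\.)+[\w-]+(/[\w\-./?%&=]*)?",
        RegexOptions.IgnoreCase);

    // Returns every URL found in the text, in order of appearance.
    public static List<string> FindUrls(string text)
    {
        List<string> urls = new List<string>();
        foreach (Match m in UrlPattern.Matches(text))
        {
            urls.Add(m.Value);
        }
        return urls;
    }
}
```

Calling UrlFinder.FindUrls on the clipboard text gives a list you can loop over, instead of one concatenated string.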

SandeepChanda at 2007-8-31
# 4

Thanks for the help guys. This has put me on the right path, sort of.

Now I just need to get all the URLs, remove the scheme prefix from each (e.g. "http://"), and check for duplicates, so that if there's both http://www.startmenuex.com and www.startmenuex.com it finds only the former.

I'm sure I can do this by myself. Hmm, I probably should have looked at the source for the forum software, because this forum does it quite well... Is anyone here on the dev team for this forum software?
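
A rough sketch of that de-duplication step, in case it helps anyone reading later. The names UrlDeduper and Deduplicate are hypothetical, and it makes some assumptions: two URLs count as the same if they are equal after stripping a leading scheme and any trailing slash, the comparison ignores case, and the form with an explicit scheme is the one kept.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlDeduper
{
    // Matches a leading scheme such as "http://" or "ftp://".
    private static readonly Regex Scheme = new Regex(
        @"^[a-z][a-z0-9+.\-]*://", RegexOptions.IgnoreCase);

    public static List<string> Deduplicate(IEnumerable<string> urls)
    {
        // Maps the scheme-less form of a URL to its position in the result.
        Dictionary<string, int> seen =
            new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        List<string> result = new List<string>();

        foreach (string url in urls)
        {
            string key = Scheme.Replace(url, "").TrimEnd('/');
            int index;
            if (!seen.TryGetValue(key, out index))
            {
                // First time this URL is seen: remember where it went.
                seen[key] = result.Count;
                result.Add(url);
            }
            else if (!Scheme.IsMatch(result[index]) && Scheme.IsMatch(url))
            {
                // Prefer the form that carries an explicit scheme.
                result[index] = url;
            }
        }
        return result;
    }
}
```

Note that this treats http://host and ftp://host as duplicates and keeps whichever came first, which may or may not be what you want.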

AndrewVos at 2007-8-31
# 5

Hmm. Perhaps I could paste the clipboard text into a hidden WebBrowser control and find the URLs in there. Any thoughts on this?

(By the way, I'm looking for ftp://fff.fff.com etc. as well. Hey, the forum software even highlights that!)
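
One lighter alternative to a hidden browser, when the clipboard holds HTML, might be to pull the href attribute values straight out of the markup. The sketch below assumes quoted href values only, and the names HrefFinder and FindHrefs are made up for illustration. Regexes over HTML are brittle, so treat it as a starting point rather than a parser. Since it takes whatever is inside the quotes, ftp:// links come along for free.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefFinder
{
    // Captures the value of href="..." or href='...' (quoted values only).
    private static readonly Regex Href = new Regex(
        @"href\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Returns the link targets found in the HTML, in document order.
    public static List<string> FindHrefs(string html)
    {
        List<string> links = new List<string>();
        foreach (Match m in Href.Matches(html))
        {
            // Group 1 is the attribute value without the quotes.
            links.Add(m.Groups[1].Value);
        }
        return links;
    }
}
```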

AndrewVos at 2007-8-31
# 6

Hi,

I tried your regex, but it accepts any kind of string. I tried it with zzzzzzzzzzzzzzzzz and that was accepted.

I just copied and pasted your Regex creation.

Am I doing something wrong?

Thanks

Claudio

claudio32 at 2007-8-31
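
A note on why that happens: the pattern from post #1 is two groups that are both optional, so it can match the empty string at the start of any input, and IsMatch then returns true for everything. Even anchored with ^ and $, a bare word like zzzz would still pass, because the first group accepts any run of URL characters with no scheme or dot required. The sketch below shows the effect and one possible stricter check. The Strict pattern and the names UrlCheck and LooksLikeUrl are my own assumptions, not anything from the posts above; it requires at least one dot-separated host label and does not allow a label to start or end with a hyphen.

```csharp
using System.Text.RegularExpressions;

public static class UrlCheck
{
    // The pattern from post #1, unchanged, written as a verbatim string.
    public const string Original =
        @"(([a-zA-Z][0-9a-zA-Z+\-\.]*:)?/{0,2}[0-9a-zA-Z;/?:@&=+$\.\-_!~*'()%]+)?(#[0-9a-zA-Z;/?:@&=+$\.\-_!~*'()%]+)?";

    // Hypothetical stricter pattern: optional http/https/ftp scheme,
    // one or more host labels followed by dots, a top-level domain of
    // letters, then an optional path. Anchored to the whole input.
    private static readonly Regex Strict = new Regex(
        @"^((https?|ftp)://)?([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}(/\S*)?$",
        RegexOptions.IgnoreCase);

    public static bool LooksLikeUrl(string input)
    {
        return Strict.IsMatch(input);
    }
}
```

With this, Regex.IsMatch("", UrlCheck.Original) is true, which is the behaviour Claudio ran into, while UrlCheck.LooksLikeUrl rejects zzzzzzzzzzzzzzzzz and -url.com but accepts url.com and http://www.url.com.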