Regex and URLs
I need a regex pattern that can extract URLs from the clipboard. The clipboard data could be plain text, HTML, or even RTF.
Possibly three different patterns?
Any ideas?
Has anyone done this sort of thing before?
There are many problems, of course: for example, invalid characters in a URL, and the rule that a URL can't start with a hyphen but can contain one, etc.
All of these need to be matched!
http://www.url.com
http://url.com
url.com
www.url.com
(and each one could have a path appended, e.g. /file/file.html)
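A minimal sketch of a single pattern covering the four forms above, in C# like the replies in this thread. UrlFinder, Url and Find are hypothetical names. It assumes a URL is an optional http:// or https:// prefix, dot-separated host labels that may contain a hyphen but not start or end with one, a final all-letter label such as "com", and an optional path. Because it has no list of real top-level domains, it will also pick up things like file.html in ordinary text.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class UrlFinder
{
    // Optional "http://" or "https://",
    // one or more host labels each followed by a dot (letters/digits,
    // inner hyphens allowed, but not at the start or end of a label),
    // a final label of at least two letters (e.g. "com"),
    // and an optional path such as "/file/file.html".
    public static readonly Regex Url = new Regex(
        @"\b(?:https?://)?(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}(?:/[\w\-./?%&=]*)?",
        RegexOptions.IgnoreCase);

    // Returns every URL-looking substring of text, in order of appearance.
    public static List<string> Find(string text)
    {
        var found = new List<string>();
        foreach (Match m in Url.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

For example, Find("Visit http://www.url.com or url.com/file/file.html today") should return http://www.url.com and url.com/file/file.html.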
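Try this:
using System.Text.RegularExpressions;

// Optional scheme (e.g. "http:"), up to two slashes, the URL body, then an optional #fragment.
Regex theRegexURL = new Regex("(([a-zA-Z][0-9a-zA-Z+\\-\\.]*:)?/{0,2}[0-9a-zA-Z;/?:@&=+$\\.\\-_!~*'()%]+)?(#[0-9a-zA-Z;/?:@&=+$\\.\\-_!~*'()%]+)?");
if (theRegexURL.IsMatch(theUrlInput))
{
    // url matches
}
else
{
    // url does not match
}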
Hi,
There are many ways to do this. Try this function; you need to pass it a suitable regular expression, e.g.
http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)? or
([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
using System.Text;
using System.Text.RegularExpressions;

public string Locate(string pattern, string TextFromClipBoard)
{
    StringBuilder sb = new StringBuilder();
    // Compile the regular expression.
    Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
    // Match the regular expression pattern against the clipboard text.
    Match m = r.Match(TextFromClipBoard);
    while (m.Success)
    {
        // Use the whole match; the inner groups only hold parts of the URL
        // (such as the path), which should not be reported on their own.
        sb.AppendLine("URL:" + m.Value);
        m = m.NextMatch();
    }
    return sb.ToString();
}
Thanks for the help, guys. This has put me on the right path, sort of.
Now I just need to get all URLs,
strip the scheme from each (e.g. "http://") so they can be compared, and check for duplicates, so that if there's both http://www.startmenuex.com and www.startmenuex.com, only the former is kept.
I'm sure I can do this myself. Hmm, I probably should have looked at the source of some forum software, because this forum's software does it quite well... Is anyone here on the dev team for this forum software?
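The de-duplication step described above could be sketched like this (C#, with hypothetical names UrlDeduper and Dedupe). It assumes two URLs are duplicates when they are identical after a leading scheme such as http:// or ftp:// is stripped, and that the form with a scheme should win whichever order they appear in. The comparison is case-sensitive, which is a simplification: host names are case-insensitive, but paths may not be.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class UrlDeduper
{
    // Matches a leading scheme such as "http://", "https://" or "ftp://".
    static readonly Regex Scheme = new Regex(@"^[a-z][a-z0-9+.\-]*://", RegexOptions.IgnoreCase);

    // Removes duplicates, treating URLs as equal once the scheme is stripped.
    // Keeps first-seen order; if a later duplicate has a scheme and the kept
    // one does not, the kept one is replaced by the version with the scheme.
    public static List<string> Dedupe(IEnumerable<string> urls)
    {
        var order = new List<string>();                 // keys in first-seen order
        var best = new Dictionary<string, string>();    // key -> URL to report
        foreach (string url in urls)
        {
            string key = Scheme.Replace(url, "");
            bool hasScheme = key.Length != url.Length;
            if (!best.TryGetValue(key, out string kept))
            {
                best[key] = url;
                order.Add(key);
            }
            else if (hasScheme && !Scheme.IsMatch(kept))
            {
                best[key] = url;   // upgrade "www.x.com" to "http://www.x.com"
            }
        }

        var result = new List<string>();
        foreach (string key in order)
        {
            result.Add(best[key]);
        }
        return result;
    }
}
```

With the input www.startmenuex.com, http://www.startmenuex.com, only http://www.startmenuex.com is returned, in the position where the first of the two appeared.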
Hmm. Perhaps I could paste the clipboard text into a hidden WebBrowser control and find the URLs in there. Any thoughts on this?
(By the way, I'm looking for ftp://fff.fff.com etc. as well. Hey, the forum software even highlights that!)
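One way to get closer to what the forum highlights, sketched here as an assumption rather than how this forum actually does it: only treat text as a URL when it starts with an explicit http://, https:// or ftp:// scheme or with www., and then take everything up to the next whitespace, quote or angle bracket. SchemeUrls and Find are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class SchemeUrls
{
    // An explicit http, https or ftp scheme, or a bare "www." prefix,
    // followed by a run of characters that are not whitespace, quotes or
    // angle brackets (these commonly delimit URLs in text and HTML).
    static readonly Regex Url = new Regex(
        @"\b(?:(?:https?|ftp)://|www\.)[^\s""'<>]+",
        RegexOptions.IgnoreCase);

    // Returns every match in text, in order of appearance.
    public static List<string> Find(string text)
    {
        var found = new List<string>();
        foreach (Match m in Url.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

Because it stops at quotes and angle brackets, it also pulls URLs out of href attributes when the clipboard holds HTML, but it keeps trailing punctuation such as a sentence-ending period.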
Hi,
I tried your regex, but it accepts any kind of string. I tried it with zzzzzzzzzzzzzzzzz and it was accepted.
I just copied and pasted the line that creates your Regex.
Am I doing something wrong?
Thanks
Claudio
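The reason is that every part of the pattern in the first reply is wrapped in (...)?, so the regex can match an empty string and IsMatch returns true for any input, even "". The sketch below compares it with a hypothetical stricter check (UrlCheck and Strict are illustrative names). The strict version is anchored with ^ and $ so the whole string must match, and it assumes an explicit http, https or ftp scheme is required; for extracting URLs from a larger block of clipboard text you would drop the anchors.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class UrlCheck
{
    // The pattern posted earlier in this thread. Every part is optional,
    // so it can match a zero-length string and IsMatch succeeds on any input.
    public static readonly Regex Original = new Regex(
        "(([a-zA-Z][0-9a-zA-Z+\\-\\.]*:)?/{0,2}[0-9a-zA-Z;/?:@&=+$\\.\\-_!~*'()%]+)?(#[0-9a-zA-Z;/?:@&=+$\\.\\-_!~*'()%]+)?");

    // Stricter check for validating a single string: anchored with ^ and $,
    // a required scheme, a host with at least one dot, and an optional path.
    public static readonly Regex Strict = new Regex(
        @"^(?:https?|ftp)://(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}(?:/[^\s]*)?$",
        RegexOptions.IgnoreCase);
}
```

Here Original.IsMatch("zzzzzzzzzzzzzzzzz") is true, while Strict.IsMatch("zzzzzzzzzzzzzzzzz") is false and Strict.IsMatch("http://www.url.com/file/file.html") is true.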