[EDITED] I'm using Java regular expressions and I don't want to match some files.
I'm trying:
String regexp = "https?:://[[\\S]&&[^\"]]+(?!.*(.ico|.jpg|.css)"
I have a list of links from many websites. The links end in *.html, *.asp, *.jpg, *.gif. I want to use a Java regular expression to match everything but *.jpg, *.gif, *.ico.
Can someone give me an idea?
Sorry, I'm not fluent in English. Hope you can understand me. Thanks!!!
Here is an example of a small program that matches the links in a string but excludes specific extensions.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] args) {
        // Group 1 captures the URL: one or more characters that are neither
        // whitespace nor a double quote. The negative lookbehind rejects a URL
        // that ends in .ico, .jpg, or .css. The final character class requires
        // a delimiter (whitespace or a quote) right after the URL.
        String regex = "(https?://[^\\s\"]+(?<!\\.ico|\\.jpg|\\.css))[\\s\"]";
        String test_string = "http://www.regular-expressions.info/shorthand.html "
                + "http://www.regular-expressions.info/shorthand.html "
                + "http://www.regular-expressions.info/shorthand.css "
                + "http://www.regular-expressions.info/shorthand.ico "
                + "http://www.regular-expressions.info/shorthand.jpg "
                + "http://www.regular-expressions.info/shorthand.htm "
                + "http://www.regular-expressions.info/shorthand.jsp "
                + "http://www.regular-expressions.info/ ";
        Pattern pattern = Pattern.compile(regex);
        Matcher m = pattern.matcher(test_string);
        // Print only the captured URL, without the trailing delimiter.
        while (m.find()) {
            System.out.printf("Match: '%s'\n", m.group(1));
        }
    }
}
Here are the results:
Match: 'http://www.regular-expressions.info/shorthand.html'
Match: 'http://www.regular-expressions.info/shorthand.html'
Match: 'http://www.regular-expressions.info/shorthand.htm'
Match: 'http://www.regular-expressions.info/shorthand.jsp'
Match: 'http://www.regular-expressions.info/'
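If the links are already stored one per entry in a list, as the question describes, it may be simpler to test each link on its own with a negative lookahead. The following is only a sketch under that assumption: the class name LinkFilter and the methods isAllowed and filter are made up for illustration, and the excluded extensions (jpg, gif, ico) are taken from the question rather than from the program above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class LinkFilter {
    // (?i) makes the match case-insensitive, so .JPG is excluded as well.
    // The negative lookahead at the start rejects any link whose last
    // characters are .jpg, .gif, or .ico; the rest requires an http(s) URL
    // without whitespace or double quotes.
    private static final Pattern ALLOWED = Pattern.compile(
            "(?i)^(?!.*\\.(?:jpg|gif|ico)$)https?://[^\\s\"]+$");

    // Hypothetical helper: true when the whole string is an accepted link.
    public static boolean isAllowed(String link) {
        return ALLOWED.matcher(link).matches();
    }

    // Hypothetical helper: keeps only the accepted links, in their original order.
    public static List<String> filter(List<String> links) {
        List<String> kept = new ArrayList<>();
        for (String link : links) {
            if (isAllowed(link)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList(
                "http://www.regular-expressions.info/shorthand.html",
                "http://www.regular-expressions.info/shorthand.jpg",
                "http://www.regular-expressions.info/shorthand.gif",
                "http://www.regular-expressions.info/shorthand.asp");
        for (String link : filter(links)) {
            System.out.println("Kept: " + link);
        }
    }
}
```

This sketch only looks at the very end of the link, so a URL such as image.jpg?size=1 would not be excluded; handling query strings would need a different pattern.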