我正在尝试使用正则表达式从html字符串中获取文本,这是我目前正在处理的:
extension String {
func regex (pattern: String) -> [String] {
do {
let regex = try NSRegularExpression(pattern: pattern, options: NSRegularExpressionOptions(rawValue: 0))
let nsstr = self as NSString
let all = NSRange(location: 0, length: nsstr.length)
var matches : [String] = [String]()
regex.enumerateMatchesInString(self, options: NSMatchingOptions(rawValue: 0), range: all) {
(result : NSTextCheckingResult?, _, _) in
if let r = result {
let result = nsstr.substringWithRange(r.range) as String
matches.append(result)
}
}
return matches
} catch {
return [String]()
}
}
模式是: <img[^>]+src\\s*=\\s*['\']([^'\"]+)['\"][^>]*>
我仍然无法从中获取图片网址,这意味着它返回了一个空数组。实际上我的html字符串中包含一张图片。UIWebView
由于UITableView
尺寸调整问题,我不想使用它。因此,我需要从中获取图片网址html并使用AlamofireImage在UIImageView中显示它。
任何帮助吗?这只是我需要获取的一个网址。
这是我的标签:
<img src="https://en.wikipedia.org/wiki/File:BH_LMC.png"/>
到 :
https://en.wikipedia.org/wiki/File:BH_LMC.png
<img\b(?=\s)(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\ssrc=['"]([^"]*)['"]?)(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"\s?\/?>
此正则表达式将执行以下操作:
现场演示
https://regex101.com/r/qW9nG8/1
示范文本
请注意第一行中我们要查找特定机器人的困难边缘情况。
<img onmouseover=' if ( 6 > 3 { funSwap(" src="NotTheDroidYourLookingFor.jpg", 6 > 3 ) } ; ' src="http://website/ThisIsTheDroidYourLookingFor.jpeg" onload="img_onload(this);" onerror="img_onerror(this);" data-pid="jihgfedcba" data-imagesize="ppew" />
some text
<img src="http://website/someurl.jpeg" onload="img_onload(this);" />
more text
<img src="https://en.wikipedia.org/wiki/File:BH_LMC.png"/>
比赛样本
[0][0] = <img onmouseover=' funSwap(" src='NotTheDroidYourLookingFor.jpg", data-pid) ; ' src="http://website/ThisIsTheDroidYourLookingFor.jpeg" onload="img_onload(this);" onerror="img_onerror(this);" data-pid="jihgfedcba" data-imagesize="ppew" />
[0][1] = http://website/ThisIsTheDroidYourLookingFor.jpeg
[1][0] = <img src="http://website/someurl.jpeg" onload="img_onload(this);" />
[1][1] = http://website/someurl.jpeg
[2][0] = <img src="https://en.wikipedia.org/wiki/File:BH_LMC.png"/>
[2][1] = https://en.wikipedia.org/wiki/File:BH_LMC.png
NODE EXPLANATION
----------------------------------------------------------------------
<img '<img'
----------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
----------------------------------------------------------------------
[^>=] any character except: '>', '='
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
=' '=\''
----------------------------------------------------------------------
[^']* any character except: ''' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
' '\''
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
=" '="'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
= '='
----------------------------------------------------------------------
[^'"] any character except: ''', '"'
----------------------------------------------------------------------
[^\s>]* any character except: whitespace (\n,
\r, \t, \f, and " "), '>' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
)*? end of grouping
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
src= 'src='
----------------------------------------------------------------------
['"] any character of: ''', '"'
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
['"]? any character of: ''', '"' (optional
(matching the most amount possible))
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
[^>=] any character except: '>', '='
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
=' '=\''
----------------------------------------------------------------------
[^']* any character except: ''' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
' '\''
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
=" '="'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
= '='
----------------------------------------------------------------------
[^'"\s]* any character except: ''', '"',
whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
\s? whitespace (\n, \r, \t, \f, and " ")
(optional (matching the most amount
possible))
----------------------------------------------------------------------
\/? '/' (optional (matching the most amount
possible))
----------------------------------------------------------------------
> '>'
----------------------------------------------------------------------
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句