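I have been forcing myself to learn how to script using only AppleScript, but I am currently stuck trying to remove a specific tag that carries a particular class. I have tried to find solid documentation and examples, but there seems to be very little available.
This is the HTML I have: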
<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl <span class="foo">shoulder</span> biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class="bar">Pig brisket</span> jowl ham pastrami <span class="foo">jerky</span> strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>
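What I want to do is remove the tags for one specific class, so that <span class="foo"> and its closing tag are stripped while the text inside is kept, resulting in: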
<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl shoulder biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class="bar">Pig brisket</span> jowl ham pastrami jerky strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>
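I know how to do this with do shell script and through the Terminal, but I want to learn what is available in the AppleScript dictionary.
While researching, I found a way to strip all HTML tags with the following: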
on removeMarkupFromText(theText)
    set tagDetected to false
    set theCleanText to ""
    repeat with a from 1 to length of theText
        set theCurrentCharacter to character a of theText
        if theCurrentCharacter is "<" then
            set tagDetected to true
        else if theCurrentCharacter is ">" then
            set tagDetected to false
        else if tagDetected is false then
            set theCleanText to theCleanText & theCurrentCharacter as string
        end if
    end repeat
    return theCleanText
end removeMarkupFromText
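But that removes all of the HTML tags, which is not what I want. Searching SO, I found how to extract the text between tags by parsing HTML source code with AppleScript, but I am not looking to parse the file.
I am familiar with BBEdit's Balance Tags, which is called Balance in the drop-down menu, but when I run: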
tell application "BBEdit"
    activate
    find "<span class=\"foo\">" searching in text 1 of text document "test.html" options {search mode:grep, wrap around:true} with selecting match
    balance tags
end tell
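it gets greedy and grabs the entire line of text between the first opening tag and the second-to-last closing tag, rather than isolating itself to the first tag and its text.
Researching further under tag in the dictionary, I did come across find tag, which lets me do:
set spanTarget to (find tag "span" start_offset counter)
I can then target the tag by class with |class| of attributes of tag of spanTarget and use balance tags, but I still run into the same issue as before.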
So, in pure AppleScript, how can I remove a tag associated with a class without being greedy?
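I believe Ron's answer is a good way to go, but if you do not want to use regular expressions, you can do it with the code below. After seeing Ron's answer I was not going to post this, but I had already written it, so I figured I would at least give you a second option since you are trying to learn.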
on run
    set theHTML to "<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl <span class=\"foo\">shoulder</span> biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class=\"bar\">Pig brisket</span> jowl ham pastrami <span class=\"foo\">jerky</span> strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>"
    set theHTML to removeTag(theHTML, "<span class=\"foo\">", "</span>")
    return theHTML -- make the cleaned HTML the script's result
end run
on removeTag(theText, startTag, endTag)
    if theText contains startTag then
        -- Split on the start tag; item 2 is the text that follows its first occurrence
        set AppleScript's text item delimiters to startTag
        set tempText to text items of (theText as string)
        set AppleScript's text item delimiters to {""}
        set middleText to item 2 of tempText as string
        if middleText contains endTag then
            -- Drop only the first end tag after the start tag; implode puts any later ones back
            set AppleScript's text item delimiters to endTag
            set tempText2 to text items of (middleText as string)
            set AppleScript's text item delimiters to {""}
            set newString to implode(tempText2, endTag)
            set item 2 of tempText to newString
        end if
        -- Rejoin, dropping the first start tag and keeping the rest for the next pass
        set newString to implode(tempText, startTag)
        return removeTag(newString, startTag, endTag) -- recursive
    else
        return theText
    end if
end removeTag
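on implode(parts, tag)
    -- Join the first two parts with no delimiter, which removes one occurrence of tag
    set AppleScript's text item delimiters to {""}
    set newString to items 1 thru 2 of parts as string
    if (count of parts) > 2 then
        -- Put tag back between the remaining parts
        set newList to {newString} & (items 3 thru -1 of parts)
        set AppleScript's text item delimiters to tag
        set newString to (newList as string)
        set AppleScript's text item delimiters to {""}
    end if
    return newString
end implode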
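The same split-and-rejoin idea can also be written without recursion. What follows is only a minimal sketch, assuming the tags are not nested inside one another. The handler name stripTag and its variable names are made up for illustration and do not come from any dictionary.

```applescript
-- Hypothetical handler: removes every startTag together with the first endTag that follows it
on stripTag(theText, startTag, endTag)
    set savedDelimiters to AppleScript's text item delimiters
    -- Every piece after the first one began right after a start tag
    set AppleScript's text item delimiters to startTag
    set pieces to text items of theText
    repeat with i from 2 to (count of pieces)
        set AppleScript's text item delimiters to endTag
        set subPieces to text items of (item i of pieces)
        if (count of subPieces) > 1 then
            -- Glue the first two sub-pieces together, which drops exactly one end tag
            set firstPart to (item 1 of subPieces) & (item 2 of subPieces)
            -- "rest of" yields an empty list when nothing is left, so no bounds check is needed
            set remainingParts to {firstPart} & (rest of (rest of subPieces))
            -- The delimiter is still endTag, so later end tags are put back as they were
            set item i of pieces to (remainingParts as text)
        end if
    end repeat
    -- Join the pieces with nothing in between, so the start tags disappear
    set AppleScript's text item delimiters to ""
    set cleanedText to pieces as text
    set AppleScript's text item delimiters to savedDelimiters
    return cleanedText
end stripTag

set cleaned to stripTag("x <span class=\"foo\">y</span> z", "<span class=\"foo\">", "</span>")
-- cleaned is "x y z"
```

Because each start tag is paired only with the first end tag after it, a <span class="bar"> element that sits between two foo spans keeps both of its tags.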