Remove the HTML tags associated with a class

DᴀʀᴛʜVᴀᴅᴇʀ

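I'm forcing myself to learn to write scripts using only AppleScript, but I'm currently stuck trying to remove a specific tag that has a class. I've looked for solid documentation and examples, but so far they seem very limited.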

Here is the HTML I have:

<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl <span class="foo">shoulder</span> biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class="bar">Pig brisket</span> jowl ham pastrami <span class="foo">jerky</span> strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>

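What I want to do is remove the tags with a specific class. Here that means removing every <span class="foo"> together with its closing </span>, while keeping the text inside, resulting in: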

<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl shoulder biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class="bar">Pig brisket</span> jowl ham pastrami jerky strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>

I know how to do this with do shell script and in Terminal, but I want to learn what is available in AppleScript's own dictionary.
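The main built-in tool plain AppleScript offers for this kind of text surgery is text item delimiters: set them to a string and ask for text items to split on that string, and coerce a list to text to join it back with the current delimiter. A minimal sketch on a made-up sample string:

```applescript
-- Save the current delimiters so other code is not affected
set savedTIDs to AppleScript's text item delimiters

-- Split on the opening tag
set AppleScript's text item delimiters to "<span class=\"foo\">"
set theParts to text items of "a <span class=\"foo\">b</span> c"
-- theParts is {"a ", "b</span> c"}

-- Join the pieces back with an empty delimiter, which drops the opening tag
set AppleScript's text item delimiters to ""
set joinedText to theParts as text
-- joinedText is "a b</span> c"

-- Restore whatever delimiters were in effect before
set AppleScript's text item delimiters to savedTIDs
```

Removing the matching </span> takes a second split on the closing tag, which is what the answer below builds on.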

While researching, I found a way to strip all HTML tags with the following:

on removeMarkupFromText(theText)
    set tagDetected to false -- true while we are inside a <...> tag
    set theCleanText to ""
    repeat with a from 1 to length of theText
        set theCurrentCharacter to character a of theText
        if theCurrentCharacter is "<" then
            set tagDetected to true -- start of a tag: stop copying characters
        else if theCurrentCharacter is ">" then
            set tagDetected to false -- end of a tag: resume copying
        else if tagDetected is false then
            set theCleanText to theCleanText & theCurrentCharacter
        end if
    end repeat
    return theCleanText
end removeMarkupFromText

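But this removes all HTML markup, which isn't what I want. Searching SO, I found how to extract text between tags by parsing the HTML source with AppleScript, but I don't want to parse the file.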

I'm familiar with BBEdit's Balance Tags, which appears as Balance in the drop-down menu, but when I run:

tell application "BBEdit"
    activate
    find "<span class=\"foo\">" searching in text 1 of text document "test.html" options {search mode:grep, wrap around:true} with selecting match
    balance tags
end tell

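it gets greedy and grabs the whole line of text between the first opening tag and the second-to-last closing tag, instead of isolating just the first tag and its text.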
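For comparison, a non-greedy regular expression avoids exactly this: in <span class="foo">(.*?)</span> the .*? stops at the first </span> after each opening tag. The sketch below is not the BBEdit approach from the question; it assumes macOS 10.10 or later, where plain scripts can call Foundation's NSRegularExpression through AppleScriptObjC, and removeSpansWithClass is a hypothetical handler name:

```applescript
use AppleScript version "2.4" -- macOS 10.10+, needed for AppleScriptObjC in plain scripts
use framework "Foundation"
use scripting additions

-- Hypothetical helper: unwraps every <span class="className">...</span>,
-- keeping the inner text. ".*?" is non-greedy, so each match ends at the
-- first </span> after its opening tag rather than the last one on the line.
on removeSpansWithClass(theHTML, className)
    -- Escape the class name so characters such as "." are matched literally
    set escapedClass to (current application's NSRegularExpression's escapedPatternForString:className) as text
    set thePattern to "<span class=\"" & escapedClass & "\">(.*?)</span>"
    set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
    set theString to current application's NSString's stringWithString:theHTML
    -- "$1" refers to capture group 1, i.e. the text inside the span
    set theResult to theRegex's stringByReplacingMatchesInString:theString options:0 range:{0, theString's |length|()} withTemplate:"$1"
    return theResult as text
end removeSpansWithClass
```

For example, removeSpansWithClass(theHTML, "foo") on the HTML above should leave the "bar" span intact. Two limits of this sketch: by default . does not match line breaks, so a span split across lines would need the NSRegularExpressionDotMatchesLineSeparators option, and a nested </span> inside a "foo" span would end the match early.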

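I did some further research under tag in the dictionary and found that find tag can do set spanTarget to (find tag "span" start_offset counter). I then got the class as |class| of attributes of tag of spanTarget and used balance tags, but I still ran into the same problem as before.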

So, in AppleScript, how can I remove the tags associated with a class without the match being greedy?

ThrowBackDewd

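I think Ron's answer is a good approach, but if you don't want to use regular expressions, you can do it with the code below. After seeing Ron's answer I wasn't going to post it, but since I had already written it, I figured I'd at least give you a second option, since you're trying to learn.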

on run
    set theHTML to "<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl <span class=\"foo\">shoulder</span> biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class=\"bar\">Pig brisket</span> jowl ham pastrami <span class=\"foo\">jerky</span> strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>"
    set theHTML to removeTag(theHTML, "<span class=\"foo\">", "</span>")
    return theHTML
end run

-- Removes every startTag and the first endTag that follows it, keeping the text in between.
on removeTag(theText, startTag, endTag)
    if theText does not contain startTag then return theText
    set savedTIDs to AppleScript's text item delimiters

    -- Split on the opening tag; item 2 is the text right after its first occurrence
    set AppleScript's text item delimiters to startTag
    set tempText to text items of theText
    set AppleScript's text item delimiters to {""}

    set middleText to (item 2 of tempText) as string
    if middleText contains endTag then
        -- Drop only the first closing tag after the opening tag (non-greedy)
        set AppleScript's text item delimiters to endTag
        set tempText2 to text items of middleText
        set AppleScript's text item delimiters to {""}
        set item 2 of tempText to implode(tempText2, endTag)
    end if

    -- Rejoin, dropping the first opening tag but keeping any later ones
    set newString to implode(tempText, startTag)
    set AppleScript's text item delimiters to savedTIDs
    return removeTag(newString, startTag, endTag) -- recurse until no startTag remains
end removeTag

-- Joins items 1 and 2 of parts directly, then joins any remaining items with tag.
on implode(parts, tag)
    set savedTIDs to AppleScript's text item delimiters
    set AppleScript's text item delimiters to {""}
    set newString to (items 1 thru 2 of parts) as string
    if (count of parts) > 2 then
        set AppleScript's text item delimiters to tag
        set newString to newString & tag & ((items 3 thru -1 of parts) as string)
    end if
    set AppleScript's text item delimiters to savedTIDs
    return newString
end implode
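The same text item delimiter idea also works in a single pass without recursion: split once on the opening tag, then in every piece after the first remove only the first closing tag. A sketch; removeTagPairs is a hypothetical name, and like the code above it assumes the "foo" spans contain no nested </span>:

```applescript
-- Hypothetical helper: removes every startTag plus the first endTag after it,
-- keeping the text in between. Single pass, no recursion.
on removeTagPairs(theText, startTag, endTag)
    if theText does not contain startTag then return theText
    set savedTIDs to AppleScript's text item delimiters
    -- Piece 1 is the text before the first startTag; every later piece
    -- begins right after one startTag
    set AppleScript's text item delimiters to startTag
    set pieces to text items of theText
    set cleanedPieces to {item 1 of pieces}
    repeat with i from 2 to count of pieces
        set thePiece to item i of pieces
        set AppleScript's text item delimiters to endTag
        set subPieces to text items of thePiece
        if (count of subPieces) > 1 then
            -- Rejoin items 2..end with endTag, then glue item 1 on directly:
            -- this drops exactly the first endTag in the piece
            set tailText to (items 2 thru -1 of subPieces) as text
            set thePiece to (item 1 of subPieces) & tailText
        end if
        set end of cleanedPieces to thePiece
    end repeat
    -- Join with an empty delimiter so the startTags disappear
    set AppleScript's text item delimiters to ""
    set theResult to cleanedPieces as text
    set AppleScript's text item delimiters to savedTIDs
    return theResult
end removeTagPairs
```

It is called the same way as removeTag, e.g. removeTagPairs(theHTML, "<span class=\"foo\">", "</span>"). Saving and restoring the delimiters keeps the handler from changing behavior elsewhere in the script.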
