在Tkinter文本框中突出显示两个xml文件之间的差异

debugcn 发表于 Dev

干旱化

我尝试了各种逻辑和方法，甚至在Google上搜索了很多，但仍无法为我所遇到的问题想出任何令人满意的答案。我编写了一个程序，如下所示，以突出显示我遇到一些问题的特定xml代码。很抱歉使这篇文章过长。我只想清楚地说明我的问题。

编辑：为了在给定的程序下运行，您将需要两个xml文件，它们在这里：sample1和sample2。保存此文件，然后在下面的代码中编辑要保存文件的位置C:/Users/editThisLocation/Desktop/sample1.xml

from lxml import etree
from collections import defaultdict
from collections import OrderedDict
from distutils.filelist import findall
from lxml._elementpath import findtext

from Tkinter import * 
import Tkinter as tk
import ttk

root = Tk()

class CustomText(tk.Text):

    def __init__(self, *args, **kwargs):
        tk.Text.__init__(self, *args, **kwargs)


    def highlight_pattern(self, pattern, tag, start, end,
                          regexp=True):

        start = self.index(start)
        end = self.index(end)
        self.mark_set("matchStart", start)
        self.mark_set("matchEnd", start)
        self.mark_set("searchLimit", end)

        count = tk.IntVar()
        while True:
            index = self.search(pattern, "matchEnd","searchLimit",
                                count=count, regexp=regexp)
            if index == "": break
            self.mark_set("matchStart", index)
            self.mark_set("matchEnd", "%s+%sc" % (index, count.get()))
            self.tag_add(tag, "matchStart", "matchEnd")

    def Remove_pattern(self, pattern, tag, start="1.0", end="end",
                          regexp=True):

        start = self.index(start)
        end = self.index(end)
        self.mark_set("matchStart", start)
        self.mark_set("matchEnd", start)
        self.mark_set("searchLimit", end)

        count = tk.IntVar()
        while True:
            index = self.search(pattern, "matchEnd","searchLimit",
                                count=count, regexp=regexp)
            if index == "": break
            self.mark_set("matchStart", index)
            self.mark_set("matchEnd", "%s+%sc" % (index, count.get()))
            self.tag_remove(tag, start, end)



recovering_parser = etree.XMLParser(recover=True)


sample1File = open('C:/Users/editThisLocation/Desktop/sample1.xml', 'r')
contents_sample1 = sample1File.read()

sample2File = open('C:/Users/editThisLocation/Desktop/sample2.xml', 'r')
contents_sample2 = sample2File.read()


frame1 = Frame(width=768, height=25, bg="#000000", colormap="new")
frame1.pack()
Label(frame1, text="sample 1 below - scroll to see more").pack()

textbox = CustomText(root)
textbox.insert(END,contents_sample1)
textbox.pack(expand=1, fill=BOTH)

frame2 = Frame(width=768, height=25, bg="#000000", colormap="new")
frame2.pack()
Label(frame2, text="sample 2 below - scroll to see more").pack()


textbox1 = CustomText(root)
textbox1.insert(END,contents_sample2)
textbox1.pack(expand=1, fill=BOTH)

sample1 = etree.parse("C:/Users/editThisLocation/Desktop/sample1.xml", parser=recovering_parser).getroot()
sample2 = etree.parse("C:/Users/editThisLocation/Desktop/sample2.xml", parser=recovering_parser).getroot()

ToStringsample1 = etree.tostring(sample1)
sample1String = etree.fromstring(ToStringsample1, parser=recovering_parser)

ToStringsample2 = etree.tostring(sample2)
sample2String = etree.fromstring(ToStringsample2, parser=recovering_parser)

timesample1 = sample1String.findall('{http://www.example.org/eHorizon}time')
timesample2 =  sample2String.findall('{http://www.example.org/eHorizon}time')

for i,j in zip(timesample1,timesample2):       

    for k,l in zip(i.findall("{http://www.example.org/eHorizon}feature"), j.findall("{http://www.example.org/eHorizon}feature")):

        if [k.attrib.get('color'), k.attrib.get('type')] != [l.attrib.get('color'), l.attrib.get('type')]:

            faultyLine = [k.attrib.get('color'), k.attrib.get('type'), k.text]


def high(event):

    textbox.tag_configure("yellow", background="yellow")
    limit_1 = '<p1:time nTimestamp="{0}">'.format(5)     #limit my search between timestamp 5 and timestamp 6 
    limit_2 = '<p1:time nTimestamp="{0}">'.format((5+1)) # timestamp 6

    highlightString = '<p1:feature color="{0}" type="{1}">{2}</p1:feature>'.format(faultyLine[0],faultyLine[1],faultyLine[2]) #string to be highlighted

    textbox.highlight_pattern(limit_1, "yellow", start=textbox.search(limit_1, '1.0', stopindex=END), end=textbox.search(limit_2, '1.0', stopindex=END))
    textbox.highlight_pattern(highlightString, "yellow", start=textbox.search(limit_1, '1.0', stopindex=END), end=textbox.search(limit_2, '1.0', stopindex=END))


button = 'press here to highlight error line' 
c = ttk.Label(root, text=button)
c.bind("<Button-1>",high)
c.pack()  

root.mainloop()

我想要的是

如果运行上面的代码，它将显示以下输出：

如您在图像中看到的，我只打算突出显示带有绿色勾号的代码。你们中有些人可能会考虑限制开始和结束索引以突出显示该模式。但是，如果您在我的程序中看到了，我已经在使用开始和结束索引来限制我的输出为only nTimestamp="5"，为此我正在使用limit_1andlimit_2变量。

那么，在这种类型的数据中，如何正确地从多个内部个体中突出显示一个模式nTimestamp？

编辑：在这里，我特别想突出显示第3个项目，nTimestamp="5"因为sample2.xml您没有在两个xml文件中看到该项目，因此在程序运行时它也对此有所区别。唯一的问题是突出显示正确的项目，在我看来，这是第3个项目。

我使用高亮类从布赖恩Oakley的代码在这里

编辑最近

考虑到科比约翰在下面的评论中提出的要求，目标文件永远不会为空。目标文件总是有可能包含多余或缺失的元素。最后，我当前的意图是仅突出显示不同或缺失的深层元素timestamps。但是，高亮显示timestamps正确完成了，但是高亮显示上述深层元素的问题仍然是一个问题。谢谢kobejohn澄清了这一点。

笔记：

我知道的一种方法，您可能会建议正确工作的方法是提取绿色勾号图案的索引，然后在其上运行突出显示标签，但是这种方法很难编码，并且在大数据中需要处理很多变化是完全无效的。我正在寻找另一个更好的选择。

神户约翰

此解决方案通过在您提供的描述之间base.xml并test.xml基于您提供的描述进行简化的区别来工作。差异结果是结合原始树的第三棵XML树。输出是差异，文件之间不匹配的行以颜色编码突出显示。

我希望您可以使用它或使其适应您的需求。

复制粘贴脚本

import copy
from lxml import etree
import Tkinter as tk


# assumption: the root element of both trees is the same
# note: missing subtrees will only have the parent element highlighted


def element_content_equal(e1, e2):
    # starting point here: http://stackoverflow.com/a/24349916/377366
    try:
        if e1.tag != e1.tag:
            return False
        elif e1.text != e2.text:
            return False
        elif e1.tail != e2.tail:
            return False
        elif e1.attrib != e2.attrib:
            return False
    except AttributeError:
        # e.g. None is passed in for an element
        return False
    return True


def element_is_in_sequence(element, sequence):
    for e in sequence:
        if element_content_equal(e, element):
            return True
    return False


def copy_element_without_children(element):
    e_copy = etree.Element(element.tag, attrib=element.attrib, nsmap=element.nsmap)
    e_copy.text = element.text
    e_copy.tail = element.tail
    return e_copy


# start at the root of both xml trees
parser = etree.XMLParser(recover=True, remove_blank_text=True)
base_root = etree.parse('base.xml', parser=parser).getroot()
test_root = etree.parse('test.xml', parser=parser).getroot()
# each element from the original xml trees will be placed into a merge tree
merge_root = copy_element_without_children(base_root)


# additionally each merge tree element will be tagged with its source
DIFF_ATTRIB = 'diff'
FROM_BASE_ONLY = 'base'
FROM_TEST_ONLY = 'test'

# process the pair of trees, one set of parents at a time
parent_stack = [(base_root, test_root, merge_root)]
while parent_stack:
    base_parent, test_parent, merge_parent = parent_stack.pop()
    base_children = base_parent.getchildren()
    test_children = test_parent.getchildren()

    # compare children and transfer to merge tree
    base_children_iter = iter(base_children)
    test_children_iter = iter(test_children)
    base_child = next(base_children_iter, None)
    test_child = next(test_children_iter, None)
    while (base_child is not None) or (test_child is not None):
        # first handle the case of a unique base child
        if (base_child is not None) and (not element_is_in_sequence(base_child, test_children)):
            # base_child is unique: deep copy with base only tag
            merge_child = copy.deepcopy(base_child)
            merge_child.attrib[DIFF_ATTRIB] = FROM_BASE_ONLY
            merge_parent.append(merge_child)
            # this unique child has already been fully copied to the merge tree so it doesn't go on the stack
            # only move the base child since test child hasn't been handled yet
            base_child = next(base_children_iter, None)
        elif (test_child is not None) and (not element_is_in_sequence(test_child, base_children)):
            # test_child is unique: deep copy with base only tag
            merge_child = copy.deepcopy(test_child)
            merge_child.attrib[DIFF_ATTRIB] = FROM_TEST_ONLY
            merge_parent.append(merge_child)
            # this unique child has already been fully copied to the merge tree so it doesn't go on the stack
            # only move test child since base child hasn't been handled yet
            test_child = next(test_children_iter, None)
        elif element_content_equal(base_child, test_child):
            # both trees share the same element: shallow copy either child with shared tag
            merge_child = copy_element_without_children(base_child)
            merge_parent.append(merge_child)
            # put pair of children on stack as parents to be tested since their children may differ
            parent_stack.append((base_child, test_child, merge_child))
            # move on to next children in both trees since this was a shared element
            base_child = next(base_children_iter, None)
            test_child = next(test_children_iter, None)
        else:
            raise RuntimeError  # there is something wrong - element should be unique or shared.

# display merge_tree with highlighting to indicate source of each line
#   no highlight: common element in both trees
#   green: line that exists only in test tree (i.e. additional)
#   red: line that exists only in the base tree (i.e. missing)
root = tk.Tk()
textbox = tk.Text(root)
textbox.pack(expand=1, fill=tk.BOTH)
textbox.tag_config(FROM_BASE_ONLY, background='#ff5555')
textbox.tag_config(FROM_TEST_ONLY, background='#55ff55')

# find diff lines to highlight within merge_tree string that includes kludge attributes
merge_tree_string = etree.tostring(merge_root, pretty_print=True)
diffs_by_line = []
for line, line_text in enumerate(merge_tree_string.split('\n')):
    for diff_type in (FROM_BASE_ONLY, FROM_TEST_ONLY):
        if diff_type in line_text:
            diffs_by_line.append((line+1, diff_type))

# remove kludge attributes
for element in merge_root.iter():
    try:
        del(element.attrib[DIFF_ATTRIB])
    except KeyError:
        pass
merge_tree_string = etree.tostring(merge_root, pretty_print=True)

# highlight final lines
textbox.insert(tk.END, merge_tree_string)
for line, diff_type in diffs_by_line:
    textbox.tag_add(diff_type, '{}.0'.format(line), '{}.0'.format(int(line)+1))
root.mainloop()

输入：

请注意，我清理了xml，因为与原始XML的行为不一致。原始的基本上是使用反斜杠代替正斜杠，并且在打开标签上也使用了虚假的结束斜杠。

base.xml （与此脚本位于同一位置）

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<p1:sample1 xmlns:p1="http://www.example.org/eHorizon">
   <p1:time nTimestamp="5">
      <p1:location hours = "1" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
         <p1:feature color="2" type="a">564</p1:feature>
         <p1:feature color="3" type="b">570</p1:feature>
         <p1:feature color="4" type="c">570</p1:feature>
      </p1:location>
      <p1:location hours = "5" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
         <p1:feature color="7" type="b">570</p1:feature>
         <p1:feature color="8" type="c">580</p1:feature>
      </p1:location>
      <p1:location hours = "5" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
      </p1:location>
   </p1:time>
   <p1:time nTimestamp="6">
      <p1:location hours = "1" path = '1'>
         <p1:feature color="2" type="a">564</p1:feature>
         <p1:feature color="3" type="b">570</p1:feature>
         <p1:feature color="4" type="c">570</p1:feature>
      </p1:location>
      <p1:location hours = "5" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
         <p1:feature color="9" type="b">590</p1:feature>
         <p1:feature color="10" type="c">600</p1:feature>
      </p1:location>
      <p1:location hours = "5" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
         <p1:feature color="7" type="b">570</p1:feature>
         <p1:feature color="8" type="c">580</p1:feature>
      </p1:location>
   </p1:time>
</p1:sample1>

test.xml （与此脚本位于同一位置）

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<p1:sample1 xmlns:p1="http://www.example.org/eHorizon">
   <p1:time nTimestamp="5">
      <p1:location hours = "1" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
         <p1:feature color="2" type="a">564</p1:feature>
         <p1:feature color="3" type="b">570</p1:feature>
         <p1:feature color="4" type="c">570</p1:feature>
      </p1:location>
      <p1:location hours = "5" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
         <p1:feature color="7" type="b">570</p1:feature>
         <p1:feature color="8" type="c">580</p1:feature>
      </p1:location>
      <p1:location hours = "5" path = '1'>
         <p1:feature color="9" type="b">1111</p1:feature>
         <p1:feature color="10" type="c">2222</p1:feature>
      </p1:location>
   </p1:time>
   <p1:time nTimestamp="6">
      <p1:location hours = "1" path = '1'>
         <p1:feature color="2" type="a">564</p1:feature>
         <p1:feature color="3" type="b">570</p1:feature>
         <p1:feature color="4" type="c">570</p1:feature>
      </p1:location>
      <p1:location hours = "5" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
         <p1:feature color="9" type="b">590</p1:feature>
         <p1:feature color="10" type="c">600</p1:feature>
      </p1:location>
      <p1:location hours = "5" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
         <p1:feature color="7" type="b">570</p1:feature>
         <p1:feature color="8" type="c">580</p1:feature>
      </p1:location>
   </p1:time>
</p1:sample1>

本文收集自互联网，转载请注明来源。

如有侵权，请联系[email protected] 删除。

编辑于2021-06-13

我来说两句

0条评论

登录后参与评论

来自分类Dev

Related 相关文章

文章

在Tkinter文本框中突出显示两个xml文件之间的差异

在Tkinter文本框中突出显示两个xml文件之间的差异

输入：

我正在尝试在两个文本框中查找两次之间的差异

同时突出显示带有两个单独文本框的文本框文本？

如何查找并突出显示两个文本文件之间的差异？

两个日期的差异显示在另一个文本框中

在customMessageBox中显示两个文本框？

如何找到在文本框中输入的两个数字之间的差异？

如何找到在文本框中输入的两个数字之间的差异？

在第三个文本框中显示两个文本框文本

PHP比较两个CSV文件并突出显示差异

Bash脚本，用于显示两个文本文件之间的差异

如何在引导程序中的两个文本框之间留出空间

将两个输入与JavaScript相乘并在文本框中显示

在两个文本框中显示名字和姓氏？

将两个输入与JavaScript相乘并在文本框中显示

两个文件之间的差异显示空行

在marklogic中找到两个xml文件之间的差异

Python difflib 比较两个 csv 文件并突出显示 HTML 输出中的世界级差异

如何比较两个富文本框内容并突出显示已更改的字符？

将两个文本框值相乘并在第三个文本框中显示总和

Alfresco工作流程表单添加两个文本框值并自动在第三个文本框中显示总和

txt文件，python中两个值之间的差异

txt文件，python中两个值之间的差异

将同一文本框中的两个值相乘，并用符号分隔，并在其他文本框中显示结果

当我从文本框中减去两个时间（移位时间）并在文本框中显示输出时，该值返回为未定义

在文本框控件中显示xml文件

Javascript - 如何添加两个文本框的结果并在第三个中显示结果？

使tkinter在文本框中显示输出

比较两个不同工作表中的两列，然后突出显示差异

在用户窗体文本框中突出显示文本