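I have a vCard file containing thousands of contact records. The file has been corrupted: for every user, copies of the home phone, work phone and other entries have been added.
How can I clean up the duplicates?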
BEGIN:VCARD
VERSION:3.0
N:Doe;John;Q.,Public
FN;CHARSET=UTF-8:John Doe
TEL;TYPE=WORK,VOICE:(111) 555-1212
TEL;TYPE=WORK,VOICE:(111) 555-1212
TEL;TYPE=WORK,VOICE:(111) 555-1212
TEL;TYPE=WORK,VOICE:(111) 555-1212
TEL;TYPE=HOME,VOICE:(404) 555-1212
TEL;TYPE=HOME,VOICE:(404) 555-1212
TEL;TYPE=HOME,VOICE:(404) 555-1212
TEL;TYPE=HOME,TYPE=VOICE:(404) 555-1213
TEL;TYPE=HOME,TYPE=VOICE:(404) 555-1213
TEL;TYPE=HOME,VOICE:(404) 555-1212
TEL;TYPE=HOME,VOICE:(404) 555-1212
TEL;TYPE=HOME,VOICE:(404) 555-1212
TEL;TYPE=HOME,TYPE=VOICE:(404) 555-1213
TEL;TYPE=HOME,TYPE=VOICE:(404) 555-1213
TEL;TYPE=HOME,TYPE=VOICE:(404) 555-1213
TEL;TYPE=HOME,TYPE=VOICE:(404) 555-1213
EMAIL;TYPE=PREF,INTERNET:[email protected]
EMAIL;TYPE=INTERNET:[email protected]
EMAIL;TYPE=PREF,INTERNET:[email protected]
EMAIL;TYPE=PREF,INTERNET:[email protected]
EMAIL;TYPE=PREF,INTERNET:[email protected]
EMAIL;TYPE=PREF,INTERNET:[email protected]
EMAIL;TYPE=INTERNET:[email protected]
EMAIL;TYPE=INTERNET:[email protected]
EMAIL;TYPE=INTERNET:[email protected]
EMAIL;TYPE=INTERNET:[email protected]
EMAIL;TYPE=INTERNET:[email protected]
EMAIL;TYPE=PREF,INTERNET:[email protected]
EMAIL;TYPE=PREF,INTERNET:[email protected]
EMAIL;TYPE=PREF,INTERNET:[email protected]
EMAIL;TYPE=PREF,INTERNET:[email protected]
EMAIL;TYPE=PREF,INTERNET:[email protected]
ADR;TYPE=HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America
URL:https://www.google.com/
PHOTO;VALUE=URL;TYPE=PNG:http://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Example_svg.svg/200px-Example_svg.svg.png
AGENT:BEGIN:VCARD
VERSION:3.0
N:Doe;John;Q.,Public
FN:John Doe
TEL;TYPE=WORK,VOICE:(111) 555-1212
TEL;TYPE=HOME,VOICE:(404) 555-1212
TEL;TYPE=HOME,TYPE=VOICE:(404) 555-1213
EMAIL;TYPE=PREF,INTERNET:[email protected]
EMAIL;TYPE=INTERNET:[email protected]
PHOTO;VALUE=URL;TYPE=PNG:http://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Example_svg.svg/200px-Example_svg.svg.png
END:VCARD
END:VCARD
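I tried the following solution I found on StackOverflow, but it did not solve the problem. It only drops a line that is identical to the line directly before it, and not all of the duplicates are on consecutive lines.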
perl -ne 'print unless (defined($prev) && ($_ eq $prev)); $prev=$_'
Result:
...
TEL;TYPE=WORK,VOICE:(111) 555-1212
TEL;TYPE=HOME,TYPE=VOICE:(404) 555-1213
TEL;TYPE=WORK,VOICE:(111) 555-1212
TEL;TYPE=HOME,TYPE=VOICE:(404) 555-1213
TEL;TYPE=WORK,VOICE:(111) 555-1212
TEL;TYPE=HOME,TYPE=VOICE:(404) 555-1213
EMAIL;TYPE=PREF,INTERNET:[email protected]
EMAIL;TYPE=INTERNET:[email protected]
EMAIL;TYPE=PREF,INTERNET:[email protected]
EMAIL;TYPE=INTERNET:[email protected]
EMAIL;TYPE=PREF,INTERNET:[email protected]
EMAIL;TYPE=INTERNET:[email protected]
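The simplest way to remove all duplicate lines is the one-liner below. It treats the whole file as a single unit, though, so in a file with many contacts it also deletes the BEGIN:VCARD, VERSION:3.0 and END:VCARD lines of every card after the first: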
perl -ne 'print if !$seen{$_}++'
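To treat each BEGIN:VCARD section separately, clear the hash at the start of every card. END:VCARD lines also need to be exempt: in the sample above, the nested AGENT card closes with an END:VCARD line identical to the one that closes the outer card, so without the exception the outer END:VCARD would be removed as a duplicate and the card would be left unterminated:
perl -ne '%seen = () if /\bBEGIN:VCARD\b/; print if /^END:VCARD/ || !$seen{$_}++'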