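I have some Twitter data, and I split the text into tweets with happy emoticons and tweets with sad emoticons in what I hoped was an elegant, Pythonic way, like so: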
happy_set = [":)",":-)","=)",":D",":-D","=D"]
sad_set = [":(",":-(","=("]
happy = [tweet.split() for tweet in data for face in happy_set if face in tweet]
sad = [tweet.split() for tweet in data for face in sad_set if face in tweet]
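This works, but a single tweet may contain emoticons from both happy_set and sad_set. What is the Pythonic way to ensure that the happy list only contains tweets with emoticons from happy_set, and vice versa?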
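You could try using sets, specifically set.isdisjoint. Check whether the set of tokens in a happy tweet is disjoint from sad_set. If it is, the tweet definitely belongs in happy: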
happy_set = {":)", ":-)", "=)", ":D", ":-D", "=D"}
sad_set = {":(", ":-(", "=("}
# happy is your existing list of potentially happy tweets, each already split into tokens.
# Keep only the tweets that contain no sad tokens (isdisjoint accepts any iterable).
happy = [tweet for tweet in happy if sad_set.isdisjoint(tweet)]
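If you would rather build both lists in one pass, something along these lines might work. This is only a sketch: the helper name split_by_mood and the sample data are made up for illustration, and it assumes that a tweet containing both kinds of emoticon should be dropped from both lists.

```python
happy_set = {":)", ":-)", "=)", ":D", ":-D", "=D"}
sad_set = {":(", ":-(", "=("}


def split_by_mood(tweets):
    """Return (happy, sad) lists of tokenized tweets.

    Hypothetical helper: tweets with emoticons from both sets, or from
    neither, are left out of both lists.
    """
    happy, sad = [], []
    for tweet in tweets:
        tokens = tweet.split()
        # isdisjoint returns True when the two collections share no element
        has_happy = not happy_set.isdisjoint(tokens)
        has_sad = not sad_set.isdisjoint(tokens)
        if has_happy and not has_sad:
            happy.append(tokens)
        elif has_sad and not has_happy:
            sad.append(tokens)
    return happy, sad


# Made-up sample data, for illustration only
data = [
    "great day :)",
    "so tired :(",
    "mixed feelings :) :(",
    "yay :D :)",
    "no emoticon here",
]
happy, sad = split_by_mood(data)
```

Two differences from the list comprehensions in the question are worth knowing about. Each tweet is appended at most once, even if it contains several emoticons from the same set. Matching is also done on whitespace-separated tokens, so an emoticon glued to a word (for example "great:)") is not detected, whereas the substring test `face in tweet` would find it.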