I have some Twitter data, and I split the text into tweets with happy emoticons and tweets with sad emoticons, elegantly and pythonically, like so:
happy_set = [":)",":-)","=)",":D",":-D","=D"]
sad_set = [":(",":-(","=("]
happy = [tweet.split() for tweet in data for face in happy_set if face in tweet]
sad = [tweet.split() for tweet in data for face in sad_set if face in tweet]
This works; however, a single tweet could contain emoticons from both happy_set and sad_set. What is the Pythonic way to ensure that the happy list only contains tweets whose emoticons all come from happy_set, and vice versa?
You could try using sets, specifically set.isdisjoint. Check whether the set of tokens in a happy tweet is disjoint from sad_set. If so, it definitely belongs in happy:
happy_set = {":)", ":-)", "=)", ":D", ":-D", "=D"}
sad_set = {":(", ":-(", "=("}
# happy is your existing list of potentially happy tweets, each already split into tokens.
# To remove any tweets with sad tokens...
happy = [tokens for tokens in happy if sad_set.isdisjoint(tokens)]
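If you would rather build both lists in one pass instead of filtering afterwards, the same isdisjoint check can be sketched roughly as below. The function name split_by_mood and the sample tweets are made up for illustration, and dropping tweets that contain both kinds of emoticon (or neither) is an assumption about what you want. Note that this matches whole whitespace-separated tokens, so an emoticon glued to a word (e.g. "great:)") would not be detected, unlike the substring test in the question.

```python
HAPPY_SET = {":)", ":-)", "=)", ":D", ":-D", "=D"}
SAD_SET = {":(", ":-(", "=("}


def split_by_mood(tweets):
    """Return (happy, sad) lists of tokenized tweets.

    Hypothetical helper: tweets containing both happy and sad
    emoticons, or no emoticons at all, are left out of both lists.
    """
    happy, sad = [], []
    for tweet in tweets:
        tokens = tweet.split()
        # isdisjoint accepts any iterable, so no set() conversion is needed
        has_happy = not HAPPY_SET.isdisjoint(tokens)
        has_sad = not SAD_SET.isdisjoint(tokens)
        if has_happy and not has_sad:
            happy.append(tokens)
        elif has_sad and not has_happy:
            sad.append(tokens)
    return happy, sad


# Made-up sample data, only for demonstration
data = [
    "great day :)",
    "so tired :(",
    "mixed feelings :) :(",
    "no emoticon here",
    "yay :D :)",
]
happy, sad = split_by_mood(data)
```

Because each tweet is examined once, a tweet with several happy emoticons ends up in happy only once, rather than once per matching emoticon.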