Multithreading with Python and pymongo

芬肯德里

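Hello, I am hoping to build a program that classifies tweets as positive or negative. The tweets are about companies and are already saved in MongoDB; once a tweet has been classified, an integer should be updated based on the result.

I have written code that makes this possible, but I would like to multithread the program. I have no experience with threading in Python and have been trying to follow tutorials without luck: the program just starts and exits without running any of the code.

If anyone could help me with this, it would be greatly appreciated. The program, including the intended multithreading code, is below.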

from textblob.classifiers import NaiveBayesClassifier
import pymongo
from datetime import datetime  # the code below calls datetime.strptime and datetime.today
from threading import Thread

train = [
('I love this sandwich.', 'pos'),
('This is an amazing place!', 'pos'),
('I feel very good about these beers.', 'pos'),
('This is my best work.', 'pos'),
("What an awesome view", 'pos'),
('I do not like this restaurant', 'neg'),
('I am tired of this stuff.', 'neg'),
("I can't deal with this", 'neg'),
('He is my sworn enemy!', 'neg'),
('My boss is horrible.', 'neg'),
(':)', 'pos'),
(':(', 'neg'),
('gr8', 'pos'),
('gr8t', 'pos'),
('lol', 'pos'),
('bff', 'neg'),
]

test = [
'The beer was good.',
'I do not enjoy my job',
"I ain't feeling dandy today.",
"I feel amazing!",
'Gary is a friend of mine.',
"I can't believe I'm doing this.",
]

filterKeywords = ['IBM', 'Microsoft', 'Facebook', 'Yahoo', 'Apple',   'Google', 'Amazon', 'EBay', 'Diageo',
              'General Motors', 'General Electric', 'Telefonica', 'Rolls Royce', 'Walmart', 'HSBC', 'BP',
              'Investec', 'WWE', 'Time Warner', 'Santander Group']

# Create pos/neg counter variables for each company using dicts
vars = {}
for word in filterKeywords:
    vars[word + "SentimentOverall"] = 0


# Initialising the classifier
cl = NaiveBayesClassifier(train)


class TrainingClassification():
    def __init__(self):
        #creating the mongodb connection
        try:
            conn = pymongo.MongoClient('localhost', 27017)
            print("Connected successfully!!!")
            global db
            db = conn.TwitterDB
        except pymongo.errors.ConnectionFailure as e:
            print("Could not connect to MongoDB: %s" % e)

        thread1 = Thread(target=self.apple_thread, args=())
        thread1.start()
        thread1.join()
        print("thread finished...exiting")

    def apple_thread(self):
        appleSentimentText = []
        for record in db.Apple.find():
            if record.get('created_at'):
                created_at = record.get('created_at')
                dt = datetime.strptime(created_at, '%a %b %d %H:%M:%S +0000 %Y')
                if record.get('text') and dt > datetime.today():
                    appleSentimentText.append(record.get("text"))
        for targetText in appleSentimentText:
            classificationApple = cl.classify(targetText)
            if classificationApple == "pos":
                vars["AppleSentimentOverall"] = vars["AppleSentimentOverall"] + 1
            elif classificationApple == "neg":
                vars["AppleSentimentOverall"] = vars["AppleSentimentOverall"] - 1

德夫沙克

The main problem with your code is here:

thread1.start()
thread1.join()

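When you call join on a thread, the currently running thread (in your case, the main thread) waits until that thread (here, thread1) has finished. So your code will not actually be any faster: it just starts one thread and waits for it. In fact, it will be slightly slower because of the cost of creating the thread.

This is the correct way to do multithreading: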

thread1.start()
thread2.start()
thread1.join()
thread2.join()

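In this code, threads 1 and 2 both run at the same time, because the main thread starts waiting only after both have been started.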
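
As a rough sketch of the start-all-then-join-all pattern described above, the example below starts one thread per company and only then waits for them. The names `classify_company` and `run_all` are hypothetical, and `time.sleep` merely stands in for an I/O-bound step such as a MongoDB query; this is not the asker's actual classification code.

```python
import threading
import time


def classify_company(company, results):
    """Hypothetical per-company worker (a stand-in for apple_thread)."""
    time.sleep(0.1)  # simulates I/O, e.g. waiting on a database query
    # Placeholder result: each thread writes only to its own key.
    results[company] = len(company)


def run_all(companies):
    """Start one thread per company, then wait for all of them."""
    results = {}
    threads = [
        threading.Thread(target=classify_company, args=(company, results))
        for company in companies
    ]
    # Start every thread before joining any of them,
    # otherwise the threads would run one after another.
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run_all(['Apple', 'Google'])` returns once both workers are done. Had each thread been joined immediately after its own `start()`, the two waits would have happened back to back instead of overlapping.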

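Important: note that in Python this is "simulated" parallelism. Because the core of Python (CPython) is not thread-safe (mainly because of the way it does garbage collection), it uses a GIL (Global Interpreter Lock), so only one thread in a process executes Python code at any given moment and the threads effectively share a single core. If you need true parallelism (for example, if your 2 threads are CPU-bound rather than I/O-bound), take a look at the multiprocessing module.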
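
For the CPU-bound case, a minimal sketch using `multiprocessing.Pool` might look like the following. The word lists and the functions `net_sentiment` and `score_companies` are assumptions made for illustration; `net_sentiment` is a toy placeholder for the real classifier call (`cl.classify`), not a substitute for it.

```python
from multiprocessing import Pool

# Toy word lists, purely illustrative (not part of the original program).
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def net_sentiment(texts):
    """Placeholder CPU-bound worker: positive texts minus negative texts."""
    score = 0
    for text in texts:
        words = set(text.lower().split())
        if words & POSITIVE:
            score += 1
        elif words & NEGATIVE:
            score -= 1
    return score


def score_companies(texts_by_company, processes=2):
    """Score each company's texts in a separate worker process."""
    companies = list(texts_by_company)
    with Pool(processes=processes) as pool:
        # pool.map preserves input order, so scores line up with companies.
        scores = pool.map(net_sentiment, [texts_by_company[c] for c in companies])
    return dict(zip(companies, scores))
```

When run as a script, `score_companies` should be called from inside an `if __name__ == "__main__":` block, since platforms that spawn new interpreter processes re-import the main module. Each worker receives a copy of its input, so results have to be returned rather than written into a shared dictionary such as `vars`.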
