Extracting data from multiple text files using Python

Daniel

I am trying to extract data from several text files at the same time.

import fileinput

num_lines = sum(1 for line in open('2grams.txt'))  ## in order not to print junk

count = 0
f0 = open("2gram_glues.txt", 'r')
f1 = open("2grams.txt", 'r')
f2 = open("output.txt", 'w')
f3 = open('2mwus.txt', 'r')

with fileinput.input(files=('2grams.txt', '2gram_glues.txt', '2mwus.txt')) as f:
    for line in f:
        f3.seek(0, 0)

        for line1 in f3:

            if line == line1:
                f2.write("The 2 Gram is: " + line.strip() + "\t The score is: " + f0.readline())
                count += 1
                if count >= num_lines:
                    break


f0.close()
f1.close()
f2.close()
f3.close()

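2grams.txt and 2gram_glues.txt have the same number of lines, and the data on corresponding lines belongs together. However, what I actually want to write to the output file is the data from 2mwus.txt that intersects with 2grams.txt, and 2mwus.txt has a different number of lines.

The problem is that I want to print each 2mwus.txt entry together with its matching entry from 2gram_glues.txt (which contains a score).

The scores I get simply follow the order of 2gram_glues.txt; they are not the scores that belong to the 2mwus.txt entries.

What am I doing wrong when writing the data?

Link to the text files: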

https://drive.google.com/folderview?id=0B1oTQq97VF44V1p3MEZwQkhqTjQ&usp=sharing

Mike

I don't think you need fileinput:

with open('2grams.txt', 'r') as grams:
    num_lines = sum(1 for line in grams)  ## in order not to print junk

# Read the scores once; scores_lines[i] is the score for line i of 2grams.txt.
with open("2gram_glues.txt", 'r') as scores:
    scores_lines = scores.readlines()

count = 0

with open('2mwus.txt', 'r') as base, open('2grams.txt', 'r') as intersect:
    for line in base:

        line = line.rstrip()
        number = line[-2:]
        number = int(number.lstrip())

        line = line[:-2]
        line = line.rstrip()

        # Rewind so this entry is compared against every line of 2grams.txt.
        intersect.seek(0, 0)

        for i, line_intersect in enumerate(intersect):
            line_intersect = line_intersect.rstrip()
            if line == line_intersect:
                # scores_lines[i] keeps its newline, so the number prints on the next line.
                print("**The 2 Gram is: " + line.strip() + "\t The score is: " + scores_lines[i] +
                      'The number is ' + str(number))
                count += 1
                if count >= num_lines:
                    break
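
The loop above rescans 2grams.txt for every line of 2mwus.txt. As a possible alternative, here is a minimal sketch that builds a dictionary from 2-gram to score once and then looks up each 2mwus.txt entry. It assumes that line i of 2gram_glues.txt is the score for line i of 2grams.txt and that each 2mwus.txt line is the 2-gram, a tab, then the number. The function name match_scores and the in-memory sample lines are only illustrative.

```python
import io


def match_scores(grams, glues, mwus):
    """Yield (two_gram, score, number) for each mwus entry found in grams.

    grams, glues and mwus are iterables of lines (open files work too).
    Line i of glues is assumed to hold the score for line i of grams.
    """
    # Map each 2-gram to its score in a single pass over both inputs.
    score_by_gram = {
        gram.rstrip(): score.strip()
        for gram, score in zip(grams, glues)
    }
    for entry in mwus:
        # Assumed layout: '<2-gram><TAB><number>' plus optional trailing spaces.
        gram, _, number = entry.rstrip().rpartition("\t")
        gram = gram.rstrip()
        if gram in score_by_gram:
            yield gram, score_by_gram[gram], int(number)


# Small in-memory stand-ins for the three files (illustrative sample lines).
grams = io.StringIO("(850, 900,\nphone but\nnot listed\n")
glues = io.StringIO("0.857143\n0.1\n0.5\n")
mwus = io.StringIO("phone but\t2 \n(850, 900,\t12 \n")

for gram, score, number in match_scores(grams, glues, mwus):
    print(f"The 2 Gram is: {gram}\t The score is: {score}\t The number is: {number}")
```

If a 2-gram appears more than once in 2grams.txt, the dictionary keeps only the last score, which differs from the nested loop that reports every match.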

Slicing and stripping

Starting from:

'(850,·900,\t12·'
'(frequencies·850,\t4·'
'phone·but\t2·'

# \t denotes a tab, · denotes a space

Using:

line = line.rstrip()

gives:

'(850,·900,\t12'
'(frequencies·850,\t4'
'phone·but\t2'

Then grab the number:

number = line[-2:]

which gives:

'12'
'\t4'
'\t2'

Then left-strip the number and convert it to an integer:

number = int(number.lstrip())

which gives:

12
4
2

Continuing with our line:

'(850,·900,\t12'
'(frequencies·850,\t4'
'phone·but\t2'

and using:

line = line[:-2]
line = line.rstrip()

gives:

'(850, 900,'
'(frequencies 850,'
'phone but'

It is a bit cumbersome, but it avoids the need for a regex.
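
Note that slicing with [-2:] only works while the number is at most two characters wide. Here is a sketch of a variant that splits on the last tab instead, still without a regex. It assumes the tab always separates the 2-gram from its number, as in the sample lines above. split_entry is a hypothetical helper name.

```python
def split_entry(line):
    """Split '<2-gram><TAB><number>' into (two_gram, number).

    Assumes the last tab on the line separates the 2-gram from its number.
    A line without a tab raises ValueError from int().
    """
    text, _, number = line.rstrip().rpartition("\t")
    return text.rstrip(), int(number)


print(split_entry("(850, 900,\t12 "))
print(split_entry("(frequencies 850,\t4 "))
print(split_entry("phone but\t2 "))
```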

Output

**The 2 Gram is: (850, 900,  The score is: 0.857143
The number is 12
**The 2 Gram is: (Bands 4    The score is: 0.4
The number is 2
**The 2 Gram is: (frequencies 850,   The score is: 1
The number is 4
**The 2 Gram is: 1, 3,   The score is: 1
The number is 8
**The 2 Gram is: 13, 25)     The score is: 0.666667
The number is 2
**The 2 Gram is: 1800, 1900  The score is: 1
The number is 8
**The 2 Gram is: 1900, 2100  The score is: 1
The number is 10
**The 2 Gram is: 5 compatible    The score is: 0.444444
The number is 2
**The 2 Gram is: A1428: UMTS/HSPA+/DC-HSDPA  The score is: 0.5
The number is 2
**The 2 Gram is: A1429: UMTS/HSPA+/DC-HSDPA  The score is: 0.4
The number is 2
**The 2 Gram is: Australia, Germany,     The score is: 1
The number is 2
**The 2 Gram is: B (800,     The score is: 1
The number is 2
**The 2 Gram is: Full specs  The score is: 1
The number is 2
**The 2 Gram is: GSM model   The score is: 0.428571
The number is 6
**The 2 Gram is: In deciding     The score is: 1
The number is 2
**The 2 Gram is: KDDI network    The score is: 0.5
The number is 2
**The 2 Gram is: South Korea).   The score is: 1
The number is 2
**The 2 Gram is: UMTS/HSPA+/DC-HSDPA (850,   The score is: 0.666667
The number is 6
**The 2 Gram is: US AT&T     The score is: 1
The number is 2
**The 2 Gram is: US, along   The score is: 1
The number is 2
**The 2 Gram is: bands 4     The score is: 0.4
The number is 2
**The 2 Gram is: bands, making   The score is: 1
The number is 2
**The 2 Gram is: battery life    The score is: 0.363636
The number is 2
**The 2 Gram is: blazing fast    The score is: 1
The number is 2
**The 2 Gram is: didn't come     The score is: 0.666667
The number is 3
**The 2 Gram is: fact that   The score is: 0.4
The number is 3
**The 2 Gram is: iPhone 5    The score is: 0.526316
The number is 5
**The 2 Gram is: meet compatibility  The score is: 1
The number is 2
**The 2 Gram is: model A1429:    The score is: 0.5
The number is 4
**The 2 Gram is: networks in     The score is: 0.258065
The number is 4
**The 2 Gram is: networks. However,  The score is: 1
The number is 2
**The 2 Gram is: one GSM.    The score is: 0.363636
The number is 2
**The 2 Gram is: phone but   The score is: 0.1
The number is 2
**The 2 Gram is: phone. This     The score is: 0.444444
The number is 2
**The 2 Gram is: release three   The score is: 0.8
The number is 2
**The 2 Gram is: sim card    The score is: 0.8
The number is 2
**The 2 Gram is: standards worldwide.    The score is: 1
The number is 2
**The 2 Gram is: support LTE     The score is: 0.296296
The number is 4
**The 2 Gram is: the phone   The score is: 0.188679
The number is 10
**The 2 Gram is: to my   The score is: 0.12
The number is 3
**The 2 Gram is: works great     The score is: 0.4
The number is 2

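Takeaways:

  • Mind the whitespace; rstrip is your ally.
  • Names like f1, f2 and f3 feel intuitive at first, but they get confusing in the long run. Use meaningful names!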
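
One way to see the whitespace in question is to print lines with repr, which shows tabs, trailing spaces and newlines explicitly. A small sketch, using a sample line in the layout assumed for 2mwus.txt:

```python
line = "phone but\t2 \n"  # sample line: 2-gram, tab, number, space, newline

# repr makes the invisible characters visible.
print(repr(line))           # 'phone but\t2 \n'
print(repr(line.rstrip()))  # 'phone but\t2'
```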

Hope this helps!
