Python中的CSV解析

debugcn 发表于 Dev

Xentius

我想解析以下格式的csv文件：

Test Environment INFO for 1 line.
Test,TestName1,
TestAttribute1-1,TestAttribute1-2,TestAttribute1-3
TestAttributeValue1-1,TestAttributeValue1-2,TestAttributeValue1-3

Test,TestName2,
TestAttribute2-1,TestAttribute2-2,TestAttribute2-3
TestAttributeValue2-1,TestAttributeValue2-2,TestAttributeValue2-3

Test,TestName3,
TestAttribute3-1,TestAttribute3-2,TestAttribute3-3
TestAttributeValue3-1,TestAttributeValue3-2,TestAttributeValue3-3

Test,TestName4,
TestAttribute4-1,TestAttribute4-2,TestAttribute4-3
TestAttributeValue4-1-1,TestAttributeValue4-1-2,TestAttributeValue4-1-3
TestAttributeValue4-2-1,TestAttributeValue4-2-2,TestAttributeValue4-2-3
TestAttributeValue4-3-1,TestAttributeValue4-3-2,TestAttributeValue4-3-3

并希望将其转换为制表符分隔的格式，如下所示：

TestName1
TestAttribute1-1 TestAttributeValue1-1
TestAttribute1-2 TestAttributeValue1-2
TestAttribute1-3 TestAttributeValue1-3

TestName2
TestAttribute2-1 TestAttributeValue2-1
TestAttribute2-2 TestAttributeValue2-2
TestAttribute2-3 TestAttributeValue2-3


TestName3
TestAttribute3-1 TestAttributeValue3-1
TestAttribute3-2 TestAttributeValue3-2
TestAttribute3-3 TestAttributeValue3-3

TestName4
TestAttribute4-1 TestAttributeValue4-1-1 TestAttributeValue4-2-1 TestAttributeValue4-3-1
TestAttribute4-2 TestAttributeValue4-1-2 TestAttributeValue4-2-2 TestAttributeValue4-3-2
TestAttribute4-3 TestAttributeValue4-1-3 TestAttributeValue4-2-3 TestAttributeValue4-3-3

TestAttribute的数量因测试而异。对于某些测试，只有3个值，对于其他一些，则只有7个，等等。同样在TestName4示例中，某些测试执行了多次，因此每次执行都有自己的TestAttributeValue行。（在示例中，testname4被执行了3次，因此我们有3条值行）

我是python的新手，虽然知识不多，但是想用python解析csv文件。我检查了python的'csv'库，不确定是否足以满足我的需求，还是应该编写自己的字符串解析器？请你帮助我好吗？

最好

斯坦纳·利马（Steinar Lima）

我会使用itertools.groupby函数和csv模块的解决方案。请仔细阅读itertools的文档-您可以比您想象的更频繁地使用它！

我使用空行来区分数据集，这种方法使用了惰性求值，一次只在内存中存储一个数据集：

import csv
from itertools import groupby

with open('my_data.csv') as ifile, open('my_out_data.csv', 'wb') as ofile:
    # Use the csv module to handle reading and writing of delimited files.
    reader = csv.reader(ifile)
    writer = csv.writer(ofile, delimiter='\t')
    # Skip info line
    next(reader)
    # Group datasets by the condition if len(row) > 0 or not, then filter
    # out all empty lines
    for group in (v for k, v in groupby(reader, lambda x: bool(len(x))) if k):
        test_data = list(group)
        # Write header
        writer.writerow([test_data[0][1]])
        # Write transposed data
        writer.writerows(zip(*test_data[1:]))
        # Write blank line
        writer.writerow([])

假定提供的数据存储在中，则输出my_data.csv：

TestName1
TestAttribute1-1    TestAttributeValue1-1
TestAttribute1-2    TestAttributeValue1-2
TestAttribute1-3    TestAttributeValue1-3

TestName2
TestAttribute2-1    TestAttributeValue2-1
TestAttribute2-2    TestAttributeValue2-2
TestAttribute2-3    TestAttributeValue2-3

TestName3
TestAttribute3-1    TestAttributeValue3-1
TestAttribute3-2    TestAttributeValue3-2
TestAttribute3-3    TestAttributeValue3-3

TestName4
TestAttribute4-1    TestAttributeValue4-1-1 TestAttributeValue4-2-1 TestAttributeValue4-3-1
TestAttribute4-2    TestAttributeValue4-1-2 TestAttributeValue4-2-2 TestAttributeValue4-3-2
TestAttribute4-3    TestAttributeValue4-1-3 TestAttributeValue4-2-3 TestAttributeValue4-3-3

本文收集自互联网，转载请注明来源。

如有侵权，请联系[email protected] 删除。