A txt file contains two separate types of data: how can I use pandas to go through each row and add the corresponding data?

马拉本斯基

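I recently got data from my local gym and am trying to normalize it so that I can create "gym sign-up" objects, each containing everyone who signed up for a given session.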

The text file looks like this: https://pastebin.com/YcnSJiA7

Sep 30th  '20 at 9:00AM Until Sep 30th  '20 at 10:00AM
JD  John Doe    
AW  Alice Wonderland    
IM  Iron Man
Sep 30th  '20 at 8:00AM Until Sep 30th  '20 at 9:00AM
JD  John Doe    
AW  Alice Wonderland    
IM  Iron Man

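I have been able to use pandas to split the sign-ups into the columns [initials, name], but I don't know how to detect when a line is a time slot rather than a person's sign-up.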

So after the program runs, each row should contain the columns [initials, name, time slot].

The easiest format for me to work with would be this:


JD  John Doe    Sep 30th  '20 at 9:00AM Until Sep 30th  '20 at 10:00AM
AW  Alice Wonderland    Sep 30th  '20 at 9:00AM Until Sep 30th  '20 at 10:00AM
IM  Iron Man    Sep 30th  '20 at 9:00AM Until Sep 30th  '20 at 10:00AM
JD  John Doe    Sep 30th  '20 at 8:00AM Until Sep 30th  '20 at 9:00AM
AW  Alice Wonderland    Sep 30th  '20 at 8:00AM Until Sep 30th  '20 at 9:00AM
IM  Iron Man    Sep 30th  '20 at 8:00AM Until Sep 30th  '20 at 9:00AM

I tried iterating over the lines: once a time-slot line appears, I append it to each following line until a new time slot appears.

def testSort():
    totalSheet = []
    timeSlot = None  # most recent time-slot line seen so far
    with open("1-weak-gym.txt") as fp:
        for ln in fp:
            ln = ln.rstrip()
            if not ln:
                continue  # skip blank lines
            if ln.startswith("Sep"):  # this is a time-slot line
                timeSlot = ln  # remember it for the sign-ups that follow
            elif timeSlot is not None:
                # store the name line together with its time slot as one row
                totalSheet.append(ln + "\t" + timeSlot)
            else:
                print("Hello error: sign-up line found before any time slot")

    # write one row per line and close the file when done
    with open("newOuput.txt", "a") as out:
        out.write("\n".join(totalSheet) + "\n")

麻雀

You can try this approach (it works as long as the header lines reliably end with a time pattern):

import re

# Matches a 12-hour clock time such as 9:00AM or 10:00am
TIME_RE = re.compile(r'(1[0-2]|0?[1-9]):[0-5][0-9][AaPp][Mm]')

def is_time_format(s):
    return bool(TIME_RE.fullmatch(s))

new_lines = []
extra_info = ''
with open("1-weak-gym.txt") as fp:
    for line in fp:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if is_time_format(parts[-1]):
            # header line: remember the time slot for the sign-ups below it
            extra_info = line.strip()
        else:
            new_lines.append(line.rstrip() + '\t' + extra_info + '\n')

with open("newOutput", 'w') as out:
    out.writelines(new_lines)

You will then get a file in the desired format.
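
Since the question asks about pandas specifically, here is a minimal sketch of the same idea that produces a DataFrame directly instead of a rewritten text file. It marks the time-slot lines, forward-fills them down onto the sign-up lines, and then splits each sign-up into initials and name. The names `SAMPLE`, `SLOT_PATTERN`, and `build_signups` are illustrative and not part of the original answer. The sketch assumes every time-slot line ends with a 12-hour clock time such as 10:00AM and that every sign-up line has initials followed by a name.

```python
import io

import pandas as pd

# Sample rows copied from the question; a real run would pass an open file instead.
SAMPLE = (
    "Sep 30th  '20 at 9:00AM Until Sep 30th  '20 at 10:00AM\n"
    "JD  John Doe    \n"
    "AW  Alice Wonderland    \n"
    "IM  Iron Man\n"
    "Sep 30th  '20 at 8:00AM Until Sep 30th  '20 at 9:00AM\n"
    "JD  John Doe    \n"
    "AW  Alice Wonderland    \n"
    "IM  Iron Man\n"
)

# Assumption: a time-slot line ends with a 12-hour clock time such as 10:00AM.
SLOT_PATTERN = r"(?:1[0-2]|0?[1-9]):[0-5][0-9][AaPp][Mm]$"


def build_signups(fp):
    """Return a DataFrame with the columns initials, name, time_slot."""
    # One stripped, non-blank line per element.
    lines = pd.Series([ln.strip() for ln in fp if ln.strip()])
    is_slot = lines.str.contains(SLOT_PATTERN, regex=True)
    # Keep the text only on slot lines, then carry it down to the sign-ups below.
    time_slot = lines.where(is_slot).ffill()
    # Split each sign-up line once on whitespace: initials, then the full name.
    people = lines[~is_slot].str.split(n=1, expand=True)
    people.columns = ["initials", "name"]
    people["time_slot"] = time_slot[~is_slot]
    return people.reset_index(drop=True)


signups = build_signups(io.StringIO(SAMPLE))
print(signups.to_string(index=False))
```

To run it on the real data, something like `with open("1-weak-gym.txt") as fp: signups = build_signups(fp)` should work, provided the file follows the same layout as the sample.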
