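I have a DataFrame with a text column that contains dates in several formats. I have written a regular expression for each format. Each regex runs fine on its own, but when I try to run them all against the DataFrame at once, I keep getting the error "re.error: redefinition of group name 'month' as group 4; was group 1 at position 66".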
import pandas as pd

d = [{'text': '03/25/93 Total time of visit (in minutes):'},
     {'text': 'April 11, 1990 CPT Code: 90791: No medical services'},
     {'text': '29 Jan 1994 Primary Care Doctor:'},
     {'text': 's1981 Swedish-American Hospital'}]
mdf = pd.DataFrame(d, index=[1, 2, 3, 4])
regexpattern1 = r'(?P<month>\b\d{1,2})[/-](?P<day>\d{1,2})[/-](?P<year>\d{2})\b'
regexpattern2 = r'(?P<month>(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))[.]?[a-z]*(?:,|\s|\-)?(?P<day>\d{2})(?:\-|,|\s)? (?P<year>\d{4})'
regexpattern3 = r'(?P<day>\d{2}) (?P<month>(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))[.]?[a-z]*[,]? (?P<year>\d{4})'
regexpattern4 = r'(?P<month>)(?P<day>)\b[a-zA-Z]+(?P<year>\d{4})'
# mdf[['month', 'day', 'year']] = mdf['text'].str.extract(regexpattern4) # runs individually
mdf[['month', 'day', 'year']] = mdf['text'].str.extract("|".join([regexpattern1, regexpattern2, regexpattern3, regexpattern4])) # raises error
print(mdf)
Expected Output:
   text                                                 month  day  year
1  03/25/93 Total time of visit (in minutes):           03     25   93
2  April 11, 1990 CPT Code: 90791: No medical services  Apr    11   1990
3  29 Jan 1994 Primary Care Doctor:                     Jan    29   1994
4  s1981 Swedish-American Hospital                      NaN    NaN  1981
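This doesn't quite meet your requirements, but it may help. The idea is to match the month name with a regex (this can be expanded to cover all 12 months) and to extract runs of digits together with the special characters ',' and '/', but only when they are not followed by a digit or a colon: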
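# Raw string so that \b is a word boundary rather than a backspace character
mdf['date'] = mdf['text'].str.findall(r'(\b(?:Ma(?:rch)?)|Apr(?:il)?|Jan|[\,\/\d+]+)(?![\d+:])')
mdf['date'] = [",".join(parts) for parts in mdf['date']]  # join each list of matches into one string
# Coerce to datetime (optionally followed by .dt.strftime('%d-%m-%Y')).
# On pandas 2.0 or later, mixed formats may need pd.to_datetime(..., format='mixed').
mdf['date'] = pd.to_datetime(mdf['date'].str.replace('/', '-'))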
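To see what the negative lookahead does before any datetime conversion, here is a minimal sketch that runs the same pattern with the standard `re` module on the four sample strings from the question. The names `pattern`, `samples` and `matches` are only illustrative.

```python
import re

# Same pattern as in the answer, written as a raw string so that \b
# is interpreted as a word boundary.
pattern = re.compile(r'(\b(?:Ma(?:rch)?)|Apr(?:il)?|Jan|[\,\/\d+]+)(?![\d+:])')

samples = [
    '03/25/93 Total time of visit (in minutes):',
    'April 11, 1990 CPT Code: 90791: No medical services',
    '29 Jan 1994 Primary Care Doctor:',
    's1981 Swedish-American Hospital',
]

# One list of matches per sample string.
matches = [pattern.findall(s) for s in samples]
for text, found in zip(samples, matches):
    print(found, '<-', text)
```

The CPT code `90791:` is skipped because every candidate run of its digits is followed either by another digit or by the colon, which the lookahead `(?![\d+:])` rejects.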
Extract the day, month and year:
mdf['day'] = mdf['date'].dt.day
mdf['month'] = mdf['date'].dt.month
mdf['year'] = mdf['date'].dt.year
mdf.reset_index(drop=True, inplace=True)
print(mdf)
   text                                                date        day  month  year
0  03/25/93 Total time of visit (in minutes):          1993-03-25  25   3      1993
1  April 11 1990 CPT Code: 90791: No medical serv...  1990-04-11  11   4      1990
2  29 Jan 1994 Primary Care Doctor:                    1994-01-29  29   1      1994
3  s1981 Swedish-American Hospital                     1981-01-01  1    1      1981
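As for the error in the question itself: Python's `re` module requires group names to be unique within a single pattern, and joining the four patterns with `|` produces one pattern in which `month`, `day` and `year` each appear four times. One possible workaround, sketched below, is to keep the patterns separate and take the first one that matches each string. This is not the approach from the answer above, and the helper name `extract_date_parts` is hypothetical.

```python
import re

MONTHS = r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)'

# The four patterns from the question, compiled separately so the repeated
# group names never end up inside one regular expression.
PATTERNS = [re.compile(p) for p in (
    r'(?P<month>\b\d{1,2})[/-](?P<day>\d{1,2})[/-](?P<year>\d{2})\b',
    r'(?P<month>' + MONTHS + r')[.]?[a-z]*(?:,|\s|\-)?(?P<day>\d{2})(?:\-|,|\s)? (?P<year>\d{4})',
    r'(?P<day>\d{2}) (?P<month>' + MONTHS + r')[.]?[a-z]*[,]? (?P<year>\d{4})',
    r'(?P<month>)(?P<day>)\b[a-zA-Z]+(?P<year>\d{4})',
)]


def extract_date_parts(text):
    """Return month/day/year from the first pattern that matches (hypothetical helper)."""
    for pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            # Pattern 4 captures empty strings for month and day; map them to None.
            return {name: (value or None) for name, value in match.groupdict().items()}
    return {'month': None, 'day': None, 'year': None}


for sample in ('03/25/93 Total time of visit (in minutes):',
               's1981 Swedish-American Hospital'):
    print(extract_date_parts(sample))
```

Assuming the `mdf` frame from the question, the results could be attached with something like `pd.DataFrame([extract_date_parts(t) for t in mdf['text']], index=mdf.index, columns=['month', 'day', 'year'])` and then joined to `mdf`. The order of `PATTERNS` matters, since the first matching pattern wins.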