I have the following DataFrame in pandas:
     Date           Title
58   March 2015     Data Visualization with JavaScript
63   December 2014  Eloquent JavaScript, 2nd Edition
90   October 2014   If Hemingway Wrote JavaScript
96   December 2014  JavaScript for Kids
158  February 2014  Principles of Object-Oriented JavaScript
209  November 2005  Wicked Cool Java
I need to filter the rows whose title contains the word JavaScript. This is what I am trying:
category_javascript = np.where(Publisher['Title'].str.contains(r'(?:\s|^)JavaScript(?:\s|$)'))
It gives me the following output:
category_javascript
Out[106]: (array([ 58, 90, 96, 158], dtype=int64),)
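Row 63 is missing from the result:
63   December 2014  Eloquent JavaScript, 2nd Edition
I think it is not matched because the word JavaScript is followed by a comma. I want to match the exact word regardless of punctuation or compound forms. For example, JavaScript-Book should match as well.
Please help.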
IIUC, you don't need a regular expression here; a plain substring check for JavaScript is enough:
category_javascript = np.where(Publisher['Title'].str.contains('JavaScript'))
print (Publisher['Title'].str.contains('JavaScript'))
58 True
63 True
90 True
96 True
158 True
209 False
Name: Title, dtype: bool
print (Publisher[Publisher['Title'].str.contains('JavaScript')])
     Date           Title
58   March 2015     Data Visualization with JavaScript
63   December 2014  Eloquent JavaScript, 2nd Edition
90   October 2014   If Hemingway Wrote JavaScript
96   December 2014  JavaScript for Kids
158  February 2014  Principles of Object-Oriented JavaScript
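One caveat with the plain substring check: it matches anywhere inside a word, so it only works here because no other title happens to contain "JavaScript" as part of a longer word. A minimal sketch, assuming the frame from the question is rebuilt by hand (the `mask_js`, `mask_java`, `js_labels` and `java_labels` names are mine, not from the question), shows that searching for the substring Java would select every row:

```python
import pandas as pd

# Rebuild the sample frame from the question, keeping its index labels.
Publisher = pd.DataFrame(
    {
        "Date": [
            "March 2015", "December 2014", "October 2014",
            "December 2014", "February 2014", "November 2005",
        ],
        "Title": [
            "Data Visualization with JavaScript",
            "Eloquent JavaScript, 2nd Edition",
            "If Hemingway Wrote JavaScript",
            "JavaScript for Kids",
            "Principles of Object-Oriented JavaScript",
            "Wicked Cool Java",
        ],
    },
    index=[58, 63, 90, 96, 158, 209],
)

# regex=False treats the pattern as a literal substring, not a regex.
mask_js = Publisher["Title"].str.contains("JavaScript", regex=False)
mask_java = Publisher["Title"].str.contains("Java", regex=False)

# Index labels of the matching rows.
js_labels = Publisher.index[mask_js].tolist()
java_labels = Publisher.index[mask_java].tolist()

print(js_labels)    # rows containing the substring "JavaScript"
print(java_labels)  # rows containing the substring "Java" (all of them)
```

So if you need to tell Java apart from JavaScript, the substring check is not enough and a regex is the way to go.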
Alternatively, you can add punctuation characters to the regex, for example [,;]:
print (Publisher['Title'].str.contains(r'(?:\s|^|[,;])JavaScript(?:\s|$|[,;])'))
58 True
63 True
90 True
96 True
158 True
209 False
Name: Title, dtype: bool
print (Publisher['Title'].str.contains(r'(?:\s|^|[,;])Java(?:\s|$|[,;])'))
58 False
63 False
90 False
96 False
158 False
209 True
Name: Title, dtype: bool
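Listing punctuation by hand only covers the characters you thought of, and the question also mentions forms like JavaScript-Book. A possible alternative, sketched here under my own assumptions rather than taken from the original answer, is the regex word boundary `\b`, which matches between a word character and a non-word character (or the start/end of the string). The `titles` Series below repeats the titles from the question plus one hypothetical "JavaScript-Book" row (index 300) that I added for illustration:

```python
import pandas as pd

# Titles from the question, plus one hypothetical row (index 300)
# added only to illustrate the "JavaScript-Book" case.
titles = pd.Series(
    [
        "Data Visualization with JavaScript",
        "Eloquent JavaScript, 2nd Edition",
        "If Hemingway Wrote JavaScript",
        "JavaScript for Kids",
        "Principles of Object-Oriented JavaScript",
        "Wicked Cool Java",
        "JavaScript-Book",
    ],
    index=[58, 63, 90, 96, 158, 209, 300],
)

# \b is a word boundary: comma, hyphen, space and string edges all qualify.
word_js = titles.str.contains(r"\bJavaScript\b")
word_java = titles.str.contains(r"\bJava\b")

print(titles.index[word_js].tolist())    # labels whose title has the word JavaScript
print(titles.index[word_java].tolist())  # labels whose title has the word Java
```

Note that `\b` treats letters, digits and underscore as word characters, so a title such as "JavaScript_Book" would not match with this pattern.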