I have a data frame containing file paths and names, which look something like this:
files_list <- c(
"C:/User/Name/Folder/Subfolder1/Sub-subfolder/file.txt",
"C:/User/Name/Folder/Subfolder1/Sub-subfolder/file - Copy.txt",
"C:/User/Name/Folder/Subfolder1/Sub-subfolder/file (1).txt",
"C:/User/Name/Folder/Subfolder1/Sub-subfolder/file - Copy (2).txt",
"C:/User/Name/Folder/Subfolder1/fileB.txt",
"C:/User/Name/Folder/file.C.txt",
"C:/User/Name/Folder/file-D.txt",
"C:/User/Name/Folder/file",
"C:/User/Name/Folder/file Z.txt",
"C:/User/Name/Folder/file - backup.txt"
)
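Each file has a parent folder and a name. The names may contain one or more periods "." and/or dashes "-". In addition, some have a "Copy" marker, a number designation, and/or a file extension. I would like to convert the data into something like this: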
[1] "Sub-subfolder file txt"
[2] "Sub-subfolder file Copy txt"
[3] "Sub-subfolder file 1 txt"
[4] "Sub-subfolder file Copy 2 txt"
[5] "Subfolder1 fileB txt"
[6] "Folder file.C txt"
[7] "Folder file-D txt"
[8] "Folder file"
[9] "Folder file Z txt"
[10] "Folder file - backup txt"
Here is the code I thought should work:
sub(
"(^.:/)([^/.]+/)*([^/.]+/)([^/]+)(\\s-\\sCopy)?(\\s\\(([0-9]+)\\))?(\\.([^.]+))?$",
"\\3 \\4 \\5 \\7 \\9",
files_list
)
But what I get is:
[1] "Sub-subfolder/ file.txt "
[2] "Sub-subfolder/ file - Copy.txt "
[3] "Sub-subfolder/ file (1).txt "
[4] "Sub-subfolder/ file - Copy (2).txt "
[5] "Subfolder1/ fileB.txt "
[6] "Folder/ file.C.txt "
[7] "Folder/ file-D.txt "
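I can deal with the slashes "/" and the extra spaces, but the "Copy" marker, the number designation, and the file extension are not being separated the way I expected.
Any suggestions on how to identify the "Copy" marker, the number designation, and the file extension? Or should I just identify the parent folder in one line of code and handle the rest in another?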
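(Ultimately, I will convert these text strings into a data frame in which the folder, file name, copy designation, and extension are separate columns. I'm fairly sure I can do that with tidyr::separate, but even that requires knowing regular expressions, and I want to learn how to use () and backreferences.)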
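A likely reason the pattern does not split the name: the fourth group, ([^/]+), is greedy, and every group after it is optional, so it swallows the whole base name and the optional groups match nothing. The slash shows up in the output because it sits inside group 3. Below is a minimal sketch of one possible fix, assuming PCRE (perl = TRUE) so that a lazy quantifier (+?) and non-capturing groups (?:...) are available. The names pattern and result are illustrative and not from the original post.

```r
files_list <- c(
  "C:/User/Name/Folder/Subfolder1/Sub-subfolder/file.txt",
  "C:/User/Name/Folder/Subfolder1/Sub-subfolder/file - Copy.txt",
  "C:/User/Name/Folder/Subfolder1/Sub-subfolder/file (1).txt",
  "C:/User/Name/Folder/Subfolder1/Sub-subfolder/file - Copy (2).txt",
  "C:/User/Name/Folder/Subfolder1/fileB.txt",
  "C:/User/Name/Folder/file.C.txt",
  "C:/User/Name/Folder/file-D.txt",
  "C:/User/Name/Folder/file",
  "C:/User/Name/Folder/file Z.txt",
  "C:/User/Name/Folder/file - backup.txt"
)

# Only five groups are captured; (?:...) groups do not get a number.
pattern <- paste0(
  "^.:/(?:[^/]+/)*",         # drive letter and any leading folders
  "([^/]+)/",                # group 1: parent folder (slash kept outside)
  "([^/]+?)",                # group 2: file name, lazy so later groups can match
  "(?:\\s-\\s(Copy))?",      # group 3: optional " - Copy" marker
  "(?:\\s\\(([0-9]+)\\))?",  # group 4: optional " (n)" number
  "(?:\\.([^./]+))?$"        # group 5: optional extension
)

result <- sub(pattern, "\\1 \\2 \\3 \\4 \\5", files_list, perl = TRUE)

# Groups that did not match are replaced by empty strings,
# so collapse the leftover runs of spaces.
result <- trimws(gsub("\\s+", " ", result))
print(result)
```

Because the file-name group is lazy and the pattern is anchored with $, the name only grows as far as it must for the optional suffixes to match, which is why "file.C.txt" keeps "file.C" as its name and "file - backup.txt" keeps the dash.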
This may help:
library(tools)
as.data.frame(cbind(dirname(files_list), file_path_sans_ext(basename(files_list)), file_ext(files_list)))
#                                            V1              V2  V3
#1 C:/User/Name/Folder/Subfolder1/Sub-subfolder            file txt
#2 C:/User/Name/Folder/Subfolder1/Sub-subfolder     file - Copy txt
#3 C:/User/Name/Folder/Subfolder1/Sub-subfolder        file (1) txt
#4 C:/User/Name/Folder/Subfolder1/Sub-subfolder file - Copy (2) txt
#5               C:/User/Name/Folder/Subfolder1           fileB txt
#6                          C:/User/Name/Folder          file.C txt
#7                          C:/User/Name/Folder          file-D txt
#8                          C:/User/Name/Folder            file
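To go one step further toward the data frame described in the question, the path helpers can be combined with a small regular expression that only has to handle the "Copy" marker and the number at the end of the base name. This is a sketch under assumptions: sample_paths, suffix_pattern, and parts are hypothetical names, the sample is a subset of the paths from the question, and the column layout is only one possible choice.

```r
library(tools)

# Hypothetical sample: a few of the paths from the question
sample_paths <- c(
  "C:/User/Name/Folder/Subfolder1/Sub-subfolder/file - Copy (2).txt",
  "C:/User/Name/Folder/Subfolder1/Sub-subfolder/file (1).txt",
  "C:/User/Name/Folder/file.C.txt",
  "C:/User/Name/Folder/file"
)

# Base name without directory and extension, e.g. "file - Copy (2)"
stem <- file_path_sans_ext(basename(sample_paths))

# Group 1: name (lazy), group 2: optional "Copy", group 3: optional number
suffix_pattern <- "^(.+?)(?:\\s-\\s(Copy))?(?:\\s\\(([0-9]+)\\))?$"

parts <- data.frame(
  folder = basename(dirname(sample_paths)),  # immediate parent folder only
  name   = sub(suffix_pattern, "\\1", stem, perl = TRUE),
  copy   = sub(suffix_pattern, "\\2", stem, perl = TRUE),
  number = sub(suffix_pattern, "\\3", stem, perl = TRUE),
  ext    = file_ext(sample_paths),
  stringsAsFactors = FALSE
)
print(parts)
```

Splitting the work this way keeps the regular expression short: the folder and the extension come from basename(), dirname(), and file_ext(), and backreferences are only needed for the optional suffixes. Missing pieces come back as empty strings rather than NA.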