My question is very similar to the following: How to get a Substring from list of file names. I'm new to Python and would prefer a similar solution for Python (or R). I'd like to look into a directory, extract a particular substring from each applicable file name, and output the results as a vector (preferred), list, or array. For example, assume I have a directory with the following file names:
data_ABC_48P.txt
data_DEF_48P.txt
data_GHI_48P.txt
other_96.txt
another_98.txt
I would like to reference the directory and extract the following as a character vector (for use in R) or list:
"ABC", "DEF", "GHI"
I tried the following:
from os import listdir
from os.path import isfile, join
files = [ f for f in listdir(path) if isfile(join(path,f)) ]
import re
m = re.search('data_(.+?)_48P', files)
But I get the following error:
TypeError: expected string or buffer
files is of type list:
In [10]: type(files)
Out[10]: list
Even though I ultimately want this character vector as an input to R code, we are trying to transition all of our "scripting" to Python and use R solely for data analysis, so a Python solution would be great. I'm also using Ubuntu, so a cmd line or bash script solution could work as well. Thanks in advance!
Use a list comprehension, like this:
[re.search(r'data_(.+?)_48P', i).group(1) for i in files if re.search(r'data_.+?_48P', i)]
re.search expects a single string, not a list, so you need to iterate over the list contents in order to grab the substrings you want.
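The comprehension above runs the regex twice for every file name. Here is a minimal sketch of the same idea that matches each name only once. The helper name extract_labels and the hard-coded files list are assumptions for illustration; in practice files would come from the listdir call in the question.

```python
import re

# Pattern from the question: capture the text between "data_" and "_48P".
PATTERN = re.compile(r'data_(.+?)_48P')

def extract_labels(filenames):
    """Return the captured substring for each name that matches PATTERN."""
    labels = []
    for name in filenames:
        match = PATTERN.search(name)  # search one string at a time, not the list
        if match:
            labels.append(match.group(1))
    return labels

# Sample names from the question, standing in for the result of listdir(path).
files = ["data_ABC_48P.txt", "data_DEF_48P.txt", "data_GHI_48P.txt",
         "other_96.txt", "another_98.txt"]
print(extract_labels(files))  # ['ABC', 'DEF', 'GHI']
```

Names that do not match the pattern, such as other_96.txt, are skipped rather than raising an error.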