I have a list of strings: ls = ['a','b','c'], and another one with longer strings, each guaranteed to include one and only one string from ls: ls2 = ['1298a', 'eebbbd', 'qcqcq321'].
How can I find, for a given string from ls2, the index of the corresponding string from ls?
I can use:
for s in ls:
    for ss in ls2:
        if s in ss:
            print(s, ss, ls.index(s))
a 1298a 0
b eebbbd 1
c qcqcq321 2
but is there something nicer?
EDIT (hope it clarifies):
The actual case I'm working on has a bigger 1st list, and a smaller 2nd:
ls = ['apo','b','c','d25','egg','f','g']
ls2 = ['apoip21', 'oiujohuid25']
and I want to get the result 0,3 because the 1st item in ls2 has the 1st item from ls, while the 2nd in ls2 has the 4th in ls.
It doesn't look like you can get away from O(m * n * p) complexity (where m = len(ls), n = len(ls2), p = max(map(len, ls2))) without further information about your data. You can definitely reduce your current loop from O(m^2 * n * p) by keeping track of the current index using enumerate instead of looking it up again with ls.index(s). Also, don't forget about early termination:
for string in ls2:
    for index, key in enumerate(ls):
        if key in string:
            print(key, string, index)
            break
Notice that I swapped the inner and outer loop to make the break work properly: you definitely want to check each element of ls2, but only the minimum number of elements in ls.
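Applied to the lists from the question's edit, the same loop can be wrapped in a small function that returns the indices instead of printing them. This is only a sketch: the name find_indices is made up for illustration, and returning None for a string with no match is an assumption, since the question guarantees that a match exists.

```python
def find_indices(keys, strings):
    """Return, for each string, the index of the first key it contains."""
    result = []
    for string in strings:
        # next() stops at the first matching key, like the break above.
        # None is a hypothetical placeholder for "no key found".
        index = next((i for i, key in enumerate(keys) if key in string), None)
        result.append(index)
    return result


ls = ['apo', 'b', 'c', 'd25', 'egg', 'f', 'g']
ls2 = ['apoip21', 'oiujohuid25']
print(find_indices(ls, ls2))  # [0, 3]
```

The generator passed to next() is lazy, so the scan over ls still ends at the first hit for each string in ls2.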
Here are some timings I accumulated on the different O(m * n * p) solutions presented here. Thanks to @thierry-lathuille for the test data:
import itertools

ls = ['g', 'j', 'z', 'a', 'rr', 'ttt', 'b', 'c', 'd', 'f']
ls2 = ['1298a', 'eebbb', 'qcqcq321', 'mlkjmd', 'dùmlk', 'lof',
       'erreee', 'bmw', 'ottt', 'jllll', 'lla']

def with_table():
    # Precompute key -> index once, then look the index up on each match.
    table = {key: index for index, key in enumerate(ls)}
    result = {}
    for string in ls2:
        for key in ls:
            if key in string:
                result[string] = table[key]
    return result

def with_enumerate():
    # Track the index while iterating and stop at the first match.
    result = {}
    for string in ls2:
        for index, key in enumerate(ls):
            if key in string:
                result[string] = index
                break
    return result

def with_dict_comp():
    return {string: index for string in ls2 for index, key in enumerate(ls) if key in string}

def with_itertools():
    # Flatten the two nested loops into a single loop over all pairs.
    result = {}
    for (index, key), string in itertools.product(enumerate(ls), ls2):
        if key in string:
            result[string] = index
    return result
%timeit with_table()
4.89 µs ± 61.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit with_enumerate()
5.27 µs ± 66.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit with_dict_comp()
6.9 µs ± 83.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit with_itertools()
17.5 ns ± 0.193 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
As it turns out, creating a lookup table for the indices is slightly faster than computing them on the fly with enumerate.
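Outside IPython, a comparison like this can be reproduced with the standard timeit module. The sketch below is an illustration, not the setup used for the numbers above: it redeclares the test data and two of the functions so that it runs on its own, the repetition count is an arbitrary choice, and absolute timings will vary by machine.

```python
import timeit

ls = ['g', 'j', 'z', 'a', 'rr', 'ttt', 'b', 'c', 'd', 'f']
ls2 = ['1298a', 'eebbb', 'qcqcq321', 'mlkjmd', 'dùmlk', 'lof',
       'erreee', 'bmw', 'ottt', 'jllll', 'lla']

def with_table():
    table = {key: index for index, key in enumerate(ls)}
    result = {}
    for string in ls2:
        for key in ls:
            if key in string:
                result[string] = table[key]
    return result

def with_enumerate():
    result = {}
    for string in ls2:
        for index, key in enumerate(ls):
            if key in string:
                result[string] = index
                break
    return result

number = 10_000  # arbitrary repetition count, chosen only to keep the run short
for func in (with_table, with_enumerate):
    seconds = timeit.timeit(func, number=number)
    print(f"{func.__name__}: {seconds / number * 1e6:.2f} µs per loop")
```

Keep in mind that the two functions are not strictly equivalent on this test data: with_table has no break, so for a string such as 'mlkjmd', which contains both 'j' and 'd', it records the last matching index (8), while with_enumerate records the first (1).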