I want to join two words separated by an asterisk (*) in a list of French words. After joining them, I want to check whether the resulting word exists in a French dictionary. If it does, the joined word should stay in the list; if not, it should be appended to another list. I used yield in my code (I'm new to it), but something is wrong with my nested if/else block. Can anyone help me get this working? My unsuccessful code is below:
words = ['Bien', '*', 'venue', 'pour', 'les', 'engage', '*', 'ment', 'trop', 'de', 'YIELD', 'peut', 'être', 'contre', '*', 'productif']

with open('Fr-dictionary.txt') as fr:
    dic = word_tokenize(fr.read().lower())

l = []

def join_asterisk(ary):
    i, size = 0, len(ary)
    while i < size-2:
        if ary[i+1] == '*':
            if ary[i] + ary[i+2] in dic:
                yield ary[i] + ary[i+2]
                i += 2
        else: yield ary[i]
        i += 1
        l.append(ary[i] + ary[i+2])
    if i < size:
        yield ary[i]

print(list(join_asterisk(words)))
Generators are perfect for this use case. You can think of a generator as a function that gives you the yielded values one by one instead of all at once (as return does). In other words, you can see it as a list that is not held in memory, one for which you get the next element only when you ask for it. Also note that generators are just one way of building iterators.
What that means in your case is that you don't have to build a list l to keep track of the correct words, as the generator join_asterisk will yield the correct words for you. What you need to do is iterate over all the values that this generator yields. That's exactly what list(generator) does: it builds a list by iterating over all the values of your generator.
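To make the laziness concrete, here is a minimal sketch using a made-up generator, squares, which is not part of the question's code. next() pulls one value at a time, and list() drains whatever is left:

```python
def squares(n):
    """Yield the squares of 0..n-1, one at a time."""
    for i in range(n):
        yield i * i

gen = squares(5)
first = next(gen)       # computed only when asked for
second = next(gen)      # the generator resumes where it stopped
remaining = list(gen)   # list() consumes everything that is left
again = list(gen)       # the generator is now exhausted, so this is empty

print(first, second, remaining, again)
```

Note the last line: once a generator has been consumed, iterating over it again yields nothing, so you need to call the generator function again to get a fresh one.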
In the end the code would look like this:
# A named constant reads better, and is easier to change later
word_separator = '*'

words = ['Bien', word_separator, 'venue', 'pour', 'les', 'engage', word_separator, 'ment', 'trop', 'de', 'YIELD', 'peut', word_separator, 'tard']

# Fake dictionary
dic = {"Bienvenue", "pour", "les", "engagement", "trop", "de", "peut", "peut-être"}

def join_asterisk(ary):
    # Slide a window of three consecutive items over the list
    for w1, w2, w3 in zip(ary, ary[1:], ary[2:]):
        if w2 == word_separator:
            # w1 and w3 are two halves of one word: join them and
            # report whether the result is in the dictionary
            word = w1 + w3
            yield (word, word in dic)
        elif w1 != word_separator and w1 in dic:
            # A standalone word that is in the dictionary
            # (standalone words missing from the dictionary are skipped)
            yield (w1, True)

correct_words = []
incorrect_words = []
for word, is_correct in join_asterisk(words):
    if is_correct:
        correct_words.append(word)
    else:
        incorrect_words.append(word)

print(correct_words)
print(incorrect_words)
This outputs
['Bienvenue', 'pour', 'les', 'engagement', 'trop', 'de']
['peuttard']
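The zip-based version above has a few gaps worth knowing about: a standalone word that is not in the dictionary (such as 'YIELD') is dropped rather than reported, the last two items are never examined as standalone words, and the second half of a joined word would be yielded a second time if it happened to be a dictionary word on its own. If that matters for your data, an index-based loop is one way around it. This is only a sketch of an alternative, not the code above; join_separated and its parameters are names made up for the example:

```python
SEPARATOR = "*"

def join_separated(tokens, dictionary, separator=SEPARATOR):
    """Yield (word, is_known) for every word, joining 'a', '*', 'b' into 'ab'."""
    i = 0
    while i < len(tokens):
        # A separator right after tokens[i], followed by another token,
        # means tokens[i] and tokens[i + 2] are two halves of one word.
        if i + 2 < len(tokens) and tokens[i + 1] == separator:
            word = tokens[i] + tokens[i + 2]
            i += 3  # skip both halves and the separator
        else:
            word = tokens[i]
            i += 1
        if word != separator:  # ignore a separator that has nothing to join
            yield word, word in dictionary

words = ["Bien", "*", "venue", "pour", "les", "engage", "*", "ment",
         "trop", "de", "YIELD", "peut", "*", "tard"]
dic = {"Bienvenue", "pour", "les", "engagement", "trop", "de", "peut", "peut-être"}

known, unknown = [], []
for word, is_known in join_separated(words, dic):
    (known if is_known else unknown).append(word)

print(known)
print(unknown)
```

Because the index jumps past both halves after a join, each token is looked at exactly once, and every word ends up in one of the two lists.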
Also note that you can use list comprehensions instead of a for loop to fill the two lists. Keep in mind that this runs the generator twice, once per list:
correct_words = [w for w, correct in join_asterisk(words) if correct]
incorrect_words = [w for w, correct in join_asterisk(words) if not correct]
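One more thing to watch for when you swap the fake dictionary for the real file: the question lowercases the dictionary but compares it against capitalized words such as 'Bienvenue', and a tokenized list makes every `in` test a linear scan. Below is a sketch of one way to handle both, assuming (this is not stated in the question) that Fr-dictionary.txt holds one word per line. load_dictionary and is_known are illustrative names, and an in-memory file stands in for the real one:

```python
import io

def load_dictionary(lines):
    """Build a set of lowercased words from an iterable of lines."""
    return {line.strip().lower() for line in lines if line.strip()}

def is_known(word, dictionary):
    """Case-insensitive membership test against a lowercased dictionary."""
    return word.lower() in dictionary

# Stand-in for the real file. With the actual dictionary you would pass
# the open file object instead, e.g.
#   with open('Fr-dictionary.txt', encoding='utf-8') as fr:  # encoding is an assumption
#       dic = load_dictionary(fr)
fake_file = io.StringIO("Bienvenue\nengagement\ncontreproductif\n")
dic = load_dictionary(fake_file)

print(is_known("Bienvenue", dic))
print(is_known("peuttard", dic))
```

A set gives constant-time lookups on average, which makes a noticeable difference once the dictionary has tens of thousands of entries.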