I have a dataframe from tidytext that contains the individual words from some survey free-response comments. It has just shy of 500,000 rows. Being free-response data, it is riddled with typos. Using textclean::replace_misspellings took care of almost 13,000 misspelled words, but there were still ~700 unique misspellings that I manually identified.
I now have a second table with two columns, the first is the misspelling and the second is the correction.
For instance
allComments <- data.frame("Number" = 1:5, "Word" = c("organization","orginization", "oragnization", "help", "hlp"))
misspellings <- data.frame("Wrong" = c("orginization", "oragnization", "hlp"), "Right" = c("organization", "organization", "help"))
How can I replace all the values of allComments$word
that match misspellings$wrong
with misspellings$right
?
I feel like this is probably pretty basic and my R ignorance is showing....
You can use match
to find the index for words from allComments$Word
in misspellings$Wrong
and then use this index to subset them.
tt <- match(allComments$Word, misspellings$Wrong)
allComments$Word[!is.na(tt)] <- misspellings$Right[tt[!is.na(tt)]]
allComments
# Number Word
#1 1 organization
#2 2 organization
#3 3 organization
#4 4 help
#5 5 help
In case the right word is not already in allComments$Word
cast it to a character
:
allComments$Word <- as.character(allComments$Word)
この記事はインターネットから収集されたものであり、転載の際にはソースを示してください。
侵害の場合は、連絡してください[email protected]
コメントを追加