I am new to R and web scraping. For practice, I am trying to scrape information from a fake book website. So far I have managed to scrape the book titles, find the mean length of each word in the book titles, find the most used word, and find the most used word excluding stopwords. Now I am trying to find how many times a specific word occurs, for example how many times the word 'me' appears in the book titles, but I am not sure how to isolate a specific word.
Code so far:
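library(rvest)  # read_html(), html_nodes(), html_attr(); also re-exports %>%
url <- 'http://books.toscrape.com/index.html'
# Scrape the full book titles from the title attribute of each link
titles <- url %>%
  read_html() %>%
  html_nodes('h3 a') %>%
  html_attr('title')
titles
# Mean number of characters per title (nchar() counts whole titles, not individual words)
values <- nchar(titles)
mean(values)
# Split the titles into lower-case words
all_words <- tolower(unlist(strsplit(titles, '\\s+')))
# Most used word
mostUsedWord <- sort(table(all_words), decreasing = TRUE)[1]
mostUsedWord
# Three most used words, excluding English stopwords (requires the tm package)
noStopWords <- sort(table(all_words[!all_words %in% tm::stopwords()]), decreasing = TRUE)[1:3]
noStopWords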
I would like to find out how many times specific words occur, not just the most frequent words that are used.
Is this what you are after?
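# Word to look up
word <- 'me'
# table() counts every word; data.frame(word = ...) names the columns word.all_words and word.Freq
data.frame(word = table(all_words)) %>% dplyr::filter(word.all_words == word)
  word.all_words word.Freq
1             me         1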
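If you only need the count itself, a direct comparison may be enough. The following is a minimal sketch, not a definitive approach: the `titles` vector here is a small hypothetical sample standing in for the scraped one, and `count_word` and `targets` are names made up for illustration. It also splits on non-letter characters rather than whitespace, on the assumption that you want a word with punctuation attached (for example "me," or "(me)") to count as well.

```r
# Hypothetical sample titles, standing in for the scraped `titles` vector
titles <- c("Tipping the Velvet", "It's Only the Himalayas",
            "Set Me Free", "You and Me")

# Lower-case first, then split on anything that is not a letter or apostrophe,
# so punctuation attached to a word does not hide a match
all_words <- unlist(strsplit(tolower(titles), "[^a-z']+"))
all_words <- all_words[all_words != ""]  # drop empty strings left by leading separators

# Hypothetical helper: number of exact, case-insensitive matches of one word
count_word <- function(words, target) sum(words == tolower(target))

count_word(all_words, "me")
count_word(all_words, "the")

# Several words at once: fixing the factor levels makes table() report
# a count of 0 for words that never occur
targets <- c("me", "the", "dragon")
counts <- table(factor(all_words, levels = targets))
counts
```

Comparing whole words with `==` avoids the partial matches that a pattern search such as `grepl("me", ...)` would produce (for example "home" or "time").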