I've successfully scraped several websites with the XML package, but I'm having trouble creating a data frame from this particular page:
library(XML)
# readHTMLTable() returns a list with one element per <table> node on the page
url <- "http://www.foxsports.com/nfl/injuries?season=2013&seasonType=1&week=1"
df1 <- readHTMLTable(url)
print(df1)
> print(df1)
$`NULL`
NULL
$`NULL`
NULL
$`NULL`
Player Pos Injury Game Status
1 Dickson, Ed TE thigh Probable
2 Jensen, Ryan C foot Doubtful
3 Jones, Arthur DE illness Out
4 McPhee, Pernell LB knee Probable
5 Pitta, Dennis TE dislocated hip Injured Reserve (DFR)
6 Thompson, Deonte WR foot Doubtful
7 Williams, Brandon DT toe Doubtful
$`NULL`
Player Pos Injury Game Status
1 Anderson, C.J. RB knee Out
2 Ayers, Robert DE Achilles Probable
3 Bailey, Champ CB foot Out
4 Clady, Ryan T shoulder Probable
5 Dreessen, Joel TE knee Out
6 Kuper, Chris G ankle Doubtful
7 Osweiler, Brock QB left shoulder Probable
8 Welker, Wes WR ankle Probable
$`NULL`
etc
If I try to force it into a data frame, I get this error:
> df1 <- data.frame(readHTMLTable(url))
Error in data.frame(`NULL` = NULL, `NULL` = NULL, `NULL` = list(Player = 1:7, :
arguments imply differing number of rows: 0, 7, 8, 6, 9, 1, 11, 4, 12, 5, 21, 3, 2, 15
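A minimal sketch of why the coercion fails, using two made-up tables (the names `tables`, `a` and `b` are hypothetical). Because `data.frame()` places the list elements side by side as columns, every element must have the same number of rows. `rbind()` stacks them vertically instead, so the row counts are free to differ:

```r
# Two made-up tables with different row counts (hypothetical data)
tables <- list(
  a = data.frame(Player = c("P1", "P2")),
  b = data.frame(Player = c("P3", "P4", "P5"))
)

# data.frame() binds list elements side by side as columns, so the
# differing row counts (2 vs 3) trigger the same kind of error
result <- tryCatch(data.frame(tables), error = function(e) conditionMessage(e))
print(result)

# rbind() stacks the same tables vertically, so differing row counts are fine
stacked <- do.call("rbind", tables)
print(nrow(stacked))
```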
I want all the injury data (Player, Pos, Injury, Game Status) for every team.
Thanks in advance.
You just need to drop the NULL elements and the single-column tables that say "no injury report", then combine the rest with do.call and rbind:
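# keep only tables that exist and have the four injury columns
n <- sapply(df1, function(x) !is.null(x) && ncol(x) == 4)
# stack the remaining team tables into one data frame
x <- do.call("rbind", df1[n])
rownames(x) <- NULL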
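To try the filter-and-rbind approach without hitting the live page (the Fox Sports URL may no longer serve this layout), here is a sketch on a hypothetical stand-in for the `readHTMLTable()` result. The list mixes NULL entries, 4-column injury tables, and a 1-column "no injury report" table. The mock data and the helper `make_table` are assumptions for illustration only:

```r
# Hypothetical stand-in for readHTMLTable() output
make_table <- function(player, pos, injury, status) {
  data.frame(Player = player, Pos = pos, Injury = injury,
             "Game Status" = status,
             check.names = FALSE, stringsAsFactors = FALSE)
}

df1 <- list(
  NULL,                                   # layout table that parsed to NULL
  NULL,
  make_table(c("Dickson, Ed", "Jensen, Ryan"), c("TE", "C"),
             c("thigh", "foot"), c("Probable", "Doubtful")),
  data.frame(V1 = "No injury report", stringsAsFactors = FALSE),  # 1 column
  make_table("Welker, Wes", "WR", "ankle", "Probable")
)

# Keep only non-NULL tables with exactly the four injury columns
n <- sapply(df1, function(x) !is.null(x) && ncol(x) == 4)

# Stack the kept tables row-wise and reset the row names to 1..N
x <- do.call("rbind", df1[n])
rownames(x) <- NULL
print(x)
```

Base R's `Filter(function(t) !is.null(t) && ncol(t) == 4, df1)` is an equivalent way to build the filtered list before calling `do.call("rbind", ...)`.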