R: Trouble getting a data frame from an HTML table using XML

Posted by user3623888 in Dev

user3623888

I'm trying to get the data from the table at http://www.boxofficemojo.com/weekend/chart/?view=&yr=2015&wknd=09&p=.htm into a data frame. Here is the code I'm using:

library(XML)
data <- readHTMLTable('http://www.boxofficemojo.com/weekend/chart/?view=&yr=2015&wknd=09&p=.htm')

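I'm not very familiar with the XML library, and I'm not sure how to get the data out of the result. It is contained in "data", but it's really ugly and I don't know how to work with it. Any suggestions?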

卑鄙的

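In the page in your browser, you can see that the HTML table has 46 rows and 12 columns. Inspect the result (your data) with str to see whether it contains a table of that size: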

> str(data, max.level = 1)
List of 5
 $ NULL:'data.frame':   0 obs. of  0 variables
 $ NULL: NULL
 $ NULL: NULL
 $ NULL:'data.frame':   49 obs. of  12 variables:
 $ NULL:'data.frame':   46 obs. of  12 variables:

The last table (number 5) looks like your target. So your table is:

my_table <- data[[5]]
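
If you'd rather not hard-code the index, the same lookup can be done by shape. This is only a sketch: `tables` is a made-up stand-in for the list returned by readHTMLTable, built to mirror the str output above, and the names `dims` and `target` are illustrative.

```r
# Stand-in for the list returned by readHTMLTable(); the real one comes from the page.
tables <- list(
  data.frame(),
  NULL,
  NULL,
  as.data.frame(matrix("x", nrow = 49, ncol = 12)),
  as.data.frame(matrix("x", nrow = 46, ncol = 12))
)

# Rows and columns per element; NULL entries are reported as NA.
dims <- t(sapply(tables, function(x) {
  if (is.data.frame(x)) dim(x) else c(NA_integer_, NA_integer_)
}))
colnames(dims) <- c("rows", "cols")

# Index of the element whose shape matches what the browser shows (46 x 12).
target <- which(dims[, "rows"] == 46 & dims[, "cols"] == 12)
my_table <- tables[[target]]
```

With the real result you would run the same steps on data instead of tables. Note that `[[target]]` assumes exactly one table has that shape.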

You can also specify the table number directly with the which argument:

my_table <- readHTMLTable('the url', which = 5)

Some rows and columns:

> head(my_table[,3:6])
                                        V3    V4          V5     V6
1                             Focus (2015)    WB $19,100,000      -
2             Kingsman: The Secret Service   Fox $11,750,000 -36.0%
3 The SpongeBob Movie: Sponge Out of Water  Par. $11,200,000 -32.4%
4                     Fifty Shades of Grey  Uni. $10,927,000 -50.9%
5                       The Lazarus Effect Rela. $10,600,000      -
6                           McFarland, USA    BV  $7,797,000 -29.3%
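
The values come back as text (or factors), so they need converting before you can do arithmetic on them. Here is a minimal sketch on a small hand-typed sample copied from the output above. `sample_rows`, `gross` and `pct_change` are illustrative names, and reading V6 as a percentage change with "-" meaning "no value" is an assumption.

```r
# Hypothetical sample mirroring the columns V5 and V6 printed above.
sample_rows <- data.frame(
  V5 = c("$19,100,000", "$11,750,000", "$7,797,000"),
  V6 = c("-", "-36.0%", "-29.3%"),
  stringsAsFactors = FALSE
)

# Strip "$" and "," and convert the amount to a number.
# as.character() guards against the column being a factor.
gross <- as.numeric(gsub("[$,]", "", as.character(sample_rows$V5)))

# Strip "%"; a lone "-" is treated as missing (assumption).
change_chr <- sub("%", "", as.character(sample_rows$V6), fixed = TRUE)
change_chr[change_chr == "-"] <- NA
pct_change <- as.numeric(change_chr)
```

The same calls should work on my_table$V5 and my_table$V6, provided the real columns follow the format shown.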
