Using lapply to list the percentage of null values in each column in R

Posted by Matthew Rittinghouse in Dev

Matthew Rittinghouse

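I have been given a large CSV with 115 columns and 1000 rows. The columns hold various kinds of data: some are character-based, some are integers, and so on. However, the data contain many different kinds of null values (NA, -999, NULL, etc.).

What I want to do is write a script that produces a list of the columns in which more than 30% of the data is some type of NULL.

To do this, I wrote a script that gives me the null percentage (as a decimal) for a single column. This script works fine for me.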

length(which(indata$ObservationYear == "" | is.na(indata$ObservationYear) |
indata$ObservationYear == "NA" | indata$ObservationYear == "-999" |
indata$ObservationYear == "0"))/nrow(indata)

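I want to write a script that does this for all columns. I believe I need to use the lapply function.

I tried to do that here, but I can't seem to get this script to work at all: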

Null_Counter <- lapply(indata, 2, length(x),
                   length(which(indata == "" | is.na(indata) | indata == "NA" | indata == "-999" | indata == "0")))
                   names(indata(which(0.3>=Null_Counter / nrow(indata))))

I get the following error:

Error in match.fun(FUN) : '2' is not a function, character or symbol

and:

Error: could not find function "indata"

Ideally, what I want is a vector of all the column names where the percentage of null values (NA, -999, 0, NULL) is over 30%.

Can anyone help?

yuanhangliu1

I believe you want to use apply rather than lapply, which applies a function to a list. Try this:

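# Fraction of null-like values in each column (MARGIN = 2 means apply works column by column)
Null_Counter <- apply(indata, 2, function(x) length(which(x == "" | is.na(x) | x == "NA" | x == "-999" | x == "0"))/length(x))
# Names of the columns where that fraction is at least 0.3
Null_Name <- colnames(indata)[Null_Counter >= 0.3]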
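
One caveat with apply(): it converts the data frame to a matrix first, and when the columns have mixed types the numeric columns can end up as padded strings, so a test such as x == "0" may miss some values. Below is a minimal sketch that works column by column with vapply() instead. It assumes a toy data frame in place of indata and a hypothetical helper named null_fraction (neither comes from the original post), and it uses a strict > 0.3 to match "more than 30%":

```r
# Toy data frame standing in for `indata` (values invented for illustration).
indata <- data.frame(
  ObservationYear = c(2001, -999, 0, NA, 2005),
  Site            = c("A", "", "NA", "B", "C"),
  Count           = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fraction of entries in one column that count as "null".
# as.character() lets the same comparison work for numeric and character columns.
null_fraction <- function(x) {
  x_chr <- as.character(x)
  mean(is.na(x_chr) | x_chr %in% c("", "NA", "-999", "0"))
}

# vapply() visits each column of the data frame separately,
# so the data frame is never coerced to a single matrix.
Null_Counter <- vapply(indata, null_fraction, numeric(1))

# Names of the columns where more than 30% of the values are null.
Null_Name <- names(Null_Counter)[Null_Counter > 0.3]
```

With the toy data, Null_Name contains "ObservationYear" and "Site". The list of null markers in null_fraction is only the set mentioned in the question, so adjust it to whatever placeholders the real CSV uses.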
