I have a data.frame that looks like this:
# A tibble: 2,003 x 16
barcost barrulesplay barrulessch barrulesrelax barrulesinjury barriskskills barraincold barrainsick barrainmessy barraininjury barrainparentdis… barrainchilddis… barrainchildclo…
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 3 4 3 4 4 4 NA NA NA NA NA NA NA
2 2 5 5 5 3 5 NA NA NA NA NA NA NA
3 2 2 2 3 2 4 NA NA NA NA NA NA NA
4 2 4 4 4 2 4 NA NA NA NA NA NA NA
5 2 3 3 4 2 4 NA NA NA NA NA NA NA
6 2 4 4 4 3 4 NA NA NA NA NA NA NA
7 3 5 5 4 2 4 NA NA NA NA NA NA NA
8 4 5 5 4 4 3 NA NA NA NA NA NA NA
9 1 5 5 5 3 5 NA NA NA NA NA NA NA
10 2 4 4 4 3 4 NA NA NA NA NA NA NA
When I use the `describe` function from the Hmisc package, I get a list of lists (as expected):
describe(questions)
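In that output I can see the data I want to extract and plot: it is in the "frequency" element under "values" for each item of the list.
How would I create a tidy data.frame that holds, for each column, the frequencies of 1, 2, 3, etc. that appear in the list output of the `describe` function above?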
summary[["barcost"]][["values"]]
$value
[1] 1 2 3 4 5
$frequency
[1] 348 806 410 360 79
So, a data.frame that has the column headers as a variable (under a column named "questions", for example) and then (using the "barcost" question above as an example) 348 1's, 806 2's, etc., all for the "barcost" question variable.
I am aware that I may be trying to do something very complex when there is a simpler way of achieving the same goal, so I am open to suggestions.
You can get frequencies by column more directly. `gather` will convert the data to "long" format, which facilitates tabulation by group.
library(tidyverse)
freq = gather(questions) %>% group_by(key, value) %>% tally
Then you can graph the results, for example, like this:
ggplot(freq, aes(value, n)) +
  geom_col() +
  facet_wrap(~ key)
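As a side note, `gather` has since been superseded in tidyr by `pivot_longer`. A minimal sketch of the same tabulation with the newer function might look like the following. The data frame `questions_demo` is a made-up stand-in for `questions`, and the `_demo` names are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the `questions` data frame (not the real data)
set.seed(1)
questions_demo <- data.frame(
  barcost      = sample(1:5, 20, replace = TRUE),
  barrulesplay = sample(1:5, 20, replace = TRUE)
)

# Reshape to long format (one row per column/value pair), then count each pair
freq_demo <- questions_demo %>%
  pivot_longer(everything(), names_to = "key", values_to = "value") %>%
  count(key, value)

print(freq_demo)
```

The result has the same `key`, `value` and `n` columns as `freq` above, so the plotting code should work on it unchanged.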
If we start with the output of `describe`, you could do this:
freq = map_df(describe(questions), ~.x$values, .id="Column")
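However, `describe` does not return frequencies for columns with fewer than three unique values, so this method will exclude any such columns from the resulting `freq` data frame.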
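To see which columns would be left out, one option is to check each element for a missing frequency table before binding. The sketch below does not call Hmisc; it uses a hand-built list, `desc_demo`, that mimics the `values` structure shown in the question. It assumes that a column without frequencies shows up with a `NULL` `values` element, which may not match what your version of `describe` returns. The column `twolevel` is hypothetical.

```r
library(dplyr)

# Hand-built list mimicking the structure shown in the question (assumption)
desc_demo <- list(
  barcost  = list(values = list(value = 1:5,
                                frequency = c(348, 806, 410, 360, 79))),
  twolevel = list(values = NULL)  # hypothetical column with no frequency table
)

# TRUE for elements that carry a frequency table
has_freq <- !vapply(desc_demo, function(d) is.null(d$values), logical(1))

# Columns that would be silently excluded
dropped <- names(desc_demo)[!has_freq]

# Bind the remaining frequency tables, keeping the column name as an id
freq_demo <- bind_rows(
  lapply(desc_demo[has_freq], function(d) as.data.frame(d$values)),
  .id = "Column"
)

print(dropped)
print(freq_demo)
```

Columns listed in `dropped` could then be tabulated with the `gather` approach above instead.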
UPDATE: If I understand your comment, here's a way to color the bars by the proportion of each value:
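library(tidyverse)
# Fake data: 10 columns of 50 random responses on a 1-5 scale
set.seed(2)
dat = replicate(10, sample(1:5, 50, replace=TRUE))
# Get frequencies and proportions
# (tally drops the `value` grouping level, so pct is the proportion within each column)
freq = dat %>% as.data.frame %>%
  gather() %>%
  group_by(key, value) %>%
  tally %>%
  mutate(pct=n/sum(n))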
ggplot(freq, aes(value, n, fill=pct)) +
  geom_col() +
  facet_wrap(~ key, ncol=5) +
  scale_fill_gradient(low="red", high="blue")