I have some data that looks like the following:
# A tibble: 2,717 x 6
# Groups: date [60]
symbol date monthly.returns score totals score_rank
<chr> <date> <dbl> <dbl> <dbl> <int>
1 GIS 2010-01-29 0.0128 0.436 119. 2
2 GIS 2010-02-26 0.00982 0.205 120. 1
3 GIS 2010-03-31 -0.0169 0.549 51.1 3
4 GIS 2010-04-30 0.0123 0.860 28.0 4
5 GIS 2010-05-28 0.000984 0.888 91.6 4
6 GIS 2010-06-30 -0.00267 0.828 15.5 4
7 GIS 2010-07-30 -0.0297 0.482 81.7 2
8 GIS 2010-08-31 0.0573 0.408 57.2 3
9 GIS 2010-09-30 0.0105 0.887 93.3 4
10 GIS 2010-10-29 0.0357 0.111 96.6 1
# ... with 2,707 more rows
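I have a score_rank column, and whenever the totals column is > 100 I want to filter the data as follows:
1) When score_rank == 1, take the top 5% of observations based on the score column.
2) When score_rank == 2 or 3, take a random sample of 5% of the observations.
3) When score_rank == 4, take the bottom 5% of observations based on the score column.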
Data:
tickers <- c("GIS", "KR", "MKC", "SJM", "EL", "HRL", "HSY", "K",
             "KMB", "MDLZ", "MNST", "PEP", "PG", "PM", "SYY", "TAP", "TSN", "WBA", "WMT",
             "MMM", "ABMD", "ACN", "AMD", "AES", "AON", "ANTM", "APA", "CSCO", "CMS", "KO", "GRMN", "GPS",
             "JEC", "SJM", "JPM", "JNPR", "KSU", "KEYS", "KIM", "NBL", "NEM", "NWL", "NFLX", "NEE", "NOC", "TMO", "TXN", "TWTR")

library(tidyquant)

data <- tq_get(tickers,
               get = "stock.prices",     # Collect the stock price data from 2010 - 2015
               from = "2010-01-01",
               to = "2015-01-01") %>%
  group_by(symbol) %>%
  tq_transmute(select = adjusted,        # Convert the data from daily prices to monthly returns
               mutate_fun = periodReturn,
               period = "monthly",
               type = "arithmetic")

data$score <- runif(nrow(data), min = 0, max = 1)
data$totals <- runif(nrow(data), min = 10, max = 150)

data <- data %>%
  group_by(date) %>%
  mutate(
    score_rank = ntile(score, 4)
  )
Edit: added code.
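Here is one option with filter. Create a list of functions (fs), one for each corresponding score_rank rule, then use map2 to loop over the list of functions and the matching list of score_rank vectors. For each pair, filter data to the rows where totals is greater than 100 and score_rank is %in% the input vector from map2, apply the function to the score column to filter the sample of rows, and finally bind the subset with the rows of data where totals is less than or equal to 100.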
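library(purrr)
library(dplyr)

# One predicate per rule, each applied to the score column
fs <- list(
  as_mapper(~ . >= quantile(., prob = 0.95)),  # score_rank 1: top 5% by score
  as_mapper(~ row_number() %in% sample(row_number(), round(0.05 * n()))),  # score_rank 2 or 3: random 5%
  as_mapper(~ . <= quantile(., prob = 0.05))   # score_rank 4: bottom 5% by score
)

# data is still grouped by date, so each predicate is evaluated within each date
map2_df(list(1, c(2, 3), 4), fs, ~
  data %>%
    filter(totals > 100, score_rank %in% .x) %>%
    filter(.y(score))
) %>%
  bind_rows(data %>% filter(totals <= 100))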
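To try the three rules without downloading any prices, they can also be written out one at a time on made-up data. The following is only a minimal sketch, not a replacement for the map2 approach: the toy tibble and the names high, low, top_5, random_5, bottom_5 and result are hypothetical, the data is left ungrouped (so the 5% cut-offs are computed over all rows rather than per date), and slice_sample() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

set.seed(42)  # make the random 5% sample reproducible

# Hypothetical toy data standing in for the tq_get() result (no date grouping)
toy <- tibble(
  score  = runif(2000, min = 0, max = 1),
  totals = runif(2000, min = 10, max = 150)
) %>%
  mutate(score_rank = ntile(score, 4))

high <- toy %>% filter(totals > 100)   # rows the three rules apply to
low  <- toy %>% filter(totals <= 100)  # rows kept unchanged

# Rule 1: score_rank 1, keep scores at or above the 95th percentile
top_5 <- high %>%
  filter(score_rank == 1) %>%
  filter(score >= quantile(score, 0.95))

# Rule 2: score_rank 2 or 3, keep a random 5% of the rows
random_5 <- high %>%
  filter(score_rank %in% c(2, 3)) %>%
  slice_sample(prop = 0.05)

# Rule 3: score_rank 4, keep scores at or below the 5th percentile
bottom_5 <- high %>%
  filter(score_rank == 4) %>%
  filter(score <= quantile(score, 0.05))

result <- bind_rows(top_5, random_5, bottom_5, low)
```

Writing the rules out separately makes each one easy to inspect on its own. To get per-date cut-offs like the grouped data above, a group_by(date) would need to be added before each filter, and small groups may then yield no sampled rows for rule 2.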