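I'm working on a project to find the number of times a service had to be reworked, counting only rework that falls within the 30 days after the original service ended. Using the following sample dataset: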
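library(tidyverse)  # tribble(), dplyr verbs, tidyr::fill()
library(lubridate)  # as_date(), duration()

work <- tribble(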
~Client , ~Initial_Date_In, ~Initial_Date_Out,
'Car1', '2019-01-01', '2019-01-02',
'Car1', '2019-06-01', '2019-06-03',
'Car2', '2019-01-01', '2019-01-02',
'Truck1', '2019-06-01', '2019-06-02',
'Truck2', '2019-04-01', '2019-04-02',
'Truck3', '2019-08-01', '2019-08-04',
'Van1', '2019-01-01', '2019-01-02',
'Van2', '2019-06-01', '2019-06-02',
'Truck4', '2019-10-05', '2019-10-09',
)
rework <- tribble(
~Client , ~Date_In, ~Date_Out,
'Car1', '2019-01-05', '2019-01-07',
'Car1', '2019-01-09', '2019-01-10',
'Car1', '2019-06-09', '2019-06-10',
'Truck3', '2019-08-07', '2019-08-08',
'Van1', '2019-03-01', '2019-03-01',
'Van2', '2019-06-09', '2019-06-11',
'Truck4', '2019-11-01', '2019-11-02',
)
work$Initial_Date_In <- lubridate::as_date(work$Initial_Date_In)
work$Initial_Date_Out <- lubridate::as_date(work$Initial_Date_Out)
rework$Date_In <- lubridate::as_date(rework$Date_In)
rework$Date_Out <- lubridate::as_date(rework$Date_Out)
I have tried several approaches, for example:
work %>%
  left_join(select(rework, Client, Date_Out), by = "Client") %>%
  group_by(Client) %>%
  filter(max(Date_Out) > Initial_Date_Out & max(Date_Out) < (Initial_Date_Out + duration(30, "days")))
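This code seems to pull only the row with the latest Initial_Date_In (Out), but I want the matching rework for every row in the work table (shown as the maximum Date_Out from the rework table).
The expected output would look something like: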
Client  Initial_Date_In  Initial_Date_Out  Date_Out    #_Of_Rework_Visits
Car1    2019-01-01       2019-01-02        2019-01-10  2
Car1    2019-06-01       2019-06-03        2019-06-10  1
Truck3  2019-08-01       2019-08-04        2019-08-08  1
Van2    2019-06-01       2019-06-02        2019-06-11  1
Truck4  2019-10-05       2019-10-09        2019-11-02  1
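Any hints on what I'm doing wrong?
Update
Here's an approach that avoids doing any kind of formal join. Stack work and rework together with bind_rows, then use a service_ix column (the index of the service application within a given Client) as the grouping/summarising variable: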
work %>%
  mutate(service = "original") %>%
  rename(Date_In = "Initial_Date_In", Date_Out = "Initial_Date_Out") %>%
  group_by(Client) %>%
  mutate(service_ix = row_number()) %>%
  bind_rows(rework %>%
              mutate(service = "rework") %>%
              select(Client, Date_In, Date_Out, service)) %>%
  arrange(Client, Date_In) %>%
  fill(service_ix) %>%
  group_by(Client, service_ix) %>%
  summarise(initial_date_in = Date_In[service == "original"],
            initial_date_out = Date_Out[service == "original"],
            max_date_out = max(Date_Out),
            n_reworks = sum(service == "rework")) %>%
  filter(n_reworks > 0) %>%
  ungroup()
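Unpacking that a bit:
First, a service column is added to indicate original vs. rework. Here is the combined df after bind_rows(), sorted by Client and Date_In:
# A tibble: 16 x 5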
# Groups: Client [8]
   Client Date_In    Date_Out   service  service_ix
   <chr>  <date>     <date>     <chr>         <int>
 1 Car1   2019-01-01 2019-01-02 original          1
 2 Car1   2019-01-05 2019-01-07 rework           NA
 3 Car1   2019-01-09 2019-01-10 rework           NA
 4 Car1   2019-06-01 2019-06-03 original          2
 5 Car1   2019-06-09 2019-06-10 rework           NA
 6 Car2   2019-01-01 2019-01-02 original          1
 7 Truck1 2019-06-01 2019-06-02 original          1
 8 Truck2 2019-04-01 2019-04-02 original          1
 9 Truck3 2019-08-01 2019-08-04 original          1
10 Truck3 2019-08-07 2019-08-08 rework           NA
11 Truck4 2019-10-05 2019-10-09 original          1
12 Truck4 2019-11-01 2019-11-02 rework           NA
13 Van1   2019-01-01 2019-01-02 original          1
14 Van1   2019-03-01 2019-03-01 rework           NA
15 Van2   2019-06-01 2019-06-02 original          1
16 Van2   2019-06-09 2019-06-11 rework           NA
The fill step then replaces the NA values in service_ix, so each rework row inherits the index of the original service that precedes it:
  Client Date_In    Date_Out   service  service_ix
  <chr>  <date>     <date>     <chr>         <int>
1 Car1   2019-01-01 2019-01-02 original          1
2 Car1   2019-01-05 2019-01-07 rework            1
3 Car1   2019-01-09 2019-01-10 rework            1
4 Car1   2019-06-01 2019-06-03 original          2
5 Car1   2019-06-09 2019-06-10 rework            2
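To see the fill behaviour in isolation, here is a minimal sketch on a made-up four-row tibble (the `toy` data and the `filled` name are illustrative, not part of the original data). It relies on `tidyr::fill()` defaulting to `.direction = "down"` and respecting the existing grouping.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one client, two original services,
# each followed by a rework row whose index is still missing.
toy <- tibble(
  Client     = c("A", "A", "A", "A"),
  service    = c("original", "rework", "original", "rework"),
  service_ix = c(1L, NA, 2L, NA)
)

filled <- toy %>%
  group_by(Client) %>%
  fill(service_ix) %>%  # carry the last non-NA index downward within each client
  ungroup()

filled$service_ix
```

Because the fill runs downward, the rows must already be sorted by date within each client, which is what the arrange(Client, Date_In) step is for.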
Finally, group by Client and service_ix, and summarise as needed.
# A tibble: 6 x 6
  Client service_ix initial_date_in initial_date_out max_date_out n_reworks
  <chr>       <int> <date>          <date>           <date>           <int>
1 Car1            1 2019-01-01      2019-01-02       2019-01-10           2
2 Car1            2 2019-06-01      2019-06-03       2019-06-10           1
3 Truck3          1 2019-08-01      2019-08-04       2019-08-08           1
4 Truck4          1 2019-10-05      2019-10-09       2019-11-02           1
5 Van1            1 2019-01-01      2019-01-02       2019-03-01           1
6 Van2            1 2019-06-01      2019-06-02       2019-06-11           1