How to programmatically subset dates in R based on previous dates?

James Picerno

I'm trying to write a function in R that programmatically selects a set of dates, with each iteration relying on the previous date selection. The challenge I can't solve is how to systematically analyze a dataset, select a date at each stage of the analysis, and then use that date as the starting point for selecting the next date. It's trivial to do this one iteration at a time. The question is how to write a function that stops automatically when no more dates in the dataset meet the criteria. I suspect the solution involves a for() and/or while() loop, possibly with a break statement, but so far I can't find the answer. Any help would be appreciated. Here is a trivial example of the process I'm trying to solve:

# create fake data for 12 months with dates
library("xts")
set.seed(67)
dat <- xts(rnorm(12) + 100, seq(as.Date("2001/1/1"), as.Date("2001/12/1"), "1 months"))

Review the raw data:

 dat
                 [,1]
 2001-01-01 101.21940
 2001-02-01  99.87560
 2001-03-01  99.04250
 2001-04-01  99.92083
 2001-05-01  98.85659
 2001-06-01  98.94281
 2001-07-01  99.61547
 2001-08-01 100.60834
 2001-09-01 101.67247
 2001-10-01  98.46271
 2001-11-01  98.62171
 2001-12-01 100.49543

Next, create the first function, which selects the first date; in this case it simply picks the second date entry:

f.1 <- function(x) {
  as.Date(index(x[2]))  # use the argument x rather than the global dat
}

And create the second function, which looks at the dates beyond the previous date and selects those whose values exceed 101:

f.2 <- function(x, y) { # x = dat, y = previous foo.date
  a <- x[paste0(y + 1, "/")]
  as.Date(index(a[a > 101]))
}

Finally, run the functions and collect the dates...

 foo.date.1 <-f.1(dat)
 foo.date.2 <-f.2(dat,foo.date.1)
 foo.date.3 <-f.2(dat,foo.date.2)

And combine the output of the three foo.date objects:

 dat.all <-c(foo.date.1, foo.date.2, foo.date.3)
 dat.all
 [1] "2001-02-01" "2001-09-01"

Note that the last date selected is foo.date.2. The third attempt, foo.date.3, returns nothing because there are no dates with values above 101 after 2001-09-01. For a dataset with thousands or even tens of thousands of dates, however, it's highly inefficient to find the matching dates by repeating these calls by hand. Any ideas on how to find a solution programmatically? In the above example, a function-based solution would a) discover that only 2 dates match the criteria, and therefore stop after the second attempt rather than searching a third time; and b) collect the relevant dates in a single output object.

Thanks in advance for any answers!

Joshua Ulrich

If I understand correctly, you want to find the index value of the observation that follows every observation > 101.

A simple and efficient solution is to lag your series first, then simply select all the index values for observations that are > 101.

datlag <- lag(dat)
index(datlag[datlag > 101])
# [1] "2001-02-01" "2001-10-01"
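
As a minimal sketch of the same idea on a hand-built series, so the result can be checked by eye: the values and the name x below are made up for illustration, and which() is applied to the raw values so the NA that lag() puts in the first row is skipped.

```r
library(xts)

# Hypothetical toy series: six monthly observations chosen by hand
x <- xts(c(100, 99, 102, 100, 103, 99),
         seq(as.Date("2001-01-01"), by = "month", length.out = 6))

# lag() on an xts object shifts values forward one row, so each row
# now holds the previous observation (the first row becomes NA)
xlag <- lag(x)

# positions whose previous observation exceeded 101; which() ignores the NA
hits <- which(as.numeric(coredata(xlag)) > 101)

following_dates <- index(x)[hits]
following_dates
# [1] "2001-04-01" "2001-06-01"
```

The values above 101 fall in March and May, so the dates that follow them are April and June.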

Based on this comment:

[T]he "criteria" (goal) is to identify the date(s) when the weights in an investment portfolio deviate from the target weights by x% for a given return series. This is easy to do for each date, one at a time. The first function identifies the first date; the second function does the same with the distinction of using the previous date. The second function may be repeated, depending on the # of rebal dates beyond the first one.

The problem seems to be truly recursive, which is a good reason to use a loop (though you still need to be careful about growing objects inside the loop).
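
As a rough sketch of that loop pattern, here it is applied to the toy criterion from the question (start at the second date, then repeatedly take the first later date whose value exceeds 101). The function name select_dates, the threshold argument, and the data are hypothetical, and the result vector is preallocated rather than grown.

```r
library(xts)

# Hypothetical helper: chain of dates, each found by searching after the previous one
select_dates <- function(x, threshold = 101) {
  idx  <- index(x)
  vals <- as.numeric(coredata(x))
  out  <- rep(as.Date(NA), length(idx))  # preallocate: at most one pick per row
  out[1] <- idx[2]                       # first pick is the second date, as in f.1
  n <- 1
  repeat {
    # rows strictly after the previous pick whose value exceeds the threshold
    hit <- which(idx > out[n] & vals > threshold)
    if (length(hit) == 0) break          # nothing left that meets the criterion
    n <- n + 1
    out[n] <- idx[hit[1]]
  }
  out[seq_len(n)]                        # drop the unused slots
}

# Hypothetical toy series with values chosen by hand
x <- xts(c(100, 99, 102, 100, 103, 99),
         seq(as.Date("2001-01-01"), by = "month", length.out = 6))

selected <- select_dates(x)
selected
# [1] "2001-02-01" "2001-03-01" "2001-05-01"
```

The loop ends on its own once the search after 2001-05-01 finds nothing, and all selected dates come back in one vector.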

In this case, you periodically reset your portfolio weights back to the target. That means you must re-calculate all future portfolio balances.

Here's an example with 2 assets.

# asset return data
set.seed(67)
dat <- xts(matrix(rnorm(24, 0, 0.02),12,2),
           seq(as.Date("2001/1/1"), as.Date("2001/12/1"), "1 months"))

# constraints
target_weights <- c(0.5, 0.5)
tol <- 0.01                # each asset must be +/-1% of its target
rebal_dates <- start(dat)  # assume allocation on first observation

# loop until break
while (last(rebal_dates) < end(dat)) {
  # date range, starting from period after last rebalance date
  date_range <- paste0(last(rebal_dates) + 1, "/")
  # portfolio balance over date range
  bal <- cumprod(1 + dat[date_range,])
  # portfolio weights
  wts <- bal / rowSums(bal)
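  # deviations from target portfolio
  # (each = ... lines the targets up with the columns; note that bal starts
  # every asset at 1, so this loop assumes equal target weights)
  dev <- abs(wts - rep(target_weights, each = nrow(wts))) > tol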
  # next rebalance date
  next_rebal <- which(rowSums(dev) > 0)
  # break the loop if there are no more rebalance dates
  if (length(next_rebal) == 0)
    break
  # append rebalance date to our vector
  # (yes, this is growing an object, but it's small and not very frequent)
  rebal_dates <- c(rebal_dates, index(dev)[next_rebal[1]])
}
rebal_dates
# [1] "2001-01-01" "2001-06-01" "2001-09-01" "2001-10-01" "2001-11-01"
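
If it helps, the same loop can be wrapped in a function. The sketch below is one possible way to do that, not a tested portfolio tool: the name find_rebal_dates and the toy returns are hypothetical, and it assumes balances restart at the target weights after each rebalance, so that unequal targets are handled too.

```r
library(xts)

# Hypothetical wrapper around the rebalance loop
find_rebal_dates <- function(returns, target_weights, tol) {
  idx <- index(returns)
  r   <- coredata(returns)       # plain matrix of returns, one column per asset
  n   <- nrow(r)
  last_pos  <- 1                 # assume allocation on the first observation
  rebal_pos <- last_pos
  while (last_pos < n) {
    rows <- (last_pos + 1):n     # periods after the last rebalance
    # cumulative growth per asset; matrix() keeps the shape when only one row is left
    growth <- matrix(apply(1 + r[rows, , drop = FALSE], 2, cumprod),
                     nrow = length(rows))
    bal <- sweep(growth, 2, target_weights, "*")  # balances start at target weights
    wts <- bal / rowSums(bal)
    dev <- abs(sweep(wts, 2, target_weights, "-")) > tol
    hit <- which(rowSums(dev) > 0)
    if (length(hit) == 0) break  # no further breach, so stop
    last_pos  <- rows[hit[1]]
    rebal_pos <- c(rebal_pos, last_pos)  # small vector, grows only on a rebalance
  }
  idx[rebal_pos]
}

# Hypothetical returns: asset A jumps 10% in February, asset B jumps 10% in May
ret <- xts(cbind(A = c(0, 0.10, 0, 0, 0, 0),
                 B = c(0, 0, 0, 0, 0.10, 0)),
           seq(as.Date("2001-01-01"), by = "month", length.out = 6))

rebal <- find_rebal_dates(ret, target_weights = c(0.6, 0.4), tol = 0.01)
rebal
# [1] "2001-01-01" "2001-02-01" "2001-05-01"
```

In February asset A's weight drifts to 0.66 / 1.06, about 0.623, and in May it drifts to 0.60 / 1.04, about 0.577; both are more than 0.01 away from the 0.6 target, so those two dates are flagged.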

Edited at 2021-07-26
