I have data for each month for a year on insured people. All variables are dummy variables and I need to create a new variable that shows when a person became uninsured. I am calling the variable duration. My dataset (df) looks something like this:
ID Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
101 1 1 1 1 0 0 1 1 1 1 1 1
102 1 1 1 1 0 0 0 0 0 0 0 0
103 1 1 1 1 1 1 1 1 1 1 1 1
104 1 1 1 1 0 1 1 0 1 1 1 1
In the dataset, 1 is insured and 0 is uninsured. My new variable would hold the column position (counting from Jan) at which the person changed from 1 to 0. For instance, in the first row, duration would have the value 5 for May. I am only interested in the first instance of 0. For example, in row 4, I only need 5 for May and can ignore August. Also, if the person never becomes uninsured, as in the case of 103, the new variable would just have the value 0.
I began with the ifelse statement below, but repeating it for every month would take a lot of time. If you have an easier solution, please share. Thanks!
df$duration <- ifelse(df$Feb == 1, 0, 2)
There are more efficient alternatives, but maybe this is sufficient:
apply(df[,-1], 1, function(x) which(x==0)[1])
#[1] 5 5 NA 5
# Note: a person who is never uninsured (ID 103) gets NA here, not 0.
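The question asks for 0 rather than NA when a person is never uninsured, and the answer mentions that more efficient alternatives exist. As a rough sketch of one such alternative, the following uses max.col() on a logical matrix instead of a row-wise apply(). It assumes the first column is ID, the remaining columns are the twelve months in calendar order, and there are no missing values; the names is_zero and first_zero are made up for illustration.

```r
# Sample data copied from the question.
df <- read.table(header = TRUE, text = "
ID Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
101 1 1 1 1 0 0 1 1 1 1 1 1
102 1 1 1 1 0 0 0 0 0 0 0 0
103 1 1 1 1 1 1 1 1 1 1 1 1
104 1 1 1 1 0 1 1 0 1 1 1 1
")

# Assumption: column 1 is ID and every other column is a month dummy.
# Logical matrix: TRUE where the person is uninsured in that month.
is_zero <- as.matrix(df[, -1]) == 0

# max.col() returns the column index of each row's maximum; with
# ties.method = "first" that is the first uninsured month.
first_zero <- max.col(is_zero * 1, ties.method = "first")

# Rows without any zero would otherwise report column 1, so set them to 0.
df$duration <- ifelse(rowSums(is_zero) > 0, first_zero, 0)

df$duration
# [1] 5 5 0 5
```

If the apply() version is preferred, replacing the NA afterwards (for example with ifelse(is.na(x), 0, x)) should give the same result.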