How to Calculate eCDF Mean in MatchIt in R

Jasmine Helen

I've been exploring the MatchIt package in R and am wondering how eCDF Mean is calculated in this package. I used the lalonde data from the package and ran matchit():

library("MatchIt")
data("lalonde")
m.out1 <- matchit(treat ~ age + educ + race + married + 
                   nodegree + re74 + re75, data = lalonde,
                 method = "nearest", distance = "glm")

The summary output of the matchit object is:

Call:
matchit(formula = treat ~ age + educ + race + married + nodegree + 
    re74 + re75, data = lalonde, method = "nearest", distance = "glm")

Summary of Balance for All Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
distance          0.5774        0.1822          1.7941     0.9211    0.3774   0.6444
age              25.8162       28.0303         -0.3094     0.4400    0.0813   0.1577
educ             10.3459       10.2354          0.0550     0.4959    0.0347   0.1114
raceblack         0.8432        0.2028          1.7615          .    0.6404   0.6404
racehispan        0.0595        0.1422         -0.3498          .    0.0827   0.0827
racewhite         0.0973        0.6550         -1.8819          .    0.5577   0.5577
married           0.1892        0.5128         -0.8263          .    0.3236   0.3236
nodegree          0.7081        0.5967          0.2450          .    0.1114   0.1114
re74           2095.5737     5619.2365         -0.7211     0.5181    0.2248   0.4470
re75           1532.0553     2466.4844         -0.2903     0.9563    0.1342   0.2876

According to vignette("assessing-balance"), eCDF Mean is the average distance between the eCDFs of the covariate across the groups. So I've been trying to calculate eCDF Mean manually, for example for the age covariate.

First, I split the data in two: "people1" for the treated units and "people2" for the untreated units. Then I create the eCDF of age for the treated (A) and for the untreated (B):

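# AGE
# Split into treated (people1) and untreated (people2) units
# (definitions assumed from the description above)
people1 <- lalonde[lalonde$treat == 1, ]
people2 <- lalonde[lalonde$treat == 0, ]
people1$age
people=na.omit(people1$age)
age1=ecdf(as.numeric(people))   # eCDF of age, treated
people2$age
people2=na.omit(people2$age)
age2=ecdf(as.numeric(people2))  # eCDF of age, untreated

# Extract the step locations (x) and cumulative proportions (y) of each eCDF
as.list(environment(age1))
A=as.data.frame(cbind(as.list(environment(age1))$x, as.list(environment(age1))$y));A
as.list(environment(age2))
B=as.data.frame(cbind(as.list(environment(age2))$x, as.list(environment(age2))$y));B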

The data frame C below merges the eCDF values of the treated (A) and untreated (B) groups by age.

C=merge(A,B,by="V1",all=TRUE);C
C=na.omit(C) # drop rows with an NA value
D=abs(C$V2.x-C$V2.y);summary(D)

D is the absolute difference between the eCDF of the treated (treat=1) and the untreated (treat=0). Its summary is:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.01850 0.06193 0.08809 0.09113 0.11888 0.15773

As you can see, the maximum eCDF difference matches the MatchIt output, but the mean eCDF difference does not. Can anybody explain the discrepancy, or show how eCDF Mean is calculated? Thank you!

Noah

This is some of the most convoluted code I've ever seen. I'll simplify things and show you how the statistic is calculated. That said, this statistic has not been well studied and is part of the output primarily for historical reasons. Use eCDF Max (the Kolmogorov-Smirnov statistic) instead.

Step 1: get the eCDFs (which are functions, not vectors) from the treated and control units

ecdf1 <- ecdf(lalonde$age[lalonde$treat == 1])
ecdf0 <- ecdf(lalonde$age[lalonde$treat == 0])

What these functions do is take a value of the variable (age) and return the cumulative probability up to that value, i.e., the proportion of units in the group whose age is less than or equal to it.
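As a small illustration of that behavior, here is a sketch on a made-up vector (`x` and `f` are hypothetical names and are not part of the lalonde example):

```r
# Toy data, invented for illustration only
x <- c(1, 2, 2, 4)
f <- ecdf(x)  # f is a step function, not a vector

# Proportion of observations less than or equal to each query value
f(c(0, 1, 2, 3, 4))
# [1] 0.00 0.25 0.75 0.75 1.00
```

Note that the duplicated value 2 produces a jump of 0.5 rather than 0.25, and that the function can be evaluated at values (0 and 3) that never occur in the data.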

Step 2: evaluate the eCDFs at each unique value of age

The reason we have to use unique values is that the eCDF already accounts for the duplicate values by creating a step in the function.

cum.dens1 <- ecdf1(unique(lalonde$age))
cum.dens0 <- ecdf0(unique(lalonde$age))

Step 3: compute the mean and maximum values of the absolute difference

ecdf.diffs <- abs(cum.dens1 - cum.dens0)
mean(ecdf.diffs)
# [1] 0.08133907
max(ecdf.diffs)
# [1] 0.157727

Both values match the eCDF Mean (0.0813) and eCDF Max (0.1577) reported for age in the summary output.
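The three steps can be wrapped into one function. The sketch below uses a hypothetical helper name (`ecdf_stats`) and invented toy vectors; it is not MatchIt's own code. It also suggests why the approach in the question likely gave a different mean: merging the two eCDF tables and dropping the NA rows appears to keep only the values observed in both groups, so the average is taken over fewer points.

```r
# Hypothetical helper: mean and max absolute eCDF difference
# between two numeric vectors
ecdf_stats <- function(x1, x0) {
  f1 <- ecdf(x1)
  f0 <- ecdf(x0)
  # Evaluate both eCDFs at every unique value in the combined sample
  vals <- unique(c(x1, x0))
  d <- abs(f1(vals) - f0(vals))
  c(mean = mean(d), max = max(d))
}

# Toy data, invented for illustration only
treated <- c(1, 2, 2, 3)
control <- c(2, 3, 4, 4)

ecdf_stats(treated, control)
#   mean    max 
# 0.3125 0.5000 

# Restricting to values present in both groups keeps the same max here
# but changes the mean
shared <- intersect(treated, control)
mean(abs(ecdf(treated)(shared) - ecdf(control)(shared)))
# [1] 0.5
```

With the lalonde data, `ecdf_stats(lalonde$age[lalonde$treat == 1], lalonde$age[lalonde$treat == 0])` evaluates the eCDFs at the same set of unique ages as the steps above.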

The actual code MatchIt uses is a bit less transparent, but it runs much faster.
