我已经看到了几个有关拐点计算的问题。我仍然不确定我是否做对了。根据实验室确认的当前流行病中心累积病例数据,我们试图确定拐点。我使用了inflection
包装,并将拐点计算为“ 2020年2月8日”。我还尝试计算第一条指令和第二条指令,以估计每个指令的增加和变化率。我对它的数学知识很少,但是只是跟随不同SO帖子中的示例。我的问题:以下图表的结果是否一致?如果没有,如何改善我的代码。
df<-structure(list(date = structure(c(18277, 18278, 18279, 18280,
18281, 18282, 18283, 18284, 18285, 18286, 18287, 18288, 18289,
18290, 18291, 18292, 18293, 18294, 18295, 18296, 18297, 18298,
18299, 18300, 18301, 18302, 18303, 18304, 18305, 18306, 18307),
class = "Date"),
cases = c(45, 62, 121, 198, 258, 363, 425,
495, 572, 618, 698, 1590, 1905, 2261, 2639, 3125, 4109, 5142,
6384, 8351, 10117, 11618, 13603, 14982, 16903, 18454, 19558,
20630, 21960, 22961, 23621)),
class = "data.frame", row.names = c(NA, -31L))
xlb_0<- structure(c(18281, 18285, 18289, 18293,
18297, 18301, 18305,
18309), class = "Date")
library(tidyverse)
# Smooth cumulative cases over time
df$x = as.numeric(df$date)
fit_1<- loess(cases ~ x, span = 1/3, data = df)
df$case_sm <-fit_1$fitted
# use inflection to obtain inflection point
library(inflection)
guai_0 <- check_curve(df$x, df$case_sm)
check_curve(df$x, df$cases)
#> $ctype
#> [1] "convex_concave"
#>
#> $index
#> [1] 0
guai_1 <- bese(df$x, df$cases, guai_0$index)
structure(guai_1$iplast, class = "Date")
#> [1] "2020-02-08"
# Plot cumulativew numbers of cases
df %>%
ggplot(aes(x = date, y = cases ))+
geom_line(aes(y = case_sm), color = "red") +
geom_point() +
geom_vline(xintercept = guai_1$iplast) +
labs(y = "Cumulative lab confirmed infections")
# Daily new cases (first derivative) and changing rate (second derivative)
df$dt1 = c(0, diff(df$case_sm)/diff(df$x))
fit_2<- loess(dt1 ~ x, span = 1/3, data = df)
df$change_sm <-fit_2$fitted
df$dt2 <- c(NA, diff(df$change_sm)/diff(df$x))
df %>%
ggplot(aes(x = date, y = dt1))+
geom_line(aes(y = dt1,
color = "Estimated number of new cases")) +
geom_point(aes(y = dt2*2, color = "Changing rate")) +
geom_line(aes(y = dt2*2, color = "Changing rate"))+
geom_vline(xintercept = guai_1$iplast) +
labs(y = "Estimatede number of new cases") +
scale_x_date(breaks = xlb_0,
date_labels = "%b%d") +
theme(legend.title = element_blank())
#> Warning: Removed 1 rows containing missing values (geom_point).
#> Warning: Removed 1 row(s) containing missing values (geom_path).
Created on 2020-02-17 by the reprex package (v0.3.0)
I was gonna write a comment, but I was pushing the character limit.
I am not familiar with the inflection
package so I am not one to judge if the 2020-02-08
is the true inflection. However, I will say this is difficult to answer with R because R is not necessarily good at calculating derivatives. If you had an estimated line equation - then you could potentially use this to plot first and second derivatives. Calculating rough delta's by doing the difference in (Y_n+1-Y_n)/(X_n+1-X_n)
is never optimal because a derivate in theory is the delta of two points infinitesimally close to each other. You fundamentally cannot get a great estimate of the derivative. You can even see this because you are forced to shift this estimate to either n
or n+1
. Furthermore, you would expect the inflection point of x_0
to be a local min/max in the first derivative and equal to zero in the second derivative. So I don't think your second plot helps. But this could just be due to the delta's calculated.
What I would do is first fit your data to some type of model. In this example I'm going to use the package dr4pl
to model your data to the 4 parameter logistic model. Since the function of the 4 parameter model is well known, I can write what the first and second derivative functions should be, then plot those values using stat_function
in the ggplot2
package.
library(ggplot2)
library(dr4pl)
df<-structure(list(date = structure(c(18277, 18278, 18279, 18280,
18281, 18282, 18283, 18284, 18285, 18286, 18287, 18288, 18289,
18290, 18291, 18292, 18293, 18294, 18295, 18296, 18297, 18298,
18299, 18300, 18301, 18302, 18303, 18304, 18305, 18306, 18307),
class = "Date"),
cases = c(45, 62, 121, 198, 258, 363, 425,
495, 572, 618, 698, 1590, 1905, 2261, 2639, 3125, 4109, 5142,
6384, 8351, 10117, 11618, 13603, 14982, 16903, 18454, 19558,
20630, 21960, 22961, 23621)),
class = "data.frame", row.names = c(NA, -31L))
xlb_0<- structure(c(18281, 18285, 18289, 18293,
18297, 18301, 18305,
18309), class = "Date")
df$dat_as_num <- as.numeric(df$date)
dr4pl_obj <- dr4pl(cases~dat_as_num, data = df, init.parm = c(30000, 18300, 2, 0))
#first derivative derivation
d1_dr4pl <- function(x, theta, scale = F){
if (any(is.na(theta))) {
stop("One of the parameter values is NA.")
}
if (theta[2] <= 0) {
stop("An IC50 estimate should always be positive.")
}
f <- -theta[3]*((theta[4]-theta[1])/((1+(x/theta[2])^theta[3])^2))*((x/theta[2])^(theta[3]-1))
if(scale) {
f <- scales::rescale(x = f, to = c(theta[4],theta[1]))
}
return(f)
}
#Second derivative derivation
d2_dr4pl <- function(x, theta, scale = F){
if (any(is.na(theta))) {
stop("One of the parameter values is NA.")
}
if (theta[2] <= 0) {
stop("An IC50 estimate should always be positive.")
}
f <- 2*((theta[3]*(x/theta[2])^(theta[3]-1))^2)*((theta[4]-theta[1])/((1+(x/theta[2])^(theta[3]))^3))-theta[3]*(theta[3]-1)*((x/theta[2])^(theta[3]-2))*((theta[4]-theta[1])/((1+(x/theta[2])^theta[3])^2))
if(scale) {
f <- scales::rescale(x = f, to = c(theta[4],theta[1]))
f <- f - f[1]
}
return(f)
}
ggplot(df, aes(x = dat_as_num)) +
geom_hline(yintercept = 0) +
[![enter image description here][1]][1]geom_point(aes(y = cases), color = "grey", alpha = .6, size = 5) +
stat_function(fun = d1_dr4pl, args = list(theta = dr4pl_obj$parameters, scale = T), color = "red") +
stat_function(fun = d2_dr4pl, args = list(theta = dr4pl_obj$parameters, scale = T), color = "blue") +
stat_function(fun = dr4pl::MeanResponse, args = list(theta = dr4pl_obj$parameters), color = "gold") +
geom_vline(xintercept = dr4pl_obj$parameters[2], linetype = "dotted") +
theme_classic()
As you can see, the inflection point, which is the IC50 value (theta 2) of the 4 parameter logistic model, lines up well when we approach it this way.
summary(dr4pl_obj)
#$call
#dr4pl.formula(formula = cases ~ dat_as_num, data = df, init.parm = c(30000, 18300, 2, 0))
#
#$coefficients
# Estimate StdErr 2.5 % 97.5 %
#Upper limit 25750.61451 4.301008e-05 25750.59681 25750.63221
#Log10(IC50) 18298.75347 4.301008e-09 18298.67889 18298.82806
#Slope 5154.35449 4.301008e-05 5154.33678 5154.37219
#Lower limit 58.48732 4.301008e-05 58.46962 58.50503
#
#attr(,"class")
#[1] "summary.dr4pl"
Furthermore, using dr4pl
, it says the IC50 value is roughly 18298.8
, which is late 2020-02-06
. Not too far off from the inflection
value. I'm sure there may be a better model to use than the 4pl, but it was just the one that I knew I could write first and second derivatives for the purposes of answering this question.
I'm sure other coding languages are more specialized when it comes to derivatives, and can even calculate them for you so long as you start with an initial function. I think one of these languages is mathematica.
A disclaimer, I ended up scaling the first and second derivatives so that they could be plotted together. Their actual values are much larger than shown here.
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句