使用拐点软件包查找COVID-19感染的转折点（拐点）

debugcn 发表于 Dev

Zhiqiang Wang

我已经看到了几个有关拐点计算的问题。我仍然不确定我是否做对了。根据实验室确认的当前流行病中心累积病例数据，我们试图确定拐点。我使用了inflection包装，并将拐点计算为“ 2020年2月8日”。我还尝试计算第一条指令和第二条指令，以估计每个指令的增加和变化率。我对它的数学知识很少，但是只是跟随不同SO帖子中的示例。我的问题：以下图表的结果是否一致？如果没有，如何改善我的代码。

df<-structure(list(date = structure(c(18277, 18278, 18279, 18280, 
18281, 18282, 18283, 18284, 18285, 18286, 18287, 18288, 18289, 
18290, 18291, 18292, 18293, 18294, 18295, 18296, 18297, 18298, 
18299, 18300, 18301, 18302, 18303, 18304, 18305, 18306, 18307), 
class = "Date"), 
cases = c(45, 62, 121, 198, 258, 363, 425, 
        495, 572, 618, 698, 1590, 1905, 2261, 2639, 3125, 4109, 5142, 
        6384, 8351, 10117, 11618, 13603, 14982, 16903, 18454, 19558, 
       20630, 21960, 22961, 23621)), 
class = "data.frame", row.names = c(NA, -31L))
xlb_0<- structure(c(18281, 18285, 18289, 18293, 
                    18297, 18301, 18305,   
                    18309), class = "Date")
library(tidyverse)
# Smooth cumulative cases over time 
df$x = as.numeric(df$date)
fit_1<- loess(cases ~ x, span = 1/3, data = df) 
df$case_sm <-fit_1$fitted 

# use inflection to obtain inflection point
library(inflection)
guai_0 <- check_curve(df$x, df$case_sm)
check_curve(df$x, df$cases)
#> $ctype
#> [1] "convex_concave"
#> 
#> $index
#> [1] 0
guai_1 <- bese(df$x, df$cases, guai_0$index)
structure(guai_1$iplast, class = "Date")
#> [1] "2020-02-08"

# Plot cumulativew numbers of cases 
df %>% 
  ggplot(aes(x = date, y = cases ))+
  geom_line(aes(y = case_sm), color = "red") +
  geom_point() + 
  geom_vline(xintercept = guai_1$iplast) + 
  labs(y = "Cumulative lab confirmed infections")

# Daily new cases (first derivative) and changing rate (second derivative)
df$dt1 = c(0, diff(df$case_sm)/diff(df$x))
fit_2<- loess(dt1 ~ x, span = 1/3, data = df) 
df$change_sm <-fit_2$fitted 
df$dt2 <- c(NA, diff(df$change_sm)/diff(df$x)) 

df %>%  
  ggplot(aes(x = date, y = dt1))+
  geom_line(aes(y = dt1, 
                color = "Estimated number of new cases")) + 
  geom_point(aes(y = dt2*2, color = "Changing rate")) +
  geom_line(aes(y = dt2*2, color = "Changing rate"))+
  geom_vline(xintercept = guai_1$iplast) + 
  labs(y = "Estimatede number of new cases") +
  scale_x_date(breaks = xlb_0, 
               date_labels = "%b%d") + 
  theme(legend.title = element_blank())
#> Warning: Removed 1 rows containing missing values (geom_point).
#> Warning: Removed 1 row(s) containing missing values (geom_path).

^{Created on 2020-02-17 by the reprex package (v0.3.0)}

Justin Landis

I was gonna write a comment, but I was pushing the character limit.

I am not familiar with the inflection package so I am not one to judge if the 2020-02-08 is the true inflection. However, I will say this is difficult to answer with R because R is not necessarily good at calculating derivatives. If you had an estimated line equation - then you could potentially use this to plot first and second derivatives. Calculating rough delta's by doing the difference in (Y_n+1-Y_n)/(X_n+1-X_n) is never optimal because a derivate in theory is the delta of two points infinitesimally close to each other. You fundamentally cannot get a great estimate of the derivative. You can even see this because you are forced to shift this estimate to either n or n+1. Furthermore, you would expect the inflection point of x_0 to be a local min/max in the first derivative and equal to zero in the second derivative. So I don't think your second plot helps. But this could just be due to the delta's calculated.

What I would do is first fit your data to some type of model. In this example I'm going to use the package dr4pl to model your data to the 4 parameter logistic model. Since the function of the 4 parameter model is well known, I can write what the first and second derivative functions should be, then plot those values using stat_function in the ggplot2 package.

library(ggplot2)
library(dr4pl)
df<-structure(list(date = structure(c(18277, 18278, 18279, 18280, 
                                      18281, 18282, 18283, 18284, 18285, 18286, 18287, 18288, 18289, 
                                      18290, 18291, 18292, 18293, 18294, 18295, 18296, 18297, 18298, 
                                      18299, 18300, 18301, 18302, 18303, 18304, 18305, 18306, 18307), 
                                    class = "Date"), 
                   cases = c(45, 62, 121, 198, 258, 363, 425, 
                             495, 572, 618, 698, 1590, 1905, 2261, 2639, 3125, 4109, 5142, 
                             6384, 8351, 10117, 11618, 13603, 14982, 16903, 18454, 19558, 
                             20630, 21960, 22961, 23621)), 
              class = "data.frame", row.names = c(NA, -31L))
xlb_0<- structure(c(18281, 18285, 18289, 18293, 
                    18297, 18301, 18305,   
                    18309), class = "Date")

df$dat_as_num <- as.numeric(df$date)

dr4pl_obj <- dr4pl(cases~dat_as_num, data = df, init.parm = c(30000, 18300, 2, 0))
#first derivative derivation
d1_dr4pl <- function(x, theta, scale = F){
  if (any(is.na(theta))) {
    stop("One of the parameter values is NA.")
  }
  if (theta[2] <= 0) {
    stop("An IC50 estimate should always be positive.")
  }
  f <- -theta[3]*((theta[4]-theta[1])/((1+(x/theta[2])^theta[3])^2))*((x/theta[2])^(theta[3]-1))
  if(scale) {
    f <- scales::rescale(x = f, to = c(theta[4],theta[1]))
  }
  return(f)
}
#Second derivative derivation
d2_dr4pl <- function(x, theta, scale = F){
  if (any(is.na(theta))) {
    stop("One of the parameter values is NA.")
  }
  if (theta[2] <= 0) {
    stop("An IC50 estimate should always be positive.")
  }
  f <- 2*((theta[3]*(x/theta[2])^(theta[3]-1))^2)*((theta[4]-theta[1])/((1+(x/theta[2])^(theta[3]))^3))-theta[3]*(theta[3]-1)*((x/theta[2])^(theta[3]-2))*((theta[4]-theta[1])/((1+(x/theta[2])^theta[3])^2))
  if(scale) {
    f <- scales::rescale(x = f, to = c(theta[4],theta[1]))
    f <- f - f[1]
  }
  return(f)
}

ggplot(df, aes(x = dat_as_num)) +
  geom_hline(yintercept = 0) +
  [![enter image description here][1]][1]geom_point(aes(y = cases), color = "grey", alpha = .6, size = 5) +
  stat_function(fun = d1_dr4pl, args = list(theta = dr4pl_obj$parameters, scale = T), color = "red") +
  stat_function(fun = d2_dr4pl, args = list(theta = dr4pl_obj$parameters, scale = T), color = "blue") +
  stat_function(fun = dr4pl::MeanResponse, args = list(theta = dr4pl_obj$parameters), color = "gold") +
  geom_vline(xintercept = dr4pl_obj$parameters[2], linetype = "dotted") +
  theme_classic()

As you can see, the inflection point, which is the IC50 value (theta 2) of the 4 parameter logistic model, lines up well when we approach it this way.

summary(dr4pl_obj)
#$call
#dr4pl.formula(formula = cases ~ dat_as_num, data = df, init.parm = c(30000, 18300, 2, 0))
#
#$coefficients
#               Estimate       StdErr       2.5 %      97.5 %
#Upper limit 25750.61451 4.301008e-05 25750.59681 25750.63221
#Log10(IC50) 18298.75347 4.301008e-09 18298.67889 18298.82806
#Slope        5154.35449 4.301008e-05  5154.33678  5154.37219
#Lower limit    58.48732 4.301008e-05    58.46962    58.50503
#
#attr(,"class")
#[1] "summary.dr4pl"

Furthermore, using dr4pl, it says the IC50 value is roughly 18298.8, which is late 2020-02-06. Not too far off from the inflection value. I'm sure there may be a better model to use than the 4pl, but it was just the one that I knew I could write first and second derivatives for the purposes of answering this question.

I'm sure other coding languages are more specialized when it comes to derivatives, and can even calculate them for you so long as you start with an initial function. I think one of these languages is mathematica.

A disclaimer, I ended up scaling the first and second derivatives so that they could be plotted together. Their actual values are much larger than shown here.

本文收集自互联网，转载请注明来源。

如有侵权，请联系[email protected] 删除。

编辑于2021-04-2

我来说两句

0条评论

登录后参与评论

来自分类Dev

Related 相关文章

文章

使用拐点软件包查找COVID-19感染的转折点（拐点）

使用拐点软件包查找COVID-19感染的转折点（拐点）

在python中查找数组的转折点

循环负载中的转折点识别

在numpy 1d数组中查找拐点和平稳点

使用python修正拐点估计

Python-从转折点坐标生成多项式

在数据框中找到最小转折点

Ruby：检测数组中转折点的最优雅方法

带有转折点的画布螺旋上升效果

查找TexLive的软件包

如何找到数据集的拐点？

使用CMAKE查找到开发库的软件包

查找使用rpm安装软件包的位置

查找使用特定Shell命令的软件包

在AWS Lambda内点安装Python软件包？

点冻结不显示软件包版本

点安装与Python版本不兼容的软件包

如何检查一个序列是严格单调的，还是有一个转折点是双方都是严格单调的？

查找Haskell模块所属的软件包

如何按目录查找软件包？

如何查找与软件包关联的命令？

如何：在Ubuntu上查找软件包

使用Go软件包

使用npm软件包

如何计算K距离图中的拐点？

如何在python中找到拐点？

在Matlab中找到离散数据集的拐点

查找已安装软件的软件包名称

点列表和sudo点列表显示了不同的软件包版本

Mint 19软件包的安装和更新minecraft-installer的错误消息