How to remove inconsistencies from dataframe (time series)

Yatrosin

Let's say that we have this dataframe:

x<- as.data.frame(cbind(c("A","A","A","B","B","B","B","B","C","C","C","C","C","D","D","D","D","D"),
                        c(1,2,3,1,2,3,2,3,1,2,3,4,5,1,2,3,4,5),
                        c(10,12.5,15,2,3.4,5.7,8,9.5,1,5.6,8.9,10,11,2,3.4,6,8,10.5),
                        c(1,3,4,1,2,3,4,3,2,2,3,5,2,3,5,4,5,5)))
colnames(x)<- c("ID", "Visit", "Time", "State")

Column ID indicates subject ID.

Column Visit indicates the visit number.

Column Time indicates the time that has passed until a given State is reached.

Column State indicates the severity of a certain disease, where 5 means death. A subject can therefore fluctuate between worse and better states, but can never improve from category 5, since they are dead.

I would like to identify only those rows in which a subject improved from category 5 to a better one, since these are errors in the dataframe (i.e. rows 13 and 16).

Additionally, I would like to remove those rows where a subject seems to have died more than once (e.g. row 18).
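The two kinds of error described above can be told apart with a short base R sketch (a possible approach, not taken from the thread). It assumes the rows of each subject are already in chronological order; row order is used rather than Visit because subject B's Visit numbers repeat. The names `dead_before`, `false_recovery` and `extra_death` are illustrative only.

```r
# Example data from the question, built with data.frame() so columns stay numeric
x <- data.frame(
  ID    = c("A","A","A","B","B","B","B","B","C","C","C","C","C","D","D","D","D","D"),
  Visit = c(1,2,3,1,2,3,2,3,1,2,3,4,5,1,2,3,4,5),
  Time  = c(10,12.5,15,2,3.4,5.7,8,9.5,1,5.6,8.9,10,11,2,3.4,6,8,10.5),
  State = c(1,3,4,1,2,3,4,3,2,2,3,5,2,3,5,4,5,5))

# TRUE when a State 5 (death) occurs on an EARLIER row of the same subject.
# Assumption: rows are already in chronological order within each ID.
dead_before <- ave(x$State == 5, x$ID, FUN = function(died) {
  # cumsum(died) > 0 is TRUE from the first death onwards; shifting it down
  # by one row makes the first death itself FALSE
  c(FALSE, head(cumsum(died) > 0, -1))
})

false_recovery <- dead_before & x$State < 5   # "improved" after death
extra_death    <- dead_before & x$State == 5  # "died" again after death

which(false_recovery)  # 13 16
which(extra_death)     # 17 18
```

Under this rule row 17, subject D's second State 5, counts as a repeated death together with row 18; the answer below treats it the same way.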

I asked a similar question before, but it was very general and implied that all fluctuations to a better state should be removed from the dataset, which is not what I actually want.

Uwe

Answer to modified question

The OP has modified the question substantially: all rows that appear after the first occurrence of State 5 (death) are now to be considered erroneous. This includes false recoveries (as in rows 13 and 16) as well as "duplicated deaths" (as in rows 17 and 18).

An answer to this requires a completely different approach. One possibility is to use a non-equi join:

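library(data.table)
# V1 is the first visit with State 5 for each ID; the non-equi join sets
# error to TRUE on every row of that ID with a later visit (trailing [] prints)
setDT(x)[x[, first(Visit[State == 5]), by = ID], on = .(ID, Visit > V1), error := TRUE][]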
    ID Visit Time State error
 1:  A     1 10.0     1    NA
 2:  A     2 12.5     3    NA
 3:  A     3 15.0     4    NA
 4:  B     1  2.0     1    NA
 5:  B     2  3.4     2    NA
 6:  B     3  5.7     3    NA
 7:  B     2  8.0     4    NA
 8:  B     3  9.5     3    NA
 9:  C     1  1.0     2    NA
10:  C     2  5.6     2    NA
11:  C     3  8.9     3    NA
12:  C     4 10.0     5    NA
13:  C     5 11.0     2  TRUE
14:  D     1  2.0     3    NA
15:  D     2  3.4     5    NA
16:  D     3  6.0     4  TRUE
17:  D     4  8.0     5  TRUE
18:  D     5 10.5     5  TRUE

The number of the first visit with State 5 is returned by

x[, first(Visit[State == 5]), by = ID]
   ID V1
1:  C  4
2:  D  2
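Without data.table, the same per-subject lookup can be sketched in base R by keeping the State 5 rows and taking the first one per ID with `duplicated()`. This is an illustrative equivalent, not part of the answer; like `first()` it relies on row order, and `deaths` and `first_death` are hypothetical names.

```r
x <- data.frame(
  ID    = c("A","A","A","B","B","B","B","B","C","C","C","C","C","D","D","D","D","D"),
  Visit = c(1,2,3,1,2,3,2,3,1,2,3,4,5,1,2,3,4,5),
  Time  = c(10,12.5,15,2,3.4,5.7,8,9.5,1,5.6,8.9,10,11,2,3.4,6,8,10.5),
  State = c(1,3,4,1,2,3,4,3,2,2,3,5,2,3,5,4,5,5))

# Keep only the death rows, then the first of them for each subject.
# Subjects that never reach State 5 (A and B) simply do not appear.
deaths      <- x[x$State == 5, c("ID", "Visit")]
first_death <- deaths[!duplicated(deaths$ID), ]
first_death  # C -> visit 4, D -> visit 2
```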

In the subsequent non-equi join, only those rows that appear after the first State 5 event (Visit > V1 within the same ID) are marked; all other rows keep NA in the error column.
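For readers without data.table, the same flag can be approximated in base R. This is a minimal sketch, not part of the answer: it swaps the non-equi join for `ave()` plus a plain comparison, and like the join it assumes Visit increases within a subject after the first death (true for C and D here; subject B repeats visit numbers but never reaches State 5). It then drops the flagged rows, which is what the question ultimately asks for. Unlike the join, it stores FALSE rather than NA for rows that are kept; `first_death` and `x_clean` are hypothetical names.

```r
x <- data.frame(
  ID    = c("A","A","A","B","B","B","B","B","C","C","C","C","C","D","D","D","D","D"),
  Visit = c(1,2,3,1,2,3,2,3,1,2,3,4,5,1,2,3,4,5),
  Time  = c(10,12.5,15,2,3.4,5.7,8,9.5,1,5.6,8.9,10,11,2,3.4,6,8,10.5),
  State = c(1,3,4,1,2,3,4,3,2,2,3,5,2,3,5,4,5,5))

# Visit of the first State 5 row, repeated on every row of the subject;
# NA for subjects that never reach State 5
first_death <- ave(ifelse(x$State == 5, x$Visit, NA), x$ID,
                   FUN = function(v) v[!is.na(v)][1])

# Same condition as the join's Visit > V1; rows without a death get FALSE
x$error <- !is.na(first_death) & x$Visit > first_death

which(x$error)   # 13 16 17 18
x_clean <- x[!x$error, c("ID", "Visit", "Time", "State")]
nrow(x_clean)    # 14
```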

Data

x <- data.frame(
  ID = c("A","A","A","B","B","B","B","B","C","C","C","C","C","D","D","D","D","D"),
  Visit = c(1,2,3,1,2,3,2,3,1,2,3,4,5,1,2,3,4,5),
  Time = c(10,12.5,15,2,3.4,5.7,8,9.5,1,5.6,8.9,10,11,2,3.4,6,8,10.5),
  State = c(1,3,4,1,2,3,4,3,2,2,3,5,2,3,5,4,5,5))
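The Data section builds x with data.frame() rather than the question's as.data.frame(cbind(...)). A quick check (an illustration added here, not from the thread) shows why: cbind() of a character vector with numeric vectors yields a character matrix, so none of the resulting columns is numeric and Visit, Time and State would not be treated as numbers. The names `x_cbind` and `x_df` are hypothetical.

```r
# Question's construction: cbind() coerces everything to character, so no
# column is numeric (character in R >= 4.0.0, factor in older versions)
x_cbind <- as.data.frame(cbind(
  c("A","A","A","B","B","B","B","B","C","C","C","C","C","D","D","D","D","D"),
  c(1,2,3,1,2,3,2,3,1,2,3,4,5,1,2,3,4,5),
  c(10,12.5,15,2,3.4,5.7,8,9.5,1,5.6,8.9,10,11,2,3.4,6,8,10.5),
  c(1,3,4,1,2,3,4,3,2,2,3,5,2,3,5,4,5,5)))
colnames(x_cbind) <- c("ID", "Visit", "Time", "State")

# Answer's construction: data.frame() keeps each column's own type
x_df <- data.frame(
  ID    = c("A","A","A","B","B","B","B","B","C","C","C","C","C","D","D","D","D","D"),
  Visit = c(1,2,3,1,2,3,2,3,1,2,3,4,5,1,2,3,4,5),
  Time  = c(10,12.5,15,2,3.4,5.7,8,9.5,1,5.6,8.9,10,11,2,3.4,6,8,10.5),
  State = c(1,3,4,1,2,3,4,3,2,2,3,5,2,3,5,4,5,5))

sapply(x_cbind, is.numeric)  # all FALSE
sapply(x_df, is.numeric)     # FALSE for ID, TRUE for Visit, Time and State
```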
