Subsetting a data frame on the sum of a column

user3138373

This is a follow-up to my previous question.

Suppose I have a data frame like this:

g1:1    4
g1:2    5
g1:3    9
g2:1    6
g2:2    2
g3:1    5
g3:2    6
g4:1    4
g4:1    1

I use the following code to split the first column on the colon (:):

tab2 <- read.table("dplyrtest.txt",header=FALSE)
dput(tab2)
structure(list(V1 = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
8L), .Label = c("g1:1", "g1:2", "g1:3", "g2:1", "g2:2", "g3:1", 
"g3:2", "g4:1"), class = "factor"), V2 = c(4L, 5L, 9L, 6L, 2L, 
5L, 6L, 4L, 1L)), class = "data.frame", row.names = c(NA, -9L
))
tab2 <- data.frame(tab2$V1, do.call(rbind, strsplit(as.character(tab2$V1),split=":")))
str(tab2)

'data.frame':   9 obs. of  3 variables:
 $ tab2.V1: Factor w/ 8 levels "g1:1","g1:2",..: 1 2 3 4 5 6 7 8 8
 $ X1     : Factor w/ 4 levels "g1","g2","g3",..: 1 1 1 2 2 3 3 4 4
 $ X2     : Factor w/ 3 levels "1","2","3": 1 2 3 1 2 1 2 1 1

tab2$X2 <- as.integer(tab2$X2)
str(tab2)

'data.frame':   9 obs. of  3 variables:
 $ tab2.V1: Factor w/ 8 levels "g1:1","g1:2",..: 1 2 3 4 5 6 7 8 8
 $ X1     : Factor w/ 4 levels "g1","g2","g3",..: 1 1 1 2 2 3 3 4 4
 $ X2     : int  1 2 3 1 2 1 2 1 1

colnames(tab2) <- c("gene","name","count")

dput(tab2)
structure(list(gene = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 
8L, 8L), .Label = c("g1:1", "g1:2", "g1:3", "g2:1", "g2:2", "g3:1", 
"g3:2", "g4:1"), class = "factor"), name = structure(c(1L, 1L, 
1L, 2L, 2L, 3L, 3L, 4L, 4L), .Label = c("g1", "g2", "g3", "g4"
), class = "factor"), count = structure(c(1L, 2L, 3L, 1L, 2L, 
1L, 2L, 1L, 1L), .Label = c("1", "2", "3"), class = "factor")), class = "data.frame", row.names = c(NA, 
-9L))

tab2 <- tab2 %>% group_by(name) %>% filter(sum(as.integer(count)) > 10)

This gives a warning, and tab2 has no data in it:

# A tibble: 0 x 3
# Groups:   name [1]
# … with 3 variables: gene <fct>, name <fct>, count <fct>
Warning message:
Factor `name` contains implicit NA, consider using `forcats::fct_explicit_na`

Any help is appreciated.

Ronak Shah

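I believe the splitting step changes the numbers. The call data.frame(tab2$V1, do.call(rbind, strsplit(...))) keeps only V1 and its two pieces, so V2 is dropped, and the column you later rename to count holds the suffix after the colon rather than the original values.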
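To make that concrete, here is a minimal base R sketch. It assumes the data is rebuilt inline from the question's dput output rather than read from dplyrtest.txt, and the names parts, broken, suffix_sums and true_sums are illustrative only. It compares the per-group sums of the mislabelled count column with the per-group sums of the real V2 values.

```r
# Rebuild the question's data inline (same values as the dput output).
tab2 <- data.frame(
  V1 = c("g1:1", "g1:2", "g1:3", "g2:1", "g2:2", "g3:1", "g3:2", "g4:1", "g4:1"),
  V2 = c(4L, 5L, 9L, 6L, 2L, 5L, 6L, 4L, 1L)
)

# The question's split step: only V1 and its two pieces are kept, V2 is lost.
parts <- do.call(rbind, strsplit(as.character(tab2$V1), split = ":"))
broken <- data.frame(tab2$V1, parts)
colnames(broken) <- c("gene", "name", "count")

# "count" now holds the suffix after the colon (1, 2, 3, ...), not V2.
# as.character() first, so this works whether the column is factor or character.
suffix_sums <- tapply(as.integer(as.character(broken$count)), broken$name, sum)

# Per-group sums of the values that were actually meant to be summed.
true_sums <- tapply(tab2$V2, parts[, 1], sum)

print(suffix_sums)  # g1 = 6, g2 = 3, g3 = 3, g4 = 2: no group exceeds 10
print(true_sums)    # g1 = 18, g2 = 8, g3 = 11, g4 = 5
```

Since no suffix sum is above 10, filter() drops every group, which is consistent with the empty tibble in the question.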

Try this instead after reading the file.

library(tidyverse)
tab2 <- read.table("dplyrtest.txt",header=FALSE)

tab2 %>%
  # split "g1:1" into Gene = "g1" and name = "1"; V2 is left untouched
  separate(V1, into = c("Gene", "name")) %>%
  rename_at(3, ~"count") %>% # V2 is now the third column
  group_by(Gene) %>% #OR group_by(name)
  filter(sum(count) > 10)

#  Gene  name  count
#  <chr> <chr> <int>
#1  g1    1       4
#2  g1    2       5
#3  g1    3       9
#4  g3    1       5
#5  g3    2       6
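
For readers without the tidyverse, the same group-wise filter can be sketched in base R. This is an alternative to the approach above, not part of the original answer. The data frame is rebuilt inline from the question's dput output, and the names tidy, group_total and result are illustrative.

```r
# Rebuild the question's data inline (same values as the dput output).
tab2 <- data.frame(
  V1 = c("g1:1", "g1:2", "g1:3", "g2:1", "g2:2", "g3:1", "g3:2", "g4:1", "g4:1"),
  V2 = c(4L, 5L, 9L, 6L, 2L, 5L, 6L, 4L, 1L)
)

# Split V1 on the literal colon, and keep V2 alongside the two pieces.
parts <- do.call(rbind, strsplit(as.character(tab2$V1), split = ":", fixed = TRUE))
tidy <- data.frame(
  Gene  = parts[, 1],
  name  = parts[, 2],
  count = tab2$V2,
  stringsAsFactors = FALSE
)

# ave() returns each group's sum repeated on every row of that group,
# so it can be used directly as a row-wise filter condition.
group_total <- ave(tidy$count, tidy$Gene, FUN = sum)
result <- tidy[group_total > 10, ]

print(result)
```

Only the g1 and g3 rows remain, matching the tidyverse output above.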
