any efficient way to count binary values in columns of big data table?

Jared

I am using big-query for Chicago crime data in google cloud platform. However, I want to count number of arrest and non arrest per crime type. It is easy to count this in pandas but it's not intuitive for me how to count binary values in dataframe with big query. Can anyone give me possible idea to make this count?

data

because Chicago crime data is big I am not able to give reproducible example here, but it is very easy to preview crime data from here: Chicago crime data

here is little preview:

crime data preview

my big query:

SELECT
  primary_type,
  count(arrest),
  COUNTIF(year = 2015) AS arrests_2015,
  COUNTIF(year = 2016) AS arrests_2016
FROM
  `bigquery-public-data.chicago_crime.crime`
WHERE
  arrest = TRUE
  AND year IN (2001,
    2018)
  AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY
  primary_type,
  arrest

but this query gives me empty output, I don't know how to make it work.

goal:

from Chicago crime data table, I want to extract number of total arrest and non-arrest for each primary type, I want to exclude OTHER OFFENSE and all non-criminal types until end of 2018.

how can I correct my big-query to get my expected output? any efficient big-query script to get expected query output? any idea? Thanks

Mikhail Berlyant

Below should work

#standardSQL
SELECT
  primary_type,
  COUNT(arrest) arrest_total,
  COUNTIF(year = 2015) AS arrests_2015,
  COUNTIF(year = 2016) AS arrests_2016
FROM `bigquery-public-data.chicago_crime.crime`
WHERE arrest = TRUE
AND year BETWEEN 2001 AND 2018
AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY primary_type, arrest  

I think your issue was in below line, where you selected ONLY 2001 and 2018 instead of all years in between (to at least include 2015, and 2016)

AND year IN (2001, 2018)   

So, instead you should use below one

AND year BETWEEN 2001 AND 2018   

Also, if you want to include non-arrests, you can use below

#standardSQL
SELECT
  primary_type,
  arrest,
  COUNT(arrest) arrest_total,
  COUNTIF(year = 2015) AS arrests_2015,
  COUNTIF(year = 2016) AS arrests_2016
FROM `bigquery-public-data.chicago_crime.crime`
WHERE year BETWEEN 2001 AND 2018
AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY primary_type, arrest

Note: I removed WHERE arrest = TRUE here and added arrest to SELECT list

Rather than these few adjustments - your initial query was quite correct

If you want to have one output row per primary_type you can use below

#standardSQL
SELECT
  primary_type,
  COUNTIF(arrest) arrests,
  COUNTIF(NOT arrest) non_arrests,
  COUNT(arrest) arrest_total,
  COUNTIF(year = 2015) AS arrests_2015,
  COUNTIF(year = 2016) AS arrests_2016
FROM `bigquery-public-data.chicago_crime.crime`
WHERE year BETWEEN 2001 AND 2018
AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY primary_type   

Moreover - if you can "expand" year counts like below (for 2015 for example)

COUNTIF(year = 2015 AND arrest) AS arrests_2015,
COUNTIF(year = 2015 AND NOT arrest) AS non_arrests_2015,  

Is there any programmatic way to count number of arrest from 2001 to 2018 per crime type

#standardSQL
SELECT
  primary_type,
  year,
  COUNTIF(arrest) arrests,
  COUNTIF(NOT arrest) non_arrests,
  COUNT(arrest) arrest_total
FROM `bigquery-public-data.chicago_crime.crime`
WHERE year BETWEEN 2001 AND 2018
AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY primary_type, year

この記事はインターネットから収集されたものであり、転載の際にはソースを示してください。

侵害の場合は、連絡してください[email protected]

編集
0

コメントを追加

0

関連記事

分類Dev

Efficient way to compare all columns in data table R

分類Dev

R data table unique record count based on all combination of a given list of values from 2 columns

分類Dev

Count rows in data table with certain values by group

分類Dev

Efficient way to store data

分類Dev

most efficient way to count the largest value in swift

分類Dev

Swapping values between two columns using data.table

分類Dev

sum columns values in data.table in r using .SDcols

分類Dev

Efficient way of replacing character string with numeric values based on data frame "dictionary"

分類Dev

Efficient way to swap property values in array of objects

分類Dev

Efficient way of replacing values in matrix (R)

分類Dev

JQuery Table Sorting - More Efficient way?

分類Dev

How to update the column value of the table in an efficient way?

分類Dev

How to count records from multiple columns eliminating null values in hive table

分類Dev

Accidently deleted data from PostgreSQL 9.1 table. Is there any way to restore the data?

分類Dev

How to binary replace values in columns based on value in it

分類Dev

Python most efficient way to keep sorted data

分類Dev

Efficient way to merge array of objects with similar data

分類Dev

More efficient way to fetch data from MySQL

分類Dev

data.table: How to do binary search for two (numeric) values at one key: example included

分類Dev

Efficient way to shift values in one matrix according to the values in another matrix

分類Dev

Is there any way to avoid the top rows with NULL values that has one column value (One of the table Business Key) in Cognos report prompt values

分類Dev

What's an efficient way to count "previous" rows in SQL?

分類Dev

What is an efficient way to get a count of objects grouped by a field in Django?

分類Dev

R data.table memory efficient rbindlist

分類Dev

Easiest possible way to retrieve MYSQL data in a table field with decreasing order of occurrence along with the occurrence count?

分類Dev

SQL Server: Most efficient way to look for set based on 2 columns

分類Dev

Efficient way to check for inequality between two columns accounting for NAs in R?

分類Dev

Efficient way to search string contains in multiple columns using pandas

分類Dev

Efficient way of finding the maximum absolute value, for many columns

Related 関連記事

  1. 1

    Efficient way to compare all columns in data table R

  2. 2

    R data table unique record count based on all combination of a given list of values from 2 columns

  3. 3

    Count rows in data table with certain values by group

  4. 4

    Efficient way to store data

  5. 5

    most efficient way to count the largest value in swift

  6. 6

    Swapping values between two columns using data.table

  7. 7

    sum columns values in data.table in r using .SDcols

  8. 8

    Efficient way of replacing character string with numeric values based on data frame "dictionary"

  9. 9

    Efficient way to swap property values in array of objects

  10. 10

    Efficient way of replacing values in matrix (R)

  11. 11

    JQuery Table Sorting - More Efficient way?

  12. 12

    How to update the column value of the table in an efficient way?

  13. 13

    How to count records from multiple columns eliminating null values in hive table

  14. 14

    Accidently deleted data from PostgreSQL 9.1 table. Is there any way to restore the data?

  15. 15

    How to binary replace values in columns based on value in it

  16. 16

    Python most efficient way to keep sorted data

  17. 17

    Efficient way to merge array of objects with similar data

  18. 18

    More efficient way to fetch data from MySQL

  19. 19

    data.table: How to do binary search for two (numeric) values at one key: example included

  20. 20

    Efficient way to shift values in one matrix according to the values in another matrix

  21. 21

    Is there any way to avoid the top rows with NULL values that has one column value (One of the table Business Key) in Cognos report prompt values

  22. 22

    What's an efficient way to count "previous" rows in SQL?

  23. 23

    What is an efficient way to get a count of objects grouped by a field in Django?

  24. 24

    R data.table memory efficient rbindlist

  25. 25

    Easiest possible way to retrieve MYSQL data in a table field with decreasing order of occurrence along with the occurrence count?

  26. 26

    SQL Server: Most efficient way to look for set based on 2 columns

  27. 27

    Efficient way to check for inequality between two columns accounting for NAs in R?

  28. 28

    Efficient way to search string contains in multiple columns using pandas

  29. 29

    Efficient way of finding the maximum absolute value, for many columns

ホットタグ

アーカイブ