any efficient way to count binary values in columns of big data table?

debugcn 投稿 Dev

Jared

I am using big-query for Chicago crime data in google cloud platform. However, I want to count number of arrest and non arrest per crime type. It is easy to count this in pandas but it's not intuitive for me how to count binary values in dataframe with big query. Can anyone give me possible idea to make this count?

data

because Chicago crime data is big I am not able to give reproducible example here, but it is very easy to preview crime data from here: Chicago crime data

here is little preview:

my big query:

SELECT
  primary_type,
  count(arrest),
  COUNTIF(year = 2015) AS arrests_2015,
  COUNTIF(year = 2016) AS arrests_2016
FROM
  `bigquery-public-data.chicago_crime.crime`
WHERE
  arrest = TRUE
  AND year IN (2001,
    2018)
  AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY
  primary_type,
  arrest

but this query gives me empty output, I don't know how to make it work.

goal:

from Chicago crime data table, I want to extract number of total arrest and non-arrest for each primary type, I want to exclude OTHER OFFENSE and all non-criminal types until end of 2018.

how can I correct my big-query to get my expected output? any efficient big-query script to get expected query output? any idea? Thanks

Mikhail Berlyant

Below should work

#standardSQL
SELECT
  primary_type,
  COUNT(arrest) arrest_total,
  COUNTIF(year = 2015) AS arrests_2015,
  COUNTIF(year = 2016) AS arrests_2016
FROM `bigquery-public-data.chicago_crime.crime`
WHERE arrest = TRUE
AND year BETWEEN 2001 AND 2018
AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY primary_type, arrest

I think your issue was in below line, where you selected ONLY 2001 and 2018 instead of all years in between (to at least include 2015, and 2016)

AND year IN (2001, 2018)

So, instead you should use below one

AND year BETWEEN 2001 AND 2018

Also, if you want to include non-arrests, you can use below

#standardSQL
SELECT
  primary_type,
  arrest,
  COUNT(arrest) arrest_total,
  COUNTIF(year = 2015) AS arrests_2015,
  COUNTIF(year = 2016) AS arrests_2016
FROM `bigquery-public-data.chicago_crime.crime`
WHERE year BETWEEN 2001 AND 2018
AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY primary_type, arrest

Note: I removed WHERE arrest = TRUE here and added arrest to SELECT list

Rather than these few adjustments - your initial query was quite correct

If you want to have one output row per primary_type you can use below

#standardSQL
SELECT
  primary_type,
  COUNTIF(arrest) arrests,
  COUNTIF(NOT arrest) non_arrests,
  COUNT(arrest) arrest_total,
  COUNTIF(year = 2015) AS arrests_2015,
  COUNTIF(year = 2016) AS arrests_2016
FROM `bigquery-public-data.chicago_crime.crime`
WHERE year BETWEEN 2001 AND 2018
AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY primary_type

Moreover - if you can "expand" year counts like below (for 2015 for example)

COUNTIF(year = 2015 AND arrest) AS arrests_2015,
COUNTIF(year = 2015 AND NOT arrest) AS non_arrests_2015,

Is there any programmatic way to count number of arrest from 2001 to 2018 per crime type

#standardSQL
SELECT
  primary_type,
  year,
  COUNTIF(arrest) arrests,
  COUNTIF(NOT arrest) non_arrests,
  COUNT(arrest) arrest_total
FROM `bigquery-public-data.chicago_crime.crime`
WHERE year BETWEEN 2001 AND 2018
AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY primary_type, year

この記事はインターネットから収集されたものであり、転載の際にはソースを示してください。

侵害の場合は、連絡してください[email protected]

編集2021-06-10

コメントを追加

サインイン

分類Dev

Is there any way to avoid the top rows with NULL values that has one column value (One of the table Business Key) in Cognos report prompt values

分類Dev

Easiest possible way to retrieve MYSQL data in a table field with decreasing order of occurrence along with the occurrence count?

分類Dev

Related 関連記事

記事