I am using big-query for Chicago crime data in google cloud platform. However, I want to count number of arrest and non arrest per crime type. It is easy to count this in pandas but it's not intuitive for me how to count binary values in dataframe with big query. Can anyone give me possible idea to make this count?
data
because Chicago crime data is big I am not able to give reproducible example here, but it is very easy to preview crime data from here: Chicago crime data
here is little preview:
my big query:
SELECT
primary_type,
count(arrest),
COUNTIF(year = 2015) AS arrests_2015,
COUNTIF(year = 2016) AS arrests_2016
FROM
`bigquery-public-data.chicago_crime.crime`
WHERE
arrest = TRUE
AND year IN (2001,
2018)
AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY
primary_type,
arrest
but this query gives me empty output, I don't know how to make it work.
goal:
from Chicago crime data table, I want to extract number of total arrest and non-arrest for each primary type, I want to exclude OTHER OFFENSE and all non-criminal types until end of 2018.
how can I correct my big-query to get my expected output? any efficient big-query script to get expected query output? any idea? Thanks
Below should work
#standardSQL
SELECT
primary_type,
COUNT(arrest) arrest_total,
COUNTIF(year = 2015) AS arrests_2015,
COUNTIF(year = 2016) AS arrests_2016
FROM `bigquery-public-data.chicago_crime.crime`
WHERE arrest = TRUE
AND year BETWEEN 2001 AND 2018
AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY primary_type, arrest
I think your issue was in below line, where you selected ONLY 2001 and 2018 instead of all years in between (to at least include 2015, and 2016)
AND year IN (2001, 2018)
So, instead you should use below one
AND year BETWEEN 2001 AND 2018
Also, if you want to include non-arrests, you can use below
#standardSQL
SELECT
primary_type,
arrest,
COUNT(arrest) arrest_total,
COUNTIF(year = 2015) AS arrests_2015,
COUNTIF(year = 2016) AS arrests_2016
FROM `bigquery-public-data.chicago_crime.crime`
WHERE year BETWEEN 2001 AND 2018
AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY primary_type, arrest
Note: I removed WHERE arrest = TRUE
here and added arrest
to SELECT list
Rather than these few adjustments - your initial query was quite correct
If you want to have one output row per primary_type
you can use below
#standardSQL
SELECT
primary_type,
COUNTIF(arrest) arrests,
COUNTIF(NOT arrest) non_arrests,
COUNT(arrest) arrest_total,
COUNTIF(year = 2015) AS arrests_2015,
COUNTIF(year = 2016) AS arrests_2016
FROM `bigquery-public-data.chicago_crime.crime`
WHERE year BETWEEN 2001 AND 2018
AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY primary_type
Moreover - if you can "expand" year counts like below (for 2015 for example)
COUNTIF(year = 2015 AND arrest) AS arrests_2015,
COUNTIF(year = 2015 AND NOT arrest) AS non_arrests_2015,
Is there any programmatic way to count number of arrest from 2001 to 2018 per crime type
#standardSQL
SELECT
primary_type,
year,
COUNTIF(arrest) arrests,
COUNTIF(NOT arrest) non_arrests,
COUNT(arrest) arrest_total
FROM `bigquery-public-data.chicago_crime.crime`
WHERE year BETWEEN 2001 AND 2018
AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY primary_type, year
この記事はインターネットから収集されたものであり、転載の際にはソースを示してください。
侵害の場合は、連絡してください[email protected]
コメントを追加