Overview
Teaching: 15 min Exercises: 45 minQuestions
How do I extract information from a table in a database?
How can I create summary statistics of a data set?
How can I combine information from several tables?
Objectives
Learn how to issue a SQL query within DBeaver
Learn basic SQL syntax: SELECT, FROM, WHERE, GROUP BY, ORDER,…
Build complex queries from simple queries
Learn how to join two tables on a common key
Queries can be issued to the database in many different ways. Here we’ll use the SQL query window in DBeaver. This is a great way to test out ideas before implementing your SQL queries in a script.
click on the “SQL Editor” button:
This should bring up a new tab:
The upper pane is where we will issue our query, and the results of the query will be shown in the results pane below.
Submit your first query by typing the following into the SQL pane:
SELECT * FROM seattlecrimeincidents LIMIT 100;
LIMIT
command restricts the database to return only the first 100 rows.*
is a wildcard requesting all columns from the database.In the following exercises you can use this cheatsheet to look up the different SQL commands.
Counting numbers of offenses
How many “TRESPASS” offenses occurred in total?
Hint: fill in the ? below:
SELECT ? FROM ? WHERE ? = ?
Solution
Here is the solution:
SELECT COUNT(*) FROM seattlecrimeincidents WHERE "Offense Type" = 'TRESPASS'
Single and double quotes in SQL
Notice that in our example above we had both single and double quotes. Here’s the rule:
- use double quotes for column names within the database
- use single quotes for strings or date/time constants Note that we only have to use double quotes for the column names because this particular data table did not follow the standards of database convention (i.e. the names have spaces in them). See this discussion for more details.
Upper/lower case: does it matter?
Most versions of the SQL language are not case sensitive, but it is a good habbit to make key words capital so that the queries are easy to read. The only time when the case matters is whne the word is in quotes and refers to a name.
Calculating several values at a time
What is the range of the latitude and longitude coordinates of all crimes?
Hint: use “max” and “min” functions.
Solution
SELECT min(longitude), max(longitude),min(latitude),max(latitude) FROM seattlecrimeincidents
Combining conditions
What is the number of bike thefts in the month of january?
Hint: the name for bike thefts is ‘THEFT-BICYCLE’.
Solution
SELECT count(*) FROM seattlecrimeincidents WHERE "Offense Type" = 'THEFT-BICYCLE' and month = 1
identify a function to apply per groups
select "Offense Type",count(*) from SeattleCrimeIncidents group by "Offense Type" order by count ASC;
Note: for homicide we see there are a lot of types of homicides -> use summarized offense description
select "Summarized Offense Description", count(*) from SeattleCrimeIncidents group by "Summarized Offense Description";
select year, count(*) from SeattleCrimeIncidents group by year;
Crimes for each month?
Solution
SELECT month,count(*) FROM seattlecrimeincidents GROUP BY month ORDER BY month ASC
Month with highest number of bike thefts?
Solution
SELECT month,count(*) FROM seattlecrimeincidents WHERE "Offense Type" = 'THEFT-BICYCLE' GROUP BY month ORDER BY count DESC
Number of crimes per census tract?
Solution
SELECT "census tract 2000",count(*) FROM seattlecrimeincidents group by "census tract 2000" ORDER BY "census tract 2000" ASC;
SELECT "census tract 2000" as "CensusTract",count(*) as "crime_count" FROM seattlecrimeincidents group by "census tract 2000" ORDER BY "census tract 2000" ASC;
SELECT max(crimeTable.crime_count) FROM (SELECT "census tract 2000" as "CensusTract",count(*) as "crime_count" FROM seattlecrimeincidents group by "census tract 2000") as crimeTable
Which census tract has the highest crime rate?
To calculate the crime rate we need to have the population of each census tract. This is missing from the SeattleCrimeIncidents Table. However, in the database there is another table called census containing the population per tract.
SELECT * from census LIMIT 10
tract | #crimes/popuation |
---|---|
. | . |
To simplify the join we can create the following two tables:
tract | crime_count |
---|---|
. | . |
SELECT round("census tract 2000"),count(*) FROM seattlecrimeincidents
group by "census tract 2000"
ORDER BY "census tract 2000" ASC;
tract | population |
---|---|
. | . |
SELECT "Census Tract","Total Population, 2010" as population from census_data
ORDER BY "Census Tract" ASC;
Then we can join them where the corresponding tracts are equal.
Hint: be careful about how division is performed in SQL. You might need to use a function which converts integers to floats (you can do this with variable::float
)
Join with where command
Solution
SELECT crimeTable.CT,cast(crimeTable.count as float)/censusTable.population as crime_rate from (select round("census tract 2000") as CT, count(*) as count from SeattleCrimeIncidents group by "census tract 2000") as crimeTable, (select "Total Population, 2010" as population,"Census Tract" as CT from census_data) as censusTable where crimeTable.CT = censusTable.CT order by "crime_rate" DESC;
Join with a join command
Solution
select crimeTable.CT,crimeTable.count::float/censusTable.population::float as crime_rate from (select round("census tract 2000") as CT, count(*) as count from SeattleCrimeIncidents group by "census tract 2000") crimeTable join (select "Total Population, 2010" as population,"Census Tract" as CT from census_data) censusTable on crimeTable.CT = censusTable.CT order by "crime_rate" DESC;
Which tract is which?
Here is a useful map of Seattle census tracts (2010). Can you create a similar map with your favorite tool and color each tract by the crime rate? You can check out the Mapping episode if needed.
Key Points
SQL provides a simple yet powerful set of tools for performing anaylis on your data