Overview

Teaching: 15 min
Exercises: 45 min
Questions
  • How do I extract information from a table in a database?

  • How can I create summary statistics of a data set?

  • How can I combine information from several tables?

Objectives
  • Learn how to issue a SQL query within DBeaver

  • Learn basic SQL syntax: SELECT, FROM, WHERE, GROUP BY, ORDER,…

  • Build complex queries from simple queries

  • Learn how to join two tables on a common key

Issuing your first queries

Queries can be issued to the database in many different ways. Here we’ll use the SQL query window in DBeaver. This is a great way to test out ideas before implementing your SQL queries in a script.

  1. click on the “SQL Editor” button:

  2. This should bring up a new tab:

    The upper pane is where we will issue our query, and the results of the query will be shown in the results pane below.

  3. Submit your first query by typing the following into the SQL pane:

     SELECT * FROM seattlecrimeincidents LIMIT 100;
    
    • the LIMIT command restricts the database to return only the first 100 rows.
    • the * is a wildcard requesting all columns from the database.



Practice problems

In the following exercises you can use this cheatsheet to look up the different SQL commands.

Counting numbers of offenses

How many “TRESPASS” offenses occurred in total?

Hint: fill in the ? below:

SELECT ? FROM ? WHERE ? = ?

Solution

Here is the solution:

SELECT COUNT(*) FROM seattlecrimeincidents WHERE "Offense Type" = 'TRESPASS'

Single and double quotes in SQL

Notice that in our example above we had both single and double quotes. Here’s the rule:

Upper/lower case: does it matter?

Most versions of the SQL language are not case sensitive, but it is a good habbit to make key words capital so that the queries are easy to read. The only time when the case matters is whne the word is in quotes and refers to a name.


Calculating several values at a time

What is the range of the latitude and longitude coordinates of all crimes?

Hint: use “max” and “min” functions.

Solution

SELECT min(longitude), max(longitude),min(latitude),max(latitude) FROM seattlecrimeincidents

Combining conditions

What is the number of bike thefts in the month of january?

Hint: the name for bike thefts is ‘THEFT-BICYCLE’.

Solution

SELECT count(*) FROM seattlecrimeincidents
        WHERE "Offense Type" = 'THEFT-BICYCLE' and month = 1



Grouping

Note: for homicide we see there are a lot of types of homicides -> use summarized offense description

Crimes for each month?

Solution

SELECT month,count(*) FROM seattlecrimeincidents GROUP BY month ORDER BY month ASC

Month with highest number of bike thefts?

Solution

SELECT month,count(*) FROM seattlecrimeincidents
	WHERE "Offense Type" = 'THEFT-BICYCLE'
	GROUP BY month
	ORDER BY count DESC

Number of crimes per census tract?

Solution

SELECT "census tract 2000",count(*) FROM seattlecrimeincidents
	group by "census tract 2000"
	ORDER BY "census tract 2000" ASC;



Ideas of aliasing and nesting

SELECT "census tract 2000" as "CensusTract",count(*) as "crime_count" FROM seattlecrimeincidents
	group by "census tract 2000"
	ORDER BY "census tract 2000" ASC;
SELECT max(crimeTable.crime_count) FROM 
(SELECT "census tract 2000" as "CensusTract",count(*) as "crime_count" FROM seattlecrimeincidents
	group by "census tract 2000") as crimeTable



Joining two tables

Which census tract has the highest crime rate?

To calculate the crime rate we need to have the population of each census tract. This is missing from the SeattleCrimeIncidents Table. However, in the database there is another table called census containing the population per tract.

 SELECT * from census LIMIT 10
tract #crimes/popuation
. .

To simplify the join we can create the following two tables:

tract crime_count
. .
 SELECT round("census tract 2000"),count(*) FROM seattlecrimeincidents
	group by "census tract 2000"
	ORDER BY "census tract 2000" ASC;
tract population
. .
 SELECT "Census Tract","Total Population, 2010" as population from census_data
	ORDER BY "Census Tract" ASC;

Then we can join them where the corresponding tracts are equal.

Hint: be careful about how division is performed in SQL. You might need to use a function which converts integers to floats (you can do this with variable::float)

Join with where command

Solution

SELECT crimeTable.CT,cast(crimeTable.count as float)/censusTable.population as crime_rate from
	(select round("census tract 2000") as CT, count(*) as count from SeattleCrimeIncidents group by "census tract 2000") as crimeTable,
    (select "Total Population, 2010" as population,"Census Tract" as CT from census_data) as censusTable
    where crimeTable.CT = censusTable.CT order by "crime_rate" DESC;

Join with a join command

Solution

select crimeTable.CT,crimeTable.count::float/censusTable.population::float as crime_rate from 
	(select round("census tract 2000") as CT, count(*) as count from SeattleCrimeIncidents group by "census tract 2000") crimeTable
    join 
    (select "Total Population, 2010" as population,"Census Tract" as CT from census_data) censusTable
    on crimeTable.CT = censusTable.CT order by "crime_rate" DESC;

Which tract is which?

Here is a useful map of Seattle census tracts (2010). Can you create a similar map with your favorite tool and color each tract by the crime rate? You can check out the Mapping episode if needed.

Key Points