Overview
Teaching: 15 min
Exercises: 45 minQuestions
How do I extract information from a table in a database?
How can I create summary statistics of a data set?
How can I combine information from several tables?
Objectives
Learn how to issue a SQL query within DBeaver
Learn basic SQL syntax: SELECT, FROM, WHERE, GROUP BY, ORDER,…
Build complex queries from simple queries
Learn how to join two tables on a common key
Queries can be issued to the database in many different ways. Here we’ll use the SQL query window in DBeaver. This is a great way to test out ideas before implementing your SQL queries in a script.
click on the “SQL Editor” button:
This should bring up a new tab:
The upper pane is where we will issue our query, and the results of the query will be shown in the results pane below.
Submit your first query by typing the following into the SQL pane:
SELECT * FROM seattlecrimeincidents LIMIT 100;
LIMIT
command restricts the database to return only the first 100 rows.*
is a wildcard requesting all columns from the database.In the following exercises you can use this cheatsheet to look up the different SQL commands.
Counting numbers of offenses
Single and double quotes in SQL
Notice that in our example above we had both single and double quotes. Here’s the rule:
- use double quotes for column names within the database
- use single quotes for strings or date/time constants Note that we only have to use double quotes for the column names because this particular data table did not follow the standards of database convention (i.e. the names have spaces in them). See this discussion for more details.
Upper/lower case: does it matter?
Most versions of the SQL language are not case sensitive, but it is a good habbit to make key words capital so that the queries are easy to read. The only time when the case matters is whne the word is in quotes and refers to a name.
Calculating several values at a time
Combining conditions
identify a function to apply per groups
select "Offense Type",count(*) from SeattleCrimeIncidents group by "Offense Type" order by count ASC;
Note: for homicide we see there are a lot of types of homicides -> use summarized offense description
select "Summarized Offense Description", count(*) from SeattleCrimeIncidents group by "Summarized Offense Description";
select year, count(*) from SeattleCrimeIncidents group by year;
Crimes for each month?
Month with highest number of bike thefts?
Number of crimes per census tract?
SELECT "census tract 2000" as "CensusTract",count(*) as "crime_count" FROM seattlecrimeincidents group by "census tract 2000" ORDER BY "census tract 2000" ASC;
SELECT max(crimeTable.crime_count) FROM (SELECT "census tract 2000" as "CensusTract",count(*) as "crime_count" FROM seattlecrimeincidents group by "census tract 2000") as crimeTable
Which census tract has the highest crime rate?
To calculate the crime rate we need to have the population of each census tract. This is missing from the SeattleCrimeIncidents Table. However, in the database there is another table called census containing the population per tract.
SELECT * from census LIMIT 10
tract | #crimes/popuation |
---|---|
. | . |
To simplify the join we can create the following two tables:
tract | crime_count |
---|---|
. | . |
SELECT round("census tract 2000"),count(*) FROM seattlecrimeincidents
group by "census tract 2000"
ORDER BY "census tract 2000" ASC;
tract | population |
---|---|
. | . |
SELECT "Census Tract","Total Population, 2010" as population from census_data
ORDER BY "Census Tract" ASC;
Then we can join them where the corresponding tracts are equal.
Hint: be careful about how division is performed in SQL. You might need to use a function which converts integers to floats (you can do this with variable::float
)
Join with where command
Join with a join command
Which tract is which?
Key Points
SQL provides a simple yet powerful set of tools for performing anaylis on your data