Introduction to SQL and Geospatial Data Processing

Scripting database connections in Python/R

Overview

Teaching: 10 min
Exercises: 20 min
Questions
  • How do we issue database queries from a Python or R script?

Objectives
  • Learn about the protocols for connecting to a database from scripts

Interfacing with the database from Python:

We will use pandas which is a data analysis toolkit that happens to have some very simple methods for moving data to and from a database. The database connections are handled through pandas using sqlalchemy.

import pandas as pd
from sqlalchemy import create_engine
import sqlalchemy.types as types

Our next step is to pass some database connection information to pandas/sqlalchemy so that we can establish a connection. We create a database “engine” object that is then used in subsequent operations as a portal to/from the database.

db_info = {
    'host': 'someurl.amazonaws.com',
    'user': 'myusername',
    'password': 'mysecretpassword',
    'port': 5432,
    'dbname': 'databasename'
}
sql_str = 'postgresql://{user}:{password}@{host}:{port}/{dbname}'.format
engine = create_engine(sql_str(**db_info))

Now we can issue a query to the database as follows:

df = pd.read_sql('SELECT * FROM seattlecrimeincidents LIMIT 100', engine)
Connecting from a Jupyter Notebook

Within a Jupyter notebook we can actually directly type SQL queries in a cell using an ipython magic. For that we need to install the ipython-sql package:

pip install ipython-sql

To test your installation, you can run all cells of the TestConnection.ipynb notebook. In the second cell you will have to substitute the word password with the real password for the database.



Interfacing with the database from R:

We will use the R package PostgreSQL.

install.packages("PostgreSQL")
library(RPostgreSQL)

We can execute a query in the following way:

drv <- dbDriver("PostgreSQL")
con <- d:Connect(drv, host="someurl.amazonaws.com", user="myusernmae", password="mysecretpassword", dbname="databasename", port="5432")
rs <- dbSendQuery(con, "select * from seattlecrimeincidents limit 100"); 
df <- fetch(rs)



Useful Resources:

Key Points