Structured Query Language (SQL)

By David Cochran

SQL is a set based language. It does have some procedural commands, but most work is done with sets, not sequential proceedures. The order of operations may not be what you expected. Records can be returned in any order, unless you specify a sort. While there are several versions, with slightly different commands in some cases, most implementations of SQL use very simular synatax. (see Microsoft SQL Server, Transact SQL, T-SQL)

Relational: SQL is for Relational Databases^, and thus is based on Relational Algebra^. RA is a family of algebra with a well-founded semantics used for modelling the data stored in relational databases, and defining queries on it.

Relational Databases are not the only solution: MongoDB, Hadoop, etc. are useful especially in Big Data.

In SQL, all columns must relate to the record (functional dependencies). Each record in a table has exactly the same fields.

Normalized: SQL databases tend to be normalized^ inorder to avoid the limits of functional dependencies: Database normalization is the process of minimize data redundancy by organizing the fields of a single monolithic table into many tables within the database which are linked by relations between the tables. For example, instead of storing the state name for each record in an address table, the zip code can be used as a link to a separate table which lists states by zip code. The state name is then removed from the address table. If a state changes it's name, far fewer records must be updated, and when the address is viewed with the related state name pulled from the zip/state table, all the addresses will show the new state.

Complete normalization is not always the goal. e.g. Do-normalization can increase performance.

Note: The following examples use the sample database from http://dev.mysql.com/doc/sakila/en/sakila-structure.html

Syntax:

Statements end with a semicolon and can be spread over many lines. A convention is to put all commands in upper case and everything else in lowercase. The most common command is:

The SELECT statement

SELECT fields FROM table WHERE condition SORT BY fields

Multiple tables can be joined.

SELECT t1.field AS field1, t2.field AS t2
FROM table1 AS t1
JOIN table2 AS t2
ON t1.ifield = t2.ifield
;

Alias: Notice that the table name has an alias (table1 is t1) which is used for each field selected. The keyword AS is optional for tables, but not for fields.

Types of JOIN: The JOIN can be one of several types:

The field list can include "column functions" which compute new data on the fly from the table fields. E.g. LEFT(last_name,1) will return the first letter of the last name, assuming there is a last_name field in the table.

No column functions in joins. Ever. Use CTEs to avoid this.

Tables or Views: Tables are the data, Views is a way of looking at one or more tables via SELECT queries. Reuse queries, Restrict access: Can be used to keep the user out of base tables, Encapsulate Business Rules, Get rid of frickin codes!

Example: All the Films the actors are in.

SELECT film.title, act.first_name, act.last_name
/* The table names above are optional since the column names are unique to those tables. I include them to assist in readability and maintainability. The table names in the joins below are not optional since the same column name occurs in both tables.
*/
FROM actor act #Notice the table alias
INNER JOIN film_actor #No alias, this table is just to eliminate a many to many relationship
ON act.actor_id = film_actor.actor_id
INNER JOIN film #No alias because the name is short
ON film.film_id = film_actor.film_id
;

Example: What films does each store have:

SELECT city.city, country.country, title
FROM inventory inv
INNER JOIN store
ON inv.store_id = store.store_id
INNER JOIN address addr
ON store.address_id = addr.address_id
INNER JOIN city
ON addr.city_id = city.city_id
INNER JOIN country
ON city.country_id = country.country_id
INNER JOIN film
ON inv.film_id = film.film_id;

Revised to avoid duplicate listings and show only records in Canadian stores in DESCending order by film title. The DISTICT keyword removes records that are the same as the previous record returned in the given order.

SELECT DISTICT city.city, country.country, title
FROM inventory inv
INNER JOIN store
ON inv.store_id = store.store_id
INNER JOIN address addr
ON store.address_id = addr.address_id
INNER JOIN city
ON addr.city_id = city.city_id
INNER JOIN country
ON city.country_id = country.country_id
INNER JOIN film
ON inv.film_id = film.film_id

WHERE country.country = 'Canada'
ORDER BY film.title DESC

Revised again to return a count of the number of each film available in each town. The GROUP BY keyword causes count to restart when the city, country, or title changes. Right???

SELECT count(film.film_id) as QTY, city.city, country.country, title
FROM inventory inv
INNER JOIN store
ON inv.store_id = store.store_id
INNER JOIN address addr
ON store.address_id = addr.address_id
INNER JOIN city
ON addr.city_id = city.city_id
INNER JOIN country
ON city.country_id = country.country_id
INNER JOIN film
ON inv.film_id = film.film_id
GROUP BY city.city, country.country, title

Aggregations/Functions: Functions to count, compute statistical values, first, last, etc.
Name Description
AVG() Return the average value of the argument
BIT_AND() Return bitwise and
BIT_OR() Return bitwise or
BIT_XOR() Return bitwise xor
COUNT(DISTINCT) Return the count of a number of different values
COUNT() Return a count of the number of rows returned
GROUP_CONCAT() Return a concatenated string. E.g. all the field values from all the selected records are made into one string
MAX() Return the maximum value
MIN() Return the minimum value
STD() Return the population standard deviation
STDDEV_POP() Return the population standard deviation
STDDEV_SAMP() Return the sample standard deviation
STDDEV() Return the population standard deviation
SUM() Return the sum. For use with numerical fields.
VAR_POP() Return the population standard variance
VAR_SAMP() Return the sample variance
VARIANCE() Return the population standard variance

Other Column Functions: String Manipulations Usual LEFT, RIGHT, MID or SUBSTR(ING), LENGTH. Some fun functions are REVERSE, REPLACE, LIKE, SOUNDEX, CONCAT, INSTR

The INSERT Statement

Used to insert data. Biggest problem is making sure foreign key fields are correct. e.g. If one table links via a field called "other_id" and you insert a value that isn't an id in that other table, your insert fails.

INSERT INTO actor
(first_name, last_name, last_update)
VALUES ('Frank', 'Zappa', CURDATE())

CURDATE() is a calculated value that returns the current date. Notice that nothing was specified for an ID field. If there is one (there should be) it will be automatically created. E.g. There should be an ID field in the actor table and it should be set with a default value of  "auto-increment" when new records are added. The first ever record added will have an ID of 1, the second will be 2 and so on. Even if record #2 is deleted, the next record (2nd of 2 in the table) will have and ID of 3.

Note: SELECT INTO is valid in SQL Server, but not MySQL

The UPDATE Statement

Change a row or rows of data

UPDATE actor
SET first_name = 'Fred'
WHERE last_name = 'Zappa';

Can also use IN with the WHERE clause

UPDATE actor
SET first_name = 'Fred'
WHERE last_name IN ('Zappa‘, ‘Jones’, ‘Smith’);

The DELETE Statement

DELETE FROM actor
WHERE last_name = 'Zappa';

Very Important! If you don’t put in a WHERE clause you will erase all the data in the table. Instead, use a SELECT to make sure your WHERE is getting you the records you want to delete, then change the SELECT to a DELETE keeping the same WHERE.

If you want to get rid of a table sometimes it is better to DROP it instead.

CTEs / Common Table Expressions

Sometimes called temporary/temp tables in SQL Server, but are really temporary views. SQL Server temp tables are not CTEs, but created on disk and then used. CTEs use the underlying indexes. Temp tables need their own indexes. There are times when it is appropriate in SQL Server to use temp tables and not CTEs, but not often. Necessary for recursion.

Example:

WITH cte1 AS
(SELECT Field1, Field2, Field3
FROM MyTable
WHERE Field1 = SomeValue)
SELECT cte1.Field1, cte.Field2, cte.Field3, at.Field1, at.Field2
FROM AnotherTable at
INNER JOIN CTE1
ON at.Field1=CTE1.Field1

SubQuery/CTE Another way to instantiate CTEs is the subquery

SELECT cte1.Field1, cte1.Field2, cte1.Field3, at.Field1, at.Field2
FROM AnotherTable at
INNER JOIN (SELECT Field1, Field2, Field3
FROM MyTable
WHERE Field1 = SomeValue)
cte1
ON at.Field1=cte1.Field1

The bold section is the subquery

Using CTEs to get rid of Column Function Joins

This is a terribly inefficient query since the engine must re-compute LEFT(st.Field1,6) for every row of st for each row of at to be compared to. Instead, Create a CTE with a derived function replacing the original column
SELECT st.Field1, st.Field2, st.Field3, at.Field4, at.Field2
FROM AnotherTable at
INNER JOIN SomeTable st
ON at.Field1=LEFT(st.Field1, 6)
WITH cte1 AS
(SELECT LEFT(Field1,6) as Field1, Field2, Field3
FROM MyTable
WHERE Field1 = SomeValue)
SELECT cte1.Field1, cte1.Field2, cte1.Field3, at.Field1, at.Field2
FROM AnotherTable at
INNER JOIN cte1
ON at.Field1=cte1.Field1

Stored Procedures/Functions/Programs

Stored Procedures – “Programs” written in SQL that are stored in the database and called by users with appropriate permission.

Functions – Used just like math functions

MySQL/DB2 use CALL stored_proc_name;

SQL Server uses EXEC(UTE) stored_proc_name;

Deterministic – Given the same input the same output will occur like ABS(-5) will always return 5

Non-Deterministic – The same input may not necessarily generate the same output like CURDATE()

Sample code to create Stored Proc

DELIMITER $$

DROP PROCEDURE IF EXISTS `sakila`.`film_not_in_stock`$$

CREATE DEFINER=`admin`@`%` PROCEDURE `sakila`.`film_not_in_stock`(IN p_film_id INT, IN p_store_id INT, OUT p_film_count INT)
 READS SQL DATA

BEGIN

 SELECT inventory_id
 FROM inventory
 WHERE film_id = p_film_id
 AND store_id = p_store_id
 AND NOT inventory_in_stock(inventory_id);

 SELECT FOUND_ROWS() INTO p_film_count;

END $$

DELIMITER ;

SQL selects for calendar functions

Microsoft's Transact-SQL

See also:

Watch out for SQL injection attacks: