By David Cochran
SQL is a set based language. It does have some procedural commands, but most work is done with sets, not sequential proceedures. The order of operations may not be what you expected. Records can be returned in any order, unless you specify a sort. While there are several versions, with slightly different commands in some cases, most implementations of SQL use very simular synatax. (see Microsoft SQL Server, Transact SQL, T-SQL)
Relational: SQL is for Relational Databases^, and thus is based on Relational Algebra^. RA is a family of algebra with a well-founded semantics used for modelling the data stored in relational databases, and defining queries on it.
Relational Databases are not the only solution: MongoDB, Hadoop, etc. are useful especially in Big Data.
In SQL, all columns must relate to the record (functional dependencies). Each record in a table has exactly the same fields.
Normalized: SQL databases tend to be normalized^ inorder to avoid the limits of functional dependencies: Database normalization is the process of minimize data redundancy by organizing the fields of a single monolithic table into many tables within the database which are linked by relations between the tables. For example, instead of storing the state name for each record in an address table, the zip code can be used as a link to a separate table which lists states by zip code. The state name is then removed from the address table. If a state changes it's name, far fewer records must be updated, and when the address is viewed with the related state name pulled from the zip/state table, all the addresses will show the new state.
Complete normalization is not always the goal. e.g. Do-normalization can increase performance.
Note: The following examples use the sample database from http://dev.mysql.com/doc/sakila/en/sakila-structure.html
Statements end with a semicolon and can be spread over many lines. A convention is to put all commands in upper case and everything else in lowercase. The most common command is:
SELECT fields FROM table WHERE condition SORT BY fields
Multiple tables can be joined.
SELECT t1.field AS field1, t2.field AS t2
FROM table1 AS t1
JOIN table2 AS t2
ON t1.ifield = t2.ifield;
Alias: Notice that the table name has an alias (table1 is t1) which is used for each field selected. The keyword AS is optional for tables, but not for fields.
Types of JOIN: The JOIN can be one of several types:
The field list can include "column functions" which compute new data on the fly from the table fields. E.g. LEFT(last_name,1) will return the first letter of the last name, assuming there is a last_name field in the table.
No column functions in joins. Ever. Use CTEs to avoid this.
Tables or Views: Tables are the data, Views is a way of looking at one or more tables via SELECT queries. Reuse queries, Restrict access: Can be used to keep the user out of base tables, Encapsulate Business Rules, Get rid of frickin codes!
Example: All the Films the actors are in.
SELECT film.title, act.first_name, act.last_name
/* The table names above are optional since the column names are unique
to those tables. I include them to assist in readability and maintainability.
The table names in the joins below are not optional since the same column
name occurs in both tables.
*/
FROM actor act #Notice the table alias
INNER JOIN film_actor #No alias, this table is just to eliminate a many
to many relationship
ON act.actor_id = film_actor.actor_id
INNER JOIN film #No alias because the name is short
ON film.film_id = film_actor.film_id;
Example: What films does each store have:
SELECT city.city, country.country, title
FROM inventory inv
INNER JOIN store
ON inv.store_id = store.store_id
INNER JOIN address addr
ON store.address_id = addr.address_id
INNER JOIN city
ON addr.city_id = city.city_id
INNER JOIN country
ON city.country_id = country.country_id
INNER JOIN film
ON inv.film_id = film.film_id;
Revised to avoid duplicate listings and show only records in Canadian stores in DESCending order by film title. The DISTICT keyword removes records that are the same as the previous record returned in the given order.
SELECT DISTICT city.city, country.country, title
FROM inventory inv
INNER JOIN store
ON inv.store_id = store.store_id
INNER JOIN address addr
ON store.address_id = addr.address_id
INNER JOIN city
ON addr.city_id = city.city_id
INNER JOIN country
ON city.country_id = country.country_id
INNER JOIN film
ON inv.film_id = film.film_id
WHERE country.country = 'Canada'
ORDER BY film.title DESC
Revised again to return a count of the number of each film available in each town. The GROUP BY keyword causes count to restart when the city, country, or title changes. Right???
SELECT count(film.film_id) as QTY, city.city, country.country, title
FROM inventory inv
INNER JOIN store
ON inv.store_id = store.store_id
INNER JOIN address addr
ON store.address_id = addr.address_id
INNER JOIN city
ON addr.city_id = city.city_id
INNER JOIN country
ON city.country_id = country.country_id
INNER JOIN film
ON inv.film_id = film.film_id
GROUP BY city.city, country.country, title
Aggregations/Functions: Functions to count, compute statistical values, first, last, etc.
Name | Description |
AVG() | Return the average value of the argument |
BIT_AND() | Return bitwise and |
BIT_OR() | Return bitwise or |
BIT_XOR() | Return bitwise xor |
COUNT(DISTINCT) | Return the count of a number of different values |
COUNT() | Return a count of the number of rows returned |
GROUP_CONCAT() | Return a concatenated string. E.g. all the field values from all the selected records are made into one string |
MAX() | Return the maximum value |
MIN() | Return the minimum value |
STD() | Return the population standard deviation |
STDDEV_POP() | Return the population standard deviation |
STDDEV_SAMP() | Return the sample standard deviation |
STDDEV() | Return the population standard deviation |
SUM() | Return the sum. For use with numerical fields. |
VAR_POP() | Return the population standard variance |
VAR_SAMP() | Return the sample variance |
VARIANCE() | Return the population standard variance |
Other Column Functions: String Manipulations Usual LEFT, RIGHT, MID or SUBSTR(ING), LENGTH. Some fun functions are REVERSE, REPLACE, LIKE, SOUNDEX, CONCAT, INSTR
Used to insert data. Biggest problem is making sure foreign key fields are correct. e.g. If one table links via a field called "other_id" and you insert a value that isn't an id in that other table, your insert fails.
INSERT INTO actor
(first_name, last_name, last_update)
VALUES ('Frank', 'Zappa', CURDATE())
CURDATE() is a calculated value that returns the current date. Notice that nothing was specified for an ID field. If there is one (there should be) it will be automatically created. E.g. There should be an ID field in the actor table and it should be set with a default value of "auto-increment" when new records are added. The first ever record added will have an ID of 1, the second will be 2 and so on. Even if record #2 is deleted, the next record (2nd of 2 in the table) will have and ID of 3.
Note: SELECT INTO is valid in SQL Server, but not MySQL
Change a row or rows of data
UPDATE actor
SET first_name = 'Fred'
WHERE last_name = 'Zappa';
Can also use IN with the WHERE clause
UPDATE actor
SET first_name = 'Fred'
WHERE last_name IN ('Zappa, Jones, Smith);
DELETE FROM actor
WHERE last_name = 'Zappa';
Very Important! If you dont put in a WHERE clause you will erase all the data in the table. Instead, use a SELECT to make sure your WHERE is getting you the records you want to delete, then change the SELECT to a DELETE keeping the same WHERE.
If you want to get rid of a table sometimes it is better to DROP it instead.
Sometimes called temporary/temp tables in SQL Server, but are really temporary views. SQL Server temp tables are not CTEs, but created on disk and then used. CTEs use the underlying indexes. Temp tables need their own indexes. There are times when it is appropriate in SQL Server to use temp tables and not CTEs, but not often. Necessary for recursion.
Example:
WITH cte1 AS
(SELECT Field1, Field2, Field3
FROM MyTable
WHERE Field1 = SomeValue)
SELECT cte1.Field1, cte.Field2, cte.Field3, at.Field1, at.Field2
FROM AnotherTable at
INNER JOIN CTE1
ON at.Field1=CTE1.Field1
SubQuery/CTE Another way to instantiate CTEs is the subquery
SELECT cte1.Field1, cte1.Field2, cte1.Field3, at.Field1, at.Field2
FROM AnotherTable at
INNER JOIN (SELECT Field1, Field2, Field3
FROM MyTable
WHERE Field1 = SomeValue) cte1
ON at.Field1=cte1.Field1
The bold section is the subquery
This is a terribly inefficient query since the engine must re-compute LEFT(st.Field1,6) for every row of st for each row of at to be compared to. | Instead, Create a CTE with a derived function replacing the original column |
SELECT st.Field1, st.Field2, st.Field3, at.Field4,
at.Field2 FROM AnotherTable at INNER JOIN SomeTable st ON at.Field1=LEFT(st.Field1, 6) |
WITH cte1 AS (SELECT LEFT(Field1,6) as Field1, Field2, Field3 FROM MyTable WHERE Field1 = SomeValue) SELECT cte1.Field1, cte1.Field2, cte1.Field3, at.Field1, at.Field2 FROM AnotherTable at INNER JOIN cte1 ON at.Field1=cte1.Field1 |
Stored Procedures Programs written in SQL that are stored in the database and called by users with appropriate permission.
Functions Used just like math functions
MySQL/DB2 use CALL stored_proc_name;
SQL Server uses EXEC(UTE) stored_proc_name;
Deterministic Given the same input the same output will occur like ABS(-5) will always return 5
Non-Deterministic The same input may not necessarily generate the same output like CURDATE()
DELIMITER $$
DROP PROCEDURE IF EXISTS `sakila`.`film_not_in_stock`$$
CREATE DEFINER=`admin`@`%` PROCEDURE `sakila`.`film_not_in_stock`(IN
p_film_id INT, IN p_store_id INT, OUT p_film_count INT)
READS SQL DATA
BEGIN
SELECT inventory_id
FROM inventory
WHERE film_id = p_film_id
AND store_id = p_store_id
AND NOT inventory_in_stock(inventory_id);
SELECT FOUND_ROWS() INTO p_film_count;
END $$
DELIMITER ;
SQL selects for calendar functions
Microsoft's Transact-SQL
See also:
Watch out for SQL injection attacks: