open-courses
  • 公开课笔记
  • CMU 15-445/645 Database Systems
    • Relational Data Model
    • Advanced SQL
    • Database Storage
    • Buffer Pools
    • Hash Tables
    • Tree Indexes
    • Index Concurrency Control
    • Query Processing
    • Sorting&Aggregations
    • Join Algorithms
    • Query Optimization
    • Parallel Execution
    • Embedded Database Logic
    • Concurrency Control Theory
    • Two Phase Locking
    • Timestamp Ordering Concurrency Control
    • Multi-Version Concurrency Control
    • Logging Schemes
    • Database Recovery
    • Introduction to Distributed Databases
    • Distributed OLTP Databases
    • Distributed OLAP Databases
  • UCB - CS162
    • OS intro
    • Introduction to the Process
    • Processes, Fork, I/O, Files
    • I/O Continued, Sockets, Networking
    • Concurrency: Processes & Threads
    • Cooperating Threads, Synchronization
    • Semaphores, Condition Variables, Readers/Writers
    • Scheduling
    • Resource Contention & Deadlock
    • Address Translation, Caching
    • File System (18,19,20)
    • Distributed Systems, Networking, TCP/IP, RPC (21,22)
    • Distributed Storage, Key-Value Stores, Security (23)
    • Security & Cloud Computing (24)
    • Topic: Ensuring Data Reaches Disk
  • MIT - 6.006
    • Sequence and Set Interface
    • Data Structure for Dynamic Sequence Interface
    • Computation Complexity
    • Algorithms and Computation
    • Structure Of Computation
    • Graph & Search
    • Tree & Search
    • Weighted Shortest Paths
    • String Matching, Karp-Rabin
    • Priority Queue Interface & Implementation
    • Dictionary Problem & Implementation
    • Sorting
    • Dynamic Programming
    • Backtracking
    • Self-Balancing Tree
  • MIT - 6.824
    • 2PC & 3PC
    • Introduction and MapReduce
    • RPC and Threads
    • Primary/Backup Replication
    • Lab: Primary/Backup Key/Value Service
    • Google File System (GFS)
    • Raft
    • Lab: Raft - Leader Election
    • Lab: Raft - Log Replication
  • Stanford-CS107
    • 原始数据类型及相互转化
    • 指鹿为马
    • 泛型函数
    • 泛型栈
    • 运行时内存结构
    • 从 C 到汇编
    • 函数的活动记录
    • C 与 C++ 代码生成
    • 编译的预处理过程
    • 编译的链接过程
    • 函数的活动记录续、并发
    • 从顺序到并发和并行
    • 信号量与多线程 1
    • 信号量与多线程 2
    • 复杂多线程问题
    • 函数式编程 - Scheme 1
    • 函数式编程 - Scheme 2
    • 函数式编程 - Scheme 3
    • 函数式编程 - Scheme 4
    • 函数式编程 - Scheme 5
    • Python 基础
  • MIT - 6.001 - SICP
    • 什么是程序
    • 程序抽象
    • 替代模型
    • 时间/空间复杂度
    • 数据抽象
    • 高阶函数
    • Symbol
    • 数据驱动编程与防御式编程
    • 数据抽象中的效率与可读性
    • 数据修改
    • 环境模型
    • 面向对象-消息传递
    • 面向对象 - Scheme 实现
    • 构建 Scheme 解释器
    • Eval-Apply Loop
    • Normal Order (Lazy) Evaluation
    • 通用机
    • 寄存器机器
    • 子程序、栈与递归
    • 在寄存器机器中执行
    • 内存管理
  • MIT - 6.046
    • Randomized Algorithms
    • Skip Lists
  • System Design
    • Twitter
    • Cache Consistency & Coherence
  • DDIA 笔记
    • Replication
    • Transactions
    • The Trouble with Distributed Systems
    • Consistency & Consensus
  • Papers We Love
    • Consistent Hashing and Random Trees (1997)
    • Dynamic Hash Tables (1988)
    • LFU Implementation With O(1) Complexity (2010)
    • Time, Clocks, and the Ordering of Events in a Distributed System (1978)
    • Dapper, a Large-Scale Distributed Systems Tracing Infrastructure (2010)
    • Gorilla: A Fast, Scalable, In-Memory Time Series Database (2015)
  • Release It 笔记
    • Anti-patterns & Patterns in Microservice Architecture
  • Database Design
    • Log Structured Merge (LSM) Tree & Usages in KV Stores
    • Prometheus
Powered by GitBook
On this page
  • 回顾
  • SQL 的历史
  • SQL 主要特性介绍
  • 概要
  • Aggregates
  • Group By
  • Having
  • Output Redirection
  • Output Control
  • Nested Queries
  • Common Table Expressions
  • String Operations
  • Date/Time Operations
  • Window Functions
  • 参考
  1. CMU 15-445/645 Database Systems

Advanced SQL

本节介绍 SQL 的部分复杂用法

PreviousRelational Data ModelNextDatabase Storage

Last updated 6 years ago

回顾

在 Relational Model 下构建查询语句的方式分为两种:Procedural 和 Non-Procedural。第一节课中已经介绍了 Relational Algebra,它属于 Procedural 类型,而本节将介绍的 SQL 属于 Non-Procedural 类型。使用 SQL 构建查询时,用户只需要指定它们想要的数据,而不需要关心数据获取的方式,DBMS 负责理解用户的查询语义,选择最优的方式构建查询计划。

SQL 的历史

  • ”SEQUAL" from IBM's System R prototype

    • Structured English Query Language

    • Adopted by Oracle in the 1970s

  • IBM releases DB2 in 1983

  • ANSI Standard in 1986. ISO in 1987

    • Structured Query Language

当前 SQL 的标准是 SQL 2016,而目前大部分 DBMSs 至少支持 SQL-92 标准,具体的系统对比信息可以到查询。

SQL 主要特性介绍

SQL 基于

概要

  • Aggregations + Group By

  • String / Date / Time Operations

  • Output Control + Redirection

  • Nested Queries

  • Common Table Expressions

  • Window Functions

本节使用的示例数据库如下所示:

student(sid, name, login, gpa)

sid

name

login

age

gpa

53666

Kanye

kayne@cs

39

4.0

53668

Bieber

jbieber@cs

22

3.9

enrolled(sid, cid, grade)

sid

cid

grade

53666

15-445

C

53688

15-721

A

course(cid, name)

cid

name

15-445

Database Systems

15-721

Advanced Database Systems

Aggregates

Aggregates 通常返回一个值,它们包括:

  • AVG(col)

  • MIN(col)

  • MAX(col)

  • SUM(col)

  • COUNT(col)

举例如下:

count.sql
SELECT COUNT(login) AS cnt
  FROM student WHERE login LIKE '%@cs';

SELECT COUNT(*) AS cnt
  FROM student WHERE login LIKE '%@cs';

SELECT COUNT(1) AS cnt
  FROM student WHERE login LIKE '%@cs';
multiple.sql
SELECT AVG(gpa), COUNT(sid)
  FROM student WHERE login LIKE '%@cs';
distinct.sql
SELECT COUNT(DISTINCT login)
  FROM student WHERE login LIKE '%@cs';

aggregate 与其它通常的查询列不可以混用,比如:

mix.sql
SELECT AVG(s.gpa), e.cid
  FROM enrolled AS e, student AS s
  WHERE e.sid = s.sid;

不同 DBMSs 的输出结果不一样,严格模式下,DBMS 应该抛错。

Group By

group by 就是把记录按某种方式分成多组,对每组记录分别做 aggregates 操作,如求每门课所有学生的 GPA 平均值:

SELECT AVG(s.gpa), e.cid
  FROM enrolled AS e, student AS s
 WHERE e.sid = s.sid
 GROUP BY e.cid;

所有非 aggregates 操作的字段,都必须出现在 group by 语句,如下面示例中的 e.cid 和 s.name:

SELECT AVG(s.gpa), e.cid, s.name
  FROM enrolled AS e, student AS s
 WHERE e.sid = s.sid
 GROUP BY e.cid, s.name;

Having

基于 aggregation 结果的过滤条件不能写在 WHERE 中,而应放在 HAVING 中,如:

SELECT AVG(s.gpa) AS avg_gpa, e.cid
  FROM enrolled AS e, student AS s
 WHERE e.sid = s.sid
 GROUP BY e.cid
HAVING avg_gpa > 3.9;

Output Redirection

将查询结果储存到另一张表上:

  • 该表必须是已经存在的表

  • 该表的列数,以及每列的数据类型必须相同

SELECT DISTINCT cid INTO CourseIds
  FROM enrolled;

INSERT INTO CourseIds (
    SELECT DISTINCT cid FROM enrolled
);
CREATE TABLE CourseIds (
    SELECT DISTINCT cid FROM enrolled
);

Output Control

Order By

语法:ORDER BY <column*> [ASC|DESC]

SELECT sid, grade FROM enrolled
 WHERE cid = '15-721'
 ORDER BY grade;

按多个字段分别排序:

SELECT sid FROM enrolled
 WHERE cid = '15-721'
 ORDER BY grade DESC, sid ASC;

Limit

语法:LIMIT <count> [offset]

SELECT sid, name FROM student
 WHERE login LIKE '%@cs'
 LIMIT 10;

SELECT sid, name FROM student
 WHERE login LIKE '%@cs'
 LIMIT 20 OFFSET 10;

Nested Queries

nested queries 包含 inner queries 和 outer queries,前者可以出现在 query 的任何位置,且 inner queries 可以引用 outer queries 中的表信息。

例 1:获取所有参与 '15-445' 这门课所有学生的姓名:

SELECT name FROM student
 WHERE sid IN (
   SELECT sid FROM enrolled
    WHERE cid = '15-445'
 );

SELECT (SELECT S.name FROM student AS S
         WHERE S.sid = E.sid) AS sname
  FROM enrolled AS E
 WHERE cid = '15-445';

语法中支持的谓词包括:

  • ALL: 所有 inner queries 返回的记录都必须满足条件

  • ANY:任意 inner queries 返回的记录满足条件即可

  • IN:与 ANY 等价

  • EXISTS:inner queries 返回的表不为空

SELECT name FROM student
 WHERE sid ANY (
   SELECT sid FROM enrolled
    WHERE cid = '15-445'
 )

例 2:找到至少参与一门课程的所有学生中,id 最大的

SELECT sid, name FROM student
 WHERE sid >= ALL (
   SELECT sid FROM enrolled
 );

SELECT sid, name FROM student
 WHERE sid IN (
   SELECT MAX(sid) FROM enrolled
 );

SELECT sid, name FROM student
 WHERE sid IN (
   SELECT sid FROM enrolled
    ORDER BY sid DESC LIMIT 1
 );

例 3:找到所有没有学生参与的课程

SELECT * FROM course
 WHERE NOT EXISTS (
   SELECT * FROM enrolled
    WHERE course.cid = enrolled.cid
 );

nested queries 比较难被优化(具体原因暂不知道)

Common Table Expressions

在一些复杂查询中,创建一些中间表能够使得这些查询逻辑更加清晰:

WITH cteName AS (
    SELECT 1
)
SELECT * FROM cteName

WITH cteName (col1, col2) AS (
    SELECT 1, 2
)
SELECT col1 + col2 FROM cteName

WITH
    cteName1(col) AS (SELECT 1),
    cteName2(col) AS (SELECT 2)
SELECT C1.col + C2.col FROM cteName1 AS C1, cteName2 AS C2;

例 1:找到所有参与课程的学生中 id 最大的

WITH cteSource(maxId) AS (
    SELECT MAX(sid) FROM enrolled
)
SELECT name FROM student, cteSource
 WHERE student.sid = cteSource.maxId

例 2:打印 1-10

WITH RECURSIVE cteSource (counter) AS (
    (SELECT 1)
    UNION ALL
    (SELECT counter + 1 FROM cteSource
      WHERE counter < 10)
)
SELECT * FROM cteSource;

String Operations

DBMS

String Case

String Quotes

SQL-92

Sensitive

Single Only

Postgres

Sensitive

Single Only

MySQL

InSensitive

Single/Double

SQLite

Sensitive

Single/Double

Oracle

Sensitive

Single Only

如在 condition 中判断两个字符串忽略大小写后是否相等:

string-equal.sql
/* SQL-92 */
WHERE UPPER(name) = UPPER('KaNyE')

/* MySQL */
WHERE name = "KaNyE"

String Matching

SELECT * FROM enrolled AS e
 WHERE e.cid LIKE '15-%';

SELECT * FROM student AS s
 WHERE s.login LIKE '%@c_';

String Operations

SQL-92 定义了一些 string 函数,如

SELECT SUBSTRING(name, 0, 5) AS abbrv_name
  FROM student WHERE sid = 53688;

SELECT * FROM student AS s
 WHERE UPPER(e.name) LIKE 'KAN%';

不同 DBMS 有不同的 string 函数(没有完全遵从 SQL-92 标准),如连接两个 strings

concatenation.sql
/* SQL-92 */
SELECT name FROM student
 WHERE login = LOWER(name) || '@cs';

/* MySQL */
SELECT name FROM student
 WHERE login = LOWER(name) + '@cs';
SELECT name FROM student
 WHERE login = CONCAT(LOWER(name), '@cs')

Date/Time Operations

不同的 DBMS 中的定义和用法不一样,具体见各 DBMS 的文档。

Window Functions

主要用于在一组记录中,对每一条记录进行一些计算,如:

例 1:

SELECT *, ROW_NUMBER() OVER () AS row_num
  FROM enrolled;

会得到类似下表:

例 2:

SELECT cid, sid,
       ROW_NUMBER() OVER (PARTITION BY cid)
  FROM enrolled
 ORDER BY cid;

可是得到类似下表:

例 3:找到每门课获得最高分的学生

SELECT * FROM (
  SELECT *,
         RANK() OVER (PARTITION BY cid ORDER BY grade ASC) AS rank
    FROM enrolled
  ) AS ranking
WHERE ranking.rank = 1

参考

,

这里
slide
video
window functions - example1
window functions - example 2