跳转至

Lec3 SQL语句

阅读信息

1042 字  10 分钟  本页总访问量 加载中...

DDL

创建表,示例:

SQL
1
2
3
4
5
6
7
CREATE TABLE branch (
    branch_name char(15) not null,  -- char表示固定长度
    branch_city varchar(30)       -- varchar表示变长
    assets      numeric(8,2)
    primary key (branch_name),
    check (assets >= 0)
)

SQL中domain的类型:

  • char(n):定长字符串,长度固定为 n,不足部分通常用空格填充。
  • varchar(n):变长字符串,最大长度为 n,只占实际使用的字符空间。
  • int:整数类型,表示机器允许范围内的整数。
  • smallint:较小范围的整数类型,占用存储空间比 int 更小。
  • numeric(p, d):定点数类型,总共有 p 位数字,其中小数部分有 d 位,常用于精确计算(如金额)。
  • real / double precision:浮点数类型,分别表示单精度和双精度浮点数,精度依赖具体机器实现。
  • float(n):浮点数类型,至少具有 n 位有效数字的精度。
  • NULL 值:所有 domain 类型都允许 NULL,表示“未知”或“缺失”的值;若声明为NOT NULL则不允许NULL。
  • date:日期类型,包含年、月、日,例如date '2007-2-27'
  • time:时间类型,表示一天中的时间(时、分、秒),例如time '11:18:16'time '11:18:16.28'
  • timestamp:时间戳类型,同时包含日期和时间,例如timestamp '2011-3-17 11:18:16.28'

完整性约束:

  • not null:不允许用null填
  • primary key (K):规定主键,默认为not null
  • Check (P):监测P值是否满足某些要求

删除表:DROP TABLE branch2

修改表:

  • 增加列:ALTER TABLE r ADD column_name,自动用null填充
  • 删除列:ALTER TABLE r DROP column_name
  • 修改列的类型:ALTER TABLE r MODIFY (branch_name char(40), assets not null),只要类型兼容即可修改
  • 创建index:CREATE INDEX index_name ON table_name
  • 创建candidate key:CREATE UNIQUE INDEX index_name ON table_name

DML

选择语句

SELECT结构:

SQL
SELECT column_list FROM table_list WHERE predicate
  • 用*可表示所有列。SELECT对应泛化投影,可以对列进行运算操作
  • table_list为不同表的笛卡尔积,而不是自然连接【为什么?】。若要表示自然连接,需要在约束中加上对应属性相等的约束。
  • predicate为约束条件,可用and, or, not, between等词表示

SELECT返回结果不默认去重。在返回的列的前面加distinct关键字则表示去重,即返回的整个元组无重复。e.g.SELECT DISTINCT branch_name, balance FROM branch。在列前加ALL表示允许重复,即默认加ALL。

用LIKE表示模糊匹配,%表示任意多个字符,_表示任意单个字符。e.g.SELECT customer_name FROM customer WHERE customer_name LIKE '%a%'

用||表示返回值的拼接,e.g.SELECT 'name=' || customer_name FROM customer,返回显示格式为name=Abies

用ORDER BY表示对表排序,默认升序排序。如果排序的列有多列,则优先级依次递减。e.g.SELECT branch_name FROM loan ORDER BY customer_name, loan DESC

集合运算

UNION表示并,INTERSECT表示交,EXCEPT表示差

聚合函数

返回时可重命名,e.g.SELECT AVG(balance)avg_bal FROM account

除聚合函数中的属性,其他属性只能出现在GROUP BY中。e.g.SELECT branch_name, AVG(balance)avg_bal FROM account GROUP BY brach_name

distinct可用在COUNT中,表示相同的记录只计数一次。

需要按照聚合函数结果筛选时,用HAVING关键字,不能用WHERE(因为WHRER是一行一行判断,而不是整体判断)。e.g.:

SQL
1
2
3
4
5
-- Find the names of all branches located in city Brooklyn where the average account balance is more than $1,200.
SELECT A.branch_name, avg(balance) FROM account A, branch B
WHERE A.branch_name = B.branch_name and branch_city ='Brooklyn'
GROUP BY A.branch_name
HAVING avg(balance) > 1200

执行顺序:From → where → group (aggregate) → having → select → distinct → order by

null值

null值判断时结果为unknown。CHECK中,unknown也满足要求,只有false为不满足。

【??待补充】

嵌套子查询

E.g.1,用嵌套子查询的in表示UNION:

SQL
SELECT distinct customer_name FROM borrower
WHERE customer_name in (SELECT customer_name FROM depositor)

E.g.2,用嵌套子查询的not in表示EXCEPT:

SQL
SELECT distinct customer_name FROM borrower
WHERE customer_name not in (SELECT customer_name FROM depositor)

加some表示子查询中存在某些值满足;加all表示子查询中的任意值都满足。

自比较时,写成嵌套子查询或自连接。

视图view

  • 创建视图:CREATE VIEW view_name AS ...
  • 使用视图:将view当成一张表操作

Remarks

Q:为了减少冗余性,需要将一张表S拆成两张表S1,S2。但SQL语句写在原先的一张表上,怎么使表的拆分不影响上层的SQL语句?

A:利用视图。按如下流程:

  1. 创建S1,S2
  2. S1,S2中分别插入S中数据
  3. 删除表S
  4. 创建视图S,S为S1,S2的自然连接

派生表:嵌套子查询中,可用AS将子查询返回内容也视为一张表,称为派生表(local view或derived relations);外层查询时,列名为派生表中的属性、WHERE中用派生表的数据判断。派生表必须重命名(即使没有用到)e.g.:

SQL
1
2
3
4
5
6
SELECT branch_name, avg_bal
FROM (SELECT branch_name, avg(balance)
    FROM account
    GROUP BY branch_name)
    AS result (branch_name, avg_bal)
WHERE avg_bal > 500

WITH从句:通过WITH语句创建的派生表只在当前语句有效,而不是全局有效。e.g.:

SQL
WITH branch_total(branch_name, a_bra_total) AS
    SELECT branch_name, sum(balance)
    FROM account
    GROUP BY branch_name
WITH total_avg(value) AS
    SELECT avg(a_bra_total)
    FROM branch_total
SELECT branch_name, a_bra_total
FROM branch_total A, total_avg B
WHERE A.a_bra_total >= B.value

修改数据库

DELETE、UPDATE、INSERT只能对单表操作。要注意先后顺序和外键约束。

DELETE和INSERT中,在同一SQL语句内,除非外层查询的元组变量引入内层查询,否则层查询只进行一次。e.g.,下面的SQL语句中,AVG只在最开始执行一次,而不是每次删除后都执行。

SQL
1
2
3
4
5
DELETE FROM account
WHERE balance < (
    SELECT AVG(balance)
    FROM account
)

UPDATE时可使用case语句:

SQL
1
2
3
4
5
6
UPDATE account
SET balance = CASE
    WHEN balance <= 10000
    THEN balance * 1.05
    ELSE balance * 1.06
END

建立在单个基本表上的视图,且视图的列对应表的列,称为“行列视图”。所有视图中,只有行列视图允许插入(插入仍然需要到底层的物理表上执行)。

事务transaction

事务是一系列数据库操作(查询、更新等)的集合,作为一个逻辑单元整体执行。

Transaction结束方式有两种:

  1. COMMIT WORK:提交事务,所有修改永久生效
  2. ROLLBACK WORK:回滚事务,撤销该事务中的所有修改

JOIN语句

FROM后写多张表,为多表的笛卡尔积;JOIN为多张表的连接

  • JOIN condition:
  • natural:自然连接,共有属性相同时连接
  • ON :按一定条件连接
  • USING (...):括号中列出的属性名相等时连接
  • JOIN type:
  • INNER JOIN:只返回两个表中连接条件匹配的行,不匹配的行直接丢弃。
  • LEFT JOIN:返回左表的全部行,右表只返回匹配的行,右表无匹配时用 NULL 填充。
  • RIGHT JOIN:返回右表的全部行,左表只返回匹配的行,左表无匹配时用 NULL 填充。
  • FULL JOIN:返回左右两个表的全部行,任一侧无匹配时都用 NULL 填充。

自然连接和USING条件中,同名属性只有一列;而用ON表示条件时,同名属性有多列。

示例

输出时字符串操作

CONCAT()表示字符串拼接,LOWER()和UPPER()表示统一大小写

SQL
SELECT CONCAT('Customer name: ', UPPER(customer_name))
FROM customer
  • LENGTH()表示字符串长度
  • SUBSTRING(字符串,起始位置,长度)表示提取子串
  • TRIM()表示去除两端空格
  • LTRIM()表示去除左侧空格
  • RTRIM()表示去除右侧空格
  • REPLACE(字符串,被替换,替换)表示替换字符串
  • INSTR()表示查找子串位置

表示 同时有

e.g.要输出同时有A和B的name:用FROM A表示有A,用嵌套自查询+IN表示有B

SQL
1
2
3
4
5
6
SELECT DISTINCT name
FROM A
WHERE name IN (
    SELECT name
    FROM B
)

同理,可用NOT IN表示有A但没有B

如果输出的和条件中信息不在一个表中,可以用多项IN表示,如

SQL
1
2
3
4
5
6
SELECT name
FROM A
WHERE (need1, need2) IN (
    SELECT need1, need2
    FROM B
)

聚合函数注意

如果有GROUP BY,则SELECT中只能有两类,否则会数量不匹配报错:

  1. GROUP BY对应的属性
  2. 其他属性的聚合函数

最值查询

  1. 子查询得到最值,外层查询用最值筛选,注意内外层中要用某些一致信息连接。e.g.找同一支行中最大存款的账户:
SQL
1
2
3
4
5
6
7
SELECT account_number
FROM account A
WHERE balance >= (
    SELECT MAX(balance)
    FROM account B
    WHERE A.branch_name = B.branch_name
)
  1. 利用SOME/ALL从句,外层遍历每个元组,内层检查是否满足要求。用重命名的方法,对同一个表检查。e.g.
SQL
1
2
3
4
5
6
7
SELECT account_number
FROM account A
WHERE balance >= ALL (
    SELECT balance
    FROM account B
    WHERE A.branch_name = B.branch_name
)

最多一个 问题

谓词UNIQUE用于判断子查询的结果中,是否不存在重复元组。e.g.找出所有在Perriridge银行至多有一个账户的客户:

Text Only
branch(branch-name, branch-city, assets)
account(account-number, branch-name, balance) depositor(customer-name, account-number)
SQL
1
2
3
4
5
6
7
8
9
SELECT customer_name
FROM depositor AS T
WHERE UNIQUE (
    SELECT R.customer_name
    FROM account, depositor AS R
    WHERE T.customer_name = R.customer_name
    AND R.account_number = account.account_number
    AND account.branch_name = 'Perryridge'
);

同理,可用NOT UNIQUE表示“至少两个”问题。

WITH从句

WITH从句和子查询的区别为:

  1. WITH从句必须重命名
  2. WITH从句返回结果的作用范围为整个SQL语句,可在语句中多次使用,而子查询只能用一次

当返回多个属性时,可类似表命名CTE_name(attribute1, attribute2),类似创建临时的VIEW

一个聚合多个输出 问题

Text Only
1
2
3
Student(sno, sname, ssex, sage, sdept)
Enrolled(sno, cno, grade)
Course(cno, cname, credits)

要求:找出至少注册10门课的学生,并输出选择的门数

  1. 直接将Student和Enrolled自然连接,用sname聚合筛选(为什么用sname筛选?因为输出必须为筛选项,要输出sname)
SQL
1
2
3
4
5
SELECT sname, COUNT(cno)
FROM Student
JOIN Enrolled USING (sno)
GROUP BY sname
HAVING COUNT(cno) > 10;
  1. 如果要同时输出sno和sname,可以先统计再连接。注意导出的表必须命名,输出时要指明是哪个表里的信息。
SQL
1
2
3
4
5
6
7
8
SELECT s1.sno, s1.sname, s2.cnt  -- 输出信息
FROM Student s1
JOIN (
    SELECT sno, COUNT(cno) AS cnt -- 选出选课大于10的sno
    FROM Enrolled
    GROUP BY sno
    HAVING COUNT(cno) > 10
) AS s2 ON s1.sno = s2.sno;

表中表 问题

Text Only
employee(id, name, age, gender, salary, boss)

属性boss即boss这个employee元组对应的id。

  1. 找出比上级更年轻、且和上级性别不同的员工。

用重命名的方式,从同一张表中多次筛选。

SQL
1
2
3
4
5
SELECT e2.name
FROM employee AS e1, employee AS e2
WHERE e1.id = e2.boss
    AND e1.age < e2.age
    AND e1.gender != e2.gender
SQL
SELECT id
FROM employee a
WHERE a.age < (
    SELECT b.age
    FROM employee b
    WHERE b.id = a.boss
)
AND a.gender != (
    SELECT b.gender
    FROM emplyee b
    WHERE b.id = a.boss
)
  1. 找出下属数量最多的员工

用聚合操作“反向”找出boss-下属的对应关系,用LIMIT找出最值。

SQL
1
2
3
4
5
SELECT boss
FROM employee
GROUP BY boss
ORDER BY COUNT(id) DESC
LIMIT 1
一个例子

Consider the student database below:

Text Only
1
2
3
student(student-no, student-name, sex, age, dept-name)
course(course-no, course-name, credit)
study(student-no, course-no, score)
  1. Find the names of students who have studied course ‘Database System’ and sort results by ascending score.
SQL
1
2
3
4
5
6
SELECT student_name
FROM student
JOIN study ON student.student_no = study.student_no
JOIN course ON course.course_no = study.course_no
WHERE course.course_name = 'Database System'
ORDER BY study.score
  1. Find the names of students who get the best score in course ‘Database System’.
SQL
1
2
3
4
5
6
7
-- SELECT student_name
-- FROM student
-- JOIN study ON student.student_no = study.student_no
-- JOIN course ON course.course_no = study.course_no
-- WHERE course.course_name = 'Database System'
-- ORDER BY study.score
-- LIMIT 1;  -- 获得最大值的学生可能不止一个,不能只输出一个人
SQL
-- SELECT student_name
-- FROM student
-- JOIN study ON student.student_no = study.student_no
-- JOIN course ON course.course_no = study.course_no
-- WHERE course.course_name = 'Database System'
--     AND study.score >= (
--         SELECT MAX(score)
--         FROM study
--         WHERE course_no = (
--             SELECT course_no
--             FROM course
--             WHERE course_name = 'Database System'
--         )
--     );
SQL
SELECT s.student_name
FROM student AS s
JOIN study AS st ON s.student_no = st.student_no
JOIN course AS c ON c.course_no = st.course_no
WHERE c.course_name = 'Database System'
    AND st.score = (
        SELECT MAX(st2.score)
        FROM study AS st2
        JOIN course AS c2 ON c2.course_no = st2.course_no
        WHERE c2.course_name = 'Database System'
    );
  1. Find the names of courses that have maximum average score.
SQL
-- SELECT course_name  -- 根据编号返回课程名称
-- FROM course
-- WHERE course_no IN (
--     SELECT course_no  -- 返回平均值最大的课程的编号
--     FROM study
--     GROUP BY course_no
--     HAVING AVG(score) >= (
--         SELECT MAX(AVG(score))  -- 返回课程平均值的最大值
--         FROM study
--         GROUP BY course_no
--     )
-- )
SQL
WITH t AS (  -- 返回各门课程平均值
    SELECT course_no, AVG(score) AS avg_score
    FROM study
    GROUP BY course_no
)
SELECT course_name  -- 输出课程名称
FROM course
JOIN t USING (course_no)
WHERE t.avg_score >= (
    SELECT MAX(avg_score)  -- 返回最大平均值
    FROM t
)
又一个例子
Text Only
1
2
3
Student(Sid, N ame, Sex, Age)
Participate(P rojectN ame, Sid, Score)
Project(P rojectN ame, Requirement, Starttime, Endtime)
  1. Find the names of students who are in ‘MiniSQL’ project.
SQL
1
2
3
4
SELECT Name
FROM Student
JOIN Participate USING (Sid)
WHERE Participate.ProjectName = 'MiniSQL'
  1. Find the Sid of students who have not been in any project yet.
SQL
1
2
3
4
5
6
SELECT Sid
FROM Student
WHERE Sid NOT IN (
    SELECT Sid
    FROM Participate
)

更规范的写法(NOT IN在判断null时不返回,可能有问题):

SQL
1
2
3
4
5
6
7
SELECT s.Sid
FROM Student s
WHERE NOT EXISTS (
    SELECT 1
    FROM Participate p
    WHERE p.Sid = s.Sid
)
  1. For each project, find the names of students who have the highest score
SQL
WITH t AS (
    SELECT ProjectName, MAX(Score) AS max_score -- 返回每个项目的最高分数
    FROM Participate
    GROUP BY ProjectName
)
SELECT Participate.ProjectName, Name
FROM Student
JOIN Participate ON Student.Sid = Participate.Sid
JOIN t ON t.ProjectName = Participate.ProjectName
WHERE Participate.Score >= t.max_score
  1. Find Top-2 projects which have the most female participants.
SQL
1
2
3
4
5
6
7
SELECT ProjectName
FROM Participate
JOIN Student USING (Sid)
WHERE Sex = 'female'
GROUP BY ProjectName
ORDER BY COUNT(*) DESC
LIMIT 2