你的位置:首页 > 数据库

[数据库]如何用分析函数找出EMP表中每个部门工资最高的员工


EMP表是Oracle测试账户SCOTT中的一张雇员表,首先,我们来看看emp表的数据

SQL> select * from emp;EMPNO ENAME   JOB       MGR HIREDATE     SAL    COMM   DEPTNO----- ---------- --------- ---------- --------- ---------- ---------- ---------- 7369 SMITH   CLERK      7902 17-DEC-80    800          20 7499 ALLEN   SALESMAN    7698 20-FEB-81    1600    300     30 7521 WARD    SALESMAN    7698 22-FEB-81    1250    500     30 7566 JONES   MANAGER     7839 02-APR-81    2975          20 7654 MARTIN   SALESMAN    7698 28-SEP-81    1250   1400     30 7698 BLAKE   MANAGER     7839 01-MAY-81    2850          30 7782 CLARK   MANAGER     7839 09-JUN-81    2450          10 7788 SCOTT   ANALYST     7566 19-APR-87    3000          20 7839 KING    PRESIDENT      17-NOV-81    5000          10 7844 TURNER   SALESMAN    7698 08-SEP-81    1500     0     30 7876 ADAMS   CLERK      7788 23-MAY-87    1100          20 7900 JAMES   CLERK      7698 03-DEC-81    950          30 7902 FORD    ANALYST     7566 03-DEC-81    3000          20 7934 MILLER   CLERK      7782 23-JAN-82    1300          1014 rows selected.

其中,empno是员工编号,同时也是该表的主键,ename是员工姓名,sal是员工工资,deptno是员工部门。

如何找出每个部门的最高工资的员工信息呢?

常用的方法是关联查询,SQL语句如下:

select emp.deptno,ename,salfrom emp,(select deptno,max(sal)maxsal from emp group by deptno) twhere emp.deptno=t.deptno and emp.sal=t.maxsal;

结果如下:

  DEPTNO ENAME       SAL---------- ---------- ----------    30 BLAKE      2850    20 SCOTT      3000    10 KING       5000    20 FORD       3000

下面我们来看看执行计划:

Execution Plan----------------------------------------------------------Plan hash value: 269884559-----------------------------------------------------------------------------| Id | Operation      | Name  | Rows  | Bytes | Cost (%CPU) | Time   |-----------------------------------------------------------------------------|  0 | SELECT STATEMENT   |    |   3 |  117 |   7 (15)| 00:00:01 ||* 1 | HASH JOIN      |    |   3 |  117 |   7 (15)| 00:00:01 ||  2 |  VIEW        |    |   3 |   78 |   4 (25)| 00:00:01 ||  3 |  HASH GROUP BY   |    |   3 |   21 |   4 (25)| 00:00:01 ||  4 |   TABLE ACCESS FULL| EMP  |   14 |   98 |   3  (0)| 00:00:01 ||  5 |  TABLE ACCESS FULL | EMP  |   14 |  182 |   3  (0)| 00:00:01 |-----------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------  1 - access("EMP"."DEPTNO"="T"."DEPTNO" AND "EMP"."SAL"="T"."MAXSAL")Statistics----------------------------------------------------------   0 recursive calls   0 db block gets   13 consistent gets   0 physical reads   0 redo size  625 bytes sent via SQL*Net to client  419 bytes received via SQL*Net from client   2 SQL*Net roundtrips to/from client   0 sorts (memory)   0 sorts (disk)   4 rows processed

不难看出,该查询针对同一个表走了两次全盘扫描,成本为7,逻辑读为13。

如何对上述查询进行优化呢?在这里,用到分析函数LAST_VALUE,LAST_VALUE返回排序集中的最后一个值。

SELECT deptno,ename,sal,    LAST_VALUE(sal)    OVER(PARTITION BY deptno      ORDER BY sal      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)maxsalFROM emp;

输出结果如下:

  DEPTNO ENAME       SAL   MAXSAL---------- ---------- ---------- ----------    10 MILLER      1300    5000    10 CLARK      2450    5000    10 KING       5000    5000    20 SMITH       800    3000    20 ADAMS      1100    3000    20 JONES      2975    3000    20 SCOTT      3000    3000    20 FORD       3000    3000    30 JAMES       950    2850    30 MARTIN      1250    2850    30 WARD       1250    2850    30 TURNER      1500    2850    30 ALLEN      1600    2850    30 BLAKE      2850    285014 rows selected.

不难看出,sal等于maxsal的行即为每个部门最高工资的员工,下面用嵌套子查询得到目标结果。

SELECT deptno,ename,sal FROM (    SELECT deptno,ename,sal,       LAST_VALUE(sal)       OVER(PARTITION BY deptno          ORDER BY sal          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)maxsal    FROM emp) WHERE sal=maxsal;

输出结果如下:

  DEPTNO ENAME       SAL---------- ---------- ----------    10 KING       5000    20 SCOTT      3000    20 FORD       3000    30 BLAKE      2850

下面我们来看看该语句的执行计划:

Execution Plan----------------------------------------------------------Plan hash value: 4130734685----------------------------------------------------------------------------| Id | Operation      | Name | Rows | Bytes | Cost (%CPU)| Time    |----------------------------------------------------------------------------|  0 | SELECT STATEMENT   |    |  14 |  644 |   4 (25)| 00:00:01 ||* 1 | VIEW        |    |  14 |  644 |   4 (25)| 00:00:01 ||  2 |  WINDOW SORT    |    |  14 |  182 |   4 (25)| 00:00:01 ||  3 |  TABLE ACCESS FULL | EMP |  14 |  182 |   3  (0)| 00:00:01 |----------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------  1 - filter("SAL"="MAXSAL")Statistics----------------------------------------------------------   0 recursive calls   0 db block gets   6 consistent gets   0 physical reads   0 redo size  619 bytes sent via SQL*Net to client  419 bytes received via SQL*Net from client   2 SQL*Net roundtrips to/from client   1 sorts (memory)   0 sorts (disk)   4 rows processed

可见,引入了分析函数以后,成本和逻辑读都减少了一半。

通过查询的结果,我们可以看出,20号部门有两个人的工资最高,有时候,我们只想得到一个人的信息,如何实现呢?

在这里我们会用到分析函数LAG,具体SQL如下:

SELECT deptno,ename,sal,LAG(sal)OVER(ORDER BY deptno) presal FROM (    SELECT deptno,ename,sal,       LAST_VALUE(sal)       OVER(PARTITION BY deptno       ORDER BY sal       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)maxsal    FROM emp) WHERE sal=maxsal;

输出结果如下:

  DEPTNO ENAME       SAL   PRESAL---------- ---------- ---------- ----------    10 KING       5000    20 SCOTT      3000    5000    20 FORD       3000    3000    30 BLAKE      2850    3000

剔除sal等于presal的行

SELECT deptno,ename,sal FROM (    SELECT deptno,ename,sal,LAG(sal)OVER(ORDER BY deptno) presal FROM (       SELECT deptno,ename,sal,           LAST_VALUE(sal)           OVER(PARTITION BY deptno           ORDER BY sal           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)maxsal        FROM emp)    WHERE sal=maxsal) WHERE sal <> presal or presal is null;

输出结果如下:

  DEPTNO ENAME       SAL---------- ---------- ----------    10 KING       5000    20 SCOTT      3000    30 BLAKE      2850

总结:

在实际生产环境中,此类应用还是蛮多的,譬如如何查询每个时段耗时最大的工单。当然,通过上述演示,我们也看出了group by函数的局限性。

关于LAST_VALUE和LAG函数的具体应用及说明,可参考Oracle官方文档:

1. LAST_VALUE

2. LAG