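Oracle Analytic Functions Reference Manual
Oracle has provided analytic functions since release 8.1.6. An analytic function computes an aggregate value over a group of rows. Unlike an aggregate function, which returns a single row per group, an analytic function returns multiple rows per group, one value for each row.
The examples below use tables from the HR sample schema that ships with Oracle. If that schema is not installed, you can create it by running $ORACLE_HOME/demo/schema/human_resources/hr_main.sql as SYS.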
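Unless stated otherwise, the examples in this manual are run as the HR user.
Understanding the windowing clause:
The windowing clause specifies the data window that an analytic function operates on. The size of this window can change from row to row. For example:
over(order by salary) accumulates in salary order; with ORDER BY alone, the default window is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
over(partition by deptno) partitions by department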
over(order by salary range between 50 preceding and 150 following)
The window for each row covers preceding rows whose salary is at most 50 below the current row's and following rows whose salary is at most 150 above it
over(order by salary rows between 50 preceding and 150 following)
The window for each row covers the 50 rows before it and the 150 rows after it
over(order by salary rows between unbounded preceding and unbounded following)
The window for each row runs from the first row to the last row, which is equivalent to:
over(order by salary range between unbounded preceding and unbounded following)
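The difference between ROWS (physical offsets) and RANGE (value offsets) can be modelled outside the database. The following is a minimal Python sketch, not Oracle code: the function names are hypothetical, and the input is assumed to be already sorted as ORDER BY salary would sort it. The sample data is the five lowest salaries shown in the COUNT example of this manual.

```python
from bisect import bisect_left, bisect_right

def rows_frame_count(values, preceding, following):
    """Model of COUNT(*) OVER (ORDER BY v ROWS BETWEEN p PRECEDING AND f FOLLOWING)."""
    n = len(values)
    counts = []
    for i in range(n):
        lo = max(0, i - preceding)       # physical offset backwards, clipped at the first row
        hi = min(n - 1, i + following)   # physical offset forwards, clipped at the last row
        counts.append(hi - lo + 1)
    return counts

def range_frame_count(values, preceding, following):
    """Model of COUNT(*) OVER (ORDER BY v RANGE BETWEEN p PRECEDING AND f FOLLOWING)."""
    counts = []
    for v in values:
        lo = bisect_left(values, v - preceding)    # first row with value >= v - p
        hi = bisect_right(values, v + following)   # one past the last row with value <= v + f
        counts.append(hi - lo)
    return counts

salaries = [2100, 2200, 2200, 2400, 2400]  # assumed sorted ascending
print(rows_frame_count(salaries, 1, 1))      # [2, 3, 3, 3, 2]
print(range_frame_count(salaries, 50, 150))  # [3, 2, 2, 2, 2]
```

With only five rows the RANGE counts for salary 2400 are smaller than in the full employees table, because the rows with salary 2500 are not part of this sample.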
AVG
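Description: Computes the average of an expression over a group and its data window.
SAMPLE: In the example below, column c_mavg is a moving average of salary for each employee: the average over the current employee and the employees hired immediately before and after under the same manager.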
SELECT manager_id, last_name, hire_date, salary,
AVG(salary) OVER (PARTITION BY manager_id ORDER BY hire_date
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS c_mavg
FROM employees;
MANAGER_ID LAST_NAME HIRE_DATE SALARY C_MAVG
---------- ------------------------- --------- ---------- ----------
100 Kochhar 21-SEP-89 17000 17000
100 De Haan 13-JAN-93 17000 15000
100 Raphaely 07-DEC-94 11000 11966.6667
100 Kaufling 01-MAY-95 7900 10633.3333
100 Hartstein 17-FEB-96 13000 9633.33333
100 Weiss 18-JUL-96 8000 11666.6667
100 Russell 01-OCT-96 14000 11833.3333
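The moving average above can be checked by hand. The following Python sketch (a model of the ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING window, with a hypothetical function name) recomputes c_mavg from the seven salaries listed in the output. The last row differs from the Oracle output because the real table has further rows after Russell that are not shown here.

```python
def moving_avg(values, preceding=1, following=1):
    """Average over a window of physical rows around each position."""
    out = []
    for i in range(len(values)):
        # Slicing clips automatically at the end; max() clips at the start.
        frame = values[max(0, i - preceding): i + following + 1]
        out.append(sum(frame) / len(frame))
    return out

# Salaries for manager 100 in hire_date order, copied from the output above.
salaries = [17000, 17000, 11000, 7900, 13000, 8000, 14000]
for salary, avg in zip(salaries, moving_avg(salaries)):
    print(salary, round(avg, 4))
```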
CORR
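Description: Returns the coefficient of correlation of a pair of expressions. It is shorthand for:
COVAR_POP(expr1, expr2) / (STDDEV_POP(expr1) * STDDEV_POP(expr2))
Statistically, correlation is the strength of the association between variables; an association means that the value of one variable can, to some degree, be predicted from the value of the other. The correlation coefficient expresses that strength as a number between -1 and 1, where 0 means no correlation.
SAMPLE: The example below returns the cumulative coefficient of correlation between monthly sales revenue and monthly units sold for 1998 (run in the SH schema)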
SELECT t.calendar_month_number,
CORR (SUM(s.amount_sold), SUM(s.quantity_sold))
OVER (ORDER BY t.calendar_month_number) as CUM_CORR
FROM sales s, times t
WHERE s.time_id = t.time_id AND calendar_year = 1998
GROUP BY t.calendar_month_number
ORDER BY t.calendar_month_number;
CALENDAR_MONTH_NUMBER CUM_CORR
--------------------- ----------
1
2 1
3 .994309382
4 .852040875
5 .846652204
6 .871250628
7 .910029803
8 .917556399
9 .920154356
10 .86720251
11 .844864765
12 .903542662
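The CORR formula can be sketched in Python as follows. This is an illustrative model with hypothetical function names and made-up data, not the SH schema values. It also shows why the first row of a cumulative CORR is empty: with a single pair both standard deviations are 0, so the quotient is undefined.

```python
from math import sqrt

def covar_pop(xs, ys):
    """Population covariance: mean of the products of deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def stddev_pop(xs):
    """Population standard deviation, the square root of the population variance."""
    return sqrt(covar_pop(xs, xs))

def corr(xs, ys):
    """COVAR_POP(x, y) / (STDDEV_POP(x) * STDDEV_POP(y)); None when undefined."""
    denom = stddev_pop(xs) * stddev_pop(ys)
    return None if denom == 0 else covar_pop(xs, ys) / denom

amount = [10.0, 20.0, 30.0, 45.0]   # hypothetical monthly revenue
quantity = [1.0, 2.0, 3.0, 4.0]     # hypothetical monthly units
for k in range(1, len(amount) + 1):  # cumulative window, one prefix per row
    print(k, corr(amount[:k], quantity[:k]))
```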
COVAR_POP
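Description: Returns the population covariance of a pair of expressions.
SAMPLE: In the example below, CUM_COVP is the cumulative population covariance of list price and minimum price (run in the OE schema)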
SELECT product_id, supplier_id,
COVAR_POP(list_price, min_price)
OVER (ORDER BY product_id, supplier_id) AS CUM_COVP,
COVAR_SAMP(list_price, min_price)
OVER (ORDER BY product_id, supplier_id) AS CUM_COVS
FROM product_information p
WHERE category_id = 29
ORDER BY product_id, supplier_id;
PRODUCT_ID SUPPLIER_ID CUM_COVP CUM_COVS
---------- ----------- ---------- ----------
1774 103088 0
1775 103087 1473.25 2946.5
1794 103096 1702.77778 2554.16667
1825 103093 1926.25 2568.33333
2004 103086 1591.4 1989.25
2005 103086 1512.5 1815
2416 103088 1475.97959 1721.97619
.
.
COVAR_SAMP
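Description: Returns the sample covariance of a pair of expressions
SAMPLE: In the example below, CUM_COVS is the cumulative sample covariance of list price and minimum price (run in the OE schema)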
SELECT product_id, supplier_id,
COVAR_POP(list_price, min_price)
OVER (ORDER BY product_id, supplier_id) AS CUM_COVP,
COVAR_SAMP(list_price, min_price)
OVER (ORDER BY product_id, supplier_id) AS CUM_COVS
FROM product_information p
WHERE category_id = 29
ORDER BY product_id, supplier_id;
PRODUCT_ID SUPPLIER_ID CUM_COVP CUM_COVS
---------- ----------- ---------- ----------
1774 103088 0
1775 103087 1473.25 2946.5
1794 103096 1702.77778 2554.16667
1825 103093 1926.25 2568.33333
2004 103086 1591.4 1989.25
2005 103086 1512.5 1815
2416 103088 1475.97959 1721.97619
.
.
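The two columns above are related by a fixed factor: the sample covariance divides by n-1 where the population covariance divides by n. The Python sketch below (hypothetical helper name) converts the CUM_COVP values from the output into CUM_COVS, assuming every row contributes one non-null pair so that n equals the row number.

```python
def samp_from_pop(covp, n):
    """Convert a population covariance over n pairs into the sample covariance."""
    return None if n < 2 else covp * n / (n - 1)

# CUM_COVP values copied from the output above, in row order.
cum_covp = [0, 1473.25, 1702.77778, 1926.25, 1591.4, 1512.5, 1475.97959]
cum_covs = [samp_from_pop(c, n) for n, c in enumerate(cum_covp, start=1)]
for n, (p, s) in enumerate(zip(cum_covp, cum_covs), start=1):
    print(n, p, None if s is None else round(s, 5))
```

The first row has no sample covariance because a single pair leaves n-1 equal to 0, which matches the empty CUM_COVS cell in the output.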
COUNT
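Description: Counts rows within a group, cumulatively when an ORDER BY is present. With * or a non-null constant, COUNT counts all rows; with an expression, it counts the rows where the expression is not null. Rows with equal values are all included in the count. Use DISTINCT to count the rows that remain after removing duplicates within the group.
SAMPLE: The example below counts, for each employee in salary order, the rows whose salary lies in [n-50, n+150], where n is the salary of the current row
For example, Philtanker's salary is 2200. One other row (Markle) has a salary of at least 2200-50 and no higher than his, and no row after his has a salary of at most 2200+150, so cnt3 is 2, including the current row. cnt2 is the number of rows whose SALARY is less than or equal to that of the current row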
SELECT last_name, salary, COUNT(*) OVER () AS cnt1,
COUNT(*) OVER (ORDER BY salary) AS cnt2,
COUNT(*) OVER (ORDER BY salary RANGE BETWEEN 50 PRECEDING
AND 150 FOLLOWING) AS cnt3 FROM employees;
LAST_NAME SALARY CNT1 CNT2 CNT3
------------------------- ---------- ---------- ---------- ----------
Olson 2100 107 1 3
Markle 2200 107 3 2
Philtanker 2200 107 3 2
Landry 2400 107 5 8
Gee 2400 107 5 8
Colmenares 2500 107 11 10
Patel 2500 107 11 10
.
.
CUME_DIST
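Description: Computes the relative position of a row within a group: the fraction of rows whose value is less than or equal to that of the current row. CUME_DIST always returns a number greater than 0 and less than or equal to 1. For example, in a group of 3 rows with distinct values, the cumulative distribution values are 1/3, 2/3, and 3/3
SAMPLE: The example below computes, for employees in each job, the cumulative distribution of salary in ascending order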
SELECT job_id, last_name, salary, CUME_DIST()
OVER (PARTITION BY job_id ORDER BY salary) AS cume_dist
FROM employees WHERE job_id LIKE 'PU%';
JOB_ID LAST_NAME SALARY CUME_DIST
---------- ------------------------- ---------- ----------
PU_CLERK Colmenares 2500 .2
PU_CLERK Himuro 2600 .4
PU_CLERK Tobias 2800 .6
PU_CLERK Baida 2900 .8
PU_CLERK Khoo 3100 1
PU_MAN Raphaely 11000 1
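As a cross-check, the cumulative distribution can be modelled in a few lines of Python. This is a sketch with a hypothetical function name; it assumes the input is one partition, already sorted ascending.

```python
from bisect import bisect_right

def cume_dist(sorted_values):
    """Fraction of rows whose value is <= the current row's value."""
    n = len(sorted_values)
    # bisect_right gives the count of rows <= v, so ties share one result.
    return [bisect_right(sorted_values, v) / n for v in sorted_values]

pu_clerk = [2500, 2600, 2800, 2900, 3100]   # PU_CLERK salaries from the output above
print(cume_dist(pu_clerk))                  # [0.2, 0.4, 0.6, 0.8, 1.0]
print(cume_dist([11000]))                   # [1.0], the single PU_MAN row
```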
DENSE_RANK
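Description: Computes the position of each row returned by the query relative to the other rows, based on the values of the expressions in the ORDER BY clause. The rows in a group are sorted by the ORDER BY clause and each row is assigned a number, forming a sequence that starts at 1. The number increases by 1 each time the value of the ORDER BY expression changes. Rows with equal values receive the same number (nulls are treated as equal). A dense rank has no gaps in the sequence
SAMPLE: The example below partitions employees by department, orders them by salary, and returns the resulting sequence number (note the difference from RANK)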
SELECT d.department_id , e.last_name, e.salary, DENSE_RANK()
OVER (PARTITION BY e.department_id ORDER BY e.salary) as drank
FROM employees e, departments d
WHERE e.department_id = d.department_id
AND d.department_id IN (60, 90);
DEPARTMENT_ID LAST_NAME SALARY DRANK
------------- ------------------------- ---------- ----------
60 Lorentz 4200 1
60 Austin 4800 2
60 Pataballa 4800 2
60 Ernst 6000 3
60 Hunold 9000 4
90 Kochhar 17000 1
90 De Haan 17000 1
90 King 24000 2
FIRST
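Description: Takes the rows that rank first in the set ordered by DENSE_RANK (there may be several, because values can tie). The full syntax therefore needs an aggregate function in front to reduce those rows to a single value
SAMPLE: In the example below, rows are partitioned by department and ranked by commission_pct. FIRST selects all rows with the lowest commission, and the MIN function in front returns the lowest salary among them; LAST selects all rows with the highest commission, and the MAX function in front returns the highest salary among them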
SELECT last_name, department_id, salary,
MIN(salary) KEEP (DENSE_RANK FIRST ORDER BY commission_pct)
OVER (PARTITION BY department_id) "Worst",
MAX(salary) KEEP (DENSE_RANK LAST ORDER BY commission_pct)
OVER (PARTITION BY department_id) "Best"
FROM employees
WHERE department_id in (20,80)
ORDER BY department_id, salary;
LAST_NAME DEPARTMENT_ID SALARY Worst Best
------------------------- ------------- ---------- ---------- ----------
Fay 20 6000 6000 13000
Hartstein 20 13000 6000 13000
Kumar 80 6100 6100 14000
Banda 80 6200 6100 14000
Johnson 80 6200 6100 14000
Ande 80 6400 6100 14000
Lee 80 6800 6100 14000
Tuvault 80 7000 6100 14000
Sewall 80 7000 6100 14000
Marvins 80 7200 6100 14000
Bates 80 7300 6100 14000
.
.
.
FIRST_VALUE
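Description: Returns the first value in the data window of a group.
SAMPLE: The example below returns the name belonging to the first value in a window partitioned by department and ordered by salary. If several rows tie for the lowest salary, this ORDER BY does not determine which of their names comes first; add a tie-breaking column if a repeatable result is needed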
SELECT department_id, last_name, salary, FIRST_VALUE(last_name)
OVER (PARTITION BY department_id ORDER BY salary ASC ) AS lowest_sal
FROM employees
WHERE department_id in(20,30);
DEPARTMENT_ID LAST_NAME SALARY LOWEST_SAL
------------- ------------------------- ---------- --------------
20 Fay 6000 Fay
20 Hartstein 13000 Fay
30 Colmenares 2500 Colmenares
30 Himuro 2600 Colmenares
30 Tobias 2800 Colmenares
30 Baida 2900 Colmenares
30 Khoo 3100 Colmenares
30 Raphaely 11000 Colmenares
LAG
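Description: Provides access to other rows of the result set without a self-join, as if the cursor were an array. Within a group, it refers to a row before the current row, so values from earlier rows can be selected alongside the current row. Offset is a positive integer that defaults to 1. If the offset falls outside the window, the default value is returned (NULL when no default is specified). The opposite function is LEAD
SAMPLE: In the example below, column prev_sal returns the salary of the previous row in hire_date order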
SELECT last_name, hire_date, salary,
LAG(salary, 1, 0) OVER (ORDER BY hire_date) AS prev_sal
FROM employees
WHERE job_id = 'PU_CLERK';
LAST_NAME HIRE_DATE SALARY PREV_SAL
------------------------- ---------- ---------- ----------
Khoo 18-5月 -95 3100 0
Tobias 24-7月 -97 2800 3100
Baida 24-12月-97 2900 2800
Himuro 15-11月-98 2600 2900
Colmenares 10-8月 -99 2500 2600
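The offset and default arguments behave like list indexing with a fallback. The Python sketch below models LAG and its counterpart LEAD; the function names mirror the SQL functions but the code is only an illustration, and the input is assumed to be in the window's ORDER BY order.

```python
def lag(values, offset=1, default=None):
    """Value `offset` rows before each row, or `default` when there is no such row."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]

def lead(values, offset=1, default=None):
    """Value `offset` rows after each row, or `default` when there is no such row."""
    n = len(values)
    return [values[i + offset] if i + offset < n else default
            for i in range(n)]

salaries = [3100, 2800, 2900, 2600, 2500]  # PU_CLERK salaries in hire_date order
print(lag(salaries, 1, 0))   # [0, 3100, 2800, 2900, 2600], same as PREV_SAL above
print(lead(salaries))        # [2800, 2900, 2600, 2500, None]
```

None plays the role of NULL here: it is what comes back when no default is supplied.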
LAST
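Description: Takes the rows that rank last in the set ordered by DENSE_RANK (there may be several, because values can tie). The full syntax therefore needs an aggregate function in front to reduce those rows to a single value
SAMPLE: In the example below, rows are partitioned by department and ranked by commission_pct. FIRST selects all rows with the lowest commission, and the MIN function in front returns the lowest salary among them; LAST selects all rows with the highest commission, and the MAX function in front returns the highest salary among them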
SELECT last_name, department_id, salary,
MIN(salary) KEEP (DENSE_RANK FIRST ORDER BY commission_pct)
OVER (PARTITION BY department_id) "Worst",
MAX(salary) KEEP (DENSE_RANK LAST ORDER BY commission_pct)
OVER (PARTITION BY department_id) "Best"
FROM employees
WHERE department_id in (20,80)
ORDER BY department_id, salary;
LAST_NAME DEPARTMENT_ID SALARY Worst Best
------------------------- ------------- ---------- ---------- ----------
Fay 20 6000 6000 13000
Hartstein 20 13000 6000 13000
Kumar 80 6100 6100 14000
Banda 80 6200 6100 14000
Johnson 80 6200 6100 14000
Ande 80 6400 6100 14000
Lee 80 6800 6100 14000
Tuvault 80 7000 6100 14000
Sewall 80 7000 6100 14000
Marvins 80 7200 6100 14000
Bates 80 7300 6100 14000
.
LAST_VALUE
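Description: Returns the last value in the data window of a group.
SAMPLE: The example below returns the name belonging to the last value in a window partitioned by department and ordered by salary. Because the default window ends at the current row, the last value is the current row itself (or its last peer when salaries tie), so highest_sal repeats each row's own name. To get the highest earner of the whole department, extend the window with ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING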
SELECT department_id, last_name, salary, LAST_VALUE(last_name)
OVER(PARTITION BY department_id ORDER BY salary) AS highest_sal
FROM employees
WHERE department_id in(20,30);
DEPARTMENT_ID LAST_NAME SALARY HIGHEST_SAL
------------- ------------------------- ---------- ------------
20 Fay 6000 Fay
20 Hartstein 13000 Hartstein
30 Colmenares 2500 Colmenares
30 Himuro 2600 Himuro
30 Tobias 2800 Tobias
30 Baida 2900 Baida
30 Khoo 3100 Khoo
30 Raphaely 11000 Raphaely
LEAD
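Description: LEAD is the opposite of LAG: it accesses a row after the current row in the group. Offset is a positive integer that defaults to 1. If the offset falls outside the window, the default value is returned (NULL when no default is specified)
SAMPLE: In the example below, "NextHired" on each row returns the hire_date of the next row in hire_date order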
SELECT last_name, hire_date,
LEAD(hire_date, 1) OVER (ORDER BY hire_date) AS "NextHired"
FROM employees WHERE department_id = 30;
LAST_NAME HIRE_DATE NextHired
------------------------- --------- ---------
Raphaely 07-DEC-94 18-MAY-95
Khoo 18-MAY-95 24-JUL-97
Tobias 24-JUL-97 24-DEC-97
Baida 24-DEC-97 15-NOV-98
Himuro 15-NOV-98 10-AUG-99
Colmenares 10-AUG-99
MAX
Description: Finds the maximum value of an expression in the data window of a group.
SAMPLE: In the example below, dept_max returns the highest salary in the department of the current row
SELECT department_id, last_name, salary,
MAX(salary) OVER (PARTITION BY department_id) AS dept_max
FROM employees WHERE department_id in (10,20,30);
DEPARTMENT_ID LAST_NAME SALARY DEPT_MAX
------------- ------------------------- ---------- ----------
10 Whalen 4400 4400
20 Hartstein 13000 13000
20 Fay 6000 13000
30 Raphaely 11000 11000
30 Khoo 3100 11000
30 Baida 2900 11000
30 Tobias 2800 11000
30 Himuro 2600 11000
30 Colmenares 2500 11000
MIN
Description: Finds the minimum value of an expression in the data window of a group.
SAMPLE: In the example below, dept_min returns the lowest salary in the department of the current row
SELECT department_id, last_name, salary,
MIN(salary) OVER (PARTITION BY department_id) AS dept_min
FROM employees WHERE department_id in (10,20,30);
DEPARTMENT_ID LAST_NAME SALARY DEPT_MIN
------------- ------------------------- ---------- ----------
10 Whalen 4400 4400
20 Hartstein 13000 6000
20 Fay 6000 6000
30 Raphaely 11000 2500
30 Khoo 3100 2500
30 Baida 2900 2500
30 Tobias 2800 2500
30 Himuro 2600 2500
30 Colmenares 2500 2500
NTILE
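Description: Divides an ordered group into the number of buckets given by the expression. For example, if the expression is 4, each row in the group is assigned a number from 1 to 4; if the group has 20 rows, the first 5 rows get 1, the next 5 get 2, and so on. If the number of rows is not evenly divisible by the expression, the rows are distributed so that no bucket holds more than one row more than any other, and the extra rows go to the lowest-numbered buckets. For example, with expression = 4 and 21 rows, bucket 1 has 6 rows and buckets 2 through 4 have 5 rows each.
SAMPLE: The example below divides 6 rows into 4 buckets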
SELECT last_name, salary,
NTILE(4) OVER (ORDER BY salary DESC) AS quartile FROM employees
WHERE department_id = 100;
LAST_NAME SALARY QUARTILE
------------------------- ---------- ----------
Greenberg 12000 1
Faviet 9000 1
Chen 8200 2
Urman 7800 2
Sciarra 7700 3
Popp 6900 4
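The distribution rule can be written down directly. The following Python sketch (hypothetical function name) returns the bucket number for each row position; it assumes the rows are already in the window's ORDER BY order.

```python
def ntile(row_count, buckets):
    """Bucket number (1-based) for each of `row_count` ordered rows."""
    base, extra = divmod(row_count, buckets)
    out = []
    for b in range(1, buckets + 1):
        # The first `extra` buckets each receive one additional row.
        size = base + (1 if b <= extra else 0)
        out.extend([b] * size)
    return out

print(ntile(6, 4))   # [1, 1, 2, 2, 3, 4], same as QUARTILE above
sizes = [ntile(21, 4).count(b) for b in range(1, 5)]
print(sizes)         # [6, 5, 5, 5]
```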
PERCENT_RANK
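Description: Similar to CUME_DIST (cumulative distribution). For a given row in a group, it takes the rank of that row, subtracts 1, and divides by n-1 (n is the number of rows in the group). The function always returns a number from 0 to 1, inclusive.
SAMPLE: In the example below, if Khoo's salary were 2900, his pr would be 0.6, because RANK returns the same value for equal salaries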
SELECT department_id, last_name, salary,
PERCENT_RANK()
OVER (PARTITION BY department_id ORDER BY salary) AS pr
FROM employees
WHERE department_id < 50
ORDER BY department_id,salary;
DEPARTMENT_ID LAST_NAME SALARY PR
------------- ------------------------- ---------- ----------
10 Whalen 4400 0
20 Fay 6000 0
20 Hartstein 13000 1
30 Colmenares 2500 0
30 Himuro 2600 0.2
30 Tobias 2800 0.4
30 Baida 2900 0.6
30 Khoo 3100 0.8
30 Raphaely 11000 1
40 Mavris 6500 0
PERCENTILE_CONT
Description: Returns the data value that corresponds to the given percentile. The percentile is computed as described for PERCENT_RANK. If no row corresponds exactly, the value is obtained with the following algorithm:
RN = 1 + (P * (N - 1)), where P is the given percentile and N is the number of rows in the group
CRN = CEIL(RN), FRN = FLOOR(RN)
if (CRN = FRN = RN) then
(value of expression from row at RN)
else
(CRN - RN) * (value of expression for row at FRN) +
(RN - FRN) * (value of expression for row at CRN)
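Note: This function differs from PERCENTILE_DISC in how the substitute value is computed when no row matches the percentile exactly
SAMPLE: In the example below, Percentile_Cont for department 60 is computed as follows:
P = 0.7, N = 5, RN = 1 + (P * (N - 1)) = 1 + (0.7 * (5 - 1)) = 3.8, CRN = CEIL(3.8) = 4,
FRN = FLOOR(3.8) = 3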
(4 - 3.8)* 4800 + (3.8 - 3) * 6000 = 5760
SELECT last_name, salary, department_id,
PERCENTILE_CONT(0.7) WITHIN GROUP (ORDER BY salary)
OVER (PARTITION BY department_id) "Percentile_Cont",
PERCENT_RANK()
OVER (PARTITION BY department_id ORDER BY salary) "Percent_Rank"
FROM employees WHERE department_id IN (30, 60);
LAST_NAME SALARY DEPARTMENT_ID Percentile_Cont Percent_Rank
------------------------- ---------- ------------- --------------- ------------
Colmenares 2500 30 3000 0
Himuro 2600 30 3000 0.2
Tobias 2800 30 3000 0.4
Baida 2900 30 3000 0.6
Khoo 3100 30 3000 0.8
Raphaely 11000 30 3000 1
Lorentz 4200 60 5760 0
Austin 4800 60 5760 0.25
Pataballa 4800 60 5760 0.25
Ernst 6000 60 5760 0.75
Hunold 9000 60 5760 1
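The interpolation algorithm above translates almost line for line into Python. The sketch below uses a hypothetical function name and assumes a non-empty partition already sorted by the WITHIN GROUP order; it reproduces the Percentile_Cont column for both departments.

```python
from math import ceil, floor

def percentile_cont(sorted_values, p):
    """Linear interpolation between the rows at FLOOR(RN) and CEIL(RN)."""
    n = len(sorted_values)
    rn = 1 + p * (n - 1)               # 1-based fractional row number
    frn, crn = floor(rn), ceil(rn)
    if frn == crn:                     # RN falls exactly on a row
        return sorted_values[frn - 1]
    return ((crn - rn) * sorted_values[frn - 1]
            + (rn - frn) * sorted_values[crn - 1])

dept30 = [2500, 2600, 2800, 2900, 3100, 11000]
dept60 = [4200, 4800, 4800, 6000, 9000]
print(round(percentile_cont(dept30, 0.7), 6))   # 3000.0
print(round(percentile_cont(dept60, 0.7), 6))   # 5760.0
```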
PERCENTILE_DISC
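Description: Returns the data value that corresponds to the given percentile. The percentile is computed as described for CUME_DIST. If no row corresponds exactly, the value with the next higher cumulative distribution is taken.
Note: This function differs from PERCENTILE_CONT in how the substitute value is computed when no row matches the percentile exactly
SAMPLE: In the example below, no row in department 30 has a Cume_Dist of 0.7, so the SALARY at the next distribution value, 0.83333333, is used instead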
SELECT last_name, salary, department_id,
PERCENTILE_DISC(0.7) WITHIN GROUP (ORDER BY salary )
OVER (PARTITION BY department_id) "Percentile_Disc",
CUME_DIST() OVER (PARTITION BY department_id ORDER BY salary) "Cume_Dist"
FROM employees
WHERE department_id in (30, 60);
LAST_NAME SALARY DEPARTMENT_ID Percentile_Disc Cume_Dist
------------------------- ---------- ------------- --------------- ----------
Colmenares 2500 30 3100 .166666667
Himuro 2600 30 3100 .333333333
Tobias 2800 30 3100 .5
Baida 2900 30 3100 .666666667
Khoo 3100 30 3100 .833333333
Raphaely 11000 30 3100 1
Lorentz 4200 60 6000 .2
Austin 4800 60 6000 .6
Pataballa 4800 60 6000 .6
Ernst 6000 60 6000 .8
Hunold 9000 60 6000 1
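In contrast to the interpolating variant, the discrete percentile always returns a value that exists in the data. A minimal Python sketch, with a hypothetical function name and assuming a non-empty partition sorted ascending:

```python
def percentile_disc(sorted_values, p):
    """First value whose cumulative distribution is >= p."""
    n = len(sorted_values)
    for i, v in enumerate(sorted_values, start=1):
        # i / n is the cumulative share of rows up to and including this one.
        if i / n >= p:
            return v

dept30 = [2500, 2600, 2800, 2900, 3100, 11000]
dept60 = [4200, 4800, 4800, 6000, 9000]
print(percentile_disc(dept30, 0.7))   # 3100
print(percentile_disc(dept60, 0.7))   # 6000
```

Scanning by row position gives the same answer as scanning by Cume_Dist, because tied rows carry the same value.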
RANK
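Description: Computes the position of each row returned by the query relative to the other rows, based on the values of the expressions in the ORDER BY clause. The rows in a group are sorted by the ORDER BY clause and each row is assigned a number, forming a sequence that starts at 1. The number increases each time the value of the ORDER BY expression changes. Rows with equal values receive the same number (nulls are treated as equal). However, when rows tie, the ranks that follow are skipped: if two rows are ranked 1, there is no rank 2 and the next row in the group is ranked 3. DENSE_RANK never skips.
SAMPLE: The example below partitions employees by department, orders them by salary, and returns the resulting sequence number (note the difference from DENSE_RANK)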
SELECT d.department_id , e.last_name, e.salary, RANK()
OVER (PARTITION BY e.department_id ORDER BY e.salary) as drank
FROM employees e, departments d
WHERE e.department_id = d.department_id
AND d.department_id IN (60, 90);
DEPARTMENT_ID LAST_NAME SALARY DRANK
------------- ------------------------- ---------- ----------
60 Lorentz 4200 1
60 Austin 4800 2
60 Pataballa 4800 2
60 Ernst 6000 4
60 Hunold 9000 5
90 Kochhar 17000 1
90 De Haan 17000 1
90 King 24000 3
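The three ranking functions in this manual differ only in how they treat ties. The Python sketch below (hypothetical function names, input assumed sorted within one partition) puts RANK, DENSE_RANK, and PERCENT_RANK side by side for the department 60 salaries.

```python
def rank(sorted_values):
    """Position-based rank: ties share a rank and leave a gap afterwards."""
    out = []
    for i, v in enumerate(sorted_values):
        tied = i > 0 and v == sorted_values[i - 1]
        out.append(out[-1] if tied else i + 1)
    return out

def dense_rank(sorted_values):
    """Rank that increases by exactly 1 whenever the value changes."""
    out = []
    for i, v in enumerate(sorted_values):
        tied = i > 0 and v == sorted_values[i - 1]
        out.append(out[-1] if tied else (out[-1] + 1 if out else 1))
    return out

def percent_rank(sorted_values):
    """(rank - 1) / (n - 1); defined as 0 for a single-row group."""
    n = len(sorted_values)
    return [0.0 if n == 1 else (r - 1) / (n - 1) for r in rank(sorted_values)]

dept60 = [4200, 4800, 4800, 6000, 9000]
print(rank(dept60))          # [1, 2, 2, 4, 5]
print(dense_rank(dept60))    # [1, 2, 2, 3, 4]
print(percent_rank(dept60))  # [0.0, 0.25, 0.25, 0.75, 1.0]
```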
RATIO_TO_REPORT
Description: Computes expression / SUM(expression), that is, the share of the total contributed by the current row.
SAMPLE: The example below computes each employee's salary as a fraction of the total salary for that job
SELECT last_name, salary, RATIO_TO_REPORT(salary) OVER () AS rr
FROM employees
WHERE job_id = 'PU_CLERK';
LAST_NAME SALARY RR
------------------------- ---------- ----------
Khoo 3100 .223021583
Baida 2900 .208633094
Tobias 2800 .201438849
Himuro 2600 .18705036
Colmenares 2500 .179856115
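The RR column can be recomputed in a couple of lines. This Python sketch (hypothetical function name) assumes a non-empty group with a non-zero total:

```python
def ratio_to_report(values):
    """Each value divided by the sum over the whole window."""
    total = sum(values)
    return [v / total for v in values]

salaries = [3100, 2900, 2800, 2600, 2500]   # PU_CLERK salaries from the output above
for s, r in zip(salaries, ratio_to_report(salaries)):
    print(s, round(r, 9))
```

The ratios always add up to 1 over the window, apart from floating-point rounding.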
REGR_ (Linear Regression) Functions
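Description: The linear regression functions fit an ordinary least-squares regression line to a set of number pairs. Nine regression functions are available.
REGR_SLOPE: returns the slope of the line, equal to COVAR_POP(expr1, expr2) / VAR_POP(expr2)
REGR_INTERCEPT: returns the y-intercept of the regression line, equal to
AVG(expr1) - REGR_SLOPE(expr1, expr2) * AVG(expr2)
REGR_COUNT: returns the number of non-null number pairs used to fit the regression line
REGR_R2: returns the coefficient of determination of the regression line, computed as:
If VAR_POP(expr2) = 0 then return NULL
If VAR_POP(expr1) = 0 and VAR_POP(expr2) != 0 then return 1
If VAR_POP(expr1) > 0 and VAR_POP(expr2) != 0 then
return POWER(CORR(expr1, expr2), 2)
REGR_AVGX: returns the average of the independent variable (expr2) of the regression line; after eliminating null (expr1, expr2) pairs, it equals AVG(expr2)
REGR_AVGY: returns the average of the dependent variable (expr1) of the regression line; after eliminating null (expr1, expr2) pairs, it equals AVG(expr1)
REGR_SXX: equals REGR_COUNT(expr1, expr2) * VAR_POP(expr2)
REGR_SYY: equals REGR_COUNT(expr1, expr2) * VAR_POP(expr1)
REGR_SXY: equals REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2)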
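The nine definitions fit together as shown in the following Python sketch. It is an illustration only: regr_stats is a hypothetical helper, the data is made up, and None stands in for NULL. Pairs are given as (expr1, expr2), that is (y, x), and any pair containing None is dropped first.

```python
def regr_stats(pairs):
    """Return the REGR_* values for an iterable of (y, x) pairs."""
    clean = [(y, x) for y, x in pairs if y is not None and x is not None]
    n = len(clean)                                        # REGR_COUNT
    if n == 0:
        return {"count": 0}
    avg_y = sum(y for y, _ in clean) / n                  # REGR_AVGY
    avg_x = sum(x for _, x in clean) / n                  # REGR_AVGX
    sxx = sum((x - avg_x) ** 2 for _, x in clean)         # n * VAR_POP(x)
    syy = sum((y - avg_y) ** 2 for y, _ in clean)         # n * VAR_POP(y)
    sxy = sum((x - avg_x) * (y - avg_y) for y, x in clean)  # n * COVAR_POP(y, x)
    slope = sxy / sxx if sxx != 0 else None               # COVAR_POP / VAR_POP(x)
    intercept = avg_y - slope * avg_x if slope is not None else None
    if sxx == 0:
        r2 = None
    elif syy == 0:
        r2 = 1.0
    else:
        r2 = sxy * sxy / (sxx * syy)                      # CORR squared
    return {"count": n, "avgx": avg_x, "avgy": avg_y, "sxx": sxx, "syy": syy,
            "sxy": sxy, "slope": slope, "intercept": intercept, "r2": r2}

# Hypothetical points on the line y = 2x + 1, plus one pair with a null.
stats = regr_stats([(3, 1), (5, 2), (None, 3), (9, 4)])
print(stats["count"], round(stats["slope"], 6), round(stats["intercept"], 6))
```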
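(The examples below were all run in the SH schema)
SAMPLE 1: The example below computes the cumulative slope and intercept of the regression line of sales amount on quantity sold for two products (260 and 270) on weekends during the last three weeks of 1998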
SELECT t.fiscal_month_number "Month", t.day_number_in_month "Day",
REGR_SLOPE(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month) AS CUM_SLOPE,
REGR_INTERCEPT(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month) AS CUM_ICPT
FROM sales s, times t
WHERE s.time_id = t.time_id
AND s.prod_id IN (270, 260)
AND t.fiscal_year=1998
AND t.fiscal_week_number IN (50, 51, 52)
AND t.day_number_in_week IN (6,7)
ORDER BY t.fiscal_month_desc, t.day_number_in_month;
Month Day CUM_SLOPE CUM_ICPT
---------- ---------- ---------- ----------
12 12 -68 1872
12 12 -68 1872
12 13 -20.244898 1254.36735
12 13 -20.244898 1254.36735
12 19 -18.826087 1287
12 20 62.4561404 125.28655
12 20 62.4561404 125.28655
12 20 62.4561404 125.28655
12 20 62.4561404 125.28655
12 26 67.2658228 58.9712313
12 26 67.2658228 58.9712313
12 27 37.5245541 284.958221
12 27 37.5245541 284.958221
12 27 37.5245541 284.958221
SAMPLE 2: The example below computes the cumulative number of transactions for each day of fiscal month 4 of 1998
SELECT UNIQUE t.day_number_in_month,
REGR_COUNT(s.amount_sold, s.quantity_sold)
OVER (PARTITION BY t.fiscal_month_number ORDER BY t.day_number_in_month)
"Regr_Count"
FROM sales s, times t
WHERE s.time_id = t.time_id
AND t.fiscal_year = 1998 AND t.fiscal_month_number = 4;
DAY_NUMBER_IN_MONTH Regr_Count
------------------- ----------
1 825
2 1650
3 2475
4 3300
.
.
.
26 21450
30 22200
SAMPLE 3: The example below computes the cumulative coefficient of determination of the regression line of monthly sales amount on monthly quantity sold for 1998
SELECT t.fiscal_month_number,
REGR_R2(SUM(s.amount_sold), SUM(s.quantity_sold))
OVER (ORDER BY t.fiscal_month_number) "Regr_R2"
FROM sales s, times t
WHERE s.time_id = t.time_id
AND t.fiscal_year = 1998
GROUP BY t.fiscal_month_number
ORDER BY t.fiscal_month_number;
FISCAL_MONTH_NUMBER Regr_R2
------------------- ----------
1
2 1
3 .927372984
4 .807019972
5 .932745567
6 .94682861
7 .965342011
8 .955768075
9 .959542618
10 .938618575
11 .880931415
12 .882769189
SAMPLE 4: The example below computes the cumulative averages of sales amount and quantity sold for product 260 during the last two weeks of December 1998
SELECT t.day_number_in_month,
REGR_AVGY(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month)
"Regr_AvgY",
REGR_AVGX(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month)
"Regr_AvgX"
FROM sales s, times t
WHERE s.time_id = t.time_id
AND s.prod_id = 260
AND t.fiscal_month_desc = '1998-12'
AND t.fiscal_week_number IN (51, 52)
ORDER BY t.day_number_in_month;
DAY_NUMBER_IN_MONTH Regr_AvgY Regr_AvgX
------------------- ---------- ----------
14 882 24.5
14 882 24.5
15 801 22.25
15 801 22.25
16 777.6 21.6
18 642.857143 17.8571429
18 642.857143 17.8571429
20 589.5 16.375
21 544 15.1111111
22 592.363636 16.4545455
22 592.363636 16.4545455
24 553.846154 15.3846154
24 553.846154 15.3846154
26 522 14.5
27 578.4 16.0666667
SAMPLE 5: The example below computes the cumulative REGR_SXY, REGR_SXX, and REGR_SYY statistics of sales amount and quantity sold for products 260 and 270 on weekends in February 1998
SELECT t.day_number_in_month,
REGR_SXY(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_year, t.fiscal_month_desc) "Regr_sxy",
REGR_SYY(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_year, t.fiscal_month_desc) "Regr_syy",
REGR_SXX(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_year, t.fiscal_month_desc) "Regr_sxx"
FROM sales s, times t
WHERE s.time_id = t.time_id
AND prod_id IN (270, 260)
AND t.fiscal_month_desc = '1998-02'
AND t.day_number_in_week IN (6,7)
ORDER BY t.day_number_in_month;
DAY_NUMBER_IN_MONTH Regr_sxy Regr_syy Regr_sxx
------------------- ---------- ---------- ----------
1 18870.4 2116198.4 258.4
1 18870.4 2116198.4 258.4
1 18870.4 2116198.4 258.4
1 18870.4 2116198.4 258.4
7 18870.4 2116198.4 258.4
8 18870.4 2116198.4 258.4
14 18870.4 2116198.4 258.4
15 18870.4 2116198.4 258.4
21 18870.4 2116198.4 258.4
22 18870.4 2116198.4 258.4
ROW_NUMBER
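Description: Returns the offset of a row within an ordered group, so it can be used to number rows in a specific order.
SAMPLE: The example below returns each employee's sequence number within the department, ordered by employee ID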
SELECT department_id, last_name, employee_id, ROW_NUMBER()
OVER (PARTITION BY department_id ORDER BY employee_id) AS emp_id
FROM employees
WHERE department_id < 50;
DEPARTMENT_ID LAST_NAME EMPLOYEE_ID EMP_ID
------------- ------------------------- ----------- ----------
10 Whalen 200 1
20 Hartstein 201 1
20 Fay 202 2
30 Raphaely 114 1
30 Khoo 115 2
30 Baida 116 3
30 Tobias 117 4
30 Himuro 118 5
30 Colmenares 119 6
40 Mavris 203 1
STDDEV
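Description: Computes the sample standard deviation of an expression over the group. It differs from STDDEV_SAMP in that it returns 0 rather than null when the group has only one row. (Standard Deviation)
SAMPLE: The example below returns the cumulative standard deviation of salaries in department 30, ordered by hire date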
SELECT last_name, hire_date,salary,
STDDEV(salary) OVER (ORDER BY hire_date) "StdDev"
FROM employees
WHERE department_id = 30;
LAST_NAME HIRE_DATE SALARY StdDev
------------------------- ---------- ---------- ----------
Raphaely 07-12月-94 11000 0
Khoo 18-5月 -95 3100 5586.14357
Tobias 24-7月 -97 2800 4650.0896
Baida 24-12月-97 2900 4035.26125
Himuro 15-11月-98 2600 3649.2465
Colmenares 10-8月 -99 2500 3362.58829
STDDEV_POP
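Description: Computes the population standard deviation and returns the square root of the population variance; its return value equals the square root of VAR_POP. (Standard Deviation-Population)
SAMPLE: The example below returns the population standard deviation of salaries in departments 20, 30, and 60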
SELECT department_id, last_name, salary,
STDDEV_POP(salary) OVER (PARTITION BY department_id) AS pop_std
FROM employees
WHERE department_id in (20,30,60);
DEPARTMENT_ID LAST_NAME SALARY POP_STD
------------- ------------------------- ---------- ----------
20 Hartstein 13000 3500
20 Fay 6000 3500
30 Raphaely 11000 3069.6091
30 Khoo 3100 3069.6091
30 Baida 2900 3069.6091
30 Colmenares 2500 3069.6091
30 Himuro 2600 3069.6091
30 Tobias 2800 3069.6091
60 Hunold 9000 1722.32401
60 Ernst 6000 1722.32401
60 Austin 4800 1722.32401
60 Pataballa 4800 1722.32401
60 Lorentz 4200 1722.32401
STDDEV_SAMP
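Description: Computes the cumulative sample standard deviation and returns the square root of the sample variance; its return value equals the square root of VAR_SAMP. (Standard Deviation-Sample)
SAMPLE: The example below returns the cumulative sample standard deviation of salaries in departments 20, 30, and 60, ordered by hire date within each department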
SELECT department_id, last_name, hire_date, salary,
STDDEV_SAMP(salary) OVER
(PARTITION BY department_id ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_sdev
FROM employees
WHERE department_id in (20,30,60);
DEPARTMENT_ID LAST_NAME HIRE_DATE SALARY CUM_SDEV
------------- ------------------------- ---------- ---------- ----------
20 Hartstein 17-2月 -96 13000
20 Fay 17-8月 -97 6000 4949.74747
30 Raphaely 07-12月-94 11000
30 Khoo 18-5月 -95 3100 5586.14357
30 Tobias 24-7月 -97 2800 4650.0896
30 Baida 24-12月-97 2900 4035.26125
30 Himuro 15-11月-98 2600 3649.2465
30 Colmenares 10-8月 -99 2500 3362.58829
60 Hunold 03-1月 -90 9000
60 Ernst 21-5月 -91 6000 2121.32034
60 Austin 25-6月 -97 4800 2163.33077
60 Pataballa 05-2月 -98 4800 1982.42276
60 Lorentz 07-2月 -99 4200 1925.61678
SUM
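Description: Computes the cumulative sum of an expression over a group.
SAMPLE: The example below computes the cumulative salary of employees under the same manager. Because the window is defined with RANGE, rows with equal salaries (for example Greenberg and Higgins) share the same cumulative value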
SELECT manager_id, last_name, salary,
SUM (salary) OVER (PARTITION BY manager_id ORDER BY salary
RANGE UNBOUNDED PRECEDING) l_csum
FROM employees
WHERE manager_id in (101,103,108);
MANAGER_ID LAST_NAME SALARY L_CSUM
---------- ------------------------- ---------- ----------
101 Whalen 4400 4400
101 Mavris 6500 10900
101 Baer 10000 20900
101 Greenberg 12000 44900
101 Higgins 12000 44900
103 Lorentz 4200 4200
103 Austin 4800 13800
103 Pataballa 4800 13800
103 Ernst 6000 19800
108 Popp 6900 6900
108 Sciarra 7700 14600
108 Urman 7800 22400
108 Chen 8200 30600
108 Faviet 9000 39600
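The effect of RANGE on tied rows is easiest to see next to a plain row-by-row running total. The Python sketch below uses hypothetical function names and assumes the input is one partition sorted by salary.

```python
from itertools import accumulate

def running_sum_rows(values):
    """Model of ROWS UNBOUNDED PRECEDING: one step per physical row."""
    return list(accumulate(values))

def running_sum_range(sorted_values):
    """Model of RANGE UNBOUNDED PRECEDING: tied rows share the group total."""
    out, total, i, n = [], 0, 0, len(sorted_values)
    while i < n:
        j = i
        # Add every row that ties with row i before emitting any result.
        while j < n and sorted_values[j] == sorted_values[i]:
            total += sorted_values[j]
            j += 1
        out.extend([total] * (j - i))
        i = j
    return out

mgr101 = [4400, 6500, 10000, 12000, 12000]   # manager 101 salaries from the output above
print(running_sum_range(mgr101))  # [4400, 10900, 20900, 44900, 44900]
print(running_sum_rows(mgr101))   # [4400, 10900, 20900, 32900, 44900]
```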
VAR_POP
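Description: (Variance Population) Returns the population variance of a non-null set of numbers (nulls are ignored). VAR_POP is computed as:
(SUM(expr*expr) - SUM(expr)*SUM(expr) / COUNT(expr)) / COUNT(expr)
SAMPLE: The example below computes the cumulative population and sample variance of monthly sales for 1998 (run in the SH schema)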
SELECT t.calendar_month_desc,
VAR_POP(SUM(s.amount_sold))
OVER (ORDER BY t.calendar_month_desc) "Var_Pop",
VAR_SAMP(SUM(s.amount_sold))
OVER (ORDER BY t.calendar_month_desc) "Var_Samp"
FROM sales s, times t
WHERE s.time_id = t.time_id AND t.calendar_year = 1998
GROUP BY t.calendar_month_desc;
CALENDAR Var_Pop Var_Samp
-------- ---------- ----------
1998-01 0
1998-02 6.1321E+11 1.2264E+12
1998-03 4.7058E+11 7.0587E+11
1998-04 4.6929E+11 6.2572E+11
1998-05 1.5524E+12 1.9405E+12
1998-06 2.3711E+12 2.8453E+12
1998-07 3.7464E+12 4.3708E+12
1998-08 3.7852E+12 4.3260E+12
1998-09 3.5753E+12 4.0222E+12
1998-10 3.4343E+12 3.8159E+12
1998-11 3.4245E+12 3.7669E+12
1998-12 4.8937E+12 5.3386E+12
VAR_SAMP
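Description: (Variance Sample) Returns the sample variance of a non-null set of numbers (nulls are ignored). VAR_SAMP is computed as:
(SUM(expr*expr)-SUM(expr)*SUM(expr)/COUNT(expr))/(COUNT(expr)-1)
SAMPLE: The example below computes the cumulative population and sample variance of monthly sales for 1998 (run in the SH schema)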
SELECT t.calendar_month_desc,
VAR_POP(SUM(s.amount_sold))
OVER (ORDER BY t.calendar_month_desc) "Var_Pop",
VAR_SAMP(SUM(s.amount_sold))
OVER (ORDER BY t.calendar_month_desc) "Var_Samp"
FROM sales s, times t
WHERE s.time_id = t.time_id AND t.calendar_year = 1998
GROUP BY t.calendar_month_desc;
CALENDAR Var_Pop Var_Samp
-------- ---------- ----------
1998-01 0
1998-02 6.1321E+11 1.2264E+12
1998-03 4.7058E+11 7.0587E+11
1998-04 4.6929E+11 6.2572E+11
1998-05 1.5524E+12 1.9405E+12
1998-06 2.3711E+12 2.8453E+12
1998-07 3.7464E+12 4.3708E+12
1998-08 3.7852E+12 4.3260E+12
1998-09 3.5753E+12 4.0222E+12
1998-10 3.4343E+12 3.8159E+12
1998-11 3.4245E+12 3.7669E+12
1998-12 4.8937E+12 5.3386E+12
VARIANCE
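Description: Returns the variance of an expression. Oracle computes it as follows:
If the number of rows in the expression is 1, returns 0
If the number of rows in the expression is greater than 1, returns VAR_SAMP
SAMPLE: The example below returns the cumulative variance of salaries in department 30, ordered by hire date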
SELECT last_name, salary, VARIANCE(salary)
OVER (ORDER BY hire_date) "Variance"
FROM employees
WHERE department_id = 30;
LAST_NAME SALARY Variance
------------------------- ---------- ----------
Raphaely 11000 0
Khoo 3100 31205000
Tobias 2800 21623333.3
Baida 2900 16283333.3
Himuro 2600 13317000
Colmenares 2500 11307000
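The variance formulas from the VAR_POP, VAR_SAMP, and VARIANCE sections can be checked against this output. The Python sketch below uses function names that mirror the SQL functions, but it is only a model; None stands in for NULL.

```python
def var_pop(xs):
    """(SUM(x*x) - SUM(x)*SUM(x) / COUNT(x)) / COUNT(x)"""
    n = len(xs)
    if n == 0:
        return None
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / n

def var_samp(xs):
    """Same numerator as var_pop, divided by COUNT(x) - 1; None for fewer than 2 rows."""
    n = len(xs)
    if n < 2:
        return None
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def variance(xs):
    """0 for a single row, otherwise the sample variance."""
    return 0 if len(xs) == 1 else var_samp(xs)

# Department 30 salaries in hire_date order, copied from the output above.
salaries = [11000, 3100, 2800, 2900, 2600, 2500]
cumulative = [variance(salaries[:k]) for k in range(1, len(salaries) + 1)]
print([round(v, 1) for v in cumulative])
```

Taking the square root of each cumulative value gives the StdDev column of the STDDEV example, for instance 31205000 ** 0.5 is about 5586.14357.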
=====================================
Running sum problem:
select name,sum(cnt) over(order by rownum) from t1;