HiveQL詳解

來(lái)源：九壹網(wǎng)

HiveQL是一種類(lèi)似SQL的語(yǔ)言, 它與大部分的SQL語(yǔ)法兼容, 但是并不完全支持SQL標(biāo)準(zhǔn), 如HiveQL不支持更新操作, 也不支持索引和事務(wù), 它的子查詢和join操作也很局限, 這是因其底層依賴于Hadoop云平臺(tái)這一特性決定的, 但其有些特點(diǎn)是SQL所無(wú)法企及的。例如多表查詢、支持create table as select和集成MapReduce腳本等, 本節(jié)主要介紹Hive的數(shù)據(jù)類(lèi)型和常用的HiveQL操作。

1.hive client命令
a.hive命令參數(shù)

-e: 命令行sql語(yǔ)句
-f: SQL文件
-h, --help: 幫助
--hiveconf: 指定配置文件
-i: 初始化文件
-S, --silent: 靜態(tài)模式（不將錯(cuò)誤輸出）
-v,--verbose: 詳細(xì)模式

b.交互模式

hive> show tables; #查看所有表名
hive> show tables  'ad*'  #查看以'ad'開(kāi)頭的表名
hive> set 命令 #設(shè)置變量與查看變量；
hive> set -v #查看所有的變量
hive> set hive.stats.atomic #查看hive.stats.atomic變量
hive> set hive.stats.atomic=false #設(shè)置hive.stats.atomic變量
hive> dfs  -ls #查看hadoop所有文件路徑
hive> dfs  -ls /user/hive/warehouse/ #查看hive所有文件
hive> dfs  -ls /user/hive/warehouse/ptest #查看ptest文件
hive> source file <filepath> #在client里執(zhí)行一個(gè)hive腳本文件
hive> quit #退出交互式shell
hive> exit #退出交互式shell
hive> reset #重置配置為默認(rèn)值
hive> !ls #從Hive shell執(zhí)行一個(gè)shell命令

2.操作及函數(shù)

查看函數(shù)：
hive> show  functions;   
正則查看函數(shù)名：
show  functions  'xpath.*';  
查看具體函數(shù)內(nèi)容：
describe function xpath; | desc function  xpath;

3.字段類(lèi)型
Hive支持基本數(shù)據(jù)類(lèi)型和復(fù)雜類(lèi)型, 基本數(shù)據(jù)類(lèi)型主要有數(shù)值類(lèi)型(INT、FLOAT、DOUBLE)、布爾型和字符串, 復(fù)雜類(lèi)型有三種:ARRAY、MAP 和 STRUCT。
a.基本數(shù)據(jù)類(lèi)型
TINYINT: 1個(gè)字節(jié)
SMALLINT: 2個(gè)字節(jié)
INT: 4個(gè)字節(jié)? ?
BIGINT: 8個(gè)字節(jié)
BOOLEAN: TRUE/FALSE ?
FLOAT: 4個(gè)字節(jié)，單精度浮點(diǎn)型
DOUBLE: 8個(gè)字節(jié)，雙精度浮點(diǎn)型STRING?????? 字符串

b.復(fù)雜數(shù)據(jù)類(lèi)型
ARRAY: 有序字段
MAP: 無(wú)序字段
STRUCT: 一組命名的字段

4.表類(lèi)型
hive表大致分為普通表、外部表、分區(qū)表三種。
a.普通表

創(chuàng)建表
hive> create table tb_person(id int, name string);

創(chuàng)建表并創(chuàng)建分區(qū)字段ds
hive> create table tb_stu(id int, name string) partitioned by(ds string);

查看分區(qū)
hive> show  partitions tb_stu;

顯示所有表
hive> show tables;

按正則表達(dá)式顯示表，
hive> show tables 'tb_*';

表添加一列
hive> alter table tb_person add columns (new_col int);

添加一列并增加列字段注釋
hive> alter table tb_stu add columns (new_col2 int comment 'a comment');

更改表名
hive> alter table tb_stu rename to tb_stu;

刪除表(hive只能刪分區(qū)，不能刪記錄或列 )
hive> drop table tb_stu;

對(duì)于托管表, drop 操作會(huì)把元數(shù)據(jù)和數(shù)據(jù)文件刪除掉, 對(duì)于外部表, 只是刪除元數(shù)據(jù)。如果只要?jiǎng)h除表中的數(shù)據(jù), 保留表名可以在 HDFS 上刪除數(shù)據(jù)文件:
hive> dfs –rmr /user/hive/warehouse/mutill1/*

將本地/home/hadoop/ziliao/stu.txt文件中的數(shù)據(jù)加載到表中, stu.txt文件數(shù)據(jù)如下：
1 zhangsan
2 lisi
3 wangwu

將文件中的數(shù)據(jù)加載到表中

hive> load data local inpath '/home/hadoop/ziliao/stu.txt' overwrite into table tb_person;

加載本地?cái)?shù)據(jù)，同時(shí)給定分區(qū)信息

hive> load data local inpath '/home/hadoop/ziliao/stu.txt' overwrite into table tb_stu partition (ds='2008-08-15');

hive> dfs -ls /user/hive/warehouse/tb_stu
hive> dfs -ls /user/hive/warehouse/tb_person

b.外部表
external關(guān)鍵字可以讓用戶創(chuàng)建一個(gè)外部表，在建表的同時(shí)指定一個(gè)指向?qū)嶋H數(shù)據(jù)的路徑(location)，hive創(chuàng)建內(nèi)部表時(shí)，會(huì)將數(shù)據(jù)移動(dòng)到數(shù)據(jù)倉(cāng)庫(kù)指向的路徑；若創(chuàng)建外部表，僅記錄數(shù)據(jù)所在的路徑，不對(duì)數(shù)據(jù)的位置做任何改變。在刪除表的時(shí)候，內(nèi)部表的元數(shù)據(jù)和數(shù)據(jù)會(huì)被一起刪除，而外部表只刪除元數(shù)據(jù)，不刪除數(shù)據(jù)。
eg. 創(chuàng)建外部表：

create external table tb_record(col1 string, col2 string) row format delimited fields terminated by '\t' location '/user/hadoop/input';

這樣表tb_record的數(shù)據(jù)就是hdfs://user/hadoop/input/* 的數(shù)據(jù)了。

c.分區(qū)表
分區(qū)是表的部分列的集合, 可以為頻繁使用的數(shù)據(jù)建立分區(qū), 這樣查找分區(qū)中的數(shù)據(jù)時(shí)就不需要掃描全表, 這對(duì)于提高查找效率很有幫助。
創(chuàng)建分區(qū)：create table log(ts bigint,line string) partitioned by(name string);
插入分區(qū)：insert overwrite table log partition(name='xiapi') select id from userinfo where name='xiapi';
查看分區(qū)：show? partitions log;
刪除分區(qū): alter table ptest drop partition (name='xiapi')
備注:通常情況下需要先預(yù)先創(chuàng)建好分區(qū),然后才能使用該分區(qū)。還有分區(qū)列的值要轉(zhuǎn)化為文件夾的存儲(chǔ)路徑,所以如果分區(qū)列的值中包含特殊值,如 '%', ':', '/', '#',它將會(huì)被使用%加上 2 字節(jié)的 ASCII 碼進(jìn)行轉(zhuǎn)義。

5. sql操作及桶
1). 創(chuàng)建表
首先建立三張測(cè)試表:
userinfo表中有兩列,以tab鍵分割,分別存儲(chǔ)用戶的id和名字name;
classinfo表中有兩列,以tab鍵分割,分別存儲(chǔ)課程老師teacher和課程名classname;
choice表中有兩列,以tab鍵分割,分別存儲(chǔ)用戶的userid和選課名稱(chēng)classname(類(lèi)似中間表)。

創(chuàng)建測(cè)試表:

hive> create table userinfo(id int,name string) row format delimited fields terminated by '\t';
hive> create table classinfo(teacher string,classname string) row format delimited fields terminated by '\t';
hive> create table choice(userid int,classname string) row format delimited fields terminated by '\t';

注意：'\t'相當(dāng)于一個(gè)tab鍵盤(pán)。
顯示剛才創(chuàng)建的數(shù)據(jù)表:
hive> show tables;

2). 導(dǎo)入數(shù)據(jù)
建表后,可以從本地文件系統(tǒng)或 HDFS 中導(dǎo)入數(shù)據(jù)文件,導(dǎo)入數(shù)據(jù)樣例如下:
userinfo.txt內(nèi)容如下(數(shù)據(jù)之間用tab鍵隔開(kāi))：
1?? ?xiapi
2?? ?xiaoxue
3?? ?qingqing

classinfo.txt內(nèi)容如下(數(shù)據(jù)之間用tab鍵隔開(kāi))：
jack?? ?math
sam?? ?china
lucy?? ?english

choice.txt內(nèi)容如下(數(shù)據(jù)之間用tab鍵隔開(kāi))：
1?? ?math
1?? ?china
1?? ?english
2?? ?china
2?? ?english
3?? ?english
首先在本地“/home/hadoop/ziliao”下按照上面建立三個(gè)文件, 并添加如上的內(nèi)容信息。

3. 按照下面導(dǎo)入數(shù)據(jù)。

hive> load data local inpath '/home/hadoop/ziliao/userinfo.txt' overwrite into table userinfo;
hive> load data local inpath '/home/hadoop/ziliao/classinfo.txt' overwrite into table classinfo;
hive> load data local inpath '/home/hadoop/ziliao/choice.txt' overwrite into table choice;

查詢表數(shù)據(jù)

hive> select * from userinfo;
hive> select * from classinfo;
hive> select * from choice;

4. 分區(qū)

a.創(chuàng)建分區(qū)
hive> create table ptest(userid int) partitioned by (name string) row format delimited fields terminated by '\t';
b.準(zhǔn)備導(dǎo)入數(shù)據(jù)
xiapi.txt內(nèi)容如下(數(shù)據(jù)之間用tab鍵隔開(kāi))：
1    
c.導(dǎo)入數(shù)據(jù)
hive> load data local inpath '/home/hadoop/ziliao/xiapi.txt' overwrite into table ptest partition (name='xiapi');
d.查看分區(qū)
hive> dfs -ls /user/hive/warehouse/ptest/name=xiapi;
e.查詢分區(qū)
hive> select * from ptest where name='xiapi';
f.顯示分區(qū)
hive> show partitions ptest;
g.對(duì)分區(qū)插入數(shù)據(jù)(每次都會(huì)覆蓋掉原來(lái)的數(shù)據(jù)):
hive> insert overwrite table ptest partition(name='xiapi') select id from userinfo where name='xiapi';
h.刪除分區(qū)
hive> alter table ptest drop partition (name='xiapi')

5.桶
可以把表或分區(qū)組織成桶, 桶是按行分開(kāi)組織特定字段, 每個(gè)桶對(duì)應(yīng)一個(gè) reduce 操作。在建立桶之前, 需要設(shè)置“hive.enforce.bucketing”屬性為 true, 使 Hive 能夠識(shí)別桶。在表中分桶的操作如下:

hive> set hive.enforce.bucketing=true;
hive> set hive.enforce.bucketing;
hive.enforce.bucketing=true;
hive> create table btest2(id int, name string) clustered by(id) into 3 buckets row format delimited fields terminated by '\t';

hive>dfs -cat /user/hive/warehouse/btest2/*0_0;
hive>dfs -cat /user/hive/warehouse/btest2/*1_0;
hive>dfs -cat /user/hive/warehouse/btest2/*2_0;

分桶可以獲得比分區(qū)更高的查詢效率,同時(shí)分桶也便于對(duì)全部數(shù)據(jù)進(jìn)行采樣處理。下面是對(duì)桶取樣的操作。
hive>select * from btest2 tablesample(bucket 1 out of 3 on id);

6. 多表插入
多表插入指的是在同一條語(yǔ)句中, 把讀取的同一份元數(shù)據(jù)插入到不同的表中。只需要掃描一遍元數(shù)據(jù)即可完成所有表的插入操作, 效率很高。多表操作示例如下。

hive> create table mutill as select id,name from userinfo; #有數(shù)據(jù)
hive> create table mutil2 like mutill; #無(wú)數(shù)據(jù)，只有表結(jié)構(gòu)
hive> from userinfo insert overwrite table mutill
      select id,name insert overwrite table mutil2 select count(distinct id),name group by name;

7.? 連接
連接是將兩個(gè)表中在共同數(shù)據(jù)項(xiàng)上相互匹配的那些行合并起來(lái), HiveQL 的連接分為內(nèi)連接、左向外連接、右向外連接、全外連接和半連接 5 種。

a. 內(nèi)連接(等值連接)
內(nèi)連接使用比較運(yùn)算符根據(jù)每個(gè)表共有的列的值匹配兩個(gè)表中的行。
例如, 檢索userinfo和choice表中標(biāo)識(shí)號(hào)相同的所有行。

hive> select userinfo.*, choice.* from userinfo join choice on(userinfo.id=choice.userid);

b. 左連接
左連接的結(jié)果集包括“LEFT OUTER”子句中指定的左表的所有行, 而不僅僅是連接列所匹配的行。如果左表的某行在右表中沒(méi)有匹配行, 則在相關(guān)聯(lián)的結(jié)果集中右表的所有選擇列均為空值。

hive> select userinfo.*, choice.* from userinfo left outer join choice on(userinfo.id=choice.userid);

c. 右連接
右連接是左向外連接的反向連接,將返回右表的所有行。如果右表的某行在左表中沒(méi)有匹配行,則將為左表返回空值。

hive> select userinfo.*, choice.* from userinfo right outer join choice on(userinfo.id=choice.userid);

d. 全連接
全連接返回左表和右表中的所有行。當(dāng)某行在另一表中沒(méi)有匹配行時(shí),則另一個(gè)表的選擇列表包含空值。如果表之間有匹配行,則整個(gè)結(jié)果集包含基表的數(shù)據(jù)值。

hive> select userinfo.*, choice.* from userinfo full outer join choice on(userinfo.id=choice.userid);

e. 半連接
半連接是 Hive 所特有的, Hive 不支持 IN 操作,但是擁有替代的方案; left semi join, 稱(chēng)為半連接, 需要注意的是連接的表不能在查詢的列中,只能出現(xiàn)在 on 子句中。

hive> select userinfo.* from userinfo left semi join choice on (userinfo.id=choice.userid);

8. 子查詢
標(biāo)準(zhǔn) SQL 的子查詢支持嵌套的 select 子句,HiveQL 對(duì)子查詢的支持很有限,只能在from 引導(dǎo)的子句中出現(xiàn)子查詢。如下語(yǔ)句在 from 子句中嵌套了一個(gè)子查詢(實(shí)現(xiàn)了對(duì)教課最多的老師的查詢)。

hive>select teacher,MAX(class_num)
         from (select teacher, count(classname) as class_num from classinfo group by teacher)  subq
         group by teacher;

hive> create view teacher_classsum as select teacher, count(classname)  from classinfo group by teacher;

刪除視圖：

hive>drop view teacher_classnum;

10. 函數(shù)
創(chuàng)建函數(shù)

hive> create temporary function function_name as class_name

該語(yǔ)句創(chuàng)建一個(gè)由類(lèi)名實(shí)現(xiàn)的函數(shù)。在 Hive 中用戶可以使用 Hive 類(lèi)路徑中的任何類(lèi),用戶通過(guò)執(zhí)行 add files 語(yǔ)句將函數(shù)類(lèi)添加到類(lèi)路徑,并且可持續(xù)使用該函數(shù)進(jìn)行操作。
刪除函數(shù)
注銷(xiāo)用戶定義函數(shù)的格式如下:

hive> drop temporary function function_name;

因篇幅問(wèn)題不能全部顯示，請(qǐng)點(diǎn)此查看更多更全內(nèi)容

查看全文

成熟丰满熟妇高潮XXXXX,人妻无码AV中文系列久久兔费 ,国产精品一国产精品,国精品午夜福利视频不卡麻豆

HiveQL詳解