
Hive DDL Data Definition (Tables, Part 2): Partitioned Tables

2024-06-23 Source: 步旅网

一、Introduction to Partitioned Tables

MapReduce partitioning splits files, whereas Hive partitioning splits directories.

A partitioned table corresponds to an independent directory in the HDFS file system, and that directory holds all of the partition's data files. A Hive partition is simply a subdirectory: a large dataset is split into smaller datasets according to business needs. At query time, an expression in the WHERE clause selects only the required partitions, which makes such queries much more efficient.

二、Basic Partitioned Table Operations

1. Motivating example (logs need to be managed by date)

/user/hive/warehouse/log_partition/20180702/20180702.log
/user/hive/warehouse/log_partition/20180703/20180703.log
/user/hive/warehouse/log_partition/20180704/20180704.log

2. Syntax for creating a partitioned table

hive (default)> create table dept_partition(
              > deptno int,
              > dname string,
              > loc string
              > )
              > partitioned by (month string)
              > row format delimited fields terminated by '\t';

3. Loading data into a partitioned table

hive (default)> load data local inpath '/opt/module/hiveTest/dept.txt' into table default.dept_partition partition(month='201909');
hive (default)> load data local inpath '/opt/module/hiveTest/dept.txt' into table default.dept_partition partition(month='201910');
hive (default)> load data local inpath '/opt/module/hiveTest/dept.txt' into table default.dept_partition partition(month='201911');
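If the loads above succeeded, each one should have created its own month= subdirectory under the table's directory. A quick way to check, assuming the default warehouse location /user/hive/warehouse:

```sql
-- List the table directory from the Hive CLI; each partition is a subdirectory.
hive (default)> dfs -ls /user/hive/warehouse/dept_partition;
-- Expected entries, one directory per partition:
--   .../dept_partition/month=201909
--   .../dept_partition/month=201910
--   .../dept_partition/month=201911
```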

4. Querying data in a partitioned table

1) Single-partition query

hive (default)> select * from dept_partition where month='201909';

2) Multi-partition union query

hive (default)> select * from dept_partition where month='201909' union

select * from dept_partition where month='201910' union

select * from dept_partition where month='201911';

Query result:

OK

_u3.deptno  _u3.dname   _u3.loc  _u3.month
10          ACCOUNTING  1700     201909
10          ACCOUNTING  1700     201910
10          ACCOUNTING  1700     201911
20          RESEARCH    1800     201909
20          RESEARCH    1800     201910
20          RESEARCH    1800     201911
30          SALES       1900     201909
30          SALES       1900     201910
30          SALES       1900     201911
40          OPERATIONS  1700     201909
40          OPERATIONS  1700     201910
40          OPERATIONS  1700     201911

Time taken: 134.762 seconds, Fetched: 12 row(s)
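The same rows can usually be fetched without a UNION by listing several partition values in a single WHERE clause; Hive's partition pruning still reads only the matching directories. A sketch:

```sql
-- Equivalent multi-partition query using IN instead of UNION.
select * from dept_partition
where month in ('201909', '201910', '201911');
```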

5. Adding partitions

1) Add a single partition

hive (default)> alter table dept_partition add partition(month='201912');

2) Add multiple partitions at once (separate them with spaces)

hive (default)> alter table dept_partition add partition(month='201913') partition(month='201914');

Note: when adding multiple partitions, separate them with spaces " "; when dropping multiple partitions, separate them with commas ",".

6. Dropping partitions

1) Drop a single partition

hive (default)> alter table dept_partition drop partition (month='201912');

2) Drop multiple partitions at once (separate them with commas)

hive (default)> alter table dept_partition drop partition (month='201913'), partition (month='201914');

7. Listing the partitions of a partitioned table

hive>show partitions dept_partition;

8. Viewing the structure of a partitioned table (the month column always appears last)

hive> desc formatted dept_partition;

Result:

OK

col_name  data_type  comment
deptno    int
dname     string
location  int
deptdesc  string

month string

# Partition Information

# col_name data_type comment

month string

Time taken: 0.116 seconds, Fetched: 10 row(s)

三、Notes on Partitioned Tables

1. Creating a two-level partitioned table

hive (default)> create table dept_partition2(
              > deptno int,
              > dname string,
              > loc string
              > )
              > partitioned by (month string, day string)
              > row format delimited fields terminated by '\t';

2. Loading data normally

1) Load data into the two-level partitioned table

hive (default)> load data local inpath '/opt/module/hiveTest/dept.txt' into table default.dept_partition2 partition(month='201809', day='13');

2) Query the partitioned data

hive (default)> select * from dept_partition2 where month='201809' and day='13';
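With a two-level partition, it is also valid to filter on only the top-level key; Hive then scans every day= subdirectory under that month. A sketch:

```sql
-- Filter on the first partition key only; all day= partitions
-- under month=201809 are read.
select * from dept_partition2 where month='201809';
```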

3. Three ways to upload data directly into a partition directory and associate it with the partitioned table

1) Method 1: upload the data, then repair the table

hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201809/day=12;

hive (default)> dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201809/day=12;

Query the data (the newly uploaded data cannot be found):

hive (default)> select * from dept_partition2 where month='201809' and day='12';

Run the repair command:

hive> msck repair table dept_partition2;

Query the data again:

hive (default)> select * from dept_partition2 where month='201809' and day='12';

2) Method 2: upload the data, then add the partition

Upload the data:

hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201809/day=11;

hive (default)> dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201809/day=11;

Add the partition:

hive (default)> alter table dept_partition2 add partition(month='201809', day='11');

Query the data:

hive (default)> select * from dept_partition2 where month='201809' and day='11';

3) Method 3: upload the data, then load it into the partition

Create the directory:

hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201809/day=10;

Upload the data:

hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table dept_partition2 partition(month='201809',day='10');

Query the data:

hive (default)> select * from dept_partition2 where month='201809' and day='10';
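Whichever of the three methods is used, a partition only becomes queryable once it is registered in the metastore. A quick way to confirm the association worked is to list the table's partitions afterwards:

```sql
-- After applying the three methods above, all three day partitions
-- should appear alongside the day=13 partition loaded earlier.
show partitions dept_partition2;
-- month=201809/day=10
-- month=201809/day=11
-- month=201809/day=12
-- month=201809/day=13
```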
