Earlier posts covered installing Hive on Linux, integrating it with Hadoop, and storing Hive's metadata in MySQL. Today let's look at how to work with Hive over JDBC from Eclipse.
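As we know, Hive is a SQL-like framework: you work with it in HiveQL (HQL), and internally Hive compiles each query into one or more MapReduce jobs that do the actual computation. We can type commands directly into the Hive shell, but that approach is fairly limiting. If we drive Hive from our own programs instead, Hive becomes far more flexible.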
Hive supports JDBC, so we can query Hive from Java code just as we would query MySQL, and run our statistics that way.
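To make the "just like MySQL" comparison concrete, here is a minimal sketch in which the JDBC calls are identical and only the driver class and URL change. The class `JdbcUrls` and its methods are hypothetical names for illustration; `org.apache.hive.jdbc.HiveDriver` is the driver used later in this post, `com.mysql.jdbc.Driver` is assumed to be the classic MySQL Connector/J 5.x driver, and 10000 and 3306 are the usual default ports of HiveServer2 and MySQL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper: the same JDBC code serves Hive and MySQL; only driver and URL differ. */
final class JdbcUrls {
    static final String HIVE_DRIVER = "org.apache.hive.jdbc.HiveDriver"; // HiveServer2 driver
    static final String MYSQL_DRIVER = "com.mysql.jdbc.Driver";          // assumed: Connector/J 5.x

    /** Builds a HiveServer2 URL, e.g. jdbc:hive2://host:10000/default */
    static String hiveUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    /** Builds a MySQL URL, e.g. jdbc:mysql://host:3306/test */
    static String mysqlUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    /** Runs a query and prints the first column; the code is the same for both databases. */
    static void printFirstColumn(String driver, String url, String user, String password, String sql)
            throws ClassNotFoundException, SQLException {
        Class.forName(driver); // the matching driver jar must be on the classpath
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

For Hive you would call `JdbcUrls.printFirstColumn(JdbcUrls.HIVE_DRIVER, JdbcUrls.hiveUrl("192.168.46.32", 10000, "default"), "search", "dongliang", "select * from mytt")`; the try-with-resources block closes the ResultSet, Statement and Connection even if the query throws.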
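Let's walk through the steps in detail.

Software environment

No. | Description | OS
--- | --- | ---
1 | Hadoop 2.2.0 installed on CentOS 6.5 | Linux
2 | Hive 0.13 installed on CentOS 6.5 | Linux
3 | Eclipse 4.2 | Windows 7

Procedure

No. | Step | Notes
--- | --- | ---
1 | Install and start Hadoop 2.2.0 | Hive depends on a Hadoop environment
2 | Install Hive | operate MapReduce through SQL-like queries
3 | Start hiveserver2 | the server program that allows remote access to Hive
4 | Create a Java project on Windows and import the jars Hive needs | required for remote access
5 | Write the code in Eclipse and test it | check whether the connection to Hive works
6 | Check the hiveserver2 side | confirm the connection succeeded and read the job log
7 | View the MR job on Hadoop's port 8088 | see how the MR job was scheduled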
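Step 3 can be checked from the Windows machine before writing any JDBC code. A minimal sketch, assuming HiveServer2 listens on its default port 10000 (`hive.server2.thrift.port`); the class `HiveServerProbe` is hypothetical. An open port only shows that something accepts TCP connections there, not that a JDBC login will succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP port (e.g. HiveServer2's) accepts connections. */
final class HiveServerProbe {
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Attempt a plain TCP handshake, giving up after timeoutMillis
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, host unreachable/unknown, or timed out
            return false;
        }
    }
}
```

For example, `HiveServerProbe.isListening("192.168.46.32", 10000, 3000)` returning false usually means hiveserver2 is not running or a firewall is blocking the port.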
Some Hive statements:

```sql
-- Load data into a table
LOAD DATA LOCAL INPATH '/home/search/abc1.txt' OVERWRITE INTO TABLE info;
-- List all tables in the current database
show tables;
-- Show the column structure of a table
desc tableName;
-- List all existing databases
show databases;
-- Create a table
create table mytt (name string, count int) row format delimited fields terminated by '#' stored as textfile;
```
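The mytt table is declared with `fields terminated by '#'` and `stored as textfile`, so each line of a file loaded into it is simply `name#count`. If you prepare such a file in Java, a minimal sketch of producing and checking that line format might look like this; the class `HashDelimitedRows` is hypothetical, and it assumes names never contain '#' or a line break, since Hive would split the field there.

```java
/** Hypothetical helper for the '#'-delimited text layout of table mytt(name string, count int). */
final class HashDelimitedRows {
    static final char DELIM = '#';

    /** Formats one row as a line of the text file, e.g. "China#12". */
    static String format(String name, int count) {
        if (name.indexOf(DELIM) >= 0 || name.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("name must not contain '#' or a newline: " + name);
        }
        return name + DELIM + count;
    }

    /** Splits a line into its two fields; returns null unless it has exactly one '#'. */
    static String[] parse(String line) {
        int pos = line.indexOf(DELIM);
        if (pos < 0 || line.indexOf(DELIM, pos + 1) >= 0) {
            return null;
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }
}
```

Writing one `format(...)` result per line into the file that LOAD DATA points at gives a layout matching the table definition above.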
[Screenshot: the jar files imported into the Eclipse project]
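Because Hive depends on Hadoop, it is best to add Hadoop's jars to the client project as well. Below is the client code; before running it, make sure hiveserver2 has been started on the server.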
```java
package com.test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/**
 * Operating Hive over JDBC from Windows 7
 * @author qindongliang
 */
public class HiveJDBClient {
    /** Hive JDBC driver class */
    private static String driver = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws Exception {
        // Load the Hive driver
        Class.forName(driver);
        // Get a hive2 JDBC connection; note that the database used here is "default".
        // No port is given in the URL; HiveServer2 listens on 10000 by default.
        Connection conn = DriverManager.getConnection("jdbc:hive2://192.168.46.32/default", "search", "dongliang");
        Statement st = conn.createStatement();
        String tableName = "mytt"; // table name
        // Average: this query is compiled into a MapReduce job
        ResultSet rs = st.executeQuery("select avg(count) from " + tableName);
        // Select all rows: runs directly, without MapReduce
        // ResultSet rs = st.executeQuery("select * from " + tableName);
        while (rs.next()) {
            System.out.println(rs.getString(1) + " ");
        }
        System.out.println("成功!");
        // Close resources in reverse order of creation
        rs.close();
        st.close();
        conn.close();
    }
}
```
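If the client fails with ClassNotFoundException or NoClassDefFoundError, the jars from step 4 are usually incomplete. A minimal classpath check, as a sketch: the class `ClasspathCheck` is hypothetical, and the two class names it looks for are assumptions based on this post, the Hive JDBC driver (from the hive-jdbc jar) and Hadoop's `Configuration` (from the hadoop-common jar), since, as noted above, Hive depends on Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: reports which required classes cannot be loaded from the classpath. */
final class ClasspathCheck {
    static List<String> missingClasses(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // Load without running static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // Class absent, or present but one of its dependencies is missing
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> required = Arrays.asList(
                "org.apache.hive.jdbc.HiveDriver",        // hive-jdbc jar
                "org.apache.hadoop.conf.Configuration");  // hadoop-common jar
        System.out.println("Missing: " + missingClasses(required));
    }
}
```

Running this class from the same Eclipse project should print `Missing: []` once the jars are in place.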
Output:

```text
48.6
成功!
```
Log printed on the hiveserver2 side:

```text
[search@h1 bin]$ ./hiveserver2
Starting HiveServer2
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
14/08/05 04:00:02 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
OK
OK
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1407179651448_0001, Tracking URL = http://h1:8088/proxy/application_1407179651448_0001/
Kill Command = /home/search/hadoop/bin/hadoop job  -kill job_1407179651448_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-08-05 04:03:49,951 Stage-1 map = 0%,  reduce = 0%
2014-08-05 04:04:19,118 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.74 sec
2014-08-05 04:04:30,860 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.7 sec
MapReduce Total cumulative CPU time: 3 seconds 700 msec
Ended Job = job_1407179651448_0001
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 3.7 sec   HDFS Read: 253 HDFS Write: 5 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 700 msec
OK
```
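For step 7 it helps to know which entry to look for in the 8088 UI. The `Starting Job` line carries both the MapReduce job ID and the tracking URL, and as that URL shows, YARN lists the same job as `application_` plus the same numeric suffix. A minimal sketch that extracts both; the class `HiveLogJobInfo` is hypothetical, and the regular expression assumes the exact log wording shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls the job ID and tracking URL out of a hiveserver2 log line. */
final class HiveLogJobInfo {
    // Matches e.g. "Starting Job = job_1407179651448_0001, Tracking URL = http://h1:8088/proxy/..."
    private static final Pattern STARTING_JOB =
            Pattern.compile("Starting Job = (job_\\d+_\\d+), Tracking URL = (\\S+)");

    /** Returns {jobId, trackingUrl}, or null if the line is not a "Starting Job" line. */
    static String[] parse(String line) {
        Matcher m = STARTING_JOB.matcher(line);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    /** Maps job_<clusterTimestamp>_<seq> to the YARN application_<clusterTimestamp>_<seq>. */
    static String toApplicationId(String jobId) {
        if (!jobId.startsWith("job_")) {
            throw new IllegalArgumentException("not a job ID: " + jobId);
        }
        return "application_" + jobId.substring("job_".length());
    }
}
```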
[Screenshot: the MapReduce job in Hadoop's web UI on port 8088]
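The following statement, by contrast, is not turned into a MapReduce job: select * from mytt limit 3; (swap it into the client's executeQuery call). A plain SELECT * with a LIMIT can be answered by Hive reading the table's files directly (a fetch task), so no job is launched.

Output:

```text
中国
美国
中国
成功!
```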
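Note that only the name column shows up in this output even though select * returns both name and count: the client's loop prints only `rs.getString(1)`. A minimal sketch that prints every column instead, using the standard `ResultSetMetaData` API, assuming the Hive driver implements `getMetaData()` for query results as JDBC drivers generally do; the class `ResultPrinter` is hypothetical.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: prints all columns of a result set, tab-separated. */
final class ResultPrinter {
    /** Joins one row's values with tabs; SQL NULLs are shown as "NULL". */
    static String formatRow(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append('\t');
            }
            String v = values.get(i);
            sb.append(v == null ? "NULL" : v);
        }
        return sb.toString();
    }

    /** Prints every column of every row, instead of only column 1. */
    static void printAll(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        int columns = md.getColumnCount();
        while (rs.next()) {
            List<String> values = new ArrayList<>(columns);
            for (int c = 1; c <= columns; c++) { // JDBC column indexes are 1-based
                values.add(rs.getString(c));
            }
            System.out.println(formatRow(values));
        }
    }
}
```

In the client, the `while (rs.next())` loop could be replaced by `ResultPrinter.printAll(rs);`.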
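At this point our JDBC calls to Hive are working, and from the client we can create tables and databases, run queries, and so on. One thing to watch out for: if you load data into a Hive table from Windows, the data file must already be on the Linux machine and the path you give must be a Linux path. You cannot load a file that sits on Windows directly into a Hive table on Linux, because with HiveServer2 the LOCAL in LOAD DATA LOCAL INPATH refers to the file system of the server running hiveserver2, not to the client's.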
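To guard against the Windows-path mistake described above, the LOAD statement can be built and checked before it is sent. A minimal sketch; `LoadStatementBuilder` and `buildLoad` are hypothetical, and the check only rejects obviously non-Linux paths (not absolute, or containing a backslash). It cannot tell whether the file actually exists on the server.

```java
/** Hypothetical helper: builds a LOAD DATA LOCAL statement for a file on the hiveserver2 host. */
final class LoadStatementBuilder {
    static String buildLoad(String serverPath, String table, boolean overwrite) {
        // LOCAL paths are resolved on the hiveserver2 machine, so require an absolute Linux path
        if (!serverPath.startsWith("/") || serverPath.indexOf('\\') >= 0) {
            throw new IllegalArgumentException("expected an absolute Linux path on the server: " + serverPath);
        }
        // Keep the quoted path literal well-formed
        if (serverPath.indexOf('\'') >= 0) {
            throw new IllegalArgumentException("path must not contain a single quote: " + serverPath);
        }
        // Allow only simple (optionally db-qualified) table names
        if (!table.matches("[A-Za-z0-9_.]+")) {
            throw new IllegalArgumentException("unexpected table name: " + table);
        }
        return "LOAD DATA LOCAL INPATH '" + serverPath + "'"
                + (overwrite ? " OVERWRITE" : "")
                + " INTO TABLE " + table;
    }
}
```

`st.execute(LoadStatementBuilder.buildLoad("/home/search/abc1.txt", "info", true))` would send the same statement as the LOAD example earlier in the post, minus the trailing semicolon, which the JDBC queries in this post also omit.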