
Hadoop Lab: Installing and Experimenting with Hive

Lab Objectives

  1. Understand Hive's role in the Hadoop architecture.
  2. Become familiar with Hive's DDL commands and DML operations.
  3. Distinguish the concepts of data warehouse and database.

Lab Environment

  • Operating system: Ubuntu 16.04
  • Hadoop version: 2.6.0
  • JDK version: 1.8
  • IDE: Eclipse
  • Hive version: 0.13.0

Lab Content and Requirements

Installing Hive (start the Hadoop and MySQL services before installing):

  1. Place the Hive tarball in the Home folder.

  2. Right-click to open a terminal and extract Hive into /usr/local:
    sudo tar zxvf apache-hive-0.13.0-bin.tar.gz -C /usr/local

  3. Rename the directory to simplify later steps:
    sudo mv /usr/local/apache-hive-0.13.0-bin/ /usr/local/hive/

  4. Take ownership of the directory (replace tiny with your username):
    sudo chown -R tiny /usr/local/hive/

  5. Copy the MySQL JDBC driver into /usr/local/hive/lib:
    cp mysql-connector-java-5.1.39-bin.jar /usr/local/hive/lib
  6. Set the environment variables
    sudo vim /etc/profile
    • Append the following at the end of the file:
      #set hive path
      export HIVE_HOME=/usr/local/hive
      export PATH=$HIVE_HOME/bin:$PATH
  7. Make the environment variables take effect
    source /etc/profile
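    You can verify that the variable is set, for example:
      echo $HIVE_HOME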
  8. Edit Hive's configuration files
    • Enter /usr/local/hive/conf:
      cd /usr/local/hive/conf
    • Copy hive-env.sh.template and rename the copy hive-env.sh:
      cp hive-env.sh.template hive-env.sh

    • Edit its contents:
      vim hive-env.sh
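      A minimal sketch of what to add, assuming Hadoop is installed at /usr/local/hadoop (adjust the paths to your setup):
      # Hadoop installation directory (assumed path)
      HADOOP_HOME=/usr/local/hadoop
      # directory holding Hive's configuration files
      export HIVE_CONF_DIR=/usr/local/hive/conf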

    • Create a new configuration file:
      vim hive-site.xml
    • Add the following content, filling in your MySQL username and password:

      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <configuration>
        <property>
          <name>javax.jdo.option.ConnectionURL</name>
          <value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
          <description>JDBC connect string for a JDBC metastore</description>
        </property>
        <property>
          <name>javax.jdo.option.ConnectionDriverName</name>
          <value>com.mysql.jdbc.Driver</value>
          <description>Driver class name for a JDBC metastore</description>
        </property>
        <property>
          <name>javax.jdo.option.ConnectionUserName</name>
          <value>your MySQL username</value>
          <description>username to use against metastore database</description>
        </property>
        <property>
          <name>javax.jdo.option.ConnectionPassword</name>
          <value>your MySQL password</value>
          <description>password to use against metastore database</description>
        </property>
      </configuration>
  9. Enter the Hive shell (type quit; to exit):
    hive
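    As a quick sanity check of the metastore connection, run a simple statement inside the shell, for example:
      SHOW DATABASES;
    This should list at least the built-in default database.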

Hive Experiments:

I. DDL Commands

  1. Database commands
    • Create a simple database:
      CREATE DATABASE testdb;
    • List databases:
      SHOW DATABASES;
    • Search with a regular expression:
      SHOW DATABASES LIKE 't.*';
    • Create a database and set its storage path at the same time:
      CREATE DATABASE testdb2 LOCATION '/user/mydb';
    • Attach a comment to the database as you create it:
      CREATE DATABASE testdb3 COMMENT 'This is a test database3';
    • Show a database's comment and storage path:
      DESCRIBE DATABASE testdb3;
    • Attach key-value pairs to the database as parameters when creating it:
      CREATE DATABASE testdb4 WITH DBPROPERTIES('creator'='tiny','date'='2016-12-21');
    • Show the database parameters:
      DESCRIBE DATABASE EXTENDED testdb4;
    • Switch to a database:
      USE testdb4;
    • Drop a database (CASCADE also drops its tables; without it, dropping a non-empty database fails):
      DROP DATABASE IF EXISTS testdb3 CASCADE;
  2. Table commands
    • Create an ordinary table:
      CREATE TABLE IF NOT EXISTS test_1
      (id INT,
      name STRING,
      address STRING);
    • Create an external table:
      CREATE EXTERNAL TABLE external_table (dummy STRING)
      LOCATION '/user/tom/external_table';
    • Create a partitioned table:
      CREATE TABLE partition_table (id INT, name STRING, city STRING)
      PARTITIONED BY (pt STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
    • Create a table with the same schema as an existing table:
      CREATE TABLE test_2 LIKE test_1;
    • Add columns to a table:
      ALTER TABLE test_1 ADD COLUMNS
      (telephone STRING,
      qq STRING,
      birthday DATE);
    • Rename a column:
      ALTER TABLE test_1 CHANGE address addr STRING;
    • Rename a table:
      ALTER TABLE test_1 RENAME TO test_table;
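    To confirm the schema changes above, you can inspect the renamed table, for example:
      DESCRIBE test_table;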

II. DML Commands

  1. Loading data
    Suppose we have a table created with the following statement:
    CREATE TABLE login (
    uid BIGINT,
    ip STRING
    )
    PARTITIONED BY (pt string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

    Exit the Hive shell and create login.txt:

    11151007001,192.168.1.1
    11151007002,192.168.1.2

    Create login2.txt:

    11151007003,192.168.1.3
    11151007004,192.168.1.4
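    The HDFS load in the second bullet below reads /tmp/login2.txt from HDFS, so upload the file there first (assuming Hadoop's bin directory is on your PATH):
      hdfs dfs -put login2.txt /tmp/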
    • Load local data into the Hive table (reopen the Hive shell and run USE testdb4; again first):
      LOAD DATA LOCAL INPATH '/usr/local/hadoop/login.txt' OVERWRITE INTO TABLE login PARTITION (pt='20161221');
      SELECT * FROM login;
    • Load a file from HDFS (note that this moves the file into the table's storage directory):
      LOAD DATA INPATH '/tmp/login2.txt' INTO TABLE login PARTITION (pt='20161221');
      SELECT * FROM login;
  2. Inserting query results into tables
    • Single-table insert:
      CREATE TABLE login2(uid BIGINT);
      INSERT OVERWRITE TABLE login2 SELECT DISTINCT uid FROM login;
    • Multi-table insert (the source table is scanned only once):
      CREATE TABLE login_ip(ip STRING);
      CREATE TABLE login_uid(uid BIGINT);
      FROM login
      INSERT OVERWRITE TABLE login_uid
      SELECT uid
      INSERT OVERWRITE TABLE login_ip
      SELECT ip;
  3. Writing query results to the filesystem
    FROM login
    INSERT OVERWRITE LOCAL DIRECTORY '/usr/local/hadoop/login' SELECT *
    INSERT OVERWRITE DIRECTORY '/tmp/ip' SELECT ip;
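    Hive writes the results as one or more files under each directory; the exact file names may vary, for example:
      cat /usr/local/hadoop/login/000000_0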

III. Hive JDBC

  1. Create a new MapReduce project.

  2. Right-click the project, choose Properties, and import the required external jar packages into the project.

  3. Create userinfo.txt with the following content (the two columns should be separated by a tab, matching the table's field delimiter in the code below):
    1     xiaoping
    2     xiaoxue
    3     qingqing
    4     wangwu
    5     zhangsan
    6     lisi

  4. Start the remote service:
    hive --service hiveserver
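    Before running the client, you can check that the Thrift service is listening on its default port 10000, for example:
      netstat -nlt | grep 10000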
  5. Run the following code on the Java side:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import org.apache.log4j.Logger;

    public class HiveJdbcClient {
        // HiveServer1 JDBC driver and connection URL (default port 10000)
        private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
        private static String url = "jdbc:hive://localhost:10000/default";
        private static String user = "";
        private static String password = "";
        private static String sql = "";
        private static ResultSet res;
        private static final Logger log = Logger.getLogger(HiveJdbcClient.class);

        public static void main(String[] args) {
            try {
                Class.forName(driverName);
                Connection conn = DriverManager.getConnection(url, user, password);
                Statement stmt = conn.createStatement();
                String tableName = "testHiveDriverTable";

                // drop and recreate the test table; for DDL statements the
                // returned result set is not used
                sql = "drop table if exists " + tableName;
                stmt.executeQuery(sql);
                sql = "create table "
                        + tableName
                        + " (key int, value string) row format delimited fields terminated by '\t'";
                stmt.executeQuery(sql);

                sql = "show tables '" + tableName + "'";
                System.out.println("Running: " + sql);
                res = stmt.executeQuery(sql);
                System.out.println("Result of \"show tables\":");
                if (res.next()) {
                    System.out.println(res.getString(1));
                }

                sql = "describe " + tableName;
                System.out.println("Running: " + sql);
                res = stmt.executeQuery(sql);
                System.out.println("Result of \"describe table\":");
                while (res.next()) {
                    System.out.println(res.getString(1) + "\t" + res.getString(2));
                }

                // load the local data file created in step 3
                String filepath = "/usr/local/hadoop/userinfo.txt";
                sql = "load data local inpath '" + filepath + "' into table "
                        + tableName;
                System.out.println("Running: " + sql);
                res = stmt.executeQuery(sql);

                sql = "select * from " + tableName;
                System.out.println("Running: " + sql);
                res = stmt.executeQuery(sql);
                System.out.println("Result of \"select * query\":");
                while (res.next()) {
                    System.out.println(res.getInt(1) + "\t" + res.getString(2));
                }

                // an aggregate query runs as a MapReduce job
                sql = "select count(1) from " + tableName;
                System.out.println("Running: " + sql);
                res = stmt.executeQuery(sql);
                System.out.println("Result of \"regular hive query\":");
                while (res.next()) {
                    System.out.println(res.getString(1));
                }

                conn.close();
                conn = null;
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
                log.error(driverName + " not found!", e);
                System.exit(1);
            } catch (SQLException e) {
                e.printStackTrace();
                log.error("Connection error!", e);
                System.exit(1);
            }
        }
    }