I've searched for ages and still can't find a complete example of reading a Hive database from Spark. Could some kind soul share one?
1
liprais 2016-03-18 09:00:15 +08:00 via iPhone
Have you read the documentation?
Copy Hive's configuration files into Spark's conf/ directory, then create a HiveContext and you're done.
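That "copy the config" step can be sketched as a single shell command; `$HIVE_HOME` and `$SPARK_HOME` below are placeholder install paths, not something stated in the thread:

```shell
# Let Spark's HiveContext find the existing Hive metastore by
# copying Hive's config into Spark's conf/ directory.
# $HIVE_HOME and $SPARK_HOME are placeholders for your install paths.
cp "$HIVE_HOME/conf/hive-site.xml" "$SPARK_HOME/conf/"
```

Without hive-site.xml in place, HiveContext spins up a local, empty Derby-backed metastore instead of connecting to your existing one.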
2
anonymoustian OP @liprais Could you be more specific? I read the Spark docs, but there are only a sentence or two about this.
3
liprais 2016-03-18 09:26:16 +08:00
@anonymoustian
SparkConf conf = new SparkConf().setAppName("JavaRFormulaExample");
JavaSparkContext jsc = new JavaSparkContext(conf);
HiveContext hiveContext = new HiveContext(jsc);
4
anonymoustian OP
5
liprais 2016-03-18 11:02:26 +08:00
@anonymoustian
"Configuration of Hive is done by placing your hive-site.xml, core-site.xml (for security configuration), hdfs-site.xml (for HDFS configuration) file in conf/. Please note when running the query on a YARN cluster (cluster mode), the datanucleus jars under the lib directory and hive-site.xml under conf/ directory need to be available on the driver and all executors launched by the YARN cluster. The convenient way to do this is adding them through the --jars option and --file option of the spark-submit command."

Copy those three files (hive-site.xml, core-site.xml, hdfs-site.xml) into Spark's conf/ directory. Then read and write like this:

// sc is an existing JavaSparkContext.
HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc.sc());

// Queries are expressed in HiveQL.
sqlContext.sql("select * from YOUR_HIVE_TABLE_NAME").collect();
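For the YARN cluster-mode case the quote describes, the submit command might look like this sketch. The class name, app jar, and datanucleus jar versions are placeholders (check your Spark lib/ directory for the actual file names), and note the real spark-submit flag is --files, plural:

```shell
# Hedged sketch: ship hive-site.xml and the datanucleus jars to the
# driver and executors in YARN cluster mode.
# com.example.YourApp, your-app.jar, and the jar versions are
# placeholders; list the jars actually present in $SPARK_HOME/lib.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.YourApp \
  --jars "$SPARK_HOME/lib/datanucleus-api-jdo-3.2.6.jar,$SPARK_HOME/lib/datanucleus-core-3.2.10.jar,$SPARK_HOME/lib/datanucleus-rdbms-3.2.9.jar" \
  --files "$SPARK_HOME/conf/hive-site.xml" \
  your-app.jar
```

In client mode this shipping step is unnecessary, since the driver runs on the machine where conf/ already holds the files.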
6
anonymoustian OP @liprais Where is Spark's conf directory?
7
liprais 2016-03-18 13:07:48 +08:00