[0.29.x: DONE] Connecting to local Spark

Hi, guys!

I’m testing the new Spark SQL driver, but I’m not sure how to connect Metabase to my cluster. My Metabase is running on the same machine as my Spark master.

I’ve tried

  • Host: localhost
  • Host: local[*]
  • Host: spark://localhost

Always on port 7077, but the server returns this error:

05-02 17:48:25 ERROR metabase.driver :: Failed to connect to database: java.lang.NoClassDefFoundError: org/apache/h
05-02 17:48:25 DEBUG metabase.middleware :: POST /api/setup/validate 400 (3 ms) (0 DB calls).
{:errors {:dbname "java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration"}}

Can anyone help me? Thanks!

I’ve tested both with the latest 0.29 release and with a build from master.
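(A note on the connection settings: port 7077 is the Spark master’s own RPC port. The Hive JDBC driver that the Spark SQL support is built on talks to the Spark Thrift Server instead, which is started with Spark’s `sbin/start-thriftserver.sh` script and listens on port 10000 by default. A sketch of settings that would normally apply, with host and database as placeholder examples:)

```
Host: localhost
Port: 10000
Database name: default
```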

Hi @allansene,

Can you try removing the :exclusions for the Hive-JDBC dependency from project.clj? Remove lines 96-105 and replace the whole thing with

[org.spark-project.hive/hive-jdbc "1.2.1.spark2"]
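(For context, that coordinate goes inside the `:dependencies` vector of project.clj. A minimal sketch of where it sits; the surrounding entries and project name are illustrative, not the actual file contents:)

```clojure
;; project.clj (sketch; neighboring entries are illustrative)
(defproject metabase "..."
  :dependencies [;; ... other dependencies ...
                 [org.spark-project.hive/hive-jdbc "1.2.1.spark2"]])
```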

then try building again and see if it works.

The build from both master and v0.29 is failing now :frowning:

java.lang.IncompatibleClassChangeError: Implementing class, compiling:(jetty.clj:1:1)
Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class, compiling:(jetty.clj:1:1)

Full Stacktrace.


                 [org.spark-project.hive/hive-jdbc "1.2.1.spark2"
                  :exclusions [jdk.tools]
                  :classifier "standalone"]

Now it builds, but I’m getting the same error.

05-02 20:24:48 ERROR metabase.driver :: Failed to connect to database: java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
05-02 20:24:49 DEBUG metabase.middleware :: POST /api/setup/validate 400 (3 ms) (0 DB calls).
{:errors {:dbname "java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration"}}

Maybe this is happening because I’m running Metabase as root… does that make sense? I can only bind to port 80 when running as root.

@senior any ideas? I’m starting to think the Hive JDBC driver didn’t have Hadoop stuff in it in the first place and we would need to add a separate dep? (Not sure why we need it, but it sounds like what we’d need to do to fix this error)

I’ve tried the build that wjoel made available and it worked.

I’m getting the same error here. I tried to connect to Spark SQL using v0.29.0-RC1 and v0.29.0, and the same java.lang.NoClassDefFoundError appeared in my logs.

I’m trying to connect to a remote Spark.

@camsaul It looks like that might be right. Found some instructions https://streever.atlassian.net/wiki/spaces/HADOOP/pages/4390924/HS2+JDBC+Client+Jars+Hive+Server2. That Hadoop JAR is pretty big (~4 MB) and it also has a ton of transitive dependencies. It looks like we’ll need to spend some time narrowing those transitive dependencies down, but even then I think it’ll have an impact on memory usage and the JAR size.
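(If the missing class really does come from Hadoop, one way to pull it in while keeping the JAR small would be to add `hadoop-common`, which provides `org.apache.hadoop.conf.Configuration`, and prune its transitive dependencies with `:exclusions`. A hedged sketch only; the artifact coordinates are real, but the version and the right exclusion list would need verifying against the Spark 2.x Hive fork:)

```clojure
;; sketch: provide org.apache.hadoop.conf.Configuration via hadoop-common,
;; excluding heavy transitive deps that likely aren't needed at runtime
[org.apache.hadoop/hadoop-common "2.7.3"
 :exclusions [org.slf4j/slf4j-log4j12
              javax.servlet/servlet-api
              com.google.guava/guava]]
```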

@allansene, @senior I started a similar discussion in this issue. I was able to rebuild the project and make it work in my environment. There are some JDBC driver issues with older versions of HiveServer2 that I described there as well.

Hope it helps =)


Awesome, Lucas!

I’m gonna build with these modifications that you pointed out and try on my environment again. Thank you for the help!

It works!

@allansene do you know how I can configure the Hive queue in which Spark performs its queries? By default this setup runs in queue=root.default, and I need to change it to another one.

Never mind, I figured out how to select a specific queue in my DBConnector. You just need to add the following config in the extra params field: ?mapred.job.queue=[QUEUE_NAME] (source here)
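(For reference, anything in the extra-params field gets appended to the `jdbc:hive2://` connection URL, so the resulting string would look roughly like this; the host, port, database, and queue name below are placeholders:)

```
jdbc:hive2://spark-host:10000/default?mapred.job.queue=my_queue
```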

@allansene were you able to scan values and fields of your Hive table? In my version it doesn’t work.

Hey guys,
I got the same issue using the plugin.
I suppose the cause is hive-jdbc.
Could anybody help me rebuild the Spark deps with org.apache.hadoop.hive, please?
Thanks a lot!