[0.29.x: DONE] Connecting to local Spark


#1

Hi, guys!

I’m testing the new Spark SQL driver, but I’m not sure how to connect Metabase to my cluster. My Metabase is running on the same machine as my Spark master.

I’ve tried

  • Host: localhost
  • Host: local[*]
  • Host: spark://localhost

Always on port 7077, but the server returns this error:

05-02 17:48:25 ERROR metabase.driver :: Failed to connect to database: java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
05-02 17:48:25 DEBUG metabase.middleware :: POST /api/setup/validate 400 (3 ms) (0 DB calls).
{:errors {:dbname "java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration"}}

Can anyone help me? Thanks!


#2

I’ve tested both with the latest release, 0.29, and with a build from master.


#3

Hi @allansene,

Can you try removing the :exclusions for the Hive-JDBC dependency from project.clj? Remove lines 96-105 and replace the whole thing with

[org.spark-project.hive/hive-jdbc "1.2.1.spark2"]

Then try building again and see if it works.


#4

The build now fails from both master and v0.29 :frowning:

java.lang.IncompatibleClassChangeError: Implementing class, compiling:(jetty.clj:1:1)
Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class, compiling:(jetty.clj:1:1)

Full Stacktrace.


#5

Try

[org.spark-project.hive/hive-jdbc "1.2.1.spark2"
 :exclusions [jdk.tools
              org.codehaus.jackson/jackson-xc
              org.eclipse.jetty.aggregate/jetty-all
              org.mortbay.jetty/jetty]
 :classifier "standalone"]

#6

Now it builds, but I’m getting the same error.

05-02 20:24:48 ERROR metabase.driver :: Failed to connect to database: java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
05-02 20:24:49 DEBUG metabase.middleware :: POST /api/setup/validate 400 (3 ms) (0 DB calls).
{:errors {:dbname "java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration"}}

Maybe this is happening because I’m running Metabase as root. Does that make sense? I can only run it on port 80 as root.


#7

@senior any ideas? I’m starting to think the Hive JDBC driver didn’t have Hadoop stuff in it in the first place and we would need to add a separate dep? (Not sure why we need it, but it sounds like what we’d need to do to fix this error)
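
A quick way to check this hypothesis, as a rough sketch: a JAR is just a zip archive, so you can look inside the built uberjar for the class named in the error. The helper name and the JAR path in the usage note are assumptions for illustration, not anything that ships with Metabase.

```python
import zipfile

# Class named in the NoClassDefFoundError, written as a path inside the JAR.
MISSING_CLASS = "org/apache/hadoop/conf/Configuration.class"


def jar_contains_class(jar_path, class_entry=MISSING_CLASS):
    """Return True if the JAR (a zip archive) has an entry for the class.

    jar_contains_class is a hypothetical helper written for this thread.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return class_entry in jar.namelist()
```

For example, `jar_contains_class("target/uberjar/metabase.jar")` (assuming that is where your build puts the JAR) returning False would suggest the Hadoop classes really are missing from the build and need a separate dependency.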


#8

I’ve tried with this build that wjoel has made available and it worked.


#9

Getting the same error here. I tried to connect to Spark SQL using v0.29.0-RC1 and v0.29.0, and this java.lang.NoClassDefFoundError appeared in my logs.

I’m trying to connect to a remote Spark.


#10

@camsaul It looks like that might be right. I found some instructions: https://streever.atlassian.net/wiki/spaces/HADOOP/pages/4390924/HS2+JDBC+Client+Jars+Hive+Server2. That Hadoop JAR is pretty big (~4 MB), and it also has a ton of transitive dependencies. Looks like we’ll need to spend some time narrowing those transitive dependencies, but even then I think it’ll have an impact on memory and the JAR size.


#11

@allansene, @senior I started a similar discussion in this issue. I was able to rebuild the project and make it work in my environment. There are some JDBC driver issues with older versions of HiveServer2 that I described there as well.

Hope it helps =)


#12

Awesome, Lucas!

I’m gonna build with these modifications that you pointed out and try on my environment again. Thank you for the help!


#13

It works!


#14

@allansene do you know how I can configure the Hive queue in which Spark runs queries? By default, this setup runs in queue=root.default and I need to change it to another one.


#15

Never mind, I figured out how to select a specific queue in my DBConnector. You just need to add the following config in the extra params: ?mapred.job.queue=[QUEUE_NAME] (source here)
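
To make the queue setting above concrete, here is a minimal sketch of how the extra params end up on the connection string. It assumes a `jdbc:hive2://` style URL; the host, port 10000, database name, and the `hive_jdbc_url` helper are illustrative assumptions, and only the `mapred.job.queue` key comes from this post.

```python
from urllib.parse import quote


def hive_jdbc_url(host, port, database, queue_name, queue_param="mapred.job.queue"):
    """Build a Hive-style JDBC URL with the queue passed as an extra param.

    hive_jdbc_url is a hypothetical helper; queue_param defaults to the key
    quoted in this thread.
    """
    base = f"jdbc:hive2://{host}:{port}/{database}"
    # Percent-encode the queue name so unusual characters cannot break the URL.
    return f"{base}?{queue_param}={quote(queue_name, safe='')}"


# Placeholder values for illustration only.
print(hive_jdbc_url("localhost", 10000, "default", "root.analytics"))
```

In Metabase itself you would only paste the `?mapred.job.queue=...` part into the extra params field; the full URL is shown just to make clear where it lands.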

@allansene were you able to scan values and fields of your Hive table? In my version it doesn’t work.