Connecting Databricks Spark to Metabase

I have spent a few days trying to add a Databricks connection as a Spark SQL one, but with no success. Has anyone managed to do this?

I am using this tutorial from the Databricks documentation, but with no success. What I notice is that the connection protocol is jdbc:spark instead of jdbc:hive2, which seems to be the one the Metabase driver uses.
Databricks also provides a JAR with its own driver. Should I look for a Metabase driver built on top of it? Or should I be able to connect using plain Spark SQL?
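For reference, the URL from the cluster’s JDBC/ODBC tab looks roughly like this (hostname, org ID, cluster ID, and token are placeholders of mine), next to the format the built-in Metabase Spark SQL driver expects:

# Databricks-provided URL (Simba driver - note the jdbc:spark prefix):
jdbc:spark://<server-hostname>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<org-id>/<cluster-id>;AuthMech=3;UID=token;PWD=<personal-access-token>

# What the built-in Metabase Spark SQL driver expects (Hive protocol):
jdbc:hive2://<server-hostname>:<port>/<database>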

Could anyone help me?


Hi @dzilioti

It’s the first time I’ve seen someone talk about Databricks support, so I’m not sure how you should connect - or if you even can.

If Databricks are up for creating a driver that works with Metabase, then I think that will be your best chance of getting integration.

There are 99 other databases that have requested support, and most will likely never be supported unless someone else creates and maintains a driver.

Have you tried contacting Databricks to figure out if they have any knowledge about connecting with Metabase (probably using the current Spark driver)?

Thanks for the quick answer @flamber. I will try to contact Databricks support to see if I can get any help from there.
Do you have any tips for another data warehouse that works with Metabase, preferably open source?

I’m not sure, but currently Metabase “only” supports 15 drivers - Postgres and MySQL are among the best supported.
There are a few other drivers maintained by other people - have a look at the link I provided in my previous comment to see if one of the requests has a work-in-progress driver or is merely comments.


For other folks coming across this thread, there’s a community-built Metabase Databricks driver here: https://github.com/ifood/metabase-sparksql-databricks-driver

I’ve only tested the basic connection myself. The tricky part was making sure the JDBC flags and UID were correct. The UID/username is literally token, and the JDBC flags can be copied from the Databricks cluster’s JDBC info page.
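As a rough sketch, the values I ended up with in the admin form looked something like this (field labels are approximate, and the host, org ID, cluster ID, and token are placeholders - copy the real flag string from your own cluster’s JDBC info page):

Host:       <your-workspace>.cloud.databricks.com
Port:       443
Database:   default
Username:   token
Password:   <personal-access-token>
Additional JDBC options: transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<org-id>/<cluster-id>;AuthMech=3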

Currently that driver only works with v0.32.x and v0.33.x - it has not yet been updated to work with v0.34.x. Here’s a Dockerfile based on Metabase v0.33.6 that adds the plugin to the relevant location in the container:

FROM metabase/metabase:v0.33.6

ADD --chown=2000:2000 https://github.com/ifood/metabase-sparksql-databricks-driver/releases/download/1.0.0/sparksql-databricks.metabase-driver.jar /plugins/
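Building and running it is then the usual routine (the image tag is just an example):

docker build -t metabase-databricks .
docker run -d -p 3000:3000 --name metabase metabase-databricks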

@dacort,
I have followed the same link https://github.com/ifood/metabase-sparksql-databricks-driver, downloaded the JAR sparksql-databricks.metabase-driver.jar, and placed it in the …/plugins/ directory.

The Metabase version we are using is v0.33.1, and unfortunately we were not able to establish a connection. With all the fields filled in, clicking Save gives the error message below:

[Simba]SparkJDBCDriver Error setting/closing session: Open Session Error

To resolve the above error, we downloaded the custom Simba JAR SparkJDBC41.jar and placed it in /home/ubuntu/apps/metabase/plugins, but no luck - the error remains the same.

Please let me know if there is a way I can resolve this.

Hi @chandan
Try https://github.com/fhsgoncalves/metabase-sparksql-databricks-driver, which is the author’s own personal GitHub account - updates will likely only happen there.
Metabase 0.33.1 was a broken build, so I would highly recommend 0.33.7.3 (or perhaps the latest, 0.34.2).
Where did you download the Simba dependency from? It should be simba-spark-jdbc41-2.6.3.1003.jar and is downloaded automatically, but perhaps it couldn’t be saved to ./plugins/ because of permissions?
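A quick way to rule out the permissions theory (assuming Metabase runs as a dedicated metabase user - adjust to whatever user actually owns your process):

# Check who owns the plugins directory and whether the dependency JAR arrived
ls -l ./plugins/

# Make the directory writable by the user running Metabase
sudo chown -R metabase:metabase ./plugins/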

Thanks @flamber. Could you please provide me a link to download simba-spark-jdbc41-2.6.3.1003.jar? I have downloaded it from a different link, and I just want to know whether I have to place that JAR in ./plugins/ or in a different directory to resolve the error.

@chandan The dependency comes from the driver: https://github.com/fhsgoncalves/metabase-sparksql-databricks-driver/blob/master/project.clj
But the full download link (for driver version 1.0) is: https://dl.bintray.com/ifood/third/simba/simba-spark-jdbc41/2.6.3.1003/simba-spark-jdbc41-2.6.3.1003.jar
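So, assuming your plugins directory is ./plugins/, fetching it manually would be something like:

curl -L -o ./plugins/simba-spark-jdbc41-2.6.3.1003.jar \
  https://dl.bintray.com/ifood/third/simba/simba-spark-jdbc41/2.6.3.1003/simba-spark-jdbc41-2.6.3.1003.jar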

@flamber thanks. I wanted to confirm: do I have to place simba-spark-jdbc41-2.6.3.1003.jar in ./plugins/ or in any other directory?

@chandan I actually haven’t played with Databricks yet, but dependencies are also placed in ./plugins/. I would recommend not renaming any of the files, since otherwise it won’t be able to find them.
And then check the log on startup to make sure the driver loads correctly (or gives errors) during the plugin-loading process.
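One way to do that, assuming you start Metabase from the JAR (the exact log wording varies between versions):

# Capture startup output, then look for the plugin being loaded (or failing)
java -jar metabase.jar 2>&1 | tee metabase.log
grep -i -E "plugin|spark|databricks" metabase.log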

@flamber,
I have followed the steps you mentioned:
1. Installed Metabase version 0.33.7.3
2. Downloaded the JARs sparksql-databricks.metabase-driver.jar and simba-spark-jdbc41-2.6.3.1003.jar and placed them in the ./plugins directory
3. Restarted Metabase.

But it still displays the same error when I save the database connection:

[Simba]SparkJDBCDriver Error setting/closing session: Open Session Error.

Along with this, there is another Java stack trace; I’m not sure whether it is related to the Spark SQL Databricks connection.

Caused by: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
at java.base/sun.security.validator.PKIXValidator.<init>(PKIXValidator.java:89)
at java.base/sun.security.validator.Validator.getInstance(Validator.java:181)
at java.base/sun.security.ssl.X509TrustManagerImpl.getValidator(X509TrustManagerImpl.java:300)
at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrustedInit(X509TrustManagerImpl.java:176)
at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:189)
at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:110)
at com.simba.spark.hivecommon.utils.DSTrustManager.checkServerTrusted(Unknown Source)
at java.base/sun.security.ssl.AbstractTrustManagerWrapper.checkServerTrusted(SSLContextImpl.java:1510)
at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:625)
at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.onCertificate(CertificateMessage.java:460)
at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.consume(CertificateMessage.java:360)
at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392)
at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443)
at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:421)
at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:177)
at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:164)
at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1152)
at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1063)
at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:402)

@dacort, @flamber - let me know if any additional changes need to be made to fix this connection issue.

@chandan How are you starting Metabase? It seems like it’s a generic Java error:
https://stackoverflow.com/questions/6784463/error-trustanchors-parameter-must-be-non-empty
Try using a container built from a Dockerfile like the one Damon described earlier in this thread.

@flamber, thank you so much - we were able to resolve this issue and can now establish a connection.
In addition to the steps I mentioned earlier, there are a few more steps needed to resolve the Java error described above; please follow the link below (use the verified answer).

https://stackoverflow.com/questions/50571685/maven-trustanchors-parameter-must-be-non-empty-and-parent-relativepath-inva
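For anyone who doesn’t want to click through: as far as I understand it, the verified answer boils down to the JVM’s trust store being empty, which on a Debian/Ubuntu system (our setup) is usually fixed by regenerating it:

# Rebuild the Java CA trust store from the system certificates
sudo apt-get install --reinstall ca-certificates-java
sudo update-ca-certificates -f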

Hello @flamber and @chandan,

I’m with the same problem described in this GitHub issue: https://github.com/fhsgoncalves/metabase-sparksql-databricks-driver/issues/2

I can connect to my Databricks cluster, but this error occurs frequently, and the only solution is to restart Metabase.

Has someone experienced this before?

@relferreira, I faced the same problem as well, and it’s not resolved yet. Please let me know if you find any solution or workaround.

Hello @chandan, I’m suspecting that the error occurs when the Databricks cluster terminates. We have the following configuration in our cluster:

Terminate after 120 minutes of inactivity.

I’m not 100% sure, but manually shutting down the cluster gives me the same error.

I’ve spent a week trying to figure this error out, but I couldn’t.

It doesn’t seem to be a driver error, since everything works perfectly when I run it on my MacBook.

I tried running Metabase in Kubernetes and on an Azure VM, and this error occurs in both.
When I shut down my Databricks cluster and try to query it using Metabase, different things happen based on where I’m running.

On my local machine, the query wakes up my Databricks cluster and finishes successfully.
On Azure, the query fails with a 503, wakes up my Databricks cluster, and then the next query gives me the error described by @chandan.

Does anybody have a solution?

Thanks

@relferreira, no solution yet - still waiting to get it resolved. Did any of your server configuration changes help resolve the errors?

Really strange: running the Docker image with a bind mount, everything works, even when I restart my Databricks cluster.

docker run -it --rm -p 3000:3000 \
  --mount type=bind,source=/Users/relferreira/GitHub/metabase-sparksql-databricks-driver/plugins,destination=/plugins \
  --name metabase metabase/metabase

But with a Dockerfile like this:

FROM metabase/metabase

ENV MB_DB_CONNECTION_TIMEOUT_MS=600000

COPY plugins/* /plugins/

I can’t even connect to Databricks.
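One difference I notice between the two setups: Damon’s Dockerfile earlier in the thread used ADD --chown=2000:2000, while a plain COPY leaves the plugins owned by root, so Metabase (which apparently runs as UID 2000 in the official image, judging by that flag) may not be able to read or extract them. This is only a guess, but a variant worth trying:

FROM metabase/metabase

ENV MB_DB_CONNECTION_TIMEOUT_MS=600000

# Guess: copy the driver JARs with the same ownership Damon's ADD --chown used
COPY --chown=2000:2000 plugins/* /plugins/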