We just started playing with and using Metabase. We design our data warehouse using Spark, Cassandra, and Kafka.
Metabase is an excellent tool for building our BI dashboards, but to do so we need to pull the data into Metabase, and we believe the best way to do this is to move it directly from Kafka events.
But we are not sure if we should build a Kafka connector inside Metabase (using Clojure) or call the Metabase API layer from Kafka directly (as a plugin in Kafka).
In general we'd recommend connecting Metabase to something that can hit data at rest. In your case, I assume you're pushing an event stream through Kafka into Cassandra and then running Spark jobs against the data in Cassandra. Is that right?
If that's the case, I'd say the simplest solution would be to point Metabase at Cassandra directly. If you're willing and able to help us there, we'd love the help.
Regarding hitting Kafka directly, it would require a pretty serious design review, as currently we don't have any notion of streams in Metabase. Open to exploring it with you! Would you mind elaborating on your reasoning, either here or in a GitHub issue? Sounds very exciting, and a different angle than the one we were on =)
I have thought about a direct connection to Cassandra, but with 200+ TB of data and 120B documents, the idea did not feel right. Also, relying on CQL would give up a lot of the advantages!
Based on that image, I'd say that the best thing to point Metabase at would be the "initial aggregated data tables".
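To make the idea concrete, here's a minimal, self-contained sketch of the kind of hourly rollup an "initial aggregated data table" might hold. In practice this aggregation would be a Spark job writing to Cassandra; the event fields (`ts`, `type`, `value`) and the function name are hypothetical stand-ins for whatever the real Kafka events carry:

```python
from collections import defaultdict
from datetime import datetime, timezone

def rollup_hourly(events):
    """Aggregate raw events into per-hour, per-type count/total rows.

    Each (hour, type) row is what Metabase would query, instead of
    scanning 120B raw documents.
    """
    buckets = defaultdict(lambda: {"count": 0, "total": 0.0})
    for e in events:
        # Truncate the event timestamp (epoch seconds) to the hour.
        hour = datetime.fromtimestamp(e["ts"], tz=timezone.utc).replace(
            minute=0, second=0, microsecond=0
        )
        key = (hour.isoformat(), e["type"])
        buckets[key]["count"] += 1
        buckets[key]["total"] += e["value"]
    return dict(buckets)

rows = rollup_hourly([
    {"ts": 0,    "type": "click", "value": 1.0},
    {"ts": 60,   "type": "click", "value": 2.0},
    {"ts": 3600, "type": "view",  "value": 5.0},
])
# rows now has one row per (hour, type), e.g. two "click" events
# collapsed into a single hourly row with count=2, total=3.0.
```

A table shaped like this (hour, type, count, total) stays small regardless of raw event volume, which is what makes it a good target for a BI tool.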
In any event, we're actively interested in exploring what a truly real-time/streaming driver might look like and would love to work with you to figure out how best to do that. We also think a CQL/Cassandra driver might be useful to you and others, and if you're able and willing to write one, we'd love to include it with Metabase =)
More than happy to take the conversation offline if I can be useful in working through which of the intermediate tables might be best to use with Metabase, in case you're not comfortable sharing details in a public forum.