Data Studio Transformations First Impressions & Structured Feedback from Real-World Testing

I’ve just opened a GitHub issue after testing Data Studio Transformations.

My overall impression is very positive, especially around the transforms and the semantic layer structure.

Would love feedback from other users who tested it as well.

GitHub issue:


I played with Data Studio from out here in OSS-land for a couple of hours; in OSS that's limited to the glossary, the new version of the table metadata editor, and query-based transforms. I'm a solo user, so nobody sees my failed experiments but me.

The new metadata editor is much like the one in Admin, with more sliding panels. Having Segments and Measures accessible there is nice, especially since Segments was buried deep in Admin before. On the downside, editing those things requires scrolling, even on my huge display. It would fit better if the tab bar were at the top and the Attributes/Metadata panels were on a tab.

Transforms are the meat of the new features, and the query-based approach makes for a useful tool, as long as your data transforms well in one step. There's no way to set constraints or add data checks, so make sure your queries do the right thing. Iterative development is fast; the run button is within easy reach. Job scheduling is simple, though I can see people needing an every-4-or-6-hours option there. The run log is a nice touch. I'm a SQL guy, and the ability to create managed scratch tables with native SQL query support will help me keep my databases more organized.
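Since there's no way to declare constraints, the workaround is to build the checks into (or around) the transform query itself. A minimal sketch of that idea in Python, using sqlite3 as a stand-in for the warehouse; the `orders`/`daily_totals` tables and the 50% threshold are invented for illustration:

```python
import sqlite3

# Stand-in database; a real transform would run against your warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 19.99), (2, 5.00), (3, NULL);
""")

# One-step transform: aggregate into a scratch table, filtering out bad rows
# up front since the transform itself can't enforce constraints.
conn.executescript("""
    CREATE TABLE daily_totals AS
    SELECT COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    WHERE amount IS NOT NULL;
""")

# Poor man's data check: fail loudly if the transform dropped too many rows.
src = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
kept = conn.execute("SELECT n_orders FROM daily_totals").fetchone()[0]
assert kept >= 0.5 * src, f"transform dropped too many rows: {kept}/{src}"
print(kept, src)  # 2 3
```

The same pattern works in pure SQL too: make the transform query itself raise (e.g. divide by zero on a failed sanity check) so the scheduled run errors out instead of silently writing bad data.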

I wasn’t able to test the Python features since they’re not in OSS, but I read through the manual to see what they could do. The Python runner seems like a good start, but not being able to pull in your own modules is a big miss. The whole point of having a Python environment is to run cornerstone data science modules like PyTorch or scikit-learn. With it running in a walled garden, it’s not compelling compared to using plpython in PG, Jupyter notebooks that push data to a table, or good ol’ cron jobs. (Unless you are a pandas-or-death person; then I get it, it goes farther than the limited access in Superset.)
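The notebook/cron alternative mentioned above is simple enough to sketch: do the heavy computation in unrestricted Python, then push the results into a table Metabase can query like any other. This uses sqlite3 and the stdlib as stand-ins (a real job would use psycopg2 or SQLAlchemy against Postgres, and the compute step is where scikit-learn or PyTorch would go); all table names are made up:

```python
import sqlite3
import statistics

# Stand-in for a warehouse connection.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (sensor TEXT, value REAL);
    INSERT INTO readings VALUES ('a', 1.0), ('a', 3.0), ('b', 10.0);
""")

# Compute step in full-fat Python (swap in your ML library of choice here).
rows = conn.execute("SELECT sensor, value FROM readings").fetchall()
by_sensor = {}
for sensor, value in rows:
    by_sensor.setdefault(sensor, []).append(value)

# Push the results back to a table for Metabase to pick up.
conn.execute("CREATE TABLE sensor_means (sensor TEXT, mean REAL)")
conn.executemany(
    "INSERT INTO sensor_means VALUES (?, ?)",
    [(s, statistics.fmean(v)) for s, v in sorted(by_sensor.items())],
)
result = conn.execute(
    "SELECT sensor, mean FROM sensor_means ORDER BY sensor"
).fetchall()
print(result)  # [('a', 2.0), ('b', 10.0)]
```

Run that from cron (or a scheduled notebook) and the "walled garden" stops mattering, at the cost of the scheduling and run log living outside Metabase.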

On the other hand, Metabase calling an API to get data is a big wish-list feature. What would it take to get that API call documented so we can build our own runners, and a way to use that call to get data into a dashboard? 🙂
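To make the custom-runner idea concrete: if the payload format were documented, a runner would just need to map it onto warehouse inserts. Everything below is invented for illustration; no such API or payload schema is documented today:

```python
import json

# Hypothetical payload a custom runner might hand back to Metabase;
# the column/row schema here is made up.
payload = json.loads("""
{
  "columns": ["city", "population"],
  "rows": [["Oslo", 709000], ["Bergen", 291000]]
}
""")

def payload_to_insert(table: str, payload: dict) -> tuple[str, list]:
    """Turn an API payload into a parameterized INSERT for the warehouse."""
    cols = ", ".join(payload["columns"])
    marks = ", ".join("?" for _ in payload["columns"])
    sql = f"INSERT INTO {table} ({cols}) VALUES ({marks})"
    return sql, [tuple(r) for r in payload["rows"]]

sql, rows = payload_to_insert("cities", payload)
print(sql)   # INSERT INTO cities (city, population) VALUES (?, ?)
print(rows)  # [('Oslo', 709000), ('Bergen', 291000)]
```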


Transformations look very interesting. My current work involves pulling data from multiple sources (mostly MS SQL) with Apache Hop and storing it in a Postgres DB for Metabase to access.

I could probably replace most of the Hop code with transforms. The final step would be to run a stored procedure that refreshes all the materialized views.
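That final refresh step is easy to script as a post-transform job. A sketch that just builds the Postgres statements (view names are made up; a real job would execute them through psycopg2 or similar):

```python
def refresh_statements(views: list[str], concurrently: bool = True) -> list[str]:
    """Build REFRESH statements for a list of materialized views.

    CONCURRENTLY avoids locking readers, but requires each view to have
    a unique index.
    """
    kw = "CONCURRENTLY " if concurrently else ""
    return [f"REFRESH MATERIALIZED VIEW {kw}{v};" for v in views]

stmts = refresh_statements(["mv_sales_daily", "mv_stock_levels"])
for s in stmts:
    print(s)
# REFRESH MATERIALIZED VIEW CONCURRENTLY mv_sales_daily;
# REFRESH MATERIALIZED VIEW CONCURRENTLY mv_stock_levels;
```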

Could also use it for improved caching.

One nice extra I didn’t see: it would be helpful to take the saved question used for the transform and point any existing questions at the new transformed data.

As for the metalayer bit, it’s nice to see that non-admin users will be able to access it. However, if you have a properly designed star schema, a lot of this becomes unnecessary. Personally, I worry about BI metalayers because they can lead to vendor lock-in.


Great feedback, everyone!

I’m sharing it on GitHub.


I’ve had another thought on using Data Studio for ETL: unless it’s possible to define indexes, it will only be useful for small tables or as the source for some admin-defined materialised views.
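If the transform UI won't define indexes, the obvious workaround is a bit of plain DDL run after the transform. A sketch using sqlite3 as the stand-in database (table and index names are invented):

```python
import sqlite3

# Pretend this table was just produced by a transform run.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transformed_orders (customer_id INTEGER, total REAL)")

# Post-transform step: add the index the transform couldn't define.
conn.execute(
    "CREATE INDEX idx_transformed_orders_customer "
    "ON transformed_orders (customer_id)"
)

# Confirm the index exists (sqlite-specific introspection).
indexes = [r[1] for r in conn.execute("PRAGMA index_list('transformed_orders')")]
print(indexes)  # ['idx_transformed_orders_customer']
```

The catch is that if the transform recreates the table on each run rather than truncating it, the index would need to be recreated every time as well.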
