Internal-remapping-test failing for Exasol driver

I am trying to get all Metabase tests running with the Exasol driver. Currently the metabase.query-processor-test.breakout-test/internal-remapping-test test fails with the following error:

FAIL in metabase.query-processor-test.breakout-test/internal-remapping-test (breakout_test.clj:82)

:exasol With human readable values remapping test_data_venues.category_id -> {65 "Spanish", 70 "Tea Room", 62 "Seafood", 74 "Wine Bar", 7 "Bar", 59 "Ramen", 20 "Diner", 72 "Unknown", 58 "Pizza", 60 "Restaurant General", 27 "French", 1 "African", 69 "Tapas", 24 "Fashion", 55 "Nightclub", 39 "Italian", 46 "Late Dining", 4 "Asian", 54 "Museum", 15 "Chinese", 48 "Lounge", 50 "Mexican", 75 "Winery", 21 "Donut Shop", 31 "Greek", 32 "Grocery", 40 "Japanese", 56 "Nightlife", 33 "Health & Beauty", 13 "Café Sweets", 22 "English", 36 "Hot Dog", 41 "Jewish", 43 "Karaoke", 61 "Scandinavian", 29 "German", 44 "Korean", 6 "Bakery", 28 "Gay Bar", 64 "Southern", 51 "Middle Eastern", 25 "Fast Food", 34 "Home", 17 "Comedy Club", 3 "Artisan", 12 "Café", 2 "American", 66 "Stadium", 23 "Entertainment", 47 "Latin American", 35 "Hostel", 19 "Dim Sum", 57 "Outdoors", 68 "Strip Club", 11 "Burger", 9 "Breakfast / Brunch", 5 "BBQ", 14 "Caribbean", 45 "Landmark", 53 "Moroccan", 26 "Food Truck", 16 "Coffee Shop", 38 "Indian", 30 "Gluten-free", 73 "Vegetarian / Vegan", 10 "Brewery", 18 "Deli", 52 "Molecular Gastronomy", 67 "Steakhouse", 71 "Thai", 42 "Juice Bar", 37 "Hotel", 63 "South Pacific", 8 "Beer Garden", 49 "Mediterannian"}
expected: [[2 8 "American"]
           [3 2 "Artisan"]
           [4 2 "Asian"]
           [5 7 "BBQ"]
           [6 2 "Bakery"]]
  actual: ([[2 8 nil] [3 2 nil] [4 2 nil] [5 7 nil] [6 2 nil]])

The same test succeeds with the h2 driver. Both the Exasol and h2 driver basically generate the same two SQL statements:

-- H2 queries
SELECT "PUBLIC"."CATEGORIES"."ID" AS "ID",
    "PUBLIC"."CATEGORIES"."NAME" AS "NAME"
FROM "PUBLIC"."CATEGORIES"
LIMIT 1048575;

SELECT "PUBLIC"."VENUES"."CATEGORY_ID" AS "CATEGORY_ID",
    count(*) AS "count"
FROM "PUBLIC"."VENUES"
GROUP BY "PUBLIC"."VENUES"."CATEGORY_ID"
ORDER BY "PUBLIC"."VENUES"."CATEGORY_ID" ASC
LIMIT 5;

-- Exasol queries
SELECT "CAM_79"."test_data_categories"."id" AS "id",
    "CAM_79"."test_data_categories"."name" AS "name"
FROM "CAM_79"."test_data_categories"
LIMIT 1048575;

SELECT "CAM_79"."test_data_venues"."category_id" AS "category_id",
    count(*) AS "count"
FROM "CAM_79"."test_data_venues"
GROUP BY "CAM_79"."test_data_venues"."category_id"
ORDER BY "CAM_79"."test_data_venues"."category_id" ASC
LIMIT 5;

The VENUES queries for h2 and Exasol return only two columns (category_id and count) but not the category name. So I assume that Metabase joins the data internally, adding the category name to the result, but this doesn't work for Exasol.

Probably something is missing in the Exasol driver, but I didn't find anything obvious by comparing it with the h2 or oracle driver.

Thank you for your help!

Hi @kaklakariada
Which type of driver does Exasol most behave similar to? I'm guessing it's not H2, so I would recommend looking at the closest matching driver - even other community drivers.

We've just a fixed a bunch of remapping issues and updated tests for those, so make sure you're updated on master.

But it could be caused by something else you've changed somewhere in the driver to handle another problem, which then causes problems for remapping.

Hi @flamber,
Thank you for your reply! The Exasol driver is mostly based on the Oracle driver and I think there are no relevant differences.
The tests also fail on master.
Can you please point me to the Metabase source code that handles this feature? Then I can debug the issue myself.

@kaklakariada Remapping is handled several places, but I think you're asking for this:
https://github.com/metabase/metabase/blob/master/src/metabase/query_processor/middleware/add_dimension_projections.clj
You'll also want to look at these PRs:
https://github.com/metabase/metabase/pulls?q=is%3Aopen+is%3Apr+author%3Acamsaul+remapping

@flamber: Thank you for the link!
Looks like function transform-values-for-col converts the IDs to BitInteger. That's why the comparison with Long fails and no mapping is found. The reason is that the base_type for Exasol is :type/Decimal, and :type/Integer for H2 (where the tests succeed).

I will investigate why the base_type is wrong...

@kaklakariada Ohh yeah, decimal type causes the weirdest behaviors when remapping - and prevents some database types from allowing remapping (like my arch nemesis MongoDB). I've had loads to "fun" with remapping last year, where I created a handful of issues about it.

@flamber I feel your pain! Timezones are also fun :stuck_out_tongue:
Changing sql-jdbc.sync/database-type->base-type to return :type/BigInteger instead of :type/Decimal fixed the tests.

But I guess this will have performance implications. Do you have any recommendations?

I will try if modifying tx/aggregate-column-info could help.

@kaklakariada Don't even get me started on timezones - or even worse DST :wink:

I don't think you'll have performance implications. Though I don't know Exasol, but EXPLAIN ANALYZE is your friend.
This is getting into a territory, where you need to understand the underpinnings of what the database does in its query planner, and then even the CPU comes into play.

I finally managed to fix the tests. Root cause was a mismatch in the column types reported by connection.getMetaData().getColumns() and resultSet.getMetaData(). Using :type/Decimal consistently fixed the tests: https://github.com/exasol/metabase-driver/pull/31 :partying_face:

@flamber: thanks again for pointing me into the right direction!

1 Like