Problem with Data?

So I have a dataset that I want to group by a column called ‘TYPE’, values here are ‘REN’ or ‘NB’. When I create any graph using this, Metabase seems to think there are two distinct version of ‘REN’ and displays them separately (you can see this in the attached screenshot.) I’ve used phpMysqlAdmin to look at the data and it doesn’t register any difference. I’ve run a global update within mysql to ensure that some of this data isn’t space-padded etc, but Metabase still seems to see them as different.

I’m using the Mtabase Jar version 0.28.3, although it was the same in the previous version. Running on Ubuntu with MariaDB v 5.5.59.

What might be happening here and how can I stop it??

Thanks,

Tony

That looks like your data being the problem.
In MySQL, run a query:
select type, count(*) from yourTable group by Type
See how many types it returns.

Hi! Thanks for the reply…

Yes, I’ve tried this and it only finds the 2 types - ‘REN’ and ‘NB’

If you view the SQL for your chart, what happens if you reuse the SQL in MySQL?

Well, I’m aggregating data into a months value so I see a list of Type (REN or NB), a Count, and a date.

But as it seems to affect a whole month rather than partial days within the month, I only see what I expect to see.

i.e. if it affected a partial month I’d see
Feb, NB, 100
Feb, REN, 100
Feb, REN, 50

But I see what I would expect:
FEB, NB, 100
FEB, REN, 150

Can you share the SQL?

It’s essentially this. I had to change the ‘?’ to the appropriate filter values.

SELECT count(*) AS count, str_to_date(concat(date_format(MIDATA.CreatedDate1, ‘%Y-%m’), ‘-01’), ‘%Y-%m-%d’) AS CreatedDate1, MIDATA.Type AS Type
FROM MIDATA
WHERE (MIDATA.Agent <> ?
AND MIDATA.Product1 <> ? AND (MIDATA.Type = ‘NB’
OR MIDATA.Type = ‘REN’) AND (MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ? AND MIDATA.AddOnName <> ?))
GROUP BY str_to_date(concat(date_format(MIDATA.CreatedDate1, ‘%Y-%m’), ‘-01’), ‘%Y-%m-%d’), MIDATA.Type
ORDER BY str_to_date(concat(date_format(MIDATA.CreatedDate1, ‘%Y-%m’), ‘-01’), ‘%Y-%m-%d’) ASC, MIDATA.Type ASC

I’ve create a near identical chart here (modified slightly of MS SQL)
image
My query:

 from Orders o
 inner join OrdersDetail od on o.OrderID = od.OrderID
 inner join Product p on p.ProductID = od.ProductID
  group by cast(cast(100 * year(OrderDate) +  month(OrderDate) as varchar) + '01' as date), ProductClass
  order by  cast(cast(100 * year(OrderDate) +  month(OrderDate) as varchar) + '01' as date), ProductClass

All I can suggest is to remove all the filters and make your query as simple as possible. Build it up from there. It looks like something is odd in the data.

Thanks for the suggestion, I’ll give that a try. My feeling is that there is something in the data causing this, but its not obvious what it is!

Even more curious. I’ve created a small query just based on this ‘Type’ and the date. When I look at the data via month I see this:

So you can see the two ‘REN’ columns, yet no month apparently has entries for both types of ‘REN’. However, if I look at January 2017 and view by day, I see this (see next post):

Reminds me of the Turkish locale debacle in Metabase Centos İnstall Error- PLEASE HELP (Turkish locale) :open_mouth::scream: … always fear them encoding and locale issues. I guess there could be lot’s of different E characters as well …

edit: hah, that’s actually an idea for a possible extension of the :alembic: X-ray feature … detect character anomalies/outliers/spelling mistaeks in dimension labels … oh well it’s not like Metabase is missing feature requests already!

I’m not sure what is happening. I’m going to have a look at the data for Jan 3 and Jan 4 and see if I can see any difference in the data. I’m not hopeful though.