More narrow SQL query wielding larger dataset

I'm working on updating some dashboards for our installation of metabase. My first query just included everyone in the database who has subscribed to a certain message.

The 2nd query that someone else built was intended to filter out people who had been banned or blocked.

For some reason, the 2nd query returned a larger dataset (69k vs. 90k) even though it was designed to narrow the list.

SELECT count(*) AS count
FROM notif_option
LEFT JOIN user User ON notif_option.uid = User.uid
WHERE (notif_option.type_id = 20
AND User.last_alive_time >= str_to_date(concat(date_format(date_add(now(6), INTERVAL -12 month), '%Y-%m'), '-01'), '%Y-%m-%d') AND User.last_alive_time < str_to_date(concat(date_format(now(6), '%Y-%m'), '-01'), '%Y-%m-%d') AND notif_option.pm = 'Y')

<>

SELECT
DISTINCT u.username
FROM user u
JOIN notif_option n ON n.uid = u.uid
WHERE u.account_closed = 'N'
AND u.ban_database = 'N'
AND u.can_buy = 'Y'
AND u.can_sell = 'Y'
AND u.last_alive_time >= DATE_SUB(NOW(), INTERVAL 6 MONTH)
AND n.type_id = {{notification_type}} #21 is the new GDPR safe option
AND n.email = 'Y'
AND u.uid NOT IN (SELECT uid FROM banned_user)
AND u.username NOT IN (SELECT autobanned_by FROM banned_ip)
AND u.uid NOT IN (SELECT uid FROM u_blocks WHERE blocked_uid = 6505050)

Any idea why the 2nd one would return a larger set of data?

Hi @coryhuff
I don't think anyone would be able to give you an answer without access to your data.
Take the first query and then add more WHERE-clauses, so it's closer to what you have in the second query, then you should get less results.

1 Like