Missing Columns in Views

Hey, I'm trying to use metabase in combination with rudderstack and BigQuery.
The issue I'm facing is that some columns that appear in views in Big Query are not present in Metabase. All logs that I can see report success, and if I execute a custom SQL query referencing the missing fields, the query will execute and the result is as expected.
While this is a workaround, it prevents us from using (imo) one of the killer feature of metabase, namely the nice query builder for not too technical people.

Does anyone know how to fix this?

Hi @KillerX
Post "Diagnostic Info" from Admin > Troubleshooting.
And if the Views are accessible in the same project and dataset.
Also read this: https://www.metabase.com/docs/latest/troubleshooting-guide/sync-fingerprint-scan.html

Otherwise try enabling debug logging:
JAR - example, expecting MB_DB_* variables are already used otherwise include those as well:

java -Dlog4j.configurationFile="https://log4j.us/v2/templates/metabase?trace=metabase.sync" -jar metabase.jar

Docker - example, add this environmental variable to the docker run ... command:

... -e JAVA_OPTS="-Dlog4j.configurationFile=https://log4j.us/v2/templates/metabase?trace=metabase.sync" ...

AWS Elastic Beanstalk (EBS) - Configuration > Software > Environment properties:

JAVA_OPTS   :   -Dlog4j.configurationFile=https://log4j.us/v2/templates/metabase?trace=metabase.sync

First, thanks for the quick answer.

Diag info is below:

{
  "browser-info": {
    "language": "en-US",
    "platform": "Linux x86_64",
    "userAgent": "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0",
    "vendor": ""
  },
  "system-info": {
    "file.encoding": "UTF-8",
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "11.0.13+8",
    "java.vendor": "Eclipse Adoptium",
    "java.vendor.url": "https://adoptium.net/",
    "java.version": "11.0.13",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "11.0.13+8",
    "os.name": "Linux",
    "os.version": "5.4.120+",
    "user.language": "en",
    "user.timezone": "UTC"
  },
  "metabase-info": {
    "databases": [
      "h2",
      "bigquery-cloud-sdk"
    ],
    "hosting-env": "unknown",
    "application-database": "postgres",
    "application-database-details": {
      "database": {
        "name": "PostgreSQL",
        "version": "14.0 (Ubuntu 14.0-1.pgdg18.04+1)"
      },
      "jdbc-driver": {
        "name": "PostgreSQL JDBC Driver",
        "version": "42.2.23"
      }
    },
    "run-mode": "prod",
    "version": {
      "date": "2021-11-09",
      "tag": "v0.41.2",
      "branch": "release-x.41.x",
      "hash": "ad599fd"
    },
    "settings": {
      "report-timezone": "Europe/Oslo"
    }
  }
}

As far as I can tell the views should not have any different permissions compared to the tables. They live side by side in the same data set.

I didn't see any relevant information there. The syncs complete correctly (new fields appear in various tables and views when updated), and there are not errors reported related to the scans that I can see.

I will try that next, but it might take a day or two before I get around to figuring out how to do it properly as it's currently running on GKE. Probably easiest to deploy a custom image with the changes you mentioned for docker above.

@KillerX

The syncs complete correctly (new fields appear in various tables and views when updated)

Perhaps I'm misunderstanding this, but if new fields appear, when you sync, then doesn't it work?

There is a "random" set of fields that never appears, even if they exist in the view from the start. But new fields do show up. There is a table that has the exact same set of fields and they appear there.

@KillerX Okay, if you can provide the DDL (both the table and view), so it's possible to reproduce, then it's a lot easier to figure out what the cause could be.

Note: I do not have direct control over the DDL as it is managed directly by Rudderstack

This is the table schema:

CREATE TABLE `removed.rudderstack_main.airplay_started`
(
  original_timestamp TIMESTAMP,
  context_screen STRING,
  timestamp TIMESTAMP,
  context_release_version_major INT64,
  context_app_language_list_separator STRING,
  context_app_language_culture_name STRING,
  id STRING,
  context_device_info STRING,
  context_library_version STRING,
  context_channel STRING,
  context_os STRING,
  context_release_version_minor INT64,
  uuid_ts TIMESTAMP,
  anonymous_id STRING,
  context_locale_culture_name STRING,
  context_was_online BOOL,
  context_library_name STRING,
  context_release_version_revision INT64,
  sent_at TIMESTAMP,
  context_locale_list_separator STRING,
  event STRING,
  context_locale_is_right_to_left BOOL,
  context_app_language_is_right_to_left BOOL,
  context_timezone STRING,
  received_at TIMESTAMP,
  event_text STRING,
  context_preson_id STRING,
  context_request_ip STRING,
  context_release_version_build INT64,
  loaded_at TIMESTAMP,
  context_ip STRING,
  context_device_info_version_minor INT64,
  context_screen_density INT64,
  context_device_info_model STRING,
  context_app_version STRING,
  context_device_info_idiom STRING,
  context_screen_orientation STRING,
  context_device_info_manufacturer STRING,
  context_device_info_platform STRING,
  context_screen_rotation STRING,
  context_screen_width INT64,
  context_screen_height INT64,
  context_device_info_version_string STRING,
  context_device_info_version_build INT64,
  context_device_info_version_major INT64,
  context_device_info_device_type STRING,
  context_screen_refresh_rate INT64,
  context_device_info_version_revision INT64,
  context_destination_type STRING,
  context_source_id STRING,
  context_source_type STRING,
  context_destination_id STRING,
  user_id STRING
)
PARTITION BY TIMESTAMP_TRUNC(_PARTITIONTIME, DAY);

And the view

SELECT * EXCEPT (__row_number) FROM (
			SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY loaded_at DESC ) AS __row_number FROM `removed.rudderstack_main.airplay_started` WHERE _PARTITIONTIME BETWEEN TIMESTAMP_TRUNC(TIMESTAMP_MICROS(UNIX_MICROS(CURRENT_TIMESTAMP()) - 60 * 60 * 60 * 24 * 1000000), DAY, 'UTC')
					AND TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY, 'UTC')
			)
		WHERE __row_number = 1

It appears that there the columns that are missing are in the second half of the columns. This is consistent for several table/view combinations that follow the same pattern of DDL as this one. The number of columns varies, but the cutoff always seems to be at about 50 - 60% mark.

Below is a visual diff of the columns copied from Show original schema tab

@KillerX Can you upgrade to 0.41.3.1 and then also try with the old BigQuery driver (you can change that in Admin > Databases > your-db)
Something makes me think that this issue is somehow involved:
https://github.com/metabase/metabase/issues/5904
I haven't tried reproducing this yet.

Hey, I have upgraded and created a new DB with the old driver but the results seem the same. I have also forced a rescan with the new driver and that did not produce anything new. I will attempt to add debugging during the day today to see if there is more information I can provide.

Here is the new diag dunp:

{
  "browser-info": {
    "language": "en-US",
    "platform": "Linux x86_64",
    "userAgent": "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0",
    "vendor": ""
  },
  "system-info": {
    "file.encoding": "UTF-8",
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "11.0.13+8",
    "java.vendor": "Eclipse Adoptium",
    "java.vendor.url": "https://adoptium.net/",
    "java.version": "11.0.13",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "11.0.13+8",
    "os.name": "Linux",
    "os.version": "5.4.120+",
    "user.language": "en",
    "user.timezone": "UTC"
  },
  "metabase-info": {
    "databases": [
      "bigquery-cloud-sdk",
      "h2",
      "bigquery"
    ],
    "hosting-env": "unknown",
    "application-database": "postgres",
    "application-database-details": {
      "database": {
        "name": "PostgreSQL",
        "version": "14.0 (Ubuntu 14.0-1.pgdg18.04+1)"
      },
      "jdbc-driver": {
        "name": "PostgreSQL JDBC Driver",
        "version": "42.2.23"
      }
    },
    "run-mode": "prod",
    "version": {
      "date": "2021-12-02",
      "tag": "v0.41.3.1",
      "branch": "release-x.41.x",
      "hash": "20f9097"
    },
    "settings": {
      "report-timezone": "Europe/Oslo"
    }
  }
}