Crashing after db update-field-values

Hi, after the latest updates my Metabase installation crashes every night after the scheduled update of DB field values.
It also crashes after I click "Re-scan field values now" in the admin panel.

Error is:
# Internal Error (debug.cpp:338), pid=1, tid=51
# fatal error: OutOfMemory encountered: Java heap space

I tried increasing the Java heap space, but it didn't help.
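
For reference, I passed the extra heap to the container the usual way, via JAVA_OPTS (the value here is illustrative, not my exact command):

... -e JAVA_OPTS="-Xmx2g" ...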

Logs with dumps are here:

{
  "browser-info": {
    "language": "en-US",
    "platform": "MacIntel",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36",
    "vendor": "Google Inc."
  },
  "system-info": {
    "file.encoding": "UTF-8",
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "11.0.11+9",
    "java.vendor": "AdoptOpenJDK",
    "java.vendor.url": "https://adoptopenjdk.net/",
    "java.version": "11.0.11",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "11.0.11+9",
    "os.name": "Linux",
    "os.version": "4.4.0-1128-aws",
    "user.language": "en",
    "user.timezone": "GMT"
  },
  "metabase-info": {
    "databases": [
      "postgres"
    ],
    "hosting-env": "unknown",
    "application-database": "h2",
    "application-database-details": {
      "database": {
        "name": "H2",
        "version": "1.4.197 (2018-03-18)"
      },
      "jdbc-driver": {
        "name": "H2 JDBC Driver",
        "version": "1.4.197 (2018-03-18)"
      }
    },
    "run-mode": "prod",
    "version": {
      "date": "2021-04-27",
      "tag": "v0.39.1",
      "branch": "release-x.39.x",
      "hash": "6beba48"
    },
    "settings": {
      "report-timezone": "Europe/Moscow"
    }
  }
}

Hi @avasenin

While Metabase isn't supposed to crash, it can happen, since it's bound by the limits of the Java process.

It's difficult to know how much memory is actually available. It looks like the host has 4GB and the Metabase container is allowed to consume 3GB, but if you have other things running on the host, then there might be less than 3GB actually available, while the Java process will still try to consume up to 3GB, which will fail at some point.
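
One way to keep the two in sync is to give the container an explicit memory limit and set the JVM heap somewhat below it, so the Java process can never ask for more than the container actually has. A rough sketch (the sizes and container name are examples only, adjust to your host):

# cap the container at 3GB and the JVM heap at 2GB, leaving headroom
docker run -d -p 3000:3000 -m 3g -e JAVA_OPTS="-Xmx2g" --name metabase metabase/metabase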

For reference: https://www.metabase.com/docs/latest/troubleshooting-guide/running.html

Try starting your container with debug logging, which should help narrow down where the scan process is having problems:
... -e JAVA_OPTS="-Dlog4j.configurationFile=https://log4j.us/v2/templates/metabase?trace=metabase.sync" ...

And you should migrate away from H2 if you're using Metabase in production:
https://www.metabase.com/docs/latest/operations-guide/migrating-from-h2.html
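
For what it's worth, the migration itself is a single command run while Metabase is stopped; a minimal sketch, assuming a Postgres target (all connection values below are placeholders):

export MB_DB_TYPE=postgres
export MB_DB_DBNAME=metabase
export MB_DB_HOST=localhost
export MB_DB_PORT=5432
export MB_DB_USER=<user>
export MB_DB_PASS=<password>
# point at the H2 file, without the .mv.db extension
java -jar metabase.jar load-from-h2 /path/to/metabase.db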

Hi, thank you for your reply.

Before this problem appeared, I was using a host with 2GB of RAM without specifying how much memory Java could use, and everything worked fine.
There is only one other container on this host: the PostgreSQL database.

My installation is not used very often, and I don't understand why it needs so much memory.

Now I performed a test:

  1. Removed the container and started a new one without specifying memory for Java (it defaulted to 1GB).
  2. After Metabase came up, triggered a re-scan of field values.
  3. It crashed with the same error again.

Why does Metabase need so much memory while scanning field values?

I've enabled debug logging; you can see the new logs here:

I've also looked through these logs and found some interesting things.

  1. There is a field called "send_files".
  2. Syncing this field's values, or trying to fetch it via SQL, crashes Metabase and the container.

I've hidden this table to disable scanning of its values, and now the scan of all the remaining DB fields completes correctly.

Do you know why syncing this field's values causes Metabase to use all available memory?

@avasenin Can you go to this URL in Metabase and post the output here: /api/field/524
It should tell us a lot about the send_files column. I'm guessing the column contains a large amount of data that Metabase should skip during the scan but for some reason isn't skipping, so it would be great if you could provide as much information about the column as possible, so we can try to reproduce the problem and create a fix.
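
If it's easier from a terminal, something like this should return the same JSON (host and credentials are placeholders; the session id comes back from POST /api/session):

# log in to get a session id
curl -s -X POST http://localhost:3000/api/session \
  -H "Content-Type: application/json" \
  -d '{"username": "<email>", "password": "<password>"}'
# then fetch the field metadata with that id
curl -s http://localhost:3000/api/field/524 -H "X-Metabase-Session: <session-id>"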

{
  "description": "(DC2Type:json)",
  "database_type": "text",
  "semantic_type": "type/Category",
  "table_id": 135,
  "coercion_strategy": null,
  "table": {
    "description": null,
    "entity_type": "entity/GenericTable",
    "schema": "public",
    "db": {
      "description": null,
      "features": [
        "full-join",
        "basic-aggregations",
        "standard-deviation-aggregations",
        "expression-aggregations",
        "percentile-aggregations",
        "foreign-keys",
        "right-join",
        "left-join",
        "native-parameters",
        "nested-queries",
        "expressions",
        "set-timezone",
        "regex",
        "case-sensitivity-string-filter-options",
        "binning",
        "inner-join",
        "advanced-math-expressions"
      ],
      "cache_field_values_schedule": "0 0 5 ? *",
      "timezone": "Europe/Moscow",
      "auto_run_queries": true,
      "metadata_sync_schedule": "0 37 ? *",
      "name": "inda",
      "caveats": null,
      "is_full_sync": true,
      "updated_at": "2021-05-07T19:18:14.171",
      "details": {
        "let-user-control-scheduling": false
      },
      "is_sample": false,
      "id": 35,
      "is_on_demand": false,
      "options": null,
      "engine": "postgres",
      "refingerprint": null,
      "created_at": "2020-11-24T12:13:05.996",
      "points_of_interest": null
    },
    "show_in_getting_started": false,
    "name": "booking_telegram_message",
    "caveats": null,
    "updated_at": "2021-05-16T13:08:57.345",
    "entity_name": null,
    "active": true,
    "id": 135,
    "db_id": 35,
    "visibility_type": "technical",
    "field_order": "database",
    "display_name": "Booking Telegram Message",
    "created_at": "2021-02-09T19:00:12.206",
    "points_of_interest": null
  },
  "name": "send_files",
  "fingerprint_version": 5,
  "has_field_values": "list",
  "settings": null,
  "caveats": null,
  "fk_target_field_id": null,
  "dimensions": [],
  "updated_at": "2021-04-09T09:37:26.207",
  "custom_position": 0,
  "effective_type": "type/Text",
  "active": true,
  "parent_id": null,
  "id": 524,
  "last_analyzed": "2021-04-09T09:37:27.617",
  "position": 0,
  "visibility_type": "normal",
  "preview_display": true,
  "display_name": "Send Files",
  "database_position": 15,
  "name_field": null,
  "fingerprint": {
    "global": {
      "distinct-count": 39,
      "nil%": 0
    },
    "type": {
      "type/Text": {
        "percent-json": 1,
        "percent-url": 0,
        "percent-email": 0,
        "percent-state": 0,
        "average-length": 365.9663056558363
      }
    }
  },
  "created_at": "2021-02-09T19:00:13.793",
  "base_type": "type/Text",
  "points_of_interest": null
}

Yes, you are right.
It's not a correct design, but this field contains sent images. We will fix this in the future.
I've sent you the output.

@avasenin Would you be able to provide a dump of one of the values? It has to be exact, so if there are spaces or newlines before/after, those have to be included.
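
One way to capture a value with no extra formatting is psql's tuples-only, unaligned output redirected to a file (database and table names taken from your field output above):

# -t: no headers, -A: no column padding
psql -d inda -t -A -c "SELECT send_files FROM booking_telegram_message LIMIT 1;" > send_files_sample.txt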

I can see we're fingerprinting it as JSON, and it seems like you only have 39 distinct values of send_files, and they don't look that long (an average of 365 characters), so I'm really wondering how such a small result could lead to an OOM, though it might be some recursive problem.
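
Also, an average can hide outliers, so it may be worth checking whether a few rows are far larger than the rest; a quick check against the same table (names again from your field output):

psql -d inda -c "
  SELECT count(DISTINCT send_files) AS distinct_values,
         round(avg(length(send_files))) AS avg_len,
         max(length(send_files)) AS max_len
  FROM booking_telegram_message;"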

I will ask the developers to send a dump.
I also tried to fetch it via Metabase SQL, and that also leads to a crash.