Multiple Out of Memory Errors

Hi,

Ive noticed a lot of OOM errors on my setup lately and would like some help figuring out how to manage this.
First,

  • OOM error running step ''fingerprint-fields'' for vertica
  • OOM error: Ran out of memory retrieving query results.

Below are more details on the same

OOM error running step ''fingerprint-fields'' for vertica

WARN sync.util :: Error running step ''fingerprint-fields'' for vertica''
java.lang.OutOfMemoryError: Java heap space
Caused by: Attached Guest Language Frames (16)
2024-11-18 12:02:13,563 INFO sync.util :: #033[35mFINISHED: step ''fingerprint-fields'' for vertica'' (1.5 hours)#033[0m
2024-11-18 12:02:13,563 INFO sync.util :: #033[35mSTARTING: step ''classify-fields'' for vertica''#033[0m
2024-11-18 12:02:13,577 DEBUG middleware.log :: #033[32mGET /api/timeline 200 1.1 hours (7 DB calls) App DB connections: 44/150 Jetty threads: 45/50 (0 idle, 109 queued) (227 total active threads) Queries in flight: 15 (0 queued) {:metabase-user-id 667}#033[0m
2024-11-18 12:02:13,586 DEBUG middleware.log :: #033[32mGET /api/alert/question/25890 200 42.8 s (1 DB calls) App DB connections: 41/150 Jetty threads: 45/50 (0 idle, 109 queued) (228 total active threads) Queries in flight: 15 (0 queued) {:metabase-user-id 507}#033[0m
2024-11-18 12:02:19,876 DEBUG middleware.log :: #033[32mGET /api/alert/question/17939 200 6.3 s (1 DB calls) App DB connections: 54/150 Jetty threads: 45/50 (0 idle, 91 queued) (230 total active threads) Queries in flight: 19 (0 queued) {:metabase-user-id 507}#033[0m
2024-11-18 12:02:19,900 INFO middleware.exceptions :: Request canceled before finishing.
2024-11-18 12:02:19,904 INFO middleware.exceptions :: Request canceled before finishing.
2024-11-18 12:02:26,124 ERROR metabase.server :: Unexpected Exception in API request handler
org.eclipse.jetty.io.EofException

This issue has made it impossible to sync Vertica dbs in the instance with the frequent timeouts due to it. I have to sync missing data manually on a table by table basis which is a pain

OOM error; Pulse card render error

  :at [org.postgresql.core.v3.QueryExecutorImpl processResults "QueryExecutorImpl.java" 2383]}
 {:type org.postgresql.util.PSQLException,
  :message "Ran out of memory retrieving query results.",
  :at [org.postgresql.core.v3.QueryExecutorImpl processResults "QueryExecutorImpl.java" 2383]}
 {:type java.lang.OutOfMemoryError, :message "Java heap space"}
 {:type com.oracle.truffle.api.TruffleStackTrace$LazyStackTrace}],
:trace [],
:action_id nil,
:json_query
{:constraints {:max-results 10000, :max-results-bare-rows 2000},
 :type :query,
 :middleware {:js-int-to-string? true, :ignore-cached-results? false, :userland-query? true},
 :cache-strategy {:multiplier 10, :min_duration_ms 30000, :type :ttl, :avg-execution-ms 9116},

Diagnostic Info

{
  "browser-info": {
    "language": "en-GB",
    "platform": "MacIntel",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36",
    "vendor": "Google Inc."
  },
  "metabase-info": {
    "databases": [
      "postgres",
      "starburst",
      "vertica",
      "clickhouse",
      "h2"
    ],
    "run-mode": "prod",
    "plan-alias": "",
    "version": {
      "date": "2024-11-04",
      "tag": "v0.51.2",
      "hash": "8bdb22c"
    },
    "settings": {
      "report-timezone": null
    },
    "hosting-env": "unknown",
    "application-database": "postgres",
    "application-database-details": {
      "database": {
        "name": "PostgreSQL",
        "version": "15.3"
      },
      "jdbc-driver": {
        "name": "PostgreSQL JDBC Driver",
        "version": "42.7.3"
      }
    }
  },
  "system-info": {
    "file.encoding": "UTF-8",
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "11.0.22+7",
    "java.vendor": "Eclipse Adoptium",
    "java.vendor.url": "https://adoptium.net/",
    "java.version": "11.0.22",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "11.0.22+7",
    "os.name": "Linux",
    "os.version": "4.18.0-372.19.1.el8_6.x86_64",
    "user.language": "en",
    "user.timezone": "Etc/UTC"
  }
}

The machine has 7 cores and 32GB of RAM. Below is my init

ExecStart=/usr/java/latest/bin/java -Dlog4j.configuration="/opt/metabase/config" -Xms20g -Xmx25g -XX:MaxGCPauseMillis=320 -jar /opt/metabase/metabase.jar

Maybe someone can advise what options I can look into to manage memory usage for this instance?
In the end, I would like the instance to support all users during operation hours and also be able to run full syncs on databases through the night.

Thanks

Hi @kweyukevin, is there any way we can deploy a Vertica server? we can't repro this unfortunately

Hi @Luiggi ,
Not really sure how as not an expert myself but perhaps the issue lies mostly with memory management on the application side rather than anything to do with Vertica. For instance, I have three connections to different Vertica instances all with huge amount of records but this one is the largest and the one that fails frequently with the memory error. The others sync and query just fine.
Now, Im not sure if my memory configuration/allocation is enough or not. The server itself is not starved of memory but the application seems to be.
Maybe you might have ideas?
Regards,

your server is bigger than the biggest instance we have in our cloud, so there's something in the way we sync/scan/fingerprint that blows up the heap. We really REALLY need to set up a vertica db to fix this

I see what you mean. Thats going to be a tall order as Im not really an expert there. I work with an existing instance and the document I have from sources for installation is this. Please note that this is not a free service and that puts on another layer of challenges. Not sure how much more I can help out on that then.