Postgres 12.4 Segmentation Fault when connected to Metabase

deo · September 14, 2020, 6:02pm

Hi,

I have Metabase 0.35.4 running under Docker.
I also have “vanilla” Postgres 12.4 running in a VM under CentOS 7 with one test database.
Metabase is connected to this database to visualize data.

Now, one to three times a day, Postgres 12 goes down with a “Segmentation Fault”.

SAMPLE LOG:
2020-09-03 14:00:02.046 BST [3989] DETAIL: Failed process was running: -- Metabase
2020-09-03 14:00:02.030 BST [1474] LOG: server process (PID 1854) was terminated by signal 11: Segmentation fault
2020-09-03 14:00:02.030 BST [1474] DETAIL: Failed process was running: -- Metabase
SELECT "creditcard_demo"."sysdiagrams"."definition" AS "definition", "creditcard_demo"."sysdiagrams"."name" AS "name", "creditcard_demo"."sysdiagrams"."version" AS "version", "creditcard_demo"."sysdiagrams"."principal_id" AS "principal_id", "creditcard_demo"."sysdiagrams"."diagram_id" AS "diagram_id" FROM "creditcard_demo"."sysdiagrams" LIMIT 10000
2020-09-03 14:00:02.031 BST [1474] LOG: terminating any other active server processes

Every time, the logs show that the last SQL statement before the crash was the following (even when there is no activity in Metabase):

SELECT "creditcard_demo"."sysdiagrams"."definition" AS "definition", "creditcard_demo"."sysdiagrams"."name" AS "name", "creditcard_demo"."sysdiagrams"."version" AS "version", "creditcard_demo"."sysdiagrams"."principal_id" AS "principal_id", "creditcard_demo"."sysdiagrams"."diagram_id" AS "diagram_id" FROM "creditcard_demo"."sysdiagrams" LIMIT 10000

Well… “sysdiagrams” does not exist in Postgres; it exists in MS SQL Server. But the data source was created for Postgres, and it is the only data source on that Metabase instance.
If I run this query from, say, DBeaver, Postgres fails to run it but does not crash.
However, when it is received from Metabase, it causes a Segmentation Fault.

On the Metabase side we see the following when this happens:

09-03 13:00:01 INFO sync.analyze :: fingerprint-fields Analyzed [*****·············································] 10% Table 30 'public.d_date'
09-03 13:00:02 ERROR sync.util :: Error fingerprinting Table 6 'creditcard_demo.sysdiagrams'
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:337)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:446)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:370)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:149)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:108)
----lots of other similar things
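
To check whether the Metabase error and the Postgres crash are the same event, the two timestamps can be put on a common clock. The sketch below is only an illustration: it assumes the Metabase log is written in UTC (the Diagnostic Info posted later in this thread reports user.timezone as UTC), that the Postgres log is in BST (UTC+1), and that the Metabase entry is from 2020, since that log omits the year. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# British Summer Time is UTC+1; a fixed offset is enough for these samples.
BST = timezone(timedelta(hours=1), "BST")


def postgres_bst_to_utc(stamp):
    """Parse a 'YYYY-MM-DD HH:MM:SS.mmm' Postgres log timestamp (BST) into UTC."""
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=BST)
    return local.astimezone(timezone.utc)


def metabase_utc(stamp, year):
    """Parse a 'MM-DD HH:MM:SS' Metabase log timestamp, assumed to be UTC.

    The Metabase log line carries no year, so the caller supplies it.
    """
    naive = datetime.strptime(f"{year}-{stamp}", "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=timezone.utc)


# Timestamps copied from the two log excerpts in this post.
crash = postgres_bst_to_utc("2020-09-03 14:00:02.030")
error = metabase_utc("09-03 13:00:02", 2020)
gap_seconds = abs((crash - error).total_seconds())
print(crash.isoformat(), error.isoformat(), gap_seconds)
```

Under those assumptions the two entries fall within the same second, which is consistent with the backend dying while the fingerprinting query for Table 6 was in flight.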

If Metabase is off, Postgres runs happily with no issues.

The only strange thing I noticed is that this always happens at the beginning of the hour (see log).
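
One way to confirm the top-of-the-hour pattern across a whole log file, rather than by eye, is to pull out every segfault line and count crashes by minute of the hour. This is a minimal sketch, assuming the log lines look like the sample above (timestamp, timezone, PID in brackets); the function names are hypothetical, and a different log_line_prefix would need a different pattern.

```python
import re
from collections import Counter
from datetime import datetime

# Matches Postgres log lines reporting a backend killed by signal 11.
# The layout is taken from the sample log in this post (an assumption for
# any other log_line_prefix setting).
CRASH_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ \S+ \[\d+\] "
    r"LOG:\s+server process \(PID \d+\) was terminated by signal 11"
)


def crash_times(lines):
    """Return the timestamp of every segfault line found in `lines`."""
    times = []
    for line in lines:
        match = CRASH_RE.match(line)
        if match:
            times.append(datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S"))
    return times


def minutes_past_hour(times):
    """Count crashes by minute of the hour (0 means the top of the hour)."""
    return Counter(t.minute for t in times)


# Two lines copied from the sample log; only the first is a crash report.
sample = [
    "2020-09-03 14:00:02.030 BST [1474] LOG: server process (PID 1854) "
    "was terminated by signal 11: Segmentation fault",
    "2020-09-03 14:00:02.031 BST [1474] LOG: terminating any other active server processes",
]
times = crash_times(sample)
print(minutes_past_hour(times))
```

For a real log, pass the open file object to crash_times instead of the sample list. If nearly every crash lands in minute 0, that points at something scheduled hourly rather than at user activity.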

Any idea how to get this fixed?

flamber · September 14, 2020, 6:30pm

Hi @deo

Please post “Diagnostic Info” from Admin > Troubleshooting.

I have never heard of something like this, so I think the problem is somewhere in Postgres.
Again, if Metabase can send a simple query and that crashes your database, then it must be a bug in Postgres. A client should never be able to crash a server.
I would recommend that you report the problem to them.

As for the table sysdiagrams, which you say only exists on MSSQL even though the query is being executed against Postgres: can you check query_execution (a table in the Metabase application database) to get details on who/when/where the query is being executed?

deo · September 15, 2020, 3:44pm

Hi and thanks for the response.

RE: Again, if Metabase can send a simple query and that crashes your database, then it must be a bug in Postgres.

It’s not the query that takes Postgres down, as this query can be run from DBeaver or directly in Postgres and will just error out.
It just happened to be the last record in the log before Postgres goes down.
There are obviously two issues here: 1) Metabase should not send queries that are not relevant to the data source, and 2) Postgres should not go down regardless of what a client decides to request.

RE: query_execution table in the 'metabase' database

This table has no records for the days when I was trying to trace the issue. Actually, it has no records for September at all.

Here is the “Diagnostic Info” as per your request:

{
  "browser-info": {
    "language": "en-US",
    "platform": "Win32",
    "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36",
    "vendor": "Google Inc."
  },
  "system-info": {
    "file.encoding": "UTF-8",
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "11.0.7+10",
    "java.vendor": "AdoptOpenJDK",
    "java.vendor.url": "https://adoptopenjdk.net/",
    "java.version": "11.0.7",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "11.0.7+10",
    "os.name": "Linux",
    "os.version": "3.10.0-1062.18.1.el7.x86_64",
    "user.language": "en",
    "user.timezone": "UTC"
  },
  "metabase-info": {
    "databases": [
      "h2",
      "postgres"
    ],
    "hosting-env": "unknown",
    "application-database": "postgres",
    "application-database-details": {
      "database": {
        "name": "PostgreSQL",
        "version": "12.2"
      },
      "jdbc-driver": {
        "name": "PostgreSQL JDBC Driver",
        "version": "42.2.8"
      }
    },
    "run-mode": "prod",
    "version": {
      "date": "2020-05-28",
      "tag": "v0.35.4",
      "branch": "release-0.35.x",
      "hash": "b3080fa"
    },
    "settings": {
      "report-timezone": null
    }
  }
}

I hope it helps. Thanks.

flamber · September 15, 2020, 3:52pm

@deo

I think you need to set up a lot more logging, because I don’t think the query comes from Metabase.
Correct, a server should not be able to crash because of a client, but then you should report that problem to Postgres.

deo · September 15, 2020, 4:16pm

@flamber

That was a test bed to evaluate Metabase, so nothing else was sending requests.
Also, if Metabase is turned off, the issue goes away.
Additionally, I created a second setup before raising a ticket with Metabase, and both have the same problem.
I will try raising a ticket with Postgres and see if they can fix it on their end.

Thanks for your input.

flamber · September 15, 2020, 4:24pm

@deo
I don’t see the MSSQL database configured in your Diagnostic Info, so has that been deleted at some point?
If yes, then it sounds like it might have been left-over data, which has been fixed in the latest release:
https://github.com/metabase/metabase/issues/11813
There are also a lot of sync+scan fixes in 0.36, so I would recommend upgrading.

deo · September 15, 2020, 5:07pm

@flamber

The MS SQL database was never registered. It was only when I started to google that schema that I found it exists in MSSQL, not in Postgres,
so I was wondering why Metabase is querying it from PostgreSQL. To me it looks like a bug. Anyway, I will try upgrading Metabase as well.

flamber · September 15, 2020, 6:25pm

@deo There are more than 25k active Metabase installations, and I have never seen this problem before. I’m not saying it’s impossible that it’s Metabase, since I know we have a lot of open issues, but I think this is something specific to your setup.

I would love to figure out more, but it would require a lot more logging and digging through the application database (specifically the metabase_table and metabase_field) to figure out if Metabase has ever seen this table.
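
As a rough illustration of that lookup, the sketch below searches metabase_table for the table name. The column names (id, db_id, schema, name, active) are assumptions about the application database and should be checked against your version. An in-memory SQLite table stands in for the real application database so the example runs on its own; the ids 6 and 30 come from the Metabase log earlier in the thread, while the db_id value is made up.

```python
import sqlite3

# Lookup meant for the Metabase application database.
# Column names are assumptions; verify them against your Metabase version.
FIND_TABLE_SQL = """
SELECT id, db_id, schema, name, active
FROM metabase_table
WHERE lower(name) = lower(?)
"""


def find_table(conn, table_name):
    """Return every metabase_table row whose name matches, ignoring case."""
    return conn.execute(FIND_TABLE_SQL, (table_name,)).fetchall()


# Stand-in for the application database (hypothetical rows, db_id invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metabase_table ("
    "id INTEGER PRIMARY KEY, db_id INTEGER, schema TEXT, name TEXT, active INTEGER)"
)
conn.executemany(
    "INSERT INTO metabase_table VALUES (?, ?, ?, ?, ?)",
    [
        (6, 2, "creditcard_demo", "sysdiagrams", 1),
        (30, 2, "public", "d_date", 1),
    ],
)

rows = find_table(conn, "sysdiagrams")
print(rows)
```

Against a real Postgres application database, the placeholder would be %s with a driver such as psycopg2 instead of SQLite's ?. The Metabase log in the first post already refers to Table 6 'creditcard_demo.sysdiagrams', which suggests a row like this exists and that the sync found the table in the connected database.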

We don’t have any code that references sysdiagrams, which is why I’m fairly sure it’s not the Metabase core. It could be a user-created question, since you can input anything there, but then you would find the query in query_execution.