PostgreSQL - SQL_ASCII

We have a PostgreSQL database that we have been using with Metabase for 4 months. We started on v0.30.4 and, at that time, we were able to sync the database schema without issues. Now with version 0.31 (tried RC1, .1 and .2), all three versions produce the following error message:

01-29 10:52:41 DEBUG sync.util :: STARTING: step 'sync-tables' for postgres Database 2 'Arcfil'
01-29 10:52:41 ERROR sync.util :: Error running sync step: Invalid character data was found. This is most likely caused by stored data containing characters that are invalid for the character set the database was created in. The most common example of this is storing 8bit data in a SQL_ASCII database.
("org.postgresql.jdbc.PgResultSet.getString(PgResultSet.java:1923)"
"org.postgresql.jdbc.PgResultSet.internalGetObject(PgResultSet.java:198)"
"org.postgresql.jdbc.PgResultSet.getObject(PgResultSet.java:2568)"
"com.mchange.v2.c3p0.impl.NewProxyResultSet.getObject(NewProxyResultSet.java:165)"
"clojure.java.jdbc$dft_read_columns$fn__4578.invoke(jdbc.clj:468)"
"clojure.core$mapv$fn__8088.invoke(core.clj:6832)"
"clojure.lang.LongRange.reduce(LongRange.java:233)"
"clojure.core$reduce.invokeStatic(core.clj:6747)"
"clojure.core$mapv.invokeStatic(core.clj:6823)"
"clojure.core$mapv.invoke(core.clj:6823)"
"clojure.java.jdbc$dft_read_columns.invokeStatic(jdbc.clj:468)"
"clojure.java.jdbc$dft_read_columns.invoke(jdbc.clj:464)"
"clojure.java.jdbc$result_set_seq$row_values__4588.invoke(jdbc.clj:514)"
"clojure.java.jdbc$result_set_seq$thisfn__4590.invoke(jdbc.clj:524)"
"clojure.java.jdbc$result_set_seq$thisfn__4590$fn__4591.invoke(jdbc.clj:524)"
"clojure.lang.LazySeq.sval(LazySeq.java:40)"
"clojure.lang.LazySeq.seq(LazySeq.java:49)"
"clojure.lang.Cons.next(Cons.java:39)"
"clojure.lang.RT.next(RT.java:706)"
"clojure.core$next__5108.invokeStatic(core.clj:64)"
"clojure.core$dorun.invokeStatic(core.clj:3134)"
"clojure.core$doall.invokeStatic(core.clj:3140)"
"clojure.core$doall.invoke(core.clj:3140)"
"--> driver.generic_sql$get_tables.invokeStatic(generic_sql.clj:304)"
"driver.generic_sql$get_tables.invoke(generic_sql.clj:298)"
"driver.generic_sql$fast_active_tables$iter__33373__33379$fn__33380.invoke(generic_sql.clj:318)"
"driver.generic_sql$fast_active_tables.invokeStatic(generic_sql.clj:317)"
"driver.generic_sql$fast_active_tables.doInvoke(generic_sql.clj:306)"
"driver.generic_sql$fn__33068$G__32960__33075.invoke(generic_sql.clj:30)"
"driver.generic_sql$describe_database.invokeStatic(generic_sql.clj:382)"
"driver.generic_sql$describe_database.invoke(generic_sql.clj:377)"
"driver$fn__29794$G__29745__29801.invoke(driver.clj:106)"
"sync.fetch_metadata$fn__41532$db_metadata__41537$fn__41538.invoke(fetch_metadata.clj:13)"
"sync.fetch_metadata$fn__41532$db_metadata__41537.invoke(fetch_metadata.clj:10)"
"sync.sync_metadata.tables$fn__42686$db_metadata__42691$fn__42692.invoke(tables.clj:142)"
"sync.sync_metadata.tables$fn__42686$db_metadata__42691.invoke(tables.clj:139)"
"sync.sync_metadata.tables$fn__42747$sync_tables_BANG___42752$fn__42753.invoke(tables.clj:159)"
"sync.sync_metadata.tables$fn__42747$sync_tables_BANG___42752.invoke(tables.clj:154)"
"sync.util$fn__35786$run_step_with_metadata__35791$fn__35795$fn__35797.invoke(util.clj:355)"
"sync.util$with_start_and_finish_logging_SINGLEQUOTE_.invokeStatic(util.clj:103)"
"sync.util$with_start_and_finish_logging_SINGLEQUOTE_.invoke(util.clj:97)"
"sync.util$with_start_and_finish_debug_logging.invokeStatic(util.clj:120)"
"sync.util$with_start_and_finish_debug_logging.invoke(util.clj:117)"
"sync.util$fn__35786$run_step_with_metadata__35791$fn__35795.invoke(util.clj:352)"
"sync.util$fn__35786$run_step_with_metadata__35791.invoke(util.clj:347)"
"sync.util$fn__35976$run_sync_operation__35981$fn__35982$fn__35983.invoke(util.clj:437)"
"sync.util$fn__35976$run_sync_operation__35981$fn__35982.invoke(util.clj:437)"
"sync.util$fn__35976$run_sync_operation__35981.invoke(util.clj:431)"
"sync.sync_metadata$fn__42800$sync_db_metadata_BANG___42805$fn__42806$fn__42807.invoke(sync_metadata.clj:51)"
"sync.util$do_with_error_handling.invokeStatic(util.clj:148)"
"sync.util$do_with_error_handling.invoke(util.clj:143)"
"sync.util$do_with_error_handling.invokeStatic(util.clj:146)"
"sync.util$do_with_error_handling.invoke(util.clj:143)"
"driver$fn__29991.invokeStatic(driver.clj:291)"
"driver$fn__29991.invoke(driver.clj:291)"
"driver$fn__29926$G__29733__29935.invoke(driver.clj:106)"
"sync.util$sync_in_context$fn__35692.invoke(util.clj:139)"
"sync.util$with_db_logging_disabled$fn__35689.invoke(util.clj:130)"
"sync.util$with_start_and_finish_logging_SINGLEQUOTE_.invokeStatic(util.clj:103)"
"sync.util$with_start_and_finish_logging_SINGLEQUOTE_.invoke(util.clj:97)"
"sync.util$with_start_and_finish_logging$fn__35678.invoke(util.clj:115)"
"sync.util$with_sync_events$fn__35673.invoke(util.clj:89)"
"sync.util$with_duplicate_ops_prevented$fn__35664.invoke(util.clj:68)"
"sync.util$do_sync_operation.invokeStatic(util.clj:167)"
"sync.util$do_sync_operation.invoke(util.clj:164)"
"sync.sync_metadata$fn__42800$sync_db_metadata_BANG___42805$fn__42806.invoke(sync_metadata.clj:50)"
"sync.sync_metadata$fn__42800$sync_db_metadata_BANG___42805.invoke(sync_metadata.clj:47)"
"sync$fn__42850$sync_database_BANG___42855$fn__42856$fn__42857.invoke(sync.clj:28)"
"sync.util$do_with_error_handling.invokeStatic(util.clj:148)"
"sync.util$do_with_error_handling.invoke(util.clj:143)"
"sync.util$do_with_error_handling.invokeStatic(util.clj:146)"
"sync.util$do_with_error_handling.invoke(util.clj:143)"
"driver$fn__29991.invokeStatic(driver.clj:291)"
"driver$fn__29991.invoke(driver.clj:291)"
"driver$fn__29926$G__29733__29935.invoke(driver.clj:106)"
"sync.util$sync_in_context$fn__35692.invoke(util.clj:139)"
"sync.util$with_db_logging_disabled$fn__35689.invoke(util.clj:130)"
"sync.util$with_start_and_finish_logging_SINGLEQUOTE_.invokeStatic(util.clj:103)"
"sync.util$with_start_and_finish_logging_SINGLEQUOTE_.invoke(util.clj:97)"
"sync.util$with_start_and_finish_logging$fn__35678.invoke(util.clj:115)"
"sync.util$with_sync_events$fn__35673.invoke(util.clj:89)"
"sync.util$with_duplicate_ops_prevented$fn__35664.invoke(util.clj:68)"
"sync.util$do_sync_operation.invokeStatic(util.clj:167)"
"sync.util$do_sync_operation.invoke(util.clj:164)"
"sync$fn__42850$sync_database_BANG___42855$fn__42856.invoke(sync.clj:26)"
"sync$fn__42850$sync_database_BANG___42855.invoke(sync.clj:19)"
"events.sync_database$process_sync_database_event$fn__63575.invoke(sync_database.clj:34)")

I've rolled back a test instance to version 0.30.4 with the same data in the DB, and the schema sync works again. When I migrate back to 0.31.2, the error returns.

We don't have any characters above byte 127 in our table or field names. We do have some in the table/field comments and values, though. I'm aware that SQL_ASCII is not the ideal database encoding, but our app was developed that way and we don't have an easy way out of it. Again, it didn't seem to be an issue with version 0.30.

Our main concern is that since upgrading to version 0.31 we have started using the new question chart types (i.e. gauges), so we don't want to go back to version 0.30. However, we are looking at adding new tables/views to our DB, and if the schema sync doesn't work, we won't be able to use them when creating new questions.

Hi @slevert

Which database do you use for the backend (Metabase's internal database)?

I don't think this is only related to table/column names, but to the actual data as well.
Not sure if you can use a JDBC parameter to do some conversion - searching the interwebs, it seems like everyone recommends converting the database to UTF-8.

I cannot find anyone else who has had this problem, so I'm going to try pinging @camsaul, who is one of the core developers.

The Postgres documentation is really not fond of SQL_ASCII either :expressionless:

The SQL_ASCII setting behaves considerably differently from the other settings. When the server character set is SQL_ASCII, the server interprets byte values 0-127 according to the ASCII standard, while byte values 128-255 are taken as uninterpreted characters. No encoding conversion will be done when the setting is SQL_ASCII. Thus, this setting is not so much a declaration that a specific encoding is in use, as a declaration of ignorance about the encoding. In most cases, if you are working with any non-ASCII data, it is unwise to use the SQL_ASCII setting because PostgreSQL will be unable to help you by converting or validating non-ASCII characters.
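To see why that bites a UTF-8 client, here is a tiny standalone Java illustration (not Metabase code; it just assumes the stored byte is a LATIN1/LATIN2 "é", 0xE9). A lone byte above 127 isn't valid UTF-8, so a strict decoder - presumably what the JDBC driver effectively uses - has to give up:

    import java.nio.ByteBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.StandardCharsets;

    public class Utf8Demo {
        public static void main(String[] args) throws CharacterCodingException {
            // 0xE9 is 'é' in LATIN1/LATIN2, but an invalid lone byte in UTF-8
            byte[] raw = { (byte) 0xE9 };

            // A lenient decode silently substitutes U+FFFD...
            System.out.println(new String(raw, StandardCharsets.UTF_8));

            // ...while a strict decoder reports the problem instead,
            // throwing a MalformedInputException:
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(raw));
        }
    }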

Hi, I tried adding a parameter to the JDBC connection to force the client encoding (both LATIN2 and UTF-8), but no luck so far. I still get the same error in the backend when triggering a DB schema scan. If you have any custom connection strings you want me to try, they'll be welcome.

My current test is to connect to an "empty" DB: all tables/fields are there, but no data. I still get the error, so it may be reading the column comments, which contain some bytes above 127.
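Something like this small JDBC sketch should be able to confirm that theory (connection details are placeholders; convert_to() returns bytea, so the driver never has to decode the comment text and can't trip over it):

    import java.sql.*;

    public class CommentScan {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details - adjust as needed
            String url = "jdbc:postgresql://localhost:5432/mydb";
            try (Connection conn = DriverManager.getConnection(url, "user", "pass");
                 Statement st = conn.createStatement();
                 // bytea arrives as hex text (pure ASCII), so getBytes() is safe
                 ResultSet rs = st.executeQuery(
                         "SELECT objoid, convert_to(description, 'SQL_ASCII') AS raw"
                         + " FROM pg_description")) {
                while (rs.next()) {
                    for (byte b : rs.getBytes("raw")) {
                        if ((b & 0xFF) > 127) {
                            System.out.println("Non-ASCII byte in comment, oid="
                                    + rs.getLong("objoid"));
                            break;
                        }
                    }
                }
            }
        }
    }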

I understand that SQL_ASCII is not the preferred encoding, but I have to deal with it for now. My main question at this point is the change in behavior between 0.30.4 and 0.31.x. Was it ignoring the column comments before? Was there any change in the way the JDBC connection is initialized?

Hey @slevert @flamber, based on that stacktrace it looks like it’s the Postgres JDBC driver that’s not handling non-SQL_ASCII encoded strings in your SQL_ASCII-encoded database well, rather than code in Metabase itself. Not 100% sure why it would have worked before but stopped working now. I think we upgraded the version of the Postgres JDBC driver between those releases, so it might be a regression there.

I’d try looking for information about issues with the Postgres JDBC driver handling different encodings in the same database and see if you can find anything that might help you fix it (perhaps some different connection string options?)

Good morning @camsaul, thank you for investigating my case further. Can you let me know the precise Postgres JDBC driver version used in 0.31? I will at least try different parameter combinations while you look deeper into where the regression might come from.

FYI, all existing questions/dashboards that were built in the earlier version (0.30) still work with 0.31.2. I can also build new questions/dashboards with tables/views that were synced while we were running 0.30. Only schema synchronization seems to trip over these characters, which prevents us from creating questions based on new tables/views.

So it turns out the Postgres JDBC driver version did not change between the two releases.

I looked around a bit more and it turns out 0.31.0 was the version where we started syncing comments/remarks. So I guess that's the issue.
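For context: table and column comments surface through the standard JDBC metadata API as the REMARKS column, and reading them goes through PgResultSet.getString() - the top frame of your stack trace. Roughly like this (a sketch, not the actual Metabase sync code; connection details are placeholders):

    import java.sql.*;

    public class RemarksDemo {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "pass")) {
                DatabaseMetaData md = conn.getMetaData();
                try (ResultSet tables = md.getTables(null, "public", "%",
                        new String[] { "TABLE", "VIEW" })) {
                    while (tables.next()) {
                        // getString() on REMARKS is where undecodable bytes blow up
                        System.out.println(tables.getString("TABLE_NAME")
                                + " -> " + tables.getString("REMARKS"));
                    }
                }
            }
        }
    }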

@camsaul that explains why the schema sync started to complain in version 0.31.0. From there, we can work in two directions:

  1. Find the proper JDBC driver parameter to convert the characters properly between server and client.
  2. Add a parameter to Metabase to ignore field comments. I'm not sure what the consequences of no longer syncing comments (as in 0.30) would be.

Yeah, those sound like good starting steps. If you end up going with number 2 we’d love a PR!

I'm sorry, I'm not too familiar with the acronyms yet. What is a PR?

Pull Request - it’s a typical Github term :slight_smile:

I tried adding additional JDBC parameters again, but nothing seems to have an effect (I still get the same error during the schema sync). Here is my current string:

allowEncodingChanges=true&client_encoding=LATIN2
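For reference, this is the plain-JDBC equivalent of what I'm passing (host/credentials are placeholders); the SHOW at the end just verifies which encoding the session actually ended up with:

    import java.sql.*;

    public class ConnTest {
        public static void main(String[] args) throws Exception {
            // Same parameters as the connection string above
            String url = "jdbc:postgresql://localhost:5432/mydb"
                    + "?allowEncodingChanges=true&client_encoding=LATIN2";
            try (Connection conn = DriverManager.getConnection(url, "user", "pass");
                 Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SHOW client_encoding")) {
                rs.next();
                System.out.println("client_encoding = " + rs.getString(1));
            }
        }
    }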

But the PostgreSQL JDBC documentation doesn't seem too keen on encoding changes anymore. There is the charSet parameter, but it no longer applies to recent server versions (ours is 9.3):

charSet = String

The character set to use for data sent to the database or received from the database. This property is only relevant for server versions less than or equal to 7.2. The 7.3 release was the first with multibyte support compiled by default and the driver uses its character set translation facilities instead of trying to do it itself.

Then there is allowEncodingChanges along with client_encoding, but it only seems to affect the COPY command:

allowEncodingChanges = boolean

When using the V3 protocol the driver monitors changes in certain server configuration parameters that should not be touched by end users. The client_encoding setting is set by the driver and should not be altered. If the driver detects a change it will abort the connection. There is one legitimate exception to this behaviour though, using the COPY command on a file residing on the server’s filesystem. The only means of specifying the encoding of this file is by altering the client_encoding setting. The JDBC team considers this a failing of the COPY command and hopes to provide an alternate means of specifying the encoding in the future, but for now there is this URL parameter. Enable this only if you need to override the client encoding when doing a copy.

Thanks @flamber and @camsaul. I have to admit that I'm also a newbie on GitHub.

Anyhow, not sure I did it properly, but I created a Pull Request: https://github.com/metabase/metabase/pull/9349

What were you trying to create a pull request for? Did you fix your issue?

@camsaul, no, the issue is not fixed. I have tried multiple JDBC parameter combinations, but none has had any effect so far. I'm not even sure PostgreSQL still permits any conversion parameter.

As for the PR, I only opened the pull request; I have never done any Java coding before. My hope at the moment is that someone will look at the request and create a parameter for it.

Is that wishful thinking?

Pull requests are requests for us to “pull” your code changes into the main codebase. So if you figured out what change you needed to make to the Metabase code to fix your issue, you could open a pull request so we could include that fix in Metabase and send it out to everyone who uses it. See https://help.github.com/articles/about-pull-requests/