Incorrect string value error (Polish characters)

Hi all,

Would really appreciate any help with an error I'm having. It happens when I run a MySQL view through Metabase (v0.39.2)

Incorrect string value: '\xC5\x81\xC3\xB3d\xC5...' for column '$str' at row 1`

The question used to work fine, but it suddenly stopped working. I can still run the view through mySQL workbench without errors.

I did some digging and was able to find the issue related to just two entries. Both contained polish characters. Specifically "ó".

I spent some time on google and most of the usual responses for this error focus on ensuring the charset is utf8mb4. But I've checked the char set for that field in the properties of the mySQL view and it's showing as utf8mb4.

The default charset for the database is latin1. Do I need to change that maybe?

Hi @DailySamba

So when did it "suddenly stopped working"? There must have been a change somewhere - MySQL was updates, perhaps Metabase was, or perhaps the database structure was changed?

Charset and collation are difficult, when it's mixed. Keeping everything as UTF8(mb4) is the best way.

Post "Diagnostic Info" from Admin > Troubleshooting.

Try running the following via Metabase's Native query:

SHOW SESSION VARIABLES WHERE Variable_name LIKE 'version' OR Variable_name LIKE 'character_set%' OR Variable_name LIKE 'collation%';
SHOW GLOBAL VARIABLES WHERE Variable_name LIKE 'version' OR Variable_name LIKE 'character_set%' OR Variable_name LIKE 'collation%';

Thanks for the quick response Flamber! The thing that changed was when the data changed, we had data that included polish characters. Specifically "Łódź" Before this there was no issue, and when I removed that data the query ran again.

Thanks for the tips on diagnostic data. I'll take a look now.

@DailySamba I can recommend reading this article on how to move to UTF8:
https://fromdual.com/mariadb-and-mysql-character-set-conversion

1 Like

Here's the results

SESSION VARIABLES

Variable_name Value
character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database latin1
character_set_filesystem binary
character_set_results utf8mb4
character_set_server latin1
character_set_system utf8
character_sets_dir /usr/share/mysql/charsets/
collation_connection utf8mb4_unicode_ci
collation_database latin1_swedish_ci
collation_server latin1_swedish_ci
version 5.7.21-0ubuntu0.16.04.1

GLOBAL VARIABLES

Variable_name Value
character_set_client latin1
character_set_connection latin1
character_set_database latin1
character_set_filesystem binary
character_set_results latin1
character_set_server latin1
character_set_system utf8
character_sets_dir /usr/share/mysql/charsets/
collation_connection latin1_swedish_ci
collation_database latin1_swedish_ci
collation_server latin1_swedish_ci
version 5.7.21-0ubuntu0.16.04.1

@DailySamba So you basically have everything as the default (latin1 Swedish), which is why you're having problems. Follow the article I linked and change all your database structure and data to UTF8mb4

@DailySamba Alternatively, try adding this to the connection string in Admin > Databases > (your-db):

sessionVariables=character_set_server=utf8mb4

I don't know if it will work. Perhaps. If not, then I would read the article and change everything.

Thanks so much Flamber!

Do you mean under "Additional JDBC connection string options"?

@DailySamba Yes

Thanks again Flamber.

I saw some improvement when I ran the session variable query. Now Character set server and collation server have both changed to utf8mb4. There was no change to the results of the global query.

But in any case it didn't fix the problem. I'll try the article you linked to.

Hi Flamber,

Just to give you an update we updated our database to MySQL8, set all the collation to utf8mb4 and ... same error.

Do you have any more suggestions?

DIAGNOSTIC INFO

{
"browser-info": {
"language": "en-GB",
"platform": "Win32",
"userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36",
"vendor": "Google Inc."
},
"system-info": {
"file.encoding": "UTF-8",
"java.runtime.name": "OpenJDK Runtime Environment",
"java.runtime.version": "11.0.11+9",
"java.vendor": "AdoptOpenJDK",
"java.vendor.url": "https://adoptopenjdk.net/",
"java.version": "11.0.11",
"java.vm.name": "OpenJDK 64-Bit Server VM",
"java.vm.version": "11.0.11+9",
"os.name": "Linux",
"os.version": "5.8.0-59-generic",
"user.language": "en",
"user.timezone": "GMT"
},
"metabase-info": {
"databases": [
"mysql"
],
"hosting-env": "unknown",
"application-database": "h2",
"application-database-details": {
"database": {
"name": "H2",
"version": "1.4.197 (2018-03-18)"
},
"jdbc-driver": {
"name": "H2 JDBC Driver",
"version": "1.4.197 (2018-03-18)"
}
},
"run-mode": "prod",
"version": {
"date": "2021-05-17",
"tag": "v0.39.2",
"branch": "release-x.39.x",
"hash": "b0d9436"
},
"settings": {
"report-timezone": "Europe/London"
}
}
}

SESSION VARIABLES

Variable_name Value
character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database utf8mb4
character_set_filesystem binary
character_set_results utf8mb4
character_set_server utf8mb4
character_set_system utf8mb3
character_sets_dir /usr/share/mysql-8.0/charsets/
collation_connection utf8mb4_unicode_ci
collation_database utf8mb4_unicode_ci
collation_server utf8mb4_unicode_ci
version 8.0.25

GLOBAL VARIABLES

Variable_name Value
character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database utf8mb4
character_set_filesystem binary
character_set_results utf8mb4
character_set_server utf8mb4
character_set_system utf8mb3
character_sets_dir /usr/share/mysql-8.0/charsets/
collation_connection utf8mb4_unicode_ci
collation_database utf8mb4_unicode_ci
collation_server utf8mb4_unicode_ci
version 8.0.25

@DailySamba You'll want to change character_set_system to utf8mb4
And it's important to remember to update all your data if it was stored in a different format.
Unless everything is running UTF8, then you're just going to fight encoding problems everywhere.

And remember that you have only posted you variables, that doesn't tell if your database, table or columns are using a different encoding.

Latest release of Metabase is 0.41.2:
https://github.com/metabase/metabase/releases/latest
And you should migrate away from H2 if you're running in production:
https://www.metabase.com/docs/latest/operations-guide/migrating-from-h2.html

Thanks Flamber, I'll take another look

It's fixed! Thanks again for al your help Flamber.

@DailySamba It would be great if you explained what the problem was and how you fixed it, so others might learn from it, when they search the forum.

Hi @flamber I just followed your advice. I checked through my tables, functions and views. I had a rogue function which was returning text as latin1 still rather than utf8mb4.

Interestingly I'd already tried changing that before we updated the database collations and it didn't fix the problem.

So it really needed absolutely everything to be aligned with utf8mb4.

@DailySamba Thank you. Trust me, you're not the first, and not the last person who have been fighting with encoding. But your life should be so much easier going forward, now that everything is UTF8 and you don't need to go through multiple conversions.
If you have ever tried to support multiple legacy apps, which each had their own encoding, but storing information to the same database, then you'll know how much work it takes.

I'm sure you're right. And we managed to upgrade MySQL to the latest version too.

By the way apparently you can't change character_set_system to utf8mb4 in MySQL 8.0
https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_character_set_system

Thought I'd mention it in case it saves anyone else going down a rabbit hole.

1 Like