Thanks for the help. I’ve heard back from our admin. No copy of the log file yet, but he did tell me that in this db the three fields I’m selecting that normally contain text - “name”, “slug”, and “title” - are set up as the text, varchar(50), and text data types respectively.
I’ve also run some further tests on the export, with oddly inconsistent results. Or at least I’m not able to tell what the pattern is…
Each record in our data is related to a “jurisdiction ID”.
I can’t download the entire data set because it is 2.8 million records, so I’ve been filtering by Jurisdiction ID to create smaller subsets for download.
Some of the data contains names without commas in them - format “firstname MI lastname” - and some contains names with commas - format “lastname, firstname”.
The sample “problem” jurisdiction is identified in our records as jurisdiction ID 4 (there are many more, that’s just what I’m using as a sample…)
There is a sample “not problem” jurisdiction as well, ID 3.
My original download was to pull data on a large set - IDs 0 through 100 - which resulted in about 400K records.
That worked; however, the text fields were not output with quotes. This meant records without a comma in the name had a different number of fields than records with one.
Here is a sample of that file showing a few records from Jurisdiction ID 3 (no commas) and 4 (commas):
ID,Name,Base,Benefits,Gross,Jurisdiction ID,Notes,Other,Overtime,Rank,Slug,Status,Title,Total,U Al,Year
28343762,Alexandra M.M. Madsen,83692.49,38368.78,83692.49,3,0.00,0.00,alexandra-mm-madsen,EHS Division Manager-Project,122061.27,2018
28343763,Caylin R Patterson,46518.14,20517.55,52839.87,3,6321.73,0.00,caylin-r-patterson,EHS Division Manager-Project,73357.42,2018
25044665,QUINONES, EDGAR,79906.28,35518.69,140000.52,4,10471.97,49622.27,quinones-edgar,SHERIFF’S SERGEANT,175519.21,2018
25043809,ESCOBAR, BECKY,10385.86,9123.61,10385.86,4,0.00,0.00,escobar-becky,OFFICE TECHNICIAN,19509.47,2018
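In case it’s useful, here’s the kind of quick check that shows the problem - rows whose field count no longer matches the header. It’s a rough Python sketch; “export.csv” is just a placeholder for whichever download is being checked:

```python
import csv

# Flag rows whose field count doesn't match the header row.
# "export.csv" is a placeholder for whichever download is being checked.
with open("export.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            print(f"line {line_no}: {len(row)} fields, expected {len(header)}")
```

On the big export, that would flag the rows where the name contains a comma, since the unquoted comma splits the name into an extra field.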
On further testing, if I download Jurisdiction 4 on its own, the CSV DOES use quotes:
id,name,base,benefits,gross,jurisdiction_id,other,overtime,status,title,total,ual,year
25045434,"ESPINO, ORLANDO GUADALUPE",64776.85,26805.93,78088.73,4,4311.24,9000.64,DA INVESTIGATOR II,104894.66,2018
25045433,"CHAVARIN, JUAN",66849.77,34153.39,70153.22,4,3303.45,0.00,CORRECTIONAL LIEUTENANT,104306.61,2018
Strangely enough, the header row is different between the two as well.
If I download a sample that includes JUST 3 and 4, I get this:
id,name,base,benefits,gross,jurisdiction_id,other,overtime,status,title,total,ual,year
28332001,Kelly L Dunham,28086.72,0.00,0.00,3,-28086.72,0.00,Animal Svcs Officer,0.00,2018
28332000,Tim J Dumandan Iii,67222.10,42208.41,76183.72,3,8961.62,0.00,Animal Svcs Officer,118392.13,2018
25045434,"ESPINO, ORLANDO GUADALUPE",64776.85,26805.93,78088.73,4,4311.24,9000.64,DA INVESTIGATOR II,104894.66,2018
25045433,"CHAVARIN, JUAN",66849.77,34153.39,70153.22,4,3303.45,0.00,CORRECTIONAL LIEUTENANT,104306.61,2018
That is a MIX of quote usage and non-usage in the same fields.
It seems to me like Metabase is deciding whether a field needs quotes by evaluating the content of that field - not simply going by field type - and somehow “guessing wrong” when the data set is larger, for some reason.
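If it’s doing something like minimal quoting - only quoting a field when that field actually contains the delimiter - that would explain the mixed-looking output above, which is still perfectly parseable. A quick Python sketch of that behavior (just my guess at the logic, not how Metabase actually does it):

```python
import csv
import sys

# Guess at "minimal" quoting: only fields containing the delimiter get quoted.
# These rows are trimmed-down versions of the samples above.
rows = [
    ["28332001", "Kelly L Dunham", "Animal Svcs Officer"],
    ["25045434", "ESPINO, ORLANDO GUADALUPE", "DA INVESTIGATOR II"],
]
writer = csv.writer(sys.stdout, quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)
# Output mixes quoted and unquoted values in the same column, yet any CSV
# reader still splits it back into the right fields:
#   28332001,Kelly L Dunham,Animal Svcs Officer
#   25045434,"ESPINO, ORLANDO GUADALUPE",DA INVESTIGATOR II
```

The real problem is the big export, where the comma-containing names came out with no quotes at all, so the columns can’t be reconstructed.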
Meanwhile, it looks like I’m going to be able to pull this data from a backup that our admin has, so this is not such a priority for me any more. If it hasn’t come up as a problem for anyone else, I don’t know that you need to keep working on it, but it would be interesting to know where the inconsistency comes from.
Also FYI, our admin thinks the reason I can’t export a full 1M records without getting “Application Error” is that the Heroku system has a 30 second request timeout, and on a data set the size I’m trying to pull, the query takes longer than that to finish. I don’t know if that timeout is adjustable, but since there are users working with larger data sets, it might be good to do something there if possible.
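In the meantime my workaround is essentially the same thing I’ve already been doing by hand - splitting the pull into Jurisdiction ID ranges small enough that each query finishes under the timeout. Roughly like the sketch below, except the URL and parameter names are made up, since I’ve only been doing this through the Metabase UI:

```python
import requests

# Sketch of chunking the export so each request stays under the 30-second
# limit. EXPORT_URL and the parameter names are placeholders - I don't know
# what endpoint or filter parameters Metabase actually exposes for this.
EXPORT_URL = "https://our-app.herokuapp.com/export.csv"  # placeholder

for start in range(0, 101, 10):  # Jurisdiction IDs 0-9, 10-19, ..., 100+
    params = {"jurisdiction_id_min": start, "jurisdiction_id_max": start + 9}
    resp = requests.get(EXPORT_URL, params=params, timeout=60)
    resp.raise_for_status()
    with open(f"jurisdictions_{start}_{start + 9}.csv", "wb") as f:
        f.write(resp.content)
```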
Thanks again!
Todd