Lazy JSON data to sqlite database


Lazy JSON data to sqlite database

David Bicking-2
Okay, I know this is just me being lazy, but I have a data file with JSON data, and I want to get the data into an SQLite database so I can run queries against it. It is not a large file, only about 600 records in the main table. I've never worked with JSON before, and really don't want to write my own parser to extract the data.
Is there a tool out there that will more or less automate the task for me? Hopefully free, as no one is paying me to do this. (The other volunteers have maybe a dozen records in total and are doing their reports by hand.)

A cursory look at the data: there is a top-level "event" table, "group" and "venue" tables that are one-to-one with the event table, and a "hosts" table with many hosts per event.

This is something that I will need to do once a year, so like I said, I really don't want to write a custom program.
Any ideas?
(Oh, to complicate things, the source data is not valid JSON, as the upstream source did not escape quote marks within the text. So is there a tool that can clean up the quotes? Hopefully there won't be many bad quotes... maybe three or four in the 600 records.)
Thanks,
David
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: Lazy JSON data to sqlite database

Andrew.Goth
David Bicking wrote:
> I have a data file with JSON data, and I want to get the data into an
> sqlite database so I can run queries against it. It is not a large file,
> only about 600 records in the main table. I've never worked with JSON
> before, and really don't want to write my own parser to extract the data.

JSON and SQL have two fundamentally different data models.  It is trivial to map from SQL to JSON, but going the other way may take careful thought.

> Is there a tool out there that will more or less automate the task for
> me? Hopefully free, as no one is paying me to do this. (The other
> volunteers have maybe a dozen records in total and are doing their
> reports by hand. )

The automation is at a lower level than you seem to realize.  JSON I/O is a solved problem, but actually doing anything with that data, such as transforming it into a relational database form, is up to you.
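To illustrate how low-level the solved part is: in Python, say (to pick one language with JSON support built in; the thread itself leans toward Tcl), the entire "I/O" step is a single call, and everything after it is still up to you:

```python
import json

def read_json(path):
    """The 'solved' part: turn JSON text into nested lists and dicts.
    Mapping the result onto relational tables is the part no library
    does for you."""
    with open(path) as f:
        return json.load(f)
```

The result is plain nested lists and dicts; deciding which of those become tables, columns, and foreign keys is the analysis discussed below.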

> A cursory look at the data: there is a top level "event" table.  "Group"
> and "venue" tables that are one-to-one with the event table, and a
> "hosts" tables with many hosts to the event table.

JSON doesn't have tables, nor does JSON pre-declare a schema.  JSON intermixes the schema with the data, and the schema is allowed to vary over the course of the document.

Here's some code I wrote to process JSON:

https://wiki.tcl-lang.org/page/Alternative+JSON

JSON has arrays, objects, strings, numbers, and literals.

An array is a list of values of any type.

An object is a mapping from strings to values of any type.  The strings really ought to be unique, or results will be unpredictable.

Strings and numbers are pretty much what you expect.

Literals are false, true, or null.

For example, here's JSON that will drop into SQL fairly easily:

[{"name": "Andy", id: 42}, {"name": "David", id: 999}]

Just do:

CREATE TABLE TableNameGoesHere(name, id);
INSERT INTO TableNameGoesHere VALUES ('Andy', 42), ('David', 999);

But JSON allows anything, so there is no way to automatically extract a useful schema:

["Andy", 42, "David", 999, {"event": [1, 2, "345"]}, {"group": {}}]

Here's what you need to do.  Look through your data in detail and devise an SQL schema that captures it, or at least the subset you care about.  Then take any JSON library (I recommend the one I wrote, since it's in Tcl and the Tcl binding for SQLite is the easiest binding) and use it to write a program that reads your JSON and inserts it into your SQL database.
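Given the shape David describes (a top-level array of event objects, each with one-to-one "venue" and "group" objects and a one-to-many "hosts" array), that program can be quite short. Here is a rough sketch in Python rather than Tcl; the field names ("id", "name", and so on) are guesses at the upstream schema and would need adjusting to the real file:

```python
import json
import sqlite3

def load_events(json_path, db_path):
    with open(json_path) as f:
        events = json.load(f)  # expects a top-level array of event objects

    db = sqlite3.connect(db_path)
    db.executescript("""
        DROP TABLE IF EXISTS event;   -- start fresh on each yearly run
        DROP TABLE IF EXISTS host;
        CREATE TABLE event(id INTEGER PRIMARY KEY, name TEXT,
                           venue_name TEXT, group_name TEXT);
        CREATE TABLE host(event_id INTEGER, name TEXT);
    """)
    for ev in events:
        # one-to-one: fold venue and group into the event row
        db.execute("INSERT INTO event VALUES (?, ?, ?, ?)",
                   (ev["id"], ev["name"],
                    ev["venue"]["name"], ev["group"]["name"]))
        # one-to-many: one host row per entry in the "hosts" array
        for h in ev.get("hosts", []):
            db.execute("INSERT INTO host VALUES (?, ?)", (ev["id"], h["name"]))
    db.commit()
    return db
```

This sketch also takes the "blow away the database every time" option from the next paragraph: the DROP TABLE IF EXISTS statements mean each yearly run starts fresh, which is the simplest choice when the old data doesn't need to be kept.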

You will also have to think about whether you blow away the SQL database every time or you instead try to merge in new data, and if the old data needs to be kept, deleted, or archived.

I don't think this is necessarily a difficult task, but it is definitely not one that can be performed without a serious analysis which cannot be automated.

> Oh, to complicate things, the source data is not valid JSON, as the
> upstream source did not escape quote marks within the text.  So is there
> a tool that can clean up the quotes, hopefully there won't be many bad
> quotes... maybe three or four in the 600 record.

Then it's not really JSON and can't be processed with true-blue JSON tools.  Everything you're saying is calling for a custom tool.  Custom tools do not have to be hard or expensive though.

But in the case of quotes not being escaped, that all by itself could be difficult to automate, since heuristics will need to be applied to figure out whether any given quote mark is a string delimiter.  Human intervention might be required.

Re: Lazy JSON data to sqlite database

David Bicking-2
>> Is there a tool out there that will more or less automate the task for
>> me? Hopefully free, as no one is paying me to do this. (The other
>> volunteers have maybe a dozen records in total and are doing their
>> reports by hand. )

> The automation is at a lower level than you seem to realize.  JSON I/O is a solved problem, but actually doing anything with that data, such as transforming it into a relational database form, is up to you.

I thought I got lucky. I found a web site called SQLify that lets you upload a JSON file; it will examine it, figure out a schema, and create CREATE TABLE and INSERT statements with the data. Kind of neat. Unfortunately, it flattened the data into one table, which didn't matter for the venue and group data, but it flattened the hosts data by only using the first one and discarding the rest.
>> A cursory look at the data: there is a top level "event" table.  "Group"
>> and "venue" tables that are one-to-one with the event table, and a
>> "hosts" tables with many hosts to the event table.
>
> JSON doesn't have tables, nor does JSON pre-declare a schema.  JSON intermixes the schema with the data, and the schema is allowed to vary over the course of the document.

Yeah, sloppy writing on my part. JSON has objects, which can be stored in SQL tables.

>> Oh, to complicate things, the source data is not valid JSON, as the
>> upstream source did not escape quote marks within the text.  So is there
>> a tool that can clean up the quotes, hopefully there won't be many bad
>> quotes... maybe three or four in the 600 records.

> Then it's not really JSON and can't be processed with true-blue JSON tools.  Everything you're saying is calling for a custom tool.  Custom tools do not have to be hard or expensive though.

> But in the case of quotes not being escaped, that all by itself could be difficult to automate, since heuristics will need to be applied to figure out whether any given quote mark is a string delimiter.  Human intervention might be required.

I kind of solved this by running the data through a parser: I would take the error message, find the offending text in the source file, and escape the quotes. Then parse again and fix the next error. Tedious, but it works.
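For what it's worth, that parse-fix-reparse loop can itself be partly automated, since the parser's reported error position lands just past the stray quote. Here is a rough Python sketch; it is only a heuristic (in line with Andy's warning that a human may still need to check the result), and it assumes the bad quotes sit inside string values, which is the failure mode described in the thread:

```python
import json

def repair_quotes(text, max_fixes=100):
    """Escape stray quotes until the text parses as JSON."""
    for _ in range(max_fixes):
        try:
            json.loads(text)
            return text                       # parses cleanly: done
        except json.JSONDecodeError as e:
            # Heuristic: assume the nearest quote before the point of
            # failure is an unescaped quote inside a string value.
            q = text.rfind('"', 0, e.pos)
            if q > 0 and text[q - 1] != '\\':
                text = text[:q] + '\\' + text[q:]
            else:
                raise                         # some other syntax error
    raise ValueError("too many bad quotes; fix the file by hand")
```

With only three or four bad quotes in 600 records, either the manual loop or a sketch like this gets the job done; the automated version just saves the find-in-file step.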
David


   

Re: Lazy JSON data to sqlite database

Andy Goth-3
If you have an SQL schema that works for you and also sample data, I might
be able to assist writing the conversion program. Dunno if you're
interested since it sounds like you won't need to do this again for another
year.

On Sat, Jan 19, 2019, 23:17 David Bicking <[hidden email]> wrote:
