Best way to store key,value pairs

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Best way to store key,value pairs

Andy-3
I am trying using Sqlite to news reader and local small NNTP server.
NNTP message has header and body. Some special headers field needed to
XOVER I keep in one table, body I will compress and store as blob, some
other required header fields I will store in own columns, but also exists
not required, user defined header fields.
How store its as pairs key,value? There fields I might not store as pair,
simply as one field - one whole string for header field. But it is variable
number this fields and column are strictly defined. Maybe all in one Sqlite
text field or define for example 10 field which often will empty?
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Best way to store key,value pairs

Simon Slavin-3
On 7 Jan 2020, at 7:13pm, Andy <[hidden email]> wrote:

> But it is variable number this fields and column are strictly defined. Maybe all in one Sqlite text field or define for example 10 field which often will empty?

Parent Child database.  One table has one entry per message.  The other table has one entry per header.  This is a natural conclusion for anyone uses SQL.

I suggest you pause the work on your project and read some SQL tutorials.  Read how relational databases work.  This is about SQL, no matter which implementation.  It is the same for SQLite, MS SQL, MySQL, and all other implementations of SQL.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Best way to store key,value pairs

Jens Alfke-2
In reply to this post by Andy-3
Consider encoding the headers as JSON and storing them in a single column. SQLite has a JSON extension that makes it easy to access values from JSON data in a query. You can even index them.

Simon’s suggestion (a row per header) is correct in theory, but the large numbers of headers you’ll be storing in an NNTP database will make that approach pretty expensive, I think.

(I’ve mostly given up on relational-database orthodoxy, and doing so helped make version 2 of my program about 5x faster than version 1.)

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Best way to store key,value pairs

Simon Slavin-3
On 8 Jan 2020, at 3:11am, Jens Alfke <[hidden email]> wrote:

> Consider encoding the headers as JSON and storing them in a single column. SQLite has a JSON extension that makes it easy to access values from JSON data in a query. You can even index them.
>
> Simon’s suggestion (a row per header) is correct in theory, but the large numbers of headers you’ll be storing in an NNTP database will make that approach pretty expensive, I think.

I believe Jens' point is valid, as long as you don't have to search/scan headers.  I'm not used to being able to access JSON inside SQLite yet.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Best way to store key,value pairs

Jens Alfke-2

> On Jan 8, 2020, at 3:13 AM, Simon Slavin <[hidden email]> wrote:
>
> I believe Jens' point is valid, as long as you don't have to search/scan headers.

You can even do that — to search for a specific header’s value, just create an index on json_extract(headers, ‘$Header-Name’), then in a query use that same expression in an equality or relational comparison.

(You’ll need to normalize the case of header names during the JSON conversion while inserting, since JSON keys are case-sensitive but RFC822 header names aren’t.)

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Best way to store key,value pairs

James K. Lowden
In reply to this post by Jens Alfke-2
On Tue, 7 Jan 2020 17:11:45 -1000
Jens Alfke <[hidden email]> wrote:

> Consider encoding the headers as JSON and storing them in a single
> column. SQLite has a JSON extension that makes it easy to access
> values from JSON data in a query.

What is the motivation behind this advice?  It's completely
unnecessary.  For all the complexity JSON adds to the design, it adds
exactly no power: precisely the same effects can be achieved without
it.  

I can understand the utility of using SQLite's JSON features where
there's pre-existing JSON, especially if there's need to preserve it
and reproduce it later.  I see no advantage to introducing JSON to a
system with no external use for it.  

--jkl

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Best way to store key,value pairs

Jens Alfke-2


> On Jan 12, 2020, at 4:12 PM, James K. Lowden <[hidden email]> wrote:
>
> What is the motivation behind this advice?  It's completely unnecessary.  

Thanks for your opinion, James! I disagree.

RFC822 headers are schemaless, and in a Usenet or email database they have rather high volume (probably 20+ per message) but go mostly unused. An NNTP server can't simply throw away the headers it doesn't need, but IMHO there are too many of them to do the usual SQL thing and add every header of every message to a table — that multiples the number of inserts and deletes by an order of magnitude.

> For all the complexity JSON adds to the design, it adds exactly no power: precisely the same effects can be achieved without it.  

Well sure, but you could say the same thing about indexes, couldn't you?

I disagree about complexity. Encoding headers as JSON is pretty simple if you've got a JSON encoder handy, simpler than executing a SQL INSERT. And the JSON API is very easy to use.

> I see no advantage to introducing JSON to a system with no external use for it.  

Hm, a number of database companies (such as my employer, Couchbase) and their customers would disagree with you :)

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Best way to store key,value pairs

Petite Abeille-2


> On Jan 13, 2020, at 19:37, Jens Alfke <[hidden email]> wrote:
>
>> What is the motivation behind this advice?  It's completely unnecessary.  
>
> Thanks for your opinion, James! I disagree.

Arnt Gulbrandsen, of Archiveopteryx fame, would disagree with you, Jens :)

https://archiveopteryx.org/schema

In any case, this is not a matter of opinion, but rather one of purpose and benchmarks.

On that note, Happy New Year! Welcome to the future.




_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Best way to store key,value pairs

James K. Lowden
In reply to this post by Jens Alfke-2
On Mon, 13 Jan 2020 10:37:57 -0800
Jens Alfke <[hidden email]> wrote:

> > On Jan 12, 2020, at 4:12 PM, James K. Lowden
> > <[hidden email]> wrote:
> >
> > What is the motivation behind this advice?  It's completely
> > unnecessary.  
>
> Thanks for your opinion, James! I disagree.
>
...
> IMHO there are too many of them to do the usual SQL thing
> and add every header of every message to a table ? that multiples the
> number of inserts and deletes by an order of magnitude.

Hi Jens,

I asked for your rationale; thanks for your answer.  

So, basically, a nomalized design requires too much use of INSERT?  

You're making an efficiency argument here, or maybe
ease-of-implementation assertion. For me, inserting one header row or
20 is the same coding effort (still need a loop).  I think transaction
throughput would be about the same if COMMIT is applied only to whole
messages.  

The OP wanted some basic design advice; he didn't say a simple,
straightforward design was too much work or too slow.  He in fact said,

> > > local small NNTP server

Given that, ISTM that textbook SQL 101 advice is in order.  JSON should
wait until your assumptions are tested.  

> > For all the complexity JSON adds to the design, it adds exactly no
> > power: precisely the same effects can be achieved without it.  
>
> Well sure, but you could say the same thing about indexes, couldn't
> you?

No.  Perhaps I should have been more explicit about what "complexity"
I was talking about. I'm saying you've added a user-visible aspect,
JSON, to the logical database design for him to cope with, but in no
way made the database capable of representing something it otherwise
couldn't.  

Indexes are the opposite: invisible affordances that don't affect the
database's logical design.  

> Encoding headers as JSON is pretty simple if you've got a JSON
> encoder handy

Perhaps.  It's still introducing an extraneous technology to the user's
problem domain.  

> > I see no advantage to introducing JSON to a system with no external
> > use for it.  
>
> Hm, a number of database companies (such as my employer, Couchbase)
> and their customers would disagree with you :)

Please don't take this personally, because I don't intend insult.  For
years I worked on databases before Couchbase existed, and for which
using Couchbase even today would be considered a joke.  

DBMSs are used for all kinds of purposes by people well trained and
not, to good effect and bad.  The number who don't understand the basic
theory of what they're working with far exceeds those that do.  Half of
them are below average, and the average isn't very high.  

I'm sure you'll understand if popular opinion doesn't impress me.  

I get why you would do it your way.  In your estimation, given your
particular skills, you feel it would be easier to use the JSON API.  I
didn't want to leave unchallenged the impression "6 one, half-dozen the
other" impression that bring that into the mix is just a matter of
taste.  Simpler systems are better, I'm sure you'd agree.  

--jkl
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Best way to store key,value pairs

Jens Alfke-2


> On Jan 13, 2020, at 1:45 PM, James K. Lowden <[hidden email]> wrote:
>
> So, basically, a nomalized design requires too much use of INSERT?  
> You're making an efficiency argument here, or maybe
> ease-of-implementation assertion. For me, inserting one header row or
> 20 is the same coding effort (still need a loop).

I was suggesting _zero_ header rows, i.e. store the headers as a blob column in the messages table, instead of having a separate table.

> I think transaction throughput would be about the same if COMMIT is applied only to whole messages.  

There are other pain points. From what I've seen in the past, IIRC one of them is managing the foreign-key constraints (e.g. SQLite finding and deleting header rows when a message row is deleted), and the necessity of using JOINs to get at the headers.

Anyway: I merely said to "consider encoding the headers as JSON". Certainly didn't mean to imply it was the canonical answer.

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Best way to store key,value pairs

wmertens
In reply to this post by James K. Lowden
On Mon, Jan 13, 2020 at 10:45 PM James K. Lowden <[hidden email]>
wrote:

> You're making an efficiency argument here, or maybe
> ease-of-implementation assertion. For me, inserting one header row or
> 20 is the same coding effort (still need a loop).  I think transaction
> throughput would be about the same if COMMIT is applied only to whole
> messages.
>

Not quite - if all the headers are encoded client-side in a blob (actually
a string in this case), then they are always together in the database,
there is no row management needed per header, there is no index needed on
the message id, etc.

DB normalization is nice, but "small" arrays of child data can simply be
embedded in the row, gaining schema simplicity and efficiency. There are
some conditions needed for this to work well:
* data should be "small" (you can't page single row results)
* data should mostly be needed together (overfetching is bad)
* the only reference to the data should be from the parent object (can't
reference row contents)

Nowhere in my DB course at uni was this possibility covered.

In this case, there are only a few headers that really matter, and their
meaning can be encoded separately (sender, thread id etc), and then the
headers are kept for reference.

Given that, ISTM that textbook SQL 101 advice is in order.  JSON should
> wait until your assumptions are tested.
>

Well, that's sort of true, but it's easier to add a JSON field than
creating extra tables.


> Perhaps.  It's still introducing an extraneous technology to the user's
> problem domain.
>

Everything you need to manipulate JSON is available in SQLite, including
pretending it's a table. So if push comes to shove, you can pretend that
the JSON is a table and have the same "SQL surface" as before in the
application.

DBMSs are used for all kinds of purposes by people well trained and
> not, to good effect and bad.  The number who don't understand the basic
> theory of what they're working with far exceeds those that do.  Half of
> them are below average, and the average isn't very high.
>

Actually, half of them are below median. Depending on the distribution,
most of them could be above average ;-)


> I'm sure you'll understand if popular opinion doesn't impress me.
>

It shouldn't - but this isn't a popular opinion. This is a trade-off
between schema simplicity, storage layout and speed of some operations. I'd
argue that in this particular case, a JSON field is beneficial for
simplicity, speed and storage space.


> Simpler systems are better, I'm sure you'd agree.
>

I agree. That's why I like full-stack JavaScript, SQLite, and JSON fields.
I'm sure these are not all choices you would make, but for me, these are
simple. JOINs etc are hard.

Wout.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Best way to store key,value pairs

Dominique Devienne
On Tue, Jan 14, 2020 at 9:35 AM Wout Mertens <[hidden email]> wrote:
> On Mon, Jan 13, 2020 at 10:45 PM James K. Lowden <[hidden email]>
> This is a trade-off between schema simplicity, storage layout and speed of some operations. I'd
> argue that in this particular case, a JSON field is beneficial for simplicity, speed and storage space.

+1. Echoes my own thoughts on this thread.

James is right too of course, in the absolute, but limited
denormalization for efficiency,
and arguably for simplicity too, despite James' opinion, are valuable.
Now it's "just" the
matter of making the right tradeoff based on the particular
circumstances, fully aware
of the pros and cons. And that requires experience and knowledge (and
testing/benchmarking, duh).

Hopefully this particular thread will help people on this recurring topic. --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Best way to store key,value pairs

Peter da Silva-2
Another thing to consider is that RFC-822/1036 (et seq) headers are not
inherently unique and some are repeated, especially since there's software
that treats Usenet and Mail headers interchangeably. Also, debugging may
require being able to see the exact layout of the headers as received. The
safest solution is to add columns for the specific instances of the
specific headers that you need to index, and then store the original
headers unchanged as a blob or a big text column. Converting to json and
back without potentially losing data (even if you don't think you will need
that data) takes some care.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users