User-defined types

classic Classic list List threaded Threaded
45 messages Options
123
Reply | Threaded
Open this post in threaded view
|

User-defined types

Matthias-Christian Ott
I want to define user-defined types, i.e. types not SQLite has not
built-in and make sure that I didn't overlook something. Is it correct
that values of user-defined types should be stored as text and have a
collation defined if there is an order relation for the type if the type
cannot be represented as a subset of integer or float?

Example:
Suppose I want to store arbitrary precision integers in SQLite. I would
create a column with text affinity, (uniquely) serialize and deserialize
the integers to text (e.g. by converting them into decimal
representation) and define and declare a collation that deserializes the
texts to arbitrary integers and compares the integers.

Is there another way to define user-defined types despite this method
and virtual tables?

- Matthias-Christian

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Richard Hipp-3
On 5/27/15, Matthias-Christian Ott <[hidden email]> wrote:
>
> Is there another way to define user-defined types despite this method
> and virtual tables?
>

I know of no other.
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Darko Volaric
In reply to this post by Matthias-Christian Ott
I've tackled this problem from a couple of different angles. My goal was to
allow arbitrary user defined types, based on the builtin types (essentially
subtypes of the existing types), with a minimum of work and minimum
disruption of the normal/existing use of the database and API.

The approaches I considered were:

- encoding the user type codes for each data column in a separate column
dedicated to the purpose. This is a low impact but cumbersome, for instance
using a function that interprets the user type would have to have the user
type passed in for each argument, along with the actual data.

- modifying the data file format to carry user type information. There is
space in the record/row header where you can encode this information in a
backwards compatible way, but the source code for data record access is not
friendly, basically a dense blob of code with a lot of integer literals
which are all very important, but it's hard to be sure what they entail and
that you haven't introduced a subtle bug and ultimately data corruption.
Additionally the user type would have to be passed around internally - for
example in the sqlite3_value object - and tracking down all of those
reliably is a bit of work.

- using blobs. Although using text representation is friendly when looking
at the data with standard tools, it's slower and takes up more memory in
various places. I found that encoding some user types as blobs with a type
marker at their start (a single byte with extensions) and interpreting them
was a simple and low impact approach. I also split the standard integer
type four ways (negative and positive, odd and even) to get the scalar user
types I needed. User defined functions and collations need to be defined
for interpreting these user types of course.

The first option isn't very practical. The second option is the fastest and
most robust solution and my long term approach which I will be going back
to after development has progressed a bit more. Currently I'm using the
third approach as an interim measure. I'm supporting arbitrary prec ints
and reals, arrays and tuples and other types this way.



On Wed, May 27, 2015 at 3:48 AM, Matthias-Christian Ott <[hidden email]>
wrote:

> I want to define user-defined types, i.e. types not SQLite has not
> built-in and make sure that I didn't overlook something. Is it correct
> that values of user-defined types should be stored as text and have a
> collation defined if there is an order relation for the type if the type
> cannot be represented as a subset of integer or float?
>
> Example:
> Suppose I want to store arbitrary precision integers in SQLite. I would
> create a column with text affinity, (uniquely) serialize and deserialize
> the integers to text (e.g. by converting them into decimal
> representation) and define and declare a collation that deserializes the
> texts to arbitrary integers and compares the integers.
>
> Is there another way to define user-defined types despite this method
> and virtual tables?
>
> - Matthias-Christian
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Nico Williams
On Wed, Jun 03, 2015 at 03:55:04PM -0700, Darko Volaric wrote:
> I've tackled this problem from a couple of different angles. My goal was to
> allow arbitrary user defined types, based on the builtin types (essentially
> subtypes of the existing types), with a minimum of work and minimum
> disruption of the normal/existing use of the database and API.
>
> The approaches I considered were:

A UDT can't be first-class without modifying SQLite3.  Without
first-class UDT support the application has to be responsible to some
degree for adhering to whatever encoding conventions it chooses to use.

User-defined functions, collations, and virtual tables can be used to
move some of the burden from the application to the UDFs and VTs, but
you can't move all of it (e.g., because whatever SQLite3 type you use to
encode UDTs will often have other uses, leading to an aliasing problem
that requires the application to avoid it).

> - encoding the user type codes for each data column in a separate column
> dedicated to the purpose. This is a low impact but cumbersome, for instance
> using a function that interprets the user type would have to have the user
> type passed in for each argument, along with the actual data.

It's cumbersome because it creates a referential integrity problem.

> - modifying the data file format to carry user type information. There is
> space in the record/row header where you can encode this information in a
> backwards compatible way, but the source code for data record access is not
> friendly, basically a dense blob of code with a lot of integer literals
> which are all very important, but it's hard to be sure what they entail and
> that you haven't introduced a subtle bug and ultimately data corruption.
> Additionally the user type would have to be passed around internally - for
> example in the sqlite3_value object - and tracking down all of those
> reliably is a bit of work.

Right, you'd basically be talking about adding new first-class types to
SQLite3.  That's quite an understaking and not for the faint of heart.
Even if you tackle this, chances are it'd be very difficult to get the
SQLite3 dev team to accept the changes -- one would be forking SQLite3,
and that requires serious (read: lots of experienced software engineer
time) effort to develop and maintain.

> - using blobs. Although using text representation is friendly when looking
> at the data with standard tools, it's slower and takes up more memory in
> various places. I found that encoding some user types as blobs with a type
> marker at their start (a single byte with extensions) and interpreting them
> was a simple and low impact approach. [...]

Encoding as text or blobs is about your only realistic option.  Enums
can be encoded as numbers too, as can small bitsets.

> The first option isn't very practical. The second option is the fastest and
> most robust solution and my long term approach which I will be going back
> to after development has progressed a bit more. Currently I'm using the
> third approach as an interim measure. I'm supporting arbitrary prec ints
> and reals, arrays and tuples and other types this way.

At that point why not just switch to Postgres?

Nico
--
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Darko Volaric
Yep, references a another one. Just like the functions, you have to join on
the user type information, add it to constraints, etc.

In my case I'm already modifying and maintaining my own version of SQLite.
My project is basically a database with a lot of extensions. Submitting
patches is not an issue. The last time I brought these ideas up I was
practically chased off by a mob waving pitchforks and torches. Apparently
almost no-one thinks user defined types is a good idea so there is no point
sharing it. I don't expect anyone to help me maintain the code. The
critical parts of SQLite (like the record read/write) are very stable and
updates hardly ever affect me.

Regarding PgSQL, an advantage of encoding your own binary types is that you
can copy them straight into your code and execute with them directly - I
use the same encoding/data structures throughout and they serve my code and
requirements instead of the database's or its API.

PgSQL is also a poor fit for me because it's huge and assumes a (huge)
server, I'm running on small nodes with relatively little memory.

On Wed, Jun 3, 2015 at 4:26 PM, Nico Williams <[hidden email]> wrote:

> On Wed, Jun 03, 2015 at 03:55:04PM -0700, Darko Volaric wrote:
> > I've tackled this problem from a couple of different angles. My goal was
> to
> > allow arbitrary user defined types, based on the builtin types
> (essentially
> > subtypes of the existing types), with a minimum of work and minimum
> > disruption of the normal/existing use of the database and API.
> >
> > The approaches I considered were:
>
> A UDT can't be first-class without modifying SQLite3.  Without
> first-class UDT support the application has to be responsible to some
> degree for adhering to whatever encoding conventions it chooses to use.
>
> User-defined functions, collations, and virtual tables can be used to
> move some of the burden from the application to the UDFs and VTs, but
> you can't move all of it (e.g., because whatever SQLite3 type you use to
> encode UDTs will often have other uses, leading to an aliasing problem
> that requires the application to avoid it).
>
> > - encoding the user type codes for each data column in a separate column
> > dedicated to the purpose. This is a low impact but cumbersome, for
> instance
> > using a function that interprets the user type would have to have the
> user
> > type passed in for each argument, along with the actual data.
>
> It's cumbersome because it creates a referential integrity problem.
>
> > - modifying the data file format to carry user type information. There is
> > space in the record/row header where you can encode this information in a
> > backwards compatible way, but the source code for data record access is
> not
> > friendly, basically a dense blob of code with a lot of integer literals
> > which are all very important, but it's hard to be sure what they entail
> and
> > that you haven't introduced a subtle bug and ultimately data corruption.
> > Additionally the user type would have to be passed around internally -
> for
> > example in the sqlite3_value object - and tracking down all of those
> > reliably is a bit of work.
>
> Right, you'd basically be talking about adding new first-class types to
> SQLite3.  That's quite an understaking and not for the faint of heart.
> Even if you tackle this, chances are it'd be very difficult to get the
> SQLite3 dev team to accept the changes -- one would be forking SQLite3,
> and that requires serious (read: lots of experienced software engineer
> time) effort to develop and maintain.
>
> > - using blobs. Although using text representation is friendly when
> looking
> > at the data with standard tools, it's slower and takes up more memory in
> > various places. I found that encoding some user types as blobs with a
> type
> > marker at their start (a single byte with extensions) and interpreting
> them
> > was a simple and low impact approach. [...]
>
> Encoding as text or blobs is about your only realistic option.  Enums
> can be encoded as numbers too, as can small bitsets.
>
> > The first option isn't very practical. The second option is the fastest
> and
> > most robust solution and my long term approach which I will be going back
> > to after development has progressed a bit more. Currently I'm using the
> > third approach as an interim measure. I'm supporting arbitrary prec ints
> > and reals, arrays and tuples and other types this way.
>
> At that point why not just switch to Postgres?
>
> Nico
> --
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Dominique Devienne
On Thu, Jun 4, 2015 at 3:04 AM, Darko Volaric <[hidden email]> wrote:

> In my case I'm already modifying and maintaining my own version of SQLite.
> [...]. The last time I brought these ideas up I was
> practically chased off by a mob waving pitchforks and torches. Apparently
> almost no-one thinks user defined types is a good idea so there is no point
> sharing it. I don't expect anyone to help me maintain the code.


FWIW,  I think UDTs are a great idea. But also
- optional static typing of columns;
- checksums of blocks;-
- blob two-tier storage (a la Oracle);
- native indexing of virtual table;
- native JSON support;
- etc...

Yes, the community, just like the authors, of SQLite have a strong bias
against changes and to keep SQLite "lite".

And can be brutal in how they say it (or ignore it) when someone rants
about his pet-peeves, or try to push forward his wish list (including me
above).

But remember that SQLite didn't have FKs for a long time. Didn't have CTE.
Both of which are major enhancements. So there's hope long term IMHO :).

Now unlike most (including me again), you go further and actually code it
up apparently. That's great. But it's hard to fork SQLite and get any
traction given the fast-paced refactoring/optimization the main code goes
through. And also UDTs can have widespread side effects within SQLite, hard
to gauge w/o having the whole code-base and design in ones head like DRH.
Might be good enough for you, but not for the high quality standards which
is a hallmark of SQLite IMHO. All I can suggest is continue communicating
and perhaps also OSS your changes on GitHub or similar, and you may get
help somehow.

I suspect (hope really) first-class UDTs in SQLite (as Nico calls them)
haven't been dismissed, and it's more a question of finding the time and
funding to do them right, i.e. in a "lite" way that doesn't adversely
affect SQLite if you don't use them, and thoroughly tested as usual. My
$0.02. --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Christopher Vance-8
If you really want your own types, you could always bundle with ASN.1 and
store the result as a blob.

On Thu, Jun 4, 2015 at 4:52 PM, Dominique Devienne <[hidden email]>
wrote:

> On Thu, Jun 4, 2015 at 3:04 AM, Darko Volaric <[hidden email]> wrote:
>
> > In my case I'm already modifying and maintaining my own version of
> SQLite.
> > [...]. The last time I brought these ideas up I was
> > practically chased off by a mob waving pitchforks and torches. Apparently
> > almost no-one thinks user defined types is a good idea so there is no
> point
> > sharing it. I don't expect anyone to help me maintain the code.
>
>
> FWIW,  I think UDTs are a great idea. But also
> - optional static typing of columns;
> - checksums of blocks;-
> - blob two-tier storage (a la Oracle);
> - native indexing of virtual table;
> - native JSON support;
> - etc...
>
> Yes, the community, just like the authors, of SQLite have a strong bias
> against changes and to keep SQLite "lite".
>
> And can be brutal in how they say it (or ignore it) when someone rants
> about his pet-peeves, or try to push forward his wish list (including me
> above).
>
> But remember that SQLite didn't have FKs for a long time. Didn't have CTE.
> Both of which are major enhancements. So there's hope long term IMHO :).
>
> Now unlike most (including me again), you go further and actually code it
> up apparently. That's great. But it's hard to fork SQLite and get any
> traction given the fast-paced refactoring/optimization the main code goes
> through. And also UDTs can have widespread side effects within SQLite, hard
> to gauge w/o having the whole code-base and design in ones head like DRH.
> Might be good enough for you, but not for the high quality standards which
> is a hallmark of SQLite IMHO. All I can suggest is continue communicating
> and perhaps also OSS your changes on GitHub or similar, and you may get
> help somehow.
>
> I suspect (hope really) first-class UDTs in SQLite (as Nico calls them)
> haven't been dismissed, and it's more a question of finding the time and
> funding to do them right, i.e. in a "lite" way that doesn't adversely
> affect SQLite if you don't use them, and thoroughly tested as usual. My
> $0.02. --DD
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



--
Christopher Vance
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Dominique Devienne
On Thu, Jun 4, 2015 at 10:20 AM, Christopher Vance <[hidden email]>
wrote:

> If you really want your own types, you could always bundle with ASN.1 and
> store the result as a blob.
>

Or Protobuf, or ... But you're back to option 1, you must store somewhere
that knowledge, and it's an app-convention, SQL and SQLite don't cooperate.

That's far from first-class UDTs.

You can also use a self-describing format like AVRO (many others I'm sure),
but you still need to know "a-priori" which blobs are avro encoded, and
even assuming that, that doesn't enforce a "schema" (structure) on the
column (i.e. "static typing"), which I think is necessary. Actually, maybe
a CHECK constraint works here. I've never tried to use a
custom-SQL-function-based check constraint in SQLite. Does that work? --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Darko Volaric
In reply to this post by Dominique Devienne
That's an entirely valid point, but didn't come up in the discussions if
memory serves. It was the "you don't know what you're doing and don't
understand databases" which I thought was an odd response, but that's all
irrelevant.

I agree that feature bloat is not a good idea (hello, PgSQL!) and I don't
want to add features but rather programming interfaces. I don't make
extensive changes to the code at all. I basically try to provide hooks in
the code that don't actually change any functionality.

For instance my UDTs design just adds a few bits to the record header that
associates a small, unsigned integer with a stored field, nothing more.
These bits have no impact on any code that is not aware of them. It was the
most minimal design I could think of.

Similarly with the VM. I don't add opcodes nor do I change the form or
semantics of the existing codes, but I do sneak "links" into unused opcode
parameters which are interpreted by my own code, kind of like running a VM
of my own alongside.

In other places I just use private functions as if they were public APIs.

I think this is the sort of approach is beneficial to SQLite, especially
since it's an embedded database and it's very natural to want to extend it
and intertwine it with your own code, if you need more than the stock
functionality. I also understand that it makes the job of the developers
harder and that they have no reason to put time into hooks or APIs used by
a relative minority.

I suspect most people who need extensions are like me and want something
very particular and mix in their own proprietary code. For instance, I
think programming in C is like a visiting a dentist who doesn't use
lidocaine, so my actual functionality is in a completely different language.

On Wed, Jun 3, 2015 at 11:52 PM, Dominique Devienne <[hidden email]>
wrote:

> On Thu, Jun 4, 2015 at 3:04 AM, Darko Volaric <[hidden email]> wrote:
>
> > In my case I'm already modifying and maintaining my own version of
> SQLite.
> > [...]. The last time I brought these ideas up I was
> > practically chased off by a mob waving pitchforks and torches. Apparently
> > almost no-one thinks user defined types is a good idea so there is no
> point
> > sharing it. I don't expect anyone to help me maintain the code.
>
>
> FWIW,  I think UDTs are a great idea. But also
> - optional static typing of columns;
> - checksums of blocks;-
> - blob two-tier storage (a la Oracle);
> - native indexing of virtual table;
> - native JSON support;
> - etc...
>
> Yes, the community, just like the authors, of SQLite have a strong bias
> against changes and to keep SQLite "lite".
>
> And can be brutal in how they say it (or ignore it) when someone rants
> about his pet-peeves, or try to push forward his wish list (including me
> above).
>
> But remember that SQLite didn't have FKs for a long time. Didn't have CTE.
> Both of which are major enhancements. So there's hope long term IMHO :).
>
> Now unlike most (including me again), you go further and actually code it
> up apparently. That's great. But it's hard to fork SQLite and get any
> traction given the fast-paced refactoring/optimization the main code goes
> through. And also UDTs can have widespread side effects within SQLite, hard
> to gauge w/o having the whole code-base and design in ones head like DRH.
> Might be good enough for you, but not for the high quality standards which
> is a hallmark of SQLite IMHO. All I can suggest is continue communicating
> and perhaps also OSS your changes on GitHub or similar, and you may get
> help somehow.
>
> I suspect (hope really) first-class UDTs in SQLite (as Nico calls them)
> haven't been dismissed, and it's more a question of finding the time and
> funding to do them right, i.e. in a "lite" way that doesn't adversely
> affect SQLite if you don't use them, and thoroughly tested as usual. My
> $0.02. --DD
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Nico Williams
In reply to this post by Darko Volaric
On Wed, Jun 03, 2015 at 06:04:29PM -0700, Darko Volaric wrote:
> Yep, references a another one. Just like the functions, you have to join on
> the user type information, add it to constraints, etc.

Once you're extending SQLite3 proper the referential integrity problem
goes away (being no different than the "problem" SQLite3 has of tracking
value types internally).

> In my case I'm already modifying and maintaining my own version of SQLite.
> My project is basically a database with a lot of extensions. Submitting
> patches is not an issue. The last time I brought these ideas up I was
> practically chased off by a mob waving pitchforks and torches. Apparently
> almost no-one thinks user defined types is a good idea so there is no point
> sharing it. I don't expect anyone to help me maintain the code. The
> critical parts of SQLite (like the record read/write) are very stable and
> updates hardly ever affect me.

I'm not at all sure that UDTs are a bad idea.  I do want JSON support,
preferably using jq for expressing transformations (or merely addressing
specific values in a JSON text) of JSON texts.  And I'd like bigint (and
bigfloat?) support.  Those two types should be enough for easily dealing
with a great many needs for UDTs without necessarily having support for
arbitrary UDTs.  A bitstring type would also be convenient for things
like IP addresses and CIDR.

Given that SQLite3 is in the public domain, you're not obliged to share
your development.  But who knows, in spite of all the reasons that using
your "fork" would not be advisable, you might succeed in forming a
community of users.  So unless you have reasons not to share your work,
I'd encourage you to share it.  If you do share it, you'll want to give
it a distinct name (it wouldn't be SQLite3, would it), and you'll want
to very careful whose patches you accept: if you want to ever be able to
contribute your changes back into the mainline, you'll need to be able
to show that each contribution is permitted by the contributor's
employer and so on.

I have a long wishlist of features or changes that I don't think the
SQLite3 developers are going to be very interested in at this time.  You
might not be either, but if there was a community of external developers
that could pool its resources to make contributions that might be
welcomed by the SQLite3 developers...

Nico
--
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Nico Williams
In reply to this post by Dominique Devienne
On Thu, Jun 04, 2015 at 10:54:16AM +0200, Dominique Devienne wrote:
> On Thu, Jun 4, 2015 at 10:20 AM, Christopher Vance <[hidden email]>
> wrote:
> > If you really want your own types, you could always bundle with ASN.1 and
> > store the result as a blob.

FYI, Heimdal has a very nice, small, simple, featureful, and easily
separable ASN.1 compiler with a BSD-type license.

https://github.com/heimdal/heimdal/tree/master/lib/asn1

> Or Protobuf, or ... But you're back to option 1, you must store somewhere
> that knowledge, and it's an app-convention, SQL and SQLite don't cooperate.

I've never understood why Protocol Buffers.  Its authors made all the
same sorts of design choices that make DER a lousy encoding for ASN.1...

Anyways, the biggest problem with any kind of encoding (ASN.1, JSON,
whatever), is that if you want to be able to use decoded values in SQL
statements, you end up having to decode and re-encode in every UDF.
This could be avoided if SQLite3 had a sort of UDT that's just an opaque
handle for a value and which includes an interface for coercing it to a
native SQLite3 type (e.g., blobs): this way re-encoding can be avoided
as much as possible.

> You can also use a self-describing format like AVRO (many others I'm sure),
> but you still need to know "a-priori" which blobs are avro encoded, and
> even assuming that, that doesn't enforce a "schema" (structure) on the
> column (i.e. "static typing"), which I think is necessary. Actually, maybe
> a CHECK constraint works here. I've never tried to use a
> custom-SQL-function-based check constraint in SQLite. Does that work? --DD

Yes, it does.

Nico
--
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Darko Volaric
In reply to this post by Nico Williams
Which sort of leads me to my next feature, which is bypassing the SQL
language. Why use that crusty old syntax when it's equally expressible in
JSON, XML or something else. Again I see it just as an API, callable by
whatever parser you want, or none at all, if your code generates queries
directly.

I'll definitely release some patches when it's done. Maybe I'll call it
"ite" - SQLite without the SQL.

On Thu, Jun 4, 2015 at 11:23 AM, Nico Williams <[hidden email]>
wrote:

> On Wed, Jun 03, 2015 at 06:04:29PM -0700, Darko Volaric wrote:
> > Yep, references a another one. Just like the functions, you have to join
> on
> > the user type information, add it to constraints, etc.
>
> Once you're extending SQLite3 proper the referential integrity problem
> goes away (being no different than the "problem" SQLite3 has of tracking
> value types internally).
>
> > In my case I'm already modifying and maintaining my own version of
> SQLite.
> > My project is basically a database with a lot of extensions. Submitting
> > patches is not an issue. The last time I brought these ideas up I was
> > practically chased off by a mob waving pitchforks and torches. Apparently
> > almost no-one thinks user defined types is a good idea so there is no
> point
> > sharing it. I don't expect anyone to help me maintain the code. The
> > critical parts of SQLite (like the record read/write) are very stable and
> > updates hardly ever affect me.
>
> I'm not at all sure that UDTs are a bad idea.  I do want JSON support,
> preferably using jq for expressing transformations (or merely addressing
> specific values in a JSON text) of JSON texts.  And I'd like bigint (and
> bigfloat?) support.  Those two types should be enough for easily dealing
> with a great many needs for UDTs without necessarily having support for
> arbitrary UDTs.  A bitstring type would also be convenient for things
> like IP addresses and CIDR.
>
> Given that SQLite3 is in the public domain, you're not obliged to share
> your development.  But who knows, in spite of all the reasons that using
> your "fork" would not be advisable, you might succeed in forming a
> community of users.  So unless you have reasons not to share your work,
> I'd encourage you to share it.  If you do share it, you'll want to give
> it a distinct name (it wouldn't be SQLite3, would it), and you'll want
> to very careful whose patches you accept: if you want to ever be able to
> contribute your changes back into the mainline, you'll need to be able
> to show that each contribution is permitted by the contributor's
> employer and so on.
>
> I have a long wishlist of features or changes that I don't think the
> SQLite3 developers are going to be very interested in at this time.  You
> might not be either, but if there was a community of external developers
> that could pool its resources to make contributions that might be
> welcomed by the SQLite3 developers...
>
> Nico
> --
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Nico Williams
On Thu, Jun 04, 2015 at 11:45:28AM -0700, Darko Volaric wrote:
> Which sort of leads me to my next feature, which is bypassing the SQL
> language. [...]

I like SQL, but sure, if the compiler worked by first parsing into an
AST, and if the AST were enough of an interface (versioned, though not
necessarily backward-compatible between versions), then one could:

 - write different front-end languages (though an AST isn't needed for
   this: you can always generate SQL)

 - write powerful macro languages

 - write alternative/additional optimizers (and linters, syntax
   highlighters, ...) that work at the AST level

If the VDBE bytecode were also a versioned interface then one could
write peep-hole optimizers as well.

One might even want to generate IR code for LLVM, or use a JIT-er,
though for SQL I don't think that would pay off.  I suspect that most of
the CPU cycles go to data-intensive tasks such as I/O, cache thrashing,
and encoding/decoding.  I'd be much more interested in SQLite4 being
finished than an LLVM backend for SQLite3, and I'd be very interested in
seeing if word-optimized variable-length encoding would have better
performance than byte-optimized variable-length encoding.  The point
though is that using an AST would make the system more modular.

>    [...]. Why use that crusty old syntax when it's equally expressible in
> JSON, XML or something else. Again I see it just as an API, [...]

Now I'm confused.  JSON and XML are not query languages.  There exist
query languages for them (e.g., XPath/XSLT for XML).

I suppose you might have meant that SQL itself is a data representation
language like JSON and XML are, and it is (data being expressed as
INSERT .. VALUES ..; statements).

Nico
--
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Scott Hess
In reply to this post by Dominique Devienne
On Thu, Jun 4, 2015 at 1:54 AM, Dominique Devienne <[hidden email]> wrote:
> On Thu, Jun 4, 2015 at 10:20 AM, Christopher Vance <[hidden email]> wrote:
>> If you really want your own types, you could always bundle with ASN.1 and
>> store the result as a blob.
>
> Or Protobuf, or ... But you're back to option 1, you must store somewhere
> that knowledge, and it's an app-convention, SQL and SQLite don't cooperate.
>
> That's far from first-class UDTs.

Who else other than your app is going to read the data directly, though?

Many other SQL systems end up becoming the de facto integration point
between disparate backend and frontend systems.  So you end up growing
additional features to let everyone play in the database using the
same set of conventions put in place by the DBA.  That doesn't make it
a really strong design for the overall system, though, even if it is
convenient and incremental to build.  I don't think the argument is as
strong for things like this in SQLite, simply because often enough
SQLite isn't really in a position to leverage any of it any better
than your app code can, and just delegating it to your app code works
out pretty well for SQLite in terms of support burden and getting
bogged down by complexity.

-scott
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Darko Volaric
In reply to this post by Nico Williams
My point about JSON, etc is that there is no reason not to use that as a
query language if that makes it easier. If your system is efficient with
JSON, why not accept a query that is formatted as JSON? It's not
semantically different to SQL syntax. Here's an example (with a roughly
JSON notation):

{
  operation: "insert"
  table: "blah"
  columns: ["a", "b", "c"]
  values: [1.3, 2.0, 3.1]
  on-conflict: "replace"
}

That is equivalent to an INSERT SQL statement, but why form that SQL
string, possibly using memory and time, when your system can spit out JSON
(or whatever) effortlessly? Why are people who come from the websphere
learning SQL syntax? It has no magic, the magic is in what it means, which
anyone can understand (tables, columns, joins, search criteria). The syntax
is completely arbitrary, from the 70's or 80's and probably ultimately
inspired by COBOL. There is of course a lot existing information based
around SQL syntax, but most people want to insert some data and do fairly
straight forward queries on it. SQL is probably mostly confusing to them.

The feature I'm working on now, as a first step, basically feeds the parser
tokens so I don't have to generate a query string. Even that gives me a big
saving (mostly in memory), without changing the syntax or introducing
subtle bugs. The next step is "rationalize" the syntax progressively so
that the sequence of tokens I need to pass is closer to the representation
I use internally (in my code). This is the least impact approach I think.

You could insert yourself into any point in the SQL to bytecodes process.
If you wanted to fully support XML queries or whatever that could be a part
of the process, whereby appropriate function calls are generated (say into
your XML query library), or virtual table instance are created or the query
is transformed appropriately.



On Thu, Jun 4, 2015 at 12:05 PM, Nico Williams <[hidden email]>
wrote:

> On Thu, Jun 04, 2015 at 11:45:28AM -0700, Darko Volaric wrote:
> > Which sort of leads me to my next feature, which is bypassing the SQL
> > language. [...]
>
> I like SQL, but sure, if the compiler worked by first parsing into an
> AST, and if the AST were enough of an interface (versioned, though not
> necessarily backward-compatible between versions), then one could:
>
>  - write different front-end languages (though an AST isn't needed for
>    this: you can always generate SQL)
>
>  - write powerful macro languages
>
>  - write alternative/additional optimizers (and linters, syntax
>    highlighters, ...) that work at the AST level
>
> If the VDBE bytecode were also a versioned interface then one could
> write peep-hole optimizers as well.
>
> One might even want to generate IR code for LLVM, or use a JIT-er,
> though for SQL I don't think that would pay off.  I suspect that most of
> the CPU cycles go to data-intensive tasks such as I/O, cache thrashing,
> and encoding/decoding.  I'd be much more interested in SQLite4 being
> finished than an LLVM backend for SQLite3, and I'd be very interested in
> seeing if word-optimized variable-length encoding would have better
> performance than byte-optimized variable-length encoding.  The point
> though is that using an AST would make the system more modular.
>
> >    [...]. Why use that crusty old syntax when it's equally expressible in
> > JSON, XML or something else. Again I see it just as an API, [...]
>
> Now I'm confused.  JSON and XML are not query languages.  There exist
> query languages for them (e.g., XPath/XSLT for XML).
>
> I suppose you might have meant that SQL itself is a data representation
> language like JSON and XML are, and it is (data being expressed as
> INSERT .. VALUES ..; statements).
>
> Nico
> --
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Richard Hipp-3
On 6/4/15, Darko Volaric <[hidden email]> wrote:

> My point about JSON, etc is that there is no reason not to use that as a
> query language if that makes it easier. If your system is efficient with
> JSON, why not accept a query that is formatted as JSON? It's not
> semantically different to SQL syntax. Here's an example (with a roughly
> JSON notation):
>
> {
>   operation: "insert"
>   table: "blah"
>   columns: ["a", "b", "c"]
>   values: [1.3, 2.0, 3.1]
>   on-conflict: "replace"
> }
>
> That is equivalent to an INSERT SQL statement, but why form that SQL
> string, possibly using memory and time, when your system can spit out JSON
> (or whatever) effortlessly?

What is the JSON equivalent to the query shown below?  Can you
honestly say that the JSON equivalent (whatever it looks like) is
somehow easier to generate, read, parse, and/or debug than the SQL?

SELECT
  sp.name, st.bug_name,
  (SELECT cve_desc FROM nvd_data WHERE cve_name = st.bug_name),
  (SELECT debian_cve.bug FROM debian_cve
    WHERE debian_cve.bug_name = st.bug_name
    ORDER BY debian_cve.bug),
  sp.release,
  sp.subrelease,
  sp.version,
  (SELECT pn.fixed_version FROM package_notes AS pn
    WHERE pn.bug_name = st.bug_name
      AND pn.package = sp.name
      AND(pn.release = sp.release OR (pn.release = '' AND
fixed_version != ''))),
  st.vulnerable,
  st.urgency,
  (SELECT range_remote FROM nvd_data WHERE cve_name = st.bug_name),
  (SELECT comment FROM package_notes_nodsa AS nd
    WHERE nd.package = sp.name AND nd.release = sp.release
      AND nd.bug_name = st.bug_name) AS nodsa
FROM
   source_package_status AS st,
   source_packages AS sp, bugs
WHERE
   sp.rowid = st.package
   AND st.bug_name = bugs.name
   AND ( st.bug_name LIKE 'CVE-%' OR st.bug_name LIKE 'TEMP-%' )
   AND ( sp.release = 'sid' OR sp.release = 'stretch' OR sp.release = 'jessie'
          OR sp.release = 'wheezy' OR sp.release = 'squeeze' )
ORDER BY sp.name, st.bug_name, sp.release, sp.subrelease;

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Nico Williams
In reply to this post by Darko Volaric
On Thu, Jun 04, 2015 at 02:16:22PM -0700, Darko Volaric wrote:
> {
>   operation: "insert"
>   table: "blah"
>   columns: ["a", "b", "c"]
>   values: [1.3, 2.0, 3.1]
>   on-conflict: "replace"
> }

I do this all the time.  It's trivial enough to generate SQL from that
sort of thing.  If you have an AST then you can trivially map the
AST<->a JSON/XML/ASN.1/whatever schema.

But I don't think ease of alternative representation is the winning
argument for wanting the engine core to use an AST.  It's only
convenient.

The winning argument is that working with an AST makes some tasks easy
that are otherwise hard (e.g., common sub-expression elimination).

>                      [...]  Why are people who come from the websphere
> learning SQL syntax? [...]

Because it's standard.

> The feature I'm working on now, as a first step, basically feeds the parser
> tokens so I don't have to generate a query string. [...]

That seems rather basic, not really good enough.  It must save some
allocations.  But is it worth forking SQLite3 for this?!  Whatever you
do with a fork, it's got to be worth it.  Forking is quite hard, so
every change has got to be worth the effort.

Switching to an AST is going to require more allocations (and much more
developer effort), that's for sure.  But then, this is in statement
compilation, which should not be the most critical component.

Anyways, this is all very far afield from UDTs.  If you want to fork to
add UDTs, focus on that first.

Nico
--
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Simon Slavin-3
In reply to this post by Darko Volaric

On 4 Jun 2015, at 10:16pm, Darko Volaric <[hidden email]> wrote:

> Here's an example (with a roughly
> JSON notation):
>
> {
>  operation: "insert"
>  table: "blah"
>  columns: ["a", "b", "c"]
>  values: [1.3, 2.0, 3.1]
>  on-conflict: "replace"
> }
>
> That is equivalent to an INSERT SQL statement, but why form that SQL
> string, possibly using memory and time, when your system can spit out JSON
> (or whatever) effortlessly?

Why invent a new nonstandard notation for database operations when you have SQL ?

Given your JSON expression above it's easy to write code which turns the JSON into a SQL command.  So just do that (either outside SQLite or by creating a loadable external function for SQLite) and then you can use SQLite exactly as it is without having to keep modifying your project every time the developer releases a bug-fix.

The hard work in creating a fork is not in the initial work but in the maintenance every time the main project gets updated.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

Darko Volaric
In reply to this post by Richard Hipp-3
Are you seriously saying that that SQL syntax is friendly? How can you
defend SQL syntax other than on grounds of history or standardization? If
you're more comfortable and familiar with JSON the yes it is easier and you
can avoid an unnecessary conversion step.

If you're using JavaScript you'd send JS objects (ie binary equivalent of
JSON) to the parser rather than SQL since it's frictionless and binary to
binary. Doing that is safer and more efficient, not to mention the savings
in programming and bugs from the SQL generation, which is the biggest win
of all.

On Thu, Jun 4, 2015 at 2:26 PM, Richard Hipp <[hidden email]> wrote:

> On 6/4/15, Darko Volaric <[hidden email]> wrote:
> > My point about JSON, etc is that there is no reason not to use that as a
> > query language if that makes it easier. If your system is efficient with
> > JSON, why not accept a query that is formatted as JSON? It's not
> > semantically different to SQL syntax. Here's an example (with a roughly
> > JSON notation):
> >
> > {
> >   operation: "insert"
> >   table: "blah"
> >   columns: ["a", "b", "c"]
> >   values: [1.3, 2.0, 3.1]
> >   on-conflict: "replace"
> > }
> >
> > That is equivalent to an INSERT SQL statement, but why form that SQL
> > string, possibly using memory and time, when your system can spit out
> JSON
> > (or whatever) effortlessly?
>
> What is the JSON equivalent to the query shown below?  Can you
> honestly say that the JSON equivalent (whatever it looks like) is
> somehow easier to generate, read, parse, and/or debug than the SQL?
>
> SELECT
>   sp.name, st.bug_name,
>   (SELECT cve_desc FROM nvd_data WHERE cve_name = st.bug_name),
>   (SELECT debian_cve.bug FROM debian_cve
>     WHERE debian_cve.bug_name = st.bug_name
>     ORDER BY debian_cve.bug),
>   sp.release,
>   sp.subrelease,
>   sp.version,
>   (SELECT pn.fixed_version FROM package_notes AS pn
>     WHERE pn.bug_name = st.bug_name
>       AND pn.package = sp.name
>       AND(pn.release = sp.release OR (pn.release = '' AND
> fixed_version != ''))),
>   st.vulnerable,
>   st.urgency,
>   (SELECT range_remote FROM nvd_data WHERE cve_name = st.bug_name),
>   (SELECT comment FROM package_notes_nodsa AS nd
>     WHERE nd.package = sp.name AND nd.release = sp.release
>       AND nd.bug_name = st.bug_name) AS nodsa
> FROM
>    source_package_status AS st,
>    source_packages AS sp, bugs
> WHERE
>    sp.rowid = st.package
>    AND st.bug_name = bugs.name
>    AND ( st.bug_name LIKE 'CVE-%' OR st.bug_name LIKE 'TEMP-%' )
>    AND ( sp.release = 'sid' OR sp.release = 'stretch' OR sp.release =
> 'jessie'
>           OR sp.release = 'wheezy' OR sp.release = 'squeeze' )
> ORDER BY sp.name, st.bug_name, sp.release, sp.subrelease;
>
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: User-defined types

R Smith
In reply to this post by Darko Volaric


On 2015-06-04 11:16 PM, Darko Volaric wrote:

> My point about JSON, etc is that there is no reason not to use that as a
> query language if that makes it easier. If your system is efficient with
> JSON, why not accept a query that is formatted as JSON? It's not
> semantically different to SQL syntax. Here's an example (with a roughly
> JSON notation):
>
> {
>    operation: "insert"
>    table: "blah"
>    columns: ["a", "b", "c"]
>    values: [1.3, 2.0, 3.1]
>    on-conflict: "replace"
> }

It's an interesting idea and I for one am willing to entertain the
thought, but I'm having difficulty seeing the "simpler" and "easier"
things you claim, or the memory saving for that matter.

Just take the above JSON query and consider that in SQL that would
simply look like:

REPLACE INTO blah (a,b,c) VALUES (1.3, 2.0, 3.1);

If we have to open a pole on which version seems simpler or use less
memory, the result would probably be indecisive if not plainly favouring
the latter.

I am willing to learn though, for instance, how do you see this next
query represented in the JSON way?:

INSERT INTO blah (1,b,c) VALUES
(1.1, 2.2, 3.3),
(3.1, 3.2, 3.4),
(5.1, 4.2, 3.5),
(7.1, 5.2, 3.6);


Or maybe this one:

SELECT MAX(A.Code), MAX(A.Name), B.Age, MAX(B.LastEditedDate) AS LastDT
   FROM CodeNames AS A
   LEFT JOIN Codehist AS B ON A.Code = B.Code AND B.Age > 30
  WHERE A.Name LIKE 'SomeVal%'
  GROUP BY B.Age
  ORDER BY LastDT DESC
  LIMIT 50;

(I'll forgo the "Having" clause for simplicity).

I'm finding it difficult to imagine a better layout for that query in a
JSON (or any other Markup-based) document - but I am quite willing (and
even interested) to be shown a way that makes more sense and satisfies
the claims of simplicity and memory efficiency.

Once a layout is found that works, I imagine it would be a whole other
can of spaghetti to make any program author the syntax sensibly, but
that is a worthy bridge to cross once the first question is answered well.

Alternatively, you might be able to show how the other notation might
ease the query-planner's work, or how it might help any other SQL
process work better or faster. Some significant improvement in
functionality or efficiency will make a much stronger case than "It's
easier to compose".

Ryan


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
123