updating SQLite to implement The Third Manifesto

classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

updating SQLite to implement The Third Manifesto

Darren Duncan
All, and particularly Dr. Hipp,

Lately, my own database project in Perl, named Rosetta, has evolved
to officially be an implementation of Chris Date's and Hugh Darwen's
proposal for relational databases called "The Third Manifesto", which
is talked about at http://www.thethirdmanifesto.com/ and in various
books and papers of theirs.

Rosetta has its own API which defines a "D" relational database
language, as apposed to a SQL language, and Rosetta's back-ends to
implement this are interchangeable.  I am making a pure Perl
reference implementation (called Rosetta::Engine::Example) which is
coded simply for ease of understanding but that is slow.

Separately, I or third parties would be making alternate back-ends
that either are self-implemented and better performing, or that
constitute wrappers over existing database products, usually SQL
based, since those are fairly mature and plentiful.  SQLite is one of
the first such back-ends to be used.

Now, I would like to propose, and if necessary I will contribute
significant targeted funding (when I have the money) to pay Dr. Hipp
and/or other developers, some significant feature changes (as a fork
if necessary) to SQLite such that it directly implements and gains
the efficiencies of The Third Manifesto.  This includes both the
addition of and the removal of current features, and certain
behaviours would change.  Hopefully all for the better.

As a result of these changes, not only would SQLite better serve as a
back-end of Rosetta, but non-Rosetta users of SQLite would get the
most critical of the same benefits from it directly.

I anticipate that the changes would mainly affect the upper layers,
which convert user commands into virtual machine code, but that the
virtual machine and b-tree and OS layers would remain more or less
unchanged (this depends, of course, on a few details).  Possibly, we
would add a new command language.

I am hoping that, to keep longer term maintenance easier, these
changes can be implemented in the trunk and activated using either
run time pragmas or compile time options or both.  But if they would
require a fork, then the forked product would have to be named
something else that doesn't have 'SQL' in its name, since SQL does
not satisfy The Third Manifesto.  Maybe 'TTMLite' or something that
sounds better.

Here are some of the changes that I propose the pragma or compile
time option or fork would have; they all refer to what the user sees,
not to implementation details that should be hidden:

1.  Add a distinct logical BOOLEAN data type.  It is the data type of
output from logical expressions like comparisons, and the input to
'and', 'or', etc.

2.  Have strong and strict data typing for both variables and values.

2.1  Table columns are always declared to be of a specific type (eg:
BOOLEAN, INTEGER, REAL, TEXT, BLOB) and nothing but values of the
same type can be stored in them; attempting to do otherwise would
fail with an exception.

2.2  The plain equality test is supported for all data types.

2.3  All operators/functions have strongly typed parameters and
return values, and invoking them with arguments that aren't of the
right type will fail with an exception.  The equality test likewise
can only compare operands of the same type.

2.4  There is no implicit type conversion; data types must be
explicitly converted from one type to another.

2.5  INTEGER and REAL data types have separate sets of operators,
which do the expected thing with their types.  For example, each has
a separate division operator whose input and output are all of that
same type.  No worrying about when to round or not.

2.6  SQLite may already be this way, but:  All characters in a string
are significant, including whitespace, so 'a' and 'a ' are always
unequal.

3.  There is no such thing as a NULL.

3.1  All logic is 2VL (true, false) not 3VL (true, false, unknown).

3.2  Every variable of a particular type always contains a value that
is valid for that type, so logic for dealing with it is simpler.
Likewise, every with every literal value.

3.3  The code to implement operators is a lot simpler.

3.4  Missing data can be either represented with the data type's
empty value, or a table column that may possibly be unknown can be
split into a separate related table, that only has records when the
value is known.

3.5  All variables default to a reasonable valid value for their type
if not explicitly set, such as the number zero or the empty string.

4.  There is no significant hidden data.  A row id can only be an
explicitly declared table column.  The implementation of a table can
use hidden row ids, but the user wouldn't see them.

5.  No duplicate rows in tables or queries are allowed.

5.1  In SQL terms, every table has an implicit unique key constraint
over all of its columns.  This is ignored if there are any actual
explicit keys, whether primary or otherwise.  In TTM terms, it is
impossible by definition to have duplicate rows.

5.2  The results of all stages of queries do not contain duplicate
rows.  In SQL terms, every query or subquery has an implicit
'distinct' or 'group by all' clause on it.  No joins produce
duplicates.  No unions etc do either.

5.3  By doing this and #3, all queries that look like they should
return the same results actually do, whereas in SQL they may return
different results in the presence of duplicates or nulls.  Queries
can also be simpler.

6.  Columns in tables and views and query results have no ordinal
value; they all have names and are referred to using only those
names.  Moreover, every column must have a different name from every
other column.

7.  Rows in tables and views and query results have no ordinal value;
they are referenced by relational expressions that match on the
values of columns, like in a SQL where-clause.

7.1  An order-by or limit clause only makes sense in an outer-most
query, right when results are being returned from the database to the
application, where it then specifies the order to return otherwise
order-less rows.

In doing all of the above, SQLite should actually be simpler to
implement, and it will be easier to use, with more predictable
results and fewer bugs.

This next one can be implemented separately from all the other suggestions:

8.  Add some standard relational logic operators that can be combined
and nested to get all the power of selects and more, with less
effort, such as any of the following you don't already have:
restrict, project, join, product, union, intersection, difference,
divide, rename.

8.1  The simplest join syntax, such as an unqualified comma-delimited
list, would perform a natural join by default.  Or we could more or
less just have natural joins (and cartesian products, 'product') as
the only kind of join.

8.2  Using these instead of 'select' should allow for easier
implementation and optimization; for one thing, the expressions are
more associative or commutative.

This next one can be implemented separately from all the other suggestions:

9.  Support nested/child transactions, such as a 'begin transaction'
inside another one, which can make things a lot easier for
applications; they have to worry less about whether a transaction
already exists before starting another one.  These are functionally
sort of like save-points in SQL, in that even if an inner transaction
commits, it is still thrown away if the outer transaction rolls back.
To implement this best, you would probably need multiple (cascading?)
journal files, one per transaction level.

Following are also features of The Third Manifesto, but can possibly
be left out of SQLite in accordance with its Lite nature:

1.  All views are updateable like they were tables.  From the user's
point of view, tables and views are the same sort of thing in how
they can be used.

2.  Tables can be assigned to directly like they were variables, and
insert/update/delete is actually a short-hand for this.  Eg, an
insert is equivalent to an assignment to a table of the table's old
value unioned with the rows being inserted.  Supporting this allows
users to define arbitrarily flexible updating operations, such as
"replace or add" and such.

3.  The system catalog tables can be updated directly using data
definition language, which results in the schema being updated.  Eg,
you can use insert statements to create a table rather than a create
statement.

4.  Support definition and use of custom data types.

5.  It should not be necessary to explicitly declare indexes to help
with speed.

6.  Generally speaking, users should not have to know about
implementation details, but rather just express what their data
actually means.

Okay, that's about all for this initial proposal email.

Ultimately, I believe that the core of my proposal involves
simplifying SQLite, making it leaner and meaner, and also reduces
possible or actual bugs or difficulty in understanding.

At the very least, I hope that the trunk would have the pragma or
compile option that essentially strips out current features like
nulls and other ambiguity, so essentially we have a restricted or
simplified SQL.

I also bring this up because I would expect that SQLite should be
able to perform faster when it doesn't handle nulls or duplicates or
weak data types than if it does.  The conceptual logic is simpler
when we don't have those, and the implementation code should also be
simpler, and perform faster, since there are fewer possibilities to
check at logical decision points.  And it should be easier to
optimize queries.

So even if no incompatible changes are made, I would hope that it is
possible to optimize for the simplest case.

-- Darren Duncan
Reply | Threaded
Open this post in threaded view
|

Re: updating SQLite to implement The Third Manifesto

Roger Binns
> In doing all of the above, SQLite should actually be simpler to
> implement, and it will be easier to use, with more predictable
> results and fewer bugs.

Only for some applications.  It would be harder to use for
my apps.  Specifically manifest typing as currently implemented
in SQLite is a perfect match for apps writen in Python (which
also uses manifest typing).  It would require a lot more code
to go through and force the data type for each column.

Roger
Reply | Threaded
Open this post in threaded view
|

Re: updating SQLite to implement The Third Manifesto

Darren Duncan
At 6:52 PM -0800 3/10/06, Roger Binns wrote:
>Only for some applications.  It would be harder to use for
>my apps.  Specifically manifest typing as currently implemented
>in SQLite is a perfect match for apps writen in Python (which
>also uses manifest typing).  It would require a lot more code
>to go through and force the data type for each column.

If that is so, then I would argue that any need to write more code
isn't tied to manifest typed programming languages themselves, but
specific programs themselves; depending on how you code your
applications, it wouldn't require any significant amount more code.
In fact, particularly for queries (and reading data tends to be more
common than writing), there should be less code.  Or, looking at this
another way, perhaps the Python bindings for SQLite should be taking
care of this for you.  Or, put another way, I would say this is a
small price to pay for what is gained.  Or, I doubt there actually is
more work.

(But I don't really want to get into an argument on this point, as
there are many other points in my proposal which I see as being of
greater importance.)

But regardless, I have an additional idea which may help bridge the
gap and work well for people.

That is, while the database itself is strongly typed, you can have a
specific type which is defined to manifestly be able to store values
from any of a variety of simpler types.

So for example, SQLite could have these types:

- Boolean
- Integer
- Real
- Text
- Blob
- Scalar

The first 5 are simple types that store just numbers or text or whatever.

The last, new 1, Scalar, is an actually-strong type which is defined
as being able to store any of the first 5 types (just one at a time),
and hence acts like a weak type.

In a conceptual sense, a Scalar value is like a composite type with 6
member elements, each of the last 5 being strongly typed as one of
the first 5 simple types, and the first element being an enum which
says which of the other 5 holds the over-all current value.

I believe something like this is what manifestly typed languages
actually do behind the scenes, having a multi-element struct where
one element says how to treat the other one(s).  I know Perl does
this, with its SV C-struct, and I'm sure other languages do similar.
I know SQLite does something similar, if you look at its design spec.

(Sure, that sounds more complicated, but then the actual work being
done to support manifest typing *is* more complicated.  Things look
more like they are.)

So if SQLite does it this way, then you can declare columns to be the
Scalar type when you want them to hold anything, and one of the other
types if you don't.  Moreover, the effectively manifest typed Scalar
is what you would get if you don't explicitly declare a type for a
column.  This happens already, but now the "what you get" actually
has a name.

The point is that you still get well defined behaviour that is
specific to the declared data type(s) you choose to use, and you can
count on its being consistent.

-- Darren Duncan
Reply | Threaded
Open this post in threaded view
|

Re: updating SQLite to implement The Third Manifesto

Roger Binns
Just for the record:

> it wouldn't require any significant amount more code.

Yes it would.  My code currently approximates to this:

  cursor.execute("insert into foo (x,y,z) values(?,?,?)", x,y,z)

It would have to change into this:

  # column x is defined as string
  if isinstance(x, string): storex=x
  elif isinstance(x, int): storex=`x`
  elif isinstance(x, bool):
        if x: storex="1" else: storex="0"
  else # various other types and conditions for this context
  # repeat for y and z
  ....
  # add in values
  cursor.execute("insert into foo (x,y,z) values(?,?,?)", storex, storey, storez)
       
> In fact, particularly for queries (and reading data tends to be more
> common than writing), there should be less code.  

  Currently:

  x,y,z=cursor.execute("select x,y,z from foo")

  Change to:
 
  resultx, resulty, resultz=cursor.execute("select x,y,z from foo")
  # now do type conversion back to the correct types
  if # somehow know it may have been an int:
     x=int(resultx)
  elif # somehow know it may have been a bool:
     x=bool(int(resultx))
  ... you get the picture ...

Not having manifest types in the database throws away information
when you store values and requires restituting them when reading.
My main app happens to store phone numbers.  You won't believe
how irritating it is when I find things automatically assume they
are integers.

> Or, looking at this
> another way, perhaps the Python bindings for SQLite should be taking
> care of this for you.  

They can't, unless they do something like silenty add an extra
column that stores the types of the values in the other columns
and attempt to transparently modify the SQL as it flys by
to get or update that column.  (BTW I also happen to be an author
of wrappers for Python).  (Your proposal sort of does this
by introducing a manifest type.)

The solution you discussed will make code that has to deal with
random SQLite databases produced by other applications, or earlier
versions of the same app significantly more complicated.  However
I don't think there are too many of them to be concerned about.

I would suggest finding an open source application that uses
SQLite and see if you would indeed make it simpler.  One good
example I would suggest is Trac which was originally written
to use SQLite.

Roger
Reply | Threaded
Open this post in threaded view
|

RE: updating SQLite to implement The Third Manifesto

Marian Olteanu
In reply to this post by Darren Duncan
You're right Darren, but the problem is that we're not in a DB class. We
cannot tell people who have a solution for their problems that "your
solution is wrong. You need to reimplement your stuff to make it right".
Most of SQLite users are practical people, and all they want is their
problem to be solved. They don't really care if the SQL language is
implementing correctly relational algebra or not.


-----Original Message-----
From: Darren Duncan [mailto:[hidden email]]
Sent: Friday, March 10, 2006 11:21 PM
To: [hidden email]
Cc: [hidden email]
Subject: Re: [sqlite] updating SQLite to implement The Third Manifesto

At 6:52 PM -0800 3/10/06, Roger Binns wrote:
>Only for some applications.  It would be harder to use for
>my apps.  Specifically manifest typing as currently implemented
>in SQLite is a perfect match for apps writen in Python (which
>also uses manifest typing).  It would require a lot more code
>to go through and force the data type for each column.

If that is so, then I would argue that any need to write more code
isn't tied to manifest typed programming languages themselves, but
specific programs themselves; depending on how you code your
applications, it wouldn't require any significant amount more code.
In fact, particularly for queries (and reading data tends to be more
common than writing), there should be less code.  Or, looking at this
another way, perhaps the Python bindings for SQLite should be taking
care of this for you.  Or, put another way, I would say this is a
small price to pay for what is gained.  Or, I doubt there actually is
more work.

(But I don't really want to get into an argument on this point, as
there are many other points in my proposal which I see as being of
greater importance.)

But regardless, I have an additional idea which may help bridge the
gap and work well for people.

That is, while the database itself is strongly typed, you can have a
specific type which is defined to manifestly be able to store values
from any of a variety of simpler types.

So for example, SQLite could have these types:

- Boolean
- Integer
- Real
- Text
- Blob
- Scalar

The first 5 are simple types that store just numbers or text or whatever.

The last, new 1, Scalar, is an actually-strong type which is defined
as being able to store any of the first 5 types (just one at a time),
and hence acts like a weak type.

In a conceptual sense, a Scalar value is like a composite type with 6
member elements, each of the last 5 being strongly typed as one of
the first 5 simple types, and the first element being an enum which
says which of the other 5 holds the over-all current value.

I believe something like this is what manifestly typed languages
actually do behind the scenes, having a multi-element struct where
one element says how to treat the other one(s).  I know Perl does
this, with its SV C-struct, and I'm sure other languages do similar.
I know SQLite does something similar, if you look at its design spec.

(Sure, that sounds more complicated, but then the actual work being
done to support manifest typing *is* more complicated.  Things look
more like they are.)

So if SQLite does it this way, then you can declare columns to be the
Scalar type when you want them to hold anything, and one of the other
types if you don't.  Moreover, the effectively manifest typed Scalar
is what you would get if you don't explicitly declare a type for a
column.  This happens already, but now the "what you get" actually
has a name.

The point is that you still get well defined behaviour that is
specific to the declared data type(s) you choose to use, and you can
count on its being consistent.

-- Darren Duncan

Reply | Threaded
Open this post in threaded view
|

Re: updating SQLite to implement The Third Manifesto

Andrew Piskorski
In reply to this post by Darren Duncan
On Fri, Mar 10, 2006 at 04:37:36PM -0800, Darren Duncan wrote:

> 3.  There is no such thing as a NULL.
>
> 3.1  All logic is 2VL (true, false) not 3VL (true, false, unknown).

There is no such thing as null, really?  So, when you do an outer join
between two tables, which in SQL would produce null columns in the
result set, what do YOU propose producing instead of those nulls?

Perhaps I missed it, but in my brief reading of some of Date's work, I
never saw him answer that question.

> 3.4  Missing data can be either represented with the data type's
> empty value, or a table column that may possibly be unknown can be
> split into a separate related table, that only has records when the
> value is known.
>
> 3.5  All variables default to a reasonable valid value for their type
> if not explicitly set, such as the number zero or the empty string.

--
Andrew Piskorski <[hidden email]>
http://www.piskorski.com/
Reply | Threaded
Open this post in threaded view
|

Re: updating SQLite to implement The Third Manifesto

Xavier Noria
On Mar 11, 2006, at 7:58, Andrew Piskorski wrote:

> On Fri, Mar 10, 2006 at 04:37:36PM -0800, Darren Duncan wrote:
>
>> 3.  There is no such thing as a NULL.
>>
>> 3.1  All logic is 2VL (true, false) not 3VL (true, false, unknown).
>
> There is no such thing as null, really?  So, when you do an outer join
> between two tables, which in SQL would produce null columns in the
> result set, what do YOU propose producing instead of those nulls?
>
> Perhaps I missed it, but in my brief reading of some of Date's work, I
> never saw him answer that question.

I never understood that restriction. I read in the books: "since we  
have defined things this ways from a formal point of view there's no  
room for NULL". And my question is well, why don't you change the  
definitions to augment the datatype sets with a special constant NULL  
which is by definition not present in any datatype? Wouldn't that  
give an analogous theory more aligned with real world?

The formalism in the relational model looks so-so to me (with due  
respect), starting from the fact that "tuples" are _sets_ instead of  
elements of a Cartesian Product, which by the way is what relations  
are defined from in Set Theory, they are subsets of Cartesian  
Products, not some kind of ad-hoc object. Sounds souspicious the  
claimed math foundation with so fundamental deviations from basic,  
standard math conventions. More than math foundation it is in my  
view, I don't know, just some stuff presented formally and with some  
degree of rigour.

-- fxn

Reply | Threaded
Open this post in threaded view
|

Re: updating SQLite to implement The Third Manifesto

Thomas Chust
In reply to this post by Darren Duncan
Hello,

as the author of yet another SQLite3 binding for a manifestly typed
language (CHICKEN Scheme) I can only second the concerns expressed in
other answers to the original posting. I would expect the interface to get
far more complicated with strict typing in SQLite3. In fact, that SQLite3
is manifestly typed was one of the main reasons for choosing it as a
database backend in my Scheme applications in the first place.

I also do not understand why the absence of NULL should be helpful for the
database user. It may be helpful for the efficient implementation of the
database, but SQLite3 already does a pretty good job there.

I suppose that the definition of custom datatypes would add a large amount
of complexity to the database. And in the rare cases that I really need
something like a seperate type, I can usually use a BLOB and a few custom
functions.

What I do like about the proposal is the addition of a boolean datatype,
though. That would make the interface more convenient for some of the
scripting languages that have such a type.

cu,
Thomas
Reply | Threaded
Open this post in threaded view
|

Re: updating SQLite to implement The Third Manifesto

Thomas Chust
On Sat, 11 Mar 2006, ThomasChust wrote:

> [...]
> as the author of yet another SQLite3 binding for a manifestly typed language
> (CHICKEN Scheme) I can only second the concerns expressed in other answers to
> the original posting. I would expect the interface to get far more
> complicated with strict typing in SQLite3. In fact, that SQLite3 is
> manifestly typed was one of the main reasons for choosing it as a database
> backend in my Scheme applications in the first place.
> [...]

Hello,

maybe I sounded a little harsh in my last posting.

I didn't want to say the whole proposal is bad in itself. It certainly
sounds interesting and for some applications it will be just the right
thing to do. But I do think it would be a bad idea to incorporate all
those changes into the mainstream SQLite3 distribution and I don't know
whether the target audience of the proposal and of SQLite3 is identical.

cu,
Thomas
Reply | Threaded
Open this post in threaded view
|

Re: updating SQLite to implement The Third Manifesto

developir@yahoo.com
In reply to this post by Darren Duncan
--- Darren Duncan <[hidden email]> wrote:
> I anticipate that the changes would mainly affect the upper layers,
> which convert user commands into virtual machine code, but that the
> virtual machine and b-tree and OS layers would remain more or less
> unchanged (this depends, of course, on a few details).  Possibly, we
> would add a new command language.

Although your projects' new features seem similiar to the way SQLite
currently operates, my guess is that perhaps you could reuse at most
20% of the code (btree and OS layers only) for your proposed
code fork. It is not a minor undertaking, but a complete rewrite.
You would spend more time re-examining each line of code's SQL-specific
assumptions than you would by simply starting over using the SQLite
code-base as a reference.

> 5.  It should not be necessary to explicitly declare indexes to help
> with speed.

How do you plan to do this?
Implicitly index every column in every table at the expense of
insert time and huge database size?


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com 
Reply | Threaded
Open this post in threaded view
|

Linux

Roger-31
Hello All!

I have just completed installing PHP5 on my Linux box, but whenever i
try to restart my Apache WebServer it gives me the following error:

httpd: Syntax error on line 55 of /usr/local/apache2/conf/httpd.conf:
Cannot load /usr/local/apache2/modules/libphp5.so into
server: /usr/local/apache2/modules/libphp5.so: undefined symbol:
sqlite3SelectDelete:

I was just wondering if anyone experienced the same problem, i dont know
where the sqlite3 part came into play

On Sat, 2006-03-11 at 08:10 -0800, Joe Wilson wrote:

> --- Darren Duncan <[hidden email]> wrote:
> > I anticipate that the changes would mainly affect the upper layers,
> > which convert user commands into virtual machine code, but that the
> > virtual machine and b-tree and OS layers would remain more or less
> > unchanged (this depends, of course, on a few details).  Possibly, we
> > would add a new command language.
>
> Although your projects' new features seem similiar to the way SQLite
> currently operates, my guess is that perhaps you could reuse at most
> 20% of the code (btree and OS layers only) for your proposed
> code fork. It is not a minor undertaking, but a complete rewrite.
> You would spend more time re-examining each line of code's SQL-specific
> assumptions than you would by simply starting over using the SQLite
> code-base as a reference.
>
> > 5.  It should not be necessary to explicitly declare indexes to help
> > with speed.
>
> How do you plan to do this?
> Implicitly index every column in every table at the expense of
> insert time and huge database size?
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com 

Reply | Threaded
Open this post in threaded view
|

Re: updating SQLite to implement The Third Manifesto

Darren Duncan
In reply to this post by Roger Binns
Note: This email is a combined reply to several other emails in the
same thread.

At 12:32 AM -0600 3/11/06, Marian Olteanu wrote:
>You're right Darren, but the problem is that we're not in a DB class. We
>cannot tell people who have a solution for their problems that "your
>solution is wrong. You need to reimplement your stuff to make it right".
>Most of SQLite users are practical people, and all they want is their
>problem to be solved. They don't really care if the SQL language is
>implementing correctly relational algebra or not.

You'll notice that I never advocated replacing functionality in
SQLite.  People who already use it the way it is can continue to do
so.  What I proposed was options that people can choose to use that
make it behave a different way that works better for them.  So while
I think one way is better, I'm not proposing that the functionality
to support being able to do it another way is removed.

You have to take all the parts of my proposal in that context, that
they are options which I wanted SQLite to improve support for, not
the only way to do it.

As a furtherance to that proposal ...

I propose that to both get the main desired effects of reliability
and to remain backwards compatible with other SQLite database
installations, that much of the altered behaviour be associated with
individual database files themselves (from creation) so that their
data is treated correctly wherever they go.

I propose that this be similar to how SQLite deals with implementing
endian-ness and text encodings now.  That is, at the time when a
SQLite database is first created, it is declared to have its bytes in
a specific order (hi-lo or lo-hi) and that its text is a specific
encoding (utf8, utf16le, utf16be), and those attributes stick with it
for the life of that database file.

Similarly, I propose that it is at the time of a database file's
creation that one can declare they want TTM prescribed behaviour
enforced for that database file, such as nulls and duplicates being
forbidden.

When a SQLite install encounters a database file set this way, it
honors the TTM behaviour; when it encounters a database file without
that declaration, it follows the more traditional SQL prescribed
behaviour.

Of course, database files created for TTM behaviour will probably not
work with older SQLite versions, and only newer ones.  If my proposal
is adopted and implemented within the trunk, it would probably come
out as a major version change.  Or it doesn't have to; whatever is
appropriate.

At 1:58 AM -0500 3/11/06, Andrew Piskorski wrote:
>There is no such thing as null, really?  So, when you do an outer join
>between two tables, which in SQL would produce null columns in the
>result set, what do YOU propose producing instead of those nulls?
>
>Perhaps I missed it, but in my brief reading of some of Date's work, I
>never saw him answer that question.

There are several answers to this.

The main one is to consider why you are using an outer join, and how
the data is going to be used, such that you can prescribe (eg, in the
query requesting the outer join) appropriate default values for the
otherwise null fields in the result, so that your application can
then handle all values in each column as a value appropriate for the
type.

For example, given these 2 tables:

a: |foo INT|bar INT|
    -----------------
    |     1 |    17 |
    |     2 |     6 |

b: |foo INT|baz INT|
    -----------------
    |     1 |   101 |

You could outer-join b to a like this:

   SELECT *
   FROM a NATURAL LEFT OUTER JOIN b
     DEFAULT baz = 0;

And get:

   |foo INT|bar INT|baz INT|
   -------------------------
   |     1 |    17 |   101 |
   |     2 |     6 |     0 |

So there are no nulls here.

Alternately, situation depending, perhaps what you actually want to
do isn't best served by an outer join, but with some other kind of
query.

Alternately, see the next comment.

At 11:50 AM +0100 3/11/06, Xavier Noria wrote:

>>There is no such thing as null, really?  So, when you do an outer join
>>between two tables, which in SQL would produce null columns in the
>>result set, what do YOU propose producing instead of those nulls?
>
>I never understood that restriction. I read in the books: "since we
>have defined things this ways from a formal point of view there's no
>room for NULL". And my question is well, why don't you change the
>definitions to augment the datatype sets with a special constant
>NULL which is by definition not present in any datatype? Wouldn't
>that give an analogous theory more aligned with real world?

If you want to have a data type which can represent only a single
value and use it to mean unknown, and all instances of that value are
equal, then that would be fine.

The main problem with NULL is more how it is used in SQL than the idea itself.

For one thing, SQL's NULL violates the logical principle that after
you say "set foo to bar, then foo equals bar".  With every normal
data type, if "foo := 1; bar := foo;" then a subsequent comparison of
"foo = bar" would return true.  But with nulls, if you say "foo :=
NULL; bar:= foo", then a subsequent comparison of "foo = bar" does
not return true.

More simply, with nulls, saying "foo = foo" will not return true,
which flies in the face of common sense.

All sorts of other problems in SQL result from that basic situation,
that no NULL value ever equals itself.

But its worse than that, in that SQL isn't even consistent with
itself in how it treats nulls.  With some kinds of operations or
queries, it treats every null being unique, and in other situations
it treats them all as being equal.  No normal data type has this
problem.

So you have to write much more complicated SQL and application code
to handle data which may be null to get the results that you want.

At 10:28 PM -0800 3/10/06, Roger Binns wrote:
>My main app happens to store phone numbers.  You won't believe
>how irritating it is when I find things automatically assume they
>are integers.

The problem you describe only happens when you *are* using manifest
types, since code that you haven't written is looking at the content
of your variable and guessing incorrectly how to treat it based on
what its content looks like.  By contrast, if you explicitly declare
that your phone numbers are text (or a custom data type), for
example, then the database will never treat it like an integer.  In
this respect at least, you made my point for me about strong types
reducing errors.

>Just for the record:
>
>>it wouldn't require any significant amount more code.
>
>Yes it would.  My code currently approximates to this:
>
>  cursor.execute("insert into foo (x,y,z) values(?,?,?)", x,y,z)
>
>It would have to change into this:
>
>  # column x is defined as string
>  if isinstance(x, string): storex=x
>  elif isinstance(x, int): storex=`x`
>  elif isinstance(x, bool):        if x: storex="1" else: storex="0"
>  else # various other types and conditions for this context
>  # repeat for y and z
>  ....
>  # add in values
>  cursor.execute("insert into foo (x,y,z) values(?,?,?)", storex,
>storey, storez)

It's clear from your example that you actually want to store multiple
distinct types of data in the same table columns.  In this case,
under my proposal, you would declare that column to either be of the
Scalar type or don't specify a type at all.  Then your code remains
as it was.

My first point is that for people who actually want a column that
stores just text or just numbers etc, they declare columns as those
types explicitly, and therefore data of those types is all which will
be stored.

Moreover, such people using a manifestly typed programming language
would already be working under the assumption that, while their app
variables are capable of storing multiple data types, they think that
they are only storing the one type they want.  Eg, a count variable
would not be assigned 'abc' in their program, or if it was, that
would be an error.  Since they assume that the correct type of data
is in their variables, they can also just store it in the stricter
database type without any conditionals, using one line as before.

>Not having manifest types in the database throws away information
>when you store values and requires restituting them when reading.

I don't propose throwing away manifest types, but rather that people
can choose between manifest or non-manifest types as it suits
themselves.  SQLite 3 sort of does that already with its column
afinity, but my proposal would make the distinction more formal or
easier to optimize.

>>Or, looking at this another way, perhaps the Python bindings for
>>SQLite should be taking care of this for you.
>
>They can't, unless they do something like silenty add an extra
>column that stores the types of the values in the other columns
>and attempt to transparently modify the SQL as it flys by to get or
>update that column.  (BTW I also happen to be an author
>of wrappers for Python).  (Your proposal sort of does this
>by introducing a manifest type.)

SQLite and Python both already do this behind the scenes to implement
their manifest typing.  Computers only know numbers, with everything
else being an abstraction; some extra numbers are stored that tell it
how to interpret the other numbers.

But perhaps we're thinking of slightly different things.

>I would suggest finding an open source application that uses
>SQLite and see if you would indeed make it simpler.  One good
>example I would suggest is Trac which was originally written
>to use SQLite.

I'll look into this and get back to you some time.  Though I have
other usage scenarios that I would be addressing first.

-- Darren Duncan
Reply | Threaded
Open this post in threaded view
|

Re: updating SQLite to implement The Third Manifesto

K. Haley
Darren Duncan wrote:

>> I never understood that restriction. I read in the books: "since we
>> have defined things this ways from a formal point of view there's no
>> room for NULL". And my question is well, why don't you change the
>> definitions to augment the datatype sets with a special constant NULL
>> which is by definition not present in any datatype? Wouldn't that
>> give an analogous theory more aligned with real world?
>
> If you want to have a data type which can represent only a single
> value and use it to mean unknown, and all instances of that value are
> equal, then that would be fine.
I think you've missed the concept of NULL.  You seem to be thinking of
the NULL pointer as used in most programming languages.  They use a
constant to indicate an empty pointer and call it NULL.  A NULL is NOT a
specific value, it's the absence of value or information.  Since you
don't know what a particular NULL is, it can be anything.

>
> The main problem with NULL is more how it is used in SQL than the idea
> itself.
>
> For one thing, SQL's NULL violates the logical principle that after
> you say "set foo to bar, then foo equals bar".  With every normal data
> type, if "foo := 1; bar := foo;" then a subsequent comparison of "foo
> = bar" would return true.  But with nulls, if you say "foo := NULL;
> bar:= foo", then a subsequent comparison of "foo = bar" does not
> return true.
>
> More simply, with nulls, saying "foo = foo" will not return true,
> which flies in the face of common sense.
>
> All sorts of other problems in SQL result from that basic situation,
> that no NULL value ever equals itself.
Given what I've said about NULL being unknown, it follows that
NULL!=NULL which is why each NULL is unique in a unique index and you
can only test for it with isnull.  The only problems I've had with NULL
have stemmed from my programmers view of NULL.
>
> But its worse than that, in that SQL isn't even consistent with itself
> in how it treats nulls.  With some kinds of operations or queries, it
> treats every null being unique, and in other situations it treats them
> all as being equal.  No normal data type has this problem.
What situations treat them all as being equal?  Off hand I can't think
of any, but then my SQL experience isn't that broad.



signature.asc (258 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: updating SQLite to implement The Third Manifesto

developir@yahoo.com
You can model NULL with the creation of a few more tables
listing the rows in which information is absent. But why should
you create more work for yourself? People find SQL to be useful in
its present form - NULLs and all. It is a fairly standard and well
known thing.

If some people wish to pursue writing The Third Manifesto or some
other SQL-derivative work, they should consider making a new
SourceForge project of it (or whatever). Debating its merits here
is offtopic to SQLite.

> I think you've missed the concept of NULL.  You seem to be thinking of
> the NULL pointer as used in most programming languages.  They use a
> constant to indicate an empty pointer and call it NULL.  A NULL is NOT a
> specific value, it's the absence of value or information.  Since you
> don't know what a particular NULL is, it can be anything.


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com 
Reply | Threaded
Open this post in threaded view
|

RE: updating SQLite to implement The Third Manifesto

Fred Williams
Sounds like a great idea to me.

I for one like SQLite just the way it is, warts and all.  It serves my
purposes for a small, fast, efficient, flexible, and etc. Database.  If
it starts "bloating up" to meet such esoteric purposes as the "third
manifesto",  I will once again have to restart my search for a small,
light, and etc. database to replace SQLite.  I was strictly looking for
those attributes when I found SQLite.

I am not against continued development and enhancement when those
endeavors serve to enhance the small, light, fast, and etc. virtues of
SQLite.  Things like improved "ALTER" and other missing SQL standards
are the things I'm looking forward to.

I just looked.  Can't find the book.  I do remember reading Date many
years ago.  I kept nodding off, so don't' know if ever read the third
manifesto or not.  Didn't seem to ruin my career in regards to
application of relational database technology.

Fred

> -----Original Message-----
> From: Joe Wilson [mailto:[hidden email]]
> Sent: Sunday, March 12, 2006 11:03 AM
> To: [hidden email]
> Subject: Re: [sqlite] updating SQLite to implement The Third Manifesto
>
>
> You can model NULL with the creation of a few more tables
> listing the rows in which information is absent. But why should
> you create more work for yourself? People find SQL to be useful in
> its present form - NULLs and all. It is a fairly standard and well
> known thing.
>
> If some people wish to pursue writing The Third Manifesto or some
> other SQL-derivative work, they should consider making a new
> SourceForge project of it (or whatever). Debating its merits here
> is offtopic to SQLite.
>
> > I think you've missed the concept of NULL.  You seem to be
> thinking of
> > the NULL pointer as used in most programming languages.  They use a
> > constant to indicate an empty pointer and call it NULL.  A
> NULL is NOT a
> > specific value, it's the absence of value or information.  Since you
> > don't know what a particular NULL is, it can be anything.
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com

Reply | Threaded
Open this post in threaded view
|

Re: updating SQLite to implement The Third Manifesto

D. Richard Hipp
In reply to this post by K. Haley
"K. Haley" <[hidden email]> wrote:
> >
> > But its worse than that, in that SQL isn't even consistent with itself
> > in how it treats nulls.  With some kinds of operations or queries, it
> > treats every null being unique, and in other situations it treats them
> > all as being equal.  No normal data type has this problem.
> What situations treat them all as being equal?  Off hand I can't think
> of any, but then my SQL experience isn't that broad.

Every SQL engine I have tested treats NULLs as equal
in the following contexts:

   * ORDER BY
   * SELECT DISTINCT ...
   * SELECT ... UNION SELECT ....

--
D. Richard Hipp   <[hidden email]>

Reply | Threaded
Open this post in threaded view
|

Re: updating SQLite to implement The Third Manifesto

D. Richard Hipp
In reply to this post by developir@yahoo.com
Joe Wilson <[hidden email]> wrote:
>
> If some people wish to pursue writing The Third Manifesto or some
> other SQL-derivative work, they should consider making a new
> SourceForge project of it (or whatever). Debating its merits here
> is offtopic to SQLite.
>

Let me amplify this by pointing out that I have zero interest
in taking SQLite in the direction of The Third Manifesto.  Those
who want to do so are welcomed - encouraged even - to fork the
tree and go off in their own direction.  Just do not expect me
to follow along, please.
--
D. Richard Hipp   <[hidden email]>

Reply | Threaded
Open this post in threaded view
|

Re: updating SQLite to implement The Third Manifesto

Darren Duncan
At 7:05 PM -0500 3/12/06, [hidden email] wrote:
>Let me amplify this by pointing out that I have zero interest
>in taking SQLite in the direction of The Third Manifesto.  Those
>who want to do so are welcomed - encouraged even - to fork the
>tree and go off in their own direction.  Just do not expect me
>to follow along, please.

Richard, let me ask you this, though, to clarify your position.

Would you be willing to build a distinctly new, and different-named
product, reusing what you can of the SQLite code base (that you know
so well) in the process, mainly the virtual machine and b-tree etc,
which implements a set of specifications I provide, if you are paid
your standard custom work rates for the time spent?

I just wanted to know whether the whole idea I was raising was
distasteful to you period, or whether you simply didn't want to make
SQLite itself work that way.

I should also clarify my own position, that I *have* started my own
project to implement the ideas I had.  It isn't on Sourceforge, but
it is hosted on other public repositories and is released via CPAN.

The reason I am raising these issues in the SQLite community is that
I like SQLite and I think it provides a good and mature point of
departure for implementing a fast and efficient performing version of
what I am otherwise making on my own.

But moreover, that I was preferring to pay someone else who was a lot
stronger in C than I am to do this work, rather than doing it myself.

All this said, I think I will shelve this discussion for now.  Then,
when I actually have a working implementation of my proposal, I will
draw your attention to it later to see if anyone is interested in
seeing how things actually work out in practice; it may not be the
same as you preconceive.

-- Darren Duncan
Reply | Threaded
Open this post in threaded view
|

Re: updating SQLite to implement The Third Manifesto

D. Richard Hipp
Darren Duncan <[hidden email]> wrote:

> At 7:05 PM -0500 3/12/06, [hidden email] wrote:
> >Let me amplify this by pointing out that I have zero interest
> >in taking SQLite in the direction of The Third Manifesto.  Those
> >who want to do so are welcomed - encouraged even - to fork the
> >tree and go off in their own direction.  Just do not expect me
> >to follow along, please.
>
> Richard, let me ask you this, though, to clarify your position.
>
> Would you be willing to build a distinctly new, and different-named
> product, reusing what you can of the SQLite code base (that you know
> so well) in the process, mainly the virtual machine and b-tree etc,
> which implements a set of specifications I provide, if you are paid
> your standard custom work rates for the time spent?
>

Certainly.  But please know that my standard custom work rate is
pretty high.  The work you are talking about can easily come it
at 6 figures with a 6-month lead time.


--
D. Richard Hipp   <[hidden email]>