Duplicate records

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Duplicate records

Adam-76
Hi All,

I'm creating a database which will look a little like:
create table main (f1 TEXT, f2 TEXT, f3 TEXT, t1 TEXT, t2 TEXT, tn TEXT);

f1, f2 and f3 will always be present. Any or all of the ts might be NULL.

I'd like to prevent entries where all of the fields are the same being
created. How can I do this. I've tried experimenting with UNIQUE KEYs but
haven't been successful - perhaps because of the NULLs?

If this isn't possible my second best option would be a command which
removes existing duplicates.

Thanks a lot for any pointers,
Adam

--
Adam Richardson
Carpe Diem
Reply | Threaded
Open this post in threaded view
|

Re: Duplicate records

Jay Sprenkle
> Hi All,
>
> I'm creating a database which will look a little like:
> create table main (f1 TEXT, f2 TEXT, f3 TEXT, t1 TEXT, t2 TEXT, tn TEXT);
>
> f1, f2 and f3 will always be present. Any or all of the ts might be NULL.
>
> I'd like to prevent entries where all of the fields are the same being
> created. How can I do this. I've tried experimenting with UNIQUE KEYs but
> haven't been successful - perhaps because of the NULLs?

You might try adding NOT NULL to your column constraints and
a default to an empty string in the create table.
I thought UNIQUE implied NOT NULL but I might be
wrong.
Reply | Threaded
Open this post in threaded view
|

Re: Duplicate records

Chris Peachment
In reply to this post by Adam-76
On Thu, 15 Jun 2006 20:12:41 +0100, Adam wrote:

>Hi All,

>I'm creating a database which will look a little like:
>create table main (f1 TEXT, f2 TEXT, f3 TEXT, t1 TEXT, t2 TEXT, tn TEXT);

>f1, f2 and f3 will always be present. Any or all of the ts might be NULL.

>I'd like to prevent entries where all of the fields are the same being
>created. How can I do this. I've tried experimenting with UNIQUE KEYs but
>haven't been successful - perhaps because of the NULLs?

>If this isn't possible my second best option would be a command which
>removes existing duplicates.

>Thanks a lot for any pointers,
>Adam

>--
>Adam Richardson
>Carpe Diem

Use an MD5 signature across the values of all fields and store that value
in an indexed additional field. When the apparent duplicate is presented
for storage, compute its MD5 signature and do preliminary query to
check for existence of a record with the same signature. Reject the new
record if the signature is found in an existing record.



Reply | Threaded
Open this post in threaded view
|

Re: Duplicate records

Adam-76
In reply to this post by Jay Sprenkle
In message <[hidden email]>,
Jay Sprenkle wrote:

> > Hi All,
> >
> > I'm creating a database which will look a little like:
> > create table main (f1 TEXT, f2 TEXT, f3 TEXT, t1 TEXT, t2 TEXT, tn TEXT);
> >
> > f1, f2 and f3 will always be present. Any or all of the ts might be NULL.
> >
> > I'd like to prevent entries where all of the fields are the same being
> > created. How can I do this. I've tried experimenting with UNIQUE KEYs but
> > haven't been successful - perhaps because of the NULLs?
>
> You might try adding NOT NULL to your column constraints and
> a default to an empty string in the create table.
> I thought UNIQUE implied NOT NULL but I might be
> wrong.

Thanks a lot - I think I've got it working now:

  create table main (f1 TEXT not null default '~', f2 TEXT not null default
  '~', f3 TEXT not null default '~', t1 TEXT not null default '~', t2 TEXT
  not null default '~', tn TEXT not null default '~');
 
...gives me an error when I try to insert two similar rows :-) Now I'll just
have to work out how to sensibly deal with the error in my C app.

Cheers,
Adam
 
--
Adam Richardson
Carpe Diem
Reply | Threaded
Open this post in threaded view
|

Re: Duplicate records (null handling)

David M. Cook
In reply to this post by Jay Sprenkle
On Thu, Jun 15, 2006 at 02:23:58PM -0500, Jay Sprenkle wrote:

> You might try adding NOT NULL to your column constraints and
> a default to an empty string in the create table.
> I thought UNIQUE implied NOT NULL but I might be
> wrong.

The null handling page

http://sqlite.org/nulls.html

says that nulls are distinct in a column, but I haven't found that to be the
case (this is a test with 3.3.5)

create table foo (bar text, baz text, unique(bar, baz));
insert into foo values ('quux', null);
insert into foo values ('quux', null);
insert into foo values ('quux', null);
.header on
.null null
select * from foo;

bar|baz
--------
quux|null
quux|null
quux|null


I ended up defining one of my unique key columns that I wanted to be
optional as

    NOT NULL ON CONFLICT REPLACE DEFAULT '',

to solve this problem.

Dave Cook
Reply | Threaded
Open this post in threaded view
|

Re: Duplicate records (null handling)

Will Leshner-3
On 6/17/06, David M. Cook <[hidden email]> wrote:

> says that nulls are distinct in a column, but I haven't found that to be the
> case (this is a test with 3.3.5)
>
> create table foo (bar text, baz text, unique(bar, baz));

> select * from foo;
>
> bar|baz
> --------
> quux|null
> quux|null
> quux|null

But if NULL is distinct for UNIQUE columns, isn't that the result you
would expect?
Reply | Threaded
Open this post in threaded view
|

Re: Duplicate records (null handling)

David M. Cook
On Sat, Jun 17, 2006 at 10:16:47AM -0700, Will Leshner wrote:

> But if NULL is distinct for UNIQUE columns, isn't that the result you
> would expect?

I took it to mean NULL is distinct from any other value, not that each NULL
is distinct from any other NULL.

Why would one want NULL to behave this way?

Dave Cook
Reply | Threaded
Open this post in threaded view
|

Re: Duplicate records (null handling)

Will Leshner-3
On 6/17/06, David M. Cook <[hidden email]> wrote:

> I took it to mean NULL is distinct from any other value, not that each NULL
> is distinct from any other NULL.

I believe it means that NULL is distinct even with itself. In other
words, I believe NULL == NULL is false.
Reply | Threaded
Open this post in threaded view
|

Avoiding duplicate record insertion

David D Speck
Please pardon what may be a rank newbie question, but I'm just starting
with SQL and have not seen this addressed in the tutorials that I've
studied.

I have to compile a single name and address list from many small list
files.  It is likely that many of the names will be present in more than
one input list, but I only want to have one copy of them in the master
SQL table.

What would the most elegant way be to insert a name and address entry
into the main table ONLY if it is not already there?  I could see doing
a SELECT WHERE lname = new_lname AND fname = new_fname, AND street =
new_street, etc, and then aborting the INSERT if the SELECT returns a
match.

I just wondered if there was a neater way to accomplish this?

Thanks in advance,
Dave
Reply | Threaded
Open this post in threaded view
|

Re: Avoiding duplicate record insertion

Chris Peachment
On Sun, 18 Jun 2006 00:23:58 -0400, David D Speck wrote:

>Please pardon what may be a rank newbie question, but I'm just starting
>with SQL and have not seen this addressed in the tutorials that I've
>studied.

>I have to compile a single name and address list from many small list
>files.  It is likely that many of the names will be present in more than
>one input list, but I only want to have one copy of them in the master
>SQL table.

>What would the most elegant way be to insert a name and address entry
>into the main table ONLY if it is not already there?  I could see doing
>a SELECT WHERE lname = new_lname AND fname = new_fname, AND street =
>new_street, etc, and then aborting the INSERT if the SELECT returns a
>match.

>I just wondered if there was a neater way to accomplish this?

You could declare each column in the table to be UNIQUE and let the
database report each duplicate. For example:

SQLite version 3.3.6
Enter ".help" for instructions
sqlite>create table test (
   ...>lname varchar(60) unique,
   ...>fname varchar(60) unique,
   ...>street varchar(60) unique);
sqlite>insert into test values ('a', 'b', 'c');
sqlite>insert into test values ('a', 'b', 'c');
SQL error: column street is not unique
sqlite>

In a program, you might want to catch the error and report it.




Reply | Threaded
Open this post in threaded view
|

Re: Avoiding duplicate record insertion

Aristotle Pagaltzis
In reply to this post by David D Speck
Hi David,

* David D Speck <[hidden email]> [2006-06-18 06:25]:
> What would the most elegant way be to insert a name and address
> entry into the main table ONLY if it is not already there? I
> could see doing a SELECT WHERE lname = new_lname AND fname =
> new_fname, AND street = new_street, etc, and then aborting the
> INSERT if the SELECT returns a match.
>
> I just wondered if there was a neater way to accomplish this?

how about adding a UNIQUE constraint to your table?

    CREATE TABLE foo (
        fname TEXT,
        lname TEXT,
        street TEXT,
        UNIQUE( fname, lname, street )
    );

Trying to `INSERT` a duplicate row will then throw an error. If
you don’t care to know about dupes and just want to bung the
data into the table, use `INSERT OR IGNORE ...` so failure will
be silent.

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>
Reply | Threaded
Open this post in threaded view
|

Re: Avoiding duplicate record insertion

Geoff Lane-4
In reply to this post by Chris Peachment
On Sun, Jun 18, 2006 at 01:16:35AM -0400, C.Peachment wrote:

> On Sun, 18 Jun 2006 00:23:58 -0400, David D Speck wrote:
> >What would the most elegant way be to insert a name and address entry
> >into the main table ONLY if it is not already there?  I could see doing
> >a SELECT WHERE lname = new_lname AND fname = new_fname, AND street =
> >new_street, etc, and then aborting the INSERT if the SELECT returns a
> >match.
>
> >I just wondered if there was a neater way to accomplish this?
>
> You could declare each column in the table to be UNIQUE and let the
> database report each duplicate. For example:
>
> SQLite version 3.3.6
> Enter ".help" for instructions
> sqlite>create table test (
>    ...>lname varchar(60) unique,
>    ...>fname varchar(60) unique,
>    ...>street varchar(60) unique);
> sqlite>insert into test values ('a', 'b', 'c');
> sqlite>insert into test values ('a', 'b', 'c');
> SQL error: column street is not unique
> sqlite>
>
> In a program, you might want to catch the error and report it.

I have a related question. Suppose I have a table containing rows each with
values and a counter.  If the new row is unique in the table it should be
INSERTed and the counter set to 1; otherwise the counter for the matching
row should be incremented.  It's a classic data reduction procedure.

As most of the operations will be UPDATEs I want to do something like
        UPDATE ... if failed INSERT...
but I can't see an efficient way to express that in SQL understood by
sqlite.

Thanks,
Reply | Threaded
Open this post in threaded view
|

Re: Avoiding duplicate record insertion

Kurt Welgehausen
[hidden email] wrote:

> I have a related question. Suppose I have a table containing rows each with
> values and a counter.  If the new row is unique in the table it should be
> INSERTed and the counter set to 1; otherwise the counter for the matching
> row should be incremented.  It's a classic data reduction procedure.
>
> As most of the operations will be UPDATEs I want to do something like
> UPDATE ... if failed INSERT...
> but I can't see an efficient way to express that in SQL understood by
> sqlite.
>
> Thanks,

Don't worry so much about efficiency until you're sure you
have a problem.

 insert or ignore into tbl (primaryKey, counter, otherVal, ...)
 values (thisPK, 0, thisOtherVal, ...);

 update tbl set counter = counter + 1 where primaryKey = thisPK;


Regards