is SQLite the right tool to analyze a 44GB file

is SQLite the right tool to analyze a 44GB file

peter korinis
I'm new to SQLite ... not a programmer, not a DBA, just an end-user with no
dev support for a pilot project (single user, no updates, just queries).

 

I want to analyze the data contained in a 44GB csv file with 44M rows x 600
columns (fields all <15 char). Seems like a DBMS will allow me to query it
in a variety of ways to analyze the data.

 

I have the data files and SQLite on my laptop: a 64-bit Win7 Intel dual-proc
with 4GB RAM + 200GB free disk space.

End-user tools like Excel & Access failed due to lack of memory. I
downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager add-on
but it would not load the csv files - 'csv worker failed'. So I tried
Database Master from Nucleon but it failed after loading ~57,000 rows (it
took 100 minutes) with error message 'database or disk is full'. I
tried to create another table in the same db but could not - same error
message. The DB size shows as 10,000KB (that looks suspiciously like a size
setting?).

 

From what I've read SQLite can handle this size DB. So it seems that either
I do not have enough RAM or there are memory/storage (default) limits or
maybe time-out issues that prevent loading this large file ... or the 2 GUI
tools I tried have size limits. I do have a fast server (16GB, 12 procs,
64-bit Intel, Win server) and an iMac available.

 

1.       Is SQLite the wrong tool for this project? (I don't want the
overkill and admin overhead of a large MySQL or SQL Server, etc.)

2.       If SQLite will work, are there configuration settings in SQLite or
Win7 that will permit the load ... or is there a better tool for this project?

 

Thanks much for helping a newbie!

 

peterK

 

Re: is SQLite the right tool to analyze a 44GB file

Baruch Burstein-2
If none of your fields contain a comma, you can just use the sqlite3
terminal to load a csv file.
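
For example, a minimal session might look like this (the database, table, and file names are invented for illustration; the table must exist before the .import):

    sqlite> CREATE TABLE data(col1 TEXT, col2 TEXT, col3 TEXT);
    sqlite> .separator ,
    sqlite> .import data.csv data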

--
Programming today is a race between software engineers striving to build
bigger and better idiot-proof programs, and the Universe trying to produce
bigger and better idiots. So far, the Universe is winning.  - Rich Cook
Re: is SQLite the right tool to analyze a 44GB file

Black, Michael (IS)
In reply to this post by peter korinis
You need to try and do an import from the shell.  GUIs seem to have way too many limits.

http://sqlite.org/download.html



Don't do any indexes up front ... do them afterwards if they'll help your queries.  Indexes will slow down your import notably.



I don't think you're anywhere near the limits of sqlite, since the limits page talks about terabytes.

http://sqlite.org/limits.html



Somebody else can answer for sure but wrapping your .import inside a transaction may be a good thing.

I don't know if that's done by default.



Your queries are liable to be pretty slow depending on what you have to do.
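
If it isn't done by default, a minimal sketch of wrapping the import yourself and deferring the index might look like this (table, column, and file names are invented for illustration):

    sqlite> .separator ,
    sqlite> BEGIN;
    sqlite> .import data.csv data
    sqlite> COMMIT;
    sqlite> CREATE INDEX idx_data_col1 ON data(col1);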









Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


Re: is SQLite the right tool to analyze a 44GB file

Baruch Burstein-2
It is already wrapped in a transaction.
I seem to remember seeing somewhere that the .import command doesn't
understand escaping, e.g.

    "one","two,three"

will get imported as

    "one" | "two | three"

(the quotes are part of the data, and the second column was split into two
by the comma)
Just a point to be aware of.


Re: is SQLite the right tool to analyze a 44GB file

oliver1804
In reply to this post by peter korinis

Hello - reading your post I'd say you're on the same path I started a
few years ago :-)

You need the following knowledge to work with databases (including Access):

- database design (keyword: normalization)

- SQL (I mean writing your own code to really understand what you're doing)

- a scripting language to preprocess your data (e.g. your data might be
in a fixed-length format, ...). I recommend gawk because it's powerful
and simple.

- it might be necessary to inspect your csv file if you don't know the
structure (is there a header? what separator is used? ...), so under
Windows I'd print the first 10 lines into a new file and use an
editor to inspect it, or use a pager program like "less" that is capable
of opening your 44 GB file in a second (don't use Notepad for this task
;-)) - see the one-liner sketched below
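
A one-line sketch of such a preview with gawk (file names invented for illustration; it stops reading after line 10, so it is quick even on a 44 GB file):

    gawk "NR>10 {exit} {print}" data.csv > head10.txt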

In a few words: a database is not just a more powerful spreadsheet program

greetings
Oliver

Re: is SQLite the right tool to analyze a 44GB file

peter korinis
In reply to this post by Black, Michael (IS)
Thank you all.
Looks like I'm stuck with the CLI, though I have contacted Nucleon software
support ... tried the CLI yesterday but need more practice.
Is there a good reference book you would recommend for SQLite?

peter


Re: is SQLite the right tool to analyze a 44GB file

Black, Michael (IS)
Does that mean using the CLI worked for you?



If so, you may be able to access the database with the other programs AFTER you create it.

Seems that creating a db from csv is challenging to some and not thoroughly tested for large data sources.



Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


Re: is SQLite the right tool to analyze a 44GB file

peter korinis
In reply to this post by peter korinis
Not sure yet ... but I'm working on it (between interruptions).
thanks


pk

Re: is SQLite the right tool to analyze a 44GB file

Warren Young
In reply to this post by peter korinis
On 5/1/2012 2:06 PM, peter korinis wrote:
> Is SQLite the wrong tool for this project?

Probably.

SQLite is a data storage tool.  With enough SQL cleverness, you can turn
it into a data *reduction* tool.  But a data analysis tool?  No, not
without marrying it to a real programming language.

Granted, that's what almost everyone does do with SQLite, but if you're
going to learn a programming language, I'd recommend you learn R, a
language and environment made for the sort of problem you find yourself
stuck with.  http://r-project.org/

There are several R GUIs out there.  I like R Studio best:
http://www.rstudio.org/

You'll still find R Studio a sharp shock compared to Excel.  And yes, it
will require some programming, and yes, I know you said you aren't a
programmer.  But in the rest of the thread, it looks like people have
convinced you to use SQLite from the command line, typing in raw SQL
commands; guess what, that's programming.  Not on the level of R code,
but R isn't far down the same slippery slope.

It may help you to know that R is most popular in the statistics
community, which of course is populated by statisticians, not programmers.

R isn't the easiest programming language to pick up, but it's far from
the hardest.  It's very similar to JavaScript, though a bit tougher to
learn, mostly due to having accumulated some strange syntax over its 36
years.  (That's counting R's predecessor, S.)

Re: is SQLite the right tool to analyze a 44GB file

oliver1804

I don't think that the question is "R or sqlite?". As you already wrote,
sqlite is only a container for more or less structured data (I prefer
more structure), and that is the job it really does well. If he only
needs to aggregate data, the functionality sqlite offers is enough. If
more complex analysis is needed (graphical or not), the combination of R
with sqlite (e.g. via RODBC) would do a superb job.


Re: is SQLite the right tool to analyze a 44GB file

peter korinis
In reply to this post by Warren Young
I have R but really haven't used it much. I know it's a great stats package
and great for data reduction ... but I want to perform queries against my
44GB of data, filtering records by a variety of attributes, comparing those
subsets in a variety of ad hoc ways, perhaps summing/counting other fields,
etc.
This is the kind of job Excel is good at ... but the data is too big!
Seems like a database plus a good query GUI or some BI app would work. Is R
a good query tool?

Thanks,
peter



Re: is SQLite the right tool to analyze a 44GB file

Freddy López
As Oliver said, I don't think the real issue is having to choose one
software: either SQLite or R.

I've used SQLite with data around 10GB under Windows 7 with 4GB of RAM and
it worked perfectly. Yes, its size is less than yours, but I learned that the
use of every GUI was a problem.

Since then, I always work simply with sqlite3.exe and other tools such as
SCITE to write queries.

Cheers.





--
«But Gwindor answered: 'The doom lies in yourself, not in your name.'»

JRR Tolkien

Re: is SQLite the right tool to analyze a 44GB file

Warren Young
In reply to this post by peter korinis
On 5/3/2012 11:59 AM, peter korinis wrote:
> is R a good query tool?

It's a programming language.  It can do anything within your power to
persuade the interpreter.

One of the fundamental data types in R is the data frame, which is
roughly equivalent to a SQLite table.

This is an R equivalent to "SELECT * FROM MYTABLE WHERE V2 < 9":

        results <- subset(my.table, V2 < 9)

But, this is not the place for an R tutorial.  Take the question up on
an R mailing list if you want to learn more of its capabilities.

Re: is SQLite the right tool to analyze a 44GB file

oliver1804
In reply to this post by peter korinis
On 03.05.2012 19:59, peter korinis wrote:
> I have R but really haven't used it much. I know it's a great stats package
> and great for data reduction ... but I want to perform queries against my
> 44GB of data, filtering records by a variety of attributes, comparing those
> subsets in a variety of ad hoc ways, perhaps summing/counting other fields,
> etc.

I prefer

http://www.yunqa.de/delphi/doku.php/products/sqlitespy/index

for creating the database (tables, views) and doing the queries, because
it's fast and reliable, but I prefer to write(!) SQL code and not to
create it through a generator (as is done in Access).

sqlitespy can't do the import job; I always do this with the CLI by
creating INSERT statements with my scripting language in a separate file

since sqlite 3.7.11 you don't need a statement like

INSERT INTO table(col01,col02,col03) VALUES(1,2,3);
INSERT INTO table(col01,col02,col03) VALUES(4,5,6);

you can make it shorter:

INSERT INTO table(col01,col02,col03) VALUES(1,2,3),(4,5,6);

this is a great advantage if you need to do many INSERTs, because your file
won't become so large


> This is the kind of job excel is good at ... but the data is too bit!
> Seems like a database plus a good query GUI or some BI app would work. is R
> a good query tool?

afaik there is no other way than to write(!) SQL code - depending on
the problem this can be done in an R script or directly in the database
(e.g. as a VIEW) or as a combination

[...]

Oliver

Re: is SQLite the right tool to analyze a 44GB file

Valentin Davydov-2
In reply to this post by peter korinis
On Tue, May 01, 2012 at 04:06:01PM -0400, peter korinis wrote:
> I'm new to SQLite ... not a programmer, not a DBA, just an end-user with no
> dev support for a pilot project (single user, no updates, just queries).
>
>  
>
> I want to analyze the data contained in a 44GB csv file with 44M rows x 600
> columns (fields all <15 char). Seems like a DBMS will allow me to query it
> in a variety of ways to analyze the data.

Yes, SQLite is quite capable of doing simple analysis of such amounts of data,
especially selecting small subsets based on simple criteria. However, before
trying to do some real work you have to understand the structure of your data,
realize your possible queries and carefully design the database schema (tables
and, equally important, indices). Perhaps putting all data in a single
600-column table is not a good idea (though technically allowed), especially
if your columns are all alike in their physical nature: it is not so easy to
select arbitrarily calculated columns, only rows.

> I have the data files and SQLite on my laptop: a 64-bit Win7 Intel dual-proc
> with 4GB RAM + 200GB free disk space.

A well-indexed database of small data pieces usually takes up several times
more disk space than the raw data. Probably 200GB would not be enough,
depending mostly on the number of indices. Consider dedicating a separate
disk (or even a RAID array) to it.

> End-user tools like Excel & Access failed due to lack of memory. I
> downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager add-on
> but it would not load the csv files - 'csv worker failed'. So I tried
> Database Master from Nucleon but it failed after loading (it took 100
> minutes) ~57,000 rows with error message = 'database or disk is full". I
> tried to create another table in the same db but could not with same error
> message. The DB size shows as 10,000KB (that looks suspiciously like a size
> setting?).

Try bare sqlite shell instead of those external tools. It should take
at least several hours to fill up your database.

> From what I've read SQLite can handle this size DB.

Surely it can. In one of my projects the database takes up almost 3 terabytes
of disk space, contains more than 10^10 records and still provides small
selects of indexed data in real time.

> 1.       Is SQLite the wrong tool for this project? (I don't want the
> overkill and admin overhead of a large MySQL or SQL Server, etc.)

It depends on the data structure, semantics and what you are going to find
there. SQLite isn't very good for calculation of complex aggregate functions,
but works fine in simple selecting and sorting.

> 2.       If SQLite will work, are there configuration settings in SQLite or
> Win7 that will permit the load . or is there a better tool for this project?

Increasing PAGE_SIZE to match the filesystem block (cluster) size and
perhaps CACHE_SIZE to fill most of the available RAM would help a bit.
Also, don't forget to turn off journaling and wrap it all in a single
transaction when creating the database for the first time.
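
A sketch of those settings in the shell (the values and names are illustrative and need tuning; page_size must be set before the database file is first written):

    sqlite> PRAGMA page_size = 4096;     -- match the filesystem cluster size
    sqlite> PRAGMA cache_size = 250000;  -- pages to cache; size this to the available RAM
    sqlite> PRAGMA journal_mode = OFF;   -- no journal during the initial bulk load
    sqlite> .separator ,
    sqlite> BEGIN;
    sqlite> .import data.csv data
    sqlite> COMMIT;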

Valentin Davydov.

Re: is SQLite the right tool to analyze a 44GB file

peter korinis
In reply to this post by oliver1804
SQLiteSpy looks good ... I will try it.
The website says the download contains SQLite itself, which I already have - will
there be a problem using ...spy with the existing SQLite?

I was trying to use the sqlite3 CLI cmd ".import testfile.txt test_tbl;" to load
a 999x46 comma-delimited file into a previously created empty table with 46
col. (If this works I will load two 22M row x 46 col csv files into that
table.) Does this cmd work this way, or must I create INSERT statements to do
999 inserts (later 44M inserts)?

Thanks,
peter


Re: is SQLite the right tool to analyze a 44GB file

Simon Slavin-3

On 4 May 2012, at 4:02pm, peter korinis <[hidden email]> wrote:

> Sqlitespy looks good ... I will try it.
> website says download contains sqlite itself, which I already have - will
> there be a problem using ...spy with existing sqlite?

SQLite is not a single library which has to live somewhere on your computer.  It is C code which each programmer includes in their program.  You can have twenty programs on your disk, each using a different version of SQLite, without problems.

> I was trying to use sqlite3 CLI cmd ".import testfile.txt test_tbl;" to load
> a 999x46 comma-delimited file into a previously created empty table with 46
> col. (if this works I will load two 22M row x 46 col csv files into that
> table.) does this cmd work this way or must I create INSERT statements to do
> 999 inserts (later 44M inserts)?

Semicolons are needed at the end of SQL commands.  You don't want them at the end of commands which start with a dot.  Apart from that you have something worth trying.  Why not make a tiny test case with three lines of two columns before you get started with the 46-column monster?
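
For instance, such a tiny test case might look like this (the file, table, and column names are invented for illustration):

    sqlite> CREATE TABLE tiny(a TEXT, b TEXT);
    sqlite> .separator ,
    sqlite> .import tiny.csv tiny
    sqlite> SELECT * FROM tiny;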

Simon.

Re: is SQLite the right tool to analyze a 44GB file

peter korinis
In reply to this post by Valentin Davydov-2
I have scaled down the attributes of interest to 46 columns (discarding the
other 550). No columns are calculated. No updates to this file ... one user
... only query, sort, etc. type transactions.
So I want to load two 22GB csv files into an empty 46-column table. (I
intend to test the load with a 999-record by 46-col file.) Initially I only have
1 index on a record # ... I am not positive that several other fields I want
to index aren't missing data in some records (I assume that will error
out if I make those an index).
After I get the data loaded and inspect for nulls in prospective index
attributes, can I add indices?

I was planning to load using the sqlite3 CLI ".import" command. Is there a way I
can monitor the progress of the load, with only minimal impact on
performance? I've started several loads only to find out hours later that
nothing had been loaded.

Thanks for helpful info.

peter



Re: is SQLite the right tool to analyze a 44GB file

Simon Slavin-3

On 4 May 2012, at 4:23pm, peter korinis <[hidden email]> wrote:

> I have scaled down the attributes of interest to 46 columns (discarding the
> other 550). No columns are calculated. No updates to this file ... one user
> ... only  query, sort, etc. type transactions.
> So I want to load two 22GB csv files into an empty 46 column table. (I
> intend to test load with 999 records by 46 col file.) initially I only have
> 1 index on a record # ... am not positive several other fields that I want
> to index may not be missing data in some records (I assume that will error
> out if I make those an index).
> After I get the data loaded and inspect for nulls in prospective index
> attributes, can I add indices?

Yes.  You have done 'CREATE TABLE'.  The import process will execute many 'INSERT' commands.  And you can run 'CREATE INDEX' commands after your data is already present in the table.  It will work the way you want.
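
For instance, once the load has finished (the table and column names here are invented for illustration):

    CREATE INDEX idx_recno ON mytable(recno);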

> I was planning to load using sqlite3 CLI ".import" command. Is there a way I
> can monitor the progress of the load, with only minimal impact on
> performance ?

Using File Explorer or the Finder or whatever, just keep an information window on the database file open.  You should see the file size increasing constantly while the command is working.  Alternatively you might use Task Manager or Activity Monitor to monitor something else that's going on.  For example, how many octets that process has written to disk.

> I've started several loads only to find out hours later that
> nothing has been loaded.

Yes, that's very annoying.  You keep thinking "It must be finished soon.".

Simon.

Re: is SQLite the right tool to analyze a 44GB file

Black, Michael (IS)
In reply to this post by peter korinis
Here's a utility to import a comma-separated file (it does not work for quoted strings or strings with commas).



Figures out the # of columns automagically from the csv file.



All wrapped in a singled transaction.

Shows progress every 100,000 inserts.



Usage: csvimport filename databasename tablename



#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "sqlite3.h"

#define BUFSIZE 1000000
#define MOD 100000

char *sep=",\r\n";
// Add comma-delimited file to existing database/table
// Strings cannot have commas in them and quotes will be retained
// Note: strtok() treats a run of separators as one, so empty fields ("a,,b") throw off the column count
int main(int argc, char *argv[]) {
  sqlite3 *db;
  sqlite3_stmt *stmt;
  int rc;
  int ncol=0;
  int nline=0;
  char *buf=malloc(BUFSIZE);
  char sql[8192];
  FILE *fp;
  char *filename;
  char *databasename;
  char *tablename;
  if (argc != 4) {
    fprintf(stderr,"Usage: %s filename databasename tablename\n",argv[0]);
    exit(1);
  }
  filename = argv[1];
  databasename = argv[2];
  tablename = argv[3];
  rc = sqlite3_open_v2(databasename,&db,SQLITE_OPEN_READWRITE,NULL);
  if (rc) {
    fprintf(stderr,"Error opening database '%s': %s\n",databasename,sqlite3_errmsg(db));
    exit(1);
  }
  sprintf(sql,"insert into %s values (",tablename);
  fp=fopen(filename,"r");
  if (fp == NULL) {
    perror(filename);
    exit(1);
  }
  buf[BUFSIZE-1] = '*';
  fgets(buf,BUFSIZE,fp);
  if (buf[BUFSIZE-1] != '*') {
    fprintf(stderr,"BUFSIZE not big enough...aborting\n");
    exit(1);
  }
  // count the columns
  char *p=strtok(buf,sep);
  ncol=0;
  while(p) {
    ++ncol;
    strcat(sql,ncol==1?"":",");
    strcat(sql,"?");
    p=strtok(NULL,sep);
  }
  printf("%d columns detected\n",ncol);
  strcat(sql,")");
  puts(sql);
  rewind(fp);
  // Let's wrap things in a transaction
  rc = sqlite3_exec(db,"BEGIN",NULL,NULL,NULL);
  if (rc) {
    fprintf(stderr,"BEGIN failed on '%s': %s\n",sql,sqlite3_errmsg(db));
    exit(1);
  }
  // prepare our statement
  rc = sqlite3_prepare(db,sql,strlen(sql),&stmt,NULL);
  if (rc) {
    fprintf(stderr,"Prepare failed on '%s': %s\n",sql,sqlite3_errmsg(db));
    exit(1);
  }
  // Loop over the remaining lines of the file
  while(fgets(buf,BUFSIZE,fp)) {
    char *p=strtok(buf,sep);
    int i=1;
    ++nline;
    if ((nline % MOD)==0) {
      printf("%d\r",nline);
      fflush(stdout);
    }
    while(p) { // bind the columns as text; the table declaration takes care of type conversion
      if (i <= ncol) { // binding past the last parameter would return SQLITE_RANGE and abort
        rc=sqlite3_bind_text(stmt,i,p,-1,SQLITE_TRANSIENT);
        if (rc) {
          fprintf(stderr,"bind_text failed on '%s': %s\n",sql,sqlite3_errmsg(db));
          exit(1);
        }
      }
      ++i;
      p=strtok(NULL,sep);
    }
    if (--i != ncol) {
      fprintf(stderr,"expected %d cols, got %d cols on line#%d\n",ncol,i,nline)

    } else {
      rc = sqlite3_step(stmt);
      if (rc != SQLITE_DONE) {
        fprintf(stderr,"Insert failed on '%s': %s\n",sql,sqlite3_errmsg(db));
        exit(1);
      }
      rc = sqlite3_reset(stmt);
      if (rc) {
        fprintf(stderr,"Reset failed on '%s': %s\n",sql,sqlite3_errmsg(db));
        exit(1);
      }
    }
  }
  rc=sqlite3_finalize(stmt);
  if (rc) {
    fprintf(stderr,"Finalize failed: %s\n",sqlite3_errmsg(db));
    exit(1);
  }
  printf("%d inserts, committing...\n",nline);
  rc = sqlite3_exec(db,"COMMIT",NULL,NULL,NULL);
  if (rc) {
    fprintf(stderr,"COMMIT failed on '%s': %s\n",sql,sqlite3_errmsg(db));
    exit(1);
  }
  rc=sqlite3_close(db);
  if (rc) {
    fprintf(stderr,"Close failed on '%s': %s\n",argv[2],sqlite3_errmsg(db));
    exit(1);
  }
  fclose(fp);
  return 0;
}
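
For anyone trying the utility above: note that it expects the database and the target table to exist already (it opens with SQLITE_OPEN_READWRITE and no CREATE flag). Assuming the source is saved as csvimport.c on a Unix-style toolchain, a build-and-run sketch might look like this (the file, database and table names are placeholders):

gcc -o csvimport csvimport.c -lsqlite3
./csvimport data.csv analysis.db data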



Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems

________________________________
From: [hidden email] [[hidden email]] on behalf of peter korinis [[hidden email]]
Sent: Friday, May 04, 2012 10:23 AM
To: 'General Discussion of SQLite Database'
Subject: EXT :Re: [sqlite] is SQLite the right tool to analyze a 44GB file

I have scaled down the attributes of interest to 46 columns (discarding the
other 550). No columns are calculated. No updates to this file ... one user
... only  query, sort, etc. type transactions.
So I want to load two 22GB csv files into an empty 46-column table. (I
intend to test the load with a 999-record by 46-column file.) Initially I only
have 1 index on a record # ... I am not sure whether several other fields that
I want to index may be missing data in some records (I assume that will error
out if I make those an index).
After I get the data loaded and inspect for nulls in prospective index
attributes, can I add indices?

I was planning to load using the sqlite3 CLI ".import" command. Is there a
way I can monitor the progress of the load, with only minimal impact on
performance? I've started several loads only to find out hours later that
nothing has been loaded.

Thanks for helpful info.

peter


-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Valentin Davydov
Sent: Friday, May 04, 2012 9:43 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file

On Tue, May 01, 2012 at 04:06:01PM -0400, peter korinis wrote:
> I'm new to SQLite . not a programmer . not a DBA . just an end-user
> with no dev support for a pilot project (single user, no updates, just
queries).
>
>
>
> I want to analyze the data contained in a 44GB csv file with 44M rows
> x 600 columns (fields all <15 char). Seems like a DBMS will allow me
> to query it in a variety of ways to analyze the data.

Yes, SQLite is quite capable of doing simple analysis of such amounts of
data, especially selecting small subsets based on simple criteria. However,
before trying to do real work you have to understand the structure of your
data, anticipate your likely queries and carefully design the database
schema (tables and, equally important, indices). Putting all the data in a
single 600-column table is probably not a good idea (though technically
allowed), especially if many of the columns hold the same kind of quantity:
SQL makes it easy to select arbitrary rows, but not arbitrary or computed
sets of columns.
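
To make that concrete (a sketch with invented names): a long, narrow layout
turns column selection into row selection, which SQL handles much more
naturally:

-- wide layout: one row per record, hundreds of similar columns
CREATE TABLE wide (recno INTEGER, m1 TEXT, m2 TEXT /* ... m600 */);

-- narrow layout: one row per (record, attribute) pair
CREATE TABLE narrow (recno INTEGER, attr TEXT, value TEXT);

-- an arbitrary subset of attributes is then just a WHERE clause:
SELECT attr, value FROM narrow WHERE recno = 42 AND attr IN ('m1','m17');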

> I have the data files and SQLite on my laptop: a 64-bit Win7 Intel
> dual-proc with 4GB RAM + 200GB free disk space.

A well-indexed database of many small data pieces usually takes up several
times more disk space than the raw data. 200GB may well not be enough,
depending mostly on the number of indices. Consider dedicating a separate
disk (or even a RAID array) to it.
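
To watch how big the database itself has grown from inside the shell, two
standard pragmas suffice (total size in bytes = page_count * page_size):

sqlite> PRAGMA page_count;
sqlite> PRAGMA page_size;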

> End-user tools like Excel & Access failed due to lack of memory. I
> downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager
> add-on but it would not load the csv files - 'csv worker failed'. So I
> tried Database Master from Nucleon but it failed after loading (it
> took 100
> minutes) ~57,000 rows with error message = 'database or disk is full".
> I tried to create another table in the same db but could not with same
> error message. The DB size shows as 10,000KB (that looks suspiciously
> like a size setting?).

Try the bare sqlite3 shell instead of those external tools. Even then, expect
filling the database to take at least several hours.

> From what I've read SQLite can handle this size DB.

Surely it can. In one of my projects the database takes up almost 3
terabytes of disk space, contains more than 10^10 records, and still serves
small selects of indexed data in real time.

> 1.       Is SQLite the wrong tool for this project? (I don't want the
> overkill and admin overhead of a large MySQL or SQL Server, etc.)

It depends on the structure and semantics of the data and on what you are
going to look for in it. SQLite isn't very good at computing complex
aggregate functions, but works fine for simple selecting and sorting.

> 2.       If SQLite will work, are there configuration settings in SQLite or
> Win7 that will permit the load . or is there a better tool for this project?

Increasing PAGE_SIZE to match the filesystem block (cluster) size, and
perhaps CACHE_SIZE to use most of the available RAM, would help a bit.
Also, don't forget to turn off journaling and to wrap everything in a single
transaction when populating the database for the first time.
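
Concretely, that advice might translate into a first-load session like this
sketch (the values are illustrative only; page_size must be set before the
database is populated, cache_size is counted in pages, and turning the
journal off trades crash-safety for load speed):

sqlite> PRAGMA page_size = 4096;     -- match the filesystem cluster size
sqlite> PRAGMA cache_size = 500000;  -- about 2GB of cache at 4KB pages
sqlite> PRAGMA journal_mode = OFF;   -- no rollback journal during the bulk load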

Valentin Davydov.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users