SQLite
SQLite 101
- Meant to be an embeded DB for apps.
- If App has lots of fopen(), SQLite would be a good replacement candidate.
- Not good for write centric app, especially needing lot of consistency from multiple client write.
- Good as local cache of a client/server DB
- It is still RDBMS.
-
- SQLite comes embeded in a number of programming env, eg Python, Perl.
- Also have RPM that provides a stand alone SQL interface to manipulate the DB.
-
- Where to use SQLite
Setting up the program on RHEL
Install the packages
(sys admin work, require root priv)
sudo yum install sqlite # this actually install sqlite3. the older version rpm is called sqlite2
Initialize an example DB
SQLite is meant to be an embeded db, so many thing run as a user, and db is installed into the current dir of the user.
sqlite3 mytest.db # this only specify to work with a new file as db, like calling vi myfile.txt
.help
create table tb1 ( # issue command to create table
f1 varchar(30) primary key,
f2 text
);
^D # end of file to terminate. it saves and exit.
Example Script for Taxonomy DB setup
-- this filename: import_taxo.sql
-- sqlite3 db creation and loading script
-- run as
-- sqlite3 ncbi-taxo.db < import_taxo.sql
-- or
-- .read import_taxo.sql
-- essentially, commands that can be typed inside the sqlite shell can be used in the script as-is.
create table gi_taxid(gi integer PRIMARY KEY, taxid integer);
.mode list
.separator \t
pragma temp_store = 2;
.import gi_taxid_prot.dmp gi_taxid
.import gi_taxid_nucl.dmp gi_taxid
CREATE UNIQUE INDEX gi_idx_on_gi_taxid ON gi_taxid(gi);
CREATE INDEX taxid_idx_on_gi_taxid ON gi_taxid(taxid);
-- UNIQUE keyword in index means duplicates in that column will result in error.
.schema -- show how tables are created
.indices -- show index(s)
# other possible load options...
.header on -- maybe this only affect export to csv to add header row?
.mode csv
-- see http://www.sqlite.org/cli.html for more details.
-- .import don't seems to have any options for specifying how to handle quotes, escape chars, etc.
SQL queries using sqlite
Example from a Taxonomy DB
# ref http://dgg32.blogspot.com/2013/07/map-ncbi-taxonomy-and-gi-into-sqlite.html
$ sqlite3 ncbi-taxo.db # call sqlite, giving it a file where the database is.
.tables # show a list of tables in the DB
.help # list sqlite commands
.mode list
SELECT taxid FROM tree WHERE name = "Proteobacteria";
SELECT name FROM tree WHERE taxid = '2';
SELECT parent FROM tree WHERE taxid = '976';
SELECT taxid FROM tree WHERE parent = '976';
SELECT gi FROM gi_taxid WHERE taxid = '2';
Ref:
https://www.sqlite.org/cli.html
Performance, Benchmark SQLite
A taxonomy db with file size of 33GB, including index, provided essentially instantataneous answer to simple SELECT queries like the one in the examples above.
A python program querying the db with 20 simultaneous thread that retrieve 100+ elements returned in 1-2 seconds.
Python program querying the DB 46k times took about 15 min (includes other sorting, processing in a taxonomy reporter tool), so about 50 queries per second.
Bulk loading using .import seems to be a lot faster than running python code that use cursor to INSERT INTO TABLE one row at a time. The fact that I was doing this inside a try/catch block may have added to the delay. Trying to find which record that would result in Primary Key violation (table admitedly changed a bit as well, more digging TBA, as current performance is about 700 GB per 8 hours, long way to go for a DB of 89GB, but there were index there)....
Datatypes
Datatypes in SQLite3 include the usual suspects. They are listed in
https://www.sqlite.org/datatype3.html"
- TEXT is used to store stings.
- VARCHAR is supported. VARCHAR(10) does NOT limit string to 10 chars, and sqlite allegedly will store without trucating anything, even if 50M chars... see
https://www.sqlite.org/faq.html#q9
- CLOB ... not sure if there is a point to use this if VARCHAR can store arbitrary length. perhaps data manipulation is different...
Troubleshooting
If get an error message like:
Error: database or disk is full
This isn't really out of space for the database or its file, but temp space!
pragma temp_store; # display what's the value set for temp_store. 0 is the default
pragma temp_store = 2; # set to 2, ie use RAM
# 1 = use file? need to define temp_store_directory = '/some/place'
# is /tmp not the default?
ref: http://stackoverflow.com/questions/5274202/sqlite3-database-or-disk-is-full-the-database-disk-image-is-malformed#5275022
Export
To export a specific table into a comma delimited file:
.mode csv
-- use '.separator SOME_STRING' for something other than a comma.
.headers on
.out file.dmp
select * from MyTable;
ref: http://stackoverflow.com/questions/75675/how-do-i-dump-the-data-of-some-sqlite3-tables
Secure SQLite Installation
LOL =)
File Formats
DB Dump
Ref
SQLite3 overview
Tools
hoti1
bofh1