from What is SQLite?
SQLite is a software library that provides a relational database management system. The lite in SQLite means light weight in terms of setup, database administration, and required resource.
SQLite has the following noticeable features: self-contained, serverless, zero-configuration, transactional. Serverless
Normally, an RDBMS such as MySQL, PostgreSQL, etc., requires a separate server process to operate. The applications that want to access the database server use TCP/IP protocol to send and receive requests. This is called client/server architecture.
SQLite does NOT work this way. SQLite does NOT require a server to run. SQLite database is integrated with the application that accesses the database. The applications interact with the SQLite database read and write directly from the database files stored on disk.
The following diagram illustrates the SQLite server-less architecture:
SQLite has three important features:
Following Python code shows how to connect to an existing database. If the database does not exist, then it will be created and finally a database object will be returned.
Te database of the examples has been dowloaded from SQLite sample database
The following database diagram illustrates the chinook database tables and their relationships:
#!/usr/bin/python
import sqlite3
sqlite_conn = sqlite3.connect('./db/chinook.db')
print("Opened database successfully")
Opened database successfully
We show the tables in database
cur = sqlite_conn.cursor()
cur.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
available_table = (cur.fetchall())
for table in available_table:
print(table)
('albums',) ('artists',) ('customers',) ('employees',) ('genres',) ('invoice_items',) ('invoices',) ('media_types',) ('playlist_track',) ('playlists',) ('sqlite_sequence',) ('sqlite_stat1',) ('tracks',)
We show the column names in table
cur.execute("PRAGMA table_info(tracks)")
rows = cur.fetchall()
for row in rows:
print(row)
(0, 'TrackId', 'INTEGER', 1, None, 1) (1, 'Name', 'NVARCHAR(200)', 1, None, 0) (2, 'AlbumId', 'INTEGER', 0, None, 0) (3, 'MediaTypeId', 'INTEGER', 1, None, 0) (4, 'GenreId', 'INTEGER', 0, None, 0) (5, 'Composer', 'NVARCHAR(220)', 0, None, 0) (6, 'Milliseconds', 'INTEGER', 1, None, 0) (7, 'Bytes', 'INTEGER', 0, None, 0) (8, 'UnitPrice', 'NUMERIC(10,2)', 1, None, 0)
Others examples with PRAGMA statements are:
cur.execute("PRAGMA page_count")
print(cur.fetchall())
cur.execute("PRAGMA function_list")
print(cur.fetchall())
cur.execute("PRAGMA database_list")
print(cur.fetchall())
[(864,)] [] [(0, 'main', 'C:\\Users\\Ibon.DOMINIO\\Documents\\github\\miscellaneous\\SQLite&python\\db\\chinook.db')]
The SELECT statement is used to select data from a database.
The data returned is stored in a result table, called the result-set.
cur = sqlite_conn.execute("SELECT * FROM tracks LIMIT 5;")
rows = cur.fetchall()
for row in rows:
print(row)
(1, 'For Those About To Rock (We Salute You)', 1, 1, 1, 'Angus Young, Malcolm Young, Brian Johnson', 343719, 11170334, 0.99) (2, 'Balls to the Wall', 2, 2, 1, None, 342562, 5510424, 0.99) (3, 'Fast As a Shark', 3, 2, 1, 'F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman', 230619, 3990994, 0.99) (4, 'Restless and Wild', 3, 2, 1, 'F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman', 252051, 4331779, 0.99) (5, 'Princess of the Dawn', 3, 2, 1, 'Deaffy & R.A. Smith-Diesel', 375418, 6290521, 0.99)
cur = sqlite_conn.execute("SELECT * FROM tracks LIMIT 5;")
row = cur.fetchone()
print(row)
row = cur.fetchone()
print(row)
row = cur.fetchone()
print(row)
row = cur.fetchone()
print(row)
row = cur.fetchone()
print(row)
(1, 'For Those About To Rock (We Salute You)', 1, 1, 1, 'Angus Young, Malcolm Young, Brian Johnson', 343719, 11170334, 0.99) (2, 'Balls to the Wall', 2, 2, 1, None, 342562, 5510424, 0.99) (3, 'Fast As a Shark', 3, 2, 1, 'F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman', 230619, 3990994, 0.99) (4, 'Restless and Wild', 3, 2, 1, 'F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman', 252051, 4331779, 0.99) (5, 'Princess of the Dawn', 3, 2, 1, 'Deaffy & R.A. Smith-Diesel', 375418, 6290521, 0.99)
There are three fetch
functions in python with SQLite:
fetchone(): This method fetches the next row of a query result set, returning a single sequence, or None when no more data is available.
fetchmany([size = cursor.arraysize]): This routine fetches the next set of rows of a query result, returning a list. An empty list is returned when no more rows are available. The method tries to fetch as many rows as indicated by the size parameter.
fetchall(): This routine fetches all (remaining) rows of a query result, returning a list. An empty list is returned when no rows are available.
cur = sqlite_conn.execute("SELECT * FROM tracks LIMIT 5;")
rows = cur.fetchmany(2)
print("fetchmany 2 ----------------")
for row in rows:
print(row)
row = cur.fetchone()
print("fetchone ----------------")
print(row)
print("fetchall ----------------")
rows = cur.fetchall()
for row in rows:
print(row)
fetchmany 2 ---------------- (1, 'For Those About To Rock (We Salute You)', 1, 1, 1, 'Angus Young, Malcolm Young, Brian Johnson', 343719, 11170334, 0.99) (2, 'Balls to the Wall', 2, 2, 1, None, 342562, 5510424, 0.99) fetchone ---------------- (3, 'Fast As a Shark', 3, 2, 1, 'F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman', 230619, 3990994, 0.99) fetchall ---------------- (4, 'Restless and Wild', 3, 2, 1, 'F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman', 252051, 4331779, 0.99) (5, 'Princess of the Dawn', 3, 2, 1, 'Deaffy & R.A. Smith-Diesel', 375418, 6290521, 0.99)
cur = sqlite_conn.execute("""
--====
SELECT Name, Composer, Milliseconds
FROM tracks
WHERE Milliseconds >= ?
AND
Milliseconds <= ?
ORDER BY Milliseconds;
--====""", (200000, 201000))
rows = cur.fetchall()
for row in rows:
print(row)
('Alabama Song', 'Weill-Brecht', 200097) ('Purgatory', 'Steve Harris', 200150) ('F**k Me Pumps', 'Salaam Remi', 200253) ('Why Go', 'Jeff Ament', 200254) ('Ice Cream Man', 'John Brim', 200306) ('Summertime', 'Miles Davis', 200437) ('Coroné Antonio Bento', None, 200437) ('Último Pau-De-Arara', 'Corumbá/José Gumarães/Venancio', 200437) ('The Wind Cries Mary', 'Jimi Hendrix', 200463) ('Over Again', 'Tim Maia', 200489) ('Rock Bottom', 'Paul Stanley, Ace Frehley', 200594) ('All My Love', 'E. Schrody/L. Dimant', 200620) ('Question!', 'Tankian, Serj', 200698) ('Posso Até Me Apaixonar', 'Dudu Nobre', 200698) ('Cold Day In The Sun', 'Dave Grohl, Taylor Hawkins, Nate Mendel, Chris Shiflett/FOO FIGHTERS', 200724) ('Too Fast For Love', 'Nikki Sixx', 200829) ('O Cidadão Do Mundo', 'Chico Science', 200933)
References:
yotube
SQL for Data Science #2: SQLite+pandas to analyze 10 million NYC citibike records
SQL for Data Science #3a: SQLite+pandas to analyze 10 million NYC citibike records
SQL for Data Science #3b: SQLite+pandas to analyze 10 million NYC citibike records
Python Pandas connect directly to SQLite, Oracle, IBM Db2, MS SQL Server, PostgreSQL, MySQL
import pandas as pd
df = pd.read_sql_query(sql = """
--====
SELECT *
FROM tracks
LIMIT 5;
--====""", con = sqlite_conn)
df
TrackId | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice | |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | For Those About To Rock (We Salute You) | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 343719 | 11170334 | 0.99 |
1 | 2 | Balls to the Wall | 2 | 2 | 1 | None | 342562 | 5510424 | 0.99 |
2 | 3 | Fast As a Shark | 3 | 2 | 1 | F. Baltes, S. Kaufman, U. Dirkscneider & W. Ho... | 230619 | 3990994 | 0.99 |
3 | 4 | Restless and Wild | 3 | 2 | 1 | F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. D... | 252051 | 4331779 | 0.99 |
4 | 5 | Princess of the Dawn | 3 | 2 | 1 | Deaffy & R.A. Smith-Diesel | 375418 | 6290521 | 0.99 |
df = pd.read_sql_query(sql = """
--====
SELECT *
FROM tracks
LIMIT %d;
--====""" % (10), con = sqlite_conn)
df
TrackId | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice | |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | For Those About To Rock (We Salute You) | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 343719 | 11170334 | 0.99 |
1 | 2 | Balls to the Wall | 2 | 2 | 1 | None | 342562 | 5510424 | 0.99 |
2 | 3 | Fast As a Shark | 3 | 2 | 1 | F. Baltes, S. Kaufman, U. Dirkscneider & W. Ho... | 230619 | 3990994 | 0.99 |
3 | 4 | Restless and Wild | 3 | 2 | 1 | F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. D... | 252051 | 4331779 | 0.99 |
4 | 5 | Princess of the Dawn | 3 | 2 | 1 | Deaffy & R.A. Smith-Diesel | 375418 | 6290521 | 0.99 |
5 | 6 | Put The Finger On You | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 205662 | 6713451 | 0.99 |
6 | 7 | Let's Get It Up | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 233926 | 7636561 | 0.99 |
7 | 8 | Inject The Venom | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 210834 | 6852860 | 0.99 |
8 | 9 | Snowballed | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 203102 | 6599424 | 0.99 |
9 | 10 | Evil Walks | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 263497 | 8611245 | 0.99 |
df = pd.read_sql_query(sql = """
SELECT Name, Composer, Milliseconds
FROM tracks
WHERE Milliseconds >= %d
AND
Milliseconds <= %d
ORDER BY Milliseconds;
--====""" % (200000, 201000), con = sqlite_conn)
df
Name | Composer | Milliseconds | |
---|---|---|---|
0 | Alabama Song | Weill-Brecht | 200097 |
1 | Purgatory | Steve Harris | 200150 |
2 | F**k Me Pumps | Salaam Remi | 200253 |
3 | Why Go | Jeff Ament | 200254 |
4 | Ice Cream Man | John Brim | 200306 |
5 | Summertime | Miles Davis | 200437 |
6 | Coroné Antonio Bento | None | 200437 |
7 | Último Pau-De-Arara | Corumbá/José Gumarães/Venancio | 200437 |
8 | The Wind Cries Mary | Jimi Hendrix | 200463 |
9 | Over Again | Tim Maia | 200489 |
10 | Rock Bottom | Paul Stanley, Ace Frehley | 200594 |
11 | All My Love | E. Schrody/L. Dimant | 200620 |
12 | Question! | Tankian, Serj | 200698 |
13 | Posso Até Me Apaixonar | Dudu Nobre | 200698 |
14 | Cold Day In The Sun | Dave Grohl, Taylor Hawkins, Nate Mendel, Chris... | 200724 |
15 | Too Fast For Love | Nikki Sixx | 200829 |
16 | O Cidadão Do Mundo | Chico Science | 200933 |
A JOIN clause is used to combine rows from two or more tables, based on a related column between them.
Different Types of SQL JOINs
Here are the different types of the JOINs in SQL:
FULL (OUTER) JOIN: Return all records when there is a match in either left or right table
In the following sentence we will join three tables: tracks, albums and artists. The result will be a table with the name of the artis/group, the title of the album and the name of the track. We will sort teh results by artis and album.
cur = sqlite_conn.execute("""
--==== ====================================================
--==== INICIO DE LA QUERY
SELECT artists.Name, albums.Title, tracks.Name
FROM tracks
LEFT JOIN albums ON albums.Albumid = tracks.Albumid
LEFT JOIN artists ON albums.Artistid = artists.Artistid
ORDER BY artists.Name, albums.Title ASC;
--==== FIN DE LA QUERY
--==== ====================================================
""")
rows = cur.fetchmany(20) # Only the 20 first results
for row in rows:
print(row)
rows = cur.fetchall()
('AC/DC', 'For Those About To Rock We Salute You', 'For Those About To Rock (We Salute You)') ('AC/DC', 'For Those About To Rock We Salute You', 'Put The Finger On You') ('AC/DC', 'For Those About To Rock We Salute You', "Let's Get It Up") ('AC/DC', 'For Those About To Rock We Salute You', 'Inject The Venom') ('AC/DC', 'For Those About To Rock We Salute You', 'Snowballed') ('AC/DC', 'For Those About To Rock We Salute You', 'Evil Walks') ('AC/DC', 'For Those About To Rock We Salute You', 'C.O.D.') ('AC/DC', 'For Those About To Rock We Salute You', 'Breaking The Rules') ('AC/DC', 'For Those About To Rock We Salute You', 'Night Of The Long Knives') ('AC/DC', 'For Those About To Rock We Salute You', 'Spellbound') ('AC/DC', 'Let There Be Rock', 'Go Down') ('AC/DC', 'Let There Be Rock', 'Dog Eat Dog') ('AC/DC', 'Let There Be Rock', 'Let There Be Rock') ('AC/DC', 'Let There Be Rock', 'Bad Boy Boogie') ('AC/DC', 'Let There Be Rock', 'Problem Child') ('AC/DC', 'Let There Be Rock', 'Overdose') ('AC/DC', 'Let There Be Rock', "Hell Ain't A Bad Place To Be") ('AC/DC', 'Let There Be Rock', 'Whole Lotta Rosie') ('Aaron Copland & London Symphony Orchestra', 'A Copland Celebration, Vol. I', 'Fanfare for the Common Man') ('Aaron Goldberg', 'Worlds', "OAM's Blues")
A example with analysis. We will calculate the time average by album in milliseconds.
cur = sqlite_conn.execute("""
--==== ====================================================
--==== INICIO DE LA QUERY
SELECT artists.Name, albums.Title, tracks.Name, ROUND(AVG(tracks.Milliseconds),0)
FROM tracks -- TABLA PRINCIPAL
LEFT JOIN albums ON albums.Albumid = tracks.Albumid
LEFT JOIN artists ON albums.Artistid = artists.Artistid
GROUP BY artists.Name, albums.Title
ORDER BY artists.Name, albums.Title ASC;
--==== FIN DE LA QUERY
--==== ====================================================
""")
rows = cur.fetchmany(10)
for row in rows:
print(row)
rows = cur.fetchall()
('AC/DC', 'For Those About To Rock We Salute You', 'Spellbound', 240042.0) ('AC/DC', 'Let There Be Rock', 'Whole Lotta Rosie', 306657.0) ('Aaron Copland & London Symphony Orchestra', 'A Copland Celebration, Vol. I', 'Fanfare for the Common Man', 198064.0) ('Aaron Goldberg', 'Worlds', "OAM's Blues", 266936.0) ('Academy of St. Martin in the Fields & Sir Neville Marriner', 'The World of Classical Favourites', 'Fantasia On Greensleeves', 232601.0) ('Academy of St. Martin in the Fields Chamber Ensemble & Sir Neville Marriner', 'Sir Neville Marriner: A Celebration', '"Eine Kleine Nachtmusik" Serenade In G, K. 525: I. Allegro', 348971.0) ('Academy of St. Martin in the Fields, John Birch, Sir Neville Marriner & Sylvia McNair', 'Fauré: Requiem, Ravel: Pavane & Others', 'Requiem, Op.48: 4. Pie Jesu', 258924.0) ('Academy of St. Martin in the Fields, Sir Neville Marriner & Thurston Dart', 'Bach: Orchestral Suites Nos. 1 - 4', 'Suite No. 3 in D, BWV 1068: III. Gavotte I & II', 225933.0) ('Accept', 'Balls to the Wall', 'Balls to the Wall', 342562.0) ('Accept', 'Restless and Wild', 'Princess of the Dawn', 286029.0)
We will calculate the number of tracks in each album.
cur = sqlite_conn.execute("""
--==== ====================================================
--==== INICIO DE LA QUERY
SELECT artists.Name, albums.Title, COUNT(tracks.Name) AS N
FROM tracks
LEFT JOIN albums ON albums.Albumid = tracks.Albumid
LEFT JOIN artists ON albums.Artistid = artists.Artistid
GROUP BY artists.Name, albums.Title
ORDER BY N DESC;
--==== FIN DE LA QUERY
--==== ====================================================
""")
rows = cur.fetchmany(10)
for row in rows:
print(row)
rows = cur.fetchall()
('Lenny Kravitz', 'Greatest Hits', 57) ('Chico Buarque', 'Minha Historia', 34) ('Eric Clapton', 'Unplugged', 30) ('Lost', 'Lost, Season 3', 26) ('Lost', 'Lost, Season 1', 25) ('The Office', 'The Office, Season 3', 25) ('Battlestar Galactica (Classic)', 'Battlestar Galactica (Classic), Season 1', 24) ('Frank Sinatra', 'My Way: The Best Of Frank Sinatra [Disc 1]', 24) ('Lost', 'Lost, Season 2', 24) ('Chico Science & Nação Zumbi', 'Afrociberdelia', 23)
In the previous examples we have seen SQL codes very large, with comments and several analysis. If is a code that will be used in several examples or analysis, then we will be more efficient if we will save the SQL code in a file.
If the SQL statement is very large then is a better way to save the SQL in a file *.sql and load it in the read_sql_query
. In the file example.sql is written the following SQL code.
--====
SELECT Name, Composer, Milliseconds
FROM tracks
WHERE Milliseconds >= 200000
AND
Milliseconds <= 201000
ORDER BY Milliseconds;
--====
f = open("example.sql")
sql_statement = f.read()
f.close()
print(sql_statement)
--==== SELECT Name, Composer, Milliseconds FROM tracks WHERE Milliseconds >= 200000 AND Milliseconds <= 201000 ORDER BY Milliseconds; --====
df = pd.read_sql_query(sql = sql_statement,
con = sqlite_conn)
df
Name | Composer | Milliseconds | |
---|---|---|---|
0 | Alabama Song | Weill-Brecht | 200097 |
1 | Purgatory | Steve Harris | 200150 |
2 | F**k Me Pumps | Salaam Remi | 200253 |
3 | Why Go | Jeff Ament | 200254 |
4 | Ice Cream Man | John Brim | 200306 |
5 | Summertime | Miles Davis | 200437 |
6 | Coroné Antonio Bento | None | 200437 |
7 | Último Pau-De-Arara | Corumbá/José Gumarães/Venancio | 200437 |
8 | The Wind Cries Mary | Jimi Hendrix | 200463 |
9 | Over Again | Tim Maia | 200489 |
10 | Rock Bottom | Paul Stanley, Ace Frehley | 200594 |
11 | All My Love | E. Schrody/L. Dimant | 200620 |
12 | Question! | Tankian, Serj | 200698 |
13 | Posso Até Me Apaixonar | Dudu Nobre | 200698 |
14 | Cold Day In The Sun | Dave Grohl, Taylor Hawkins, Nate Mendel, Chris... | 200724 |
15 | Too Fast For Love | Nikki Sixx | 200829 |
16 | O Cidadão Do Mundo | Chico Science | 200933 |
If the SQL statement is very large then is a better way to save the SQL in a file *.sql and load it in the read_sql_query
. In the file example2.sql is written the following SQL code.
--====
SELECT Name, Composer, Milliseconds
FROM %s
WHERE Milliseconds >= %d
AND
Milliseconds <= %d
ORDER BY Milliseconds;
--====
We can see the %s (string) and %d (integer) insertions. These parameters allow the reutilitation of the SQL sentence.
f = open("example2.sql")
sql_statement = f.read()
f.close()
df = pd.read_sql_query(sql = sql_statement % ('tracks', 200000, 200500),
con = sqlite_conn)
df
Name | Composer | Milliseconds | |
---|---|---|---|
0 | Alabama Song | Weill-Brecht | 200097 |
1 | Purgatory | Steve Harris | 200150 |
2 | F**k Me Pumps | Salaam Remi | 200253 |
3 | Why Go | Jeff Ament | 200254 |
4 | Ice Cream Man | John Brim | 200306 |
5 | Summertime | Miles Davis | 200437 |
6 | Coroné Antonio Bento | None | 200437 |
7 | Último Pau-De-Arara | Corumbá/José Gumarães/Venancio | 200437 |
8 | The Wind Cries Mary | Jimi Hendrix | 200463 |
9 | Over Again | Tim Maia | 200489 |
df = pd.read_sql_query(sql = sql_statement % ('tracks', 200500, 201000),
con = sqlite_conn)
df
Name | Composer | Milliseconds | |
---|---|---|---|
0 | Rock Bottom | Paul Stanley, Ace Frehley | 200594 |
1 | All My Love | E. Schrody/L. Dimant | 200620 |
2 | Question! | Tankian, Serj | 200698 |
3 | Posso Até Me Apaixonar | Dudu Nobre | 200698 |
4 | Cold Day In The Sun | Dave Grohl, Taylor Hawkins, Nate Mendel, Chris... | 200724 |
5 | Too Fast For Love | Nikki Sixx | 200829 |
6 | O Cidadão Do Mundo | Chico Science | 200933 |
We close the conexion with database with the following command:
sqlite_conn.close()