I have a folder with more than 150 txt files.
To combine the data
Resulting file has a size of 320 MB. I create a database and insert combined data:
with open("resall.TXT", "r") as f: result = f.readlines() result = [x.split(",") for x in result] import sqlite3 as sq3 con = sq3.connect("popular.db") con.execute( """ CREATE TABLE IF NOT EXISTS popname ( id INTEGER PRIMARY KEY, state TEXT, sex TEXT, year TEXT, forename TEXT, count INTEGER ); """ ) for i, (state, sex, year, forename, count) in enumerate(result): con.execute( """ INSERT INTO popname VALUES (?, ?, ?, ?, ?, ?); """, (i, state, sex, year, forename, count.strip()), ) con.commit()
I could not create the database because it is too large. How to reduce size of the database?
Advertisement
Answer
The problem is not SQLite, it is your import script. SQLite can easily handle that much data, 320 MB is not that large.
The first problem is you’re slurping all 320 MB into memory. Instead, iterate over the lines.
with open("resall.TXT", "r") as f: for line in f: rows = line.split(",") // then insert
When you do your insert, committing after every insert will make it dramatically slower. Every commit forces SQLite to write to disk which is slow.
Instead, commit every 1000 rows or so.
for i, (state, sex, year, forename, count) in enumerate(rows): con.execute( """ INSERT INTO popname VALUES (?, ?, ?, ?, ?, ?); """, (i, state, sex, year, forename, count.strip()), ) if i % 1000 == 0: con.commit()
However, there’s a much faster way to do this. Use SQLite’s own built in importer.
- Add a header line to your file, make sure they match your column names.
- Open your database in the SQLite shell.
- Create your table.
- Import the CSV file(s) into your table:
.import /path/to/your/file.csv popname --csv