When I first published my small csv-to-sqlite
script over three years ago, I didn't really expect anyone to use it. But it seems to have gained a few fans over the years. One thing that drew a few complaints was its memory usage - one user reported needing 30GB of RAM to import a 3GB .csv file. Yeah, that really does sound a bit... excessive. To put it very mildly.
Of course, the main problem was that I never expected anyone to use it on files that big. And since I was already reading the entire file anyway (to determine the column types), I figured I might as well keep it all in memory and then just dump it into the database.
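For context, that type-detection pass works roughly like this - a minimal sketch of the general technique of scanning every value and widening the guessed type as needed, not the script's actual code (guess_column_types is a hypothetical helper):

```python
import csv

def guess_column_types(csv_path):
    """Guess a SQLite type for each column: INTEGER, REAL or TEXT."""
    with open(csv_path, newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        header = next(reader)
        types = ['INTEGER'] * len(header)  # start narrow, widen as we go
        for row in reader:
            for i, value in enumerate(row):
                if types[i] == 'TEXT' or value == '':
                    continue  # already the widest type, or an empty cell
                try:
                    int(value)             # still fits the current guess
                except ValueError:
                    try:
                        float(value)
                        types[i] = 'REAL'  # widen to REAL
                    except ValueError:
                        types[i] = 'TEXT'  # give up, treat as text
    return dict(zip(header, types))
```

The catch is obvious: this alone already walks the whole file, and the old version then kept every row around for the insert step.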
Well, that is no longer the case. I've done some tests with a ~500MB .csv file - and where it required ~4GB of memory before, it now makes do with just ~15MB. The data are now read and inserted in batches of 10,000 rows at a time.
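Conceptually, the new approach streams the file and inserts it in fixed-size chunks. Here's a minimal sketch of that technique using Python's csv and sqlite3 modules; insert_in_batches, the table creation and the column handling are simplified illustrations under the assumption of a well-formed CSV, not csv-to-sqlite's actual internals:

```python
import csv
import sqlite3
from itertools import islice

BATCH_SIZE = 10_000  # rows per insert batch

def insert_in_batches(csv_path, db_path, table):
    """Stream a CSV file into SQLite without holding the whole file in memory."""
    with open(csv_path, newline='', encoding='utf-8') as f, \
         sqlite3.connect(db_path) as conn:
        reader = csv.reader(f)
        header = next(reader)
        columns = ', '.join(f'"{name}"' for name in header)
        placeholders = ', '.join('?' * len(header))
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        while True:
            # Pull at most BATCH_SIZE rows from the reader, then insert them.
            batch = list(islice(reader, BATCH_SIZE))
            if not batch:
                break
            conn.executemany(
                f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})',
                batch)
        conn.commit()
```

Since only one batch is ever held in memory at a time, peak memory use stays roughly flat no matter how large the input file is.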
However, the restructuring required to make this work also brought a breaking change to the CLI.
Where there were two kinds of type detection before, there are three now.
The new one is string: no processing is done, and every value is simply stored as a string. Other than that, the CLI remains the same. I suspect most users won't even notice the change. Well, except for the fact that you shouldn't be running out of memory anymore. 🙂