When I first published my small csv-to-sqlite script over three years ago, I didn't really expect anyone to use it. But it seems to have gained a few fans over the years. One thing that drew a few complaints, though, was its memory usage: one user reported needing 30GB of RAM to import a 3GB .csv file. Yeah, that really does sound a bit... excessive. To put it very mildly.
Of course, the main problem here was that I never expected anyone to use it on files that big. And since I was already reading the entire file anyway (to determine column typings), I figured I might as well keep it all in memory and then just dump it into the database.
Well, that is no longer the case. I've done some tests with a ~500MB .csv file - and where it required ~4GB of memory before, it now makes do with just ~15MB. The data are now read and inserted in batches of 10,000 rows at a time.
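The batching approach can be sketched roughly like this. This is not the script's actual code, just a minimal illustration of the idea using Python's standard csv and sqlite3 modules; the function name, table handling, and batch constant are my own for the example:

```python
import csv
import sqlite3
from itertools import islice

BATCH_SIZE = 10_000  # rows buffered before each bulk insert


def import_csv(csv_path, db_path, table):
    """Stream a .csv file into an SQLite table in fixed-size batches."""
    con = sqlite3.connect(db_path)
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        placeholders = ", ".join("?" for _ in header)
        con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        while True:
            # Pull at most BATCH_SIZE rows from the reader; the rest of the
            # file stays on disk, so memory use is bounded by the batch size.
            batch = list(islice(reader, BATCH_SIZE))
            if not batch:
                break
            con.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', batch
            )
            con.commit()
    con.close()
```

Since only one batch of rows is ever held in memory at a time, peak usage stays roughly constant no matter how large the input file is.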
However, the restructuring required to make this possible also forced a breaking change to the CLI.
Where there were two kinds of type detection before, there are three now.
- string: No processing is done.
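To illustrate what "no processing" means in practice: every value goes into the database exactly as it appeared in the .csv, as text, with no conversion attempted. A tiny sketch (the table and values are my own, purely for illustration):

```python
import sqlite3

# With string typing, a cell like "42" is stored verbatim as text,
# not converted to an integer.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (n TEXT)")
con.execute("INSERT INTO demo VALUES (?)", ("42",))
value = con.execute("SELECT n FROM demo").fetchone()[0]
# value is the string "42", not the number 42
con.close()
```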
Other than that, the CLI remains the same. I suspect most users won't even notice the change. Well, except for the fact that you now shouldn't be running out of memory. 🙂