Making my family tree searchable using cool new Postgres features
I'm slowly adding features to my Universal Personal Search project, which lets me search across my various data sources. Up to this point, there were 2 main ways data could flow into the indexer:
- Either the data already lived in my central Postgres database, such as anything stored in NocoDB; in that case, I just wrote views that transform the data into the desired JSON document format, which is then sent to Meilisearch, the search engine I use (see the sketch after this list).
- Or the data was loaded directly by an application plugin. (The plugin, currently running in Joplin and Calibre, prepares the data as JSON and sends it directly to Meilisearch.)
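To make the first case concrete, a view like the hypothetical one below could reshape a NocoDB-managed table into search documents. The table, columns, and document fields are my own guesses for illustration, not the actual schema.

```sql
-- Hypothetical sketch: NocoDB stores its rows in ordinary Postgres tables,
-- so a view can reshape one of them into the JSON document format that gets
-- sent to Meilisearch. All names here (contacts, full_name, the document
-- fields) are made up for illustration.
CREATE OR REPLACE VIEW search_documents_contacts AS
SELECT json_build_object(
           'id',     'contact-' || c.id,
           'title',  c.full_name,
           'body',   concat_ws(' ', c.email, c.phone, c.notes),
           'source', 'nocodb/contacts'
       ) AS document
FROM contacts AS c;
```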
This already made the system very useful to me over the past year. But there are other kinds of data that I want indexed, most of them available either over an API or as a file dump: things such as bookmarks from Raindrop, or genealogy data from webtrees.
In these cases, where I'm fine with only sporadic updates to the data, using an API isn't ideal: every integration would require dedicated code to fetch my data. It's much simpler to download a dump and parse it.
The goal is a process where I can upload a file, have it processed automatically, and have the result sent to Meilisearch.
Postgres 17 to the rescue
I like to keep a snapshot of the imported data in the database, so the process should be:
1. Upload a file to NocoDB.
2. The upload triggers processing (via webhook). Here, the file is parsed and stored in Postgres (see the table sketch after this list).
3. Postgres views format the data into the target JSON format, the same as in the first case above.
4. That JSON is sent to Meilisearch. I use n8n to schedule and orchestrate this, so adding a new source here is trivial.
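Since the parsed file has to land somewhere in step 2, here is a minimal sketch of what such a snapshot table could look like. The table name and columns are assumptions, not the real schema.

```sql
-- Hypothetical snapshot table for individuals parsed out of a GEDCOM export.
-- GEDCOM dates are free-form text, so they are stored as-is; the full parsed
-- record goes into a jsonb column so the Meilisearch index can be rebuilt
-- later without re-processing the file.
CREATE TABLE IF NOT EXISTS genealogy_person (
    source      text        NOT NULL,   -- e.g. 'webtrees'
    xref_id     text        NOT NULL,   -- GEDCOM cross-reference ID, e.g. '@I1@'
    name        text,
    birth_date  text,
    death_date  text,
    raw         jsonb,
    updated_at  timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (source, xref_id)
);
```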
Step 2 here was part of the reason why I vacillated over the implementation: I wanted to keep a snapshot of any data in the DB, so I could rebuild the Meilisearch index at any time without needing to reprocess the files.
I'd been running on Postgres 13, but upgraded to 17 when I learned that newer versions support the MERGE statement (added in Postgres 15), which lets me do upserts[1] easily. Before that it was still doable,[2] but now it's much nicer.
Basically, for every row, you can write a query saying "merge these values into table X: check columns A and B, and if you get a match, do an update; if you don't, do an insert".
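As a sketch of what that looks like, using the hypothetical genealogy_person table from above (the values are made up), the import could run one MERGE per parsed record, matching on the source and cross-reference ID:

```sql
-- Upsert a single parsed individual into the snapshot table.
-- If a row with the same (source, xref_id) exists, update it; otherwise insert it.
MERGE INTO genealogy_person AS t
USING (VALUES
    ('webtrees', '@I1@', 'Jane /Doe/', '1 JAN 1900', '3 MAR 1970')
) AS s(source, xref_id, name, birth_date, death_date)
ON t.source = s.source AND t.xref_id = s.xref_id
WHEN MATCHED THEN
    UPDATE SET name       = s.name,
               birth_date = s.birth_date,
               death_date = s.death_date,
               updated_at = now()
WHEN NOT MATCHED THEN
    INSERT (source, xref_id, name, birth_date, death_date)
    VALUES (s.source, s.xref_id, s.name, s.birth_date, s.death_date);
```

The same statement also accepts a multi-row VALUES list or a SELECT as the source, so a whole file's worth of records can be merged in one go.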
GEDCOM and dotnet
Now that I have all I need to implement any file import, the only remaining piece of the puzzle is the code that will parse the input file and dump it into Postgres.
Webtrees lets me export the genealogy data in the standard GEDCOM format. I've found multiple parser libraries, but none that looked both mature and free.
The best one I found is GeneGenie.Gedcom. I had to build it myself, because there are no releases on GitHub, and the package on NuGet is outdated and useless. It's been a while since I last had to directly reference binaries from the filesystem in dotnet!
The library probably isn't feature-complete yet, but I just want to read people and events, and won't be writing anything. And it can handle that well.
After that, it was just a matter of reading the data I needed from the GEDCOM file and formatting it into the JSON documents used by my indexer, and the data started showing up in my search results.
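The view that turns this new source into search documents follows the same pattern as in the first case above; a hypothetical sketch over the genealogy_person table from earlier (the document fields are, again, my guesses) could look like this:

```sql
-- Reshape the genealogy snapshot into search documents for Meilisearch.
-- btrim strips the '@' wrappers from the GEDCOM ID so the document ID stays a plain string.
CREATE OR REPLACE VIEW search_documents_genealogy AS
SELECT json_build_object(
           'id',     'gedcom-' || p.source || '-' || btrim(p.xref_id, '@'),
           'title',  p.name,
           'body',   concat_ws(' ', 'born ' || p.birth_date, 'died ' || p.death_date),
           'source', p.source
       ) AS document
FROM genealogy_person AS p;
```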
Seems to work like a charm so far. I'll be using the same pattern again.