Document management, part 1: Paperless-ngx
Nobody likes paper folders. And although I don't expect to move ever again, if I do, I don't want to carry boxes of them around. So I set out to finally digitize all of them.
When looking for self-hosted document management software, everybody seems to be using Paperless-ngx nowadays. It's the successor to the discontinued Paperless-ng, which was the successor to the discontinued Paperless. (Wonder what the next reincarnation will be called…)
So naturally I checked out why everybody loves this piece of software. And it does have its advantages:
- Simple automatic OCR of scanned PDFs
- Automatic creation date recognition
- Automatic assignment of correspondents, tags and other metadata based on heuristics
- Full text search
- Arbitrarily complex metadata for every document
- Automatic import of files from a “consume” directory, into which a script puts everything my scanner scans
- An inbox of documents that have yet to be categorized
- Side-by-side view of editable metadata and document for easy entry of metadata
But after using it for a while, I don't consider it the right fit for my needs.
Finding documents
I often find myself wondering “What was the name of that insurance company again?” just to find their latest invoice.
In my previous setup, I just had a folder structure like Insurances/Company Name/YYYY-MM-DD Invoice.pdf. Picking from a list of ~10 insurance companies is much easier than recalling the name from scratch (or looking through ~100 correspondents in total).
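One reason this layout works well: a YYYY-MM-DD prefix sorts alphabetically in chronological order, so the latest invoice is always at the bottom of its folder. The sketch below illustrates that; `document_path` and the example names are hypothetical and only mirror the folder scheme described above.

```python
from datetime import date
from pathlib import PurePosixPath


def document_path(category: str, correspondent: str, created: date, title: str) -> PurePosixPath:
    # Hypothetical helper: builds Category/Correspondent/YYYY-MM-DD Title.pdf
    return PurePosixPath(category, correspondent, f"{created.isoformat()} {title}.pdf")


paths = [
    document_path("Insurances", "Example Insurance", date(2023, 11, 2), "Invoice"),
    document_path("Insurances", "Example Insurance", date(2023, 3, 15), "Invoice"),
]

# ISO 8601 date prefixes sort lexicographically in chronological order,
# so the last entry is the most recent document.
for p in sorted(paths, key=str):
    print(p)
```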
Now you could associate so-called “storage paths” with documents in Paperless-ngx, which would cause the PDFs to have this exact same layout on disk. Or you could associate tags with the documents. And you can also automate all that. But it doesn't beat the simplicity and accessibility of nested folders…
When I said “arbitrarily complex metadata”, I was referring to the ability to create custom metadata fields. But I don't need any of those! In fact, I don't even need the metadata fields that Paperless-ngx has built in. Why would I care about the “document type” of a document? Its title probably says it all! And storage paths seem inferior to simple folders of files, as mentioned above.
Now of course I could simply ignore those fields. But then I feel like I'm holding it wrong! So I always try to come up with reasonable values at least for the document type, although I never use that field at all.
Access control
Last but definitely not least is access control.
I'm sure there are situations where fine-grained access control is useful. For example in a business setting, with departments and everything.
But my instance is to be used by exactly two people: my wife and me. And we're coming from a digital folder and paper folders that we both have access to. So I would simply want everything to be available to her as well.
But apparently, that's surprisingly hard. Every document has its own permissions. So does every correspondent. And while I could grant her “Superuser” rights, that would eliminate the possibility of ever having different views of the system, which might come in handy at some point.
Another curiosity: Documents imported from the “consume” directory don't have an owner and can be accessed by everyone!
Conclusion
Paperless-ngx is a great piece of software! But somehow, it doesn't fit my thought model, which seems to be very centered around classical folders.
Also, I'm in the process of simplifying my private IT infrastructure to make it easier to maintain. And introducing another application doesn't quite seem justified.
What I definitely don't want to miss are full text search and automatic import and OCR of scanned documents. But I'll try to implement that in Nextcloud, which I have anyway and will very likely keep. We'll see how that works out, stay tuned…