Jan-Philipp Litza

Document management, part 3: Scanning documents

I used to scan everything with gscan2pdf, a great piece of software for technically minded folks that provides tweaks for every option and more or less integrates OCR. But for batch scanning, that workflow was simply too cumbersome, and OCR happens automatically at a later stage anyway, as discussed in the first part of this post series. So I had a look at the alternatives.

When scanning documents, most business devices have a function to send an email somewhere, with the document attached. My decades-old Epson BX635FWD, which I nowadays only use for its duplex scanning capabilities (not printing), doesn't have such a feature. It does have network connectivity, though! (Its WiFi only supports WPA-Personal, not WPA2, making it incompatible with any WiFi network in 2025, but it has an RJ45 port as well.) In its web interface, however, you can configure almost nothing (except for "AirPrint" and "Google Cloud Print").

But it does support hosting a file share itself! Of course you need to provide it with something to share, so I plugged in a USB thumbdrive. Now, what exactly is a "file share"? Probably SMB/CIFS. But opening the share via either Nautilus/GVfs/Gio/whatever-it's-called-nowadays or smbclient failed with strange error messages. The latter at least was a bit descriptive:

$ smbclient -L //EPSON63556D.fritz.box/
Protocol negotiation to server EPSON63556D.fritz.box (for a protocol between SMB2_02 and SMB3) failed:  NT_STATUS_INVALID_NETWORK_RESPONSE

So there is something there, it just doesn't do what's expected. The important bit is in parentheses: the printer doesn't even speak SMB2_02, the lowest protocol version smbclient tried. man smb.conf provides us with the option client min protocol and its possible values, of which I'll just choose the lowest:

$ smbclient -L //EPSON63556D.fritz.box/MEMORYCARD --option='client min protocol=CORE'
Server does not support EXTENDED_SECURITY  but 'client use spnego = yes' and 'client ntlmv2 auth = yes' is set
Anonymous login successful
Password for [WORKGROUP\jplitza]:

	Sharename       Type      Comment
	---------       ----      -------
	IPC$            IPC       IPC Service
	MEMORYCARD      Disk      EPSON
Reconnecting with SMB1 for workgroup listing.
Server does not support EXTENDED_SECURITY  but 'client use spnego = yes' and 'client ntlmv2 auth = yes' is set
Anonymous login successful

	Server               Comment
	---------            -------

	Workgroup            Master
	---------            -------

Hm, so what should I use for login? I didn't configure any user! Also, I should probably change that option it complains about…

$ smbclient //EPSON63556D.fritz.box/MEMORYCARD --option='client min protocol=CORE' --option='client use spnego=no' -U WORKGROUP/user%pass -c ls
lpcfg_do_global_parameter: WARNING: The "client use spnego" option is deprecated
lpcfg_do_global_parameter: WARNING: The "client use spnego" option is deprecated
  EPSCAN                              D        0  Thu Mar  6 21:22:56 2025

Yay! So it just accepts any credentials. My ability to downgrade smbclient's connection to a security level low enough for my scanner might come to an end sooner or later, though, given that the option is deprecated. But for now, it works!
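The two downgrade options can also live in a small private configuration file instead of on the command line. The following is a minimal sketch, assuming smbclient's --configfile option is used to point at it; write_scanner_conf and the file location are made-up names for illustration.

```shell
#!/usr/bin/env bash
# Sketch: keep the downgrade options for the scanner in a private config file.
# write_scanner_conf is a hypothetical helper, not part of Samba.

write_scanner_conf() {
    local path="$1"
    # Same two settings as the --option flags used above
    cat > "$path" <<'EOF'
[global]
    client min protocol = CORE
    client use spnego = no
EOF
}
```

It could then be used as write_scanner_conf "$HOME/.config/scanner-smb.conf" followed by smbclient --configfile="$HOME/.config/scanner-smb.conf" -U WORKGROUP/user%pass //EPSON63556D.fritz.box/MEMORYCARD -c ls. The deprecation warning will most likely still appear, since the option itself stays the same.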

The rest is just a little Bash scripting to fetch all documents from the scanner into the current working directory and delete them on the printer (so they don't lie around unencrypted, and also to save space on the thumbdrive).

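#!/usr/bin/env bash

set -o pipefail

SHARE="//EPSON63556D.fritz.box/MEMORYCARD"
TMPFILE="$(mktemp)"

cleanup() {
    rm -f "$TMPFILE"
}
trap cleanup EXIT

# Wrapper that adds the share, the dummy credentials and the downgrade options
smbclient() {
    command smbclient -d0 -U WORKGROUP/user%password "$SHARE" --option="client use spnego=no" --option="client min protocol=CORE" "$@"
}

# Stream the files as a tar archive and record what tar unpacked;
# stop before deleting anything if the transfer or the extraction failed
smbclient -Tcag - | tar -vx --strip-components 3 --backup=numbered > "$TMPFILE" || exit 1

# Delete every fetched file from the share
while IFS= read -r FILENAME; do
    smbclient -c "del \"$FILENAME\""
done < "$TMPFILE"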
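The loop above opens one connection per deleted file. As a minimal sketch, assuming smbclient's -c takes a semicolon-separated command list and accepts double-quoted file names, the deletions could be batched into a single call. build_del_commands is a hypothetical helper name, and skipping lines that end in a slash assumes that tar lists directories that way.

```shell
#!/usr/bin/env bash
# Sketch: turn tar's verbose listing (one path per line on stdin) into a
# single string of smbclient "del" commands.
# build_del_commands is a hypothetical helper, not part of smbclient.

build_del_commands() {
    local name cmds=""
    while IFS= read -r name || [ -n "$name" ]; do
        case "$name" in
            ""|*/) continue ;;   # skip blank lines and directory entries
        esac
        # Quote the name so that spaces survive smbclient's command parsing
        cmds="${cmds}del \"${name}\"; "
    done
    printf '%s' "$cmds"
}
```

In the script, the loop would then become something like DEL_CMDS="$(build_del_commands < "$TMPFILE")" followed by [ -n "$DEL_CMDS" ] && smbclient -c "$DEL_CMDS", so that nothing is sent when no file was fetched.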

Why no cifs mount

I specifically wanted a userspace solution for this problem, because the device isn't always powered on. I was able to mount the share with the cifs module of the Linux kernel by running mount -t cifs //EPSON63556D.fritz.box/MEMORYCARD /mnt/scanner -o vers=1.0,sec=none, but that mount would make the whole system hang when the device was turned off.

Also, I was positively surprised to find out how much smbclient was actually able to do! That --tar option really made the whole task quite easy.
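Since the device isn't always powered on, a userspace script can check for it first and simply skip the run. This is a minimal sketch that relies on bash's /dev/tcp pseudo-device and the coreutils timeout command. scanner_reachable is a hypothetical helper, and port 139 is an assumption about where the printer listens (it might be 445 instead).

```shell
#!/usr/bin/env bash
# Sketch: test whether the scanner accepts TCP connections before calling
# smbclient. scanner_reachable is a hypothetical helper; the default port
# 139 is an assumption.

scanner_reachable() {
    local host="$1" port="${2:-139}"
    # Opening /dev/tcp/HOST/PORT in bash attempts a TCP connection;
    # timeout bounds the wait if the device is switched off.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

At the top of the fetch script, this could be used as scanner_reachable EPSON63556D.fritz.box || exit 0, so that a timer-driven run ends quietly while the scanner is off.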

And in Nextcloud?

The above script worked successfully for several months, copying the documents into the "consume" folder of Paperless-ngx. But since I'm thinking about migrating that functionality back into Nextcloud, I wanted to see if I could simply set up an external SMB/CIFS storage.

My first attempt played out like this:

Me: *trying to add an external SMB storage*
Nextcloud: "This action needs authentication"
Me: "OK, sure thing." *lets password manager fill in password*

Nextcloud: "Wrong password"
Me: "Err, what? No, it's definitely correct!" *tries again*
Nextcloud: *just hangs there*

Me: *opens browser devtools* "Oh look, the response says 'Invalid storage backend "smb"'"

Yeah well… turns out I hit three bugs at once:

  1. The error message was wrong. My password wasn't wrong at all, but the password dialog interprets every non-OK response that way. After some digging, that seems to be a global problem with that dialog in the Nextcloud frontend.
  2. The second password entry didn't even trigger a request.
  3. The smb backend should actually have been available, since the server has smbclient installed, which the Nextcloud docs say is sufficient.

That last one actually was introduced only recently. You can find the details in the issue I filed.

So after the detour of fixing that bug, I was able to successfully attempt the setup - with the expected problems, since there is no interface to specify custom options for smbclient. So I had to insert those pesky options into the system-wide /etc/samba/smb.conf:

[global]
# lower client security standards for Epson printer to work
client use spnego=no
client min protocol=CORE

Now my idea was to set up a Nextcloud flow to move the files to permanent storage and start OCR. Alas… how would Nextcloud know there were new files on the share?

The answer seems to be to regularly execute occ files_external:scan 2 (2 being the mount ID of the external storage) in a cronjob or systemd timer. Not pretty, but doable.
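Such a periodic scan could be wrapped in a small script that the cronjob or timer calls. This is only a sketch: the installation path /var/www/nextcloud, the user www-data, the use of sudo and the DRY_RUN switch are all assumptions for illustration, and scan_external_storage is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch of a wrapper for a cronjob or systemd timer.
# Assumptions: Nextcloud lives in /var/www/nextcloud and runs as www-data;
# 2 is the mount ID used in the text. Adjust all three as needed.

NEXTCLOUD_DIR="${NEXTCLOUD_DIR:-/var/www/nextcloud}"
NEXTCLOUD_USER="${NEXTCLOUD_USER:-www-data}"
MOUNT_ID="${MOUNT_ID:-2}"

scan_external_storage() {
    # Build the occ call as the function's argument list
    set -- sudo -u "$NEXTCLOUD_USER" php "$NEXTCLOUD_DIR/occ" files_external:scan "$MOUNT_ID"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"   # only print what would be run
    else
        "$@"
    fi
}
```

Calling scan_external_storage at the end of the script and running it every few minutes, for example via a crontab entry such as */5 * * * * /usr/local/bin/scan-scanner-share.sh (a made-up path), would keep the share's file list reasonably fresh.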

But another problem arose: How to even move a file in a flow? The number of apps that provide flow actions seems to be quite small, and they are either highly specific integrations or as broad as "call a script" or "send a webhook".

Furthermore, even matching on files in a directory is next to impossible in Nextcloud flows. It involves tagging the directory, letting Nextcloud add another tag to all files inside it and then matching on that tag.

So just like I felt that Paperless-ngx doesn't quite fit my mental model, I'm now under the impression that Nextcloud flows are either completely useless or just not useful for me.