Is the checksum used in BIP39 mnemonic sentences too short?

In the BIP39 specification, you add 1 bit of checksum for every 32 bits of entropy you generate.
Therefore, with a common entropy size of 128 bits, you are only adding 4 bits of checksum. This means that if you were to write down the words in the mnemonic sentence in the wrong order (or write down an incorrect word from the wordlist), there is roughly a 1 in 16 (2^4) chance that a tool for validating your mnemonic will tell you that you have written down a valid mnemonic.
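To make the numbers concrete, here is a minimal Python sketch of how I understand the checksum to be derived (the first ENT/32 bits of the SHA-256 hash of the entropy). The function names are my own and are not taken from any BIP39 library, and the odds assume that a mistake leaves the checksum bits looking effectively random.

```python
import hashlib

def checksum_length(ent_bits: int) -> int:
    """Checksum length in bits: 1 bit per 32 bits of entropy."""
    return ent_bits // 32

def checksum_value(entropy: bytes) -> int:
    """First ENT/32 bits of SHA-256(entropy), returned as an integer."""
    cs = checksum_length(len(entropy) * 8)
    digest = hashlib.sha256(entropy).digest()
    # Keep only the top `cs` bits of the 256-bit digest.
    return int.from_bytes(digest, "big") >> (256 - cs)

def false_accept_odds(ent_bits: int) -> int:
    """N such that a random checksum matches with probability 1 in N."""
    return 2 ** checksum_length(ent_bits)

if __name__ == "__main__":
    for ent in (128, 160, 192, 224, 256):
        print(ent, "bits of entropy:", checksum_length(ent),
              "checksum bits, 1 in", false_accept_odds(ent))
```

For 128 bits of entropy this reports 4 checksum bits and odds of 1 in 16. Even at 256 bits the checksum is only 8 bits long, so the odds are 1 in 256.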
Is this too short to make checksums for mnemonic sentences reliable?
Alternatively, you could instead add 4 bits of checksum for every 32 bits of entropy, which would mean a mnemonic from 128 bits of entropy carries 16 bits of checksum and has a 1 in 65536 (2^16) chance of being valid if written down incorrectly. This would require a wordlist of 4096 words (12 bits per word) to accommodate the checksum and keep the mnemonic sentence the same length (144 bits / 12 bits per word = 12 words), but it seems that it would be much more reliable.
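The arithmetic behind this comparison can be checked with a short sketch. The `layout` helper and its parameters are hypothetical names of mine. The second call describes the alternative scheme suggested here, which is not part of BIP39.

```python
def layout(ent_bits: int, cs_bits_per_32: int, bits_per_word: int) -> dict:
    """Describe a mnemonic layout for a given checksum ratio and word size."""
    cs_bits = (ent_bits // 32) * cs_bits_per_32
    total_bits = ent_bits + cs_bits
    words, leftover = divmod(total_bits, bits_per_word)
    return {
        "checksum_bits": cs_bits,
        "total_bits": total_bits,
        "words": words,
        "leftover_bits": leftover,          # 0 means the bits fill the words exactly
        "wordlist_size": 2 ** bits_per_word,
        "false_accept_one_in": 2 ** cs_bits,
    }

if __name__ == "__main__":
    print("BIP39:      ", layout(128, 1, 11))
    print("alternative:", layout(128, 4, 12))
```

Both layouts give 12 words for 128 bits of entropy with no leftover bits. The difference is the wordlist size (2048 versus 4096 words) and the checksum strength (1 in 16 versus 1 in 65536).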
Is there a reason why BIP39 chose to use such small checksums?