Add lastChunkHandling ("loose", "strict", or "stop-before-partial") to base64 decoding #55360

lemire · 2024-10-11T18:59:27Z

What is the problem this feature will solve?

The current TC39 ArrayBuffer base64 proposal for JavaScript includes support for lastChunkHandling ("loose", "strict", or "stop-before-partial") in addition to the standard base64 and base64url alphabets.

https://tc39.es/proposal-arraybuffer-base64/spec/#sec-frombase64

Specifically, the spec defines the following abstract operation:

 FromBase64 ( string, alphabet, lastChunkHandling [ , maxLength ] )

where alphabet is "base64" or "base64url" and lastChunkHandling is one of "loose", "strict", or "stop-before-partial".

What is the feature you are proposing to solve the problem?

Node buffers might want to implement this feature. Currently, one can call Buffer.from(data, 'base64'), which defaults to 'loose' handling, but Node could add some way for the user to specify how the last chunk is handled, e.g., Buffer.from(data, 'base64+strict').

Note that Node currently relies on simdutf for base64 decoding and as of version 5.6.0, the simdutf library has the necessary support. Thus, the implementation would not be a significant challenge.

What alternatives have you considered?

Quite reasonably, Node could just not add this functionality.


bakkot · 2024-10-11T20:24:43Z

Since the proposal itself mostly just says that the feature exists rather than explaining why, let me give that explanation here:

The point of "strict" is to allow enforcing that the data is canonically encoded (you will also have to enforce absence of whitespace separately, unfortunately). This is useful in some applications where you might want to enforce that there is exactly one string which decodes to a specific sequence of bytes. Getting this right in userland requires manually decoding the last chunk and checking that the "extra" bits are 0.
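A minimal userland sketch of that "strict" check, assuming Node's `Buffer` for the actual decode and the standard (non-URL) alphabet. `decodeBase64Strict` is a hypothetical helper, not a Node or proposal API. Unlike the proposal's "strict" mode, it rejects whitespace outright instead of skipping it.

```javascript
// Hypothetical helper: not part of Node or the TC39 proposal.
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Whole input must be groups of 4 standard-alphabet characters, with an
// optional correctly padded final group. No whitespace is accepted.
const CANONICAL_SHAPE = /^(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)?$/;

function decodeBase64Strict(s) {
  if (!CANONICAL_SHAPE.test(s)) {
    throw new SyntaxError('base64 input is malformed, unpadded, or contains whitespace');
  }
  const pad = s.endsWith('==') ? 2 : s.endsWith('=') ? 1 : 0;
  if (pad > 0) {
    // Value of the last data character in the final (partial) chunk.
    const value = ALPHABET.indexOf(s[s.length - 1 - pad]);
    // 2 data chars carry 12 bits for 1 byte: 4 unused low bits.
    // 3 data chars carry 18 bits for 2 bytes: 2 unused low bits.
    const unusedMask = pad === 2 ? 0b1111 : 0b11;
    if ((value & unusedMask) !== 0) {
      throw new SyntaxError('base64 input has non-zero padding bits');
    }
  }
  // Input is now known to be canonical, so Node's lenient decoder is safe here.
  return Buffer.from(s, 'base64');
}
```

For example, `decodeBase64Strict('SGVsbG8=')` returns the bytes of "Hello", while `decodeBase64Strict('SGVsbG9=')` throws, even though `Buffer.from('SGVsbG9=', 'base64')` also decodes to "Hello". That second string is exactly the kind of non-canonical duplicate that "strict" exists to reject.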

The point of "stop-before-partial" is for streaming: if you're getting a stream of base64 characters over the network, possibly intermixed with whitespace, you may want to be able to decode them without first waiting for the whole stream to come in. If there's no whitespace you can cut at any multiple of 4 input characters, but if there's whitespace you have to scan the whole string to figure that out, which can double (or more than double) the time it takes to do the decode. Unfortunately, when there's whitespace you really need to use the setFromBase64 version of the API so you can get the number of characters read from the input as part of the return value, so that you can slice off the remainder to prepend to the next chunk.
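A sketch of the "stop-before-partial" pattern for a whitespace-containing stream, built on Node's lenient `Buffer.from(str, 'base64')`, which is documented to ignore embedded whitespace. `decodeCompleteChunks` and `Base64StreamDecoder` are hypothetical names; the `read` count plays the role that the number of characters read from `setFromBase64` plays in the proposal. Note that this sketch performs the separate whitespace scan described above, which is the overhead a native implementation could avoid.

```javascript
// Hypothetical helpers: not Node or TC39 proposal APIs.
const ASCII_WHITESPACE = new Set(['\t', '\n', '\f', '\r', ' ']);

// Decode only the complete 4-character groups in `input`, skipping
// whitespace, and report how many input characters were consumed.
function decodeCompleteChunks(input) {
  let significant = 0; // non-whitespace characters seen so far
  let read = 0;        // index just past the last complete group
  for (let i = 0; i < input.length; i++) {
    if (ASCII_WHITESPACE.has(input[i])) continue;
    significant++;
    if (significant % 4 === 0) read = i + 1;
  }
  // Buffer.from ignores embedded whitespace when decoding base64.
  return { bytes: Buffer.from(input.slice(0, read), 'base64'), read };
}

class Base64StreamDecoder {
  constructor() {
    this.pending = ''; // partial trailing group carried into the next piece
  }

  // Decode what can be decoded now; keep the partial remainder.
  push(piece) {
    const input = this.pending + piece;
    const { bytes, read } = decodeCompleteChunks(input);
    this.pending = input.slice(read);
    return bytes;
  }

  // At end of stream, decode whatever is left using Node's default
  // (loose) handling of the final chunk.
  end() {
    const rest = Buffer.from(this.pending, 'base64');
    this.pending = '';
    return rest;
  }
}
```

For example, pushing `'SGV'`, `'sbG8g\nV2'`, and `'9ybGQ='` and then calling `end()` yields buffers that concatenate to "Hello World"; the first `push` returns an empty buffer because no group of four characters is complete yet.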

Anyway, these APIs will be coming to Buffer already because it inherits from Uint8Array, so I don't know that there's all that much advantage to Node supporting them in its own base64 decoders.

lemire · 2024-10-13T15:43:54Z

> Anyway, these APIs will be coming to Buffer already because it inherits from Uint8Array, so I don't know that there's all that much advantage to Node supporting them in its own base64 decoders.

Performance might be a reason.

lemire added the buffer and feature request labels Oct 11, 2024
