Add lastChunkHandling ("loose", "strict", or "stop-before-partial") to base64 decoding #55360

Open
lemire opened this issue Oct 11, 2024 · 2 comments
Labels: buffer (Issues and PRs related to the buffer subsystem), feature request (Issues that request new features to be added to Node.js)

Comments

@lemire
Member

lemire commented Oct 11, 2024

What is the problem this feature will solve?

The current JavaScript ArrayBuffer base64 proposal supports a lastChunkHandling option ("loose", "strict", or "stop-before-partial") in addition to the standard base64 and base64url alphabets.

https://tc39.es/proposal-arraybuffer-base64/spec/#sec-frombase64

Specifically, the following syntax is considered:

 FromBase64 ( string, alphabet, lastChunkHandling [ , maxLength ] )

where alphabet is base64 or base64url and lastChunkHandling is one of the three values above.
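To make the three modes concrete, here is a simplified JavaScript model of how FromBase64 treats the final chunk. It is a sketch under stated assumptions, not the spec algorithm itself: it handles only the standard alphabet, does not skip whitespace, ignores maxLength, and the names fromBase64Model and decodeChunk are hypothetical. Like the spec, it reports how many characters were read alongside the decoded bytes.

```javascript
// Standard base64 alphabet (this model does not handle base64url).
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper: decode a chunk of 2-4 base64 characters (no '=') into 1-3 bytes.
// When throwOnExtraBits is true, non-zero leftover bits in a short chunk are an error.
function decodeChunk(chunk, throwOnExtraBits) {
  const values = [...chunk].map((c) => ALPHABET.indexOf(c));
  while (values.length < 4) values.push(0);
  const n = (values[0] << 18) | (values[1] << 12) | (values[2] << 6) | values[3];
  const all = [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
  const byteCount = chunk.length - 1; // 2 chars -> 1 byte, 3 -> 2, 4 -> 3
  if (throwOnExtraBits && all.slice(byteCount).some((b) => b !== 0)) {
    throw new SyntaxError('non-zero extra bits in final chunk');
  }
  return all.slice(0, byteCount);
}

// Hypothetical simplified model of FromBase64: standard alphabet only,
// no whitespace skipping, no maxLength. Returns { read, bytes }.
function fromBase64Model(str, lastChunkHandling = 'loose') {
  if (!['loose', 'strict', 'stop-before-partial'].includes(lastChunkHandling)) {
    throw new TypeError(`unknown lastChunkHandling: ${lastChunkHandling}`);
  }
  const match = /^([A-Za-z0-9+/]*)(=*)$/.exec(str);
  if (!match) throw new SyntaxError('invalid character or misplaced padding');
  const [, data, padding] = match;

  // Complete 4-character chunks decode the same way in every mode.
  const fullLength = data.length - (data.length % 4);
  const bytes = [];
  for (let i = 0; i < fullLength; i += 4) {
    bytes.push(...decodeChunk(data.slice(i, i + 4), false));
  }
  const tail = data.slice(fullLength); // 0-3 leftover characters

  if (padding.length > 0) {
    // "xx=" may be a padded chunk whose second '=' has not arrived yet.
    if (tail.length === 2 && padding.length === 1 && lastChunkHandling === 'stop-before-partial') {
      return { read: fullLength, bytes };
    }
    // Otherwise require exactly "xx==" or "xxx=".
    if (tail.length < 2 || tail.length + padding.length !== 4) {
      throw new SyntaxError('malformed padding');
    }
    bytes.push(...decodeChunk(tail, lastChunkHandling === 'strict'));
    return { read: str.length, bytes };
  }

  if (tail.length === 0) return { read: str.length, bytes };

  // Unpadded partial chunk at the end of the input.
  switch (lastChunkHandling) {
    case 'stop-before-partial':
      return { read: fullLength, bytes }; // leave the partial chunk unread
    case 'loose':
      if (tail.length === 1) throw new SyntaxError('a single trailing character cannot be decoded');
      bytes.push(...decodeChunk(tail, false)); // extra bits are silently ignored
      return { read: str.length, bytes };
    case 'strict':
      throw new SyntaxError('strict mode requires a padded final chunk');
  }
}
```

For example, in this model "QQ" decodes to [0x41] under "loose", throws under "strict", and under "stop-before-partial" returns nothing with read equal to 0, leaving the partial chunk for the caller to retry once more input arrives.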

What is the feature you are proposing to solve the problem?

Node buffers might want to implement this feature. Currently, Buffer.from(data, 'base64') behaves like 'loose', but Node could add a way for the user to specify how the last chunk is handled, e.g., Buffer.from(data, 'base64+strict').

Note that Node currently relies on simdutf for base64 decoding, and as of version 5.6.0 the simdutf library has the necessary support, so the implementation should not be a significant challenge.

What alternatives have you considered?

Quite reasonably, Node could just not add this functionality.

@lemire added the buffer and feature request labels on Oct 11, 2024
@bakkot

bakkot commented Oct 11, 2024

Since the proposal itself mostly just says that the feature exists rather than explaining why, let me give that explanation here:

The point of "strict" is to allow enforcing that the data is canonically encoded (unfortunately, you will also have to enforce the absence of whitespace separately). This is useful in applications where you want to enforce that exactly one string decodes to a given sequence of bytes. Getting this right in userland requires manually decoding the last chunk and checking that the "extra" bits are 0.
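As an illustration of that userland workaround, the following sketch checks that a string is canonically encoded before handing it to Node's Buffer.from. It is a hypothetical helper (decodeCanonicalBase64 is not a Node API), assumes the standard alphabet, and simply rejects whitespace instead of handling it separately.

```javascript
// Standard base64 alphabet; Buffer is a Node.js global.
const B64_CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper: decode only canonically encoded base64, so that
// exactly one string maps to each byte sequence.
function decodeCanonicalBase64(str) {
  // Reject whitespace, invalid characters, and missing or misplaced padding.
  if (str.length % 4 !== 0 || !/^[A-Za-z0-9+/]*={0,2}$/.test(str)) {
    throw new SyntaxError('not canonically padded base64');
  }
  const padding = str.endsWith('==') ? 2 : str.endsWith('=') ? 1 : 0;
  if (padding > 0) {
    // Inspect the last data character before the padding: with "==" its low
    // 4 bits are unused, with "=" its low 2 bits are unused. Both must be 0.
    const lastValue = B64_CHARS.indexOf(str[str.length - padding - 1]);
    const unusedBitsMask = padding === 2 ? 0b1111 : 0b11;
    if ((lastValue & unusedBitsMask) !== 0) {
      throw new SyntaxError('non-zero extra bits in final chunk');
    }
  }
  // The input is now known to be canonical, so the lenient decoder is safe to use.
  return Buffer.from(str, 'base64');
}
```

With this check, "QQ==" decodes to the single byte 0x41, while "QR==" (which differs only in the unused low bits of "R") is rejected instead of silently decoding to the same byte.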

The point of "stop-before-partial" is streaming: if you're receiving a stream of base64 characters over the network, possibly intermixed with whitespace, you may want to decode them without first waiting for the whole stream to arrive. If there's no whitespace you can cut at any multiple of 4 input characters, but if there's whitespace you have to scan the whole string to find such a cut point, which can double (or more than double) the time the decode takes. Unfortunately, when there's whitespace you really need the setFromBase64 version of the API, because it reports the number of characters read from the input as part of its return value, so you can slice off the unread remainder and prepend it to the next chunk.
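For comparison, a userland streaming decoder has to do exactly the whitespace scan described above: strip ASCII whitespace, decode the longest prefix whose length is a multiple of 4, and carry the remainder into the next call. The sketch below assumes the standard alphabet, assumes '=' padding appears only at the very end of the stream, and uses a hypothetical class name, Base64StreamDecoder; the actual decoding is delegated to Node's Buffer.from.

```javascript
// Hypothetical userland streaming decoder; Buffer is a Node.js global.
// Assumes standard base64 and '=' padding only at the very end of the stream.
class Base64StreamDecoder {
  #pending = ''; // whitespace-free characters not yet decoded (0-3 of them)

  // Feed the next piece of text; returns bytes for every complete 4-char group.
  push(text) {
    // Strip ASCII whitespace (tab, LF, FF, CR, space). This touches every
    // character, which is the extra pass a built-in decoder can avoid.
    const clean = this.#pending + text.replace(/[\t\n\f\r ]/g, '');
    const usable = clean.length - (clean.length % 4);
    this.#pending = clean.slice(usable); // carry the partial group forward
    return Buffer.from(clean.slice(0, usable), 'base64');
  }

  // Finish the stream, treating a leftover partial group the way "loose" does.
  end() {
    const rest = this.#pending;
    this.#pending = '';
    if (rest.length === 1) throw new SyntaxError('a single trailing character cannot be decoded');
    return Buffer.from(rest, 'base64');
  }
}

// Usage: chunk boundaries and whitespace fall in arbitrary places.
const decoder = new Base64StreamDecoder();
const parts = [decoder.push('aGVs\nbG8'), decoder.push('gd29y\r\n'), decoder.push('bGQ='), decoder.end()];
console.log(Buffer.concat(parts).toString()); // hello world
```

The `#pending` remainder plays the role that the "characters read" count from setFromBase64 plays in the built-in API: it marks where the next chunk has to resume.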


Anyway, these APIs will be coming to Buffer already because it inherits from Uint8Array, so I don't know that there's all that much advantage to Node supporting them in its own base64 decoders.

@lemire
Member Author

lemire commented Oct 13, 2024

> Anyway, these APIs will be coming to Buffer already because it inherits from Uint8Array, so I don't know that there's all that much advantage to Node supporting them in its own base64 decoders.

Performance might be a reason.

Project status: Awaiting Triage
Development: no branches or pull requests
2 participants