

do we need CollatorType.cldrWithoutFFFx? #794

Open
markusicu opened this issue May 1, 2024 · 5 comments

@markusicu
Member

markusicu commented May 1, 2024

WriteCollationData.getCollator(type) (issue #793 would move this function to class UCA) works with three types. One of them, cldrWithoutFFFx, builds a CLDR collator except that it leaves U+FFFE and U+FFFF with their DUCET mappings rather than their CLDR tailorings.

Strangely, FractionalUCA.java works with such a collator, even though it writes "SPECIAL MAX/MIN COLLATION ELEMENTS" for these noncharacters, and those correspond to the CLDR tailorings.

This type is also used for UCA.Main option testCompatibilityCharacters.

Why? It seems confusing to have this third type, especially one that yields something different from what we actually output.
Try to remove it and use only either a DUCET collator or a CLDR collator.

If we need and keep this option, then at least consider changing buildCldrCollator(boolean) to buildCldrCollator(enum type) for readability.
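For illustration, here is a minimal sketch of the boolean-to-enum change. Only the names `CollatorType` (with its three values), `getCollator`, and `buildCldrCollator` come from this issue; the `CollatorConfig` record, the boolean parameter name `withoutFFFx`, and all method bodies are hypothetical stand-ins, since a real builder would construct an actual UCA/CLDR collator.

```java
// Hypothetical sketch: names mirror the issue, bodies are stand-ins only.
public class CollatorTypeSketch {

    enum CollatorType { ducet, cldr, cldrWithoutFFFx }

    /** Hypothetical stand-in for a collator: records only which mappings it would use. */
    record CollatorConfig(boolean cldrTailorings, boolean tailoredFFFx) {}

    /** Before: the meaning of the boolean (assumed here) is invisible at the call site. */
    static CollatorConfig buildCldrCollator(boolean withoutFFFx) {
        return new CollatorConfig(true, !withoutFFFx);
    }

    /** After: the enum names the variant explicitly. */
    static CollatorConfig buildCldrCollator(CollatorType type) {
        return switch (type) {
            case cldr -> new CollatorConfig(true, true);
            // U+FFFE and U+FFFF keep their DUCET mappings in this variant.
            case cldrWithoutFFFx -> new CollatorConfig(true, false);
            case ducet -> throw new IllegalArgumentException("not a CLDR collator type: " + type);
        };
    }

    static CollatorConfig getCollator(CollatorType type) {
        return switch (type) {
            case ducet -> new CollatorConfig(false, false);
            case cldr, cldrWithoutFFFx -> buildCldrCollator(type);
        };
    }

    public static void main(String[] args) {
        // Both calls build the same configuration; only the second says which one.
        System.out.println(buildCldrCollator(true));
        System.out.println(buildCldrCollator(CollatorType.cldrWithoutFFFx));
    }
}
```

If cldrWithoutFFFx is removed instead, the same switch shrinks to two cases and the compiler's exhaustiveness check on the enum flags any call site that still needs updating.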

@macchiati FYI

@markusicu
Member Author

@macchiati do we need the cldrWithoutFFFx option?

@macchiati
Member

Hmmm. As I recall, the FFFE and FFFF are to allow users to have minimum and maximum collation elements. As long as we continue to keep those in the CLDR data, I think we are ok.

@markusicu
Member Author

> Hmmm. As I recall, the FFFE and FFFF are to allow users to have minimum and maximum collation elements. As long as we continue to keep those in the CLDR data, I think we are ok.

Of course we are going to keep them in CLDR. --> https://www.unicode.org/reports/tr35/tr35-collation.html#tailored_noncharacter_weights
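To make the min/max idea concrete, here is a toy model, not the real UCA/CLDR algorithm: it assigns one case-insensitive primary weight per UTF-16 code unit and reproduces only the two CLDR tailorings (U+FFFE lowest, U+FFFF highest). The class and method names are hypothetical. As the linked TR35 section describes, U+FFFF lets a range such as "Sch" ≤ X ≤ "Sch\uFFFF" capture strings starting with "sch", and U+FFFE works as a merge separator for multi-field keys.

```java
import java.util.Comparator;

// Toy model only: single-level, case-insensitive weights per UTF-16 code unit.
// It mimics just the CLDR tailorings of U+FFFE (minimum) and U+FFFF (maximum).
public class FffxWeightsSketch {

    static int primaryWeight(char c) {
        if (c == '\uFFFE') return 1;                 // below every other character
        if (c == '\uFFFF') return Integer.MAX_VALUE; // above every other character
        return Character.toLowerCase(c) + 2;         // ordinary characters in between
    }

    static final Comparator<String> TOY_COLLATOR = (a, b) -> {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(primaryWeight(a.charAt(i)), primaryWeight(b.charAt(i)));
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length(), b.length()); // a proper prefix sorts first
    };

    /** Range check: lower <= s <= lower + U+FFFF in the toy order. */
    static boolean inPrefixRange(String s, String lower) {
        return TOY_COLLATOR.compare(lower, s) <= 0
                && TOY_COLLATOR.compare(s, lower + '\uFFFF') <= 0;
    }

    /** Multi-field key using U+FFFE as the separator. */
    static String mergeKey(String... fields) {
        return String.join("\uFFFE", fields);
    }

    public static void main(String[] args) {
        System.out.println(inPrefixRange("schmidt", "Sch")); // true
        System.out.println(inPrefixRange("Scott", "Sch"));   // false
        // Field by field, ("a", "b") < ("a!", "a") because "a" < "a!".
        System.out.println(TOY_COLLATOR.compare(mergeKey("a", "b"), mergeKey("a!", "a")) < 0); // true
        // A '-' separator breaks that order, because '!' sorts before '-'.
        System.out.println(TOY_COLLATOR.compare("a-b", "a!-a") < 0); // false
    }
}
```

The last two lines show why the separator must weigh less than any real character: only then does comparing merged keys agree with comparing the fields one at a time.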

It (still) makes sense that we have two choices for collators, but why three? class UCA -->

    public enum CollatorType {
        ducet,
        cldr,
        cldrWithoutFFFx
    }

@macchiati
Member

macchiati commented Aug 21, 2024 via email

@markusicu
Member Author

Thanks. Setting priority=high because the question is resolved, and it looks like the code change will be easy.
