Class

UnicodeData

static

Provides access to Unicode character properties and the global data provider.

Remarks

Contains common Unicode codepoint constants (whitespace, control characters, BiDi marks) and helper methods for character classification. The static Provider property gives access to the full Unicode property database for script, line break, and grapheme cluster information.
public static bool IsInitialized { get }
public static bool IsLineBreak()
public static bool IsMandatoryBreakChar()

Returns true if the codepoint is a line or paragraph separator that produces a mandatory break (UAX #14 classes BK, CR, LF, NL) and should not be shaped.
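As a rough illustration of how a mandatory-break check might drive segmentation before shaping, the sketch below splits a string into runs at BK, CR, LF, and NL characters and drops the break characters themselves. The parameter list of IsMandatoryBreakChar is not shown on this page, so the sketch uses a local stand-in predicate (IsMandatoryBreak, a hypothetical name) that takes an int codepoint; the class and method names are also illustrative only.

```csharp
using System;
using System.Collections.Generic;

static class MandatoryBreakSketch
{
    // Hypothetical stand-in for UnicodeData.IsMandatoryBreakChar, assumed here
    // to take an int codepoint. Covers UAX #14 classes BK (U+000B, U+000C,
    // U+2028, U+2029), CR (U+000D), LF (U+000A) and NL (U+0085).
    public static bool IsMandatoryBreak(int cp) =>
        cp == 0x000A || cp == 0x000B || cp == 0x000C || cp == 0x000D ||
        cp == 0x0085 || cp == 0x2028 || cp == 0x2029;

    // Splits text into runs to hand to a shaper. Break characters are not
    // included in any run. All break characters are in the BMP, so walking
    // UTF-16 code units is sufficient here.
    public static List<string> SplitShapingRuns(string text)
    {
        var runs = new List<string>();
        int runStart = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (!IsMandatoryBreak(text[i])) continue;

            runs.Add(text.Substring(runStart, i - runStart));

            // CR immediately followed by LF counts as a single break.
            if (text[i] == '\r' && i + 1 < text.Length && text[i + 1] == '\n') i++;

            runStart = i + 1;
        }
        runs.Add(text.Substring(runStart));
        return runs;
    }
}
```

With this stand-in, SplitShapingRuns("a\r\nb\u2028c") yields the three runs "a", "b", and "c". In real code the predicate would presumably be replaced by the library method.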

public static bool IsRegionalIndicator()
public static bool IsKeycapBase()
public static bool IsEmojiModifier()
public static bool IsTagSequenceCodepoint()
public static bool IsInCommonEmojiRange()
public static int GetSimpleUppercase()

Returns the simple uppercase mapping for a codepoint, falling back to the codepoint itself when no mapping is defined. Backed by UnicodeData.txt, so behavior is identical across Mono, IL2CPP, and standard .NET, unlike char.ToUpperInvariant, which has gaps for codepoints such as Greek final sigma U+03C2.

public static int GetSimpleLowercase()

Returns the simple lowercase mapping for a codepoint, falling back to the codepoint itself when no mapping is defined. See GetSimpleUppercase for the rationale.

public static int GetSimpleTitlecase()

Returns the simple titlecase mapping for a codepoint, falling back to the codepoint itself when no mapping is defined. Differs from the uppercase mapping for only a small set of codepoints, most notably digraph letters such as U+01C5 (Dž): uppercase DŽ (U+01C4), titlecase Dž (U+01C5), lowercase dž (U+01C6).
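The sketch below shows one way the three simple case mappings could be combined to title-case a single word: the first codepoint goes through the titlecase mapping and the rest through the lowercase mapping. Because the parameter lists of GetSimpleTitlecase and GetSimpleLowercase are not shown on this page, the mappings are passed in as delegates taking and returning an int codepoint (an assumption). DemoTitle and DemoLower are tiny hypothetical stand-ins that cover only ASCII letters and the DŽ/Dž/dž trio.

```csharp
using System;
using System.Text;

static class TitlecaseSketch
{
    // Hypothetical stand-in for a simple lowercase mapping (ASCII + U+01C4..U+01C6 only).
    public static int DemoLower(int cp)
    {
        if (cp >= 'A' && cp <= 'Z') return cp + 32;
        if (cp == 0x01C4 || cp == 0x01C5) return 0x01C6;
        return cp; // no mapping defined: fall back to the codepoint itself
    }

    // Hypothetical stand-in for a simple titlecase mapping (ASCII + U+01C4..U+01C6 only).
    public static int DemoTitle(int cp)
    {
        if (cp >= 'a' && cp <= 'z') return cp - 32;
        if (cp == 0x01C4 || cp == 0x01C6) return 0x01C5;
        return cp;
    }

    // Title-cases one word: titlecase mapping for the first codepoint,
    // lowercase mapping for every following codepoint.
    public static string TitlecaseWord(string word, Func<int, int> toTitle, Func<int, int> toLower)
    {
        var sb = new StringBuilder(word.Length);
        bool first = true;
        for (int i = 0; i < word.Length; )
        {
            bool pair = char.IsSurrogatePair(word, i);
            int cp = pair ? char.ConvertToUtf32(word, i) : word[i];
            int mapped = first ? toTitle(cp) : toLower(cp);

            if (mapped <= 0xFFFF) sb.Append((char)mapped);
            else sb.Append(char.ConvertFromUtf32(mapped));

            first = false;
            i += pair ? 2 : 1;
        }
        return sb.ToString();
    }
}
```

With the stand-ins, TitlecaseWord("\u01C4EP", TitlecaseSketch.DemoTitle, TitlecaseSketch.DemoLower) returns "\u01C5ep": the leading digraph becomes its titlecase form Dž rather than the uppercase DŽ.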

public static GeneralCategory GetGeneralCategory()

Returns the Unicode General Category of a codepoint. Useful for filtering codepoints in custom modifiers — apply only to letters (Lu/Ll/Lt/Lm/Lo), skip combining marks (Mn/Mc/Me), select punctuation (Pc/Pd/Ps/Pe/Pi/Pf/Po), and so on.
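A minimal sketch of the letters-only filter described above. The members of the GeneralCategory enum and the parameter list of GetGeneralCategory are not shown on this page, so the sketch substitutes .NET's built-in System.Globalization.CharUnicodeInfo.GetUnicodeCategory and the UnicodeCategory enum purely as stand-ins. A real modifier would presumably call UnicodeData.GetGeneralCategory instead to get consistent results across runtimes.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class CategoryFilterSketch
{
    // True for the five letter categories (Lu, Ll, Lt, Lm, Lo).
    public static bool IsLetter(UnicodeCategory c) =>
        c == UnicodeCategory.UppercaseLetter ||
        c == UnicodeCategory.LowercaseLetter ||
        c == UnicodeCategory.TitlecaseLetter ||
        c == UnicodeCategory.ModifierLetter ||
        c == UnicodeCategory.OtherLetter;

    // Returns the UTF-16 offsets of every codepoint that is a letter.
    // Combining marks, punctuation, spaces and digits are skipped.
    public static List<int> LetterOffsets(string text)
    {
        var offsets = new List<int>();
        for (int i = 0; i < text.Length; i += char.IsSurrogatePair(text, i) ? 2 : 1)
        {
            // The (string, int) overload reads a full surrogate pair at index i.
            if (IsLetter(CharUnicodeInfo.GetUnicodeCategory(text, i)))
                offsets.Add(i);
        }
        return offsets;
    }
}
```

For the input "e\u0301 b" this returns offsets 0 and 3: the combining acute accent (Mn) and the space (Zs) are filtered out.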

public static UnicodeScript GetScript()

Returns the Unicode Script of a codepoint (UAX #24). Useful for script-conditional modifiers — for example applying a stylistic effect only to Devanagari or only to Han. Values Common and Inherited are shared across scripts (punctuation, combining marks).

public static bool IsExtendedPictographic()

Returns true when the codepoint has the Extended_Pictographic property (UTS #51). This is distinct from emoji presentation: it covers pictographic glyphs that may render as either text or emoji depending on context.

public static bool IsEmojiPresentation()

Returns true when the codepoint defaults to emoji-style presentation (UTS #51 Emoji_Presentation). Use this to skip emoji glyphs in text-only effects (color, gradient, outline) without relying on the live mesh-pass font.IsColor flag.

public static bool IsEmojiModifierBase()

Returns true when the codepoint is an emoji that accepts a skin-tone modifier (U+1F3FB..U+1F3FF) immediately after it (UTS #51 Emoji_Modifier_Base).
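One way a caller might use this check is to keep a base emoji and its skin-tone modifier together as a single unit. The sketch below returns the UTF-16 length of such a pair. The parameter list of IsEmojiModifierBase is not shown on this page, so the base check is passed in as a delegate taking an int codepoint (an assumption), and all names in the sketch are hypothetical.

```csharp
using System;

static class SkinToneSketch
{
    // Skin-tone modifiers occupy U+1F3FB..U+1F3FF.
    public static bool IsSkinToneModifier(int cp) => cp >= 0x1F3FB && cp <= 0x1F3FF;

    // Returns the UTF-16 length of a "modifier base + skin-tone modifier"
    // sequence starting at index, or 0 if there is no such sequence there.
    public static int ModifierSequenceLength(string text, int index, Func<int, bool> isModifierBase)
    {
        int baseCp = ReadCodepoint(text, index, out int baseLen);
        if (!isModifierBase(baseCp)) return 0;

        int next = index + baseLen;
        if (next >= text.Length) return 0;

        int modCp = ReadCodepoint(text, next, out int modLen);
        return IsSkinToneModifier(modCp) ? baseLen + modLen : 0;
    }

    // Reads one codepoint, tolerating lone surrogates by returning them as-is.
    static int ReadCodepoint(string text, int index, out int length)
    {
        if (char.IsSurrogatePair(text, index))
        {
            length = 2;
            return char.ConvertToUtf32(text, index);
        }
        length = 1;
        return text[index];
    }
}
```

For example, with a stand-in predicate that accepts only U+1F44D (thumbs up), the string "\U0001F44D\U0001F3FD!" gives a length of 4 at index 0, while "\U0001F44D!" gives 0.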

public static bool IsDefaultIgnorable()

Returns true when the codepoint has the Default_Ignorable_Code_Point property (formatting characters, variation selectors, ZWJ/ZWNJ, etc.). Custom modifiers that walk codepoints to compute statistics or apply effects should typically skip these.
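A small sketch of the "skip these while walking codepoints" pattern: counting the codepoints in a string that are not default-ignorable. The parameter list of IsDefaultIgnorable is not shown on this page, so the check is passed in as a delegate taking an int codepoint (an assumption). IsIgnorableSubset is a hypothetical stand-in that covers only a few well-known ranges, not the full property.

```csharp
using System;

static class IgnorableSketch
{
    // Partial, hypothetical stand-in: a handful of Default_Ignorable_Code_Point
    // ranges only. The real property covers more codepoints than this.
    public static bool IsIgnorableSubset(int cp) =>
        cp == 0x00AD ||                     // soft hyphen
        (cp >= 0x200B && cp <= 0x200F) ||   // ZWSP, ZWNJ, ZWJ, LRM, RLM
        (cp >= 0x2060 && cp <= 0x206F) ||   // word joiner, invisible operators, bidi isolates, deprecated format characters
        (cp >= 0xFE00 && cp <= 0xFE0F) ||   // variation selectors 1..16
        cp == 0xFEFF;                       // zero width no-break space

    // Counts codepoints for which isIgnorable returns false.
    public static int CountNonIgnorable(string text, Func<int, bool> isIgnorable)
    {
        int count = 0;
        for (int i = 0; i < text.Length; )
        {
            bool pair = char.IsSurrogatePair(text, i);
            int cp = pair ? char.ConvertToUtf32(text, i) : text[i];
            if (!isIgnorable(cp)) count++;
            i += pair ? 2 : 1;
        }
        return count;
    }
}
```

With the stand-in, CountNonIgnorable("a\u200Db\uFE0F", IgnorableSketch.IsIgnorableSubset) returns 2, since the zero width joiner and the variation selector are skipped.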

public static void EnsureInitialized()
public static const int Tab
public static const int LineFeed
public static const int VerticalTab
public static const int FormFeed
public static const int CarriageReturn
public static const int Space
public static const int Hyphen
public static const int NextLine
public static const int NoBreakSpace
public static const int SoftHyphen
public static const int NonBreakingHyphen
public static const int ZeroWidthSpace
public static const int ZeroWidthNonJoiner
public static const int ZeroWidthJoiner
public static const int WordJoiner
public static const int LeftToRightMark
public static const int RightToLeftMark
public static const int ArabicLetterMark
public static const int LineSeparator
public static const int ParagraphSeparator
public static const int LatinCapitalA
public static const int HebrewAlef
public static const int PlusSign
public static const int DollarSign
public static const int ArabicIndicDigitZero
public static const int Comma
public static const int CombiningGraveAccent
public static const int ExclamationMark
public static const int LeftToRightEmbedding
public static const int RightToLeftEmbedding
public static const int PopDirectionalFormat
public static const int LeftToRightOverride
public static const int RightToLeftOverride
public static const int LeftToRightIsolate
public static const int RightToLeftIsolate
public static const int FirstStrongIsolate
public static const int PopDirectionalIsolate
public static const int LeftParenthesis
public static const int RightParenthesis
public static const int LeftPointingAngleBracket
public static const int RightPointingAngleBracket
public static const int LeftAngleBracket
public static const int RightAngleBracket
public static const int ArabicLam
public static const int ArabicAlefMaddaAbove
public static const int ArabicAlefHamzaAbove
public static const int ArabicAlefHamzaBelow
public static const int ArabicAlef
public static const int ArabicLigatureLamAlefMaddaIsolated
public static const int ArabicLigatureLamAlefMaddaFinal
public static const int ArabicLigatureLamAlefHamzaAboveIsolated
public static const int ArabicLigatureLamAlefHamzaAboveFinal
public static const int ArabicLigatureLamAlefHamzaBelowIsolated
public static const int ArabicLigatureLamAlefHamzaBelowFinal
public static const int ArabicLigatureLamAlefIsolated
public static const int ArabicLigatureLamAlefFinal
public static const int ReplacementCharacter
public static const int DottedCircle
public static const int ArabicBlockStart
public static const int ArabicBlockEnd
public static const int ArabicSupplementStart
public static const int ArabicSupplementEnd
public static const int ArabicExtendedAStart
public static const int ArabicExtendedAEnd
public static const int ArabicPresentationFormsAStart
public static const int ArabicPresentationFormsAEnd
public static const int ArabicPresentationFormsBStart
public static const int ArabicPresentationFormsBEnd
public static const int MaxBmp
public static const int VariationSelector15
public static const int VariationSelector16
public static const int CombiningEnclosingKeycap
public static const int CombiningEnclosingCircleBackslash
public static const int RegionalIndicatorStart
public static const int RegionalIndicatorEnd
public static const int EmojiModifierStart
public static const int EmojiModifierEnd
public static const int TagSequenceStart
public static const int TagSequenceEnd
public static const int CancelTag
public static const int BlackFlagEmoji
public static const int NumberSign
public static const int Asterisk
public static const int DigitZero
public static const int DigitNine
public static const int EmojiRangeThreshold
public static const int CommonEmojiRangeStart
public static const int CommonEmojiRangeSize

See Also

UnicodeDataProvider