Package unicode

Overview

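Package unicode provides data and functions to test some properties of Unicode code points.

Example (Is)

Functions starting with "Is" can be used to inspect which range table a rune belongs to. Note that a rune may fit into more than one range.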

For '\b':
	is control rune
	is not printable rune
For '5':
	is digit rune
	is graphic rune
	is number rune
	is printable rune
For 'Ὂ':
	is graphic rune
	is letter rune
	is printable rune
	is upper case rune
For 'g':
	is graphic rune
	is letter rune
	is lower case rune
	is printable rune
For '̀':
	is graphic rune
	is mark rune
	is printable rune
For '9':
	is digit rune
	is graphic rune
	is number rune
	is printable rune
For '!':
	is graphic rune
	is printable rune
	is punct rune
For ' ':
	is graphic rune
	is printable rune
	is space rune
For '℃':
	is graphic rune
	is printable rune
	is symbol rune
For 'ᾭ':
	is graphic rune
	is letter rune
	is printable rune
	is title case rune
For 'G':
	is graphic rune
	is letter rune
	is printable rune
	is upper case rune
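The listing above is only the output of the example. A minimal sketch of code that produces a similar classification might look like the following; the helper name describe and the subset of Is functions it checks are choices made for this illustration, not the original example source.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// describe lists the classes r belongs to, using a hand-picked subset of
// the package's Is functions. The helper is hypothetical, for illustration.
func describe(r rune) string {
	checks := []struct {
		name string
		is   func(rune) bool
	}{
		{"control", unicode.IsControl},
		{"digit", unicode.IsDigit},
		{"letter", unicode.IsLetter},
		{"lower case", unicode.IsLower},
		{"punct", unicode.IsPunct},
		{"space", unicode.IsSpace},
		{"upper case", unicode.IsUpper},
	}
	var names []string
	for _, c := range checks {
		if c.is(r) {
			names = append(names, c.name)
		}
	}
	return strings.Join(names, ", ")
}

func main() {
	for _, r := range []rune{'\b', '5', 'g', '!', ' ', 'G'} {
		fmt.Printf("For %q: %s\n", r, describe(r))
	}
}
```

Because a rune can sit in several tables at once, describe collects every match instead of stopping at the first one.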

Constants

const (
    MaxRune         = '\U0010FFFF' // Maximum valid Unicode code point.
    ReplacementChar = '\uFFFD'     // Represents invalid code points.
    MaxASCII        = '\u007F'     // maximum ASCII value.
    MaxLatin1       = '\u00FF'     // maximum Latin-1 value.
)

Indices into the Delta arrays inside CaseRanges for case mapping.

const (
    UpperCase = iota
    LowerCase
    TitleCase
    MaxCase
)
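As a rough illustration of how these indices select a delta, the sketch below builds a CaseRange for 'a' through 'z' by hand and applies it. The variable asciiLower and the helpers applyDelta and upperViaDelta are hypothetical names introduced here; real code would normally call unicode.To, which also handles the UpperLower special case that this sketch ignores.

```go
package main

import (
	"fmt"
	"unicode"
)

// asciiLower is a hand-written CaseRange for 'a'..'z', assumed here for
// illustration: adding -32 yields the upper and title case forms.
var asciiLower = unicode.CaseRange{
	Lo:    'a',
	Hi:    'z',
	Delta: [unicode.MaxCase]rune{-32, 0, -32},
}

// applyDelta maps r to the case selected by which (UpperCase, LowerCase or
// TitleCase) using cr, and returns r unchanged when it is outside the range.
// It does not handle ranges whose delta is UpperLower.
func applyDelta(cr unicode.CaseRange, which int, r rune) rune {
	if uint32(r) < cr.Lo || uint32(r) > cr.Hi {
		return r
	}
	return r + cr.Delta[which]
}

// upperViaDelta upper-cases r with the hand-written range only.
func upperViaDelta(r rune) rune {
	return applyDelta(asciiLower, unicode.UpperCase, r)
}

func main() {
	fmt.Printf("%c %c\n", upperViaDelta('g'), unicode.To(unicode.UpperCase, 'g'))
	fmt.Printf("%c %c\n", applyDelta(asciiLower, unicode.LowerCase, 'g'), unicode.To(unicode.LowerCase, 'g'))
}
```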

If the Delta field of a CaseRange is UpperLower, it means this CaseRange represents a sequence of the form (say) Upper Lower Upper Lower.

const (
    UpperLower = MaxRune + 1 // (Cannot be a valid delta.)
)
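One way to see such alternating ranges is to scan CaseRanges for entries whose delta is UpperLower. The helper inUpperLowerRange below is a hypothetical name for this sketch, which assumes the table entries are sorted and do not overlap, so the first range containing the rune is the one that applies.

```go
package main

import (
	"fmt"
	"unicode"
)

// inUpperLowerRange reports whether r lies in a CaseRange whose delta is
// the UpperLower marker, i.e. a run of alternating upper and lower case
// letters. Hypothetical helper, for illustration only.
func inUpperLowerRange(r rune) bool {
	for _, cr := range unicode.CaseRanges {
		if uint32(r) >= cr.Lo && uint32(r) <= cr.Hi {
			return cr.Delta[unicode.UpperCase] == unicode.UpperLower
		}
	}
	return false
}

func main() {
	for _, r := range []rune{'a', '\u0100', '\u0101'} {
		fmt.Printf("%U %c alternating=%t lower=%c upper=%c\n",
			r, r, inUpperLowerRange(r), unicode.ToLower(r), unicode.ToUpper(r))
	}
}
```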

Version is the Unicode edition from which the tables are derived.

const Version = "11.0.0"

Variables

These variables have type *RangeTable.

var (
    Cc     = _Cc // Cc is the set of Unicode characters in category Cc (Other, control).
    Cf     = _Cf // Cf is the set of Unicode characters in category Cf (Other, format).
    Co     = _Co // Co is the set of Unicode characters in category Co (Other, private use).
    Cs     = _Cs // Cs is the set of Unicode characters in category Cs (Other, surrogate).
    Digit  = _Nd // Digit is the set of Unicode characters with the "decimal digit" property.
    Nd     = _Nd // Nd is the set of Unicode characters in category Nd (Number, decimal digit).
    Letter = _L  // Letter/L is the set of Unicode letters, category L.
    L      = _L
    Lm     = _Lm // Lm is the set of Unicode characters in category Lm (Letter, modifier).
    Lo     = _Lo // Lo is the set of Unicode characters in category Lo (Letter, other).
    Lower  = _Ll // Lower is the set of Unicode lower case letters.
    Ll     = _Ll // Ll is the set of Unicode characters in category Ll (Letter, lowercase).
    Mark   = _M  // Mark/M is the set of Unicode mark characters, category M.
    M      = _M
    Mc     = _Mc // Mc is the set of Unicode characters in category Mc (Mark, spacing combining).
    Me     = _Me // Me is the set of Unicode characters in category Me (Mark, enclosing).
    Mn     = _Mn // Mn is the set of Unicode characters in category Mn (Mark, nonspacing).
    Nl     = _Nl // Nl is the set of Unicode characters in category Nl (Number, letter).
    No     = _No // No is the set of Unicode characters in category No (Number, other).
    Number = _N  // Number/N is the set of Unicode number characters, category N.
    N      = _N
    Other  = _C  // Other/C is the set of Unicode control and special characters, category C.
    C      = _C
    Pc     = _Pc // Pc is the set of Unicode characters in category Pc (Punctuation, connector).
    Pd     = _Pd // Pd is the set of Unicode characters in category Pd (Punctuation, dash).
    Pe     = _Pe // Pe is the set of Unicode characters in category Pe (Punctuation, close).
    Pf     = _Pf // Pf is the set of Unicode characters in category Pf (Punctuation, final quote).
    Pi     = _Pi // Pi is the set of Unicode characters in category Pi (Punctuation, initial quote).
    Po     = _Po // Po is the set of Unicode characters in category Po (Punctuation, other).
    Ps     = _Ps // Ps is the set of Unicode characters in category Ps (Punctuation, open).
    Punct  = _P  // Punct/P is the set of Unicode punctuation characters, category P.
    P      = _P
    Sc     = _Sc // Sc is the set of Unicode characters in category Sc (Symbol, currency).
    Sk     = _Sk // Sk is the set of Unicode characters in category Sk (Symbol, modifier).
    Sm     = _Sm // Sm is the set of Unicode characters in category Sm (Symbol, math).
    So     = _So // So is the set of Unicode characters in category So (Symbol, other).
    Space  = _Z  // Space/Z is the set of Unicode space characters, category Z.
    Z      = _Z
    Symbol = _S  // Symbol/S is the set of Unicode symbol characters, category S.
    S      = _S
    Title  = _Lt // Title is the set of Unicode title case letters.
    Lt     = _Lt // Lt is the set of Unicode characters in category Lt (Letter, titlecase).
    Upper  = _Lu // Upper is the set of Unicode upper case letters.
    Lu     = _Lu // Lu is the set of Unicode characters in category Lu (Letter, uppercase).
    Zl     = _Zl // Zl is the set of Unicode characters in category Zl (Separator, line).
    Zp     = _Zp // Zp is the set of Unicode characters in category Zp (Separator, paragraph).
    Zs     = _Zs // Zs is the set of Unicode characters in category Zs (Separator, space).
)
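The category tables can be passed to unicode.Is, or several at once to unicode.In. The sketch below combines three of them into an identifier-character test; the helper isIdentChar and its choice of categories are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// isIdentChar reports whether r is a letter, a decimal digit or a
// connector punctuation mark such as '_'. Hypothetical helper.
func isIdentChar(r rune) bool {
	return unicode.In(r, unicode.Letter, unicode.Nd, unicode.Pc)
}

func main() {
	fmt.Println(unicode.Is(unicode.Lu, 'G')) // single-table membership test
	for _, r := range "x_9-é" {
		fmt.Printf("%q ident=%t\n", r, isIdentChar(r))
	}
}
```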

These variables have type *RangeTable.

var (
    Adlam                  = _Adlam                  // Adlam is the set of Unicode characters in script Adlam.
    Ahom                   = _Ahom                   // Ahom is the set of Unicode characters in script Ahom.
    Anatolian_Hieroglyphs  = _Anatolian_Hieroglyphs  // Anatolian_Hieroglyphs is the set of Unicode characters in script Anatolian_Hieroglyphs.
    Arabic                 = _Arabic                 // Arabic is the set of Unicode characters in script Arabic.
    Armenian               = _Armenian               // Armenian is the set of Unicode characters in script Armenian.
    Avestan                = _Avestan                // Avestan is the set of Unicode characters in script Avestan.
    Balinese               = _Balinese               // Balinese is the set of Unicode characters in script Balinese.
    Bamum                  = _Bamum                  // Bamum is the set of Unicode characters in script Bamum.
    Bassa_Vah              = _Bassa_Vah              // Bassa_Vah is the set of Unicode characters in script Bassa_Vah.
    Batak                  = _Batak                  // Batak is the set of Unicode characters in script Batak.
    Bengali                = _Bengali                // Bengali is the set of Unicode characters in script Bengali.
    Bhaiksuki              = _Bhaiksuki              // Bhaiksuki is the set of Unicode characters in script Bhaiksuki.
    Bopomofo               = _Bopomofo               // Bopomofo is the set of Unicode characters in script Bopomofo.
    Brahmi                 = _Brahmi                 // Brahmi is the set of Unicode characters in script Brahmi.
    Braille                = _Braille                // Braille is the set of Unicode characters in script Braille.
    Buginese               = _Buginese               // Buginese is the set of Unicode characters in script Buginese.
    Buhid                  = _Buhid                  // Buhid is the set of Unicode characters in script Buhid.
    Canadian_Aboriginal    = _Canadian_Aboriginal    // Canadian_Aboriginal is the set of Unicode characters in script Canadian_Aboriginal.
    Carian                 = _Carian                 // Carian is the set of Unicode characters in script Carian.
    Caucasian_Albanian     = _Caucasian_Albanian     // Caucasian_Albanian is the set of Unicode characters in script Caucasian_Albanian.
    Chakma                 = _Chakma                 // Chakma is the set of Unicode characters in script Chakma.
    Cham                   = _Cham                   // Cham is the set of Unicode characters in script Cham.
    Cherokee               = _Cherokee               // Cherokee is the set of Unicode characters in script Cherokee.
    Common                 = _Common                 // Common is the set of Unicode characters in script Common.
    Coptic                 = _Coptic                 // Coptic is the set of Unicode characters in script Coptic.
    Cuneiform              = _Cuneiform              // Cuneiform is the set of Unicode characters in script Cuneiform.
    Cypriot                = _Cypriot                // Cypriot is the set of Unicode characters in script Cypriot.
    Cyrillic               = _Cyrillic               // Cyrillic is the set of Unicode characters in script Cyrillic.
    Deseret                = _Deseret                // Deseret is the set of Unicode characters in script Deseret.
    Devanagari             = _Devanagari             // Devanagari is the set of Unicode characters in script Devanagari.
    Dogra                  = _Dogra                  // Dogra is the set of Unicode characters in script Dogra.
    Duployan               = _Duployan               // Duployan is the set of Unicode characters in script Duployan.
    Egyptian_Hieroglyphs   = _Egyptian_Hieroglyphs   // Egyptian_Hieroglyphs is the set of Unicode characters in script Egyptian_Hieroglyphs.
    Elbasan                = _Elbasan                // Elbasan is the set of Unicode characters in script Elbasan.
    Ethiopic               = _Ethiopic               // Ethiopic is the set of Unicode characters in script Ethiopic.
    Georgian               = _Georgian               // Georgian is the set of Unicode characters in script Georgian.
    Glagolitic             = _Glagolitic             // Glagolitic is the set of Unicode characters in script Glagolitic.
    Gothic                 = _Gothic                 // Gothic is the set of Unicode characters in script Gothic.
    Grantha                = _Grantha                // Grantha is the set of Unicode characters in script Grantha.
    Greek                  = _Greek                  // Greek is the set of Unicode characters in script Greek.
    Gujarati               = _Gujarati               // Gujarati is the set of Unicode characters in script Gujarati.
    Gunjala_Gondi          = _Gunjala_Gondi          // Gunjala_Gondi is the set of Unicode characters in script Gunjala_Gondi.
    Gurmukhi               = _Gurmukhi               // Gurmukhi is the set of Unicode characters in script Gurmukhi.
    Han                    = _Han                    // Han is the set of Unicode characters in script Han.
    Hangul                 = _Hangul                 // Hangul is the set of Unicode characters in script Hangul.
    Hanifi_Rohingya        = _Hanifi_Rohingya        // Hanifi_Rohingya is the set of Unicode characters in script Hanifi_Rohingya.
    Hanunoo                = _Hanunoo                // Hanunoo is the set of Unicode characters in script Hanunoo.
    Hatran                 = _Hatran                 // Hatran is the set of Unicode characters in script Hatran.
    Hebrew                 = _Hebrew                 // Hebrew is the set of Unicode characters in script Hebrew.
    Hiragana               = _Hiragana               // Hiragana is the set of Unicode characters in script Hiragana.
    Imperial_Aramaic       = _Imperial_Aramaic       // Imperial_Aramaic is the set of Unicode characters in script Imperial_Aramaic.
    Inherited              = _Inherited              // Inherited is the set of Unicode characters in script Inherited.
    Inscriptional_Pahlavi  = _Inscriptional_Pahlavi  // Inscriptional_Pahlavi is the set of Unicode characters in script Inscriptional_Pahlavi.
    Inscriptional_Parthian = _Inscriptional_Parthian // Inscriptional_Parthian is the set of Unicode characters in script Inscriptional_Parthian.
    Javanese               = _Javanese               // Javanese is the set of Unicode characters in script Javanese.
    Kaithi                 = _Kaithi                 // Kaithi is the set of Unicode characters in script Kaithi.
    Kannada                = _Kannada                // Kannada is the set of Unicode characters in script Kannada.
    Katakana               = _Katakana               // Katakana is the set of Unicode characters in script Katakana.
    Kayah_Li               = _Kayah_Li               // Kayah_Li is the set of Unicode characters in script Kayah_Li.
    Kharoshthi             = _Kharoshthi             // Kharoshthi is the set of Unicode characters in script Kharoshthi.
    Khmer                  = _Khmer                  // Khmer is the set of Unicode characters in script Khmer.
    Khojki                 = _Khojki                 // Khojki is the set of Unicode characters in script Khojki.
    Khudawadi              = _Khudawadi              // Khudawadi is the set of Unicode characters in script Khudawadi.
    Lao                    = _Lao                    // Lao is the set of Unicode characters in script Lao.
    Latin                  = _Latin                  // Latin is the set of Unicode characters in script Latin.
    Lepcha                 = _Lepcha                 // Lepcha is the set of Unicode characters in script Lepcha.
    Limbu                  = _Limbu                  // Limbu is the set of Unicode characters in script Limbu.
    Linear_A               = _Linear_A               // Linear_A is the set of Unicode characters in script Linear_A.
    Linear_B               = _Linear_B               // Linear_B is the set of Unicode characters in script Linear_B.
    Lisu                   = _Lisu                   // Lisu is the set of Unicode characters in script Lisu.
    Lycian                 = _Lycian                 // Lycian is the set of Unicode characters in script Lycian.
    Lydian                 = _Lydian                 // Lydian is the set of Unicode characters in script Lydian.
    Mahajani               = _Mahajani               // Mahajani is the set of Unicode characters in script Mahajani.
    Makasar                = _Makasar                // Makasar is the set of Unicode characters in script Makasar.
    Malayalam              = _Malayalam              // Malayalam is the set of Unicode characters in script Malayalam.
    Mandaic                = _Mandaic                // Mandaic is the set of Unicode characters in script Mandaic.
    Manichaean             = _Manichaean             // Manichaean is the set of Unicode characters in script Manichaean.
    Marchen                = _Marchen                // Marchen is the set of Unicode characters in script Marchen.
    Masaram_Gondi          = _Masaram_Gondi          // Masaram_Gondi is the set of Unicode characters in script Masaram_Gondi.
    Medefaidrin            = _Medefaidrin            // Medefaidrin is the set of Unicode characters in script Medefaidrin.
    Meetei_Mayek           = _Meetei_Mayek           // Meetei_Mayek is the set of Unicode characters in script Meetei_Mayek.
    Mende_Kikakui          = _Mende_Kikakui          // Mende_Kikakui is the set of Unicode characters in script Mende_Kikakui.
    Meroitic_Cursive       = _Meroitic_Cursive       // Meroitic_Cursive is the set of Unicode characters in script Meroitic_Cursive.
    Meroitic_Hieroglyphs   = _Meroitic_Hieroglyphs   // Meroitic_Hieroglyphs is the set of Unicode characters in script Meroitic_Hieroglyphs.
    Miao                   = _Miao                   // Miao is the set of Unicode characters in script Miao.
    Modi                   = _Modi                   // Modi is the set of Unicode characters in script Modi.
    Mongolian              = _Mongolian              // Mongolian is the set of Unicode characters in script Mongolian.
    Mro                    = _Mro                    // Mro is the set of Unicode characters in script Mro.
    Multani                = _Multani                // Multani is the set of Unicode characters in script Multani.
    Myanmar                = _Myanmar                // Myanmar is the set of Unicode characters in script Myanmar.
    Nabataean              = _Nabataean              // Nabataean is the set of Unicode characters in script Nabataean.
    New_Tai_Lue            = _New_Tai_Lue            // New_Tai_Lue is the set of Unicode characters in script New_Tai_Lue.
    Newa                   = _Newa                   // Newa is the set of Unicode characters in script Newa.
    Nko                    = _Nko                    // Nko is the set of Unicode characters in script Nko.
    Nushu                  = _Nushu                  // Nushu is the set of Unicode characters in script Nushu.
    Ogham                  = _Ogham                  // Ogham is the set of Unicode characters in script Ogham.
    Ol_Chiki               = _Ol_Chiki               // Ol_Chiki is the set of Unicode characters in script Ol_Chiki.
    Old_Hungarian          = _Old_Hungarian          // Old_Hungarian is the set of Unicode characters in script Old_Hungarian.
    Old_Italic             = _Old_Italic             // Old_Italic is the set of Unicode characters in script Old_Italic.
    Old_North_Arabian      = _Old_North_Arabian      // Old_North_Arabian is the set of Unicode characters in script Old_North_Arabian.
    Old_Permic             = _Old_Permic             // Old_Permic is the set of Unicode characters in script Old_Permic.
    Old_Persian            = _Old_Persian            // Old_Persian is the set of Unicode characters in script Old_Persian.
    Old_Sogdian            = _Old_Sogdian            // Old_Sogdian is the set of Unicode characters in script Old_Sogdian.
    Old_South_Arabian      = _Old_South_Arabian      // Old_South_Arabian is the set of Unicode characters in script Old_South_Arabian.
    Old_Turkic             = _Old_Turkic             // Old_Turkic is the set of Unicode characters in script Old_Turkic.
    Oriya                  = _Oriya                  // Oriya is the set of Unicode characters in script Oriya.
    Osage                  = _Osage                  // Osage is the set of Unicode characters in script Osage.
    Osmanya                = _Osmanya                // Osmanya is the set of Unicode characters in script Osmanya.
    Pahawh_Hmong           = _Pahawh_Hmong           // Pahawh_Hmong is the set of Unicode characters in script Pahawh_Hmong.
    Palmyrene              = _Palmyrene              // Palmyrene is the set of Unicode characters in script Palmyrene.
    Pau_Cin_Hau            = _Pau_Cin_Hau            // Pau_Cin_Hau is the set of Unicode characters in script Pau_Cin_Hau.
    Phags_Pa               = _Phags_Pa               // Phags_Pa is the set of Unicode characters in script Phags_Pa.
    Phoenician             = _Phoenician             // Phoenician is the set of Unicode characters in script Phoenician.
    Psalter_Pahlavi        = _Psalter_Pahlavi        // Psalter_Pahlavi is the set of Unicode characters in script Psalter_Pahlavi.
    Rejang                 = _Rejang                 // Rejang is the set of Unicode characters in script Rejang.
    Runic                  = _Runic                  // Runic is the set of Unicode characters in script Runic.
    Samaritan              = _Samaritan              // Samaritan is the set of Unicode characters in script Samaritan.
    Saurashtra             = _Saurashtra             // Saurashtra is the set of Unicode characters in script Saurashtra.
    Sharada                = _Sharada                // Sharada is the set of Unicode characters in script Sharada.
    Shavian                = _Shavian                // Shavian is the set of Unicode characters in script Shavian.
    Siddham                = _Siddham                // Siddham is the set of Unicode characters in script Siddham.
    SignWriting            = _SignWriting            // SignWriting is the set of Unicode characters in script SignWriting.
    Sinhala                = _Sinhala                // Sinhala is the set of Unicode characters in script Sinhala.
    Sogdian                = _Sogdian                // Sogdian is the set of Unicode characters in script Sogdian.
    Sora_Sompeng           = _Sora_Sompeng           // Sora_Sompeng is the set of Unicode characters in script Sora_Sompeng.
    Soyombo                = _Soyombo                // Soyombo is the set of Unicode characters in script Soyombo.
    Sundanese              = _Sundanese              // Sundanese is the set of Unicode characters in script Sundanese.
    Syloti_Nagri           = _Syloti_Nagri           // Syloti_Nagri is the set of Unicode characters in script Syloti_Nagri.
    Syriac                 = _Syriac                 // Syriac is the set of Unicode characters in script Syriac.
    Tagalog                = _Tagalog                // Tagalog is the set of Unicode characters in script Tagalog.
    Tagbanwa               = _Tagbanwa               // Tagbanwa is the set of Unicode characters in script Tagbanwa.
    Tai_Le                 = _Tai_Le                 // Tai_Le is the set of Unicode characters in script Tai_Le.
    Tai_Tham               = _Tai_Tham               // Tai_Tham is the set of Unicode characters in script Tai_Tham.
    Tai_Viet               = _Tai_Viet               // Tai_Viet is the set of Unicode characters in script Tai_Viet.
    Takri                  = _Takri                  // Takri is the set of Unicode characters in script Takri.
    Tamil                  = _Tamil                  // Tamil is the set of Unicode characters in script Tamil.
    Tangut                 = _Tangut                 // Tangut is the set of Unicode characters in script Tangut.
    Telugu                 = _Telugu                 // Telugu is the set of Unicode characters in script Telugu.
    Thaana                 = _Thaana                 // Thaana is the set of Unicode characters in script Thaana.
    Thai                   = _Thai                   // Thai is the set of Unicode characters in script Thai.
    Tibetan                = _Tibetan                // Tibetan is the set of Unicode characters in script Tibetan.
    Tifinagh               = _Tifinagh               // Tifinagh is the set of Unicode characters in script Tifinagh.
    Tirhuta                = _Tirhuta                // Tirhuta is the set of Unicode characters in script Tirhuta.
    Ugaritic               = _Ugaritic               // Ugaritic is the set of Unicode characters in script Ugaritic.
    Vai                    = _Vai                    // Vai is the set of Unicode characters in script Vai.
    Warang_Citi            = _Warang_Citi            // Warang_Citi is the set of Unicode characters in script Warang_Citi.
    Yi                     = _Yi                     // Yi is the set of Unicode characters in script Yi.
    Zanabazar_Square       = _Zanabazar_Square       // Zanabazar_Square is the set of Unicode characters in script Zanabazar_Square.
)
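Script tables are used the same way as category tables. As a small sketch, the hypothetical helper scriptOf below tests a rune against a handful of scripts chosen for this example and returns the first matching name.

```go
package main

import (
	"fmt"
	"unicode"
)

// scriptOf returns the name of the first script, from a short hand-picked
// list, that contains r, or "" if none of them does. Hypothetical helper.
func scriptOf(r rune) string {
	candidates := []struct {
		name  string
		table *unicode.RangeTable
	}{
		{"Latin", unicode.Latin},
		{"Greek", unicode.Greek},
		{"Cyrillic", unicode.Cyrillic},
		{"Han", unicode.Han},
		{"Common", unicode.Common},
	}
	for _, c := range candidates {
		if unicode.Is(c.table, r) {
			return c.name
		}
	}
	return ""
}

func main() {
	for _, r := range []rune{'a', 'α', 'я', '世', '5'} {
		fmt.Printf("%q: %s\n", r, scriptOf(r))
	}
}
```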

These variables have type *RangeTable.

var (
    ASCII_Hex_Digit                    = _ASCII_Hex_Digit                    // ASCII_Hex_Digit is the set of Unicode characters with property ASCII_Hex_Digit.
    Bidi_Control                       = _Bidi_Control                       // Bidi_Control is the set of Unicode characters with property Bidi_Control.
    Dash                               = _Dash                               // Dash is the set of Unicode characters with property Dash.
    Deprecated                         = _Deprecated                         // Deprecated is the set of Unicode characters with property Deprecated.
    Diacritic                          = _Diacritic                          // Diacritic is the set of Unicode characters with property Diacritic.
    Extender                           = _Extender                           // Extender is the set of Unicode characters with property Extender.
    Hex_Digit                          = _Hex_Digit                          // Hex_Digit is the set of Unicode characters with property Hex_Digit.
    Hyphen                             = _Hyphen                             // Hyphen is the set of Unicode characters with property Hyphen.
    IDS_Binary_Operator                = _IDS_Binary_Operator                // IDS_Binary_Operator is the set of Unicode characters with property IDS_Binary_Operator.
    IDS_Trinary_Operator               = _IDS_Trinary_Operator               // IDS_Trinary_Operator is the set of Unicode characters with property IDS_Trinary_Operator.
    Ideographic                        = _Ideographic                        // Ideographic is the set of Unicode characters with property Ideographic.
    Join_Control                       = _Join_Control                       // Join_Control is the set of Unicode characters with property Join_Control.
    Logical_Order_Exception            = _Logical_Order_Exception            // Logical_Order_Exception is the set of Unicode characters with property Logical_Order_Exception.
    Noncharacter_Code_Point            = _Noncharacter_Code_Point            // Noncharacter_Code_Point is the set of Unicode characters with property Noncharacter_Code_Point.
    Other_Alphabetic                   = _Other_Alphabetic                   // Other_Alphabetic is the set of Unicode characters with property Other_Alphabetic.
    Other_Default_Ignorable_Code_Point = _Other_Default_Ignorable_Code_Point // Other_Default_Ignorable_Code_Point is the set of Unicode characters with property Other_Default_Ignorable_Code_Point.
    Other_Grapheme_Extend              = _Other_Grapheme_Extend              // Other_Grapheme_Extend is the set of Unicode characters with property Other_Grapheme_Extend.
    Other_ID_Continue                  = _Other_ID_Continue                  // Other_ID_Continue is the set of Unicode characters with property Other_ID_Continue.
    Other_ID_Start                     = _Other_ID_Start                     // Other_ID_Start is the set of Unicode characters with property Other_ID_Start.
    Other_Lowercase                    = _Other_Lowercase                    // Other_Lowercase is the set of Unicode characters with property Other_Lowercase.
    Other_Math                         = _Other_Math                         // Other_Math is the set of Unicode characters with property Other_Math.
    Other_Uppercase                    = _Other_Uppercase                    // Other_Uppercase is the set of Unicode characters with property Other_Uppercase.
    Pattern_Syntax                     = _Pattern_Syntax                     // Pattern_Syntax is the set of Unicode characters with property Pattern_Syntax.
    Pattern_White_Space                = _Pattern_White_Space                // Pattern_White_Space is the set of Unicode characters with property Pattern_White_Space.
    Prepended_Concatenation_Mark       = _Prepended_Concatenation_Mark       // Prepended_Concatenation_Mark is the set of Unicode characters with property Prepended_Concatenation_Mark.
    Quotation_Mark                     = _Quotation_Mark                     // Quotation_Mark is the set of Unicode characters with property Quotation_Mark.
    Radical                            = _Radical                            // Radical is the set of Unicode characters with property Radical.
    Regional_Indicator                 = _Regional_Indicator                 // Regional_Indicator is the set of Unicode characters with property Regional_Indicator.
    STerm                              = _Sentence_Terminal                  // STerm is an alias for Sentence_Terminal.
    Sentence_Terminal                  = _Sentence_Terminal                  // Sentence_Terminal is the set of Unicode characters with property Sentence_Terminal.
    Soft_Dotted                        = _Soft_Dotted                        // Soft_Dotted is the set of Unicode characters with property Soft_Dotted.
    Terminal_Punctuation               = _Terminal_Punctuation               // Terminal_Punctuation is the set of Unicode characters with property Terminal_Punctuation.
    Unified_Ideograph                  = _Unified_Ideograph                  // Unified_Ideograph is the set of Unicode characters with property Unified_Ideograph.
    Variation_Selector                 = _Variation_Selector                 // Variation_Selector is the set of Unicode characters with property Variation_Selector.
    White_Space                        = _White_Space                        // White_Space is the set of Unicode characters with property White_Space.
)
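Property tables also work with unicode.Is. The sketch below contrasts Hex_Digit with ASCII_Hex_Digit using a fullwidth letter; the helpers isHex and isASCIIHex are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"unicode"
)

// isHex reports whether r has the Hex_Digit property.
func isHex(r rune) bool { return unicode.Is(unicode.Hex_Digit, r) }

// isASCIIHex reports whether r has the ASCII_Hex_Digit property.
func isASCIIHex(r rune) bool { return unicode.Is(unicode.ASCII_Hex_Digit, r) }

func main() {
	// U+FF26 is FULLWIDTH LATIN CAPITAL LETTER F.
	for _, r := range []rune{'f', 'g', '\uFF26'} {
		fmt.Printf("%U hex=%t asciiHex=%t\n", r, isHex(r), isASCIIHex(r))
	}
}
```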

CaseRanges is the table describing case mappings for all letters with non-self mappings.

var CaseRanges = _CaseRanges

Categories is the set of Unicode category tables.

var Categories = map[string]*RangeTable{
    "C":  C,
    "Cc": Cc,
    "Cf": Cf,
    "Co": Co,
    "Cs": Cs,
    "L":  L,
    "Ll": Ll,
    "Lm": Lm,
    "Lo": Lo,
    "Lt": Lt,
    "Lu": Lu,
    "M":  M,
    "Mc": Mc,
    "Me": Me,
    "Mn": Mn,
    "N":  N,
    "Nd": Nd,
    "Nl": Nl,
    "No": No,
    "P":  P,
    "Pc": Pc,
    "Pd": Pd,
    "Pe": Pe,
    "Pf": Pf,
    "Pi": Pi,
    "Po": Po,
    "Ps": Ps,
    "S":  S,
    "Sc": Sc,
    "Sk": Sk,
    "Sm": Sm,
    "So": So,
    "Z":  Z,
    "Zl": Zl,
    "Zp": Zp,
    "Zs": Zs,
}
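The map is convenient when the category name is only known at run time, for example when it comes from a pattern such as \p{Lu}. A minimal sketch, with inCategory as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"unicode"
)

// inCategory reports whether r belongs to the category with the given
// name. Unknown names yield false. Hypothetical helper.
func inCategory(name string, r rune) bool {
	table, ok := unicode.Categories[name]
	return ok && unicode.Is(table, r)
}

func main() {
	fmt.Println(inCategory("Lu", 'G'))
	fmt.Println(inCategory("Nd", '5'))
	fmt.Println(inCategory("Xx", 'G')) // no such category
}
```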

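FoldCategory maps a category name to a table of code points outside the category that are equivalent under simple case folding to code points inside the category. If there is no entry for a category name, there are no such points.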

var FoldCategory = map[string]*RangeTable{
    "L":  foldL,
    "Ll": foldLl,
    "Lt": foldLt,
    "Lu": foldLu,
    "M":  foldM,
    "Mn": foldMn,
}
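A case-insensitive category test can be sketched by consulting both the category table and its fold table. The helper inCategoryFold is a hypothetical name, and the sketch assumes that a missing FoldCategory entry simply means there is nothing extra to match.

```go
package main

import (
	"fmt"
	"unicode"
)

// inCategoryFold reports whether r is in the named category, or is
// equivalent under simple case folding to a code point in that category.
// Hypothetical helper, for illustration only.
func inCategoryFold(name string, r rune) bool {
	if t, ok := unicode.Categories[name]; ok && unicode.Is(t, r) {
		return true
	}
	if t, ok := unicode.FoldCategory[name]; ok && unicode.Is(t, r) {
		return true
	}
	return false
}

func main() {
	// 'a' is not in Lu, but it folds to 'A', which is.
	fmt.Println(unicode.Is(unicode.Lu, 'a'), inCategoryFold("Lu", 'a'))
	fmt.Println(inCategoryFold("Lu", '5'))
}
```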

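FoldScript maps a script name to a table of code points outside the script that are equivalent under simple case folding to code points inside the script. If there is no entry for a script name, there are no such points.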

var FoldScript = map[string]*RangeTable{
    "Common":    foldCommon,
    "Greek":     foldGreek,
    "Inherited": foldInherited,
}
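The same idea applies to scripts. In the sketch below, U+00B5 MICRO SIGN belongs to script Common, yet it folds together with the Greek letter mu, so a fold-aware Greek test is expected to accept it. The helper inScriptFold is a hypothetical name introduced for this example.

```go
package main

import (
	"fmt"
	"unicode"
)

// inScriptFold reports whether r is in the named script, or is equivalent
// under simple case folding to a code point in that script.
// Hypothetical helper, for illustration only.
func inScriptFold(name string, r rune) bool {
	if t, ok := unicode.Scripts[name]; ok && unicode.Is(t, r) {
		return true
	}
	if t, ok := unicode.FoldScript[name]; ok && unicode.Is(t, r) {
		return true
	}
	return false
}

func main() {
	const micro = '\u00B5' // MICRO SIGN, script Common
	fmt.Println(unicode.Is(unicode.Greek, micro), inScriptFold("Greek", micro))
	fmt.Println(inScriptFold("Greek", 'a'))
}
```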

GraphicRanges defines the set of graphic characters according to Unicode.

var GraphicRanges = []*RangeTable{
    L, M, N, P, S, Zs,
}

PrintRanges defines the set of printable characters according to Go. ASCII space, U+0020, is handled separately.

var PrintRanges = []*RangeTable{
    L, M, N, P, S,
}
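The two sets differ only in the space separators (Zs): every Zs rune is graphic, but ASCII space is the only one that is printable. A short sketch using unicode.IsGraphic and unicode.IsPrint shows the difference; the helper graphicOnly is a hypothetical name.

```go
package main

import (
	"fmt"
	"unicode"
)

// graphicOnly reports whether r is graphic according to Unicode but not
// printable according to Go. Hypothetical helper.
func graphicOnly(r rune) bool {
	return unicode.IsGraphic(r) && !unicode.IsPrint(r)
}

func main() {
	// U+00A0 is NO-BREAK SPACE, U+3000 is IDEOGRAPHIC SPACE.
	for _, r := range []rune{'a', ' ', '\u00A0', '\u3000', '\n'} {
		fmt.Printf("%U graphic=%t print=%t\n", r, unicode.IsGraphic(r), unicode.IsPrint(r))
	}
}
```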

Properties is the set of Unicode property tables.

var Properties = map[string]*RangeTable{
    "ASCII_Hex_Digit":                    ASCII_Hex_Digit,
    "Bidi_Control":                       Bidi_Control,
    "Dash":                               Dash,
    "Deprecated":                         Deprecated,
    "Diacritic":                          Diacritic,
    "Extender":                           Extender,
    "Hex_Digit":                          Hex_Digit,
    "Hyphen":                             Hyphen,
    "IDS_Binary_Operator":                IDS_Binary_Operator,
    "IDS_Trinary_Operator":               IDS_Trinary_Operator,
    "Ideographic":                        Ideographic,
    "Join_Control":                       Join_Control,
    "Logical_Order_Exception":            Logical_Order_Exception,
    "Noncharacter_Code_Point":            Noncharacter_Code_Point,
    "Other_Alphabetic":                   Other_Alphabetic,
    "Other_Default_Ignorable_Code_Point": Other_Default_Ignorable_Code_Point,
    "Other_Grapheme_Extend":              Other_Grapheme_Extend,
    "Other_ID_Continue":                  Other_ID_Continue,
    "Other_ID_Start":                     Other_ID_Start,
    "Other_Lowercase":                    Other_Lowercase,
    "Other_Math":                         Other_Math,
    "Other_Uppercase":                    Other_Uppercase,
    "Pattern_Syntax":                     Pattern_Syntax,
    "Pattern_White_Space":                Pattern_White_Space,
    "Prepended_Concatenation_Mark":       Prepended_Concatenation_Mark,
    "Quotation_Mark":                     Quotation_Mark,
    "Radical":                            Radical,
    "Regional_Indicator":                 Regional_Indicator,
    "Sentence_Terminal":                  Sentence_Terminal,
    "STerm":                              Sentence_Terminal,
    "Soft_Dotted":                        Soft_Dotted,
    "Terminal_Punctuation":               Terminal_Punctuation,
    "Unified_Ideograph":                  Unified_Ideograph,
    "Variation_Selector":                 Variation_Selector,
    "White_Space":                        White_Space,
}
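As with Categories, the map allows lookup by name. The sketch below also checks that "STerm" and "Sentence_Terminal" refer to the same table, as the listing above suggests. The helpers hasProperty and sameTable are hypothetical names.

```go
package main

import (
	"fmt"
	"unicode"
)

// hasProperty reports whether r has the named property. Unknown names
// yield false. Hypothetical helper.
func hasProperty(name string, r rune) bool {
	table, ok := unicode.Properties[name]
	return ok && unicode.Is(table, r)
}

// sameTable reports whether two property names map to the same table.
func sameTable(a, b string) bool {
	ta, okA := unicode.Properties[a]
	tb, okB := unicode.Properties[b]
	return okA && okB && ta == tb
}

func main() {
	fmt.Println(hasProperty("STerm", '.'))
	fmt.Println(hasProperty("Sentence_Terminal", ','))
	fmt.Println(sameTable("STerm", "Sentence_Terminal"))
}
```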

Scripts is the set of Unicode script tables.

var Scripts = map[string]*RangeTable{
    "Adlam":                  Adlam,
    "Ahom":                   Ahom,
    "Anatolian_Hieroglyphs":  Anatolian_Hieroglyphs,
    "Arabic":                 Arabic,
    "Armenian":               Armenian,
    "Avestan":                Avestan,
    "Balinese":               Balinese,
    "Bamum":                  Bamum,
    "Bassa_Vah":              Bassa_Vah,
    "Batak":                  Batak,
    "Bengali":                Bengali,
    "Bhaiksuki":              Bhaiksuki,
    "Bopomofo":               Bopomofo,
    "Brahmi":                 Brahmi,
    "Braille":                Braille,
    "Buginese":               Buginese,
    "Buhid":                  Buhid,
    "Canadian_Aboriginal":    Canadian_Aboriginal,
    "Carian":                 Carian,
    "Caucasian_Albanian":     Caucasian_Albanian,
    "Chakma":                 Chakma,
    "Cham":                   Cham,
    "Cherokee":               Cherokee,
    "Common":                 Common,
    "Coptic":                 Coptic,
    "Cuneiform":              Cuneiform,
    "Cypriot":                Cypriot,
    "Cyrillic":               Cyrillic,
    "Deseret":                Deseret,
    "Devanagari":             Devanagari,
    "Dogra":                  Dogra,
    "Duployan":               Duployan,
    "Egyptian_Hieroglyphs":   Egyptian_Hieroglyphs,
    "Elbasan":                Elbasan,
    "Ethiopic":               Ethiopic,
    "Georgian":               Georgian,
    "Glagolitic":             Glagolitic,
    "Gothic":                 Gothic,
    "Grantha":                Grantha,
    "Greek":                  Greek,
    "Gujarati":               Gujarati,
    "Gunjala_Gondi":          Gunjala_Gondi,
    "Gurmukhi":               Gurmukhi,
    "Han":                    Han,
    "Hangul":                 Hangul,
    "Hanifi_Rohingya":        Hanifi_Rohingya,
    "Hanunoo":                Hanunoo,
    "Hatran":                 Hatran,
    "Hebrew":                 Hebrew,
    "Hiragana":               Hiragana,
    "Imperial_Aramaic":       Imperial_Aramaic,
    "Inherited":              Inherited,
    "Inscriptional_Pahlavi":  Inscriptional_Pahlavi,
    "Inscriptional_Parthian": Inscriptional_Parthian,
    "Javanese":               Javanese,
    "Kaithi":                 Kaithi,
    "Kannada":                Kannada,
    "Katakana":               Katakana,
    "Kayah_Li":               Kayah_Li,
    "Kharoshthi":             Kharoshthi,
    "Khmer":                  Khmer,
    "Khojki":                 Khojki,
    "Khudawadi":              Khudawadi,
    "Lao":                    Lao,
    "Latin":                  Latin,
    "Lepcha":                 Lepcha,
    "Limbu":                  Limbu,
    "Linear_A":               Linear_A,
    "Linear_B":               Linear_B,
    "Lisu":                   Lisu,
    "Lycian":                 Lycian,
    "Lydian":                 Lydian,
    "Mahajani":               Mahajani,
    "Makasar":                Makasar,
    "Malayalam":              Malayalam,
    "Mandaic":                Mandaic,
    "Manichaean":             Manichaean,
    "Marchen":                Marchen,
    "Masaram_Gondi":          Masaram_Gondi,
    "Medefaidrin":            Medefaidrin,
    "Meetei_Mayek":           Meetei_Mayek,
    "Mende_Kikakui":          Mende_Kikakui,
    "Meroitic_Cursive":       Meroitic_Cursive,
    "Meroitic_Hieroglyphs":   Meroitic_Hieroglyphs,
    "Miao":                   Miao,
    "Modi":                   Modi,
    "Mongolian":              Mongolian,
    "Mro":                    Mro,
    "Multani":                Multani,
    "Myanmar":                Myanmar,
    "Nabataean":              Nabataean,
    "New_Tai_Lue":            New_Tai_Lue,
    "Newa":                   Newa,
    "Nko":                    Nko,
    "Nushu":                  Nushu,
    "Ogham":                  Ogham,
    "Ol_Chiki":               Ol_Chiki,
    "Old_Hungarian":          Old_Hungarian,
    "Old_Italic":             Old_Italic,
    "Old_North_Arabian":      Old_North_Arabian,
    "Old_Permic":             Old_Permic,
    "Old_Persian":            Old_Persian,
    "Old_Sogdian":            Old_Sogdian,
    "Old_South_Arabian":      Old_South_Arabian,
    "Old_Turkic":             Old_Turkic,
    "Oriya":                  Oriya,
    "Osage":                  Osage,
    "Osmanya":                Osmanya,
    "Pahawh_Hmong":           Pahawh_Hmong,
    "Palmyrene":              Palmyrene,
    "Pau_Cin_Hau":            Pau_Cin_Hau,
    "Phags_Pa":               Phags_Pa,
    "Phoenician":             Phoenician,
    "Psalter_Pahlavi":        Psalter_Pahlavi,
    "Rejang":                 Rejang,
    "Runic":                  Runic,
    "Samaritan":              Samaritan,
    "Saurashtra":             Saurashtra,
    "Sharada":                Sharada,
    "Shavian":                Shavian,
    "Siddham":                Siddham,
    "SignWriting":            SignWriting,
    "Sinhala":                Sinhala,
    "Sogdian":                Sogdian,
    "Sora_Sompeng":           Sora_Sompeng,
    "Soyombo":                Soyombo,
    "Sundanese":              Sundanese,
    "Syloti_Nagri":           Syloti_Nagri,
    "Syriac":                 Syriac,
    "Tagalog":                Tagalog,
    "Tagbanwa":               Tagbanwa,
    "Tai_Le":                 Tai_Le,
    "Tai_Tham":               Tai_Tham,
    "Tai_Viet":               Tai_Viet,
    "Takri":                  Takri,
    "Tamil":                  Tamil,
    "Tangut":                 Tangut,
    "Telugu":                 Telugu,
    "Thaana":                 Thaana,
    "Thai":                   Thai,
    "Tibetan":                Tibetan,
    "Tifinagh":               Tifinagh,
    "Tirhuta":                Tirhuta,
    "Ugaritic":               Ugaritic,
    "Vai":                    Vai,
    "Warang_Citi":            Warang_Citi,
    "Yi":                     Yi,
    "Zanabazar_Square":       Zanabazar_Square,
}
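As a sketch of how the Scripts map might be used, the hypothetical helper `scriptOf` below walks the map and returns the name of the table that contains a rune. It assumes each rune belongs to at most one script table, so the random map iteration order does not affect the result.

```go
package main

import (
	"fmt"
	"unicode"
)

// scriptOf returns the name of the script table in unicode.Scripts that
// contains r, or "" if no table does (for example, an unassigned code point).
func scriptOf(r rune) string {
	for name, table := range unicode.Scripts {
		if unicode.Is(table, r) {
			return name
		}
	}
	return ""
}

func main() {
	for _, r := range "Aя中1" {
		fmt.Printf("%q: %s\n", r, scriptOf(r))
	}
}
```

Scanning every table is linear in the number of scripts; code that classifies many runes would probably test the few scripts it cares about directly.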

func In (added in Go 1.2)

func In(r rune, ranges ...*RangeTable) bool

In reports whether the rune is a member of one of the ranges.
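A short sketch of In with more than one table. The helper name `isKana` is hypothetical; `Hiragana` and `Katakana` are script tables listed in Scripts above.

```go
package main

import (
	"fmt"
	"unicode"
)

// isKana reports whether r belongs to the Hiragana or Katakana script.
func isKana(r rune) bool {
	return unicode.In(r, unicode.Hiragana, unicode.Katakana)
}

func main() {
	for _, r := range "あカ漢a" {
		fmt.Printf("%q kana=%t\n", r, isKana(r))
	}
}
```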

func Is

func Is(rangeTab *RangeTable, r rune) bool

Is reports whether the rune is in the specified table of ranges.
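For illustration, Is can be applied to each rune of a string. The helper `countHan` is a hypothetical name used only in this sketch.

```go
package main

import (
	"fmt"
	"unicode"
)

// countHan counts the runes of s that are in the Han script table.
func countHan(s string) int {
	n := 0
	for _, r := range s {
		if unicode.Is(unicode.Han, r) {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countHan("Go言語")) // 2
}
```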

func IsControl

func IsControl(r rune) bool

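IsControl reports whether the rune is a control character. The C (Other) Unicode category includes more code points, such as surrogates; use Is(C, r) to test for them.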
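The difference between IsControl and the wider C category can be sketched as follows. The helper `controlAndOther` is hypothetical; the sample runes are a line feed, U+200B (a format character), and the surrogate code point U+D800.

```go
package main

import (
	"fmt"
	"unicode"
)

// controlAndOther returns IsControl(r) and membership of r in category C.
func controlAndOther(r rune) (control, other bool) {
	return unicode.IsControl(r), unicode.Is(unicode.C, r)
}

func main() {
	for _, r := range []rune{'\n', '\u200B', 0xD800, 'a'} {
		c, o := controlAndOther(r)
		fmt.Printf("%U control=%t other=%t\n", r, c, o)
	}
}
```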

func IsDigit

func IsDigit(r rune) bool

IsDigit reports whether the rune is a decimal digit.

func IsGraphic

func IsGraphic(r rune) bool

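IsGraphic reports whether the rune is defined as a Graphic by Unicode. Such characters include letters, marks, numbers, punctuation, symbols, and spaces, from categories L, M, N, P, S, Zs.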

func IsLetter

func IsLetter(r rune) bool

IsLetter reports whether the rune is a letter (category L).

func IsLower

func IsLower(r rune) bool

IsLower reports whether the rune is a lower case letter.

func IsMark

func IsMark(r rune) bool

IsMark reports whether the rune is a mark character (category M).
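One possible use of IsMark is dropping combining marks from text that is already decomposed. The helper `stripMarks` is a hypothetical name; the sketch does not decompose precomposed letters such as U+00E9, which are letters rather than marks.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripMarks returns s without any runes in category M.
func stripMarks(s string) string {
	var b strings.Builder
	for _, r := range s {
		if !unicode.IsMark(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	// 'e' followed by U+0301 COMBINING ACUTE ACCENT.
	fmt.Println(stripMarks("e\u0301cole")) // ecole
}
```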

func IsNumber

func IsNumber(r rune) bool

IsNumber reports whether the rune is a number (category N).
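IsDigit covers only decimal digits, while IsNumber covers all of category N. A small sketch with a hypothetical helper `digitAndNumber` shows the difference on a Roman numeral and a vulgar fraction.

```go
package main

import (
	"fmt"
	"unicode"
)

// digitAndNumber returns IsDigit(r) and IsNumber(r).
func digitAndNumber(r rune) (digit, number bool) {
	return unicode.IsDigit(r), unicode.IsNumber(r)
}

func main() {
	// '٣' is U+0663 ARABIC-INDIC DIGIT THREE, 'Ⅷ' is U+2167 ROMAN NUMERAL EIGHT.
	for _, r := range "7٣Ⅷ½x" {
		d, n := digitAndNumber(r)
		fmt.Printf("%q digit=%t number=%t\n", r, d, n)
	}
}
```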

func IsOneOf

func IsOneOf(ranges []*RangeTable, r rune) bool

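IsOneOf reports whether the rune is a member of one of the ranges. The function "In" provides a nicer signature and should be used in preference to IsOneOf.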

func IsPrint

func IsPrint(r rune) bool

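IsPrint reports whether the rune is defined as printable by Go. Such characters include letters, marks, numbers, punctuation, symbols, and the ASCII space character, from categories L, M, N, P, S and the ASCII space character. This categorization is the same as IsGraphic except that the only spacing character is ASCII space, U+0020.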
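The only place IsGraphic and IsPrint disagree is on non-ASCII space characters. The sketch below uses a hypothetical helper `graphicAndPrint` to compare the two on the ASCII space, U+00A0 (NBSP), U+3000 (ideographic space), and a tab.

```go
package main

import (
	"fmt"
	"unicode"
)

// graphicAndPrint returns IsGraphic(r) and IsPrint(r).
func graphicAndPrint(r rune) (graphic, printable bool) {
	return unicode.IsGraphic(r), unicode.IsPrint(r)
}

func main() {
	for _, r := range []rune{' ', '\u00A0', '\u3000', '\t'} {
		g, p := graphicAndPrint(r)
		fmt.Printf("%U graphic=%t print=%t\n", r, g, p)
	}
}
```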

func IsPunct

func IsPunct(r rune) bool

IsPunct reports whether the rune is a Unicode punctuation character (category P).

func IsSpace

func IsSpace(r rune) bool

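IsSpace reports whether the rune is a space character as defined by Unicode's White Space property; in the Latin-1 space this is

'\t', '\n', '\v', '\f', '\r', ' ', U+0085 (NEL), U+00A0 (NBSP).

Other definitions of spacing characters are set by category Z and property Pattern_White_Space.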
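A sketch of IsSpace on runes outside ASCII, using a hypothetical helper `countSpaces`. The sample string contains a tab, U+00A0 (NBSP), U+200B (zero width space, which does not have the White Space property), and U+3000 (ideographic space).

```go
package main

import (
	"fmt"
	"unicode"
)

// countSpaces counts the runes of s for which unicode.IsSpace is true.
func countSpaces(s string) int {
	n := 0
	for _, r := range s {
		if unicode.IsSpace(r) {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countSpaces("a\tb\u00A0c\u200Bd\u3000")) // 3
}
```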

func IsSymbol

func IsSymbol(r rune) bool

IsSymbol reports whether the rune is a symbolic character.
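Punctuation (category P) and symbols (category S) are disjoint, which is easy to get wrong for ASCII characters such as '$' and '+'. The helper `punctOrSymbol` below is a hypothetical name used to show how a few runes are classified.

```go
package main

import (
	"fmt"
	"unicode"
)

// punctOrSymbol classifies r as "punct", "symbol", or "other".
func punctOrSymbol(r rune) string {
	switch {
	case unicode.IsPunct(r):
		return "punct"
	case unicode.IsSymbol(r):
		return "symbol"
	default:
		return "other"
	}
}

func main() {
	for _, r := range "!-$+℃a" {
		fmt.Printf("%q: %s\n", r, punctOrSymbol(r))
	}
}
```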

func IsTitle

func IsTitle(r rune) bool

IsTitle reports whether the rune is a title case letter.

func IsUpper

func IsUpper(r rune) bool

IsUpper reports whether the rune is an upper case letter.
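The three case predicates IsUpper, IsLower, and IsTitle can be combined into a single classifier. The sketch below uses a hypothetical helper `caseOf`; 'ǅ' is U+01C5, a title case digraph.

```go
package main

import (
	"fmt"
	"unicode"
)

// caseOf returns "upper", "lower", "title", or "none" for r.
func caseOf(r rune) string {
	switch {
	case unicode.IsUpper(r):
		return "upper"
	case unicode.IsLower(r):
		return "lower"
	case unicode.IsTitle(r):
		return "title"
	default:
		return "none"
	}
}

func main() {
	for _, r := range "Ggǅ中1" {
		fmt.Printf("%q: %s\n", r, caseOf(r))
	}
}
```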

func SimpleFold

func SimpleFold(r rune) rune

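SimpleFold iterates over Unicode code points equivalent under the Unicode-defined simple case folding. Among the code points equivalent to rune (including rune itself), SimpleFold returns the smallest rune > r if one exists, or else the smallest rune >= 0. If r is not a valid Unicode code point, SimpleFold(r) returns r.

For example: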

SimpleFold('A') = 'a'
SimpleFold('a') = 'A'

SimpleFold('K') = 'k'
SimpleFold('k') = '\u212A' (Kelvin symbol, K)
SimpleFold('\u212A') = 'K'

SimpleFold('1') = '1'

SimpleFold(-2) = -2
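Because SimpleFold cycles through each equivalence class, calling it repeatedly until it returns the starting rune enumerates the whole class. The helper `foldOrbit` is a hypothetical name for this sketch.

```go
package main

import (
	"fmt"
	"unicode"
)

// foldOrbit returns r followed by every other rune that is equivalent to r
// under simple case folding, in the order SimpleFold visits them.
func foldOrbit(r rune) []rune {
	orbit := []rune{r}
	for next := unicode.SimpleFold(r); next != r; next = unicode.SimpleFold(next) {
		orbit = append(orbit, next)
	}
	return orbit
}

func main() {
	fmt.Printf("%U\n", foldOrbit('k')) // [U+006B U+212A U+004B]
	fmt.Printf("%U\n", foldOrbit('1')) // [U+0031]
}
```

A rune with no case variants, or an invalid rune, maps to itself, so the loop stops immediately and the orbit has one element.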

Example

U+0061 'a'
U+0041 'A'
U+006B 'k'
U+212A 'K'
U+004B 'K'
U+0031 '1'

func To

func To(_case int, r rune) rune

To maps the rune to the specified case: UpperCase, LowerCase, or TitleCase.

Example

U+0047 'G'
U+0067 'g'
U+0047 'G'
U+0047 'G'
U+0067 'g'
U+0047 'G'
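The listing above shows only output. A program along the following lines would print it, assuming the inputs are 'g' and then 'G', each converted to upper, lower, and title case. The helper `allCases` is hypothetical.

```go
package main

import (
	"fmt"
	"unicode"
)

// allCases returns r mapped to upper, lower, and title case, in that order.
func allCases(r rune) [3]rune {
	return [3]rune{
		unicode.To(unicode.UpperCase, r),
		unicode.To(unicode.LowerCase, r),
		unicode.To(unicode.TitleCase, r),
	}
}

func main() {
	for _, r := range []rune{'g', 'G'} {
		for _, c := range allCases(r) {
			fmt.Printf("%#U\n", c)
		}
	}
}
```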

func ToLower

func ToLower(r rune) rune

ToLower maps the rune to lower case.

Example

U+0067 'g'

func ToTitle

func ToTitle(r rune) rune

ToTitle maps the rune to title case.

Example

U+0047 'G'

func ToUpper

func ToUpper(r rune) rune

ToUpper maps the rune to upper case.

Example

U+0047 'G'

type CaseRange

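CaseRange represents a range of Unicode code points for simple (one code point to one code point) case conversion. The range runs from Lo to Hi inclusive, with a fixed stride of 1. Deltas are the number to add to the code point to reach the code point for a different case for that character. They may be negative. If zero, it means the character is in the corresponding case. There is a special case representing sequences of alternating corresponding Upper and Lower pairs. It appears with a fixed Delta of

{UpperLower, UpperLower, UpperLower}

The constant UpperLower has an otherwise impossible delta value.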

type CaseRange struct {
    Lo    uint32
    Hi    uint32
    Delta d // d is the unexported array type [MaxCase]rune, indexed by UpperCase, LowerCase, TitleCase
}
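To make the Delta arithmetic concrete, the sketch below upper-cases a rune by hand using the package's exported `CaseRanges` table. The helper `upperViaDelta` is hypothetical, and the linear scan is for illustration only; the package itself may search the table differently.

```go
package main

import (
	"fmt"
	"unicode"
)

// upperViaDelta finds the CaseRange containing r in unicode.CaseRanges
// and applies its UpperCase delta.
func upperViaDelta(r rune) rune {
	for _, cr := range unicode.CaseRanges {
		lo, hi := rune(cr.Lo), rune(cr.Hi)
		if r < lo || r > hi {
			continue
		}
		delta := cr.Delta[unicode.UpperCase]
		if delta > unicode.MaxRune {
			// Delta is UpperLower: the range alternates upper, lower,
			// upper, lower, starting with an upper case letter at Lo,
			// so the upper case form sits at the even offset.
			return lo + ((r - lo) &^ 1)
		}
		return r + delta
	}
	return r // no case mapping recorded for r
}

func main() {
	fmt.Printf("%#U\n", upperViaDelta('a'))      // plain negative delta
	fmt.Printf("%#U\n", upperViaDelta('\u0101')) // inside an UpperLower range
}
```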

type Range16

Range16 represents a range of 16-bit Unicode code points. The range runs from Lo to Hi inclusive and has the specified stride.

type Range16 struct {
    Lo     uint16
    Hi     uint16
    Stride uint16
}

type Range32

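Range32 represents a range of Unicode code points and is used when one or more of the values will not fit in 16 bits. The range runs from Lo to Hi inclusive and has the specified stride. Lo and Hi must always be >= 1<<16.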

type Range32 struct {
    Lo     uint32
    Hi     uint32
    Stride uint32
}

type RangeTable

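RangeTable defines a set of Unicode code points by listing the ranges of code points within the set. The ranges are listed in two slices to save space: a slice of 16-bit ranges and a slice of 32-bit ranges. The two slices must be in sorted order and non-overlapping. Also, R32 should contain only values >= 0x10000 (1<<16).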

type RangeTable struct {
    R16         []Range16
    R32         []Range32
    LatinOffset int // number of entries in R16 with Hi <= MaxLatin1; added in Go 1.1
}
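A user-defined RangeTable can be passed to Is and In like the built-in tables. The table below is a made-up example, not one from the package: the even ASCII digits as a Range16 with stride 2, plus U+1F600 through U+1F64F as a Range32. The names `customTable` and `inCustomTable` are hypothetical.

```go
package main

import (
	"fmt"
	"unicode"
)

// customTable holds '0', '2', '4', '6', '8' and U+1F600..U+1F64F.
var customTable = &unicode.RangeTable{
	R16:         []unicode.Range16{{Lo: '0', Hi: '8', Stride: 2}},
	R32:         []unicode.Range32{{Lo: 0x1F600, Hi: 0x1F64F, Stride: 1}},
	LatinOffset: 1, // one R16 entry has Hi <= MaxLatin1
}

// inCustomTable reports whether r is in customTable.
func inCustomTable(r rune) bool {
	return unicode.Is(customTable, r)
}

func main() {
	for _, r := range []rune{'4', '5', '9', 0x1F600, 0x1F650} {
		fmt.Printf("%U in table: %t\n", r, inCustomTable(r))
	}
}
```

The stride means '5' falls between Lo and Hi but is still not a member, since only every second code point from Lo is included.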

type SpecialCase

SpecialCase represents language-specific case mappings such as Turkish. Methods of SpecialCase customize (by overriding) the standard mappings.

type SpecialCase []CaseRange
var AzeriCase SpecialCase = _TurkishCase
var TurkishCase SpecialCase = _TurkishCase

Example

U+0069 'i'
U+0130 'İ'
U+0130 'İ'
U+0069 'i'
U+0130 'İ'
U+0130 'İ'
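The listing above shows only output. A program along these lines would print it, assuming the inputs are 'i' and then 'İ', each passed through TurkishCase's ToLower, ToTitle, and ToUpper. The helper `turkishCases` is hypothetical.

```go
package main

import (
	"fmt"
	"unicode"
)

// turkishCases returns r mapped to lower, title, and upper case, in that
// order, using the Turkish special-case rules.
func turkishCases(r rune) [3]rune {
	t := unicode.TurkishCase
	return [3]rune{t.ToLower(r), t.ToTitle(r), t.ToUpper(r)}
}

func main() {
	for _, r := range []rune{'i', 'İ'} {
		for _, c := range turkishCases(r) {
			fmt.Printf("%#U\n", c)
		}
	}
}
```

With the Turkish rules the upper case of 'i' is the dotted 'İ' (U+0130) rather than 'I', and the lower case of 'I' is the dotless 'ı' (U+0131).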

func (SpecialCase) ToLower

func (special SpecialCase) ToLower(r rune) rune

ToLower maps the rune to lower case giving priority to the special mapping.

func (SpecialCase) ToTitle

func (special SpecialCase) ToTitle(r rune) rune

ToTitle maps the rune to title case giving priority to the special mapping.

func (SpecialCase) ToUpper

func (special SpecialCase) ToUpper(r rune) rune

ToUpper maps the rune to upper case giving priority to the special mapping.
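A SpecialCase can also be built by hand. The sketch below is a hypothetical mapping, not one shipped with the package: it upper-cases 'ß' (U+00DF) to the capital sharp s 'ẞ' (U+1E9E) and leaves every other rune to the standard mappings. The names `capitalEszett` and `upperGerman` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// capitalEszett overrides the upper and title case of U+00DF.
// Entries in a SpecialCase must be sorted by code point; there is only one here.
var capitalEszett = unicode.SpecialCase{
	unicode.CaseRange{
		Lo: 0x00DF,
		Hi: 0x00DF,
		// Deltas are indexed by UpperCase, LowerCase, TitleCase.
		Delta: [unicode.MaxCase]rune{0x1E9E - 0x00DF, 0, 0x1E9E - 0x00DF},
	},
}

// upperGerman upper-cases r, giving priority to capitalEszett.
func upperGerman(r rune) rune {
	return capitalEszett.ToUpper(r)
}

func main() {
	fmt.Printf("%#U\n", upperGerman('ß'))
	fmt.Printf("%#U\n", upperGerman('a')) // falls back to the standard mapping
}
```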

Bugs

Subdirectories

Name Synopsis
..
utf16 Package utf16 implements encoding and decoding of UTF-16 sequences.
utf8 Package utf8 implements functions and constants to support text encoded in UTF-8.
