
Distribution of Unicode Code Points

December 6, 2017 ⁄ General ⁄ 2,805 characters

0000..007F; Basic Latin  
  0080..00FF; Latin-1 Supplement  
  0100..017F; Latin Extended-A  
  0180..024F; Latin Extended-B  
  0250..02AF; IPA Extensions  
  02B0..02FF; Spacing Modifier Letters  
  0300..036F; Combining Diacritical Marks  
  0370..03FF; Greek  
  0400..04FF; Cyrillic  
  0530..058F; Armenian  
  0590..05FF; Hebrew  
  0600..06FF; Arabic  
  0700..074F; Syriac  
  0780..07BF; Thaana  
  0900..097F; Devanagari  
  0980..09FF; Bengali  
  0A00..0A7F; Gurmukhi  
  0A80..0AFF; Gujarati  
  0B00..0B7F; Oriya  
  0B80..0BFF; Tamil  
  0C00..0C7F; Telugu  
  0C80..0CFF; Kannada  
  0D00..0D7F; Malayalam  
  0D80..0DFF; Sinhala  
  0E00..0E7F; Thai  
  0E80..0EFF; Lao  
  0F00..0FFF; Tibetan  
  1000..109F; Myanmar  
  10A0..10FF; Georgian  
  1100..11FF; Hangul Jamo  
  1200..137F; Ethiopic  
  13A0..13FF; Cherokee  
  1400..167F; Unified Canadian Aboriginal Syllabics  
  1680..169F; Ogham  
  16A0..16FF; Runic  
  1780..17FF; Khmer  
  1800..18AF; Mongolian  
  1E00..1EFF; Latin Extended Additional  
  1F00..1FFF; Greek Extended  
  2000..206F; General Punctuation  
  2070..209F; Superscripts and Subscripts  
  20A0..20CF; Currency Symbols  
  20D0..20FF; Combining Marks for Symbols  
  2100..214F; Letterlike Symbols  
  2150..218F; Number Forms  
  2190..21FF; Arrows  
  2200..22FF; Mathematical Operators  
  2300..23FF; Miscellaneous Technical  
  2400..243F; Control Pictures  
  2440..245F; Optical Character Recognition  
  2460..24FF; Enclosed Alphanumerics  
  2500..257F; Box Drawing  
  2580..259F; Block Elements  
  25A0..25FF; Geometric Shapes  
  2600..26FF; Miscellaneous Symbols  
  2700..27BF; Dingbats  
  2800..28FF; Braille Patterns  
  2E80..2EFF; CJK Radicals Supplement  
  2F00..2FDF; Kangxi Radicals  
  2FF0..2FFF; Ideographic Description Characters  
  3000..303F; CJK Symbols and Punctuation  
  3040..309F; Hiragana  
  30A0..30FF; Katakana  
  3100..312F; Bopomofo  
  3130..318F; Hangul Compatibility Jamo  
  3190..319F; Kanbun  
  31A0..31BF; Bopomofo Extended  
  3200..32FF; Enclosed CJK Letters and Months  
  3300..33FF; CJK Compatibility  
  3400..4DB5; CJK Unified Ideographs Extension A  
  4E00..9FFF; CJK Unified Ideographs  
  A000..A48F; Yi Syllables  
  A490..A4CF; Yi Radicals  
  AC00..D7A3; Hangul Syllables  
  D800..DB7F; High Surrogates  
  DB80..DBFF; High Private Use Surrogates  
  DC00..DFFF; Low Surrogates  
  E000..F8FF; Private Use  
  F900..FAFF; CJK Compatibility Ideographs  
  FB00..FB4F; Alphabetic Presentation Forms  
  FB50..FDFF; Arabic Presentation Forms-A  
  FE20..FE2F; Combining Half Marks  
  FE30..FE4F; CJK Compatibility Forms  
  FE50..FE6F; Small Form Variants  
  FE70..FEFE; Arabic Presentation Forms-B  
  FEFF..FEFF; Specials  
  FF00..FFEF; Halfwidth and Fullwidth Forms  
  FFF0..FFFD; Specials  
  10300..1032F; Old Italic  
  10330..1034F; Gothic  
  10400..1044F; Deseret  
  1D000..1D0FF; Byzantine Musical Symbols  
  1D100..1D1FF; Musical Symbols  
  1D400..1D7FF; Mathematical Alphanumeric Symbols  
  20000..2A6D6; CJK Unified Ideographs Extension B  
  2F800..2FA1F; CJK Compatibility Ideographs Supplement  
  E0000..E007F; Tags  
  F0000..FFFFD; Private Use  
  100000..10FFFD; Private Use  
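
The table above can also be queried from code. The following is a minimal sketch, assuming the JDK's `Character.UnicodeBlock.of(int)` is an acceptable source of block data; the class name `BlockLookup` and the helper `blockName` are hypothetical names introduced only for illustration. The JDK follows the Unicode version it was built against, so the names and ranges it reports may differ from the list above.

```java
class BlockLookup {
    // Returns the JDK's identifier for the Unicode block containing codePoint,
    // or "(no block)" when the JDK does not place it in any block.
    // blockName is a hypothetical helper, not part of any library.
    static String blockName(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == null ? "(no block)" : block.toString();
    }

    public static void main(String[] args) {
        // A few sample code points taken from ranges in the table above.
        int[] samples = {0x0041, 0x0416, 0x4E2D, 0xAC00, 0x20000};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %s%n", cp, blockName(cp));
        }
    }
}
```

`Character.UnicodeBlock.of` takes an `int` code point, so it also works for supplementary characters such as those in 20000..2A6D6, which do not fit in a single `char`.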

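The JDK extends the original range \u4e00~\u9fff.

An adaptive version: String regex = "[\\p{InCJK Unified Ideographs}&&\\P{Cn}]";

Intersecting the block with \P{Cn} (everything except unassigned code points) keeps the match in step with the Unicode version that the running JDK supports, instead of hard-coding an end point.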
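
A short sketch of how such a character class might be used, assuming the goal is to count CJK ideographs in a string. The class `CjkRegexDemo` and the method `countCjk` are hypothetical names for illustration. The block is written here in its space-free form, `InCJKUnifiedIdeographs`, and the sample text uses `\u` escapes so the source file does not depend on the compiler's default encoding.

```java
import java.util.regex.Pattern;

class CjkRegexDemo {
    // Block membership intersected with "not unassigned" (\P{Cn}), so the
    // matched set follows the Unicode version of the running JDK.
    static final Pattern CJK =
            Pattern.compile("[\\p{InCJKUnifiedIdeographs}&&\\P{Cn}]");

    // Counts the code points in text that match the CJK pattern.
    // countCjk is a hypothetical helper, not part of any library.
    static long countCjk(String text) {
        return text.codePoints()
                .filter(cp -> CJK.matcher(new String(Character.toChars(cp))).matches())
                .count();
    }

    public static void main(String[] args) {
        // Two ideographs (U+7F16, U+7801) surrounded by Latin text.
        String sample = "Unicode \u7f16\u7801 abc";
        System.out.println(countCjk(sample));
    }
}
```

Testing one code point at a time keeps the example simple; for large inputs, a single `Matcher` with `find()` over the whole string would avoid creating a new string per character.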
