(C#) Encoding.
Published: 2019-06-09

File.ReadAllText("D:\\test.txt", Encoding.GetEncoding(936)).Contains(@"这是简体中文")

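In the .NET world, a string is always Unicode (UTF-16 internally). So when you read a TXT file line by line and inspect its content, the file's bytes must first be decoded with the encoding the file was saved in.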

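// Pass the file's encoding explicitly; code page 936 is Simplified Chinese (GBK).
foreach (string line in File.ReadAllLines("D:\\test.txt", Encoding.GetEncoding(936)))
{
    // Print each decoded line.
    Console.WriteLine("{0}", line);
}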
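The effect of choosing the right encoding can be checked with a small round trip. The following is a minimal sketch, not the original author's code: it assumes .NET Core / .NET 5+ (where legacy code pages must be registered through `CodePagesEncodingProvider`), it is written as C# 9 top-level statements, and the temp file name is hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// On .NET Core / .NET 5+, legacy code pages such as 936 are only available
// after registering this provider; on .NET Framework they are built in.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding gbk = Encoding.GetEncoding(936);

// Hypothetical sample file, written as code page 936 bytes so the demo is self-contained.
string path = Path.Combine(Path.GetTempPath(), "encoding-demo-936.txt");
File.WriteAllText(path, "这是简体中文\r\nsecond line", gbk);

// Decoding with the matching encoding recovers the original text.
string decoded = File.ReadAllText(path, gbk);
bool found = decoded.Contains("这是简体中文");

// Without an encoding argument the bytes are read as UTF-8 (there is no BOM),
// so the Chinese characters are not recovered.
string misdecoded = File.ReadAllText(path);
bool foundWithoutEncoding = misdecoded.Contains("这是简体中文");

Console.WriteLine(found);                // True
Console.WriteLine(foundWithoutEncoding); // False

File.Delete(path);
```

The same `Encoding` argument is accepted by `File.ReadAllLines`, so the loop above behaves the same way line by line.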

 

For the specific encodings, see the Encoding class on MSDN.

As defined by Microsoft, a locale is either a language or a language in combination with a country.


Locale | LCID (decimal) | LCID (hex) | Codepage | Country code
Telugu 1098 044a 0 IND
Gujarati 1095 0447 0 IND
Punjabi 1094 0446 0 IND
Sanskrit 1103 044f 0 IND
Konkani 1111 0457 0 IND
Syriac 1114 045a 0 SYR
Kannada 1099 044b 0 IND
Marathi 1102 044e 0 IND
Divehi 1125 0465 0 MDV
Armenian 1067 042b 0 ARM
Hindi 1081 0439 0 IND
Georgian 1079 0437 0 GEO
Tamil 1097 0449 0 IND
Thai 1054 041e 874 THA
Japanese 1041 0411 932 JPN
Chinese (PRC) 2052 0804 936 CHN
Chinese (Singapore) 4100 1004 936 SGP
Korean 1042 0412 949 KOR
Chinese (Macau S.A.R.) 5124 1404 950 MCO
Chinese (Hong Kong S.A.R.) 3076 0c04 950 HKG
Chinese (Taiwan) 1028 0404 950 TWN
Romanian 1048 0418 ROM
Slovenian 1060 0424 SVN
Hungarian 1038 040e HUN
Slovak 1051 041b SVK
Polish 1045 0415 POL
Albanian 1052 041c ALB
Serbian (Latin) 2074 081a SPB
Croatian 1050 041a HRV
Czech 1029 0405 CZE
Mongolian (Cyrillic) 1104 0450 MNG
FYRO Macedonian 1071 042f MKD
Uzbek (Cyrillic) 2115 0843 UZB
Ukrainian 1058 0422 UKR
Azeri (Cyrillic) 2092 082c AZE
Tatar 1092 0444 RUS
Kazakh 1087 043f KAZ
Belarusian 1059 0423 BLR
Kyrgyz (Cyrillic) 1088 0440 KGZ
Bulgarian 1026 0402 BGR
Serbian (Cyrillic) 3098 0c1a SPB
Russian 1049 0419 RUS
English (Jamaica) 8201 2009 JAM
French (Canada) 3084 0c0c CAN
French (France) 1036 040c FRA
French (Luxembourg) 5132 140c LUX
English (New Zealand) 5129 1409 NZL
English (Ireland) 6153 1809 IRL
Dutch (Netherlands) 1043 0413 NLD
English (Caribbean) 9225 2409 CAR
French (Switzerland) 4108 100c CHE
English (Canada) 4105 1009 CAN
Galician 1110 0456 ESP
English (Belize) 10249 2809 BLZ
German (Austria) 3079 0c07 AUT
French (Monaco) 6156 180c MCO
English (Zimbabwe) 12297 3009 ZWE
Basque 1069 042d ESP
Dutch (Belgium) 2067 0813 BEL
French (Belgium) 2060 080c BEL
Finnish 1035 040b FIN
Faroese 1080 0438 FRO
German (Germany) 1031 0407 DEU
English (Australia) 3081 0c09 AUS
English (United States) 1033 0409 USA
English (United Kingdom) 2057 0809 GBR
Catalan 1027 0403 ESP
English (Trinidad) 11273 2c09 TTO
English (South Africa) 7177 1c09 ZAF
Danish 1030 0406 DNK
English (Philippines) 13321 3409 PHL
Spanish (Paraguay) 15370 3c0a PRY
Spanish (Colombia) 9226 240a COL
Spanish (Costa Rica) 5130 140a CRI
Spanish (Dominican Republic) 7178 1c0a DOM
Spanish (Ecuador) 12298 300a ECU
Spanish (El Salvador) 17418 440a SLV
Spanish (Guatemala) 4106 100a GTM
Spanish (Honduras) 18442 480a HND
Spanish (International Sort) 3082 0c0a ESP
Spanish (Chile) 13322 340a CHL
Spanish (Nicaragua) 19466 4c0a NIC
Spanish (Mexico) 2058 080a MEX
Spanish (Peru) 10250 280a PER
Spanish (Puerto Rico) 20490 500a PRI
Spanish (Traditional Sort) 1034 040a ESP
Spanish (Uruguay) 14346 380a URY
Spanish (Venezuela) 8202 200a VEN
Swahili 1089 0441 KEN
Swedish 1053 041d SWE
Swedish (Finland) 2077 081d FIN
German (Liechtenstein) 5127 1407 LIE
Afrikaans 1078 0436 ZAF
Spanish (Panama) 6154 180a PAN
German (Luxembourg) 4103 1007 LUX
Spanish (Bolivia) 16394 400a BOL
German (Switzerland) 2055 0807 CHE
Icelandic 1039 040f ISL
Indonesian 1057 0421 IDN
Italian (Italy) 1040 0410 ITA
Italian (Switzerland) 2064 0810 CHE
Norwegian (Nynorsk) 2068 0814 NOR
Spanish (Argentina) 11274 2c0a ARG
Portuguese (Brazil) 1046 0416 BRA
Norwegian (Bokmal) 1044 0414 NOR
Malay (Malaysia) 1086 043e MYS
Malay (Brunei Darussalam) 2110 083e BRN
Portuguese (Portugal) 2070 0816 PRT
Greek 1032 0408 GRC
Uzbek (Latin) 1091 0443 UZB
Azeri (Latin) 1068 042c AZE
Turkish 1055 041f TUR
Hebrew 1037 040d ISR
Arabic (Algeria) 5121 1401 DZA
Arabic (Bahrain) 15361 3c01 BHR
Arabic (Yemen) 9217 2401 YEM
Arabic (Egypt) 3073 0c01 EGY
Arabic (Iraq) 2049 0801 IRQ
Arabic (Jordan) 11265 2c01 JOR
Arabic (Kuwait) 13313 3401 KWT
Arabic (Lebanon) 12289 3001 LBN
Arabic (Libya) 4097 1001 LBY
Arabic (Morocco) 6145 1801 MAR
Arabic (Oman) 8193 2001 OMN
Arabic (Qatar) 16385 4001 QAT
Arabic (Saudi Arabia) 1025 0401 SAU
Arabic (Syria) 10241 2801 SYR
Arabic (U.A.E.) 14337 3801 ARE
Farsi 1065 0429 IRN
Urdu 1056 0420 PAK
Arabic (Tunisia) 7169 1c01 TUN
Estonian 1061 0425 EST
Latvian 1062 0426 LVA
Lithuanian 1063 0427 LTU
Vietnamese 1066 042a VNM

This table was generated from information at 
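One way to use the table in code is as a lookup from LCID to codepage. The sketch below is an illustration under stated assumptions: the dictionary holds only the rows above that list a non-zero codepage, `EncodingForLcid` is a hypothetical helper, and falling back to UTF-8 for other locales is a choice made for this example. The framework also exposes `CultureInfo.TextInfo.ANSICodePage`, which is not used here so that the example does not depend on the machine's globalization data.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Needed on .NET Core / .NET 5+ for legacy code pages.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// LCID -> codepage pairs copied from the table rows that list a non-zero codepage.
var codepageByLcid = new Dictionary<int, int>
{
    [1054] = 874, // Thai
    [1041] = 932, // Japanese
    [2052] = 936, // Chinese (PRC)
    [4100] = 936, // Chinese (Singapore)
    [1042] = 949, // Korean
    [5124] = 950, // Chinese (Macau S.A.R.)
    [3076] = 950, // Chinese (Hong Kong S.A.R.)
    [1028] = 950, // Chinese (Taiwan)
};

// Hypothetical helper: pick the legacy encoding for a locale,
// falling back to UTF-8 when the dictionary has no entry.
Encoding EncodingForLcid(int lcid) =>
    codepageByLcid.TryGetValue(lcid, out int codepage)
        ? Encoding.GetEncoding(codepage)
        : Encoding.UTF8;

Encoding prc = EncodingForLcid(2052);
Encoding hindi = EncodingForLcid(1081); // codepage 0 in the table, so not in the dictionary

Console.WriteLine($"{prc.CodePage} {hindi.CodePage}"); // 936 65001
```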

Definitions

Locale: A collection of language-related, user-preference information represented as a list of values.

Locale ID (LCID): A 32-bit value defined by Microsoft Windows that consists of a language ID, sort ID, and reserved bits that identify a particular language.
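The LCID definition can be made concrete by splitting a value into its fields. This sketch assumes the Windows LCID layout (bits 0-15 language ID, bits 16-19 sort ID, bits 20-31 reserved) and the usual split of the language ID into a 10-bit primary language and a 6-bit sublanguage. The helper names are hypothetical.

```csharp
using System;

// Assumed layout: bits 0-15 language ID, bits 16-19 sort ID, bits 20-31 reserved.
int LanguageId(int lcid) => lcid & 0xFFFF;
int SortId(int lcid) => (lcid >> 16) & 0xF;

// Within the language ID: low 10 bits are the primary language,
// the next 6 bits are the sublanguage.
int PrimaryLanguage(int lcid) => lcid & 0x3FF;
int SubLanguage(int lcid) => (lcid >> 10) & 0x3F;

int prc = 2052;    // 0x0804, Chinese (PRC)
int taiwan = 1028; // 0x0404, Chinese (Taiwan)

// Both locales share primary language 0x04 and differ only in the sublanguage.
Console.WriteLine($"0x{LanguageId(prc):x4} sort={SortId(prc)} primary={PrimaryLanguage(prc)} sub={SubLanguage(prc)}");
Console.WriteLine($"0x{LanguageId(taiwan):x4} sort={SortId(taiwan)} primary={PrimaryLanguage(taiwan)} sub={SubLanguage(taiwan)}");
```

This is why the hexadecimal column in the table is often easier to read than the decimal one: the low byte shows the language and the upper digits show the regional variant.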

Codepage: "An ordered set of characters in which a numeric index (code point values) is associated with each character. The first 128 characters of each codepage are functionally the same and include all characters needed to type English text. The upper 128 characters of OEM and ANSI codepages contain characters used in a language or group of languages (Taken from Related resources below)".
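The difference between the lower and upper halves of a codepage can be seen by decoding the same byte with two codepages. This is a small sketch assuming .NET Core / .NET 5+ with the code pages provider registered; the two codepages (1252 and 1251) are chosen only for illustration.

```csharp
using System;
using System.Text;

// Needed on .NET Core / .NET 5+ for legacy code pages.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding western = Encoding.GetEncoding(1252);  // Western European
Encoding cyrillic = Encoding.GetEncoding(1251); // Cyrillic

byte[] lower = { 0x41 }; // below 128: same character in both codepages
byte[] upper = { 0xE9 }; // 128 or above: meaning depends on the codepage

string lowerWestern = western.GetString(lower);   // "A"
string lowerCyrillic = cyrillic.GetString(lower); // "A"
string upperWestern = western.GetString(upper);   // "é" (U+00E9)
string upperCyrillic = cyrillic.GetString(upper); // "й" (U+0439)

Console.WriteLine($"{lowerWestern} {lowerCyrillic} {upperWestern} {upperCyrillic}");
```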

 

Character Encoding Recommendation for Language

IANA encoding Java Canonical Name Language Comment
UTF-8 UTF8 8bit Universal character set  
UTF-16 UTF-16 16bit Universal character set  
US-ASCII ASCII American Standard Code for Information Interchange  
windows-1250 Cp1250 Eastern European (Albanian, Croatian, Czech, English, German, Hungarian, Latin, Polish, Romanian, Slovak, Slovenian, Serbian) Windows encoding
windows-1251 Cp1251 Eastern European (Cyrillic-based: Bulgarian, Byelorussian, Macedonian, Russian, Serbian, Ukrainian) Windows encoding
windows-1252 Cp1252 Western European (Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Greenlandic, Icelandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish) Windows encoding
windows-1253 Cp1253 Greek Windows encoding
windows-1254 Cp1254 Turkish Windows encoding
windows-1255 Cp1255 Hebrew Windows encoding
windows-1256 Cp1256 Arabic Windows encoding
windows-1257 Cp1257 Baltic Windows encoding
windows-1258 Cp1258 Vietnamese Windows encoding
ISO-8859-1 ISO8859_1 Western European (Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Greenlandic, Icelandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish) Euro Symbol is not supported
ISO-8859-2 ISO8859_2 Eastern European (Albanian, Croatian, Czech, English, German, Hungarian, Latin, Polish, Romanian, Slovak, Slovenian, Serbian) Euro Symbol is not supported
ISO-8859-3 ISO8859_3 Southeastern European (Afrikaans, Catalan, Dutch, English, Esperanto, German, Italian, Maltese, Spanish, Turkish)  
ISO-8859-4 ISO8859_4 Northern European (Danish, English, Estonian, Finnish, German, Greenlandic, Latin, Latvian, Lithuanian, Norwegian, Sámi, Slovenian, Swedish)  
ISO-8859-5 ISO8859_5 Eastern European (Cyrillic-based: Bulgarian, Byelorussian, Macedonian, Russian, Serbian, Ukrainian)  
ISO-8859-6 ISO8859_6 Arabic  
ISO-8859-7 ISO8859_7 Greek  
ISO-8859-8 ISO8859_8 Hebrew  
ISO-8859-9 ISO8859_9 Western European (Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch, English, Finnish, French, Frisian, Galician, German, Greenlandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish, Turkish)  
ISO-8859-13 ISO8859_13 Baltic Rim (English, Estonian, Finnish, Latin, Latvian, Norwegian)  
ISO-8859-15 ISO8859_15 Western European (Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Greenlandic, Icelandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish) ISO-8859-1 with Euro symbol support
windows-31j MS932 Japanese Windows encoding
EUC-JP EUC_JP Japanese EUC encoding used on Unix platform
Shift_JIS SJIS Japanese Shift JIS, does not support MS external characters
ISO-2022-JP ISO2022JP Japanese JIS X 0201, 0208, in ISO 2022 form, this is used for e-mail
x-mswin-936 MS936 Simplified Chinese Windows encoding, This is not registered in IANA.
GB18030 GB18030 Simplified Chinese PRC standard
x-EUC-CN EUC_CN Simplified Chinese GB2312, EUC encoding
GBK GBK Simplified Chinese  
x-windows-949 MS949 Korean Windows encoding, this is not registered in IANA.
EUC-KR EUC_KR Korean KS C 5601, EUC encoding
x-windows-950 MS950 Traditional Chinese Windows encoding, this is not registered in IANA
x-MS950-HKSCS MS950_HKSCS Traditional Chinese with Hong Kong extensions Windows encoding, this is not registered in IANA
x-EUC-TW EUC_TW Traditional Chinese CNS11643 (Plane 1-3), EUC encoding, this is not registered in IANA
Big5 Big5 Traditional Chinese  
Big5-HKSCS Big5_HKSCS Traditional Chinese Big5 with Hong Kong extensions
TIS-620 TIS620 Thai
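In C#, names from the IANA column can be passed to `Encoding.GetEncoding(string)`. The sketch below tries a handful of them and prints the resulting code page. It assumes .NET Core / .NET 5+ with the code pages provider registered. The Java canonical names in the second column are Java-specific and are not exercised here.

```csharp
using System;
using System.Text;

// Needed on .NET Core / .NET 5+ for legacy code pages.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// A few IANA names taken from the first column of the table above.
string[] names = { "UTF-8", "windows-1251", "Shift_JIS", "GB18030", "Big5" };

var codePages = new int[names.Length];
for (int i = 0; i < names.Length; i++)
{
    // Name lookup is case-insensitive.
    Encoding encoding = Encoding.GetEncoding(names[i]);
    codePages[i] = encoding.CodePage;
    Console.WriteLine($"{names[i]} -> code page {encoding.CodePage}");
}
```

A name that is not recognized makes `GetEncoding` throw an `ArgumentException`, so code that accepts encoding names from user input may want to catch it.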

Reposted from: https://www.cnblogs.com/fdyang/archive/2013/04/20/3032171.html
