Python Encoding Conversion

September 5, 2014 / Category: General

See: http://www.pythonclub.org/python-basic/codec

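This post covers Python's encoding mechanism and how to convert between unicode, UTF-8, UTF-16, GBK, GB2312, ISO-8859-1, and other encodings. The examples use Python 2 syntax (the print statement and the separate str and unicode types).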

Common encoding conversions fall into the following cases:
1. Automatically detecting a string's encoding:

# coding: utf8
# chardet can be downloaded from http://pypi.python.org/pypi/chardet

import urllib
import chardet

# Fetch the raw bytes of a page and let chardet guess their encoding
rawdata = urllib.urlopen('http://www.google.cn/').read()
print chardet.detect(rawdata)

Output:

# confidence is how sure chardet is of its guess; encoding is the detected encoding
{'confidence': 0.99, 'encoding': 'utf-8'}
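
When chardet is not installed, a cruder fallback is to try a short list of candidate encodings in order and keep the first one that decodes without error. The sketch below is written for Python 3 and is not how chardet works (chardet detects encodings statistically). The helper name `guess_encoding` and the default candidate list are assumptions made for illustration.

```python
def guess_encoding(raw, candidates=("utf-8", "gbk")):
    """Return (encoding, text) for the first candidate that decodes raw cleanly.

    Hypothetical helper: a trial-decoding fallback, not chardet's detection.
    """
    for encoding in candidates:
        try:
            return encoding, raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # None of the candidates fit; the caller decides what to do next
    return None, None


utf8_bytes = "中文".encode("utf-8")
gbk_bytes = "中文".encode("gbk")
print(guess_encoding(utf8_bytes)[0])
print(guess_encoding(gbk_bytes)[0])
```

The order of the candidates matters. UTF-8 is strict and rejects most byte sequences produced by other encodings, so it goes first; a permissive codec placed first could accept bytes that belong to another encoding and return garbled text.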

2. Converting unicode to another encoding

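# coding: utf8

a = u'中文'
# encode() turns a unicode string into a byte string in the given encoding
a_gb2312 = a.encode('gb2312')
print a_gb2312

Output (on a console that uses a GB2312-compatible encoding): 中文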
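
For comparison, here is a minimal Python 3 sketch of the same step. In Python 3 every `str` is already Unicode text, so there is no `u''` prefix, and `encode()` returns a `bytes` object. The variable names and the choice of ISO-8859-1 as a failing target are assumptions for illustration.

```python
text = "中文"  # in Python 3, str is already Unicode text

gb2312_bytes = text.encode("gb2312")
utf8_bytes = text.encode("utf-8")
print(gb2312_bytes)  # two bytes per character in GB2312
print(utf8_bytes)    # three bytes per character in UTF-8

# ISO-8859-1 has no Chinese characters, so a strict encode raises an error
try:
    text.encode("iso-8859-1")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)

# errors="replace" substitutes "?" for each character that cannot be encoded
lossy = text.encode("iso-8859-1", errors="replace")
print(lossy)
```

The same text yields different byte sequences in different encodings, which is why the target encoding always has to be named explicitly.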

3. Converting another encoding to unicode

# coding: utf8

a = u'中文'
a_gb2312 = a.encode('gb2312')
print a_gb2312

# a_gb2312 is GB2312-encoded; to convert it to unicode, use
# unicode(a_gb2312, 'gb2312') or a_gb2312.decode('gb2312')
print [unicode(a_gb2312,'gb2312')]
print [a_gb2312.decode('gb2312')]

Output:

中文
[u'\u4e2d\u6587']
[u'\u4e2d\u6587']
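
The decoding step can be sketched in Python 3 as follows, where `bytes.decode()` plays the role of `unicode(...)`. The example also shows what happens when the wrong codec is named, which is the usual source of decoding errors. The variable names are assumptions for illustration.

```python
gb2312_bytes = b"\xd6\xd0\xce\xc4"  # "中文" encoded as GB2312

text = gb2312_bytes.decode("gb2312")
print(ascii(text))  # show the code points without relying on the console encoding

# Decoding with the wrong codec fails, because these bytes are not valid UTF-8
try:
    gb2312_bytes.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False

# errors="replace" keeps going and inserts U+FFFD where bytes cannot be decoded
patched = gb2312_bytes.decode("utf-8", errors="replace")
```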

4. Converting between non-unicode encodings

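# coding: utf8

a = u'中文'
a_gb2312 = a.encode('gb2312')
print a_gb2312

# To convert from encoding 1 to encoding 2, decode to unicode first,
# then encode the unicode string as encoding 2
a_unicode = a_gb2312.decode('gb2312')
print [a_unicode]
a_utf8 = a_unicode.encode('utf8')

# The DOS console does not recognize UTF-8, so printing the bytes
# directly would show garbled text; print them inside a list instead
print [a_utf8]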
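
The two-step conversion above (decode to Unicode, then encode again) can be wrapped in a small function. This is a minimal Python 3 sketch; the name `transcode` and its parameters are hypothetical.

```python
def transcode(data, source_encoding, target_encoding):
    """Re-encode bytes from one encoding to another by way of Unicode text."""
    text = data.decode(source_encoding)   # bytes -> str
    return text.encode(target_encoding)   # str -> bytes


gb2312_bytes = "中文".encode("gb2312")
utf8_bytes = transcode(gb2312_bytes, "gb2312", "utf-8")
print(utf8_bytes)
```

Going through Unicode means a program needs only one decoder and one encoder per encoding, rather than a dedicated converter for every pair.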

5. Checking whether a string is already unicode

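# coding: utf8

# isinstance(s, str) checks whether s is an ordinary (byte) string
# isinstance(s, unicode) checks whether s is a unicode string
# If a string is already unicode, converting it to unicode again
# sometimes raises an error (though not always), so check the type first

def u(s, encoding):
    if isinstance(s, unicode):
        return s
    else:
        return unicode(s, encoding)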
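
In Python 3 the same guard is written against `str` and `bytes`. The following is a minimal sketch; the name `to_text` and the UTF-8 default are assumptions, not part of the original post.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it only when it is bytes."""
    if isinstance(value, str):
        return value  # already Unicode text, nothing to do
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


print(ascii(to_text(b"\xd6\xd0\xce\xc4", "gb2312")))
```

Calling such a helper at the boundaries of a program (file reads, network input) lets the rest of the code work with text only.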

6. Converting Chinese characters to %uXXXX unicode escapes

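# coding: utf8

# (Original note: I did not fully understand this method; keeping it for reference.)
name = '中国'
# The source file is UTF-8, so decode the byte string to unicode first
name = name.decode('utf8')
print name
tmpname = ""

for c in name:
    # ord(c) is the character's code point; "%%u%04X" formats it as
    # a literal "%u" followed by four uppercase hex digits
    c = "%%u%04X" % ord(c)
    tmpname += c

print tmpname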

Output:

中国
%u4E2D%u56FD
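
The loop above writes each character as `%u` followed by its code point in four hex digits, which matches the format produced by JavaScript's legacy `escape()` function. Below is a minimal Python 3 sketch of that conversion together with its inverse. The function names are hypothetical, and the sketch assumes every character lies in the Basic Multilingual Plane (code point at most 0xFFFF).

```python
import re


def to_percent_u(text):
    """Format each character as %u plus its code point in four uppercase hex digits."""
    # Assumption: all characters have code points <= 0xFFFF
    return "".join("%%u%04X" % ord(ch) for ch in text)


def from_percent_u(escaped):
    """Turn every %uXXXX sequence back into the character it stands for."""
    return re.sub(r"%u([0-9A-Fa-f]{4})",
                  lambda m: chr(int(m.group(1), 16)),
                  escaped)


escaped = to_percent_u("中国")
print(escaped)
```

Because `from_percent_u` only touches `%uXXXX` sequences, any other characters in the input pass through unchanged.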


Author: sunk

Posted by sunk in the General category; last updated on September 5, 2014.
