ASPN : Python Cookbook : Visualize unicode strings

 accesine 2005-12-01

Source:

text=u"""Europython 2005
G\u00f6teborg, Sweden
\u8463\u5049\u696d
Hotel rates 100\N{euro sign}
"""

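import codecs

def printu(ustr):
    # Characters above U+00FF are printed as \uXXXX escapes
    print ustr.encode('raw_unicode_escape')

def saveu(ustr, filename='output.txt'):
    # Prepend the UTF-8 BOM so that viewers can detect the encoding
    open(filename, 'wb').write(codecs.BOM_UTF8 + ustr.encode('utf8'))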

Discussion:

Someday all software, including consoles and text editors, will fully support Unicode and display any language effortlessly. Until then we have to settle for consoles that work with 8-bit characters only. Here I show a few tricks that help with displaying Unicode in Python.

First of all, I have defined a variable 'text' above as a sample. It is a Unicode string that contains characters from several languages. In Python the 'u' or 'U' prefix denotes a Unicode string. Characters outside ASCII can be entered using the '\uXXXX' escape sequence or, by Unicode character name, the '\N{name}' notation.
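The two notations can be checked against each other. This is a minimal sketch written for Python 3 (an assumption; the recipe itself targets Python 2), where every string literal is already Unicode and the 'u' prefix is optional. The variable names are illustrative only.

```python
import unicodedata

# Two spellings of the same character: a \uXXXX escape and a \N{name} lookup.
euro_by_code = "\u20ac"
euro_by_name = "\N{EURO SIGN}"

print(euro_by_code == euro_by_name)   # True
print(hex(ord(euro_by_name)))         # 0x20ac

# unicodedata.name() goes the other way, from a character to its name.
print(unicodedata.name("\u00f6"))     # LATIN SMALL LETTER O WITH DIAERESIS
```

Only ASCII is printed here, so the sketch runs even on a console that cannot display the characters themselves.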

If we just try 'print text', we run into the dreaded UnicodeEncodeError. Since the console in general supports only ASCII characters, Python automatically transforms Unicode strings into ASCII before printing. Any character that falls outside the ASCII range, like \u8463, causes an exception.
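The failing step can be reproduced by doing the ASCII encoding explicitly. The following is a Python 3 sketch (an assumption; in Python 2 the same encoding happens implicitly inside 'print'), using a shortened sample string rather than the full 'text' variable.

```python
text = "G\u00f6teborg \u8463 100\u20ac"

failure = None
try:
    text.encode("ascii")          # the default error handler is 'strict'
except UnicodeEncodeError as exc:
    failure = exc
    # exc.start is the index of the first character that could not be encoded
    print(exc.encoding, exc.start, ascii(text[exc.start]))
```

The exception reports the first offending character, which in this sample is the 'ö' at index 1 rather than the Chinese character further along.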

One simple way to see at least some result is to use 'replace' as the error-handling method, as opposed to the default 'strict', when encoding. For example,

>>> print text.encode('ascii','replace')
Europython 2005
G?teborg, Sweden
???
Hotel rates 100?

The characters that cannot be represented in ASCII are turned into '?'. The result is a corrupted string, but I still prefer this to not showing anything at all. Replacing non-ASCII characters with '?' is a quick and dirty trick, though, and sometimes you really need to know what the characters are. The printu() function uses a little-known internal encoding scheme, 'raw_unicode_escape', to render the string:

>>> printu(text)
Europython 2005
Göteborg, Sweden
\u8463\u5049\u696d
Hotel rates 100\u20ac

Characters that cannot be displayed in the console are shown as \u escape sequences, so you can verify that the euro sign U+20AC is correctly represented. The text can also be easily copied and pasted into a string literal to reconstruct the string.
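The claim that the escaped text can rebuild the string can be checked with a round trip. This is a Python 3 sketch (an assumption; there 'encode' returns bytes instead of a Python 2 str). The 'backslashreplace' comparison at the end is not part of the recipe; it is included only as a related option that yields pure ASCII.

```python
text = "G\u00f6teborg, Sweden\n\u8463\u5049\u696d\nHotel rates 100\u20ac\n"

# raw_unicode_escape keeps characters up to U+00FF as single Latin-1 bytes
# and writes everything above that as a \uXXXX escape.
escaped = text.encode("raw_unicode_escape")
print(escaped)

# Decoding with the same codec rebuilds the original string.
restored = escaped.decode("raw_unicode_escape")
print(restored == text)

# backslashreplace escapes every non-ASCII character, 'ö' included.
ascii_only = text.encode("ascii", "backslashreplace")
print(ascii_only)
```

One caveat: raw_unicode_escape does not escape backslashes, so a string that already contains a literal backslash followed by 'u' and four hex digits would not survive this round trip unchanged.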

To actually see the sample rendered we need software that supports displaying Unicode. The good old vi will not do. I highly recommend EmEditor, a Windows shareware editor (http://www./). It is by far the best at handling various character encodings and fonts. Otherwise, web browsers are also very good at rendering Unicode text. First use saveu() to dump the string into a file:

>>> saveu(text)

Next open the file 'output.txt' with your browser. The characters should show up there. If you do not have time to run the examples, I have posted a copy of the output at http:///2005/sample_utf8.txt. saveu() writes the file in the common UTF-8 encoding. The codecs.BOM_UTF8 prefix is a 3-byte magic number that marks the file as Unicode text encoded in UTF-8. The BOM is optional, but in this case it helps the browser detect the encoding correctly.
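To see what the BOM looks like on disk, the file can be written and read back as raw bytes. This is a Python 3 sketch (an assumption) that writes into a temporary directory rather than the current one. The 'utf-8-sig' codec used for reading is a standard codec that strips a leading BOM; it is not something the recipe itself uses.

```python
import codecs
import os
import tempfile

text = "Hotel rates 100\u20ac\n"

# Write BOM + UTF-8 bytes, the same layout saveu() produces.
path = os.path.join(tempfile.mkdtemp(), "output.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + text.encode("utf-8"))

with open(path, "rb") as f:
    raw = f.read()

print(codecs.BOM_UTF8)                   # b'\xef\xbb\xbf'
print(raw.startswith(codecs.BOM_UTF8))   # True
print(raw.decode("utf-8-sig") == text)   # True: the BOM is stripped on decode
```

Decoding the same bytes with plain 'utf-8' would leave U+FEFF as the first character of the string, which is why 'utf-8-sig' is the safer choice when a BOM may be present.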
