1
apoclast 2017-08-31 00:50:30 +08:00
Didn't you say you wanted HTML encoding? Then how about this:
```
';'.join(['&#%04x' % ord(c) for c in "你好啊".decode("utf8")])
```
2
apoclast 2017-08-31 00:51:51 +08:00
Looking more closely, I left out an x:
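```
# Python 3: str is already Unicode (no .decode needed), and the ';' in the format terminates every entity, including the last
''.join('&#x%04x;' % ord(c) for c in "你好啊")  # '&#x4f60;&#x597d;&#x554a;'
```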
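The same idea can be wrapped in a small function. This is a minimal Python 3 sketch, assuming the goal is hexadecimal numeric character references for non-ASCII characters only, with plain ASCII left as-is (the one-liner above escapes every character). The name `to_hex_refs` is hypothetical. Markup characters such as `<` are not touched here, so run `html.escape` first if the text is going into HTML. The standard library's `html.unescape` is used to check the round trip.

```python
import html


def to_hex_refs(text: str) -> str:
    """Replace each non-ASCII character with a hex reference such as &#x4f60;.

    Hypothetical helper: ASCII characters (code points < 128) are kept unchanged.
    """
    return ''.join(c if ord(c) < 128 else '&#x%04x;' % ord(c) for c in text)


encoded = to_hex_refs('你好啊, hi')
print(encoded)                                 # &#x4f60;&#x597d;&#x554a;, hi
print(html.unescape(encoded) == '你好啊, hi')  # True
```

Leaving ASCII alone keeps the output short and readable; drop the `ord(c) < 128` check to escape everything, as the one-liner does.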
3
he458361789 OP @apoclast O(∩_∩)O Thanks, guru!
4
imn1 2017-08-31 07:03:17 +08:00
```
>>> import html, html.entities, re   # HTMLParser().unescape was removed in Python 3.9; html.unescape replaces it
>>> s = html.unescape('&copy; 2010')
>>> s
'© 2010'
>>> print(s)
© 2010
>>> html.unescape('&#169; 2010')
'© 2010'
>>> '袈'.encode('unicode-escape')
b'\\u8888'
>>> chr(int('8888', 16))
'袈'
>>> html.unescape('&hearts;')
'♥'
>>> html.unescape('&#9829;')
'♥'
>>> html.unescape('&#x2665;')
'♥'
>>> '♥'.encode('unicode-escape')
b'\\u2665'
>>> chr(int('2665', 16))
'♥'
>>> html.entities.name2codepoint['hearts']
9829
>>> a = '汉字먀니'.encode('utf-8')   # 汉字 are CJK ideographs, 먀니 is Hangul
>>> re.findall(b'\xe4[\xb8-\xff][\x00-\xff]|[\xe5-\xe8][\x00-\xff][\x00-\xff]|\xe9[\x00-\xbe][\x00-\xff]', a)
[b'\xe6\xb1\x89', b'\xe5\xad\x97']
```
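The byte-level regex above accepts UTF-8 lead bytes 0xE4 to 0xE9, which on valid UTF-8 input corresponds to code points U+4E00 through U+9FBF. Since Python 3 strings are already Unicode, the same filter can be written against the decoded text with a code-point range. A minimal sketch; the range is simply carried over from that byte pattern, so it does not cover the CJK extension blocks.

```python
import re

# Code-point range equivalent to the UTF-8 byte pattern above:
# b'\xe4\xb8\x80' is U+4E00 and b'\xe9\xbe\xbf' is U+9FBF.
CJK = re.compile('[\u4e00-\u9fbf]')

text = '汉字먀니'
print(CJK.findall(text))                               # ['汉', '字']
print([c.encode('utf-8') for c in CJK.findall(text)])  # [b'\xe6\xb1\x89', b'\xe5\xad\x97']
```

Matching on `str` avoids hand-written byte ranges and cannot accidentally match in the middle of a multi-byte sequence.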
5
Sanko 2017-08-31 07:39:20 +08:00 via Android
6
yucongo 2017-09-04 19:16:33 +08:00
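```
import html
html.unescape('&#x4f60;&#x597d;&#x4f60;&#x597d;')  # '你好你好'
'你好你好'.encode('ascii', errors='xmlcharrefreplace').decode('ascii')  # '&#20320;&#22909;&#20320;&#22909;': decimal, whereas the references above are hexadecimal
html.unescape('&#20320;&#22909;&#20320;&#22909;')  # '你好你好'
```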
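The `xmlcharrefreplace` error handler always emits decimal references. If the hexadecimal `&#x...;` form is wanted instead, one option is to post-process its output with `re.sub`. A minimal sketch; the helper name `dec_to_hex_refs` is made up for illustration, and it only rewrites references of the exact form `&#<digits>;`.

```python
import html
import re


def dec_to_hex_refs(text: str) -> str:
    """Rewrite decimal references (&#20320;) as hexadecimal ones (&#x4f60;)."""
    return re.sub(r'&#(\d+);', lambda m: '&#x%x;' % int(m.group(1)), text)


dec = '你好'.encode('ascii', errors='xmlcharrefreplace').decode('ascii')
hexed = dec_to_hex_refs(dec)
print(dec)    # &#20320;&#22909;
print(hexed)  # &#x4f60;&#x597d;
print(html.unescape(dec) == html.unescape(hexed) == '你好')  # True
```

Both forms decode to the same text, so the choice between decimal and hexadecimal is purely cosmetic.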