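UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode
Question:
I know many people have run into this error before, but I couldn't find a solution to my problem.
I have a URL that I want to normalize: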
import urllib
from urlparse import urlsplit

url = u"http://www.dgzfp.de/Dienste/Fachbeitr%C3%A4ge.aspx?EntryId=267&Page=5"
scheme, host_port, path, query, fragment = urlsplit(url)
path = urllib.unquote(path)
path = urllib.quote(path, safe="%/")
This gives the following error message:
/usr/lib64/python2.6/urllib.py:1236: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  res = map(safe_map.__getitem__, s)
Traceback (most recent call last):
  File "url_normalization.py", line 246, in <module>
    logging.info(get_canonical_url(url))
  File "url_normalization.py", line 102, in get_canonical_url
    path = urllib.quote(path,safe="%/")
  File "/usr/lib64/python2.6/urllib.py", line 1236, in quote
    res = map(safe_map.__getitem__, s)
KeyError: u'\xc3'
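I tried removing the unicode prefix "u" from the URL string, and then I no longer got the error. But since I read the unicode string directly from a database, how can I get rid of it automatically?
Answer: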
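urllib.quote() does not handle unicode strings correctly. In Python 2, unquoting a unicode string turns each %XX escape into a separate code point, so %C3%A4 becomes u'\xc3\xa4' instead of the UTF-8 bytes for "ä". quote() then looks up each character in a table keyed by byte strings; the non-ASCII character u'\xc3' cannot be compared with those keys (hence the UnicodeWarning), so the lookup raises KeyError. To fix this, call the .encode() method on the URL when you read it (or on whatever variable you read from the database), i.e. run url = url.encode('utf-8'). With this you get: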
import urllib
from urlparse import urlsplit

url = u"http://www.dgzfp.de/Dienste/Fachbeitr%C3%A4ge.aspx?EntryId=267&Page=5"
# Convert the unicode object to a UTF-8 byte string before quoting
url = url.encode('utf-8')
scheme, host_port, path, query, fragment = urlsplit(url)
# unquote/quote now operate on bytes, so %C3%A4 round-trips unchanged
path = urllib.unquote(path)
path = urllib.quote(path, safe="%/")
Then the output of the path variable will be:
>>> path
'/Dienste/Fachbeitr%C3%A4ge.aspx'
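To apply the encode step automatically to every value read from the database, it can be folded into a small helper. The sketch below uses a hypothetical function, canonical_path(), which is not part of urllib; it assumes URLs arrive either as unicode text or as UTF-8 byte strings. It is written to run on both Python 2 and Python 3 (on Python 3, urllib.parse.quote() already encodes text as UTF-8, so no manual encode is needed there).

```python
import sys

try:  # Python 3
    from urllib.parse import quote, unquote, urlsplit
except ImportError:  # Python 2
    from urllib import quote, unquote
    from urlparse import urlsplit


def canonical_path(url):
    """Return the re-quoted path of url (hypothetical helper, a sketch)."""
    if sys.version_info[0] < 3:
        # Python 2: quote()/unquote() expect byte strings, so encode
        # unicode text (e.g. from a database driver) to UTF-8 first.
        if isinstance(url, unicode):  # name exists only on Python 2
            url = url.encode('utf-8')
    elif isinstance(url, bytes):
        # Python 3: quote()/unquote() work on text, so decode bytes.
        url = url.decode('utf-8')
    path = urlsplit(url).path
    return quote(unquote(path), safe="%/")


db_value = u"http://www.dgzfp.de/Dienste/Fachbeitr%C3%A4ge.aspx?EntryId=267&Page=5"
print(canonical_path(db_value))  # /Dienste/Fachbeitr%C3%A4ge.aspx
```

Because the type check happens inside the helper, the caller can pass the database value as-is, whether it is unicode or already encoded.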
Does this work for you?