links = sel.xpath('//i[contains(@title,"置顶")]/following-sibling::a/@href').extract()
This raises: ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
1
revotu 2017-06-28 19:18:37 +08:00
See the article: [Fixing the error raised when a Scrapy xpath contains Chinese][1]

## Solutions

Method 1: make the whole xpath expression a Unicode string

```Python
links = sel.xpath(u'//i[contains(@title,"置顶")]/following-sibling::a/@href').extract()
```

Method 2: build the xpath expression from a `title` variable that is already Unicode

```Python
title = u"置顶"
links = sel.xpath('//i[contains(@title,"%s")]/following-sibling::a/@href' % title).extract()
```

Method 3: use XPath variable syntax directly (a `$` sign followed by the variable name), here `$title`, and pass `title` as a keyword argument

```Python
links = sel.xpath('//i[contains(@title,$title)]/following-sibling::a/@href', title="置顶").extract()
```

[1]: http://www.revotu.com/solve-unicode-erros-using-xpath-in-scrapy.html
2
bsns 2017-06-28 20:36:27 +08:00
I usually just add the `u` prefix.
4
NaVient 2017-06-29 09:15:09 +08:00
For a standalone crawler project, please use Python 3.
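To illustrate the Python 3 suggestion, here is a minimal standard-library sketch of why the unprefixed literal likely fails on Python 2 but not on Python 3. It does not call Scrapy at all. The variable names are hypothetical, and it assumes the Python 2 source file was saved as UTF-8.

```python
# Sketch only: compares the Unicode text Python 3 passes to xpath()
# with the raw bytes an unprefixed Python 2 literal would have carried.

query = '//i[contains(@title,"置顶")]/following-sibling::a/@href'

# On Python 3 every str literal is Unicode text, so no u prefix is needed.
is_unicode_text = isinstance(query, str)

# On Python 2 the same unprefixed literal is a byte string, roughly
# equivalent to these UTF-8 bytes (assumption: UTF-8 source encoding).
py2_style_bytes = query.encode("utf-8")

# Those bytes include values above 127, so the string is neither
# Unicode nor plain ASCII, which matches the wording of the ValueError.
has_non_ascii_bytes = any(b > 127 for b in py2_style_bytes)

# Decoding recovers the text, which is effectively what the u prefix
# provides on Python 2.
round_trip = py2_style_bytes.decode("utf-8")

print(is_unicode_text, has_non_ascii_bytes, round_trip == query)
```

On Python 3 the original `sel.xpath(...)` call from the question should therefore work unchanged, since the literal is already Unicode.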