Error when an XPath expression in Scrapy contains Chinese characters

This topic was created 2794 days ago; the information in it may have changed since then.

Problem description

```Python
links = sel.xpath('//i[contains(@title,"置顶")]/following-sibling::a/@href').extract()
```

Error: ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
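This message is raised by lxml, which Scrapy's selectors are built on: it accepts a Unicode string, but a byte string only if every byte is ASCII. The replies suggest the question runs under Python 2, where a plain literal such as `'置顶'` is a byte string (the encoded bytes from the source file), so the query is rejected. The function below is a hypothetical, simplified restatement of that rule in Python 3 terms (`str` for Unicode, `bytes` for byte strings); it is not lxml's code and glosses over its exact character checks.

```python
# Hypothetical, simplified version of the rule quoted in the error message:
# "Unicode or ASCII, no NULL bytes or control characters".
# This is NOT lxml's implementation; it only illustrates the idea.

def is_xml_compatible(s):
    """Return True if `s` would pass this simplified check."""
    if isinstance(s, bytes):
        # Byte strings are only accepted when every byte is ASCII.
        if any(b > 0x7F for b in s):
            return False
        text = s.decode("ascii")
    elif isinstance(s, str):
        text = s
    else:
        raise TypeError("expected str or bytes")
    # Reject NULL and other C0 control characters, except tab, LF and CR.
    return all(ord(ch) >= 0x20 or ch in "\t\n\r" for ch in text)


query = '//i[contains(@title,"置顶")]/following-sibling::a/@href'

# Python 3 str (what Python 2 calls unicode): accepted.
print(is_xml_compatible(query))                  # True
# UTF-8 bytes (what a plain Python 2 literal holds): rejected.
print(is_xml_compatible(query.encode("utf-8")))  # False
# Pure-ASCII bytes are still fine.
print(is_xml_compatible(b"//a/@href"))           # True
```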

4 replies

revotu

2017-06-28 19:18:37 +08:00

See the article: [Solving the error caused by Chinese characters in XPath in Scrapy][1]

## Solutions ##
Method 1: make the whole XPath expression a Unicode string (add the `u` prefix)
```Python
links = sel.xpath(u'//i[contains(@title,"置顶")]/following-sibling::a/@href').extract()
```
Method 2: build the XPath expression from a `title` variable that already holds a Unicode string
```Python
title = u"置顶"
links = sel.xpath('//i[contains(@title,"%s")]/following-sibling::a/@href' %(title)).extract()
```
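A caveat with method 2: `%` pastes the value into the expression verbatim, so a title containing `"` would end the string literal early and break the query. XPath 1.0 string literals have no escape syntax, so a common workaround is to quote with whichever quote character the value lacks, or to build the value with `concat()` when it contains both. `xpath_literal` below is a hypothetical helper sketching that workaround, not part of Scrapy; method 3 sidesteps the issue because the value is bound as a variable rather than parsed as part of the expression.

```python
def xpath_literal(value):
    """Return an XPath 1.0 expression that evaluates to the string `value`.

    Hypothetical helper: XPath 1.0 has no escape sequences inside string
    literals, so pick a quote character the value does not contain, and
    fall back to concat() when it contains both kinds of quote.
    """
    if '"' not in value:
        return '"%s"' % value
    if "'" not in value:
        return "'%s'" % value
    # Split on double quotes; wrap each chunk in double quotes and
    # re-insert every removed double quote as the literal '"'.
    parts = value.split('"')
    pieces = []
    for i, part in enumerate(parts):
        if i > 0:
            pieces.append("'\"'")
        if part:
            pieces.append('"%s"' % part)
    return "concat(%s)" % ", ".join(pieces)


title = u"置顶"
query = '//i[contains(@title,%s)]/following-sibling::a/@href' % xpath_literal(title)
print(query)  # //i[contains(@title,"置顶")]/following-sibling::a/@href
```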
Method 3: use an XPath variable directly (a `$` sign followed by the variable name, here `$title`) and pass `title` as a keyword argument
```Python
links = sel.xpath('//i[contains(@title,$title)]/following-sibling::a/@href', title="置顶").extract()
```

[1]: http://www.revotu.com/solve-unicode-erros-using-xpath-in-scrapy.html

bsns

2017-06-28 20:36:27 +08:00

I usually just add the `u` prefix.

mingyun

2017-06-28 23:58:04 +08:00

@revotu nice

NaVient

2017-06-29 09:15:09 +08:00

For a standalone crawler project, use Python 3.
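The Python 3 advice removes the root cause: in Python 3 every `str` literal is already Unicode, so the query from the question needs no prefix. The `u` prefix from method 1 is still accepted (PEP 414 brought it back in Python 3.3), so code written that way keeps working after the switch. A quick check, assuming Python 3.3 or later:

```python
# Under Python 3, a plain literal and a u-prefixed literal are the same str.
plain = '//i[contains(@title,"置顶")]/following-sibling::a/@href'
prefixed = u'//i[contains(@title,"置顶")]/following-sibling::a/@href'

print(type(plain).__name__)  # str
print(plain == prefixed)     # True

# Only an explicit encode() yields non-ASCII bytes, the situation that
# triggered the error under Python 2.
print(type(plain.encode("utf-8")).__name__)  # bytes
```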