Scrapy reports errors while crawling the 阳光政务 site (sun0769.com), but the data still comes out. How do I fix these two errors? The output is as follows:

[scrapy.robotstxt] WARNING: Failure while parsing robots.txt. File either contains garbage or is in an encoding other than UTF-8, treating it as an empty file.
Traceback (most recent call last):
  File "/home/python/.virtualenvs/webspider/lib/python3.5/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/python/.virtualenvs/webspider/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 44, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
  File "/home/python/.virtualenvs/webspider/lib/python3.5/site-packages/twisted/internet/defer.py", line 1362, in returnValue
    raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: <200 http://www.sun0769.com/error/404.htm>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/python/.virtualenvs/webspider/lib/python3.5/site-packages/scrapy/robotstxt.py", line 15, in decode_robotstxt
    robotstxt_body = robotstxt_body.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 327: invalid start byte

{'content': '东莞南城周溪东径北街 6 号天台严重违建,现在还出租了,没有跟进后续情况',
 'content_img': [],
 'href': 'http://wz.sun0769.com/html/question/201911/436799.shtml',
 'publish_date': '2019-11-25 11:58:44',
 'title': '东莞南城周溪东径北街 6 号天台严重违建现在还出租了,相关部门没有跟进后续情况'}

The last block is the scraped data.
1
zdnyp 2019-11-25 14:04:09 +08:00
Failure while parsing robots.txt. File either contains garbage or is in an encoding other than UTF-8, treating it as an empty file.
You can set ROBOTSTXT_OBEY to False in settings.py.
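For reference, the change suggested above would look roughly like this. It is a minimal sketch of a settings.py fragment, assuming a standard Scrapy project layout; ROBOTSTXT_OBEY is Scrapy's built-in setting name, and the rest of the file is left as generated.

```python
# settings.py (fragment) of the Scrapy project.
# With ROBOTSTXT_OBEY set to False, Scrapy does not download or parse
# robots.txt, so the parsing warning above is no longer triggered.
# Note: this also means the crawler stops honoring the site's robots.txt rules.
ROBOTSTXT_OBEY = False
```

This only silences the warning by skipping the robots.txt request; it does not change how the spider's own pages are fetched or parsed.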
2
yifengs OP Thanks, the error is gone. Was my Scrapy not installed properly? Why would robots.txt fail to parse?
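The traceback itself hints at an answer: the robots.txt request received a 200 response from http://www.sun0769.com/error/404.htm, and that body could not be decoded as UTF-8. The sketch below reproduces the same kind of failure with plain Python. It assumes, without confirmation from the thread, that the error page is served in a GBK-family encoding; the helper name decode_as_utf8_or_empty is hypothetical and only mimics the "treat it as an empty file" fallback described in the warning, not Scrapy's actual code.

```python
def decode_as_utf8_or_empty(body: bytes) -> str:
    """Decode bytes as UTF-8; on failure return an empty string.

    Hypothetical helper that mirrors the fallback described in the warning.
    """
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        return ""


# Assumption: the 404 page contains Chinese text encoded as GBK.
gbk_body = "东莞".encode("gbk")

# Decoding GBK bytes as UTF-8 raises UnicodeDecodeError, as in the traceback.
try:
    gbk_body.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
print(repr(decode_as_utf8_or_empty(gbk_body)))

# Decoding with the encoding the bytes were written in works fine.
print(gbk_body.decode("gbk"))
```

A byte such as 0xb6 has the bit pattern of a UTF-8 continuation byte, so it can never start a character, which is what "invalid start byte" means. On this reading the Scrapy installation is fine; the site simply returned a non-UTF-8 error page where robots.txt was expected.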
3
yifengs OP Oh, I see now, the robots protocol does not allow it. Thanks!