Asking for help: a beginner practicing scraping a weather site - V2EX

This topic was created 2665 days ago; the information in it may have since evolved or changed.

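I'm only scraping simple dates and weather conditions, but the output comes out garbled. This has been bothering me and I haven't been able to fix it.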
#-*- coding:utf-8 -*-
import requests
from lxml import html

r = requests.get("http://www.weather.com.cn/weather/101070601.shtml")
rawdata = html.fromstring(r.text)
in_row = rawdata.xpath('//div[@id="7d"]/ul/li/h1/text() | //div[@id="7d"]/ul/li/p[1]/text()')

for i in in_row:
    intro = i.encode('utf-8')
    print(intro)

The output looks like this:
b'15\xc3\xa6\xc2\x97\xc2\xa5\xc3\xaf\xc2\xbc\xc2\x88\xc3\xa4\xc2\xbb\xc2\x8a\xc3\xa5\xc2\xa4\xc2\xa9\xc3\xaf\xc2\xbc\xc2\x89'
b'\xc3\xa6\xc2\x99\xc2\xb4'
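There is a lot more after this, so I won't paste it all.
--------------------divider------------------------------
By the way, if I print(r.text) directly, all the Chinese text in the output is garbled as well.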
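
The two byte strings pasted above look like UTF-8 text that was decoded as ISO-8859-1 and then encoded to UTF-8 a second time. The following is a minimal sketch for checking that reading offline. The helper name `undo_double_encoding` is hypothetical, and the expected strings are what the bytes decode to under this assumption, not something stated in the post.

```python
def undo_double_encoding(garbled: bytes) -> str:
    """Reverse a UTF-8 -> ISO-8859-1 -> UTF-8 round trip (assumed cause)."""
    # Step 1: decode the printed bytes as UTF-8; this yields the mis-decoded text.
    mojibake = garbled.decode("utf-8")
    # Step 2: map every character back to the single byte it was read from.
    original_bytes = mojibake.encode("iso-8859-1")
    # Step 3: decode those bytes with the encoding the page presumably used.
    return original_bytes.decode("utf-8")


# The two values copied from the output in the post.
first = (b'15\xc3\xa6\xc2\x97\xc2\xa5\xc3\xaf\xc2\xbc\xc2\x88'
         b'\xc3\xa4\xc2\xbb\xc2\x8a\xc3\xa5\xc2\xa4\xc2\xa9\xc3\xaf\xc2\xbc\xc2\x89')
second = b'\xc3\xa6\xc2\x99\xc2\xb4'

recovered = [undo_double_encoding(b) for b in (first, second)]

# True when the bytes really are a doubly encoded date and weather label.
print(recovered == ["15日（今天）", "晴"])
```

If the comparison prints True, the data itself is intact and only the decoding step went wrong, which narrows the problem to how the response body was turned into text.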

4 replies

1

wq67200976

2017-11-15 19:49:08 +08:00

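I've run into this before too. I'm not sure whether it's a Python 2 vs. 3 difference; with 2 there's no problem. If you write the output to a file and open it, it also displays correctly.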

2

hcnhcn012

2017-11-15 21:07:17 +08:00 via iPhone

i.encode('utf-8')

3

Marsss

2017-11-16 09:01:37 +08:00

Don't use r.text. Do this instead: rawdata = html.fromstring(r.content). Then you no longer need i.encode('utf-8') below; just print the values directly.
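
One plausible explanation for why this works, assuming the server's Content-Type header carried no charset: requests falls back to ISO-8859-1 when it builds r.text for a text response without a declared charset, whereas r.content is the untouched byte string. The sketch below simulates both paths with the standard library only. `page_bytes` is a made-up stand-in for r.content, not the real page.

```python
# Simulated response body: the server sends UTF-8 bytes.
# (Hypothetical fragment standing in for r.content.)
page_bytes = "<h1>15日（今天）</h1><p>晴</p>".encode("utf-8")

# Roughly what r.text holds when ISO-8859-1 is used as the fallback encoding.
text_wrong = page_bytes.decode("iso-8859-1")

# What the page says when the same bytes are decoded as UTF-8.
text_right = page_bytes.decode("utf-8")

# The weather label survives only on the UTF-8 path.
print("晴" in text_wrong)
print("晴" in text_right)

# Encoding the mis-decoded label again gives the bytes shown in the question.
reencoded = "晴".encode("utf-8").decode("iso-8859-1").encode("utf-8")
print(reencoded == b'\xc3\xa6\xc2\x99\xc2\xb4')
```

In the thread's terms, html.fromstring(r.content) hands the raw bytes to lxml, so the ISO-8859-1 step never happens and the parser can determine the encoding from the document itself.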

4

lance418

OP

2017-11-16 09:38:38 +08:00

@Marsss Thanks, the output is correct now. Much appreciated, here's a cup of tea for you 0.0
