Hit a bug while scraping CNKI (知网): I have rewritten the XPath matching many times and it still raises an error. Help wanted.

wei6666 · 2018-06-07 09:37:42 +08:00 · 954 views
    Traceback (most recent call last):
      File "China_hownet_journal_end.py", line 296, in <module>
        china_hownet.run()
      File "China_hownet_journal_end.py", line 281, in run
        url_list = self.parse_content_html(html3str)
      File "China_hownet_journal_end.py", line 212, in parse_content_html
        html = etree.HTML(html3str)
      File "lxml.etree.pyx", line 2945, in lxml.etree.HTML (src/lxml/lxml.etree.c:62546)
      File "parser.pxi", line 1617, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:93194)
      File "parser.pxi", line 1488, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:91938)
      File "parser.pxi", line 969, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:88328)
      File "parser.pxi", line 577, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:84385)
      File "parser.pxi", line 676, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:85488)
      File "parser.pxi", line 625, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:84945)
    lxml.etree.XMLSyntaxError: line 1046: htmlParseEntityRef: expecting ';'
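
    The traceback stops inside etree.HTML(html3str), so the failure happens while parsing the page, before any XPath expression is evaluated. The message htmlParseEntityRef: expecting ';' is what libxml2 reports when it meets an "&" that does not start a well-formed entity reference, for example in a URL query string. The page source is not shown in this thread, so the following is only a minimal sketch of one possible pre-processing step, not a confirmed fix: it escapes stray "&" characters using the standard library alone. The helper name escape_bare_ampersands and the sample markup are hypothetical.

```python
import re

# Matches an "&" that does NOT begin a well-formed entity reference
# such as "&amp;", "&#38;" or "&#x26;".
_BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(markup):
    """Replace stray "&" characters with "&amp;" (hypothetical helper)."""
    return _BARE_AMP.sub("&amp;", markup)


# Hypothetical sample markup: the real page from the thread is not available.
sample = '<a href="detail.aspx?dbcode=CJFD&filename=X">A&B &amp; C&#38;D</a>'
cleaned = escape_bare_ampersands(sample)
print(cleaned)
```

    The cleaned string could then be passed to etree.HTML(cleaned) in place of html3str. Another option worth trying is an explicit lenient parser, etree.HTML(html3str, parser=etree.HTMLParser(recover=True)). Whether either resolves the error at line 1046 depends on what that line of the downloaded page actually contains, so inspecting it first is advisable.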
    No replies yet.