1
ospider 2017-11-20 23:10:21 +08:00 via iPad
Just render the page directly.
|
2
qwjhb 2017-11-20 23:14:51 +08:00
|
5
xiaobai987 2017-11-21 09:07:11 +08:00
@yeyu1989 Simulate the POST request with Python.
|
6
raighne 2017-11-21 09:26:05 +08:00
Type $('#resultsTable tr') in the console and check whether that is the result you want.
|
7
Marsss 2017-11-21 09:27:49 +08:00
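Replies #1 and #2 have both already given you the answer, and both are correct. #1 means using an automation package such as selenium to drive a browser to the target URL; the browser runs the JS and renders the target data. For implementation details, search for selenium.
#2 means analyzing the HTTP requests: the target data is actually fetched by an XHR that POSTs, with parameters, to https://cn.investing.com/stock-screener/Service/SearchStocks and gets the data back directly. To analyze it yourself, press F12 and look at the Network tab, or capture the traffic through a proxy. |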
8
qwjhb 2017-11-21 09:42:20 +08:00 1
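@yeyu1989 The data you want is not in the page you requested. It is fetched through an API call after the page loads, and the API address is the one I posted above. Build an identical request and you will get the data.
I suggest finding a book on web scraping rather than learning from video tutorials. |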
9
yeyu1989 OP @qwjhb
data={'sp':'country::37|sector::a|industry::a|equityType::a<eq_market_cap;1'}
header=func.randHeader()
s = requests.post('https://cn.investing.com/stock-screener/Service/SearchStocks',params=data,headers=header)
This is what I wrote. Is anything wrong with it? I still get no data... |
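A note on the snippet above: in requests, `params=` is appended to the URL as a query string, while `data=` is form-encoded into the POST body. The following is a minimal sketch of that difference, prepared offline with `Request.prepare()` so nothing is sent. The single `pn` field is only an illustrative stand-in for the real form.

```python
import requests

URL = 'https://cn.investing.com/stock-screener/Service/SearchStocks'
fields = {'pn': '1'}  # illustrative stand-in for the real form fields

# Prepare both variants without sending anything over the network.
as_params = requests.Request('POST', URL, params=fields).prepare()
as_data = requests.Request('POST', URL, data=fields).prepare()

# params= ends up in the URL; the POST body stays empty.
print(as_params.url)
print(as_params.body)

# data= leaves the URL alone and fills the form-encoded body.
print(as_data.url)
print(as_data.body)
print(as_data.headers['Content-Type'])
```

If the endpoint reads its filters from the POST body, as the XHR described in this thread suggests, the `params=` variant would reach the server with no form data at all.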
12
qwjhb 2017-11-21 10:17:13 +08:00
@yeyu1989 Also, the data is not what you wrote. Open F12 and take a look.
The request looks like this:
POST /stock-screener/Service/SearchStocks HTTP/1.1
Host: cn.investing.com
Connection: keep-alive
Content-Length: 909
Accept: application/json, text/javascript, */*; q=0.01
Origin: https://cn.investing.com
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: https://cn.investing.com/stock-screener/?sp=country::37|sector::a|industry::a|equityType::a|exchange::a%3Ceq_market_cap;1
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,zh-TW;q=0.7,ja;q=0.6
Cookie: PHPSESSID=tt0b1qp47ancp40619ftigb2t1; geoC=CN; adBlockerNewUserDomains=1511230139; StickySession=id.70178265937.000.cn.investing.com; adbBLk=6; billboardCounter_6=2; nyxDorf=Y2AxYmYuP2JkNWtgZCkxMjZnYj0%2BJzAzMDRlZw%3D%3D
DNT: 1
The data is:
country[]:37
sector:2,11,7,10,1,4,9,5,8,3,6,12
industry:63,85,82,21,10,86,7,78,36,25,4,28,67,5,71,27,61,90,23,68,34,89,43,50,81,41,56,59,69,9,83,29,52,100,58,95,102,94,60,53,38,87,31,6,16,48,55,74,66,35,65,40,99,42,92,98,39,70,32,45,77,20,54,33,24,72,51,30,64,2,96,8,14,22,26,80,15,37,93,13,46,1,79,44,75,91,49,62,88,12,47,84,57,76,17,97,18,19,3,11,101,73
equityType:ORD,DRC,Preferred,Unit,ClosedEnd,REIT,ELKS,OpenEnd,Right,ParticipationShare,CapitalSecurity,PerpetualCapitalSecurity,GuaranteeCertificate,IGC,Warrant,SeniorNote,Debenture,ETF,ADR,ETC,ETN
exchange[]:54
exchange[]:103
pn:1
order[col]:eq_market_cap
order[dir]:d |
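The form data above sends `exchange[]` twice. A Python dict cannot hold the same key twice, so one way to keep both values is a list of (name, value) pairs. A small stdlib-only sketch follows, using a shortened subset of the fields; the variable names are illustrative.

```python
from urllib.parse import urlencode, parse_qs

# A dict literal with a repeated key silently keeps only the last value.
as_dict = {'exchange[]': '54', 'exchange[]': '103'}
print(as_dict)  # {'exchange[]': '103'}

# A list of pairs preserves every occurrence, in order.
as_pairs = [
    ('country[]', '37'),
    ('exchange[]', '54'),
    ('exchange[]', '103'),
    ('pn', '1'),
    ('order[col]', 'eq_market_cap'),
    ('order[dir]', 'd'),
]

# Form-encode the pairs the way an application/x-www-form-urlencoded body is built.
body = urlencode(as_pairs)
print(body)

# Decoding the body again shows both exchange[] values survived.
decoded = parse_qs(body)
print(decoded['exchange[]'])  # ['54', '103']
```

requests accepts the same list of pairs for its `data=` argument, so the repeated field can be passed through unchanged.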
13
yeyu1989 OP @qwjhb
header={
    'Accept':'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding':'gzip, deflate, br',
    'Accept-Language':'zh-CN,zh;q=0.9',
    'Connection':'keep-alive',
    'Content-Length':'909',
    'Content-Type':'application/x-www-form-urlencoded',
    'Host':'cn.investing.com',
    'Origin':'https://cn.investing.com',
    'Referer':'https://cn.investing.com/stock-screener/?sp=country::37|sector::a|industry::a|equityType::a|exchange::a%3Ceq_market_cap;1',
    'User-Agent':'Opera/8.0 (Macintosh; PPC Mac OS X; U; en)',
    'X-Requested-With':'XMLHttpRequest'
}
data={
    'country[]':'37',
    'sector':'2,11,7,10,1,4,9,5,8,3,6,12',
    'industry':'63,85,82,21,10,86,7,78,36,25,4,28,67,5,71,27,61,90,23,68,34,89,43,50,81,41,56,59,69,9,83,29,52,100,58,95,102,94,60,53,38,87,31,6,16,48,55,74,66,35,65,40,99,42,92,98,39,70,32,45,77,20,54,33,24,72,51,30,64,2,96,8,14,22,26,80,15,37,93,13,46,1,79,44,75,91,49,62,88,12,47,84,57,76,17,97,18,19,3,11,101,73',
    'equityType':'ORD,DRC,Preferred,Unit,ClosedEnd,REIT,ELKS,OpenEnd,Right,ParticipationShare,CapitalSecurity,PerpetualCapitalSecurity,GuaranteeCertificate,IGC,Warrant,SeniorNote,Debenture,ETF,ADR,ETC,ETN',
    'exchange[]':'54',
    'exchange[]':'103',
    'pn':'1',
    'order[col]':'eq_market_cap',
    'order[dir]':'d'
}
session = requests.Session()
s = session.post('https://cn.investing.com/stock-screener/Service/SearchStocks',params=data,headers=header)
html = etree.HTML(s.text)
My understanding is that it should be written like this? But I still can't get the result I want... |
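For comparison, here is a hedged sketch of how the attempt above might be restructured. It rests on several assumptions:

- The form goes in `data=` (POST body) rather than `params=` (query string).
- The repeated `exchange[]` field is given as a list of pairs.
- `Content-Length` and `Host` are left out so that requests fills them in itself.
- The long `sector`, `industry` and `equityType` values are omitted for brevity. Whether the server requires them is not established in this thread.
- The response is treated as JSON, which is inferred only from the `Accept` header the browser sent.

`build_request` and `fetch` are hypothetical helper names.

```python
import requests
from urllib.parse import parse_qs

URL = 'https://cn.investing.com/stock-screener/Service/SearchStocks'

# Headers copied from the browser request; Host and Content-Length are
# deliberately absent so that requests computes them.
HEADERS = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Origin': 'https://cn.investing.com',
    'Referer': 'https://cn.investing.com/stock-screener/?sp=country::37|sector::a|industry::a|equityType::a|exchange::a%3Ceq_market_cap;1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}

# Shortened form; add sector, industry and equityType from the browser if needed.
BASE_FORM = [
    ('country[]', '37'),
    ('exchange[]', '54'),
    ('exchange[]', '103'),
    ('order[col]', 'eq_market_cap'),
    ('order[dir]', 'd'),
]


def build_request(session, page=1):
    """Prepare the POST with the form in the body; nothing is sent here."""
    form = BASE_FORM + [('pn', str(page))]
    request = requests.Request('POST', URL, data=form, headers=HEADERS)
    return session.prepare_request(request)


def fetch(session, page=1):
    """Send the prepared request and decode the reply (assumed to be JSON)."""
    response = session.send(build_request(session, page), timeout=10)
    response.raise_for_status()
    return response.json()


# Offline check: inspect what would be sent, without touching the network.
session = requests.Session()
prepared = build_request(session, page=1)
print(prepared.method, prepared.url)
print(parse_qs(prepared.body))
print(prepared.headers['Content-Length'])
```

Calling `fetch(session)` would perform the actual request. If the reply turns out to be HTML rather than JSON, `response.text` can be handed to `etree.HTML` as in the attempt above.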
14
qwjhb 2017-11-21 20:31:27 +08:00
|
15
yeyu1989 OP |