V2EX › All replies by wangfeng3769 › Page 41 of 53

2014-08-12 17:51:28 +08:00

Replied to a topic created by wangfeng3769 › Linux › Cannot access GitHub on Ubuntu, other machines are fine

Has it been hijacked?

2014-08-12 17:50:35 +08:00

Replied to a topic created by wangfeng3769 › Linux › Cannot access GitHub on Ubuntu, other machines are fine

Pinging github.com resolves to an IP, but the host is unreachable:
PING github.com (192.30.252.129) 56(84) bytes of data.
From bogon (192.168.0.77) icmp_seq=1 Destination Host Unreachable
From bogon (192.168.0.77) icmp_seq=2 Destination Host Unreachable
From bogon (192.168.0.77) icmp_seq=3 Destination Host Unreachable
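
The output above shows that the name resolves while packets still fail to leave the machine. A minimal sketch for checking those two steps separately from Python follows. The function name `diagnose` and its parameters are hypothetical, and it tests a TCP connection rather than ICMP, so it is not an exact substitute for ping.

```python
import socket


def diagnose(host, port=443, timeout=3.0):
    """Return (addresses, reachable).

    addresses: IPv4 addresses the name resolves to (empty if DNS fails).
    reachable: True if a TCP connection to any of them succeeds.
    """
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        # Name resolution itself failed.
        return [], False

    addresses = sorted({info[4][0] for info in infos})
    for addr in addresses:
        try:
            conn = socket.create_connection((addr, port), timeout=timeout)
        except OSError:
            # Refused, unreachable, or timed out: try the next address.
            continue
        conn.close()
        return addresses, True
    return addresses, False
```

Calling `diagnose("github.com")` on the affected machine would be expected to return a non-empty address list together with `False` if the situation matches the ping output, which points at routing or the local network setup rather than DNS.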

2014-08-12 16:15:39 +08:00

Replied to a topic created by wangfeng3769 › Linux › Cannot access GitHub on Ubuntu, other machines are fine

Running sudo hostname home solved the problem.

2014-08-12 15:48:34 +08:00

Replied to a topic created by wangfeng3769 › Linux › Cannot access GitHub on Ubuntu, other machines are fine

god@god:~$ nslookup github.com
Server: 127.0.1.1
Address: 127.0.1.1#53

Non-authoritative answer:
Name: github.com
Address: 192.30.252.128

2014-08-12 15:47:49 +08:00

Replied to a topic created by wangfeng3769 › Linux › Cannot access GitHub on Ubuntu, other machines are fine

; <<>> DiG 9.9.5-3-Ubuntu <<>> github.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55854
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;github.com. IN A

;; ANSWER SECTION:
github.com. 9 IN A 192.30.252.128

;; Query time: 6 msec
;; SERVER: 127.0.1.1#53(127.0.1.1)
;; WHEN: Tue Aug 12 15:48:04 CST 2014
;; MSG SIZE rcvd: 55

2014-08-12 11:23:42 +08:00

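Replied to a topic created by wangfeng3769 › Q&A › A fresh Ubuntu 14.04 install has only one workspace, but a system on a USB drive that was upgraded still has four. What is going on?

This problem is solved now; selecting Compiz fixes it.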

2014-08-11 14:12:18 +08:00

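Replied to a topic created by wangfeng3769 › Python › Which multithreading option is best in Python?

@skybr Faced with an expert, all I can say is that I did not read the documentation carefully and did not compare the options properly. I will have to fill in the pits I dug myself and study more from now on.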

2014-08-10 11:46:03 +08:00

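Replied to a topic created by wangfeng3769 › Python › Which multithreading option is best in Python?

@Zuckonit
@yuelang85
Yes, I used gevent before, but it did not work well on my machine, so I will have to run on Stackless for a while.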

2014-08-10 08:08:07 +08:00

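Replied to a topic created by wangfeng3769 › Python › Which multithreading option is best in Python?

Many games are built with Stackless. I just want to say that if Stackless were really that bad, it would have died long ago.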

2014-08-09 22:08:46 +08:00

Replied to a topic created by wangfeng3769 › Python › Which multithreading option is best in Python?

@skybr
Honestly, in terms of speed: threading < multiprocessing < stackless. Stackless really is quite fast.
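
A speed ranking like this is easy to check on your own workload. Below is a minimal timing sketch, assuming Python 3 and the standard `concurrent.futures` module. The names `count_primes` and `timed_map` are hypothetical, the workload is a toy CPU-bound task, and Stackless is not covered.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound toy workload: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed_map(executor_factory, func, inputs, workers=4):
    """Run func over inputs on a pool; return (results, elapsed_seconds)."""
    start = time.perf_counter()
    with executor_factory(max_workers=workers) as pool:
        results = list(pool.map(func, inputs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, seconds = timed_map(ThreadPoolExecutor, count_primes, [20000] * 8)
    print("threads:", results[0], "primes,", round(seconds, 3), "s")
```

Passing `concurrent.futures.ProcessPoolExecutor` instead of `ThreadPoolExecutor` gives the process-based number for the same workload. The ordering you get depends on whether the task is CPU-bound or I/O-bound, so measure with something close to the real job.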

2014-08-09 20:23:47 +08:00

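Replied to a topic created by wangfeng3769 › Python › Which multithreading option is best in Python?

@skybr I hope you give it a try; it really feels great. But remember: never install it with the default settings.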

2014-08-09 20:08:20 +08:00

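Replied to a topic created by wangfeng3769 › Python › Which multithreading option is best in Python?

http://architects.dzone.com/articles/install-stackless-python This gives a clean install that does not affect the existing Python, but under Stackless you can no longer import Django and some other third-party libraries.
@skybr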

2014-08-09 19:34:02 +08:00

Replied to a topic created by wangfeng3769 › Python › Which multithreading option is best in Python?

Stackless installed perfectly.

2014-08-06 11:17:57 +08:00

Replied to a topic created by yuelang85 › Jobs › Superhero production company looking for Python programmers

Is remote work possible?

2014-08-06 08:15:44 +08:00

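Replied to a topic created by ksex › Linux › The Linux Foundation launches a free course, "Introduction to Linux"

It is not well written; it is far worse than VBird's Linux guide (鸟哥的私房菜).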

2014-08-05 16:10:11 +08:00

Replied to a topic created by wangfeng3769 › Q&A › Python's multiprocessing

There is a small problem on exit, and I do not know where it goes wrong.

2014-08-05 16:09:36 +08:00

Replied to a topic created by wangfeng3769 › Q&A › Python's multiprocessing

https://github.com/wangfeng3769/weixinspider/blob/master/weixinspider.py

2014-08-05 15:57:43 +08:00

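Replied to a topic created by hustlzp › Programmers › I start job hunting in September and hope to land a Python development job. How should I prepare in one month? Any advice is welcome~

Write code diligently, and read code.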

2014-08-05 15:54:05 +08:00

Replied to a topic created by wangfeng3769 › Q&A › Python's multiprocessing

It is a bit of a rogue approach, but that is how it is for now.

2014-08-05 15:53:10 +08:00

Replied to a topic created by wangfeng3769 › Q&A › Python's multiprocessing

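#coding:utf-8
# Python 2 script; needs requests and BeautifulSoup 3.
import re
import os
import requests as R
from BeautifulSoup import BeautifulSoup as BS
import multiprocessing
import urlparse
import time

# User-Agent of the WeChat in-app browser on Android.
opt = "Mozilla/5.0 (Linux; U; Android 4.1.2; zh-cn; GT-I9300 Build/JZO54K) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30 MicroMessenger/5.2.380"
headers = {'User-Agent': opt}
# Pipe between the crawler (sends on a) and the downloader (receives on b).
a, b = multiprocessing.Pipe()
domain_url = "66365.m.weimob.com"
G_queue_url = []     # URLs discovered so far
G_spidered_url = []  # URLs already crawled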

def is_existed(file_real_path):
    # Return the first unused file name: path.htm, then path_1.htm, path_2.htm, ...
    i = 1
    while 1:
        if i == 1:
            file_real_path_tem = file_real_path + '%s.htm' % ""
        if os.path.isfile(file_real_path_tem):
            file_real_path_tem = file_real_path + '_%s.htm' % str(i)
        else:
            return file_real_path_tem

        i = i + 1

def get_web_page(url):
    # Fetch a page with the WeChat User-Agent; return None on any failure.
    try:
        r = R.get(url, headers=headers)
        html = r.text
    except Exception:
        return None

    if html:
        return html
    else:
        return None

def btree(O):
    # Parse HTML into a BeautifulSoup tree; return None for empty input.
    if O:
        return BS(O, fromEncoding="utf-8")
    else:
        return None

def download():
    # Downloader process: receive URL lists from the pipe and save each page to disk.
    print 'download'
    checked_list = []

    while True:
        print 'I am busy'

        recv_data = b.recv()
        if not isinstance(recv_data, list):
            if recv_data == 0:
                # 0 is the sentinel: the crawler has finished.
                break
            # Ignore anything else that is not a list of URLs.
            continue

        for url in recv_data:
            print url
            if url in checked_list:
                continue
            checked_list.append(url)

            if not re.search(domain_url, url):
                continue

            # Mirror the URL layout on disk: <domain>/<path>.
            url_list = urlparse.urlparse(url)
            domain_folder = url_list[1]
            file_path = url_list.path
            real_path_r = os.path.sep.join([domain_folder, file_path])
            real_path_l = re.split(r'/|\\', real_path_r)
            if len(real_path_l) == 2 and not real_path_l[-1]:
                continue
            real_path_f = os.path.sep.join(real_path_l[0:-1])
            real_path_r = is_existed(real_path_r)
            try:
                if not os.path.exists(real_path_f):
                    os.makedirs(real_path_f)
                r = R.get(url)
                content = unicode(r.text).encode(r.encoding, 'ignore')
                if not content:
                    continue
                with open(real_path_r, 'w') as f:
                    f.write(content)
            except Exception:
                # Skip pages that cannot be fetched or written.
                pass

def get_links(html):
    # Collect relative links from the page and turn them into absolute URLs.
    soup = btree(html)
    if not soup:
        return []
    a_links = soup.findAll('a')
    if not a_links:
        return []
    link_list = []
    for link in a_links:
        try:
            link = link.get('href')
            if not link:
                continue
        except:
            continue

        if not re.search(domain_url, link) and not re.search('http', link):
            link_list.append("http://" + domain_url + link)
    return link_list

def work(url, is_root=True):
    # Crawler process: follow links recursively and send new ones to the downloader.
    global G_spidered_url
    global G_queue_url
    G_spidered_url.append(url)
    html = get_web_page(url)
    all_links = get_links(html)
    send_list = []
    if G_queue_url and all_links:
        # Forward only the links that have not been queued before.
        for slink in all_links:
            if slink not in G_queue_url:
                send_list.append(slink)
                G_queue_url.append(slink)
        a.send(send_list)
    elif not G_queue_url and all_links:
        G_queue_url = all_links
        a.send(all_links)

    for url in G_queue_url:
        if url in G_spidered_url:
            continue
        work(url, False)
    if is_root:
        # Send the stop sentinel once, after the whole crawl has finished,
        # instead of from every recursive call.
        a.send(0)

def main(url):
    # Start the crawler and the downloader as separate processes and wait for both.
    multiprocessing.freeze_support()
    w = multiprocessing.Process(target=work, args=(url, ))
    nw = multiprocessing.Process(target=download, args=())
    w.start()
    nw.start()
    w.join()
    nw.join()

if __name__ == '__main__':
    import sys
    # Default start page; pass another URL as the first argument to override it.
    url = "http://66365.m.weimob.com/weisite/home?pid=66365&bid=135666&wechatid=oAdrCtzBdLhgpyIOYtBNELkWXJ68&wxref=mp.weixin.qq.com"
    try:
        url = sys.argv[1]
    except IndexError:
        print "You have to input a complete URL"
    main(url)

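One note: on Windows, download does not run. Please take a look at what is going on. This is a crawler written specifically for scraping other people's websites. Feel free to copy it and try it. Good luck, everyone.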
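
A likely cause, stated as an assumption and not verified against this exact script: Windows has no fork, so multiprocessing starts each child by re-importing the module. A pipe created at module level is then created again in every child, and the two processes end up holding ends of unrelated pipes, so the receiver waits forever. A minimal sketch of the usual remedy, creating the pipe in the parent and passing one end to each child, is below. The names `producer`, `consumer` and `run` are hypothetical stand-ins for the crawler and the downloader.

```python
import multiprocessing


def producer(conn, batches):
    """Send each batch of URLs, then None as the end-of-stream sentinel."""
    for batch in batches:
        conn.send(batch)
    conn.send(None)


def consumer(conn):
    """Receive batches until the sentinel; return unique URLs in arrival order."""
    seen = []
    while True:
        batch = conn.recv()
        if batch is None:
            break
        for url in batch:
            if url not in seen:
                seen.append(url)
    return seen


def run(batches):
    """Create the pipe in the parent and hand one end to each child process."""
    # With duplex=False the first end can only receive, the second only send.
    recv_end, send_end = multiprocessing.Pipe(duplex=False)
    p = multiprocessing.Process(target=producer, args=(send_end, batches))
    c = multiprocessing.Process(target=consumer, args=(recv_end,))
    p.start()
    c.start()
    p.join()
    c.join()
```

On Windows, `run(...)` has to be called from inside an `if __name__ == '__main__':` block so that the re-import in each child does not start new processes. The same reasoning applies to the module-level lists: each child gets its own copy, so shared state has to travel through the pipe.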
