Building a web-based crawler with Flask and web-scraping techniques

Main text:

References:

https://www.cnblogs.com/sss4/p/8097653.html   (Flask tutorial)
https://blog.csdn.net/qq_41769259/article/details/79407571
https://www.cnblogs.com/lucky-pin/p/7117182.html

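The first link above covers the Flask basics in real detail and is excellent.
1) Use the Flask web framework together with the MVC pattern to build a web-based crawler (for the basics, see my earlier posts).
2) JavaScript could also handle the data exchange (since I already have some experience with Python web development, I am not using it for now).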
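
The MVC split mentioned in 1) can be sketched without any framework. This is only an illustration under stated assumptions: `PAGE`, `fetch_image_url`, `handle`, and `fake_fetch` are hypothetical names, and the fetcher is faked so the sketch runs without network access. In the real application Flask's routing and templates play the controller and view roles.

```python
import html
from string import Template

# View: a tiny page template (stands in for templates/index.html).
PAGE = Template('<form method="post"><input name="word"></form>$image')


def fetch_image_url(word, fetch):
    """Model: obtain the signature image URL for `word`.

    `fetch` is any callable that takes the form data and returns a URL.
    In the real app it would wrap an HTTP POST plus HTML parsing.
    """
    return fetch({'word': word})


def handle(method, form, fetch):
    """Controller: decide what to render for this request."""
    if method != 'POST':
        # First visit: show the empty form only.
        return PAGE.substitute(image='')
    url = fetch_image_url(form.get('word', ''), fetch)
    # Escape the URL before placing it inside an HTML attribute.
    return PAGE.substitute(image='<img src="%s">' % html.escape(url, quote=True))


def fake_fetch(data):
    """Hypothetical fetcher used only for this demonstration."""
    return 'http://example.com/%s.gif' % data['word']


print(handle('GET', {}, fake_fetch))
print(handle('POST', {'word': 'abc'}, fake_fetch))
```

Passing the fetcher in as an argument keeps the network call out of the controller, which makes the flow easy to try out offline.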
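1. Copy the page's HTML source directly
The part that needs changing is the image tag, which becomes the template expression <img src={{apath}}>. When I pulled the image in, its URL looked correct but the image did not display. My guess is that nothing went wrong in the processing itself; the image may simply be a temporary file, which could be what causes the problem.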

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="applicable-device" content="pc">
<meta name="mobile-agent" content="format=html5;url=http://m.uustv.com">
<title>签名设计免费版在线 艺术签名设计</title>
<style type="text/css">
body{font-size:14px;margin:0px;}
* { margin:0; padding:0;}
a:link,a:visited{color:#000000;text-decoration:none;}
a:hover{color:#000000;}
p { margin:10px 0; text-align:left;}
h1 { color:#FF0000; font-size:20px;}
form {color:#FF0000;}
.tijiao { color:#FF0000; font-weight:bold; padding:5px;}
.in1 , select {padding:5px; font-size:18px;border-radius:3px; border:1px #999 solid;}
table , form {margin:10px auto; text-align:center;}
.tu {border-top:1px #ECE4DC solid;border-bottom:1px #ECE4DC solid;text-align:center; background-color:#F6F2EF;}
.title {margin:10px auto; text-align:center;}
.bottom { text-align:center; margin:10px 0; padding-top:10px; border-top:1px #ECE4DC solid;}
.wz {text-indent:2em;letter-spacing:1px;}
.ls a{ padding-bottom:2px; border-bottom:1px #FFC487 solid; margin-left:15px;}
</style>
</head>
<body>
<table width="560" align="center"><tr><td>①、输入姓名</td>
<td>②、选择样式</td></tr></table>
<form name="form1" id="form1" method="post">
<table align="center">
<tr><td>输入你的名字:<input type="word" name="word" id="id" maxlength="50" class="in1" value=""></td><td width="5"></td>
      <td>	  <select name="sizes" onchange="document.form1.submit();" style="display:none;">
  <option VALUE="60" >60像素</OPTION>
        </select>
        样式:
        <select name="fonts" id="fonts" onchange="document.form1.submit();">
    <option value="jfcs.ttf" >个性签</option>
    <option value="qmt.ttf" >连笔签</option>
    <option value="bzcs.ttf" >潇洒签</option>
    <option value="lfc.ttf" >草体签</option>
    <option value="haku.ttf" >合文签</option>
    <option value="zql.ttf" >商务签</option>
    <option value="yqk.ttf" >可爱签</option>
        </select>
          <input f="bnn" data-target="bn2" name="fontcolor" class="bn" type="text" id="bn2" value="#000000" onblur="tijiao();" / style="display:none;">
        </td><td width="5"></td>
<td><input class="tijiao" type="submit" value="马上给我设计" /></td></tr></table>
</form>
<p style="margin:10px auto; text-align:center; padding-top:10px;">
<script async src="http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<ins class="adsbygoogle"
     style="display:inline-block;width:728px;height:90px"
     data-ad-client="ca-pub-2074102369974939"
     data-ad-slot="2340188630"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</p>
<div class="tu"><img src="1.gif" /></div>
<table width="760" align="center"><tr>
<td valign="top"><div class="title"><h1>签名设计免费版在线 艺术签名设计</h1></div>
<p class="wz">个别字体样式文字不完整,请换一种字体继续转换!增加个性签名图片尺寸,更改文件格式的同时缩小了文件大小,生成速度更快,下载更方便。兼顾使用手机上网的朋友们!</p>
<p class="wz">去掉一些不必要的步骤,软件设计更方便快捷。下载签名:请勿直接调用生成的签名图片地址,本站生成的图片定期清理。若需发布到微博等请先下载再上传,以免图片失效!</p>
<p class="ls"><a href="yw.php" target="_blank">英文签名</a> <a href="ygy.php" target="_blank">叶根友签名</a> <a href="/xjl/" target="_blank">徐静蕾为你设计签名</a></p>
<p class="ls"><a href="/zgf/" target="_blank">中国风毛笔书法艺术签名设计免费版生成器</a></p>
<p class="ls"><a href="/sx/" target="_blank">手写体艺术签名设计在线生成器</a> <a href="bjx.php" target="_blank">百家姓</a> <a href="lk.php" target="_blank">六款签名</a></p>
<p class="ls"><a href="/shufa/" target="_blank">毛泽东邓小平名人书法签名在线设计免费版</a></p>
<p><div class="bdsharebuttonbox"><a href="#" class="bds_more" data-cmd="more"></a><a href="#" class="bds_qzone" data-cmd="qzone" title="分享到QQ空间"></a><a href="#" class="bds_tsina" data-cmd="tsina" title="分享到新浪微博"></a><a href="#" class="bds_weixin" data-cmd="weixin" title="分享到微信"></a><a href="#" class="bds_douban" data-cmd="douban" title="分享到豆瓣网"></a><a href="#" class="bds_huaban" data-cmd="huaban" title="分享到花瓣"></a><a href="#" class="bds_duitang" data-cmd="duitang" title="分享到堆糖"></a></div>
<script>window._bd_share_config={"common":{"bdSnsKey":{},"bdText":"","bdMini":"2","bdMiniList":false,"bdPic":"","bdStyle":"0","bdSize":"24"},"share":{}};with(document)0[(getElementsByTagName('head')[0]||body).appendChild(createElement('script')).src='http://bdimg.share.baidu.com/static/api/js/share.js?v=89860593.js?cdnversion='+~(-new Date()/36e5)];</script></p>
</td>
<td width="336"><script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- 336a -->
<ins class="adsbygoogle"
     style="display:inline-block;width:336px;height:280px"
     data-ad-client="ca-pub-2074102369974939"
     data-ad-slot="4822839838"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script></td>
</tr>
</table>
<div style="text-align:center;"><img src="weixin.gif"></div>
<div class="bottom">Copyright 2012 - 2015 www.uustv.com <a href="http://www.miibeian.gov.cn" target="_blank">豫ICP备11021208号</a></div>
</body>
</html>
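
If the missing image really is caused by the remote file being temporary (the page's own text warns that generated images are cleaned up periodically), one possible workaround is to download the bytes once and embed them in the page as a data: URI, so the browser never has to request the remote file. A minimal sketch, assuming the image bytes have already been fetched; `to_data_uri` is a hypothetical helper name, not part of Flask or of the original code:

```python
import base64


def to_data_uri(image_bytes, mime_type='image/gif'):
    """Encode raw image bytes as a data: URI usable in <img src=...>."""
    encoded = base64.b64encode(image_bytes).decode('ascii')
    return 'data:%s;base64,%s' % (mime_type, encoded)


# The six-byte header of a GIF89a file, used here as stand-in data.
sample = b'GIF89a'
print(to_data_uri(sample))
```

The resulting string could then be passed to the template as apath in place of the remote URL. This is only one option, and it makes the page larger, since the image travels inside the HTML.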

 
2. Write the Flask program

# coding=utf-8
from flask import Flask, render_template, request
from requests import post
from bs4 import BeautifulSoup

app = Flask(__name__, template_folder='templates')

# Site that generates the signature image.
BASE_URL = 'http://www.uustv.com/'


@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method != 'POST':
        # First visit: show the empty form.
        return render_template('index.html')
    # Forward the submitted form fields to the signature site.
    data = {
        'word': request.form.get('word'),
        'sizes': request.form.get('sizes'),  # the form field is named "sizes"
        'fonts': request.form.get('fonts'),
        'fontcolor': request.form.get('fontcolor'),
    }
    html = post(BASE_URL, data=data).text
    # Alternative using lxml (needs `from lxml import etree`):
    # dom = etree.HTML(html)
    # img_url = dom.xpath('/html/body/div[1]/img/@src')[0]
    dom = BeautifulSoup(html, 'html5lib')
    # The generated image sits inside <div class="tu">.
    img_url = dom.find_all('div', 'tu')[0].img['src']
    # Build the absolute URL on the same host the form was posted to.
    apath = BASE_URL + img_url
    return render_template('index.html', apath=apath)


if __name__ == '__main__':
    app.debug = True
    app.run()
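
The extraction step in the script above can also be sketched with the standard library alone. This swaps BeautifulSoup for `html.parser` and plain string concatenation for `urllib.parse.urljoin`, which copes with a src that starts with a slash or is already absolute. `TuImageFinder` and `signature_image_url` are hypothetical names, and the sample HTML and image path are made up for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TuImageFinder(HTMLParser):
    """Record the src of the first <img> inside <div class="tu">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <div> levels deep we are inside div.tu
        self.src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            classes = (attrs.get('class') or '').split()
            if self.depth or 'tu' in classes:
                self.depth += 1
        elif tag == 'img' and self.depth and self.src is None:
            self.src = attrs.get('src')

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1


def signature_image_url(html, base_url):
    """Return the absolute image URL, or None if no image was found."""
    finder = TuImageFinder()
    finder.feed(html)
    if finder.src is None:
        return None
    return urljoin(base_url, finder.src)


# Made-up response fragment shaped like the page's <div class="tu"> block.
sample = '<div class="tu"><img src="tmp/example.gif" /></div><div><img src="weixin.gif"></div>'
print(signature_image_url(sample, 'http://www.uustv.com/'))
```

Returning None when the div is missing lets the caller render the plain form instead of failing with an IndexError.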

 
 
2018.8.8
