40-Day Training - Day 1 - 123 - Information Processing: Subdomain Brute-Forcing, Version Identification, CMS Identification, Directory Brute-Forcing, URL Crawling

This post is fairly hardcore: the explanations run long and the content is somewhat sprawling.

Download link: ... (the archive bundles the corresponding txt files and dictionaries)

I wrote these scripts quite a while ago, and each runs standalone. Here they are merged, simplified, and arranged into a workflow; the target is really many targets, which get a first pass of information extraction.

Run the scripts one at a time. (I originally wanted to solve everything in one shot, the so-called one-click approach, but file read/write locking, database handling, deduplication, efficiency, and hangs all turned out to be serious problems, along with all sorts of odd characters and formats; in the end it was no faster than running the scripts one by one. Of course, these problems arise because we do a first pass over thousands of targets; for a single target none of this matters.)

I have long wanted to unify the scripts into one, but it really is tedious, and the result too easily ends up flashy but impractical.

These scripts mostly operate on plain files, which is fine for up to about 5,000 targets. The approach is what matters here; once this stage is done, database-backed versions will follow.

Planned layout (this post only covers steps 1-4):

1. Open the target directory: under it, one directory per major target, and under each of those, one directory per subdomain.
2. For now, each subdomain's directory holds txt files such as its url链接.txt, ...., and so on.


0. Collect target URLs:
Butian (补天)
Vulbox (漏洞盒子)

1. Subdomains
Put the targets into ../store/dict/domains.txt; for this experiment:
www.qq.com
www.baidu.com
1.1. 收集众多目标的子域名.py (collect subdomains for many targets)
Results are stored in ../store/deal/domain_result.txt, e.g.:
v.baidu.com
ss.qq.com
u.qq.com
v5.qq.com


2. Processing the collected subdomains

2.1. Use whatweb to process the collected subdomains. whatweb colors its terminal output, so strip the ANSI escape sequences with sed before saving:
whatweb -i domain_result.txt | sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K]//g" | tee whatweb_urls.txt
This stores into ../store/deal/whatweb_urls.txt, with content like:
https://aq.qq.com/ [302 Found] Country[CHINA][CN], HTTPServer[nginx], IP[14.18.180.36], RedirectLocation[https://aq.qq.com/cn2/index], Title[302 Found], nginx
http://www1.baidu.com/baidu.html?from=noscript [200 OK] Apache, Cookies[BAIDUID], Country[CHINA][CN], HTML5, HTTPServer[Apache], IP[14.215.177.39], Script, Title[百度一下,你就知道], X-UA-Compatible[IE=Edge]

2.1.2. Process data of that shape with whatweb命令扫描的处理脚本.py.
Results are stored in ../store/deal/whatweb_results.txt, e.g.:
http://m.baidu.com/?cip=139.205.216.246&baiduid=D6CDE0F25
http://m.baidu.com/?cip=139.205.216.246&baiduid=D6C
http://m.baidu.com/?cip=139.205.216.246&baiduid

2.1.3. Process the whatweb results again and pull out the usable (http-prefixed) subdomains: 再次处理whatweb结果 将能用的子域名(带http)整合出来.py
Results are stored in ../store/deal/带http的子域名.txt, e.g.:
http://u.qq.com
https://ss.qq.com
http://v.qq.com
http://lvyou.baidu.com
http://wp.qq.com
http://p.qq.com

2.2. CMS identification on the filtered subdomains: 识别目标的CMS.py
Results are stored in ../store/deal/得到target与其对应的cms.txt, e.g.:
http://y.baidu.com
HdWiki (a Chinese wiki CMS)
If whatweb already identified the programming language, the CMS check is narrowed to the matching php, java, or python fingerprints.

2.3. Process ../store/deal/带http的子域名.txt: exclude the entries with an identified CMS, and store the remaining plain URLs in ../store/deal/不具备cms特征的子域名.txt.


3. Brute-forcing web directories
If the site runs an identified CMS, don't brute-force it.
Otherwise brute-force broadly, choosing the wordlist by language (php, asp, jsp, python) and by the web container.
Only basic directory brute-forcing is done here; further URL discovery is meant to come from crawling.

3.1 Run the brute force:
网站目录爆破.py stores results into ../store/deal/网站目录爆破结果.txt


4. Crawling target site URLs
If it runs an identified CMS, don't crawl.
Otherwise, crawl it.


5. CMS vulnerability identification
If a CMS was identified, check it against that CMS's known vulnerabilities.


6. Middleware vulnerability identification
Scan for middleware vulnerabilities based on what whatweb reported.


7. URL address learning


8. Web vulnerabilities
GET-based SQL injection
reflected XSS
command execution
SSRF
......


9. Manual verification


Many of the shared helper functions are simply lumped together and reused across the scripts.

Shared helper functions
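The libs/load_dict.py module that the scripts below import is not reproduced in the post. A minimal sketch of what its loaders might look like, with every file path an assumption inferred from the calling scripts:

# coding=utf-8
# libs/load_dict.py -- minimal sketch, not the original implementation.
# All paths are assumptions inferred from the calling scripts.
import codecs

def _load_lines(path):
    # Shared helper: return the non-empty, stripped lines of a txt file.
    with codecs.open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def load_dns_domains():
    # DNS servers to resolve against (path assumed).
    return _load_lines("../store/dict/dns_servers.txt")

def load_domain_dict():
    # Subdomain brute-force prefixes (path assumed).
    return _load_lines("../store/dict/subnames.txt")

def load_whatweb_results():
    # Cleaned whatweb output produced in step 2.1.
    return _load_lines("../store/deal/whatweb_results.txt")

The remaining loaders (load_suffix, load_target_domains, load_sudomain_http, load_sudomain_CMS, load_catelog_dict, load_no_cms_subdomain) would follow the same pattern over their respective files.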

0. Collect target URLs:

0.1 Butian (补天)

Crawl the Butian SRC vendor list:

# coding=utf-8
import requests
from lxml import etree
import random
import time

# A valid logged-in Cookie must be filled in here.
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36",
    "Cookie": ""}

temp = "https://www.butian.net/Loo/submit?cid="
for i in range(57500, 63000):
    # Random sub-second sleep to keep the request rate polite.
    time.sleep(random.random())
    s = requests.get(url=temp + str(i), headers=headers)
    html = etree.HTML(s.content)
    try:
        # a: the vendor's site URL; b: the vendor's name.
        a = html.xpath("/html/body/div[2]/div[3]/div[1]/form/div[1]/ul/li[3]/input/@value")[0]
        b = html.xpath('//*[@id="inputCompy"]/@value')[0]
        if a:
            print(b + ":" + a)
    except (IndexError, AttributeError):
        # The page layout differs or the id doesn't exist; skip it.
        pass
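The sub-second random sleep is a crude rate limit: at an average of 0.5 s per request, the 5,500-id range above adds roughly 45 minutes of sleep on top of the request time itself.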

0.2 Vulbox (漏洞盒子)

Log in, right-click to view the page source, and save it as temp.txt.

A simple processing script:

# coding=utf-8
import re

# temp.txt holds the page source saved from the logged-in Vulbox page.
f = open("temp.txt", encoding='UTF-8')
for i in f.readlines():
    if ",'typeName':" in i:
        # Grab the first http...' fragment on the line and strip the quote.
        url = re.findall(r'http.*?\'', i.rstrip("\n").lstrip(" "))
        if url:
            print(url[0].rstrip("'"))
f.close()

————————————–

At this point we put the collected targets into the designated target.txt.

1. Subdomains

Put the targets into ../store/dict/domains.txt; for this experiment:
www.qq.com
www.baidu.com

1.1. 收集众多目标的子域名.py (collect subdomains for many targets)

Results are stored in ../store/deal/domain_result.txt, e.g.:
v.baidu.com
ss.qq.com
u.qq.com
v5.qq.com

生成子域名字典.py (generate a subdomain dictionary); you can of course use Layer's dictionary instead.

# coding=utf-8
import itertools
import string

# DNS names are case-insensitive, so lowercase letters suffice;
# append string.digits if you want a wider dictionary.
base_string = string.ascii_lowercase

domain = []
# All prefixes of length 1 to 3 (range(1, 4) skips the empty string).
for i in range(1, 4):
    for j in itertools.product(base_string, repeat=i):
        domain.append(''.join(j))
print(len(domain))
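To feed the brute-forcer below, the list would be written out one prefix per line, e.g. to ../store/dict/subnames.txt (a path assumed to match the load_dict sketch earlier), instead of only printing its size.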

请求dns服务器.py (query DNS servers):

import dns.resolver

my_resolver = dns.resolver.Resolver()

# Public resolvers: 114DNS, Google, AliDNS, DNSPod.
my_resolver.nameservers = ['114.114.114.114', '8.8.8.8', '223.5.5.5', '223.6.6.6', '119.29.29.29', '182.254.116.116']
# Note: dnspython 2.x renamed query() to resolve(); this uses the 1.x API.
ans = my_resolver.query('legal.qq.com')
for i in ans:
    print(i.address)
if ans:
    ips = ', '.join(sorted([i.address for i in ans]))
    print(ips)
print("qname:", ans.qname)
print("rdclass:", ans.rdclass)
print("rdtype:", ans.rdtype)
print("rrset:", ans.rrset)
print("response:", ans.response)

收集单个目标的子域名.py (collect a single target's subdomains):

import dns.resolver
import gevent
from gevent import monkey

monkey.patch_all()


# Demonstrates spreading work across several public DNS servers:
# each greenlet gets its own Resolver bound to one server.
def domain_query(j):
    resolvers = [dns.resolver.Resolver(configure=False) for _ in range(4)]
    dns_servers = ['114.114.114.114', '8.8.8.8', '223.5.5.5', '223.6.6.6', '119.29.29.29', '182.254.116.116']
    dns_count = len(dns_servers)
    for _r in resolvers:
        _r.lifetime = _r.timeout = 6.0
    resolvers[j].nameservers = [dns_servers[j % dns_count]]
    print(resolvers[j].nameservers)


threads = [gevent.spawn(domain_query, i) for i in range(4)]
gevent.joinall(threads)

收集多个目标子域名的脚本.py (collect subdomains for multiple targets):

#!/usr/bin/env python
# -*- encoding: utf-8 -*-

import gevent
from gevent import monkey, pool

monkey.patch_all()

import dns.resolver
import time
import codecs

import sys, os

sys.path.append(os.path.abspath('../libs'))
from load_dict import load_suffix, load_domain_dict, load_dns_domains, load_target_domains


# Query the DNS server to check whether the domain resolves.
def domain_query(domain):
    try:
        ans = resolver.query(domain)
        if ans:
            ips = ', '.join(sorted([i.address for i in ans]))
            # Skip obviously bogus answers such as 0.0.0.1.
            if "0.0.0.1" not in ips:
                print(domain)
                domain_results.add(domain)
    except:
        pass


# Brute-force the subdomains of one target.
def scan_subdomain(domain):
    scan_pool = pool.Pool(30)
    gevent_list = [scan_pool.spawn(domain_query, (domain_dict + "." + domain)) for domain_dict in domain_dicts]
    gevent.joinall(gevent_list)


if __name__ == '__main__':
    start_time = time.time()

    # Load the dictionaries.
    global domain_results
    domain_results = set()
    global domain_dicts
    domain_dicts = load_domain_dict()  # subdomain brute-force prefixes (~2 s to load)
    global dns_servers
    dns_servers = load_dns_domains()  # DNS servers to query
    global suffix_list
    suffix_list = load_suffix()  # country/TLD suffixes
    domains = load_target_domains(suffix_list)  # target domains

    global resolver
    resolver = dns.resolver.Resolver()
    resolver.lifetime = resolver.timeout = 6.0
    resolver.nameservers = dns_servers  # defaults to ['114.114.114.114', '8.8.8.8']

    domain_pool = pool.Pool(5)
    gevent_list = [domain_pool.spawn(scan_subdomain, domain) for domain in domains]
    gevent.joinall(gevent_list)

    # Save the final results.
    with codecs.open("../store/deal/domain_results.txt", "w+") as f:
        for domain_result in domain_results:
            f.write(domain_result)
            f.write("\n")

    end_time = time.time()
    print(end_time - start_time)
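Two layers of gevent pools bound the concurrency: at most 5 targets in flight, each issuing at most 30 DNS queries, i.e. up to 150 concurrent lookups. The single shared resolver lists all six nameservers; dnspython works through that list in order, failing over on timeout.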

2. Processing the collected subdomains

2.1 Use whatweb on the collected subdomains (the sed call strips whatweb's ANSI color codes):

whatweb -i domain_result.txt | sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K]//g" | tee whatweb_urls.txt

whatweb处理脚本.py (filter the raw whatweb output):

# coding=utf-8
import codecs

# whatweb output may contain stray non-UTF-8 bytes, hence ISO-8859-1.
with codecs.open("../store/deal/whatweb_urls.txt", "rb", encoding="ISO-8859-1") as f:
    urls = []
    for i in f.readlines():
        i = i.rstrip("\n")
        # Drop lines for hosts that never produced a usable response.
        if "Unassigned" not in i and "Not Found" not in i and "ERROR" not in i:
            urls.append(i)
            print(i)

with open("../store/deal/whatweb_results.txt", "w+", encoding="utf-8") as f:
    for i in urls:
        f.write(i)
        f.write("\n")

再次处理whatweb结果 将能用的子域名(带http)整合出来.py (extract the usable, http-prefixed subdomains):

# coding=utf-8
import codecs
import sys, os

sys.path.append(os.path.abspath('../libs'))
from load_dict import load_suffix, load_target_domains, load_whatweb_results

if __name__ == '__main__':
    suffix_list = load_suffix()  # country/TLD suffixes
    domains = load_target_domains(suffix_list)  # target domains
    results = load_whatweb_results()
    temp_list = set()
    for domain in domains:
        for result in results:
            if domain in result:
                # Keep everything up to and including the domain,
                # i.e. the scheme plus host, e.g. "https://ss.qq.com".
                result = result.split(domain)[0] + domain
                temp_list.add(result)

    with codecs.open("../store/deal/带http的子域名.txt", "w", encoding="utf-8") as f:
        for i in temp_list:
            f.write(i)
            f.write("\n")

2.2 CMS identification on the filtered subdomains: 识别目标的CMS.py

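The original 识别目标的CMS.py is not reproduced here. A minimal sketch of keyword-based CMS fingerprinting, assuming illustrative probe rules (the paths and keywords below are assumptions, not real fingerprint data); the output format follows the example in the overview:

# coding=utf-8
# Minimal sketch of 识别目标的CMS.py -- not the original implementation.
# The fingerprint rules are illustrative assumptions.
import codecs
import requests

# (probe path, keyword expected in the response, CMS name)
FINGERPRINTS = [
    ("/robots.txt", "wp-admin", "WordPress"),
    ("/robots.txt", "/plus/", "DedeCMS"),
    ("/", "hdwiki", "HdWiki"),
]


def identify_cms(base_url):
    for path, keyword, cms in FINGERPRINTS:
        try:
            r = requests.get(base_url + path, timeout=6)
            if r.status_code == 200 and keyword in r.text.lower():
                return cms
        except requests.RequestException:
            continue
    return None


if __name__ == '__main__':
    with codecs.open("../store/deal/带http的子域名.txt", encoding="utf-8") as f:
        targets = [line.strip() for line in f if line.strip()]
    with codecs.open("../store/deal/得到target与其对应的cms.txt", "w", encoding="utf-8") as out:
        for target in targets:
            cms = identify_cms(target)
            if cms:
                out.write(target + "\n" + cms + "\n")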

将不具备cms特征的子域名筛选出来.py (filter out the subdomains without a CMS signature):

#!/usr/bin/env python
# -*- encoding: utf-8 -*-

import sys, os

sys.path.append(os.path.abspath('../libs'))
from load_dict import load_sudomain_http, load_sudomain_CMS
import codecs

if __name__ == '__main__':
    # Reachable subdomains minus those with an identified CMS.
    subdomains = load_sudomain_http()
    subdomains_cms = load_sudomain_CMS()
    with codecs.open("../store/deal/不具备cms特征的子域名.txt", "w+", encoding="utf-8") as f:
        for i in subdomains:
            if i not in subdomains_cms:
                f.write(i)
                f.write("\n")

3. Brute-forcing web directories: 网站目录爆破.py

#!/usr/bin/env python
# -*- encoding: utf-8 -*-

import gevent
from gevent import monkey, pool

monkey.patch_all()

import codecs
import requests

import sys, os

sys.path.append(os.path.abspath('../libs'))
from load_dict import load_sudomain_CMS, load_catelog_dict, load_no_cms_subdomain


def Request_url(target, catelog):
    url = target + catelog
    try:
        s = requests.get(url, timeout=6)
        # Anything but a hard 404 is treated as a hit; soft 404s will
        # produce false positives (see the note below).
        if s.status_code != 404:
            print(url)
            urls_lists.append(url)
    except:
        pass


def broken_catelog(target):
    scan_pool = pool.Pool(30)
    # Pass the function and its arguments to spawn; calling
    # Request_url(...) inline would run everything serially instead.
    gevent_list = [scan_pool.spawn(Request_url, target, catelog) for catelog in catelogs_list]
    gevent.joinall(gevent_list)


if __name__ == '__main__':
    global Cms_data
    Cms_data = load_sudomain_CMS()
    global catelogs_list
    catelogs_list = load_catelog_dict()

    global urls_lists
    urls_lists = []

    global urls
    urls = load_no_cms_subdomain()

    # Skip anything already identified as a CMS.
    temp_urls = []
    for url in urls:
        if url not in Cms_data:
            temp_urls.append(url)

    scan_pool = pool.Pool(10)
    gevent_list = [scan_pool.spawn(broken_catelog, url) for url in temp_urls]
    gevent.joinall(gevent_list)

    with codecs.open("../store/deal/网站目录爆破结果.txt", "w+", encoding="utf-8") as f:
        for url in urls_lists:
            f.write(url)
            f.write("\n")
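Treating every non-404 status as a hit is deliberately loose: it keeps 403s and redirects on interesting paths, but a site that answers unknown paths with a 200 "soft 404" page will flood the results. Comparing response sizes against a deliberately nonexistent path is a common refinement.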

4. Crawling subdomain URLs
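The crawler itself is left for a later installment. As a placeholder, a minimal same-host crawler sketch using requests and lxml (both already used above); the page limit and the output path 爬取结果.txt are assumptions:

# coding=utf-8
# Minimal same-host URL crawler -- a sketch, not the original script.
# max_pages and the output path are assumptions.
import codecs
from urllib.parse import urljoin, urlparse

import requests
from lxml import etree


def crawl(base_url, max_pages=50):
    # Breadth-first crawl that never leaves the starting host.
    host = urlparse(base_url).netloc
    seen, queue, found = set(), [base_url], set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            r = requests.get(url, timeout=6)
            html = etree.HTML(r.content)
        except requests.RequestException:
            continue
        if html is None:
            continue
        for href in html.xpath("//a/@href"):
            link = urljoin(url, href)
            if urlparse(link).netloc == host:
                found.add(link)
                queue.append(link)
    return found


if __name__ == '__main__':
    with codecs.open("../store/deal/不具备cms特征的子域名.txt", encoding="utf-8") as f:
        targets = [line.strip() for line in f if line.strip()]
    with codecs.open("../store/deal/爬取结果.txt", "w", encoding="utf-8") as out:
        for target in targets:
            for link in sorted(crawl(target)):
                out.write(link + "\n")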

2019.1.11
