
Emulating Website Login, in Python (including two complete, runnable versions of the code)

July 17, 2013

Earlier posts have already covered the relevant networking basics:

【整理】关于抓取网页,分析网页内容,模拟登陆网站的逻辑/流程和注意事项

as well as how simple web-page scraping is done in Python:

【教程】抓取网页并提取网页中所需要的信息 之 Python版

This post continues from there and shows how to implement the basic flow of emulating a website login in Python.

First, a few prerequisites for this post:

It assumes you have already read:

【整理】关于抓取网页,分析网页内容,模拟登陆网站的逻辑/流程和注意事项

and so understand the basic networking concepts involved;

and that you have read:

【总结】浏览器中的开发人员工具(IE9的F12和Chrome的Ctrl+Shift+I)-网页分析的利器

and so know how to use tools such as IE9's F12 to analyze what a web page does as it executes.

This post uses logging in to the Baidu home page:

http://www.baidu.com/

as the example to explain how to emulate a website login with Python.


1. Before emulating the login, work out the site's internal login logic

Before you can write a program, i.e. Python code, that emulates logging in to the Baidu home page, you first need to understand what actually happens internally when you log in to the site yourself.

For how to use tools to analyze the internal logic of the Baidu home-page login, see:

【教程】手把手教你如何利用工具(IE9的F12)去分析模拟登陆网站(百度首页)的内部逻辑过程

2. Only then implement the login logic in your language of choice, here Python

Once you understand the internal login logic of the Baidu home page as analyzed above with F12, implementing it in Python code is, relatively speaking, not very hard.

 

Notes:

(1) If you are not familiar with how to use cookies in Python, first read:

【已解决】Python中如何获得访问网页所返回的cookie

【已解决】Python中实现带Cookie的Http的Post请求

(2) If you are not familiar with regular expressions in general, see:

正则表达式学习心得

(3) If you are not familiar with Python's regular expressions, see:

【教程】详解Python正则表达式
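As a minimal sketch of the cookie handling used throughout this post (shown here with the Python 3 equivalents of the `cookielib`/`urllib2` calls in the code below): a CookieJar attached to an installed opener records and replays cookies automatically on every subsequent request.

```python
import http.cookiejar
import urllib.request

# CookieJar + HTTPCookieProcessor: the jar is filled automatically with any
# cookies the server sets, and those cookies are sent back on later requests
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
urllib.request.install_opener(opener)

# No request has been made yet, so the jar starts out empty
print(len(list(cj)))  # → 0
```

After `urllib.request.urlopen(someUrl)` is called, any cookies returned by the server can be inspected by iterating over `cj`, which is exactly how the code below checks for BAIDUID.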

 

Here is the analyzed flow again, laid out so that it can be compared against the code:

Step | URL | Method | Data sent | Value to obtain from the response
1 | http://www.baidu.com/ | GET | (none) | the BAIDUID value from the returned cookies
2 | https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true | GET | the BAIDUID cookie | the token value, extracted from the returned HTML
3 | https://passport.baidu.com/v2/api/?login | POST | a set of POST data, in which the token value is the one extracted earlier | verify that the returned cookies contain BDUSS, PTOKEN, STOKEN, SAVEUSERID
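The token extraction in step 2 can be sketched on its own: the getapi response embeds the token in a JavaScript assignment, which a named-group regex pulls out. The sample string below mimics that line; the token value in it is illustrative, not a real one.

```python
import re

# the getapi response contains a line of this shape:
sample_html = "bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';"

# named group "tokenVal" captures the token between the single quotes
found = re.search(r"bdPass\.api\.params\.login_token='(?P<tokenVal>\w+)';", sample_html)
token_val = found.group("tokenVal") if found else None
print(token_val)  # → 5ab690978812b0e7fbbe1bfc267b90b3
```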

 

With that, we can finally write the Python code that demonstrates emulating a login to the Baidu home page.

【Version 1: complete Python code for emulating the Baidu home-page login, minimal version】

This is the relatively minimal version:

#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Function:   Used to demonstrate how to use Python code to emulate login to the baidu main page:
            http://www.baidu.com/
Note:       Before trying to understand the following code, please first read the related articles:
            (1)【整理】关于抓取网页,分析网页内容,模拟登陆网站的逻辑/流程和注意事项
            (2)【教程】手把手教你如何利用工具(IE9的F12)去分析模拟登陆网站(百度首页)的内部逻辑过程
            (3)【教程】模拟登陆网站 之 Python版
Version:    2012-11-06
Author:     Crifan
"""

import re;
import cookielib;
import urllib;
import urllib2;
import optparse;

#------------------------------------------------------------------------------
# check whether all cookie names in cookieNameList exist in cookieJar or not
def checkAllCookiesExist(cookieNameList, cookieJar):
    cookiesDict = {};
    for eachCookieName in cookieNameList:
        cookiesDict[eachCookieName] = False;

    allCookieFound = True;
    for cookie in cookieJar:
        if(cookie.name in cookiesDict):
            cookiesDict[cookie.name] = True;

    for eachCookie in cookiesDict.keys():
        if(not cookiesDict[eachCookie]):
            allCookieFound = False;
            break;

    return allCookieFound;

#------------------------------------------------------------------------------
# just for printing a delimiter
def printDelimiter():
    print '-'*80;

#------------------------------------------------------------------------------
# main function to emulate the baidu login
def emulateLoginBaidu():
    print "Function: Used to demonstrate how to use Python code to emulate login to the baidu main page: http://www.baidu.com/";
    print "Usage: emulate_login_baidu_python.py -u yourBaiduUsername -p yourBaiduPassword";
    printDelimiter();

    # parse input parameters
    parser = optparse.OptionParser();
    parser.add_option("-u", "--username", action="store", type="string", default='', dest="username", help="Your Baidu Username");
    parser.add_option("-p", "--password", action="store", type="string", default='', dest="password", help="Your Baidu password");
    (options, args) = parser.parse_args();
    username = options.username;
    password = options.password;

    printDelimiter();
    print "[preparation] using CookieJar & HTTPCookieProcessor to automatically handle cookies";
    cj = cookielib.CookieJar();
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj));
    urllib2.install_opener(opener);

    printDelimiter();
    print "[step1] to get cookie BAIDUID";
    baiduMainUrl = "http://www.baidu.com/";
    resp = urllib2.urlopen(baiduMainUrl);
    #respInfo = resp.info();
    #print "respInfo=",respInfo;
    for index, cookie in enumerate(cj):
        print '[', index, ']', cookie;

    printDelimiter();
    print "[step2] to get token value";
    getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
    getapiResp = urllib2.urlopen(getapiUrl);
    #print "getapiResp=",getapiResp;
    getapiRespHtml = getapiResp.read();
    #print "getapiRespHtml=",getapiRespHtml;
    # the response contains a line like:
    # bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
    foundTokenVal = re.search("bdPass\.api\.params\.login_token='(?P<tokenVal>\w+)';", getapiRespHtml);
    if(foundTokenVal):
        tokenVal = foundTokenVal.group("tokenVal");
        print "tokenVal=", tokenVal;

        printDelimiter();
        print "[step3] emulate login baidu";
        staticpage = "http://www.baidu.com/cache/user/html/jump.html";
        baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";
        postDict = {
            #'ppui_logintime': "",
            'charset'       : "utf-8",
            #'codestring'    : "",
            'token'         : tokenVal, # e.g. de3dbf1e8596642fa2ddf2921cd6257f
            'isPhone'       : "false",
            'index'         : "0",
            #'u'             : "",
            #'safeflg'       : "0",
            'staticpage'    : staticpage, # http%3A%2F%2Fwww.baidu.com%2Fcache%2Fuser%2Fhtml%2Fjump.html
            'loginType'     : "1",
            'tpl'           : "mn",
            'callback'      : "parent.bdPass.api.login._postCallback",
            'username'      : username,
            'password'      : password,
            #'verifycode'    : "",
            'mem_pass'      : "on",
        };
        # urlencode automatically percent-encodes the parameter values
        postData = urllib.urlencode(postDict);
        #print "postData=",postData;
        req = urllib2.Request(baiduMainLoginUrl, postData);
        # in most cases, for a POST request, the Content-Type is application/x-www-form-urlencoded
        req.add_header('Content-Type', "application/x-www-form-urlencoded");
        resp = urllib2.urlopen(req);
        #for index, cookie in enumerate(cj):
        #    print '[',index, ']',cookie;
        cookiesToCheck = ['BDUSS', 'PTOKEN', 'STOKEN', 'SAVEUSERID'];
        loginBaiduOK = checkAllCookiesExist(cookiesToCheck, cj);
        if(loginBaiduOK):
            print "+++ Emulate login baidu is OK, ^_^";
        else:
            print "--- Failed to emulate login baidu !";
    else:
        print "Fail to extract token value from html=", getapiRespHtml;

if __name__ == "__main__":
    emulateLoginBaidu();
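The success check at the end of step 3 can also be exercised on its own, without touching the network. The sketch below is a Python 3 rendering of the `checkAllCookiesExist` logic above (using `http.cookiejar`, the Python 3 name of `cookielib`); the cookies it puts into the jar are fabricated placeholders, not values from a real login.

```python
import http.cookiejar

def check_all_cookies_exist(cookie_name_list, cookie_jar):
    # True only if every requested cookie name appears in the jar
    found = {name: False for name in cookie_name_list}
    for cookie in cookie_jar:
        if cookie.name in found:
            found[cookie.name] = True
    return all(found.values())

def make_cookie(name, value):
    # minimal Cookie constructor call; most fields are placeholders
    return http.cookiejar.Cookie(
        version=0, name=name, value=value, port=None, port_specified=False,
        domain="passport.baidu.com", domain_specified=True,
        domain_initial_dot=False, path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={})

cj = http.cookiejar.CookieJar()
for name in ["BDUSS", "PTOKEN", "STOKEN"]:
    cj.set_cookie(make_cookie(name, "dummy"))

print(check_all_cookies_exist(["BDUSS", "PTOKEN", "STOKEN"], cj))  # → True
print(check_all_cookies_exist(["BDUSS", "SAVEUSERID"], cj))        # → False
```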

 

【Version 2: complete Python code for emulating the Baidu home-page login, crifanLib.py version】

This is another version, which uses functions from my own Python library, crifanLib.py:

#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Function:   Used to demonstrate how to use Python code to emulate login to the baidu main page:
            http://www.baidu.com/
            Uses the functions from crifanLib.py
Note:       Before trying to understand the following code, please first read the related articles:
            (1)【整理】关于抓取网页,分析网页内容,模拟登陆网站的逻辑/流程和注意事项
            (2)【教程】手把手教你如何利用工具(IE9的F12)去分析模拟登陆网站(百度首页)的内部逻辑过程
            (3)【教程】模拟登陆网站 之 Python版
Version:    2012-11-07
Author:     Crifan
Contact:    admin (at) crifan.com
"""

import re;
import cookielib;
import urllib;
import urllib2;
import optparse;

#===============================================================================
# following are some functions, extracted from my python library: crifanLib.py
#===============================================================================

import zlib;

gConst = {
    'constUserAgent' : 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E)',
    #'constUserAgent' : "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0.1",
}

################################################################################
# Network: urllib/urllib2/http
################################################################################

#------------------------------------------------------------------------------
# get the response from a url
# note: if you have already installed a cookiejar, then it will automatically
# be used here while using urllib2.Request
def getUrlResponse(url, postDict={}, headerDict={}, timeout=0, useGzip=False):
    # make sure url is a str, not unicode, otherwise urllib2.urlopen will error
    url = str(url);

    if(postDict):
        postData = urllib.urlencode(postDict);
        req = urllib2.Request(url, postData);
        req.add_header('Content-Type', "application/x-www-form-urlencoded");
    else:
        req = urllib2.Request(url);

    if(headerDict):
        #print "added header:",headerDict;
        for key in headerDict.keys():
            req.add_header(key, headerDict[key]);

    defHeaderDict = {
        'User-Agent'    : gConst['constUserAgent'],
        'Cache-Control' : 'no-cache',
        'Accept'        : '*/*',
        'Connection'    : 'Keep-Alive',
    };

    # add the default headers
    for eachDefHd in defHeaderDict.keys():
        #print "add default header: %s=%s"%(eachDefHd,defHeaderDict[eachDefHd]);
        req.add_header(eachDefHd, defHeaderDict[eachDefHd]);

    resp = urllib2.urlopen(req);
    return resp;
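The default-header handling in getUrlResponse can be sketched in Python 3's `urllib.request`. This is an illustrative sketch, not the library's actual code: `build_request` is a hypothetical helper, and the short User-Agent string is a placeholder for the full one in gConst above. Note that `Request` normalizes stored header names via `str.capitalize()`, so lookups use that form.

```python
import urllib.request

def build_request(url, header_dict=None):
    req = urllib.request.Request(url)
    # caller-supplied headers are added first, so they take priority
    if header_dict:
        for key, value in header_dict.items():
            req.add_header(key, value)
    # default headers mirror defHeaderDict above; only fill in missing ones
    default_headers = {
        "User-Agent": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1)",
        "Cache-Control": "no-cache",
        "Accept": "*/*",
        "Connection": "Keep-Alive",
    }
    for name, value in default_headers.items():
        if not req.has_header(name.capitalize()):
            req.add_header(name, value)
    return req

req = build_request("http://www.baidu.com/", {"User-Agent": "custom-agent"})
print(req.get_header("User-agent"))  # → custom-agent (caller's value wins)
print(req.get_header("Accept"))     # → */* (filled in from the defaults)
```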