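Weibo's login JavaScript gets updated really quickly. I tried to script the login once before and couldn't get it working; now I finally have time to put this little tool together and get some practice along the way.
I consulted a few earlier write-ups: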
http://www.douban.com/note/201767245/
http://blog.csdn.net/monsion/article/details/8656690
http://blog.csdn.net/huyoo/article/details/11952603
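With these articles I worked out the basics of the login process.
Looking at the page source of http://login.sina.com.cn/, the script request used for login turns out to be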
Request URL:https://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.15)&_=1399711215289
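The number at the end is the request time. The way the time used to be generated (v1.4.4) produced a value a few digits shorter than the current one, so the time needs a small adjustment; see the code.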
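As a rough illustration of the timestamp remark: the `_` value in the request URL above has 13 digits, which is consistent with a Unix time in milliseconds. The sketch below is written for Python 3 (the listings in this post are Python 2), and `make_cache_buster` is a name made up for the example, not something taken from ssologin.js.

```python
import time


def make_cache_buster(now=None):
    """Return a Unix time in milliseconds as a string.

    `now` (seconds since the epoch) exists only to make the function easy
    to test; by default the current time is used.
    """
    if now is None:
        now = time.time()
    return str(int(now * 1000))


stamp = make_cache_buster()
print(stamp, len(stamp))  # the length is 13 digits for present-day dates
```

Dropping the `* 1000` gives a 10-digit value in seconds, which may be what the shorter, older form looked like; that part is a guess.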
weiboLogin.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import urllib
import urllib2
import cookielib
import base64
import re
import json
import hashlib
import rsa
import binascii
import time


class weiboLogin:
    # Install a cookie-aware opener so the session cookies are kept across requests.
    cj = cookielib.LWPCookieJar()
    cookie_support = urllib2.HTTPCookieProcessor(cj)
    opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)
    urllib2.install_opener(opener)

    # Template for the login form; the empty fields are filled in by login().
    postdata = {
        'entry': 'weibo',
        'gateway': '1',
        'from': '',
        'savestate': '7',
        'userticket': '1',
        'ssosimplelogin': '1',
        'vsnf': '1',
        'vsnval': '',
        'su': '',
        'service': 'miniblog',
        'servertime': '',
        'nonce': '',
        'pwencode': 'rsa2',
        'sp': '',
        'encoding': 'UTF-8',
        'prelt': '115',
        'rsakv': '',
        'url': 'http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack',
        'returntype': 'META'
    }

    def get_servertime(self, username):
        # Request time in milliseconds.
        curtime = int(time.time() * 1000)
        url = r'http://login.sina.com.cn/sso/prelogin.php?entry=weibo&callback=sinaSSOController.preloginCallBack&su=%s&rsakt=mod&checkpin=1&client=ssologin.js(v1.4.15)&_=' % username + str(curtime)
        # print url
        data = urllib2.urlopen(url).read()
        # The reply is JSONP: callback({...}); capture the JSON inside the parentheses.
        p = re.compile(r'\((.*)\)')
        # print data
        try:
            json_data = p.search(data).group(1)
            data = json.loads(json_data)
            servertime = str(data['servertime'])
            nonce = data['nonce']
            pubkey = data['pubkey']
            rsakv = data['rsakv']
            return servertime, nonce, pubkey, rsakv
        except:
            print 'Get servertime error!'
            return None

    def get_pwd(self, password, servertime, nonce, pubkey):
        rsaPublickey = int(pubkey, 16)
        key = rsa.PublicKey(rsaPublickey, 65537)  # build the public key
        # Plaintext layout, taken from the JS encryption file.
        message = str(servertime) + '\t' + str(nonce) + '\n' + str(password)
        passwd = rsa.encrypt(message, key)  # encrypt
        passwd = binascii.b2a_hex(passwd)  # convert the ciphertext to hexadecimal
        return passwd

    def get_user(self, username):
        username_ = urllib.quote(username)
        username = base64.encodestring(username_)[:-1]
        return username

    def get_account(self, filename):
        # First line is the username; the following line is the password.
        f = file(filename)
        flag = 0
        for line in f:
            if flag == 0:
                username = line.strip()
                flag += 1
            else:
                pwd = line.strip()
        f.close()
        # print username,' ',pwd
        return username, pwd

    def login(self, filename):
        username, pwd = self.get_account(filename)
        url = 'http://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.15)'
        # try:
        servertime, nonce, pubkey, rsakv = self.get_servertime(username)
        print servertime
        print nonce
        print pubkey
        print rsakv
        # except:
        #     print 'get servertime error!'
        #     return
        # Work on a copy so the class-level template stays a dict.
        postdata = dict(weiboLogin.postdata)
        su = self.get_user(username)
        sp = self.get_pwd(pwd, servertime, nonce, pubkey)
        postdata['servertime'] = servertime
        postdata['nonce'] = nonce
        postdata['rsakv'] = rsakv
        postdata['su'] = su
        postdata['sp'] = sp
        postdata = urllib.urlencode(postdata)
        print su, sp
        headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux i686; rv:8.0) Gecko/20100101 Firefox/8.0 Chrome/20.0.1132.57 Safari/536.11'}
        req = urllib2.Request(
            url=url,
            data=postdata,
            headers=headers
        )
        result = urllib2.urlopen(req)
        text = result.read()
        self.writefile('./output/textlogin', text)
        # eval() turns the \uXXXX escapes into characters; note that calling
        # eval() on server-supplied text is unsafe.
        self.writefile('./output/resultlogin', eval("u'''" + text + "'''"))
        # The referenced blog post used double quotes here, which is wrong;
        # switching to single quotes fixed it.
        p = re.compile(r"location\.replace\('(.*)'\)")
        try:
            login_url = p.search(text).group(1)
            print login_url
            urllib2.urlopen(login_url)
            print "Login success!"
            return 1
        except:
            print 'Login error!'
            return 0

    def writefile(self, filename, content):
        fw = file(filename, 'w')
        fw.write(content)
        fw.close()
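The data-preparation steps in weiboLogin.py (unwrapping the JSONP prelogin reply, encoding the username for the `su` field, and assembling the plaintext that is then RSA-encrypted for `sp`) can be tried without touching the network. This is a minimal sketch in Python 3 using only the standard library. The function names and the sample reply are made up for illustration, and the RSA step itself is left to the `rsa` package as in the listing.

```python
import base64
import json
import re
from urllib.parse import quote


def parse_prelogin(jsonp_text):
    """Unwrap a JSONP reply such as callback({...}) and return the dict."""
    match = re.search(r'\((.*)\)', jsonp_text, re.S)
    if match is None:
        raise ValueError('no JSON payload found in the prelogin reply')
    return json.loads(match.group(1))


def encode_username(username):
    """URL-quote the username, then base64-encode it (the `su` field)."""
    quoted = quote(username)
    return base64.b64encode(quoted.encode('ascii')).decode('ascii')


def build_plaintext(servertime, nonce, password):
    """Join servertime, nonce and password the way get_pwd() does."""
    return '%s\t%s\n%s' % (servertime, nonce, password)


# Made-up reply with the four fields the listing reads; not real server output.
sample = ('sinaSSOController.preloginCallBack({"servertime": 1399711215, '
          '"nonce": "ABC123", "pubkey": "EB2A", "rsakv": "1330428213"})')
info = parse_prelogin(sample)
print(encode_username('user@example.com'))
print(repr(build_plaintext(info['servertime'], info['nonce'], 'secret')))
```

One difference from the listing: `base64.b64encode` never inserts line breaks, whereas `base64.encodestring(...)[:-1]` only strips the final one, so the two differ for very long usernames.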
Main program, main.py
# -*- coding: utf-8 -*-
import weiboLogin
import urllib
import urllib2
import time
import getWeiboPage

# File holding the Weibo account: username on the first line, password on the
# second, no blank lines.
filename = './config/account'

WBLogin = weiboLogin.weiboLogin()
if WBLogin.login(filename) == 1:
    print 'Login success!'
else:
    print 'Login error!'
    exit()

WBmsg = getWeiboPage.getWeiboPage()
url = 'http://weibo.com/p/1005051447378675/weibo?from=page_100505&mod=TAB#place'
# 'http://weibo.com/274891787?from=otherprofile&wvr=3.6&loc=tagweibo'
WBmsg.get_firstpage(url)
WBmsg.get_secondpage(url)
WBmsg.get_thirdpage(url)
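The account file used by main.py (username on the first line, password on the second) could be read a little more defensively. The sketch below is Python 3, and `read_account` is a hypothetical helper, not part of the listings. Unlike `get_account` in weiboLogin.py, it skips blank lines and fails loudly when a line is missing.

```python
import os
import tempfile


def read_account(path):
    """Return (username, password) from a two-line account file."""
    with open(path, encoding='utf-8') as f:
        # Strip whitespace and ignore empty lines.
        lines = [line.strip() for line in f if line.strip()]
    if len(lines) < 2:
        raise ValueError('expected a username line and a password line')
    return lines[0], lines[1]


# Small demonstration with a throwaway file.
with tempfile.NamedTemporaryFile('w', delete=False, encoding='utf-8') as tmp:
    tmp.write('user@example.com\nsecret\n')
print(read_account(tmp.name))
os.remove(tmp.name)
```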
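The next module saves the pages after logging in. Because the home page is lazy-loaded, it has to be saved in three separate requests.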
getWeiboPage.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib
import urllib2
import sys
import time

reload(sys)
sys.setdefaultencoding('utf-8')


class getWeiboPage:
    # Query parameters shared by the three requests that make up one page.
    body = {
        '__rnd': '',
        '_k': '',
        '_t': '0',
        'count': '50',
        'end_id': '',
        'max_id': '',
        'page': 1,
        'pagebar': '',
        'pre_page': '0',
        'uid': ''
    }
    uid_list = []
    charset = 'utf8'

    def get_msg(self, uid):
        getWeiboPage.body['uid'] = uid
        url = self.get_url(uid)
        self.get_firstpage(url)
        self.get_secondpage(url)
        self.get_thirdpage(url)

    def get_firstpage(self, url):
        getWeiboPage.body['pre_page'] = getWeiboPage.body['page'] - 1
        # The base URL already carries a query string, so join with '&'.
        url = url + '&' + urllib.urlencode(getWeiboPage.body)
        req = urllib2.Request(url)
        result = urllib2.urlopen(req)
        text = result.read()
        self.writefile('./output/text1', text)
        self.writefile('./output/result1', eval("u'''" + text + "'''"))

    def get_secondpage(self, url):
        getWeiboPage.body['count'] = '15'
        # getWeiboPage.body['end_id'] = '3490160379905732'
        # getWeiboPage.body['max_id'] = '3487344294660278'
        getWeiboPage.body['pagebar'] = '0'
        getWeiboPage.body['pre_page'] = getWeiboPage.body['page']
        url = url + '&' + urllib.urlencode(getWeiboPage.body)
        req = urllib2.Request(url)
        result = urllib2.urlopen(req)
        text = result.read()
        self.writefile('./output/text2', text)
        self.writefile('./output/result2', eval("u'''" + text + "'''"))

    def get_thirdpage(self, url):
        getWeiboPage.body['count'] = '15'
        getWeiboPage.body['pagebar'] = '1'
        getWeiboPage.body['pre_page'] = getWeiboPage.body['page']
        url = url + '&' + urllib.urlencode(getWeiboPage.body)
        req = urllib2.Request(url)
        result = urllib2.urlopen(req)
        text = result.read()
        self.writefile('./output/text3', text)
        self.writefile('./output/result3', eval("u'''" + text + "'''"))

    def get_url(self, uid):
        url = 'http://weibo.com/' + uid + '?from=otherprofile&wvr=3.6&loc=tagweibo'
        return url

    def get_uid(self, filename):
        # One uid per line; strip the trailing newline before storing it.
        fread = file(filename)
        for line in fread:
            getWeiboPage.uid_list.append(line.strip())
            print line
            time.sleep(1)
        fread.close()

    def writefile(self, filename, content):
        fw = file(filename, 'w')
        fw.write(content)
        fw.close()
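The three requests in getWeiboPage.py differ only in `count`, `pagebar` and `pre_page`. The sketch below (Python 3, standard library only) builds the three parameter sets as fresh dicts, so nothing depends on the order in which a shared class-level dict was last modified. `BASE_PARAMS` and `chunk_params` are names invented here. The values mirror the listing, and what each parameter means to the server is not something this post establishes.

```python
from urllib.parse import urlencode

# Same keys and defaults as getWeiboPage.body in the listing.
BASE_PARAMS = {
    '__rnd': '', '_k': '', '_t': '0', 'count': '50', 'end_id': '',
    'max_id': '', 'page': 1, 'pagebar': '', 'pre_page': '0', 'uid': '',
}


def chunk_params(page=1, uid=''):
    """Return the three parameter dicts used to fetch one lazily loaded page."""
    first = dict(BASE_PARAMS, page=page, uid=uid,
                 count='50', pagebar='', pre_page=page - 1)
    second = dict(BASE_PARAMS, page=page, uid=uid,
                  count='15', pagebar='0', pre_page=page)
    third = dict(BASE_PARAMS, page=page, uid=uid,
                 count='15', pagebar='1', pre_page=page)
    return [first, second, third]


for params in chunk_params(page=1):
    print(urlencode(params))
```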
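I tested the delete mechanism, but the request comes back with an error page. Any help would be appreciated, thanks!
Replace the corresponding function in getWeiboPage.py with this one: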
    # Needs `import re` at the top of getWeiboPage.py.
    def get_firstpage(self, url):
        getWeiboPage.body['pre_page'] = getWeiboPage.body['page'] - 1
        # The base URL already carries a query string, so join with '&'.
        url = url + '&' + urllib.urlencode(getWeiboPage.body)
        req = urllib2.Request(url)
        result = urllib2.urlopen(req)
        text = result.read()
        self.writefile('./output/text1.html', text)
        # Capture the "html" field of the home-feed block embedded in the page.
        p = re.compile(r'\{"ns":"pl\.content\.homeFeed\.index"(.*)"html":"(.*)\}')
        try:
            feeds = p.search(text).group(2)
            self.writefile('./output/result1.html', feeds)
            # print feeds,'FEEDS ok'
            # Each post carries its id as action-data=\"mid=...\" in the escaped HTML.
            eachFeed = re.compile(r'action-data=\\"mid=(\d*)')
            # pp = re.compile(r'mid=(\d+)')
            nodes = eachFeed.findall(feeds)
            middict = {}
            print nodes
            # Use a dict to drop duplicate ids.
            for node in nodes:
                middict[node] = 1
            for (key, x) in middict.items():
                print "deleting key:" + key
                urldel = 'http://weibo.com/aj/mblog/del?_wv=5'
                postdata = {'mid': key}
                postdata = urllib.urlencode(postdata)
                # POST to the delete endpoint (urldel), not the page URL.
                req = urllib2.Request(urldel, postdata)
                result = urllib2.urlopen(req)
                time.sleep(4)
                delresult = result.read()
                print delresult
        except:
            print 'get feed error'
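The id-extraction step of the delete function can be checked in isolation. The sketch below is Python 3; `extract_mids` and the sample fragment are made up for the example. It applies the same `action-data=\"mid=...` pattern to a string, but with `\d+` instead of `\d*` so that empty matches are not collected, and it returns each id once in first-seen order. It sends nothing to Weibo.

```python
import re
from urllib.parse import urlencode

# Matches action-data=\"mid=<digits> as it appears in the escaped feed HTML.
MID_PATTERN = re.compile(r'action-data=\\"mid=(\d+)')


def extract_mids(feeds):
    """Return the post ids found in `feeds`, without repeats, in order."""
    return list(dict.fromkeys(MID_PATTERN.findall(feeds)))


# Made-up fragment in the escaped form the pattern expects.
sample = (r'<div action-data=\"mid=111&x=1\"></div>'
          r'<a action-data=\"mid=222\"></a>'
          r'<div action-data=\"mid=111\"></div>')
for mid in extract_mids(sample):
    # Body of the POST that the delete loop would send for this id.
    print(urlencode({'mid': mid}))
```

Printing the ids before sending any delete request makes it easier to tell whether a failure comes from the extraction or from the request itself.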
It still fails. Any help would be appreciated!