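I. Introduction
#Introduction: requests lets you simulate the requests a browser sends. Compared with urllib, which we used earlier, the requests API is much more convenient (in essence it is a wrapper around urllib3)
#Note: after requests downloads a page, it does not execute any JavaScript in it; you have to analyze the target site yourself and then issue the follow-up requests
#Installation: pip3 install requests
#Request methods: the most commonly used are requests.get() and requests.post()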
>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r = requests.post('http:///post', data = {'key':'value'})
>>> r = requests.put('http:///put', data = {'key':'value'})
>>> r = requests.delete('http:///delete')
>>> r = requests.head('http:///get')
>>> r = requests.options('http:///get')
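Each of these helpers is a thin wrapper around requests.request(method, url, **kwargs). The sketch below prepares requests without sending them, so you can see the method and body each one would carry. The host example.com is a placeholder and `build` is a hypothetical helper, not part of requests.

```python
import requests

URL = "http://example.com/post"  # placeholder host; nothing is sent


def build(method, **kwargs):
    """Hypothetical helper: prepare (without sending) what requests.request(method, URL, **kwargs) would send."""
    return requests.Request(method, URL, **kwargs).prepare()


get_req = build("get")
post_req = build("post", data={"key": "value"})
head_req = build("head")

for prepared in (get_req, post_req, head_req):
    # prepare() upper-cases the method and form-encodes any data dict into the body
    print(prepared.method, prepared.body)
# GET None
# POST key=value
# HEAD None
```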
#Before studying requests in earnest, it is a good idea to get familiar with the HTTP protocol
II. GET requests
1. Basic request
import requests
response=requests.get('http://dig./')
print(response.text)
2. GET request with parameters -> params
import requests
response=requests.get('https://s.taobao.com/search?q=手機') #query string written directly into the URL
response=requests.get('https://s.taobao.com/search',params={'q':'美女'}) #let requests build and URL-encode the query string
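To see what params actually does, the following sketch builds the request with requests.Request(...).prepare(), which produces the final URL without any network traffic. The page=2 parameter is an illustrative addition, not part of the Taobao example.

```python
from urllib.parse import quote

import requests

# prepare() builds the URL the same way requests.get(..., params=...) would, but sends nothing
search = requests.Request("GET", "https://s.taobao.com/search", params={"q": "手機", "page": 2}).prepare()
expected = "https://s.taobao.com/search?q=" + quote("手機") + "&page=2"
print(search.url == expected)  # True: the non-ASCII value is percent-encoded as UTF-8

# params are appended to a query string that is already in the URL
merged = requests.Request("GET", "https://s.taobao.com/search?q=phone", params={"page": 2}).prepare()
print(merged.url)  # https://s.taobao.com/search?q=phone&page=2
```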
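3. GET request with parameters -> headers
#We usually need to send request headers along with a request; they are the key to disguising ourselves as a browser. The common, useful request headers are:
Host
Referer #large sites often use this header to judge where a request came from
User-Agent #identifies the client
Cookie #although cookie information travels in a request header, the requests module has a separate cookies parameter for it, so it is best not to put it inside headers={}
#Adding headers (servers inspect the request headers, and without them you may be refused, e.g. when visiting https://www.zhihu.com/explore)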
import requests
response=requests.get('https://www.zhihu.com/explore')
response.status_code #500
#Customize the headers yourself
headers={
'User-Agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36',
}
response=requests.get('https://www.zhihu.com/explore',
                      headers=headers)
print(response.status_code) #200
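The sketch below, which sends nothing over the network, uses Session.prepare_request() to show the headers requests would actually put on the wire: without a custom header the User-Agent is requests' own python-requests/<version>, which is easy for a site to spot, and the cookies= argument is turned into the Cookie header for you. The browser UA string and the sbid value are made up for illustration.

```python
import requests

# Hypothetical browser UA string, for illustration only
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
URL = "https://www.zhihu.com/explore"

session = requests.Session()

# prepare_request() only merges the session defaults with our arguments; nothing is sent
plain = session.prepare_request(requests.Request("GET", URL))
custom = session.prepare_request(
    requests.Request("GET", URL, headers={"User-Agent": BROWSER_UA}, cookies={"sbid": "demo-id"})
)
session.close()

print(plain.headers["User-Agent"])   # python-requests/<version>
print(custom.headers["User-Agent"])  # the browser UA we supplied
print(custom.headers["Cookie"])      # sbid=demo-id, built from the cookies= argument
```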
4. GET request with parameters -> cookies
import uuid
import requests
url = 'http:///cookies'
cookies = dict(sbid=str(uuid.uuid4()))
res = requests.get(url, cookies=cookies)
print(res.json())
III. POST requests
1. Introduction
#GET requests
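The default HTTP request method is GET
* No request body
* The data must be small: it travels in the URL, and although HTTP itself sets no limit, browsers and servers cap the URL length (1K is the traditional rule of thumb)
* The data of a GET request is exposed in the browser's address bar
Common ways a GET request is made:
1. Typing a URL directly into the browser's address bar always produces a GET request
2. Clicking a hyperlink on a page always produces a GET request
3. Submitting a form uses GET by default, but the form can be set to use POST
#POST requests
(1). The data does not appear in the address bar
(2). The protocol places no upper limit on the data size (servers usually configure their own)
(3). There is a request body
(4). With the default form encoding, Chinese (or any non-ASCII) characters in the request body are URL-encoded!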
#!!! requests.post() is used the same way as requests.get(); the special thing is that requests.post() has a data parameter that holds the request body
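A quick way to see the GET/POST difference described above is to prepare the same form both ways and compare where the data ends up. This is a sketch: example.com is a placeholder and nothing is sent.

```python
from urllib.parse import urlencode

import requests

URL = "http://example.com/search"  # placeholder; prepare() sends nothing
form = {"q": "手機", "page": "1"}

get_req = requests.Request("GET", URL, params=form).prepare()
post_req = requests.Request("POST", URL, data=form).prepare()

# GET: the form is URL-encoded into the query string, and there is no body
print(get_req.url == URL + "?" + urlencode(form), get_req.body)  # True None
# POST: the URL stays clean, and the same URL-encoded string becomes the body
print(post_req.url, post_req.body == urlencode(form))  # http://example.com/search True
print(post_req.headers["Content-Type"])  # application/x-www-form-urlencoded
```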
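2. Sending a POST request to simulate a browser login
#For a login, you should deliberately enter a wrong username or password and then analyze the captured traffic. Use your head: if you enter the correct ones the browser redirects right away, so what is there left to analyze? You could wear yourself out and never find the packet
'''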
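I. Analyzing the target site
Open https://github.com/login in the browser
then enter a wrong account or password and capture the traffic
You will find that the login is POSTed to: https://github.com/session
the request headers contain a cookie
and the request body contains:
commit:Sign in
utf8:✓
authenticity_token:lbI8IJCwGslZS8qJPnof5e7ZkCoSoMn6jmDTsL1r/m06NLyIbw7vCrpwrFAPzHMep3Tmf/TSJVoXWrvDZaVwxQ==
login:egonlin
password:123
II. Analyzing the flow
First GET https://github.com/login to obtain the initial cookie and the authenticity_token
Then POST to https://github.com/session with the initial cookie and the request body (authenticity_token, username, password, etc.)
Finally obtain the login cookie
ps: if the password is sent in encrypted form, enter a wrong account name with the correct password, then copy the encrypted password from the browser; GitHub sends the password in plain text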
'''
import requests
import re
#First request
r1=requests.get('https://github.com/login')
r1_cookie=r1.cookies.get_dict() #get the initial cookie (not yet authorized)
authenticity_token=re.findall(r'name="authenticity_token".*?value="(.*?)"',r1.text)[0] #extract the CSRF token from the page
#Second request: POST to the login endpoint carrying the initial cookie, the token, and the account and password
data={
'commit':'Sign in',
'utf8':'✓',
'authenticity_token':authenticity_token,
'login':'317828332@qq.com',
'password':'alex3714'
}
r2=requests.post('https://github.com/session',
data=data,
cookies=r1_cookie
)
login_cookie=r2.cookies.get_dict()
#Third request: from now on, carrying login_cookie is enough to act as the logged-in user, e.g. to view personal settings
r3=requests.get('https://github.com/settings/emails',
cookies=login_cookie)
print('317828332@qq.com' in r3.text) #True
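The second request only works because the initial cookie is passed back by hand. A requests.Session() keeps a cookie jar and does that automatically. The sketch below assumes a tiny local http.server as a hypothetical stand-in for the login site (POST sets a cookie, GET echoes back the Cookie header it receives), so it runs offline; trust_env = False only stops proxy environment variables from interfering with the localhost demo.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class FakeLoginHandler(BaseHTTPRequestHandler):
    """Hypothetical login site: POST sets a session cookie, GET echoes the Cookie header."""

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))  # consume the form body
        self.send_response(200)
        self.send_header("Set-Cookie", "user_session=abc123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        body = self.headers.get("Cookie", "").encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), FakeLoginHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

with requests.Session() as session:
    session.trust_env = False
    session.post(base + "/session", data={"login": "demo", "password": "demo"})
    with_session = session.get(base + "/settings").text  # cookie sent back automatically

with requests.Session() as fresh:
    fresh.trust_env = False
    without_session = fresh.get(base + "/settings").text  # empty jar, so no cookie

server.shutdown()
server.server_close()
print(repr(with_session))     # 'user_session=abc123'
print(repr(without_session))  # ''
```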
#The same login flow with a Session object, which stores the cookies it receives and sends them back automatically
import requests
import re
session=requests.Session()
#First request
r1=session.get('https://github.com/login')
authenticity_token=re.findall(r'name="authenticity_token".*?value="(.*?)"',r1.text)[0] #extract the CSRF token from the page
#Second request
data={
    'commit':'Sign in',
    'utf8':'✓',
    'authenticity_token':authenticity_token,
    'login':'317828332@qq.com',
    'password':'alex3714'
}
r2=session.post('https://github.com/session',
    data=data,
)
#Third request
r3=session.get('https://github.com/settings/emails')
print('317828332@qq.com' in r3.text) #True
3. Supplement
requests.post(url='xxxxxxxx',
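              data={'xxx':'yyy'}) #no Content-Type header specified, so the default is application/x-www-form-urlencoded
#If we set the Content-Type header to application/json ourselves but still pass the values with data, the body is still form-encoded, so a server that parses it as JSON cannot get the values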
requests.post(url='',
data={'':1,},
headers={
'content-type':'application/json'
})
requests.post(url='',
json={'':1,},
) #with json=, the default Content-Type is application/json
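To check what the server actually receives in each case, the sketch below prepares both variants and prints the header and body. The URL and the key a are placeholders, and nothing is sent.

```python
import json

import requests

url = "http://example.com/post"  # placeholder; prepare() sends nothing

# data= plus a hand-written JSON Content-Type: the header says JSON, but the body is form-encoded
mismatched = requests.Request(
    "POST", url, data={"a": 1}, headers={"content-type": "application/json"}
).prepare()
# json= serializes the dict itself and sets Content-Type for you
proper = requests.Request("POST", url, json={"a": 1}).prepare()

print(mismatched.headers["Content-Type"], mismatched.body)  # application/json a=1
print(proper.headers["Content-Type"], proper.body)          # application/json b'{"a": 1}'
print(json.loads(proper.body))                              # {'a': 1}
```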
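IV. The Response object
1. Response attributes
import requests
response=requests.get('http://www.jianshu.com')
# Response attributes
print(response.text) #body decoded to str using response.encoding
print(response.content) #raw body as bytes
print(response.status_code) #status code, e.g. 200
print(response.headers) #response headers (a case-insensitive dict)
print(response.cookies) #cookies set by the server (a RequestsCookieJar)
print(response.cookies.get_dict()) #the same cookies as a plain dict
print(response.cookies.items()) #the same cookies as a list of (name, value) pairs
print(response.url) #final URL, after any redirects
print(response.history) #list of redirect responses that led here
print(response.encoding) #encoding used to decode response.text
2. Encoding issues
#Encoding issues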
import requests
response=requests.get('http://www./news')
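response.encoding='gbk' #Autohome (汽車之家) serves its pages encoded in gb2312, while for text responses whose Content-Type declares no charset requests falls back to ISO-8859-1; unless you set it to gbk, the Chinese text comes out garbled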
print(response.text)
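The sketch below reproduces the problem locally: a hypothetical http.server handler, standing in for the real site, returns GBK-encoded Chinese text with Content-Type: text/html and no charset. requests guesses ISO-8859-1 from the headers, and overriding response.encoding before reading response.text fixes the decoding.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAGE = "汽车之家新闻"  # hypothetical page content


class GbkHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGE.encode("gbk")
        self.send_response(200)
        self.send_header("Content-Type", "text/html")  # no charset declared
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), GbkHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with requests.Session() as s:
    s.trust_env = False  # ignore proxy env vars for this local demo
    response = s.get(f"http://127.0.0.1:{server.server_port}/news")
server.shutdown()
server.server_close()

guessed = response.encoding   # guessed from the headers
garbled = response.text       # GBK bytes decoded as ISO-8859-1: mojibake
response.encoding = "gbk"     # the fix from the example above
fixed = response.text
print(guessed)                         # ISO-8859-1
print(garbled == PAGE, fixed == PAGE)  # False True
```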
3. Getting binary data
import requests
response=requests.get('https://timgsa.baidu.com/timg?image&quality=80&size=b9999_10000&sec=1509868306530&di=712e4ef3ab258b36e9f4b48e85a81c9d&imgtype=0&src=http%3A%2F%2Fc.hiphotos.baidu.com%2Fimage%2Fpic%2Fitem%2F11385343fbf2b211e1fb58a1c08065380dd78e0c.jpg')
with open('a.jpg','wb') as f:
    f.write(response.content)
#The stream parameter: fetch the content a little at a time. When downloading a video, for example, if the file is 100 GB, loading all of response.content and writing it to the file in one go makes no sense
import requests
response=requests.get('https://gss3.baidu.com/6LZ0ej3k1Qd3ote6lo7D0j9wehsv/tieba-smallvideo-transcode/1767502_56ec685f9c7ec542eeaf6eac93a65dc7_6fe25cd1347c_3.mp4',
stream=True)
with open('b.mp4','wb') as f:
    for chunk in response.iter_content(chunk_size=8192): #without chunk_size, iter_content() yields 1 byte at a time
        f.write(chunk)
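Wrapped into a reusable function, streaming might look like the sketch below. The helper name download, the 64 KB chunk size, and the local http.server serving a temporary file (a stand-in for the real video URL) are all assumptions for illustration; the with blocks make sure the connection and the file are closed even if an error occurs.

```python
import functools
import os
import tempfile
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler

import requests


def download(session, url, path, chunk_size=64 * 1024):
    """Hypothetical helper: stream url to path chunk by chunk, return bytes written."""
    written = 0
    with session.get(url, stream=True) as response:
        response.raise_for_status()
        with open(path, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written


class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, *args):
        pass  # keep the console quiet


workdir = tempfile.mkdtemp()
payload = os.urandom(300_000)  # stand-in for a large video file
with open(os.path.join(workdir, "b.mp4"), "wb") as f:
    f.write(payload)

server = HTTPServer(("127.0.0.1", 0), functools.partial(QuietHandler, directory=workdir))
threading.Thread(target=server.serve_forever, daemon=True).start()

target = os.path.join(workdir, "copy.mp4")
with requests.Session() as s:
    s.trust_env = False  # ignore proxy env vars for this local demo
    n = download(s, f"http://127.0.0.1:{server.server_port}/b.mp4", target)
server.shutdown()
server.server_close()

print(n == len(payload), os.path.getsize(target) == len(payload))  # True True
```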
4. Parsing JSON
#Parsing JSON
import json
import requests
response=requests.get('http:///get')
res1=json.loads(response.text) #too tedious
res2=response.json() #get the JSON data directly
print(res1 == res2) #True
5. Redirection and History
By default Requests will perform location redirection for all verbs except HEAD.
We can use the history property of the Response object to track redirection.
The Response.history list contains the Response objects that were created in order to complete the request. The list is sorted from the oldest to the most recent response.
For example, GitHub redirects all HTTP requests to HTTPS:
>>> r = requests.get('http://github.com')
>>> r.url
'https://github.com/'
>>> r.status_code
200
>>> r.history
[<Response [301]>]
If you're using GET, OPTIONS, POST, PUT, PATCH or DELETE, you can disable redirection handling with the allow_redirects parameter:
>>> r = requests.get('http://github.com', allow_redirects=False)
>>> r.status_code
301
>>> r.history
[]
If you're using HEAD, you can enable redirection as well:
>>> r = requests.head('http://github.com', allow_redirects=True)
>>> r.url
'https://github.com/'
>>> r.history
[<Response [301]>]
import requests
import re
#First request
r1=requests.get('https://github.com/login')
r1_cookie=r1.cookies.get_dict() #get the initial cookie (not yet authorized)
authenticity_token=re.findall(r'name="authenticity_token".*?value="(.*?)"',r1.text)[0] #extract the CSRF token from the page
#Second request: POST to the login endpoint carrying the initial cookie, the token, and the account and password
data={
    'commit':'Sign in',
    'utf8':'✓',
    'authenticity_token':authenticity_token,
    'login':'317828332@qq.com',
    'password':'alex3714'
}
#Test 1: without allow_redirects=False, a Location header in the response makes requests follow it to the new page, so r2 is the response of the new page
r2=requests.post('https://github.com/session',
    data=data,
    cookies=r1_cookie
)
print(r2.status_code) #200
print(r2.url) #the URL of the page after the redirect
print(r2.history) #the response(s) before the redirect
print(r2.history[0].text) #the text of the response before the redirect
#Test 2: with allow_redirects=False, requests does not follow the redirect even if a Location header is present, so r2 is still the original response
r2=requests.post('https://github.com/session',
    data=data,
    cookies=r1_cookie,
    allow_redirects=False
)
print(r2.status_code) #302
print(r2.url) #the URL before the redirect: https://github.com/session
print(r2.history) #[]
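The same two tests can be reproduced offline. In the sketch below, a hypothetical local handler answers POST /session with 302 Found and Location: /home, mimicking the GitHub login. Note that when requests follows a 302 after a POST, it re-issues the request as a GET.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical login endpoint: POST /session answers 302 -> /home."""

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))  # consume the form body
        self.send_response(302)
        self.send_header("Location", "/home")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        body = b"welcome home"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/session"

with requests.Session() as s:
    s.trust_env = False  # ignore proxy env vars for this local demo
    followed = s.post(url, data={"login": "demo"})  # Test 1: redirect followed
    not_followed = s.post(url, data={"login": "demo"}, allow_redirects=False)  # Test 2
server.shutdown()
server.server_close()

print(followed.status_code, followed.url)          # 200 http://127.0.0.1:<port>/home
print([r.status_code for r in followed.history])   # [302]
print(followed.request.method)                     # GET
print(not_followed.status_code, not_followed.headers["Location"])  # 302 /home
print(not_followed.history)                        # []
```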
From: 東西二王 > Programming & Development