Scraping Pretty Girl Pictures from Weiyi Tuku (唯一圖庫) with Python
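Foreword
I've learned a lot of Python and typed out countless lines of code, and honestly the language was starting to feel dry. But on the principle that pretty girls are the number one motivation for studying, ha ha, let's write a program that downloads all their pictures.
Today we'll use Python to scrape the pretty girl pictures on Weiyi Tuku (http://www./mmtp/), as a little treat for everyone. O(∩_∩)O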
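Treat time
The overall quality of the pictures is quite good. Here are three in different styles to give you a feel for them, O(∩_∩)O ha ha~
Scraping results
Program skeleton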
import urllib.request
from bs4 import BeautifulSoup
import os

def Download(url, picAlt, name):
    ...

def run(targetUrl, beginNUM, endNUM):
    ...
    if beginNUM == endNUM:
        ...

if __name__ == '__main__':
    ...
Approach
- Pick the target website
- Fetch the HTML of the page
- Use BeautifulSoup to extract the content we want
- Save the data
Program steps
Implementation
This program is built on Beautiful Soup, a Python library whose main job is extracting data from web pages (HTML). For details, see this article (https:///1319.html/comment-page-1#comments).
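To make the library's role concrete, here is a minimal sketch that parses a small hand-written HTML fragment. The fragment is an assumption modeled on the selectors used later in this post (a div with id big-pic, spans with classes nowpage and totalpage); the real page markup may differ. The example.com URL, the album title, and the parse_page helper are placeholders, not part of the original program.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, shaped like the selectors this post relies on.
SAMPLE_HTML = """
<div id="big-pic">
  <a href="237269_2.html"><img src="http://example.com/pic/1.jpg" alt="sample album"></a>
</div>
<span class="nowpage">1</span>/<span class="totalpage">5</span>
"""

def parse_page(html):
    """Return the fields the crawler needs from one gallery page."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("div", attrs={"id": "big-pic"}).find("a")
    img = link.find("img")
    return {
        "next_href": link["href"],   # link to the next picture page
        "pic_link": img["src"],      # URL of the image itself
        "pic_alt": img["alt"],       # album title, used as the folder name
        "nowpage": int(soup.find("span", attrs={"class": "nowpage"}).get_text()),
        "totalpage": int(soup.find("span", attrs={"class": "totalpage"}).get_text()),
    }

info = parse_page(SAMPLE_HTML)
print(info)
```

Parsing an inline string like this is a convenient way to try out selectors before pointing the crawler at a live page.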
Install Beautiful Soup
pip install beautifulsoup4
Import the package
from bs4 import BeautifulSoup
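Create the save path
def Download(url, picAlt, name):
    # One folder per album, named after the picture's alt text
    path = 'D:\\pythonD爬蟲妹子圖\\' + picAlt + '\\'
    if not os.path.exists(path):
        os.makedirs(path)
    # Save the image as <page number>.jpg inside the album folder
    urllib.request.urlretrieve(url, '{0}{1}.jpg'.format(path, name))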
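The folder name comes straight from the picture's alt text, which could contain characters that Windows does not accept in folder names. Below is a minimal sketch of one way to guard against that, assuming a simple replace-with-underscore policy. The helpers safe_dir_name and build_save_path are hypothetical, and the example writes to a temporary directory rather than to D:\.

```python
import os
import re
import tempfile

def safe_dir_name(title):
    """Replace characters that Windows does not allow in folder names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    return cleaned or "untitled"

def build_save_path(base_dir, pic_alt, name):
    """Create base_dir/<album>/ if needed and return the target .jpg path."""
    album_dir = os.path.join(base_dir, safe_dir_name(pic_alt))
    os.makedirs(album_dir, exist_ok=True)  # no error if the folder already exists
    return os.path.join(album_dir, "{0}.jpg".format(name))

base = tempfile.mkdtemp()  # stand-in for the real download folder
target = build_save_path(base, "album: part 1/2", "3")
print(target)
```

Using os.path.join instead of hand-written backslashes also keeps the same code usable on Linux and macOS.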
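The complete code is as follows
import urllib.request
from bs4 import BeautifulSoup
import os

def Download(url, picAlt, name):
    # One folder per album, named after the picture's alt text
    path = 'D:\\pythonD爬蟲妹子圖\\' + picAlt + '\\'
    if not os.path.exists(path):
        os.makedirs(path)
    # Save the image as <page number>.jpg inside the album folder
    urllib.request.urlretrieve(url, '{0}{1}.jpg'.format(path, name))
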
# Request headers that make the crawler look like a regular browser
header = {
    "User-Agent": 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36',
    'Accept': '*/*',
    'Accept-Language': 'en-US,en;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive'
}

def run(targetUrl, beginNUM, endNUM):
    # Fetch the page and decode it (the site serves gb2312)
    req = urllib.request.Request(url=targetUrl, headers=header)
    response = urllib.request.urlopen(req)
    html = response.read().decode('gb2312', 'ignore')
    soup = BeautifulSoup(html, 'html.parser')
    Divs = soup.find_all('div', attrs={'id': 'big-pic'})
    nowpage = soup.find('span', attrs={'class': 'nowpage'}).get_text()
    totalpage = soup.find('span', attrs={'class': 'totalpage'}).get_text()
    # Stop once the requested number of pictures has been downloaded
    if beginNUM == endNUM:
        return
    for div in Divs:
        beginNUM = beginNUM + 1
        link = div.find("a")
        if link is None:
            print("No next picture")
            return
        elif not link.get('href'):
            # Covers both a missing and an empty href
            print("No next picture (empty link)")
            return
        print("Download info: overall progress:", beginNUM, "/", endNUM, ", downloading album: (", nowpage, "/", totalpage, ")")
        if int(nowpage) < int(totalpage):
            # Inside an album the href is relative to the album directory
            nextPageLink = "http://www./mmtp/qcmn/" + link['href']
        else:
            # On the last page the href is used as-is to reach the next album
            nextPageLink = link['href']
        picLink = link.find('img')['src']
        picAlt = link.find('img')['alt']
        print('Picture link:', picLink)
        print('Album name: [ ', picAlt, ' ] ')
        print('Starting download...........')
        Download(picLink, picAlt, nowpage)
        print("Download succeeded!")
        print('Next page link:', nextPageLink)
        # Recurse into the next page, carrying the running counter along
        run(nextPageLink, beginNUM, endNUM)
        return

if __name__ == '__main__':
    targetUrl = "http://www./mmtp/qcmn/237269.html"
    run(targetUrl, beginNUM=0, endNUM=70)
    print(" OVER")
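The run function calls itself once per picture and builds the next-page link by string concatenation. As a hedged alternative, not the method used above, the sketch below follows the links in a plain loop and lets urllib.parse.urljoin resolve both relative and absolute hrefs. The names crawl, fetch_page, and FAKE_SITE are hypothetical stand-ins, and the example.com URLs are placeholders so the example runs offline.

```python
from urllib.parse import urljoin

# Hypothetical offline "site": page URL -> (picture URL, href of the next page or None).
FAKE_SITE = {
    "http://example.com/mmtp/qcmn/1.html":
        ("http://example.com/img/1.jpg", "1_2.html"),
    "http://example.com/mmtp/qcmn/1_2.html":
        ("http://example.com/img/2.jpg", "http://example.com/mmtp/qcmn/2.html"),
    "http://example.com/mmtp/qcmn/2.html":
        ("http://example.com/img/3.jpg", None),
}

def fetch_page(url):
    """Stand-in for the request-and-parse step of the real crawler."""
    return FAKE_SITE[url]

def crawl(start_url, limit):
    """Visit at most `limit` pages, following next links without recursion."""
    collected = []
    url = start_url
    while url and len(collected) < limit:
        pic_link, next_href = fetch_page(url)
        collected.append(pic_link)
        # urljoin resolves a relative href against the current page
        # and leaves a full URL unchanged.
        url = urljoin(url, next_href) if next_href else None
    return collected

pictures = crawl("http://example.com/mmtp/qcmn/1.html", limit=70)
print(pictures)
```

A loop keeps the call stack flat no matter how many pictures are requested, whereas the recursive version adds one stack frame per picture.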