蘇生不惑 wrote another small tool

Posted by 風聲之家 on 2022-07-25 (Jiangsu)


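I've shared some of my tools before (see "A roundup of the software and scripts 蘇生不惑 has developed"). This time I had exported so many public-account articles as separate PDF files that I wanted to combine them into one. PDFShaper can merge PDFs, but the merged file has no bookmarks, so I wrote a PDF merging tool in Python.

Mo Yan's public-account articles serve as the example here. First download all of his articles; for details see my earlier post "One-click batch download of WeChat public account article content/images/covers/videos/audio, with export to HTML and PDF, including read/like/'Wow'/comment counts". The exported data includes each article's date, title, link, summary, author, cover image, original-content flag, IP location, read count, "Wow" count, like count and comment count. Judging by the IP location, Mo Yan is in Shanghai. The articles are also mirrored on my blog: https://rcel.app/#/wechat/%E5%85%AC%E4%BC%97%E5%8F%B7%E8%8E%AB%E8%A8%80%E5%8E%86%E5%8F%B2%E6%96%87%E7%AB%A0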

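All of the audio was downloaded as well. To batch-download the audio or video embedded in articles, you can use another tool I built (see "I developed another batch downloader for public-account audio, video and topics"). Audio on topic (album) pages is supported too. The code is as follows: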

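import re
import requests

# Album ("topic") page URL: replace 'xxx' with a real album link
topic_url = 'xxx'
# headers is not defined in the original snippet; a browser-like User-Agent is assumed
headers = {'User-Agent': 'Mozilla/5.0'}

# biz and album_id identify the album (not used further below)
biz = re.search(r'__biz=(.*?)&', topic_url).group(1)
album_id = re.search(r'album_id=(.*?)&', topic_url).group(1)
response = requests.get(topic_url, headers=headers)
# Non-greedy groups stop at the closing quote of each attribute
voiceids = re.findall(r'data-voiceid="(.*?)"', response.text)
msgids = re.findall(r'data-msgid="(.*?)"', response.text)
links = re.findall(r'data-link="(.*?)"', response.text)
titles = re.findall(r'data-title="(.*?)" data-voiceid', response.text)
print(titles, len(voiceids))
for title, voice_id in zip(titles, voiceids):
    voice_url = f'https://res.wx.qq.com/voice/getvoice?mediaid={voice_id}'
    audio_data = requests.get(voice_url, headers=headers)
    # Replace characters that are not allowed in file names
    filename = re.sub(r'[\\/:*?"<>|]', '_', title) + '.mp3'
    print('Downloading audio: ' + filename)
    with open(filename, 'wb') as f:
        f.write(audio_data.content)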
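The regular expressions above assume a fixed attribute layout (data-title must be immediately followed by data-voiceid), and the __biz=(.*?)& pattern finds nothing when __biz is the last parameter of the URL. A more tolerant sketch using only the standard library: urllib.parse reads the query string, and html.parser collects the data-* attributes of each element regardless of their order (it also decodes entities such as &amp; in attribute values). The attribute names come from the code above; the helper names query_param, VoiceItemParser and voice_items, and the sample markup and URL, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


def query_param(url, key):
    """Return the first value of a query parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get(key)
    return values[0] if values else None


class VoiceItemParser(HTMLParser):
    """Collect (title, voice id) pairs from elements that carry both attributes."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute order no longer matters
        if 'data-voiceid' in attrs and 'data-title' in attrs:
            self.items.append((attrs['data-title'], attrs['data-voiceid']))


def voice_items(page_html):
    parser = VoiceItemParser()
    parser.feed(page_html)
    parser.close()
    return parser.items


# Hypothetical markup: the two elements list their attributes in different orders
sample = ('<li data-voiceid="v1" data-title="First &amp; Second"></li>'
          '<li data-title="Third" data-msgid="m3" data-voiceid="v3"></li>')
print(voice_items(sample))  # → [('First & Second', 'v1'), ('Third', 'v3')]
print(query_param('https://mp.weixin.qq.com/mp/appmsgalbum?__biz=MzA==&album_id=123', 'album_id'))  # → 123
```

With this, the download loop could iterate over voice_items(response.text) instead of zipping two separate regex results, so a missing attribute on one element cannot shift titles against the wrong voice IDs.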

Download result:

First, convert the downloaded article HTML files to PDF. The code is as follows:

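import os

import pdfkit                    # needs the wkhtmltopdf binary installed
from pdf2docx import Converter   # Converter is assumed to come from pdf2docx; only to_word() uses it


def to_pdf():
    """Convert every .html file under the current directory to pdf/<name>.pdf."""
    print('Exporting PDF...')
    os.makedirs('pdf', exist_ok=True)
    for root, dirs, files in os.walk('.'):
        for name in files:
            if name.endswith('.html'):
                print(name)
                try:
                    # Join with root so files in subfolders are found too
                    pdfkit.from_file(os.path.join(root, name),
                                     os.path.join('pdf', name[:-len('.html')] + '.pdf'))
                except Exception as e:
                    print(e)


def to_word():
    """Convert every .pdf file under the current directory to word/<name>.docx."""
    print('Exporting Word...')
    os.makedirs('word', exist_ok=True)
    for root, dirs, files in os.walk('.'):
        for name in files:
            if name.endswith('.pdf'):
                print(name)
                try:
                    cv = Converter(os.path.join(root, name))
                    cv.convert(os.path.join('word', name[:-len('.pdf')] + '.docx'))
                    cv.close()
                except Exception as e:
                    print(e)


to_pdf()
# to_word()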
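to_pdf() converts every HTML file again on each run, so a batch that stops halfway (wkhtmltopdf can be slow on long articles with many images) has to start over. A minimal sketch of planning the work first: it lists (source, target) pairs in a stable order, does not descend into the output folder, and skips targets that already exist, so a rerun only converts what is missing. The helper name plan_conversions is hypothetical, not part of pdfkit or pdf2docx.

```python
import os


def plan_conversions(src_dir, out_dir, src_ext, dst_ext):
    """Return sorted (source, target) pairs whose target file does not exist yet."""
    out_abs = os.path.abspath(out_dir)
    plan = []
    for root, dirs, files in os.walk(src_dir):
        # Prune the output folder so its contents are never treated as input
        dirs[:] = [d for d in dirs
                   if os.path.abspath(os.path.join(root, d)) != out_abs]
        for name in files:
            if name.endswith(src_ext):
                target = os.path.join(out_dir, name[:-len(src_ext)] + dst_ext)
                if not os.path.exists(target):
                    plan.append((os.path.join(root, name), target))
    return sorted(plan)
```

In to_pdf() the walk could then become a loop over plan_conversions('.', 'pdf', '.html', '.pdf'), calling pdfkit.from_file(src, dst) for each pair; to_word() would pass 'word', '.pdf', '.docx' instead.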

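Next, merge the converted PDFs into a single file and generate a bookmark for each article. The code is below. I've also packaged it as a standalone tool: reply 公眾號 in my public account's chat window to get it.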

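import html
import os
# PdfFileReader/PdfFileWriter/addBookmark were deprecated and removed in PyPDF2 3.0;
# these are the current names (pypdf exposes the same ones)
from PyPDF2 import PdfReader, PdfWriter

OUTPUT = '公眾號蘇生不惑歷史文章合集.pdf'
file_writer = PdfWriter()
num = 0  # index of the first page of the next article in the merged file
for root, dirs, files in os.walk('.'):
    dirs.sort()  # visit folders in a stable order
    for name in sorted(files):
        # Skip the merged output itself in case the script is run again
        if name.endswith('.pdf') and name != OUTPUT:
            print(name)
            file_reader = PdfReader(os.path.join(root, name))
            for page in file_reader.pages:
                file_writer.add_page(page)
            # A bookmark can only point at a page that is already in the writer
            file_writer.add_outline_item(html.unescape(name)[:-len('.pdf')], num)
            num += len(file_reader.pages)
with open(OUTPUT, 'wb') as f:
    file_writer.write(f)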
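Each article's bookmark points at the index of its first page in the merged file, which is the running total of the page counts of the articles merged before it (the num counter above). A small sketch of that arithmetic on its own, with hypothetical page counts; the indices are 0-based, as PdfWriter expects, while PDF viewers display them as page numbers starting at 1.

```python
from itertools import accumulate


def bookmark_offsets(page_counts):
    """0-based index of each article's first page in the merged PDF."""
    # accumulate(..., initial=0) yields 0, c1, c1+c2, ...; the final total is dropped
    return list(accumulate(page_counts, initial=0))[:-1]


counts = [3, 1, 2]  # hypothetical page counts of three articles
print(bookmark_offsets(counts))  # → [0, 3, 4]
```

Sorting the file list before merging keeps these offsets, and therefore the bookmark order, the same from run to run.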

The merged result: click a bookmark in the left pane to jump to the corresponding article's PDF (comments included):

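You can also export the PDF's bookmarks to Excel, one row per bookmark with its title and page number (written as a CSV file, which Excel opens directly). The code is as follows: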

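import csv
from PyPDF2 import PdfReader  # current PyPDF2 / pypdf reader API


def bookmark_export(reader, outline, rows):
    """Append (title, 1-based page) for every bookmark, including nested ones."""
    for item in outline:
        if isinstance(item, list):
            bookmark_export(reader, item, rows)  # children of the previous bookmark
        else:
            # get_destination_page_number resolves the page reference to an index
            rows.append((item['/Title'], reader.get_destination_page_number(item) + 1))
    return rows


reader = PdfReader('公眾號蘇生不惑歷史文章合集.pdf')
rows = bookmark_export(reader, reader.outline, [])
# 'w' instead of 'a+' so reruns do not duplicate rows; utf-8-sig lets Excel detect UTF-8
with open('公眾號蘇生不惑歷史文章合集.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)  # quotes titles that contain commas
    writer.writerow(['Bookmark', 'Page'])
    writer.writerows(rows)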
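In PyPDF2 and pypdf, the outline read from a PDF is a list in which a nested list holds the children of the entry just before it. A sketch that flattens such a structure into (depth, title, page) rows, so the CSV keeps the hierarchy if bookmarks are ever nested (for example, articles grouped under a year). It works on any dict-like entries with a /Title key; the page lookup is passed in as a function (with a real reader that would be reader.get_destination_page_number), and the function name flatten_outline and the sample data are hypothetical.

```python
import csv
import io


def flatten_outline(outline, page_of, depth=0):
    """Yield (depth, title, 1-based page) for every entry, parents before children."""
    for item in outline:
        if isinstance(item, list):
            # A nested list holds the children of the previous entry
            yield from flatten_outline(item, page_of, depth + 1)
        else:
            yield depth, item['/Title'], page_of(item) + 1


# Hypothetical outline: a parent bookmark with two children, then a sibling
outline = [
    {'/Title': '2022', 'page': 0},
    [{'/Title': 'Article A', 'page': 0}, {'/Title': 'Article B', 'page': 4}],
    {'/Title': '2021', 'page': 9},
]
rows = list(flatten_outline(outline, page_of=lambda item: item['page']))

buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
```

Keeping depth as its own column lets Excel sort or filter by level without parsing indented titles.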

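Recent original articles:

Join my Knowledge Planet (知識星球)

Remove Bilibili's regional restrictions on anime: this special build of the Bilibili app is remarkably capable

2022 update: one-click downloads from Baidu Wenku / Docin / Doc88 / 原創力文檔

One-click batch download of WeChat public account article content/images/covers/videos/audio, with export to HTML and PDF, including read/like/"Wow"/comment counts

NetEase Cloud Music: auto-play 300 songs a day to reach LV10; Bilibili: daily auto check-in to reach LV6; JD: daily auto check-in to collect JD Beans; WeChat Sports: automatically change the daily step count

A few great music apps: listen to and download music for free, and unlock greyed-out NetEase Cloud Music songs in one click

A roundup of the software and scripts 蘇生不惑 has developed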
