Installing the jieba library
Run a cmd window as administrator and enter the command: pip install jieba
Overview of jieba's features
Characteristics
Word segmentation
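jieba's documentation describes its segmentation as a dictionary lookup that builds a directed acyclic graph (DAG) of every word that can start at each character. Precise mode then uses dynamic programming to pick the path with the highest total word probability, and an HMM handles unknown words. The following is a minimal sketch of the DAG and dynamic-programming part only, under stated assumptions: the tiny dictionary and its frequencies are made up, there is no HMM, and `build_dag`, `full_mode` and `precise_mode` are hypothetical names, not jieba functions.

```python
import math

# Hypothetical toy dictionary: word -> frequency (made-up values for illustration)
FREQ = {
    "中": 50, "華": 20, "人": 80, "民": 30, "共": 20, "和": 90, "國": 40,
    "中華": 30, "華人": 10, "人民": 60, "共和": 10, "共和國": 20,
    "中華人民共和國": 15,
}
TOTAL = sum(FREQ.values())


def build_dag(sentence):
    """Map each start index to the end indices of dictionary words starting there."""
    dag = {}
    for start in range(len(sentence)):
        ends = [end for end in range(start, len(sentence))
                if sentence[start:end + 1] in FREQ]
        dag[start] = ends or [start]  # unknown character: keep it as a one-char word
    return dag


def full_mode(sentence):
    """Emit every multi-character word in the DAG, plus single characters no word covers."""
    words, covered_to = [], -1
    for start, ends in build_dag(sentence).items():
        if len(ends) == 1 and start > covered_to:
            words.append(sentence[start:ends[0] + 1])
            covered_to = ends[0]
        else:
            for end in ends:
                if end > start:
                    words.append(sentence[start:end + 1])
                    covered_to = end
    return words


def precise_mode(sentence):
    """Choose the segmentation with the highest total log probability."""
    n = len(sentence)
    dag = build_dag(sentence)
    # route[i] = (best log-probability of sentence[i:], end index of its first word)
    route = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(FREQ.get(sentence[i:end + 1], 1) / TOTAL) + route[end + 1][0], end)
            for end in dag[i]
        )
    words, i = [], 0
    while i < n:
        end = route[i][1]
        words.append(sentence[i:end + 1])
        i = end + 1
    return words


print(precise_mode("中華人民共和國"))  # ['中華人民共和國']
print(full_mode("中華人民共和國"))     # ['中華', '中華人民共和國', '華人', '人民', '共和', '共和國']
```

The contrast mirrors the two main modes: full mode lists every candidate word and may overlap, while precise mode returns one non-overlapping segmentation whose pieces join back into the input.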
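Adding a custom dictionary
Developers can supply their own custom dictionary so that it includes words missing from jieba's built-in lexicon. Although jieba can recognize new words on its own, adding them yourself gives higher accuracy.
Usage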
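jieba's README describes the custom dictionary as a UTF-8 text file with one word per line; each line holds the word, an optional frequency and an optional part-of-speech tag, separated by spaces in that fixed order, and the file is loaded with `jieba.load_userdict(file_name)`. The helper below is a hypothetical sketch (`write_user_dict` is not part of jieba) that writes such a file; the frequency 1000 and the tag `nt` are illustrative assumptions, not tuned values.

```python
from pathlib import Path


def write_user_dict(entries, path):
    """Write (word, freq, tag) entries to a UTF-8 file in jieba's user-dictionary format.

    freq and tag may be None and are then left out, giving lines such as
    "中信建投 1000 nt", "中信建投 1000", "中信建投 nt" or just "中信建投".
    """
    lines = []
    for word, freq, tag in entries:
        fields = [word]
        if freq is not None:
            fields.append(str(freq))  # the frequency must come before the tag
        if tag is not None:
            fields.append(tag)
        lines.append(" ".join(fields))
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


# Hypothetical entries for illustration
print(write_user_dict([("中信建投", 1000, "nt"), ("投資公司", None, None)], "userdict.txt"))
```

After writing the file, call `jieba.load_userdict("userdict.txt")` before `jieba.lcut(...)`. Compared with `jieba.add_word`, a dictionary file keeps a long word list out of the code and loads it in one call.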
Keyword extraction
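jieba provides `jieba.analyse.extract_tags(sentence, topK=20, withWeight=False, allowPOS=())`, which ranks keywords by TF-IDF. As a rough sketch of that scoring, assuming the text is already segmented and using a made-up IDF table (jieba ships its own), each word's weight is its share of the kept words times its IDF; words shorter than two characters and stop words are skipped, and words missing from the table get the median IDF. `tfidf_keywords`, `IDF` and the sample tokens are hypothetical.

```python
from statistics import median


def tfidf_keywords(words, idf, top_k=3, stop_words=frozenset()):
    """Rank pre-segmented words by TF-IDF, in the spirit of jieba.analyse.extract_tags."""
    default_idf = median(idf.values())  # IDF assumed for words missing from the table
    freq = {}
    for w in words:
        if len(w.strip()) < 2 or w.lower() in stop_words:  # skip 1-char words and stop words
            continue
        freq[w] = freq.get(w, 0) + 1
    total = sum(freq.values())
    scores = {w: n / total * idf.get(w, default_idf) for w, n in freq.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical IDF values and a hand-segmented sentence, for illustration only
IDF = {"中信": 9.0, "投資": 5.0, "游戲": 7.0, "公司": 4.0}
tokens = ["中信", "也", "投資", "了", "一個", "游戲", "公司", ",", "游戲", "公司"]
print(tfidf_keywords(tokens, IDF))
```

With jieba installed, the library equivalent is `import jieba.analyse` followed by `jieba.analyse.extract_tags(text, topK=3)`, which segments the raw text itself and uses the bundled IDF table.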
Part-of-speech tagging
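`jieba.posseg.cut` yields items with a `word` and a `flag`, and jieba's tag set uses `nr` for person names and `ns` for place names. One possible use, sketched here under assumptions, is counting only the words with a given tag instead of maintaining a hand-written exclusion list as the character-counting example later in this article does. The function `count_by_flag` and the tagged sample are hypothetical; real tags would come from `jieba.posseg`.

```python
from collections import Counter


def count_by_flag(tagged, prefix="nr"):
    """Count words whose POS tag starts with prefix ('nr' marks person names in jieba)."""
    return Counter(word for word, flag in tagged if flag.startswith(prefix))


# Hypothetical tagger output in (word, flag) form
tagged = [("玄德", "nr"), ("曰", "v"), ("孔明", "nr"), ("荊州", "ns"), ("孔明", "nr")]
print(count_by_flag(tagged))
print(count_by_flag(tagged, "ns"))
```

Matching on a prefix rather than an exact tag also picks up any finer-grained tags that start with the same letters.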
Examples

1. Precise mode

```python
import jieba

list1 = jieba.lcut("中華人民共和國是一個偉大的國家")
print(list1)
print("Precise mode: " + "/".join(list1))
```

2. Full mode

```python
import jieba

list2 = jieba.lcut("中華人民共和國是一個偉大的國家", cut_all=True)
print(list2)
print("Full mode: " + "/".join(list2))
```

3. Search engine mode

```python
import jieba

list3 = jieba.lcut_for_search("中華人民共和國是一個偉大的國家")
print(list3)
print("Search engine mode: " + "/".join(list3))
```

4. Modifying the dictionary

```python
import jieba

text = "中信建投投資公司了一款游戲,中信也投資了一個游戲公司"
word = jieba.lcut(text)
print(word)

# Add words to the dictionary
jieba.add_word("中信建投")
jieba.add_word("投資公司")
word1 = jieba.lcut(text)
print(word1)

# Delete a word from the dictionary
jieba.del_word("中信建投")
word2 = jieba.lcut(text)
print(word2)
```

5. Part-of-speech tagging

```python
import jieba.posseg as pseg

words = pseg.cut("我愛北京天安門")
for w in words:
    print(w.word, w.flag)  # each item carries the word and its POS tag
```

6. Counting how often characters appear in Romance of the Three Kingdoms (三國演義)

```python
import jieba

# "文件路徑" is a placeholder for the path to your text file
with open("文件路徑", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)  # segment the text in precise mode
counts = {}  # maps each word to the number of times it occurs
for word in words:
    if len(word) == 1:  # skip single-character words
        continue
    counts[word] = counts.get(word, 0) + 1  # add 1 each time the word appears
items = list(counts.items())  # convert the key-value pairs to a list
items.sort(key=lambda x: x[1], reverse=True)  # sort by count, highest first
for word, count in items[:15]:
    print("{0:<10}{1:>5}".format(word, count))
```

The raw counts include frequent words that are not names, and one character often appears under several names. The refined version merges these aliases and removes common non-name words:

```python
import jieba

# Frequent words that are not character names
excludes = {"將軍", "卻說", "荊州", "二人", "不可", "不能", "如此", "如何"}
with open("三國演義.txt", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:
        continue
    elif word == "諸葛亮" or word == "孔明曰":  # merge aliases into one name
        rword = "孔明"
    elif word == "關公" or word == "云長":
        rword = "關羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "劉備"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
for word in excludes:
    counts.pop(word, None)  # no KeyError if an excluded word never occurs
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for word, count in items[:10]:
    print("{0:<10}{1:>5}".format(word, count))
```

Source: https://www.cnblogs.com/L-hua/p/15584823.html
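The counting loop in the Three Kingdoms example can also be written with `collections.Counter`, moving the if/elif alias chain into a lookup table. This is a minimal sketch that works on an already-segmented word list, so it runs without jieba or the novel's text file; `ALIASES`, `EXCLUDES`, `count_characters` and `sample_words` are hypothetical names, and the alias and exclusion entries are copied from the example.

```python
from collections import Counter

# Alias -> canonical name, taken from the if/elif chain in the example
ALIASES = {"諸葛亮": "孔明", "孔明曰": "孔明", "關公": "關羽", "云長": "關羽",
           "玄德": "劉備", "玄德曰": "劉備", "孟德": "曹操", "丞相": "曹操"}
EXCLUDES = {"將軍", "卻說", "荊州", "二人", "不可", "不能", "如此", "如何"}


def count_characters(words, top_n=10):
    """Count multi-character words, merging aliases and dropping excluded words."""
    counts = Counter(ALIASES.get(w, w) for w in words if len(w) > 1)
    for w in EXCLUDES:
        counts.pop(w, None)  # ignore excluded words that never occur
    return counts.most_common(top_n)


# Hypothetical segmented text, for illustration only
sample_words = ["孔明", "曰", "玄德", "諸葛亮", "將軍", "關公", "云長", "卻說", "孔明曰", "劉備"]
print(count_characters(sample_words))
```

With jieba and the text file available, the call would be `count_characters(jieba.lcut(txt))`. Keeping the aliases in a dictionary means adding a new alias is a one-line data change rather than another `elif` branch.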