[LlamaIndex Tutorial] 2. Storage module: how to use a custom vector database in LlamaIndex (with code)

小张学AI, published 2024-06-22 from Shandong

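The previous article used two lines of code to split the documents, embed and store them as vectors, and persist the result to disk. But what if we want to use a vector database of our own choosing?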

0. Background

In the previous article, these two lines were enough to split the documents, store their embeddings, and persist the index:

index = VectorStoreIndex.from_documents(documents)
# store it for later
index.storage_context.persist(persist_dir=PERSIST_DIR)

Sometimes, though, we would rather use the vector database and embedding method we already work with. The rest of this article uses chromadb as the example.

1. Using a custom vector database in LlamaIndex

(1) Environment setup

Before writing any code, install the Chroma vector store integration for LlamaIndex.

pip install -U llama-index-vector-stores-chroma -i https://pypi.tuna.tsinghua.edu.cn/simple
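Before moving on, it can help to confirm that the packages are visible to the interpreter you will run the tutorial with. The sketch below uses only the standard library; `is_installed` is a hypothetical helper name chosen for illustration and is not part of LlamaIndex or chromadb.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if Python can locate `module_name`.

    For dotted names, the parent packages are imported in order to
    resolve the lookup.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. `llama_index`) is missing.
        return False


# The three modules imported by the code in this article.
for name in ("chromadb", "llama_index.core", "llama_index.vector_stores.chroma"):
    print(f"{name}: {'ok' if is_installed(name) else 'missing'}")
```

If any line reports `missing`, rerun the pip command above in the same environment.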

(2) Create a chromadb database instance (a persistent client that stores its data under the given path)

db = chromadb.PersistentClient(path="D:\\GitHub\\LEARN_LLM\\LlamaIndex\\vector_store\\chroma_db")
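The path above is hard-coded for one Windows machine. As a minimal sketch, assuming you want the same script to run from any project folder, the directory can be built with `pathlib` instead. The helper names `chroma_dir` and `open_client` and the `vector_store/chroma_db` layout are assumptions made for illustration; only `chromadb.PersistentClient(path=...)` comes from the article.

```python
from pathlib import Path


def chroma_dir(base: Path, name: str = "chroma_db") -> Path:
    """Create (if needed) and return the folder where Chroma will keep its files."""
    target = base / "vector_store" / name
    target.mkdir(parents=True, exist_ok=True)
    return target


def open_client(base: Path):
    """Open a persistent Chroma client rooted under `base`."""
    # Imported lazily so chroma_dir() works even without chromadb installed.
    import chromadb

    return chromadb.PersistentClient(path=str(chroma_dir(base)))
```

With these helpers, `db = open_client(Path.cwd())` would play the same role as the line above.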

(3) Create a collection in the Chroma database

chroma_collection = db.get_or_create_collection("quickstart")

(4) Wrap chroma_collection in LlamaIndex's ChromaVectorStore, which turns it into a LlamaIndex VectorStore.

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

(5) Wrap the VectorStore in a StorageContext

storage_context = StorageContext.from_defaults(vector_store=vector_store)

(6) When creating the VectorStoreIndex, pass the custom storage_context built above through the storage_context parameter of from_documents.

index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

The complete code is as follows:

import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext

# load some documents
documents = SimpleDirectoryReader("D:\\GitHub\\LEARN_LLM\\LlamaIndex\\data").load_data()

# initialize client, setting path to save data
db = chromadb.PersistentClient(path="D:\\GitHub\\LEARN_LLM\\LlamaIndex\\vector_store\\chroma_db")

# create collection
chroma_collection = db.get_or_create_collection("quickstart")

# assign chroma as the vector_store to the context
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# create your index
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

# create a query engine and query
query_engine = index.as_query_engine()
response = query_engine.query("什么是角色提示?")
print(response)
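Because the collection is stored on disk, running the script a second time is likely to embed the same documents again and add them to the collection as duplicates. A minimal sketch of one way to avoid that: index only when the collection is empty, and otherwise rebuild the index object from the stored vectors with `VectorStoreIndex.from_vector_store`. The helper names `collection_is_empty` and `build_or_load_index` are hypothetical, and the sketch assumes the same embedding model is used on every run.

```python
def collection_is_empty(collection) -> bool:
    """True when the Chroma collection holds no embeddings yet."""
    return collection.count() == 0


def build_or_load_index(chroma_collection, load_documents):
    """Index documents on the first run; reuse the stored vectors afterwards.

    `load_documents` is a zero-argument callable, so the files are only
    read when indexing is actually needed.
    """
    # Lazy imports keep collection_is_empty() usable on its own.
    from llama_index.core import StorageContext, VectorStoreIndex
    from llama_index.vector_stores.chroma import ChromaVectorStore

    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    if collection_is_empty(chroma_collection):
        storage_context = StorageContext.from_defaults(vector_store=vector_store)
        return VectorStoreIndex.from_documents(
            load_documents(), storage_context=storage_context
        )
    # Rebuild the index object on top of the vectors already in Chroma.
    return VectorStoreIndex.from_vector_store(vector_store)
```

It could be called with the `chroma_collection` from the script and a lambda that wraps the `SimpleDirectoryReader(...).load_data()` call, in place of the direct `from_documents` call.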

2. Summary

In this article we learned how to use a custom vector database in LlamaIndex and walked through the steps in detail. To recap: the key is to create a LlamaIndex VectorStore, wrap that VectorStore in a StorageContext, and finally pass the StorageContext to the from_documents function of VectorStoreIndex.