<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>RAG &#8211; 95博客</title>
	<atom:link href="https://95bok.cn/tag/rag/feed/" rel="self" type="application/rss+xml" />
	<link>https://95bok.cn</link>
	<description>云烟</description>
	<lastBuildDate>Thu, 09 Apr 2026 08:25:44 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://95bok.cn/wp-content/uploads/2025/11/cropped-1740116058152-32x32.jpg</url>
	<title>RAG &#8211; 95博客</title>
	<link>https://95bok.cn</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Build a Local RAG Knowledge Base with Ollama + LangChain: Upload PDFs and Let the AI Answer from Your Private Documents</title>
		<link>https://95bok.cn/ollama-langchain-rag-knowledge-base/</link>
					<comments>https://95bok.cn/ollama-langchain-rag-knowledge-base/#respond</comments>
		
		<dc:creator><![CDATA[云烟]]></dc:creator>
		<pubDate>Thu, 09 Apr 2026 06:30:00 +0000</pubDate>
		<category><![CDATA[Local AI]]></category>
		<category><![CDATA[LangChain]]></category>
		<category><![CDATA[Ollama]]></category>
		<category><![CDATA[RAG]]></category>
		<category><![CDATA[Knowledge Base]]></category>
		<guid isPermaLink="false">https://95bok.cn/?p=379</guid>

					<description><![CDATA[This post walks through building a local RAG from scratch: drop a pile of documents in, ask the AI questions, and it answers from your own material instead of making things up. Why use a loc ... <a title="Build a Local RAG Knowledge Base with Ollama + LangChain: Upload PDFs and Let the AI Answer from Your Private Documents" class="read-more" href="https://95bok.cn/ollama-langchain-rag-knowledge-base/" aria-label="Read Build a Local RAG Knowledge Base with Ollama + LangChain: Upload PDFs and Let the AI Answer from Your Private Documents">Read more</a>]]></description>
										<content:encoded><![CDATA[<p>This post walks through building a local RAG from scratch: drop a pile of documents in, ask the AI questions, and it answers from your own material instead of making things up.</p>
<hr />
<h2>Why a Local RAG</h2>
<p>Documents never leave your machine; everything is processed locally. It's free, with no API quota to worry about. And the model, vector store, and chunking strategy are all yours to tune.</p>
<p>The pipeline is just two steps: retrieval + generation. First find the relevant passages in your documents, then feed them to the AI and have it answer based on that content.</p>
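<p>As a minimal sketch of those two steps (the names here are illustrative, assuming a vector store and model like the ones the full script below builds):</p>
<pre><code class="language-python"># Minimal retrieve-then-generate sketch. `vectorstore` is assumed to be
# a Chroma store like the one built by the full script below.
from langchain_ollama import ChatOllama

def rag_answer(vectorstore, question: str) -> str:
    # Step 1: retrieval - grab the chunks most similar to the question
    chunks = vectorstore.similarity_search(question, k=3)
    context = "\n\n".join(c.page_content for c in chunks)
    # Step 2: generation - answer grounded only in the retrieved context
    llm = ChatOllama(model="qwen2.5:7b", temperature=0)
    reply = llm.invoke(f"Answer using this material:\n{context}\n\nQuestion: {question}")
    return reply.content</code></pre>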
<hr />
<h2>Environment</h2>
<p>Install Ollama first and pull the models:</p>
<pre><code class="language-bash">ollama pull qwen2.5:7b
ollama pull nomic-embed-text</code></pre>
<p>For embeddings use <code>nomic-embed-text</code>: 274 MB, purpose-built for vector embeddings, and much faster than using the main model.</p>
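<p>A quick sanity check that the embedding model is wired up (nomic-embed-text returns 768-dimensional vectors):</p>
<pre><code class="language-python">from langchain_ollama import OllamaEmbeddings

emb = OllamaEmbeddings(model="nomic-embed-text")
vec = emb.embed_query("local RAG test sentence")
print(len(vec))  # 768</code></pre>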
<p>Python environment (<code>unstructured</code> is what <code>DirectoryLoader</code> falls back to for reading mixed file types):</p>
<pre><code class="language-bash">mkdir ~/rag-demo &amp;&amp; cd ~/rag-demo
python3 -m venv venv &amp;&amp; source venv/bin/activate
pip install langchain langchain-community langchain-ollama chromadb pypdf unstructured</code></pre>
<hr />
<h2>The Complete Code</h2>
<p>Everything fits in one file; save it as <code>rag.py</code>:</p>
<pre><code class="language-python">#!/usr/bin/env python3
import os, sys, shutil, subprocess
from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings, ChatOllama
from langchain_community.vectorstores import Chroma
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

MODEL = "qwen2.5:7b"
EMBED_MODEL = "nomic-embed-text"
DB_PATH = "./chroma_db"
DOCS_DIR = "./docs"

def ensure_model(name):
    r = subprocess.run(["ollama", "list"], capture_output=True, text=True)
    if name not in r.stdout:
        print(f"下载模型 {name}...")
        subprocess.run(["ollama", "pull", name], check=True)

def build_db():
    print(f"Scanning document directory: {DOCS_DIR}")
    loader = DirectoryLoader(DOCS_DIR, glob="**/*")
    docs = loader.load()
    print(f"Loaded {len(docs)} documents")

    # Split on paragraph/sentence boundaries first; CJK and Latin punctuation both count
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=500, chunk_overlap=50,
        separators=["\n\n", "\n", "。", "！", "？", ".", "!", "?", " ", ""]
    )
    chunks = splitter.split_documents(docs)
    print(f"Split into {len(chunks)} text chunks")

    # Rebuilding from scratch: drop any existing store before re-embedding
    if os.path.exists(DB_PATH):
        shutil.rmtree(DB_PATH)

    Chroma.from_documents(
        documents=chunks,
        embedding=OllamaEmbeddings(model=EMBED_MODEL),
        persist_directory=DB_PATH,
    )
    print("Vector store saved")

def load_db():
    return Chroma(
        persist_directory=DB_PATH,
        embedding_function=OllamaEmbeddings(model=EMBED_MODEL),
    )

def main():
    ensure_model(MODEL)
    ensure_model(EMBED_MODEL)

    if not os.path.exists(DB_PATH) or "--rebuild" in sys.argv:
        build_db()

    vectorstore = load_db()
    retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
    llm = ChatOllama(model=MODEL, temperature=0)

    prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer the question using the reference material below. "
                   "If the material contains nothing relevant, just say \"the documents contain no relevant content\".\n\n"
                   "Reference material:\n{context}"),
        ("human", "{input}"),
    ])

    chain = create_retrieval_chain(
        retriever,
        create_stuff_documents_chain(llm, prompt)
    )

    print("输入问题开始查询（quit 退出 / rebuild 重建索引）")
    while True:
        q = input("n你的问题：").strip()
        if q.lower() in ('quit', 'exit', '退出'):
            break
        if q.lower() == 'rebuild':
            build_db()
            continue

        print("&#x23f3; 回答中...")
        result = chain.invoke({"input": q})
        print(f"n&#x1f4ac; {result['answer']}")

        print("n&#x1f4ce; 引用来源：")
        for i, doc in enumerate(result["context"], 1):
            print(f"  [{i}] {doc.metadata.get('source', 'unknown')}")
            print(f"      {doc.page_content[:200]}...")

if __name__ == "__main__":
    main()</code></pre>
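<p>For reference, <code>create_retrieval_chain</code> returns a dict carrying the input, the retrieved documents, and the answer, which is why the loop above reads <code>result['answer']</code> and <code>result['context']</code>:</p>
<pre><code class="language-python">result = chain.invoke({"input": "What does the report say about Q3?"})  # example question
print(result["answer"])        # the generated answer string
print(len(result["context"]))  # the k retrieved Document objects</code></pre>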
<hr />
<h2>Usage</h2>
<pre><code class="language-bash">mkdir docs
cp /path/to/your/document.pdf docs/

cd ~/rag-demo
source venv/bin/activate
python3 rag.py</code></pre>
<p>After adding new documents, run <code>python3 rag.py --rebuild</code> to rebuild the index.</p>
<p>Supported file formats: PDF, Word, Markdown, TXT, HTML, and Excel all load.</p>
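<p>If your <code>docs</code> folder is all PDFs, a small variation on the script skips <code>unstructured</code> entirely and parses with <code>pypdf</code> (already in the install list):</p>
<pre><code class="language-python">from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader

# Only match PDFs, and parse each one with pypdf instead of unstructured
loader = DirectoryLoader("./docs", glob="**/*.pdf", loader_cls=PyPDFLoader)
docs = loader.load()</code></pre>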
<hr />
<h2>Tuning</h2>
<p>Answers off the mark? Shrink <code>chunk_size</code> (500→300), raise the retriever's <code>k</code> (3→5), and keep <code>temperature</code> low, as sketched below.</p>
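<p>Those three knobs in the script look like this when adjusted (the values are examples, and <code>vectorstore</code> is the one returned by <code>load_db()</code> above):</p>
<pre><code class="language-python">from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_ollama import ChatOllama

# Smaller chunks -> each retrieved hit is more focused
splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
# Retrieve more chunks -> more grounding material per answer
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
# temperature=0 keeps the model from embellishing beyond the context
llm = ChatOllama(model="qwen2.5:7b", temperature=0)</code></pre>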
<p>Too slow? Don't embed with the main model; <code>nomic-embed-text</code> is much faster. The vector store only costs time on the first build, after that it loads straight from disk.</p>
<p>A 7B model takes roughly 10–30 seconds per answer on CPU, 3–8 seconds on GPU.</p>
<p>Want to hand it to someone else? Wire up a Gradio interface:</p>
<pre><code class="language-bash">pip install gradio</code></pre>
<p>Then add this at the end of the file (note: <code>chain</code> is local to <code>main()</code> in the script above, so hoist it to module scope or rebuild it here first):</p>
<pre><code class="language-python">import gradio as gr

def answer(q):
    result = chain.invoke({"input": q})
    return result["answer"]

gr.Interface(fn=answer, inputs=gr.Textbox(label="问题"), outputs=gr.Textbox(label="回答")).launch()</code></pre>
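<p>By default <code>launch()</code> binds to localhost only; passing <code>server_name="0.0.0.0"</code> is the usual way to make it reachable from other machines on your LAN (worth double-checking against your Gradio version).</p>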
]]></content:encoded>
					
					<wfw:commentRss>https://95bok.cn/ollama-langchain-rag-knowledge-base/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
