How do you build a smart weather assistant with FastAPI and Ollama?

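📝 Summary: Over the Spring Festival I saw a comment on my official account: "Can you build an AI assistant that chats about the weather, checks traffic restrictions, and runs as a private deployment?" Don't panic! These notes use FastAPI + Ollama + an open-source model, plus a little fine-tuning on weather data, to get a demo running that answers weather questions conversationally. The tone is casual throughout, with code and a record of the pitfalls I hit, so you can carry it straight into your own project.

🚨 1. The comment on the official account

I stared at my phone while a thousand thoughts flashed by: train from scratch? No GPUs. Call a hosted API? The data leaves the premises. Use an open-source model? Then how does it learn about real-time weather? Does this sound familiar? Don't ask how I know; I went through it just last week. After a few days of tinkering I got it working, so today I'll take the FastAPI + Ollama + fine-tuning combo apart for you. Copy the code, tweak it, and it should work.

🗺️ 2. Draw a map first: three things to do

🔹 First: run an open-source LLM with Ollama (for example qwen2.5:3b) so it can hold a conversation.
🔹 Second: give the model a "toolkit": call a weather API through FastAPI for real-time queries.
🔹 Third: do a LoRA fine-tune on the past year's weather data so the model picks up the tone of weather Q&A.
🔹 Bonus: start everything with Docker Compose, so handing it to ops doesn't start an argument.

⚙️ 3. The foundation: a minimal Ollama + FastAPI setup

Ollama is what I like to call "the Docker of the LLM world": one command pulls a model, one command starts the service. Start by running a lightweight model:

    # Install Ollama (this script targets Linux; macOS and Windows have installers on ollama.com)
    curl -fsSL https://ollama.com/install.sh | sh
    # Pull a 3B model: good enough and easy on the GPU
    ollama pull qwen2.5:3b
    # Start the service (port 11434 by default; skip if the installer already started it)
    ollama serve

Next comes FastAPI. It works like a smart switchboard operator: it passes the user's question to the LLM and wraps the model's reply as an API. Here is the simplest version:

    # main.py
    from fastapi import FastAPI
    from pydantic import BaseModel
    import httpx

    app = FastAPI()
    OLLAMA_URL = "http://localhost:11434/api/generate"

    class ChatRequest(BaseModel):
        message: str

    @app.post("/chat")
    async def chat(req: ChatRequest):
        payload = {
            "model": "qwen2.5:3b",
            "prompt": req.message,
            "stream": False,  # return one JSON object instead of streamed chunks
        }
        # httpx times out after 5 seconds by default, which is short for generation
        async with httpx.AsyncClient(timeout=60) as client:
            resp = await client.post(OLLAMA_URL, json=payload)
        return {"reply": resp.json()["response"]}

⚠️ Pitfall warning: the first time I wrote this I forgot "stream": False, got back a pile of streamed chunks, and the front end crashed. For a simple demo, leave streaming off.

🌤️ 4. Teach the model to "check the weather": tool calling + fine-tuning as double insurance

Right now the model can only chat aimlessly. It needs to know that when the user asks about the weather, an external API has to be called. There are two routes:

🔸 Method A: function calling. The model outputs a specific format, and we parse it and call the API. Simple and direct, good for quick validation.
🔸 Method B: fine-tuning. Train the model on a batch of weather Q&A data so it internalizes the "check the weather" action. The result feels more natural, but it needs data.

I tried both and ended up with a "fine-tuning + tool calling" mix: fine-tuning makes the model more proactive about asking for the location, and tool calling keeps the data real-time. First, here is how the tool calling is written:

    # Add a tool description to the prompt sent to Ollama
    SYSTEM_PROMPT = """
    You are a weather assistant. When the user asks about the weather, you must output the query parameters as JSON, for example: {"city": "Beijing"}. I will call the weather API with these parameters and give you the result, and you then write a natural-language reply.
    """

    # Then add the tool-calling logic to the chat endpoint
    @app.post("/chat")
    async def chat(req: ChatRequest):
        # Call Ollama first and let it decide whether to look up the weather
        # ... (repeated code omitted)
        # Parse the reply; if it contains JSON, call the weather API
        # Then splice the weather data into the prompt and let the model write the final answer
        ...

The complete code is a bit long; the important part is the logic. The data parsing and the system prompt can be adjusted to match what the API actually returns. One crash story: at first I let the model decide on its own whether to look up the weather, and it kept ignoring the output format. It only behaved after I added few-shot examples.

    pip install fastapi uvicorn httpx pydantic

    import json
    import os
    import re
    from typing import List, Optional

    import httpx
    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    app = FastAPI()

    # Ollama base address; docker-compose sets OLLAMA_URL=http://ollama:11434
    OLLAMA_BASE_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
    OLLAMA_URL = f"{OLLAMA_BASE_URL}/api/generate"
    # Model name (switch to weather-assistant after fine-tuning)
    MODEL_NAME = "qwen2.5:3b"
    # Weather API settings (QWeather as an example; you need to register for a key)
    WEATHER_API_KEY = "d3751b3xxxxxx"  # from the QWeather developer console
    WEATHER_API_URL_HOST = "nc3xxx.re.qweatherapi.com"  # from the QWeather developer console

    # Conversation history (in memory for simplicity; Redis is a better fit for production)
    conversation_history: List[dict] = []

    class ChatRequest(BaseModel):
        message: str
        session_id: Optional[str] = None  # meant to separate conversations; not used by this demo yet

    class ChatResponse(BaseModel):
        reply: str
        used_tool: bool = False

    def extract_json(text: str) -> Optional[dict]:
        """Extract the first flat JSON object from the model reply."""
        # Non-greedy match, so later text that contains braces does not break parsing
        match = re.search(r'\{.*?\}', text, re.DOTALL)
        if not match:
            return None
        try:
            parsed = json.loads(match.group())
        except json.JSONDecodeError:
            return None
        return parsed if isinstance(parsed, dict) else None

    async def get_location_id(city_name: str) -> Optional[str]:
        """Look up the QWeather Location ID for a city name."""
        geo_url = f"https://{WEATHER_API_URL_HOST}/geo/v2/city/lookup"
        params = {"location": city_name, "key": WEATHER_API_KEY}
        async with httpx.AsyncClient() as client:
            try:
                resp = await client.get(geo_url, params=params, timeout=10)
                data = resp.json()
                if data.get("code") == "200" and data.get("location"):
                    # Take the ID of the first result (usually the best match)
                    location_id = data["location"][0]["id"]
                    print(f"Found ID for city {city_name}: {location_id}")
                    return location_id
                print(f"City not found: {city_name}, error code: {data.get('code')}")
                return None
            except Exception as e:
                print(f"GeoAPI call failed: {e}")
                return None

    async def call_weather_api(city: str) -> Optional[str]:
        """Fetch the current weather (look up the ID first, then the weather)."""
        # Step 1: get the city ID
        location_id = await get_location_id(city)
        if not location_id:
            return f"Sorry, no weather information was found for '{city}'. Please check the city name."
        # Step 2: query the current weather with the ID
        weather_url = f"https://{WEATHER_API_URL_HOST}/v7/weather/now"
        params = {"location": location_id, "key": WEATHER_API_KEY}
        async with httpx.AsyncClient() as client:
            try:
                resp = await client.get(weather_url, params=params, timeout=10)
                data = resp.json()
                print(f"Weather API response: {data}")
                if data.get("code") == "200":
                    now = data["now"]
                    # Return the weather as a formatted sentence
                    return (
                        f"Current weather in {city}: {now['text']}, temperature {now['temp']}℃, "
                        f"feels like {now['feelsLike']}℃, {now['windDir']} force {now['windScale']}."
                    )
                return f"Weather query failed, error code: {data.get('code')}"
            except Exception as e:
                print(f"Weather API call failed: {e}")
                return "The weather service is temporarily unavailable. Please try again later."

    async def call_ollama(prompt: str, system: Optional[str] = None) -> str:
        """Call Ollama to generate a reply."""
        payload = {
            "model": MODEL_NAME,
            "prompt": prompt,
            "stream": False,
            # Ollama's option for the output length limit is num_predict, not max_tokens
            "options": {"temperature": 0.7, "num_predict": 500},
        }
        if system:
            payload["system"] = system
        async with httpx.AsyncClient() as client:
            resp = await client.post(OLLAMA_URL, json=payload, timeout=60)
        if resp.status_code == 200:
            return resp.json()["response"]
        raise HTTPException(status_code=500, detail="Ollama service error")

    @app.post("/chat", response_model=ChatResponse)
    async def chat_endpoint(req: ChatRequest):
        # Build a prompt that carries the context
        history = conversation_history[-5:]  # keep only the last 5 turns
        context = ""
        for turn in history:
            context += f"User: {turn['user']}\nAssistant: {turn['assistant']}\n"

        # The system prompt explicitly asks for JSON so the tool gets triggered
        system_prompt = """You are a weather assistant that can fetch real-time weather by calling a tool. When the user asks about the weather, you must first output a JSON object in the format {"city": "city name"}, and then a natural-language reply. If the user did not name a city, you may ask for it. For other questions, just chat normally."""
        full_prompt = f"{context}User: {req.message}\nAssistant:"

        # First call to Ollama
        raw_reply = await call_ollama(full_prompt, system=system_prompt)

        # Try to extract JSON
        tool_call = extract_json(raw_reply)
        used_tool = False
        if tool_call and "city" in tool_call:
            # Call the weather API
            city = tool_call["city"]
            weather_info = await call_weather_api(city)
            if weather_info:
                # Feed the weather back as context and ask the model for the final answer
                second_prompt = (
                    f"{full_prompt}(Tool result: {weather_info})\n"
                    "Please write a natural-language reply based on this information."
                )
                final_reply = await call_ollama(second_prompt, system=system_prompt)
                used_tool = True
                # Strip any leftover JSON fragments
                final_reply = re.sub(r'\{.*?\}', '', final_reply).strip()
            else:
                final_reply = f"Sorry, the weather lookup for {city} failed. Please try again later."
        else:
            final_reply = raw_reply

        # Save the conversation history
        conversation_history.append({"user": req.message, "assistant": final_reply})
        return ChatResponse(reply=final_reply, used_tool=used_tool)

    @app.get("/health")
    async def health():
        return {"status": "ok"}

    if __name__ == "__main__":
        import uvicorn
        uvicorn.run(app, host="0.0.0.0", port=8000)

📊 5. Add some fine-tuning data: make the model "know the trade"

Fine-tuning sounds fancy, but low-cost techniques like LoRA mean a small dataset is already enough to see results. For example, take the past year's weather records (city, temperature, conditions) and create 500 question-answer pairs in roughly this format:

    {"instruction": "What's the weather like in Beijing today?", "output": "Beijing is sunny today, -2℃ to 8℃, north wind force 3, air quality excellent. A down jacket is recommended."}
    {"instruction": "Is it raining in Shanghai?", "output": "Shanghai currently has light rain, 10℃ to 12℃, northeast wind force 2. Remember to take an umbrella."}
    {"instruction": "What should I wear in Guangzhou tomorrow?", "output": "Guangzhou will be cloudy tomorrow, 22℃ to 28℃. Short sleeves plus a light jacket are recommended."}
    {"instruction": "Check the traffic restrictions in Chengdu for me", "output": "Chengdu restricts plates ending in 3 and 8 today, from 7:30 to 20:00."}

Then use the unsloth library to LoRA fine-tune qwen2.5. The key code follows. (The main reason I never much wanted to play with LLMs is that they eat hardware and time. The training parameters below still need adjusting to your actual needs and are for reference only.)

    pip install unsloth transformers datasets torch accelerate trl

    import json

    from unsloth import FastLanguageModel
    import torch
    from datasets import Dataset
    from transformers import TrainingArguments
    from trl import SFTTrainer

    # 1. Load the base model
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="Qwen/Qwen2.5-3B",
        max_seq_length=512,
        dtype=None,         # choose automatically
        load_in_4bit=True,  # save GPU memory
    )

    # 2. Add the LoRA adapter
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
        lora_alpha=16,
        lora_dropout=0,
        bias="none",
        use_gradient_checkpointing=True,
        random_state=42,
        max_seq_length=512,
    )

    # 3. Prepare the data
    def format_instruction(example):
        return {"text": f"User: {example['instruction']}\nAssistant: {example['output']}"}

    with open("weather_train.jsonl", "r", encoding="utf-8") as f:
        raw_data = [json.loads(line) for line in f if line.strip()]
    dataset = Dataset.from_list(raw_data)
    dataset = dataset.map(format_instruction)

    # 4. Set the training arguments
    # (written against the older trl API; newer trl releases may expect
    # dataset_text_field and max_seq_length in SFTConfig instead)
    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",
        max_seq_length=512,
        args=TrainingArguments(
            per_device_train_batch_size=2,
            gradient_accumulation_steps=4,
            warmup_steps=10,
            num_train_epochs=3,
            learning_rate=2e-4,
            fp16=not torch.cuda.is_bf16_supported(),
            bf16=torch.cuda.is_bf16_supported(),
            logging_steps=10,
            output_dir="outputs",
            optim="adamw_8bit",
            save_strategy="epoch",
        ),
    )

    # 5. Start training
    trainer.train()

    # 6. Save the LoRA weights
    model.save_pretrained("lora-weather")
    tokenizer.save_pretrained("lora-weather")
    print("Fine-tuning finished; weights saved in the lora-weather directory")

Note: after fine-tuning, either merge the LoRA weights or import them through an Ollama Modelfile. Ollama can load safetensors directly, so you can write a Modelfile like the one below. Check that the FROM model matches the base model the adapter was trained on, otherwise the results can be erratic.

    FROM qwen2.5:3b
    # Attach the LoRA weights
    ADAPTER ./lora-weather

Then create the new model with ollama. From this point on, Ollama runs the fine-tuned model.

    ollama create weather-assistant -f Modelfile

Verify it. Once it works, change MODEL_NAME in FastAPI to weather-assistant to use the fine-tuned model.

    ollama run weather-assistant "What's the weather like in Beijing today?"

🐳 6. Containerize: one-command deployment, goodbye "it works on my machine"

It runs locally and now has to go to ops? Containerize it! Put FastAPI and Ollama together in one docker-compose file:

    # docker-compose.yml
    version: '3.8'
    services:
      ollama:
        image: ollama/ollama:latest
        volumes:
          - ./ollama:/root/.ollama
        ports:
          - "11434:11434"
        command: serve
      fastapi:
        build: .
        ports:
          - "8000:8000"
        environment:
          - OLLAMA_URL=http://ollama:11434
        depends_on:
          - ollama

In the FastAPI Dockerfile, remember to copy in the fine-tuned model, or create it at startup with ollama create. There is a big pitfall here: inside the container Ollama has to download the model, so if the network is slow, run ollama pull ahead of time and mount the model directory.

🧠 7. A few more advanced thoughts (don't step in the holes I stepped in)

🔸 Fine-tuning is not mandatory: if you only need simple weather lookups, a well-written prompt plus tool calling is entirely enough. Fine-tuning is better suited to teaching the model a complex conversational style or domain terminology.
🔸 Data quality > quantity: fine-tuning on 100 high-quality samples may well beat 500 noisy ones. Always clean the data, and make sure colloquial phrasings like "how's the weather today" are covered.
🔸 Conversation context management: don't stuff the entire message history into the model. A sliding window over the last few turns is enough; otherwise you can easily exceed the context limit.
🔸 Traffic restrictions and outfit advice: same routine. Prepare the matching API (AMap and QWeather both offer traffic-restriction endpoints), and call it once the model has recognized the intent.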
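The sliding-window point above can be sketched roughly as follows. This is a minimal illustration rather than the article's own code: the `SessionHistory` class, the `max_turns` parameter, and the `"default"` fallback key are hypothetical names introduced here, and the history lives in memory only, one window per `session_id`.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class SessionHistory:
    """In-memory, per-session sliding window of (user, assistant) turns."""

    def __init__(self, max_turns: int = 5) -> None:
        self.max_turns = max_turns
        # deque(maxlen=...) drops the oldest turn once the window is full
        self._turns: Dict[str, Deque[Tuple[str, str]]] = defaultdict(
            lambda: deque(maxlen=self.max_turns)
        )

    def add(self, session_id: Optional[str], user: str, assistant: str) -> None:
        # Requests without a session_id share one hypothetical "default" window
        self._turns[session_id or "default"].append((user, assistant))

    def as_context(self, session_id: Optional[str]) -> str:
        # Render the window as plain text that can be prepended to the prompt
        turns = self._turns[session_id or "default"]
        return "".join(f"User: {u}\nAssistant: {a}\n" for u, a in turns)


history = SessionHistory(max_turns=2)
history.add("alice", "q1", "a1")
history.add("alice", "q2", "a2")
history.add("alice", "q3", "a3")  # pushes the q1 turn out of the window
history.add("bob", "hello", "hi")
print(history.as_context("alice"))
```

Keying the window by session keeps one user's turns out of another user's prompt, which a single shared list cannot do. For production, the same interface could sit in front of Redis instead of a dict.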

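The "same routine" for traffic restrictions and outfit advice comes down to routing the model's JSON output to the right handler. Here is a minimal sketch under stated assumptions: the `{"tool": ..., "city": ...}` output format, the `TOOLS` registry, and both stub handlers are hypothetical and stand in for real API calls.

```python
import json
from typing import Callable, Dict, Optional


def weather_tool(city: str) -> str:
    # Stub: a real handler would call the weather API here
    return f"[weather lookup for {city}]"


def restriction_tool(city: str) -> str:
    # Stub: a real handler would call a traffic-restriction API here
    return f"[traffic restriction lookup for {city}]"


# Hypothetical registry mapping the model's "tool" field to a handler
TOOLS: Dict[str, Callable[[str], str]] = {
    "weather": weather_tool,
    "restriction": restriction_tool,
}


def dispatch(model_output: str) -> Optional[str]:
    """Return the tool result, or None when the output is not a valid tool call."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict):
        return None
    tool, city = call.get("tool"), call.get("city")
    if not isinstance(tool, str) or not isinstance(city, str):
        return None
    handler = TOOLS.get(tool)
    return handler(city) if handler else None


print(dispatch('{"tool": "restriction", "city": "Chengdu"}'))
print(dispatch("Just chatting, no tool needed."))
```

Returning None for anything that is not a well-formed tool call lets the caller fall back to the model's plain reply. Adding a new capability is then one more entry in the registry.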
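👭 A couple of parting words

The best way to learn is to run the code yourself and fix each problem as it comes up. If you get stuck somewhere, leave a comment and I'll reply when I see it. After all, if programmers don't help programmers, who will? 💡 Oh, and don't forget to follow and bookmark. Next time we'll talk about something even more fun.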
