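Build an image search engine with llm-clip, chat with models with llm chat

LLM is my combination CLI tool and Python library for working with Large Language Models. I just released LLM 0.10 with two significant new features: embedding support for binary files and the llm chat command.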

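Image search by embedding images with CLIP

When I released 0.9 last week, I wrote about LLM's support for embeddings (including what they are and why they are interesting).

That initial release could only handle embeddings of text: great for building semantic search and finding related content, but unable to handle other types of data.

It turns out there are some really interesting embedding models for working with binary data. Top of the list for me is CLIP, released by OpenAI in January 2021.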

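CLIP has a really impressive trick up its sleeve: it can embed both text and images into the same vector space.

This means you can create an index for a collection of photos, each placed somewhere in a 512-dimensional space. Then you can take a text string (like "happy dog") and embed it into that same space. The images closest to that location will be the ones that contain happy dogs!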
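To make "closest to that location" concrete, here is a toy sketch using cosine similarity, one common way to compare embedding vectors. Everything in it is made up for illustration: the file names and the three-dimensional vectors are hypothetical (real CLIP embeddings have 512 dimensions), and it does not call CLIP or LLM at all.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical image embeddings, invented for this example
image_vectors = {
    "happy_dog.jpg": [0.9, 0.1, 0.0],
    "grumpy_cat.jpg": [0.1, 0.9, 0.1],
    "mountain.jpg": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the text "happy dog"
query_vector = [0.8, 0.2, 0.1]


def rank_images(query, vectors):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank_images(query_vector, image_vectors)
for name, score in ranked:
    print(f"{name}\t{score:.3f}")
```

The image whose vector points in nearly the same direction as the query vector ranks first, which is the whole trick behind searching photos with a text string.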

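My llm-clip plugin provides the CLIP model, loaded via SentenceTransformers. You can install and run it like this:

 llm install llm-clip
llm embed-multi photos --files photos/ '*.jpg' --binary -m clip

This installs the llm-clip plugin, then uses embed-multi to embed all of the JPEG files in the photos/ directory using the clip model.

The resulting embedding vectors are stored in an embedding collection called photos. By default this goes in the embeddings.db SQLite database managed by LLM, or you can add -d photos.db to store it in a separate database instead.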

You can then run text similarity searches against that collection using llm similar:

 llm similar photos -c 'raccoon'

I get back:

 {"id": "IMG_4801.jpeg", "score": 0.28125139257127457, "content": null, "metadata": null}
{"id": "IMG_4656.jpeg", "score": 0.26626441704164294, "content": null, "metadata": null}
{"id": "IMG_2944.jpeg", "score": 0.2647445926996852, "content": null, "metadata": null}
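The results are a sequence of JSON objects, so they are easy to post-process. Here is a minimal Python sketch, assuming the three results above have been captured as a string. The `output` variable and the `parse_results` helper are illustrative names, not part of LLM.

```python
import json

# The three results shown above, captured as one string
output = (
    '{"id": "IMG_4801.jpeg", "score": 0.28125139257127457, "content": null, "metadata": null}\n'
    '{"id": "IMG_4656.jpeg", "score": 0.26626441704164294, "content": null, "metadata": null}\n'
    '{"id": "IMG_2944.jpeg", "score": 0.2647445926996852, "content": null, "metadata": null}\n'
)


def parse_results(text):
    """Parse a whitespace-separated stream of JSON objects into a list of dicts."""
    decoder = json.JSONDecoder()
    results = []
    position = 0
    while position < len(text):
        # Skip newlines or spaces between objects
        if text[position].isspace():
            position += 1
            continue
        obj, position = decoder.raw_decode(text, position)
        results.append(obj)
    return results


results = parse_results(output)
best = max(results, key=lambda result: result["score"])
print(best["id"], round(best["score"], 3))
```

Using `raw_decode` rather than splitting on newlines means the same helper works whether the objects are separated by line breaks or by spaces.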

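And sure enough, IMG_4801.jpeg is this:

[Image: a blurry night-time photo of a raccoon in a trash can. The bin is labeled "garbage" just below the raccoon's face.]

(I was pretty excited to snap a photo of a trash panda in an actual trash can.)

CLIP is a pretty old model at this point, and there are plenty of interesting alternatives just waiting for someone to wrap them in a plugin. I'm particularly excited about Facebook's ImageBind, which can embed images, text, audio, depth, thermal, and IMU data all in the same vector space!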

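Chat with models using `llm chat`

The other big feature in LLM 0.10 is the new llm chat command.

Prior to this release, the way to have an ongoing conversation with a model was through the -c/--continue flag. You could start a conversation like this: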

 llm -m gpt-4 'Help me understand generators in Python'

Then ask a follow-up question using -c, like so:

 llm -c 'Show me an example involving the cast of Riverdale'

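This works well, and everything gets logged to SQLite, so you can run llm logs -c to see a full log of your most recent conversation.

You could also continue a past conversation, using llm --cid ID to specify a conversation ID recorded in those logs.

There was one big problem though: LLM plugins allow you to install models on your own machine, but a conversation with one of them that spans multiple command invocations results in the model being loaded into memory and then unloaded again for every prompt.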

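The new llm chat command solves this. You can now run llm chat -m model_id to start an interactive chat in your terminal with the model of your choice.

Here's an example using Llama 2 13B, installed via the llm-mlc plugin.

 llm chat -m mlc-chat-Llama-2-13b-chat-hf-q4f16_1

You can set an alias for the model to make it easier to remember.

Here's an example chat session with Llama 2: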

 Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Who are you?
Hello! I'm just an AI, here to assist you with any questions you may have. My name is LLaMA, and I'm a large language model trained to provide helpful and informative responses to a wide range of questions and topics. I'm here to help you find the information you need, while ensuring a safe and respectful conversation. Please feel free to ask me anything, and I'll do my best to provide a helpful and accurate response.
> Tell me a joke about otters
Sure, here's a joke about otters:
Why did the otter go to the party?
Because he heard it was a "whale" of a time!
(Get it? Whale, like a big sea mammal, but also a "wild" or "fun" time. Otters are known for their playful and social nature, so it's a lighthearted and silly joke.)
I hope that brought a smile to your face! Do you have any other questions or topics you'd like to discuss?
> exit

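I like how Llama 2 enthusiastically explains its own jokes! This is a great demonstration of why techniques like RLHF are so important: you need to go way beyond a raw language model if you're going to teach one not to be this corny.

Each line of your chat is executed as soon as you hit <enter>. Sometimes you may need to enter a multi-line prompt, for example if you need to paste in an error message. You can do that using the !multi token, like this:

 llm chat -m gpt-4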

 Chatting with gpt-4
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> !multi custom-end
Explain this error:
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.10/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.10/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>
!end custom-end
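The custom-end suffix matters when the pasted text might itself contain a line reading !end. As a rough illustration of that behavior, here is a small Python sketch of how such input could be grouped into prompts. This is not LLM's actual implementation: the `collect_prompts` function is a hypothetical helper, and it assumes a block opened with !multi SUFFIX is closed only by a line reading !end SUFFIX.

```python
def collect_prompts(lines):
    """Group raw input lines into prompts, honoring !multi ... !end blocks."""
    prompts = []
    buffer = None      # None means we are not inside a !multi block
    end_token = None   # The exact line that closes the current block
    for line in lines:
        stripped = line.strip()
        if buffer is not None:
            if stripped == end_token:
                # Closing line reached: emit the collected lines as one prompt
                prompts.append("\n".join(buffer))
                buffer = None
            else:
                buffer.append(line)
            continue
        parts = stripped.split(maxsplit=1)
        if parts and parts[0] == "!multi":
            # An optional suffix after !multi customizes the closing line
            end_token = "!end " + parts[1] if len(parts) > 1 else "!end"
            buffer = []
        else:
            # Outside a block, every line is its own prompt
            prompts.append(line)
    # Note: an unterminated !multi block is silently dropped in this sketch
    return prompts


session = [
    "Who are you?",
    "!multi custom-end",
    "Explain this error:",
    "!end",
    "    raise URLError(err)",
    "!end custom-end",
    "Tell me a joke about otters",
]
prompts = collect_prompts(session)
for prompt in prompts:
    print(repr(prompt))
```

In this sketch the bare !end inside the block is treated as ordinary text, because only !end custom-end closes a block that was opened with !multi custom-end.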

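llm chat also supports system prompts and templates. If you want to chat with a sentient cheesecake, try this:

 llm chat -m gpt-3.5-turbo --system '
You are a stereotypical sentient cheesecake with strong opinions
who always talks about cheesecake'

You can save those as templates too:

 llm --system 'You are a stereotypical sentient cheesecake with
strong opinions who always talks about cheesecake' --save cheesecake -m gpt-4
llm chat -t cheesecake

For more options, see the llm chat documentation.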

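Get involved

My ambition for LLM is for it to provide the easiest way to try out new models, both full-sized Large Language Models and now embedding models such as CLIP.

I'm not going to write all of these plugins myself!

If you want to help out, please come and say hi in the #llm Discord channel.

Original: http://simonwillison.net/2023/Sep/12/llm-clip-and-chat/#atom-everything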