Creating space-saving clones on macOS with Python

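APFS, the standard Mac filesystem, has a feature called space-saving clones. It lets you create multiple copies of a file without using additional disk space: the filesystem only stores a single copy of the data.

Although cloned files share data, they are independent of each other. You can edit one copy without affecting the other (unlike symlinks or hard links). APFS uses a technique called copy-on-write to store the data efficiently on disk, so cloned files continue to share whatever parts they have in common.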
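
To make the contrast with hard links concrete, here is a small sketch using only the standard library. A regular copy stands in for the clone. That is an assumption made for portability: a clone on APFS is just as independent as a copy, it only differs in how the data is stored. The file names are made up for the example.

```python
import os
import shutil
import tempfile
from pathlib import Path


def compare_link_and_copy() -> tuple[str, str]:
    """Edit the original, then report what a hard link and a copy contain."""
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "original.txt"
        linked = Path(tmp) / "hardlink.txt"
        copied = Path(tmp) / "copy.txt"

        original.write_text("version 1")
        os.link(original, linked)          # a second name for the same file
        shutil.copyfile(original, copied)  # an independent file (stand-in for a clone)

        # Edit the original in place.
        original.write_text("version 2")

        return linked.read_text(), copied.read_text()


link_text, copy_text = compare_link_and_copy()
print("hard link sees:", link_text)
print("copy sees:     ", copy_text)
```

The hard link follows the edit because it is the same file under another name, while the copy keeps the old contents.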

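Cloning a file is faster than copying it, and uses less disk space. If you work with large files (photos, videos, or datasets, say), space-saving clones can be a big win.

Several filesystems support cloning, but in this post I'm focusing on macOS and APFS.

For a recent project, I wanted to clone files using Python. There's an open ticket to support file cloning in the Python standard library. Python 3.14 adds a new Path.copy() function which supports cloning on Linux, but there's nothing yet for macOS.

In this post, I'll show you two ways to clone files on APFS using Python.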

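Table of contents

What are the benefits of cloning?

Cloning files uses less disk space than copying

Cloning files is faster than copying

How do you clone files on macOS?

Using the Duplicate command in Finder

Using cp -c on the command line

Using the clonefile() function

How do you clone files with Python?

Shelling out to cp -c using subprocess

Calling the clonefile() function using ctypes

In practice, how am I cloning files in Python?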

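What are the benefits of cloning?

There are two main benefits to using clones rather than copies.

Cloning files uses less disk space than copying

Because the filesystem only has to keep one copy of the data, cloning a file doesn't use any more disk space. We can see this with an experiment. First, create a file with 1GB of random data, and check the free disk space: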

```
$ dd if=/dev/urandom of=1GB.bin bs=64M count=16
16+0 records in
16+0 records out
1073741824 bytes transferred in 2.113280 secs (508092550 bytes/sec)

$ df -h -I /
Filesystem        Size    Used   Avail Capacity  Mounted on
/dev/disk3s1s1   460Gi    14Gi    43Gi    25%    /
```

My disk currently has 43GB available.

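Let's copy the file, and check the free disk space once it's done. Notice that it drops to 42GB, because the filesystem is now storing a second copy of this 1GB file: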

```
$ # Copying
$ cp 1GB.bin copy.bin

$ df -h -I /
Filesystem        Size    Used   Avail Capacity  Mounted on
/dev/disk3s1s1   460Gi    14Gi    42Gi    25%    /
```

Now let's clone the file by passing the -c flag to cp. Notice that the free disk space stays the same, because the filesystem keeps just one copy of the data, shared between the original file and the clone:

```
$ # Cloning
$ cp -c 1GB.bin clone.bin

$ df -h -I /
Filesystem        Size    Used   Avail Capacity  Mounted on
/dev/disk3s1s1   460Gi    14Gi    42Gi    25%    /
```

Cloning files is faster than copying

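When you clone a file, the filesystem only has to write a small amount of metadata about the new clone. When you copy a file, it has to write every byte of the file. This makes cloning much faster than copying, which we can see by timing the two approaches: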

```
$ # Copying
$ time cp 1GB.bin copy.bin
Executed in  260.07 millis

$ # Cloning
$ time cp -c 1GB.bin clone.bin
Executed in    6.90 millis
```

This 43× difference is on my Mac's internal SSD. In my experience, the gap is even more pronounced on slower disks, such as external hard drives.
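
The same kind of measurement can be sketched in Python. This is a rough illustration rather than a benchmark: the measure() helper and the 8 MiB test file are made up for the example, it times a regular copy with shutil.copyfile, and free-space readings can be noisy because other processes write to the disk at the same time.

```python
import shutil
import tempfile
import time
from pathlib import Path


def measure(operation, volume: Path) -> tuple[float, int]:
    """Run operation() and return (seconds elapsed, bytes of free space consumed)."""
    free_before = shutil.disk_usage(volume).free
    start = time.perf_counter()
    operation()
    elapsed = time.perf_counter() - start
    free_after = shutil.disk_usage(volume).free
    return elapsed, free_before - free_after


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src.bin"
    dst = Path(tmp) / "copy.bin"
    # An 8 MiB test file, far smaller than the 1GB file used above.
    src.write_bytes(b"\0" * (8 * 1024 * 1024))

    seconds, consumed = measure(lambda: shutil.copyfile(src, dst), Path(tmp))
    copied_size = dst.stat().st_size
    print(f"copied {copied_size} bytes in {seconds * 1000:.2f} ms")
    print(f"free space changed by {consumed} bytes")
```

Swapping the lambda for a cloning operation would let you compare the two on your own disk.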

How do you clone files on macOS?

Using the Duplicate command in Finder

If you use the Duplicate command in Finder (File > Duplicate or ⌘D), it clones the file.

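Using `cp -c` on the command line

If you use the cp (copy) command with the -c flag, and the file can be cloned, you get a clone rather than a copy. If the file can't be cloned (for example, because you're on a non-APFS volume that doesn't support cloning), you get a regular copy.

Here's what that looks like: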

```
$ cp -c src.txt dst.txt
```

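Using the `clonefile()` function

There's a macOS syscall, clonefile(), which creates space-saving clones. It was introduced alongside APFS.

Syscalls are quite low-level: they are how programs interact with the operating system. I don't think I've ever made a syscall directly. I've used wrappers like the Python os module, which make syscalls on my behalf, but I've never written my own code to call them.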
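
As an illustration of what such a wrapper does: on POSIX systems, a call like os.mkdir() makes the underlying syscall for you and turns a failure into a Python exception that carries the C error code. A small sketch (the directory name is made up for the example):

```python
import errno
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example")
    os.mkdir(target)  # the first call succeeds

    try:
        os.mkdir(target)  # the second call fails: the directory already exists
    except FileExistsError as exc:
        caught = exc

# The exception keeps the numeric error code reported by the operating system.
print(caught.errno == errno.EEXIST)
print(os.strerror(caught.errno))
```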

Here's a rudimentary C program that uses clonefile() to clone a file:

```
#include <stdio.h>
#include <stdlib.h>
#include <sys/clonefile.h>

int main(void) {
    const char *src = "1GB.bin";
    const char *dst = "clone.bin";

    /* clonefile(2) supports several options related to symlinks and
     * ownership information, but for this example we'll just use
     * the default behaviour */
    const int flags = 0;

    if (clonefile(src, dst, flags) != 0) {
        perror("clonefile failed");
        return EXIT_FAILURE;
    }

    printf("clonefile succeeded: %s ~> %s\n", src, dst);

    return EXIT_SUCCESS;
}
```

You can compile and run this program like so:

```
$ gcc clone.c

$ ./a.out
clonefile succeeded: 1GB.bin ~> clone.bin

$ ./a.out
clonefile failed: File exists
```

But I don't use C in any of my projects. Can I call this function from Python instead?

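How do you clone files with Python?

Shelling out to `cp -c` using `subprocess`

The easiest way to clone a file in Python is to shell out to cp -c with the subprocess module. Here's a short example: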

```
import subprocess

# Adding the `-c` flag means the file is cloned rather than copied,
# if possible.  See the man page for `cp`.
subprocess.check_call(["cp", "-c", "1GB.bin", "clone.bin"])
```

I think this snippet is pretty simple, and a new reader could understand what it's doing. If they're unfamiliar with file cloning on APFS, they might not immediately see how this differs from shutil.copyfile, but they could work it out quickly.

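This approach gets all the nice behaviour of the cp command. For example, if you try to clone on a volume that doesn't support cloning, it falls back to a regular file copy. Spawning an external process adds some overhead, but the overall impact is negligible (and easily offset by the speed gain from cloning).

The problem with this approach is that error handling gets harder. The cp command exits with code 1 for every error, so you'd need to parse stderr to tell different errors apart, or implement your own error handling.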
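
A minimal sketch of the stderr-parsing route is below. It is not how my project does it. The run_and_classify() helper and the KNOWN_ERRORS table are made up for the example, and they assume cp prints the standard C error strings, which can vary by locale. So that the snippet runs anywhere, a child Python process stands in for a failing cp.

```python
import subprocess
import sys

# Assumed mapping from message fragments to exceptions. The fragments are
# the standard C error strings, which tools like `cp` usually print.
KNOWN_ERRORS = {
    "No such file or directory": FileNotFoundError,
    "File exists": FileExistsError,
    "Permission denied": PermissionError,
}


def run_and_classify(cmd: list[str]) -> None:
    """Run cmd; on failure, raise a specific exception based on its stderr."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return

    for fragment, exc_type in KNOWN_ERRORS.items():
        if fragment in result.stderr:
            raise exc_type(result.stderr.strip())

    # Anything we don't recognise stays a generic subprocess error.
    raise subprocess.CalledProcessError(
        result.returncode, cmd, output=result.stdout, stderr=result.stderr
    )


# Stand-in for a failing `cp`: prints a cp-style message and exits with code 1.
fake_cp = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('cp: missing.bin: No such file or directory\\n'); sys.exit(1)",
]

try:
    run_and_classify(fake_cp)
except FileNotFoundError as exc:
    message = str(exc)
    print("classified as FileNotFoundError:", message)
```

Matching on message text is fragile, which is one reason to check for common problems before running the command instead.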

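In my project, I wrapped this cp call in a function that does a few extra checks to spot common types of error and raise them as more specific exceptions. Any remaining errors are raised as a generic subprocess.CalledProcessError. Here's an example: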

```
from pathlib import Path
import subprocess


def clonefile(src: Path, dst: Path):
    """
    Clone a file on macOS by using the `cp` command.
    """
    # Check a couple of common error cases so we can get nice exceptions,
    # rather than relying on the `subprocess.CalledProcessError` from `cp`.
    if not src.exists():
        raise FileNotFoundError(src)

    if not dst.parent.exists():
        raise FileNotFoundError(dst.parent)

    # Adding the `-c` flag means the file is cloned rather than copied,
    # if possible.  See the man page for `cp`.
    subprocess.check_call(["cp", "-c", str(src), str(dst)])

    assert dst.exists()
```

For me, this code strikes a nice balance between being readable and returning good errors.

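Calling the `clonefile()` function using `ctypes`

What if we want detailed error codes, and we don't want the overhead of spawning an external process? I knew it was possible to make syscalls from Python using the ctypes library, but I'd never actually done it. This was a chance to learn!

Following the ctypes documentation, these are the steps:

Import ctypes and load a dynamic link library. This is the first thing we need to do. In this case, we load the macOS link library that contains the clonefile() function.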
```
import ctypes

libSystem = ctypes.CDLL("libSystem.B.dylib")
```
I worked out that I needed to load libSystem.B.dylib by looking at other examples of ctypes code on GitHub. I couldn't find an explanation of it in Apple's documentation.

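I later discovered that I can use otool to see which shared libraries a compiled executable links to. For example, I can see that cp links to the same libSystem.B.dylib: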
```
$ otool -L /bin/cp
/bin/cp:
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1351.0.0)
```
This CDLL() call only works on macOS, which makes sense: it's loading a macOS library. If I run this code on my Debian web server, I get an error: OSError: libSystem.B.dylib: cannot open shared object file: No such file or directory.
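
One way to keep a script importable on other platforms is to guard the load. This is only a sketch, and the load_libsystem() helper is made up for the example:

```python
import ctypes
import sys


def load_libsystem():
    """Return a handle to libSystem on macOS, or None on other platforms."""
    if sys.platform != "darwin":
        return None
    return ctypes.CDLL("libSystem.B.dylib")


libSystem = load_libsystem()
if libSystem is None:
    print(f"clonefile() is not available on {sys.platform}")
else:
    print("loaded libSystem.B.dylib")
```
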
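
Tell ctypes about the function signature. If we look at the man page for clonefile(), we see the signature of the C function:
```
int clonefile(const char * src, const char * dst, int flags);
```
We need to tell ctypes to find this function inside libSystem.B.dylib, then describe the function's arguments and return type: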
```
clonefile = libSystem.clonefile
clonefile.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
clonefile.restype = ctypes.c_int
```
ctypes can call C functions even if you don't describe the signature, but describing it is good practice and gives you some safety rails.

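For example, ctypes now knows that the clonefile() function takes three arguments. If I try to call the function with one or two arguments, I get a TypeError. If I hadn't specified the signature, I could call it with any number of arguments, but it might behave in weird or unexpected ways.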
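
The same safety rails can be seen with a function that exists outside macOS. This sketch uses strlen() from the C library as a stand-in for clonefile(). It assumes a POSIX system, where CDLL(None) exposes the C library already loaded into the Python process.

```python
import ctypes

# Assumption: a POSIX system, where CDLL(None) exposes the C library
# functions already loaded into the Python process.
libc = ctypes.CDLL(None)

# size_t strlen(const char *s);
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

length = strlen(b"1GB.bin")
print(length)

try:
    strlen()  # too few arguments for the declared signature
except TypeError as exc:
    too_few = exc
    print("TypeError:", exc)

try:
    strlen("1GB.bin")  # a str where the signature expects bytes
except ctypes.ArgumentError as exc:
    wrong_type = exc
    print("ArgumentError:", exc)
```

With argtypes set, ctypes rejects both mistakes before the C function is ever called.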

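Define the inputs for the function. This function needs three arguments.

In the original C function, src and dst are char* values: pointers to null-terminated strings of char. In Python, this means the inputs need to be bytes values. The third argument, flags, is a regular Python int.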

```
# Source and destination files
src = b"1GB.bin"
dst = b"clone.bin"

# clonefile(2) supports several options related to symlinks and
# ownership information, but for this example we'll just use
# the default behaviour
flags = 0
```

Call the function. Now that we have the function available in Python, and the inputs are C-compatible types, we can call it:

```
import os

if clonefile(src, dst, flags) != 0:
    errno = ctypes.get_errno()
    raise OSError(errno, os.strerror(errno))

print(f"clonefile succeeded: {src} ~> {dst}")
```

If the clone succeeds, this program runs successfully. But if the clone fails, we get an unhelpful error: OSError: [Errno 0] Undefined error: 0.

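The point of calling the C function is to get useful error codes, but we need to opt in to receiving them. In particular, we need to add the use_errno parameter to our CDLL call: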

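```
libSystem = ctypes.CDLL("libSystem.B.dylib", use_errno=True)
```

Now, when the clone fails, we get different errors depending on the type of failure. The exception includes the numeric error code, and Python raises named subclasses of OSError such as FileNotFoundError, FileExistsError, or PermissionError. This makes it easier to write try … except blocks for specific failures.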
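
Both effects can be seen in a sketch that runs off macOS too. It uses access(2) from the C library as a stand-in for clonefile(). It assumes a POSIX system where CDLL(None) exposes the C library, and the path is made up for the example.

```python
import ctypes
import errno
import os

# Assumption: a POSIX system, where CDLL(None) exposes the C library.
# access(2) stands in for clonefile() so the sketch also runs off macOS.
plain = ctypes.CDLL(None)
tracked = ctypes.CDLL(None, use_errno=True)

for lib in (plain, tracked):
    # int access(const char *path, int mode);
    lib.access.argtypes = [ctypes.c_char_p, ctypes.c_int]
    lib.access.restype = ctypes.c_int

missing = b"/no/such/file"

ctypes.set_errno(0)
plain.access(missing, os.F_OK)    # fails, but ctypes does not capture errno
errno_without = ctypes.get_errno()

ctypes.set_errno(0)
tracked.access(missing, os.F_OK)  # fails, and ctypes saves errno for us
errno_with = ctypes.get_errno()

# Building an OSError from an errno picks the matching subclass automatically.
error = OSError(errno_with, os.strerror(errno_with))
print(errno_without, errno_with, type(error).__name__)
```

Without use_errno the saved code stays at 0, which is where the "Errno 0" message comes from.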

Here's the complete script, which clones a single file:

```
import ctypes
import os

# Load the libSystem library
libSystem = ctypes.CDLL("libSystem.B.dylib", use_errno=True)

# Tell ctypes about the function signature
# int clonefile(const char * src, const char * dst, int flags);
clonefile = libSystem.clonefile
clonefile.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
clonefile.restype = ctypes.c_int

# Source and destination files
src = b"1GB.bin"
dst = b"clone.bin"

# clonefile(2) supports several options related to symlinks and
# ownership information, but for this example we'll just use
# the default behaviour
flags = 0

# Actually call the clonefile() function
if clonefile(src, dst, flags) != 0:
    errno = ctypes.get_errno()
    raise OSError(errno, os.strerror(errno))

print(f"clonefile succeeded: {src} ~> {dst}")
```

I wrote this code for my own learning, and it's definitely not production-ready. It works in the happy case and it helped me understand ctypes, but if you actually wanted to use it, you'd want proper error handling and testing.

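In particular, there are cases where you'd want to fall back to shutil.copyfile or similar if the clone fails, such as when you're on an older version of macOS, or copying files on a volume that doesn't support cloning. Both of those cases are handled by cp -c, but not by the clonefile() syscall.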
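
A fallback along those lines might look like the sketch below. It is untested beyond the happy path and rests on assumptions. The try_clonefile() and clone_or_copy() names are made up for the example. The code treats the ENOTSUP and EXDEV error codes as "cloning isn't possible here", and treats a missing clonefile symbol as "this macOS is too old".

```python
import ctypes
import errno
import os
import shutil
import sys
import tempfile
from pathlib import Path


def try_clonefile(src: Path, dst: Path) -> bool:
    """Return True if src was cloned to dst, False if cloning isn't available."""
    if sys.platform != "darwin":
        return False

    libSystem = ctypes.CDLL("libSystem.B.dylib", use_errno=True)
    clonefile = getattr(libSystem, "clonefile", None)
    if clonefile is None:  # assumption: symbol is missing on an older macOS
        return False

    clonefile.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
    clonefile.restype = ctypes.c_int
    if clonefile(os.fsencode(src), os.fsencode(dst), 0) == 0:
        return True

    err = ctypes.get_errno()
    # Assumption: these two codes mean the volume can't clone this file.
    if err in (errno.ENOTSUP, errno.EXDEV):
        return False
    # Every other failure is a real error, raised as an OSError subclass.
    raise OSError(err, os.strerror(err), str(src))


def clone_or_copy(src: Path, dst: Path) -> str:
    """Clone src to dst if possible, else copy it. Returns which one happened."""
    if try_clonefile(src, dst):
        return "cloned"
    shutil.copyfile(src, dst)
    return "copied"


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src.bin"
    dst = Path(tmp) / "dst.bin"
    src.write_bytes(b"example data")

    method = clone_or_copy(src, dst)
    same_contents = dst.read_bytes() == src.read_bytes()
    print(method, same_contents)
```

One difference to be aware of: clonefile() fails if the destination already exists, while shutil.copyfile overwrites it, so the two paths don't behave identically.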

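In practice, how am I cloning files in Python?

In my project, I used cp -c with a wrapper like the one described above. It's a short amount of code, it's readable, and it returns useful errors in the common cases.

Calling clonefile() directly with ctypes might be slightly faster than shelling out to cp -c, but the difference is probably negligible. The downside is that it's more fragile and harder for other people to understand. It would have been the only part of the codebase that used ctypes.

File cloning made a noticeable difference. The project involved copying lots of files on an external USB hard drive, and cloning files instead of copying them in full made it much faster. Tasks that used to take over an hour now finish in less than a minute. (The files were being copied between folders on the same drive. Cloned files have to be on the same APFS volume.)

I'm excited to see how file cloning works on Linux in Python 3.14 with Path.copy(), and I hope macOS support isn't far behind.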
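
For code that has to run on several Python versions, one option is to use Path.copy() where it exists and fall back to shutil.copyfile elsewhere. This is a sketch, and the copy_file() helper is made up for the example. Whether the data is actually cloned depends on the Python version, the operating system, and the filesystem.

```python
import shutil
import tempfile
from pathlib import Path


def copy_file(src: Path, dst: Path) -> Path:
    """Use Path.copy() on Python 3.14+, and shutil.copyfile() before that."""
    if hasattr(Path, "copy"):
        return src.copy(dst)
    return Path(shutil.copyfile(src, dst))


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src.txt"
    dst = Path(tmp) / "dst.txt"
    src.write_text("example data")

    result = copy_file(src, dst)
    copied_text = dst.read_text()
    result_matches = Path(result) == dst
    print(result_matches, copied_text)
```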


Original: https://alexwlchan.net/2025/cloning-with-python/?ref=rss
