"Zero-copy" describes computer operations in which the CPU does not perform the task of copying data from one memory area to another. This is frequently used to save CPU cycles and memory bandwidth when transmitting a file over a network.
Zero-copy versions of operating system elements, such as device drivers, file systems, and network protocol stacks, greatly increase the performance of certain application programs and more efficiently utilize system resources. Performance is enhanced by allowing the CPU to move on to other tasks while data copies proceed in parallel in another part of the machine. Also, zero-copy operations reduce the number of time-consuming mode switches between user space and kernel space. System resources are utilized more efficiently since using a sophisticated CPU to perform extensive copy operations, which is a relatively simple task, is wasteful if other simpler system components can do the copying.
As an example, reading a file and then sending it over a network the traditional way requires four data copies and four context switches per read/write cycle. Two of those data copies use the CPU: one from the kernel's buffer into the application's buffer, and one from the application's buffer back into the kernel's socket buffer. Sending the same file via zero copy reduces the context switches to two and can eliminate all CPU data copies.
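The two paths can be sketched in Python as follows. This is an illustration rather than a reference implementation: it assumes a Linux system, where `os.sendfile` wraps the `sendfile` system call, and the function names and the `CHUNK` size are hypothetical choices made for the example.

```python
import os
import socket

CHUNK = 64 * 1024  # hypothetical buffer size for the copying path


def send_traditional(path, sock):
    """Send a file by copying every chunk through a user-space buffer."""
    sent = 0
    # buffering=0 so that each read() maps to one read system call
    with open(path, "rb", buffering=0) as f:
        while True:
            buf = f.read(CHUNK)   # kernel buffer -> user buffer (CPU copy)
            if not buf:
                return sent
            sock.sendall(buf)     # user buffer -> socket buffer (CPU copy)
            sent += len(buf)


def send_zero_copy(path, sock):
    """Send a file with sendfile; the data never enters user space."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # os.sendfile(out_fd, in_fd, offset, count) returns bytes written
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:
                break
            sent += n
    return sent
```

Both functions return the number of bytes handed to the socket. The second one never allocates a buffer in the application, which is where the saving in copies and mode switches comes from.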
Zero-copy protocols are especially important for high-speed networks in which the capacity of a network link approaches or exceeds the CPU's processing capacity. In such a case the CPU spends nearly all of its time copying transferred data, and thus becomes a bottleneck which limits the communication rate to below the link's capacity. A rule of thumb used in the industry is that roughly one CPU clock cycle is needed to process one bit of incoming data.
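The rule of thumb can be turned into a back-of-the-envelope calculation. The sketch below is only as accurate as the rule itself; the function names and the `cycles_per_bit` parameter are hypothetical and serve only to make the arithmetic explicit.

```python
def cpu_hz_for_link(link_bits_per_s, cycles_per_bit=1.0):
    """Clock rate needed to keep up with a link under the rule of thumb."""
    return link_bits_per_s * cycles_per_bit


def link_fraction_sustained(link_bits_per_s, cpu_hz, cycles_per_bit=1.0):
    """Fraction of the link's capacity that one CPU can process, capped at 1."""
    return min(1.0, cpu_hz / (link_bits_per_s * cycles_per_bit))
```

Under these assumptions, a 10 Gbit/s link would call for about 10 GHz of processing, so a single 3 GHz core that spends all of its time on the data could sustain only about 30% of the link's capacity.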
An early implementation was in IBM OS/360, where a program could instruct the channel subsystem to copy blocks of data from one file or device to another without the nucleus having to copy the data to an intermediate buffer.
Techniques for creating zero-copy software include the use of direct memory access (DMA)-based copying and memory-mapping through a memory management unit (MMU). These features require specific hardware support and usually involve particular memory alignment requirements.
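Memory-mapping can be illustrated with Python's `mmap` module, which asks the operating system to map a file's pages into the process's address space. The following is a minimal sketch; the helper names `map_file_readonly` and `view_slice` are hypothetical, and the file is assumed to be non-empty, since an empty file cannot be mapped this way.

```python
import mmap


def map_file_readonly(path):
    """Map a whole file into memory; no read() into a private buffer occurs."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file. The mapping stays valid after
        # the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def view_slice(mapping, start, stop):
    """Return a window onto the mapped pages without copying any bytes."""
    return memoryview(mapping)[start:stop]
```

A slice obtained this way can be passed to calls that accept buffer objects, such as `socket.send`, so the application itself does not duplicate the data. Each view should be released before the mapping is closed.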
A newer approach used by the Heterogeneous System Architecture (HSA) facilitates the passing of pointers between the CPU and the GPU and also other processors. This requires a unified address space for the CPU and the GPU.
Several operating systems support zero-copying of files through specific APIs. The Linux kernel supports zero-copy through several system calls:
- sendfile, sendfile64 (declared in sys/sendfile.h on Linux)
- splice, tee, vmsplice
- process_vm_readv, process_vm_writev
- raw sockets with packet mmap or AF_XDP
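As an illustration of one of these calls, the sketch below moves a file to another descriptor with `splice`, using a pipe as the in-kernel intermediary. It assumes Linux and Python 3.10 or later, where `os.splice` is available; the function name `splice_file_to_fd` and the default chunk size are hypothetical.

```python
import os


def splice_file_to_fd(path, out_fd, chunk=64 * 1024):
    """Move a file's bytes to out_fd through a pipe using splice."""
    r, w = os.pipe()  # splice requires one end of each transfer to be a pipe
    total = 0
    try:
        with open(path, "rb") as f:
            while True:
                # file -> pipe, without passing through a user-space buffer
                n = os.splice(f.fileno(), w, chunk)
                if n == 0:  # end of file
                    break
                # pipe -> destination; repeat until the pipe is drained
                left = n
                while left:
                    left -= os.splice(r, out_fd, left)
                total += n
    finally:
        os.close(r)
        os.close(w)
    return total
```

The destination can be a socket or another descriptor that supports splicing. Unlike `sendfile`, this pattern also works when the source is not a regular file, as long as one side of each `splice` call is a pipe.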
Microsoft Windows supports zero-copy through the TransmitFile API.
Remote direct memory access (RDMA) protocols rely heavily on zero-copy techniques.