Part XIX — File I/O

§ 19.1-19.25 File Descriptors, mmap, epoll, io_uring, and Unix Plumbing

The Unix file model is a small integer in user space backed by shared kernel objects, virtual memory mappings, metadata, path operations, notification queues, terminal devices, and high-performance event rings.

1. Overview

File I/O starts with an integer file descriptor, but the kernel work happens below it: a per-process descriptor table points to an open file description, the description points through VFS to an inode, and the inode owns metadata plus page-cache mappings.

2. Key Data Structures

The important distinction is descriptor slot versus open file description. dup() and fork() create new descriptor references to the same description, while every successful open() creates a fresh description, even when it names the same path.

| Kernel Object | Main Fields | Scope | Why It Matters |
| --- | --- | --- | --- |
| FD table | array index, close-on-exec bit, pointer to file | per process, copied on fork | Controls which integer names are open and which survive exec. |
| Open file description | file offset, status flags, refcount, access mode | shared by dup and fork aliases | O_APPEND, O_NONBLOCK, and offsets live here. |
| inode | device, inode number, mode, size, mapping | filesystem object | Identifies the file and connects I/O to metadata and page cache. |
| page cache | file offset to page, dirty state, writeback state | shared kernel cache | Normal reads and writes are cached here before storage I/O. |

fork Copies Descriptor Slots, Not Offsets

After fork(), parent and child have separate FD tables, but each copied slot points at the same open file description, so offset movement is visible to both.
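The sharing rules above can be checked with a small experiment. The following is a minimal sketch, not a reference implementation: the helper name shared_offset_demo and the /tmp scratch path are assumptions made for illustration. It writes through one descriptor, then reads the offset back through a dup() alias, through the parent after a forked child has written, and through an independent open().

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: reports the offsets visible through a dup() alias,
 * after a forked child's write, and through a separate open(). */
int shared_offset_demo(off_t *via_dup, off_t *via_fork, off_t *via_open)
{
    char path[] = "/tmp/fd_offset_XXXXXX";    /* assumed scratch location */
    int status = 0, rc = -1;
    pid_t pid;
    int fd = mkstemp(path);                    /* one open file description */
    if (fd < 0)
        return -1;

    int alias = dup(fd);                       /* second slot, same description */
    int other = open(path, O_RDONLY);          /* separate description */
    if (alias < 0 || other < 0)
        goto out;

    if (write(fd, "hello", 5) != 5)
        goto out;
    *via_dup = lseek(alias, 0, SEEK_CUR);      /* moved by the write on fd */

    pid = fork();
    if (pid < 0)
        goto out;
    if (pid == 0) {                            /* child inherits the same description */
        ssize_t n = write(fd, "abc", 3);
        _exit(n == 3 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        goto out;

    *via_fork = lseek(fd, 0, SEEK_CUR);        /* parent sees the child's write */
    *via_open = lseek(other, 0, SEEK_CUR);     /* independent offset, untouched */
    rc = 0;
out:
    if (alias >= 0) close(alias);
    if (other >= 0) close(other);
    close(fd);
    unlink(path);
    return rc;
}
```

Under these assumptions the alias reports offset 5, the parent reports 8 after the child's 3-byte write, and the separately opened descriptor stays at 0.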

3. Core Mechanism

Background: Most file I/O bugs are scope bugs: the programmer changes a descriptor-local thing when the shared open file description matters, or assumes a shared offset is private.

Plan: First decide whether a call creates a new open file description or aliases an existing one. Then decide whether the operation uses the implicit shared offset or an explicit offset. Finally, set close-on-exec atomically at creation time when a descriptor must not leak into a new program image.

Example: A shell opening out.txt for cmd > out.txt creates one write-only description, duplicates it into descriptor 1, closes the temporary FD, and then executes the command. The command never knows about the temporary descriptor; it just writes to stdout.
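The redirection sequence in the example can be sketched as follows. The function name run_redirected and its exit-code conventions (126 for setup failure, 127 for exec failure) are illustrative assumptions borrowed from common shell practice, not something the text prescribes.

```c
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with stdout redirected to path,
 * roughly what a shell does for `cmd > path`.
 * Returns the child's exit status, or -1 if it could not be collected. */
int run_redirected(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            _exit(126);
        if (fd != STDOUT_FILENO) {
            if (dup2(fd, STDOUT_FILENO) < 0)   /* descriptor 1 now aliases the file */
                _exit(126);
            close(fd);                          /* drop the temporary descriptor */
        }
        execvp(argv[0], argv);                  /* the command just writes to stdout */
        _exit(127);                             /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Doing the open and dup2 in the child, after fork() and before exec, keeps the parent's own descriptor table untouched.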

open() Flags

open() combines access mode, creation policy, status flags on the open file description, and descriptor hygiene flags like O_CLOEXEC.

| Flag | Meaning | Use It When | Hidden Cost or Pitfall |
| --- | --- | --- | --- |
| O_RDONLY / O_WRONLY / O_RDWR | Mutually exclusive access mode | Every open call | Wrong mode fails later with EBADF. |
| O_CREAT | Create file if missing | Creating output files | Requires a mode argument. |
| O_EXCL | Fail if target already exists | Race-free lock or create | Usually meaningful with O_CREAT. |
| O_TRUNC | Truncate existing regular file | Replace output contents | Data is destroyed at open time. |
| O_TMPFILE | Create unnamed inode in a directory | Build complete file before publish | Filesystem support varies. |
| O_APPEND | Each write appends atomically | Concurrent log writers | lseek does not choose the write offset. |
| O_NONBLOCK | Do not sleep for readiness | Event loops and pipes | Callers must handle EAGAIN. |
| O_SYNC / O_DSYNC | Wait for storage durability | Database journals, critical logs | Latency rises sharply. |
| O_DIRECT | Bypass page cache | Database buffer pools | Buffer, length, and offset alignment are strict. |
| O_CLOEXEC | Set FD_CLOEXEC at creation | Multithreaded programs that fork+exec | Without it, descriptors leak to children. |
| O_NOFOLLOW | Fail if final path component is symlink | Security-sensitive path opens | Only protects the final component. |
| O_PATH | Path reference only | *at() syscalls and metadata | No read or write operations. |

With O_DIRECT, the kernel avoids filling the page cache and asks the block layer to DMA directly into user pages, but it can reject unaligned buffers with EINVAL.

read/write Family

read() and write() may return fewer bytes than requested. pread() and pwrite() use explicit offsets, so they avoid races around lseek() plus I/O.

readv() and writev() amortize syscall overhead by scattering or gathering bytes across multiple buffers in one kernel entry.

copy_file_range() copies bytes between two file descriptors inside the kernel. On supporting filesystems it can avoid a user-space bounce buffer and may turn into a filesystem-level extent copy.
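Because write() may return a short count, callers usually wrap it in a retry loop. A minimal sketch follows; the name full_write is a common convention rather than a libc function, and this version assumes a blocking descriptor.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling write() until all len bytes are
 * accepted. Returns len on success or -1 with errno set on failure. */
ssize_t full_write(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before any byte moved: retry */
            return -1;
        }
        p += n;                    /* short write: advance past what was taken */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

On a nonblocking descriptor the same loop would also have to handle EAGAIN, typically by waiting for writability before retrying.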

lseek and Sparse Files

lseek() changes the open file description offset. Seeking beyond EOF followed by a small write creates a sparse file: logical size grows, but the hole consumes no blocks until data is actually written there.

Filesystems that support SEEK_DATA and SEEK_HOLE let tools skip holes efficiently, and fallocate() can punch holes while preserving apparent size.
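The gap between logical size and allocated blocks can be observed with fstat(). This sketch assumes a filesystem under /tmp that supports holes; the helper name make_sparse is hypothetical.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates a temporary file whose only written byte
 * sits at logical_size - 1, then reports its metadata through *st. */
int make_sparse(off_t logical_size, struct stat *st)
{
    char path[] = "/tmp/sparse_XXXXXX";        /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    int rc = -1;
    if (lseek(fd, logical_size - 1, SEEK_SET) == logical_size - 1 &&
        write(fd, "", 1) == 1 &&               /* a single NUL byte past the hole */
        fstat(fd, st) == 0)
        rc = 0;

    close(fd);
    unlink(path);
    return rc;
}
```

After a successful call, st_size equals the requested logical size, while st_blocks (counted in 512-byte units) should cover only the block holding the final byte plus any filesystem overhead.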

dup, dup2, dup3

dup() chooses the lowest free descriptor, dup2() targets an exact descriptor and atomically closes it first, and dup3() adds race-free O_CLOEXEC.

Pipes and FIFOs

A pipe is a kernel ring of pipe_buffer entries with two descriptors: one readable end and one writable end. Bytes never live in either process address space unless a reader or writer copies them through a syscall.

Background: Pipes are the default Unix backpressure primitive. Shell pipelines, logging helpers, and parent-child protocols all depend on the same close semantics: no writers means EOF; no readers means SIGPIPE or EPIPE.

Plan: Create the pipe before fork(), close unused ends in both processes, write records no larger than PIPE_BUF when multiple writers share the pipe, and treat a zero-length read as final EOF.

Example: A parent writes two log lines and closes the write end. The child drains the buffer, then its next read returns 0 because the last writer reference is gone.

POSIX requires writes of at most PIPE_BUF bytes (4096 on Linux) to be atomic with respect to other writers, which is why line-oriented logging over a shared pipe works only when each record fits within that bound.

A FIFO gives the same pipe behavior a persistent filesystem name. Opening the read side normally blocks until a writer appears, and opening the write side normally blocks until a reader appears.
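The two close rules for pipes (no writers means EOF, no readers means EPIPE) can be exercised in a single process. This is a simplified sketch of the parent/child example: both function names are hypothetical, and SIGPIPE is ignored only for the duration of the second helper so the failure shows up as an errno value instead of killing the process.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: writes two lines, closes the write end, and drains.
 * Returns the bytes read before EOF, or -1 on error. */
ssize_t drain_after_writer_close(void)
{
    int p[2];
    char buf[64];
    ssize_t total = 0, n;

    if (pipe(p) < 0)
        return -1;
    if (write(p[1], "line one\n", 9) != 9 || write(p[1], "line two\n", 9) != 9) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    close(p[1]);                               /* last writer reference is gone */

    while ((n = read(p[0], buf, sizeof buf)) > 0)
        total += n;                            /* buffered bytes still arrive */
    close(p[0]);
    return n == 0 ? total : -1;                /* n == 0 is the EOF signal */
}

/* Hypothetical helper: returns the errno produced by writing to a pipe
 * with no readers, or 0 if the write unexpectedly succeeded. */
int errno_with_no_reader(void)
{
    int p[2], err = 0;

    if (pipe(p) < 0)
        return -1;
    void (*old)(int) = signal(SIGPIPE, SIG_IGN);   /* turn the signal into EPIPE */
    close(p[0]);                                    /* no readers remain */
    if (write(p[1], "x", 1) < 0)
        err = errno;
    signal(SIGPIPE, old);
    close(p[1]);
    return err;
}
```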

Zero-Copy Plumbing

sendfile() lets static servers move file data from the page cache toward a socket without first copying it into a user buffer.

splice(), tee(), and vmsplice() generalize the idea by using a pipe as the page-reference conduit between file descriptors.
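A static-file send path built on sendfile() might look like the sketch below. The helper name send_path is hypothetical; out_fd would normally be a connected socket, though on current Linux kernels sendfile() accepts other destination descriptors as well.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: sends the whole file at path to out_fd without
 * copying it through a user buffer. Returns bytes sent or -1. */
ssize_t send_path(int out_fd, const char *path)
{
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0) {
        close(in_fd);
        return -1;
    }

    off_t off = 0;                             /* sendfile advances this for us */
    while (off < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &off, (size_t)(st.st_size - off));
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(in_fd);
            return -1;
        }
        if (n == 0)
            break;                             /* file shrank underneath us */
    }
    close(in_fd);
    return (ssize_t)off;
}
```

Passing an explicit offset pointer leaves the shared file offset of in_fd alone, so the same descriptor could serve several concurrent transfers.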

fcntl and File Locks

fcntl() is a descriptor control multiplexer. Some commands modify FD-local state like FD_CLOEXEC, while others modify open-file-description state like O_NONBLOCK.

| Command Family | Scope | Typical Use | Pitfall |
| --- | --- | --- | --- |
| F_GETFD / F_SETFD | FD slot | Set FD_CLOEXEC | Not shared by dup aliases unless set on each descriptor. |
| F_GETFL / F_SETFL | Open file description | Toggle O_NONBLOCK or O_APPEND | Affects every dup or fork alias of that description. |
| F_SETPIPE_SZ | Pipe object | Increase pipe capacity for bursty producers | Capped by /proc/sys/fs/pipe-max-size and memory limits. |
| F_SETLK / F_SETLKW | Process record locks | Advisory byte-range locking | Closing any FD for that file in the process can drop all locks. |
| F_OFD_SETLK | Open file description locks | Thread-safe byte-range locking | Requires Linux OFD-lock support. |
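The scope difference between F_SETFD and F_SETFL is easy to see with a dup() alias. The two wrappers below are hypothetical helper names; each follows the usual read-modify-write pattern so existing flags are preserved.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: sets O_NONBLOCK on the open file description,
 * so every dup() or fork() alias of fd is affected too. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: sets FD_CLOEXEC on this descriptor slot only;
 * aliases created with dup() keep their own close-on-exec bit. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```

Calling set_nonblocking() on one end of a pipe and then querying F_GETFL on a dup() alias shows the flag on both, whereas set_cloexec() changes only the descriptor it was called on.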

File locks differ sharply in scope: flock() is whole-file and tied to the open file description, classic POSIX record locks are per-process, and OFD locks are byte-range locks tied to the open file description.

| Lock Type | Granularity | Ownership | Thread-Safe? |
| --- | --- | --- | --- |
| flock | whole file | open file description | Usually yes for duplicated descriptors. |
| POSIX fcntl lock | byte range | process plus inode | No; unrelated closes can release locks. |
| OFD fcntl lock | byte range | open file description | Yes; survives unrelated closes. |

The classic POSIX lock trap is closing a second descriptor for the same file and accidentally releasing locks that were acquired through the first descriptor.
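Ownership by open file description can be demonstrated with flock() inside one process. In this sketch (the helper name flock_scope_demo and its out-parameters are assumptions for illustration), a second open() of the same path contends with the first, while a dup() alias of the lock holder does not.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: takes LOCK_EX through one open(), then tries a
 * non-blocking LOCK_EX through a second open() and through a dup() alias.
 * Each out-parameter receives 0 on success or the errno of that attempt. */
int flock_scope_demo(const char *path, int *second_open_errno, int *dup_alias_errno)
{
    int a = open(path, O_RDWR | O_CREAT, 0600);    /* description A */
    int b = open(path, O_RDWR);                     /* description B */
    int alias = a >= 0 ? dup(a) : -1;               /* still description A */
    int rc = -1;

    if (a >= 0 && b >= 0 && alias >= 0 && flock(a, LOCK_EX) == 0) {
        *second_open_errno = flock(b, LOCK_EX | LOCK_NB) == 0 ? 0 : errno;
        *dup_alias_errno = flock(alias, LOCK_EX | LOCK_NB) == 0 ? 0 : errno;
        rc = 0;
    }

    if (alias >= 0) close(alias);
    if (b >= 0) close(b);
    if (a >= 0) close(a);                           /* last reference to A drops the lock */
    return rc;
}
```

On Linux the second open is expected to fail with EWOULDBLOCK, and the alias is expected to succeed because it refers to the description that already holds the lock.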

mmap and Memory-Mapped I/O

mmap() installs a VMA whose page table entries point at file-backed page-cache pages. With MAP_SHARED, stores dirty the shared cache page and can be persisted with msync(); with MAP_PRIVATE, the first write takes a private COW copy.

Background: mmap is attractive when a program wants random access to a file as memory, but the first access to each missing page still has to fault and populate the page cache.

Plan: Create the mapping, touch pages only as needed, use madvise() to describe expected access, and call msync() or fsync() when durability matters.

Example: A parser maps a 200 MB index file, jumps to offset 96 MB, faults one page, and resumes with a normal load instruction after the kernel installs a PTE for that file offset.

The choice between read()/write() and mmap() is a trade-off between explicit syscall/copy control and fault-driven memory access.
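A MAP_SHARED store followed by msync() can be sketched as below. The helper name mmap_store is hypothetical, and the example maps exactly one page for simplicity.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: creates path, maps its first page MAP_SHARED,
 * copies msg into the mapping, and synchronously writes it back. */
int mmap_store(const char *path, const char *msg)
{
    size_t len = strlen(msg);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len > (size_t)page)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    int rc = -1;
    if (ftruncate(fd, page) == 0) {            /* the mapping needs file bytes behind it */
        char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            memcpy(p, msg, len);               /* a plain store dirties the cache page */
            if (msync(p, (size_t)page, MS_SYNC) == 0)
                rc = 0;
            munmap(p, (size_t)page);
        }
    }
    close(fd);
    return rc;
}
```

A later read() or pread() on the same file sees the stored bytes because both paths go through the same page cache.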

Directory Operations

readdir() is a libc wrapper over buffered directory records returned by getdents64(). Filesystems store variable-length directory entries, often behind an index such as ext4 htree for large directories.

The *at() family solves a real race: once a directory is open as a stable dirfd, relative operations do not depend on re-resolving a parent path that another process can rename.

File Metadata

stat(), fstat(), and lstat() report identity, type, permissions, size, link count, owner, and timestamps; statx() adds masks and optional birth time.

| Time | Updated By | Notes |
| --- | --- | --- |
| atime | Successful file data reads | relatime reduces updates; noatime disables most of them. |
| mtime | Content modification | Changes after write, truncate, mmap dirty writeback, or similar content updates. |
| ctime | Inode status changes | Changes after chmod, chown, link count changes, rename metadata, and content size changes. |
| btime | File creation | Available through statx() only when the filesystem reports it. |

Permissions, Ownership, and ACLs

Unix mode uses four octal digits: one digit for setuid, setgid, and sticky bits, then three rwx triplets for user, group, and other.

A setuid binary runs with the file owner as its effective UID. Modern systems often replace broad setuid root with file capabilities stored in security.capability, while POSIX ACLs live under system.posix_acl_access (and system.posix_acl_default for directory defaults).

Links and Atomic Path Operations

A hard link is another directory entry pointing at the same inode, so both names share metadata and data and the inode survives until the last link and last open reference disappear.

A symbolic link is a separate inode whose payload is a pathname string. It can cross filesystems and can dangle if the target path is removed.

rename() is atomic within one filesystem, which makes the write-temp, fsync, rename, fsync-directory pattern the standard way to publish a complete replacement file. An open-then-unlink temporary file has no directory name but remains usable until the last descriptor closes.

Filesystem Notification

inotify turns filesystem changes into readable records on a file descriptor. A watcher registers paths with masks such as IN_CREATE, IN_MODIFY, and IN_DELETE, then reads struct inotify_event entries from the queue.

Recursive watching is not automatic: every subdirectory needs its own watch descriptor, and a newly-created directory must be detected and registered before events inside it can be observed.

fanotify sits at a broader filesystem layer and can deliver permission events, which is why antivirus and audit tools can inspect or deny opens before the target process continues. Watching large trees with inotify runs into limits such as /proc/sys/fs/inotify/max_user_watches.

Special Files for Event Loops

Linux exposes many kernel services as ordinary FDs: eventfd for counters and wakeups, signalfd for signals as records, timerfd for readable timer expirations, and memfd_create for anonymous sealable files.
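The counter behavior of eventfd can be shown in a few lines. In this sketch the helper name eventfd_sum is hypothetical; it relies on the default (non-semaphore) mode, where each write adds to the counter and one read returns the total and resets it.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: adds n values to a fresh eventfd counter, reads the
 * accumulated total into *sum_out, and checks that the counter was reset. */
int eventfd_sum(const uint64_t *vals, int n, uint64_t *sum_out)
{
    uint64_t again;
    int rc = -1;

    int efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
    if (efd < 0)
        return -1;

    for (int i = 0; i < n; i++)
        if (write(efd, &vals[i], sizeof vals[i]) != (ssize_t)sizeof vals[i])
            goto out;                          /* each write adds to the counter */

    if (read(efd, sum_out, sizeof *sum_out) != (ssize_t)sizeof *sum_out)
        goto out;                              /* one read returns the total */

    /* The counter is now zero, so a second non-blocking read must fail. */
    rc = read(efd, &again, sizeof again) == -1 ? 0 : -1;
out:
    close(efd);
    return rc;
}
```

Because the descriptor becomes readable whenever the counter is nonzero, it can be registered with epoll as a wakeup source like any socket or pipe.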

A sealed memfd is useful when one process wants to share bytes but prevent later mutation: write the payload, add seals such as F_SEAL_WRITE, pass the FD over a Unix socket with SCM_RIGHTS, and let the receiver map it read-only.

Filesystem Containment

Container startup changes the process view of the filesystem with a private mount namespace and pivot_root(): the new root is mounted, the old root is moved under put_old, then the old mount is detached before executing the target program.

chroot() only changes path lookup root. It is not a complete sandbox because a process that kept a directory FD to the old root can climb back out; hardened path opens use directory FDs plus openat2() constraints such as RESOLVE_BENEATH.

Extended Attributes

Extended attributes attach small named byte strings to an inode. The namespace prefix controls who owns the meaning: applications commonly use user.*, ACLs use system.*, and LSMs or capabilities use security.*.

| Namespace | Common Entries | Who Uses It | Notes |
| --- | --- | --- | --- |
| user.* | user.tag, user.comment | Applications and users | Controlled by normal file permissions and mount support. |
| system.* | system.posix_acl_access | Kernel and filesystem helpers | Stores ACLs and filesystem-managed metadata. |
| security.* | security.selinux, security.capability | LSMs, file capabilities, IMA | Often requires privilege or LSM policy permission. |
| trusted.* | trusted.overlay.* | Root-only tools and filesystems | Requires CAP_SYS_ADMIN. |

stdio vs Raw File Descriptors

FILE* is a libc buffer and formatting layer around an underlying file descriptor. fileno() exposes that FD, fdopen() wraps an FD as a stream, and freopen() redirects an existing stream such as stdout.

Background: stdio buffering improves throughput, but it means data can sit in user memory while later raw write() calls on the same FD overtake it and reach the kernel first.

Plan: Pick one abstraction per descriptor, flush before mixing layers, and flush all inherited streams before fork() when both parent and child might exit through libc.

Example: A process prints "pending" without a newline, forks, and both processes later call exit(). fork() copied the unflushed user-space buffer along with the rest of the address space, so both processes flush the same text and it appears twice.

| Buffer Mode | Common Default | Flush Trigger | Risk |
| --- | --- | --- | --- |
| _IOFBF | regular files and redirected stdout | buffer full, fflush, fclose, exit | Raw writes can appear before earlier printf bytes. |
| _IOLBF | stdout connected to a tty | newline, buffer full, explicit flush | Prompt text without newline can remain hidden. |
| _IONBF | stderr on many systems | each stdio call writes promptly | More syscalls and lower throughput. |

tty and pty

A tty is a terminal device with a kernel line discipline that can echo input, edit cooked-mode lines, and translate control characters such as Ctrl-C into signals. A pty pair splits that terminal into a controller-facing master FD and a program-facing slave FD.

Terminal hosts such as sshd, tmux, IDE panes, and expect use forkpty() or posix_openpt() plus grantpt(), unlockpt(), and ptsname(). The child sees the slave as a real controlling terminal while the host reads and writes the master.

/proc and /sys File Interfaces

/proc is a virtual filesystem generated by the kernel at read time. /proc/self/maps exposes VMAs, /proc/self/fd exposes open descriptors as symlinks, and /proc/sys exposes sysctl tunables.

/sys is sysfs: a kobject attribute tree. Device drivers publish small attributes as files, so reading /sys/class/net/eth0/operstate samples link state and writing mtu calls the driver setter path.
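Resolving a descriptor through /proc/self/fd takes one readlink() call. The helper name fd_target is hypothetical, and the sketch assumes /proc is mounted.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: copies the /proc/self/fd symlink target for fd
 * into out as a NUL-terminated string. Returns its length or -1. */
ssize_t fd_target(int fd, char *out, size_t outlen)
{
    char link[64];

    if (outlen == 0)
        return -1;
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);

    ssize_t n = readlink(link, out, outlen - 1);
    if (n < 0)
        return -1;
    out[n] = '\0';                             /* readlink() does not terminate */
    return n;
}
```

Regular files resolve to their path, while anonymous objects use a type prefix; a pipe end, for example, resolves to a string of the form pipe:[inode].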

epoll Deep Dive

epoll keeps persistent kernel state: an interest set for registered FDs and a ready list for events already observed. epoll_wait() returns ready entries instead of rescanning every FD like select() or poll().

Background: Edge-triggered epoll is fast because it wakes on readiness transitions, but that means a half-drained FD may never produce another edge.

Plan: Put every ET FD in nonblocking mode, handle the readiness event, then accept or read in a loop until EAGAIN. Use EPOLLONESHOT when worker threads need explicit re-arming and EPOLLEXCLUSIVE to avoid waking many accept waiters for one connection.

Example: A socket receives 6 KB. The event loop reads only 1 KB and returns to epoll_wait(). Because the FD is still ready and no new not-ready-to-ready transition happened, the remaining 5 KB sits unread until more data arrives, which may be never if the peer is waiting for a reply.

| API | Per-Wait Cost | Scaling | Best Use |
| --- | --- | --- | --- |
| select | copy and scan bitsets | limited by fd set size | Small portable programs. |
| poll | copy and scan array | O(N) per wait | Moderate FD counts with portable semantics. |
| epoll LT | returns ready list | O(ready) | Safe high-FD-count event loops. |
| epoll ET | returns readiness edges | O(ready transitions) | High-performance loops that drain to EAGAIN. |
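The edge-triggered stall and the drain-to-EAGAIN fix can be reproduced with a pipe standing in for the socket. This is a sketch under that assumption; the struct et_result and the function et_partial_read_demo are hypothetical names, and the zero timeout on epoll_wait() keeps the demo from blocking.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

struct et_result {
    int first_wait;    /* events reported right after the write */
    int second_wait;   /* events reported after a partial read */
    int drained;       /* bytes recovered by the drain loop */
};

/* Hypothetical helper: writes 6 bytes, reads 1, and records what an
 * edge-triggered epoll reports before and after. Returns 0 or -1. */
int et_partial_read_demo(struct et_result *r)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET }, out;
    int p[2], ep = -1, rc = -1, fl;
    char c, buf[16];
    ssize_t n;

    if (pipe(p) < 0)
        return -1;
    fl = fcntl(p[0], F_GETFL);
    if (fl < 0 || fcntl(p[0], F_SETFL, fl | O_NONBLOCK) < 0)
        goto done;                                   /* ET requires nonblocking reads */
    ep = epoll_create1(EPOLL_CLOEXEC);
    if (ep < 0)
        goto done;
    ev.data.fd = p[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) < 0)
        goto done;
    if (write(p[1], "abcdef", 6) != 6)
        goto done;

    r->first_wait = epoll_wait(ep, &out, 1, 0);      /* the readiness edge */
    if (read(p[0], &c, 1) != 1)                      /* deliberately partial read */
        goto done;
    r->second_wait = epoll_wait(ep, &out, 1, 0);     /* data remains, but no new edge */

    r->drained = 0;
    while ((n = read(p[0], buf, sizeof buf)) > 0)    /* correct ET handling: drain */
        r->drained += (int)n;
    if (n < 0 && errno == EAGAIN)
        rc = 0;
done:
    if (ep >= 0)
        close(ep);
    close(p[0]);
    close(p[1]);
    return rc;
}
```

On Linux the first wait is expected to report one event and the second none, even though five bytes are still buffered; the drain loop then recovers them.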

io_uring Deep Dive

io_uring maps two rings into user space: the submission queue carries SQEs from user to kernel, and the completion queue carries CQEs from kernel to user. In the steady state, many operations can be submitted and completed with fewer syscalls than epoll plus separate reads or writes.

With IORING_SETUP_SQPOLL, a kernel polling thread watches the submission queue and starts work without the application entering the kernel for every batch. The trade-off is a CPU-consuming poller and stricter setup/permission constraints.

Registered buffers skip repeated page pinning, registered files skip repeated FD lookup, multishot operations can produce many completions from one submission, and IOSQE_IO_LINK expresses ordered chains such as open, read, and close.

| Model | What Is Submitted | Syscall Pattern | Where It Wins |
| --- | --- | --- | --- |
| epoll + read/write | readiness interest, then synchronous operations | wait syscall plus I/O syscalls | Socket servers with simple operations and broad portability. |
| io_uring | actual operations: read, write, accept, fsync, openat, splice | batched enter, or SQPOLL hot path | High-throughput mixed file/network I/O with batching and fixed resources. |
| libaio / KAIO | direct-I/O requests | submit and reap syscalls | Legacy database O_DIRECT workloads; poor fit for buffered I/O. |

4. Minimal C Demo

These demos isolate the interview-critical behavior: shared offsets, atomic temporary creation, short-write loops, sparse files, shell-style redirection, mmap persistence, notification FDs, stdio buffering, ptys, epoll drain rules, and io_uring ring mechanics.

dup Shares Offset, open Does Not — C Demo
O_TMPFILE then linkat Publish — C Demo
full_write Loop for Short Writes — C Demo
Create a 1 GB Sparse File — C Demo
Implement cmd > file with dup2 — C Demo
pipe, EOF, and Broken Pipe — C Demo
FIFO Producer and Consumer — C Demo
sendfile to a Socket — C Demo
Toggle O_NONBLOCK with fcntl — C Demo
flock Exclusive Lock Contention — C Demo
MAP_SHARED mmap then msync — C Demo
Recursive opendir and readdir Walk — C Demo
statx Metadata and Birth Time — C Demo
setgid Directory Group Inheritance — C Demo
Atomic Update with fsync and rename — C Demo
inotify Events from /tmp — C Demo
epoll with socket, timerfd, signalfd, eventfd — C Demo
chroot Jail Setup and Root-Gated Switch — C Demo
xattr user.tag Round Trip — C Demo
stdio Buffering vs Raw write — C Demo
forkpty Minimal Terminal Automation — C Demo
Parse /proc/self/maps — C Demo
epoll ET Drain to EAGAIN — C Demo
Raw io_uring SQ/CQ Completion — C Demo

5. Kernel Source Pointers

| Topic | Files and Functions | What to Read For |
| --- | --- | --- |
| FD table and open | fs/open.c, do_sys_openat2(); fs/namei.c, do_filp_open() | How path lookup creates a struct file and installs it into the descriptor table. |
| Descriptor allocation | fs/file.c, alloc_fd(), fd_install(), do_close_on_exec() | FD bitmaps, close-on-exec state, and fork/exec descriptor handling. |
| read/write syscalls | fs/read_write.c, ksys_read(), ksys_write(), vfs_read(), vfs_write() | Short count behavior, position updates, and vector I/O entry points. |
| dup family | fs/file.c, do_dup2(), replace_fd() | How a descriptor slot is made to reference an existing open file description. |
| Sparse files | fs/read_write.c, vfs_llseek(); filesystem llseek methods | Where SEEK_DATA and SEEK_HOLE are delegated to filesystem code. |
| Pipes and FIFOs | fs/pipe.c, do_pipe2(), pipe_read(), pipe_write() | Pipe ring accounting, EOF and broken-pipe rules, and pipe capacity changes. |
| Zero-copy plumbing | fs/read_write.c, do_sendfile(); fs/splice.c, do_splice(), do_tee() | How page references move between files, pipes, and sockets without user-space buffers. |
| fcntl and locks | fs/fcntl.c, do_fcntl(); fs/locks.c, fcntl_setlk(), flock_lock_inode() | Command dispatch, FD versus file-description scope, and lock ownership semantics. |
| mmap | mm/mmap.c, do_mmap(); mm/filemap.c, filemap_fault() | How VMAs are installed and how file-backed page faults populate page cache pages. |
| Directory reads and path lookup | fs/readdir.c, iterate_dir(); fs/namei.c, path_openat() | Directory iteration, getdents64, and relative lookup through openat. |
| Metadata and permissions | fs/stat.c, vfs_statx(); fs/attr.c, notify_change() | How stat/statx fields are collected and how chmod/chown update inode attributes. |
| Links and rename | fs/namei.c, vfs_link(), vfs_symlink(), vfs_rename() | Hard link counts, symlink creation, and same-filesystem atomic rename behavior. |
| Filesystem notification | fs/notify/, fsnotify(); fs/notify/inotify/; fs/notify/fanotify/ | How VFS events become queued records and how fanotify permission events can block opens. |
| Special event FDs | fs/eventfd.c, fs/timerfd.c, fs/signalfd.c, mm/memfd.c | Counter wakeups, timer expiration reads, signal records, and memfd sealing rules. |
| Root switching and path containment | fs/open.c, ksys_chroot() and the openat2() syscall; fs/namespace.c, pivot_root(); fs/namei.c, RESOLVE_* lookup handling | Why chroot changes lookup state, how pivot_root moves mounts, and how openat2 resolver flags reject escapes. |
| Extended attributes | fs/xattr.c, vfs_setxattr(), vfs_getxattr(), listxattr() | Namespace permission checks, filesystem callbacks, and ACL/capability storage. |
| stdio wrappers | glibc/libio/, _IO_file_xsputn(), _IO_new_file_write() | How libc buffering batches user writes before calling the kernel write path. |
| tty and pty | drivers/tty/tty_io.c, drivers/tty/pty.c, drivers/tty/n_tty.c | TTY allocation, pseudo-terminal master/slave plumbing, and line discipline behavior. |
| /proc and /sys | fs/proc/, proc_pid_make_inode(); fs/sysfs/, sysfs_create_file_ns() | How virtual process files and kobject attributes are generated on demand. |
| epoll | fs/eventpoll.c, do_epoll_create(), ep_insert(), ep_poll() | Interest tree management, ready-list wakeups, edge-triggered delivery, and exclusive waits. |
| io_uring | io_uring/io_uring.c, io_uring_setup(), io_submit_sqes(), io_cqring_ev_posted() | Shared SQ/CQ ring setup, SQE consumption, CQE publication, SQPOLL, and registered resources. |

6. Interview Prep

| Question | Concise Answer |
| --- | --- |
| What is shared by dup()? | The new descriptor points to the same open file description, so file offset and status flags are shared. |
| Why is pread() safer than lseek() plus read()? | It performs I/O at an explicit offset without changing the shared file offset, avoiding races between threads or forked children. |
| What does O_APPEND guarantee? | For regular files, the kernel moves each write to EOF atomically with the write operation, so concurrent writers do not overwrite each other. |
| Why use O_CLOEXEC instead of fcntl(F_SETFD) after open? | It closes the race where another thread forks and execs between open and the later fcntl call. |
| What is a sparse file? | A file whose logical size includes holes that read as zeros but have no allocated disk blocks until real data is written. |
| What happens when the last pipe writer closes? | After buffered bytes are drained, readers get a zero-length read, which is EOF. |
| What does PIPE_BUF guarantee? | Concurrent writes of at most PIPE_BUF bytes to a pipe are atomic; larger writes may interleave. |
| Why does sendfile() help static file servers? | It avoids copying file bytes into a user-space buffer before sending them to the socket. |
| Which fcntl() flags are FD-local versus description-local? | FD_CLOEXEC is FD-local; status flags like O_NONBLOCK and O_APPEND live on the open file description. |
| Why are POSIX record locks dangerous in multithreaded code? | They are process-owned, so closing any descriptor for the same file in that process can release all its record locks. |
| How does MAP_SHARED differ from MAP_PRIVATE? | MAP_SHARED stores modify shared file-backed pages; MAP_PRIVATE writes trigger copy-on-write and do not update the file. |
| Why use openat() with a directory FD? | It anchors lookup to an already-open directory, avoiding races where the parent path is renamed or replaced. |
| What updates atime, mtime, and ctime? | Reads update atime, content changes update mtime, and inode metadata or size changes update ctime. |
| Hard link versus symlink? | A hard link is another name for the same inode on the same filesystem; a symlink is a separate inode containing a path string and can cross filesystems. |
| Why does atomic file update fsync both file and directory? | The file fsync persists contents; the directory fsync persists the renamed directory entry after publication. |
| inotify versus fanotify? | inotify watches paths and reports events after they happen; fanotify can observe broader filesystem activity and can issue permission events that allow or deny opens. |
| Why are eventfd, signalfd, and timerfd useful? | They convert wakeups, signals, and timers into readable FDs, so one epoll loop can handle them uniformly with sockets and pipes. |
| What does a sealed memfd buy you? | It gives processes shared anonymous file-backed memory that can be made immutable before the FD is handed to another process. |
| How can chroot() be escaped? | If a process keeps a directory FD outside the jail, it can fchdir() back and walk out; mount namespaces, pivot_root cleanup, and constrained openat2 lookups close that class of bug. |
| What belongs in security.* xattrs? | Security labels and enforcement metadata such as SELinux labels, IMA state, and file capabilities. |
| Why can mixing printf() and write() reorder output? | printf() writes into a user-space FILE* buffer, while write() enters the kernel immediately unless the stdio buffer is flushed first. |
| What is the fork double-flush pitfall? | Unflushed stdio buffers are copied into the child, so both parent and child can flush the same pending bytes during exit(). |
| What is a pty master versus slave? | The slave behaves like a terminal for the child program; the master is held by the controller that feeds input and reads terminal output. |
| How does /proc/self/fd help debug FD leaks? | It exposes each live descriptor as a symlink to its target, making leaked files, sockets, pipes, and deleted-but-open files visible. |
| epoll LT versus ET? | Level-triggered epoll keeps reporting an FD while it remains ready; edge-triggered epoll reports readiness transitions, so nonblocking handlers must drain to EAGAIN. |
| What does EPOLLONESHOT solve? | It disables the FD after one event so a worker can process it exclusively, then re-arm it with EPOLL_CTL_MOD. |
| What does EPOLLEXCLUSIVE solve? | It prevents a thundering herd by waking only one epoll waiter for a shared ready source such as a listening socket. |
| Why did io_uring replace libaio for many workloads? | io_uring supports a broader operation set, buffered I/O, sockets, batching, linked operations, fixed buffers/files, and shared-ring completions instead of the narrow KAIO O_DIRECT focus. |
| Why is SQPOLL called a zero-syscall hot path? | The application advances the shared SQ tail and a kernel polling thread consumes submissions without requiring io_uring_enter() for each batch. |