Linux Content Index
A Linux file system is expected to handle two types of data structure species — Dentries & Inodes. They are indeed the defining characteristic of a file system running inside the Linux kernel. For example a path “/bm/celtic” contains three elements, “/” , “bm” & “celtic”, so each will have its own own dentry and inode. Among a lot of other information a dentry encapsulates the name, a pointer to the parent dentry and also a pointer to the corresponding inode.
What happens when we type “cd /bm/celtic”?
Setting the current working directory involves pointing the process “task_struct” to the dentry associated with “celtic”, locating that particular entry involves the following steps.
- “/” at the beginning of the string indicates root
- Root dentry is furnished during file system mount, so VFS has a point where it can start its search for a file or a directory.
- A file system module is expected to have the capability to search for a child when the parent dentry is provided to it. So VFS will request the dentry for “bm” by providing its parent dentry (root).
- It’s up to the file system module to find the child entry using the parent dentry. Note that the parent Dentry also has a pointer to its own inode which might hold the key.
The above sequence of steps will be repeated recursively. This time the parent will be “bm” and “celtic” will be the child, in this manner VFS will generate a list of Dentries associated with a path.
Linux is geared to run on sluggish hard disks supported with relatively large DRAM memories. This might mean that there is this ocean of Dentries and Inodes cached in RAM & whenever a cache miss is encountered, VFS tries to access it using the above steps by calling the file system module specific “look_up” function.
Fundamentally a file system module is only expected to work on top of inodes, Linux will request operations like creation and deletion of inodes, look up of inodes, linking of inodes, allocation of storage blocks for inodes etc.
Parsing of paths, control cache management are all abstracted in kernel as part of VFS and buffer management as part of block driver framework.
How about writing to new file?
- User space communicates the buffer to be written using the “write” system call.
- VFS then allocates a kernel page and associates that with the write offset in the “address_space” of that inode, each inode has its own address_space indexed by the file offset.
- Every write needs to eventually end up in the storage device so the new page in the RAM cache will have to be mapped to a block in the storage device. For this VFS calls the “get_block” interface of a the file system module, which establishes this mapping.
- A copy_from_user_space routine moves the user contents into that kernel page and marks it as dirty.
- Finally the control returns to the application.
Overwriting contents of a file differ in two aspects – one is that the offset being written to might already have a page allocated in the cache and the other is that it would be already mapped to a block in the storage. So it’s just a matter of memcpy from user space to kernel space buffer. All the dirty pages are written when the kernel flusher threads kick in, and at this point the already established storage mapping will help the kernel identify to which storage block the page must go.
Reading a new file follows the similar steps but it’s just that the contents needs to be read from the device into the page and then into the user space buffer. If an updated page is encountered, the read from storage device read is of course avoided.