Linux Content Index
— File System Architecture – Part I
— File System Architecture – Part II
— File System Write
— Buffer Cache
— Storage Cache
A Linux file system is expected to handle two kinds of data structures: dentries and inodes. They are indeed the defining characteristic of a file system running inside the Linux kernel. For example, the path “/bm/celtic” contains three elements, “/”, “bm” and “celtic”, so each will have its own dentry and inode. Among a lot of other information, a dentry encapsulates the name, a pointer to the parent dentry and a pointer to the corresponding inode.
What happens when we type “cd /bm/celtic”?
Setting the current working directory involves pointing the process “task_struct” to the dentry associated with “celtic”; locating that particular entry involves the following steps.
- “/” at the beginning of the string indicates root
- The root dentry is furnished during file system mount, so VFS has a known point from which it can start its search for a file or a directory.
- A file system module is expected to be capable of searching for a child when given the parent dentry. So VFS will request the dentry for “bm” by providing its parent dentry (the root).
- It’s up to the file system module to find the child entry using the parent dentry. Note that the parent dentry also has a pointer to its own inode, which might hold the key.
The above sequence of steps is repeated for each remaining component; this time the parent will be “bm” and “celtic” will be the child. In this manner VFS generates the list of dentries associated with a path.
Linux is geared to run on sluggish hard disks backed by relatively large DRAM memories. This means there can be an ocean of dentries and inodes cached in RAM; whenever a cache miss is encountered, VFS tries to resolve it using the above steps by calling the file-system-specific “lookup” function.
Fundamentally, a file system module is only expected to operate on inodes; Linux will request operations like creation and deletion of inodes, lookup of inodes, linking of inodes, allocation of storage blocks for inodes and so on.
Parsing of paths and cache management are abstracted in the kernel as part of VFS, while buffer management is part of the block driver framework.
How about writing to a new file?
- User space communicates the buffer to be written using the “write” system call.
- VFS then allocates a kernel page and associates it with the write offset in the “address_space” of that inode; each inode has its own address_space, indexed by file offset.
- Every write needs to eventually end up on the storage device, so the new page in the RAM cache has to be mapped to a block on the storage device. For this, VFS calls the “get_block” interface of the file system module, which establishes this mapping.
- A copy-from-user-space routine moves the user contents into that kernel page and marks it as dirty.
- Finally the control returns to the application.
Overwriting the contents of a file differs in two aspects: the offset being written to might already have a page allocated in the cache, and that page would already be mapped to a block on the storage. So it’s just a matter of a memcpy from the user space buffer to the kernel page. All the dirty pages are written out when the kernel flusher threads kick in, and at that point the already established storage mapping helps the kernel identify which storage block each page must go to.
Reading a new file follows similar steps, except that the contents need to be read from the device into the page and then into the user space buffer. If an up-to-date page is already cached, the read from the storage device is of course avoided.
Can you briefly explain what the term “kernel page” refers to? Is it a 4kB physical page frame in memory that can only be used by the kernel?
We know the OS needs to allocate pages for the page cache when data on the disk needs to be loaded into it, but who does the page that holds the data in the page cache belong to?
As you mentioned, kernel pages are simply the 4K blocks used for managing dynamic memory allocations within the kernel.
In the context of a file system module, the RAM file data cache and file system metadata cache are usually managed in terms of 4K blocks. A data cache is usually associated with a VFS inode, but how these pages are managed is the responsibility of the file system module.
So operations like allocation, flushing and freeing of page memory are eventually the responsibility of the file system, though the kernel provides useful helper APIs for such operations. Hope this clarifies your question.
I have tried to explain more about the file system caches here: http://tekrants.me/2015/04/24/linux-storage-cache/