FUSE File System Performance on Embedded Linux

We ported and benchmarked a flash file system to Linux running on an ARM board. Porting was done via FUSE, a user space file system mechanism where the file system module itself runs as a process inside Linux. The file I/O calls from other processes are eventually routed to the FUSE process via inter process communication. This IPC is enabled by a low level FUSE driver running in the kernel.

http://en.wikipedia.org/wiki/Filesystem_in_Userspace

 

 

The above diagram provides an overview of FUSE architecture. The ported file system was proprietary and was not meant to be open sourced, from this perspective file system as a user space library made a lot of sense.

Primary bottleneck with FUSE is its performance. The control path timing for a 2K byte file read use-case is elaborated below. Please note that the 2K corresponds to NAND page size.

1. User space app to kernel FUSE driver switch. – 15 uS

2. Kernel space FUSE to user space FUSE library process context switch. – 1 to15 mS

3. Switch back into kernel mode for flash device driver access – NAND MTD driver overhead without including device delay is in uS.

4. Kernel to FUSE with the data read from flash – 350uS (NAND dependent) + 15uS + 15uS (Kernel to user mode switch and back)

5. From FUSE library back to FUSE kernel driver process context switch. – 1 to 15mS

6. Finally from FUSE kernel driver to the application with the data – 15 uS

As you can see, the two process context switches takes time in terms of Milliseconds, which kills the whole idea. If performance is a crucial, then profile the context switch overhead of an operating system before attempting a FUSE port. Seems loadable kernel module approach would be the best alternative.

Advertisement

Relevance of Quad Core processors in mobile computing

October edition of EE Times Europe has an article written by +Marco Cornero from ST Ericsson explaining how Quad core processors for mobile computing is ahead of its time. The following tweet was send from the official ST Ericsson account.

“Quad cores in mobile platforms: Is the time right?” An article in EETimes Europe written by Marco Cornero, ST-Ericsson http://ow.ly/6TrDo

Please note that you might need to log in to EE Times website to access the full article along with the whole October edition.

In some ways this is quite a brazen stance taken by ST Ericsson.

1. +Marco Cornero states that there is a 25% to 30% performance overhead on each core while moving from Dual to Quad, this is due to systemic latency attributed to L1/L2 cache synchronization, bus access etc.

2. This overhead will mandate that for a Quad core to out perform Dual core each software application needs to have 70% of its code capable of executing in parallel. How this is calculated is clearly explained in the article.

The article also argues that there are not many complex use cases which can create multi-tasking scenarios where there is optimal usage of all the four cores and multiple other arguments which seems to conclusively prove that Quad cores is a case of “Diminishing returns” (See below quote from ST Ericsson CEO Mr.Gilles Delfassy)

“We aim to be leaders in apps processors, but there is a big debate whether quad core is a case of diminishing returns,” – Gilles Delfassy

The whole Article hinges on one fact that there will be a 25%-30% performance overhead on each core while moving from Dual to Quad, but isn’t this purely dependent on hardware? This figure may hold good for ST Ericsson chip-set, but what about ASICs from Qualcomm, NVIDIA, Marvel etc?

The crux of the argument is this very overhead percentage and if we bring this overhead marginally down to 20-25%, then only 50-60% of the application code needs to execute in parallel for optimal usage of Quad core. This situation is not so bad. Is this inference inherently flawed? The fact that Qualcomm and NVIDIA are close to bringing out Quad core solutions to market makes me wonder about ST Ericsson’s claims!