In this post I demonstrate that symbolic links can impair the speed of listing files. I also show that use of multiple threads can improve the speed.
The observations mentioned in this post are from tests done on the following platform:
- CPU: Count = 4, Type = 2.6 GHz Intel Xeon
- OS: Linux 2.6.32 x86_64
- JVM: Java HotSpot(TM) 1.7.0_10 64-Bit Server VM
I wrote a single-threaded Java program to list the contents of a directory recursively.
java.io.File.listFiles() method was used for the purpose.
First, this program was run for a directory containing symbolic links; the targets of those symbolic links were directories containing sub-directories and regular files.
Then, this program was run for a directory containing sub-directories and regular files only; no symbolic links.
Figure 1 shows the comparison of speed of these two executions.
Listing with symbolic links (red series in figure 1) was two orders of magnitude slower than without symbolic links (blue series in figure 1).
Note that the results might vary with factors such as OS, number of files, type of files, machine resources, etc.
Another observation was that sometimes the listing is fast even in the presence of symbolic links. This is shown by the green series in figure 1. I don’t know the reason for this but my guess is that probably the result of symbolic link resolution is cached (by the OS??) for some time thereby speeding up the listing. However,the speed drops down again after few tens of seconds and the cycle of slow and fast speed continues.
Then I wrote a multi-threaded program, using Fork/Join framework with default parallelism. The idea was to fork a sub-task to list the contents of a sub-directory. As shown by figure 2, the multi-threaded program was consistently faster than its single-threaded equivalent.
The lessons learnt are:
- Use symbolic links cautiously. They can impair the performance of your programs.
- A well-written multi-threaded program can list files faster than its single-threaded equivalent, especially on multi-processor and multi-core machines.