Symbolic links can impair the speed of listing files

In this post I demonstrate that symbolic links can impair the speed of listing files. I also show that use of multiple threads can improve the speed.

(In UNIX and Linux, the command to list files is ls. In Windows, the analogous command is  dir.)

The observations mentioned in this post are from tests done on the following platform:

  • CPU: Count = 4, Type = 2.6 GHz Intel Xeon
  • OS: Linux 2.6.32 x86_64
  • JVM: Java HotSpot(TM) 1.7.0_10 64-Bit Server VM

 

I wrote a single-threaded Java program to list the contents of a directory recursively. java.io.File.listFiles() method was used for the purpose.

First, this program was run for a directory containing symbolic links; the targets of those symbolic links were directories containing sub-directories and regular files.

Then, this program was run for a directory containing sub-directories and regular files only; no symbolic links.

Figure 1 shows the comparison of speed of these two executions.

Figure 1: Listing - symbolic links vs. regular files

Figure 1: Listing – symbolic links vs. regular files

Listing with symbolic links (red series in figure 1) was two orders of magnitude slower than without symbolic links (blue series in figure 1).

Note that the results might vary with factors such as OS, number of files, type of files, machine resources, etc.

Another observation was that sometimes the listing is fast even in the presence of symbolic links. This is shown by the green series in figure 1. I don’t know the reason for this but my guess is that probably the result of symbolic link resolution is cached (by the OS??) for some time thereby speeding up the listing. However,the speed drops down again after few tens of seconds and the cycle of slow and fast speed continues.

Then I wrote a multi-threaded program, using Fork/Join framework with default parallelism. The idea was to fork a sub-task to list the contents of a sub-directory. As shown by figure 2, the multi-threaded program was consistently faster than its single-threaded equivalent.

Figure 2: Listing - single-threaded vs. Fork/Join-based

Figure 2: Listing – single-threaded vs. Fork/Join-based

The lessons learnt are:

  1. Use symbolic links cautiously. They can impair the performance of your programs.
  2. A well-written multi-threaded program can list files faster than its single-threaded equivalent, especially on multi-processor and multi-core machines.
Advertisements
This entry was posted in Java, Linux and tagged , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s