
Comparing Perl, Python, and Ruby


1. Introduction

I have used Awk and Lisp as scripting languages for a long time (you may be able to guess my age from this fact). I knew of Perl and tried to learn it, but I felt that Perl was not so useful, because

Recently, I tried to learn Python or Ruby because I heard that these languages have been getting popular. As I am too lazy to learn both, I had to decide which one to learn. To do so, I wrote a small script in Perl, Python, and Ruby while consulting web documents.

In this document, I am going to write down my impressions of these languages for writing small scripts. Please don't take it too seriously; it is just the impression of a beginner. I know this topic is somewhat contentious.

2. The test script I wrote

The script I wrote stores photos from removable media onto a hard disk drive (HDD). I share my PC with my wife and daughter. They like taking photos with a digital camera and storing them on the HDD. They think of the HDD as a photo album with (almost) unlimited capacity.

They need a convenient script to transfer photos to the HDD. They are not very experienced PC users and sometimes make serious mistakes while copying photos. The following is the spec of the script they need:

  1. Photos are grouped by the year and month in which they were taken. Directories named 'yy-mm' (for instance, '06-03' for March 2006) are created if they do not exist.
  2. Photos in the same directory of the removable media are saved together in the same subdirectory of the 'yy-mm' directory; the subdirectories are named photoNN, such as photo01, photo02, etc.
  3. The script compares the photo files on the HDD with those on the removable media and removes those on the removable media if they are identical.

For instance, if two directories (say imag1 and imag2) of photo files exist on the removable media and three directories (photo01, photo02, and photo03) already exist under this month's 'yy-mm' directory, two new directories (photo04 and photo05) are created under the 'yy-mm' directory, and the photo images in imag1 and imag2 are saved in photo04 and photo05.
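
The naming rules in this spec can be sketched in a few lines of Python 3. This is only an illustration of the rules, not one of the three test scripts: the function names (month_dir_name, next_photo_number, plan_targets) are hypothetical, and the functions work on plain lists of names instead of touching the disk.

```python
import re
from datetime import date

def month_dir_name(day):
    """Return the 'yy-mm' directory name for a date, e.g. '06-03'."""
    return day.strftime("%y-%m")

def next_photo_number(existing_names):
    """Return the next free NN for photoNN, given the names already present."""
    numbers = [int(name[-2:]) for name in existing_names
               if re.fullmatch(r"photo\d\d", name)]
    return max(numbers, default=0) + 1

def plan_targets(existing_names, media_dirs):
    """Map each media directory to the photoNN directory it would be saved in."""
    first = next_photo_number(existing_names)
    return {src: "photo%02d" % (first + i)
            for i, src in enumerate(sorted(media_dirs))}
```

With photo01 to photo03 already present, plan_targets(["photo01", "photo02", "photo03"], ["imag1", "imag2"]) assigns imag1 to photo04 and imag2 to photo05, as in the example above.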

3. Writing this script using Perl, Python, and Ruby

I wrote the test script in Perl, Python, and Ruby. Note that the following three scripts were written by a beginner, and you will find several points that could be improved.

Enjoy seeing what happens when a beginner uses these scripting languages. (Now I know Python somewhat better and could improve the original script, but I have left it as it was.)

3.1. First, Perl

The following is the Perl code for the test case. The Perl code serves as the baseline for the comparison.
    #! perl

    use strict;
    use File::Copy;
    use File::Compare;
    use File::Find;
    use Cwd;

    ## global parameters
    my $DOC_DIR = 'D:/doc';
    my $MEDIA ='G:/';
    my $PHOTO_VIEWER = 'D:/WBIN/linar160/linar.exe';

    # getting the string of "year(NN)-month(NN)"
    sub get_year_month{
      my ($m, $y) = (localtime)[4,5];
      sprintf ("%02d-%02d", $y-100, $m + 1);
    }

    # getting the starting number of photoNN, the directory in which photos should be saved
    # This function should be called when the program is in the month directory.
    sub get_first_photo_dir_number{
      my  @pdirs = (glob "photo[0-9][0-9]");
      @pdirs ? 1+ substr($pdirs[-1], -2) : 1;
    }

    #move into the directory "$DOC_DIR/y-m"
    sub move_into_dir_of_month{
      my $dir_of_month =  &get_year_month;
      unless ($DOC_DIR eq cwd){
        chdir $DOC_DIR or die "Cannot move to $DOC_DIR: $!";
      }
      unless (-d $dir_of_month){
        mkdir $dir_of_month or die "cannot create $dir_of_month: $!";
      }
      chdir $dir_of_month or die "Cannot move to $dir_of_month: $!";
      "$DOC_DIR/$dir_of_month" ;
    }

    #archive photos in the media into the HD
    # This function should be called when the program is in the month directory.
    sub archive_photos{
      my $photo_dir_number = shift;
      my %dhash;
      # collect the file names of each media directory as a list reference
      find({
        wanted      => sub{push @{$dhash{$File::Find::dir}}, $_ if -f},
      }, $MEDIA);
      for my $dir_from (sort keys %dhash){
        my $n = @{$dhash{$dir_from}};
        my $i = 0;
        my $dir_to = sprintf("photo%02d", $photo_dir_number++);
        mkdir $dir_to or die "cannot create $dir_to: $!";
        print "\n$dir_from ==> $dir_to\n";
        for my $fname (@{$dhash{$dir_from}}){
          my $copy_from = "$dir_from/$fname";
          my $copy_to   = "$dir_to/$fname";
          copy($copy_from, $copy_to) or die "cannot make a copy for $copy_from: $!";
          # delete the original only if the copy is identical
          if(0 == compare($copy_from, $copy_to)){
            unlink $copy_from;
            print ++$i, "/$n\r";
          }else{
            die "an error occurs during copying $copy_from";
          }
        }
      }
      %dhash;
    }

    #main
    my $dir_of_month = &move_into_dir_of_month;
    my $first_photo_dir_number =  &get_first_photo_dir_number;
    if(&archive_photos($first_photo_dir_number)){
      exec (sprintf "%s %s/photo%02d", $PHOTO_VIEWER, $dir_of_month, $first_photo_dir_number);
    }else{
      print "No photos in the media!\nGive Return:";
      <STDIN>;
    }
The Perl code consists of about 80 lines. Because a list itself cannot be a value in a Perl hash, a reference to a list must be used as the value instead. My impression is:
Perl is basically a mixture of Awk and Sed that has been expanded beyond its original range. It looks like a dam ready to collapse. However, the library is good and the run time is short.
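
For contrast with the hash of list references in the Perl code, here is a minimal Python 3 sketch of the same grouping step, where a list can be stored directly as a dictionary value. The function name group_by_directory is hypothetical, and the sketch takes a ready-made list of paths instead of walking the media.

```python
from collections import defaultdict
import posixpath

def group_by_directory(paths):
    """Group file names by their directory; each dict value is a plain list."""
    groups = defaultdict(list)
    for path in paths:
        directory, name = posixpath.split(path)  # split at the last '/'
        groups[directory].append(name)
    return dict(groups)
```

Calling group_by_directory(["G:/imag1/a.jpg", "G:/imag1/b.jpg", "G:/imag2/c.jpg"]) yields one list of file names per directory, with no explicit references needed.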

3.2. Next, Python

The following is the Python code.
    #!/usr/bin/env python
    # Written for Python 2 (print statement, dict.iteritems).

    import glob, string, os, os.path, shutil, filecmp, re, sys
    from datetime import date

    #global parameters
    HD    = 'D:/doc/'
    MEDIA = 'G:/'
    PHOTO_VIEWER = 'D:/WBIN/linar160/linar.exe'
    # the dot is escaped so that only real file extensions match
    PREGEXP = re.compile(r"\.(gif|bmp|jpe?g|tiff?)$", re.I)
    PHASH = {}

    def sum(ls):
        total = 0
        for x in ls:
            total += x
        return total

    def nPdir(dir):
        lst = [ x for x in glob.glob(dir + "photo[0-9][0-9]") if os.path.isdir(x)]
        lst.sort()  # glob.glob() does not guarantee any order
        return lst and int(lst[-1][-2:]) or 0

    def Search_Media(dir):
        os.chdir(dir)
        items = os.listdir(dir)
        ls = [x for x in items if PREGEXP.search(x)]
        if ls:
            PHASH[dir] = ls
        for d in [dir + x + '/' for x in items if os.path.isdir(x)]:
            Search_Media(d)

    def Move_Photos():
        md = HD + date.today().strftime("%y-%m/")
        np = nPdir(md)
        pt = sum([len(val) for val in PHASH.itervalues()])
        sf = 0
        i0 = True

        if not pt:
            print "No photos in the media: give return"
            sys.stdin.readline()
            sys.exit()

        if not os.path.isdir(md):
            os.mkdir(md)  # create the month directory on first use
        os.chdir(md)

        for d, fs in PHASH.iteritems():
            np += 1
            pd = md + "photo%02d" % np
            if i0:
                pd0 = pd
                i0 = False
            os.mkdir(pd)
            for f in fs:
                f1 = d + f
                f2 = pd + '/' + f
                shutil.copyfile(f1, f2)
                # delete the original only if the copy is identical
                if filecmp.cmp(f1, f2):
                    os.remove(f1)
                else:
                    print "copy failed: %s => %s\n" % (f1, f2)
                sf += 1
                print "%d/%d\r" % (sf, pt),
        return [' ', pd0]

    if __name__=='__main__':
        Search_Media(MEDIA)
        os.execv(PHOTO_VIEWER, Move_Photos())
It consists of about 70 lines, so its size is comparable to that of the Perl code. The source code looks nice, which suggests that the idea of using indentation to represent blocks is reasonable. The code is tight and has the feel of a brick building.

The following are important features (I think) of Python:

3.3. Finally, Ruby

Finally, let's see the code written in Ruby.
    #! ruby

    require "fileutils"

    #global parameters
    Doc_dir = "D:/doc/"
    Media = "G:/"
    Viewer = "D:/WBIN/linar160/linar.exe"
    PEXP =  /\.(jpe?g|JPE?G|bmp|BMP|tiff?|TIFF?)$/
    $photo_hash = Hash.new

    def photo_dir_max(dir)
        Dir.chdir(dir)
        # sort explicitly: Dir.glob does not guarantee any order
        d = Dir.glob("photo[0-9][0-9]").select{|f| File.directory?(f)}.sort.last
        d ? d.slice(5..6).to_i : 0
    end

    def search_media(dir)
        Dir.chdir(dir)
        files = Dir.glob("*.*").select{|f| f =~ PEXP}
        $photo_hash[dir] = files unless files == []
        Dir.glob("*").select{|f| File.directory?(f)}.each{|d| search_media(dir + d + "/")}
    end

    def move_photos()
        mon_dir = Doc_dir + Time.new.strftime("%y-%m/")
        FileUtils.mkdir_p(mon_dir)  # create the month directory on first use
        p_d_num = photo_dir_max(mon_dir)
        i = count = 0; p_dir0 = ''
        n_p_files = lambda{|h| n=0; h.each_key{|k| n += h[k].size}; n}.call($photo_hash)
        if n_p_files == 0
            puts "NO photo files in the media."
            STDIN.readline
            exit(0)
        end
        $photo_hash.each_key{|d|
            p_d_num += 1
            p_dir = mon_dir + sprintf("photo%02d/", p_d_num)
            if i == 0 then p_dir0 = p_dir; i += 1 end
            FileUtils.mkdir(p_dir)
            $photo_hash[d].each{|f|
                f1, f2 = d + f, p_dir + f
                FileUtils.cp(f1, f2)
                # delete the original only if the copy is identical
                FileUtils.cmp(f1, f2) ?  File.delete(f1) : printf("Copy failed: %s => %s\n", f1, f2)
                count += 1
                printf("%d/%d\r", count, n_p_files)
            }
        }
        p_dir0
    end

    search_media(Media)
    exec(Viewer, move_photos())
The code consists of about 50 lines, the shortest of the three. The code looks nice. Data flows from the left side of the dot to the right side, which is similar to Lisp, where data flows from the inside of the parentheses to the outside. In addition, the code is clear because all functions are methods.

Unfortunately, it is slow (it needs more run time). Even though the bottleneck of the program should be reading from the removable media, the Ruby script is much slower than the Perl and Python scripts. It is a pity because the design of the language itself is good. Slow code chafes me. I am putting off using it for the moment.

In Japan, Ruby is much more popular than Python because Ruby has supported the Japanese language from the beginning while Python did not. Since Python 2.3, however, Python also supports Japanese, so language support is no longer an advantage of Ruby.

4. I have chosen Python

Table 1 summarizes the results of this comparison of Perl, Python, and Ruby. Don't take it too seriously, as it is just the impression of a beginner.

Table 1. Summary of the comparison of Perl, Python, and Ruby

    Criterion         Perl   Python   Ruby
    writability       A      A        A
    readability       C      A+       A
    library           A      A        B
    run time          A      A        B
    documentation     A+     A        B
    number of users   A+     A        B

Writability does not differ much among them. Readability, however, differs tremendously: Python is the easiest to read, followed by Ruby. Perl code cannot be read unless it is well commented and its variable names are chosen carefully.

Both Python and Ruby are good languages. However, Python is more user friendly on the following three points.

  1. It has rich libraries.
  2. It is well documented.
  3. Its run-time is short enough.

So I decided to use Python. But Ruby is not bad. I hope the developers of Ruby improve the library and documentation.

5. Summary

I have compared Perl, Python, and Ruby from the point of view of a beginner. The conclusion is:
  1. Python and Ruby are much better than Perl. (They should be because they appeared after Perl.)
  2. Python and Ruby are on a par in language design.

The following are major web sites.

6. Postscript — a semi serious comparison

As the document above only compared how a small script turned out in the three languages, I try to compare them semi-seriously in this postscript. Note that my comments are somewhat biased toward Python, as I am using Python now.

Perl and Ruby are pure scripting languages whose main goal is to make code shorter, although you can write large programs in them. Python, on the other hand, is basically aimed at large-scale programs and focuses on easy debugging, as far as dynamically typed languages go. Since a programming language capable of a wide range of applications is worth learning, learning Python is a better choice than learning Perl or Ruby. This argument is supported by the fact that emerging companies like Google use Python as a main programming language.

See the following links about the philosophy of Python.

The text below partially overlaps with that in Python Quick Look.

6.1. Making large scale programs

The greatest feature of Python is how easily a small script can evolve into a large-scale program.
This advantage is provided by the following features:
  1. Each source file has its own namespace.
  2. Functions can be tested one by one interactively.
  3. Test code can be included after
     if __name__=='__main__'.

In Perl and Ruby, you must declare namespaces explicitly. In Python, on the other hand, each source file has its own namespace, and the namespace name is the same as the file name. This is convenient for avoiding name collisions.

In Perl and Ruby, the coding style of modules is substantially different from that of ordinary programs, which means that you have to modify your program considerably to use it as a module. In Python, on the other hand, the coding style of modules and ordinary programs is the same, and you can use your small script as a module with a few modifications such as adding an __all__ list. You can use a module as a stand-alone program as well.

6.2. Reading file contents

Perl programs basically read a file line by line using while(<>). Python and Ruby programs, on the other hand, often read the whole file contents at once using read(). The difference may be because memory got cheaper in the 1990s, when Python and Ruby appeared, and plenty of memory became available.
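
The two reading styles can be put side by side in Python 3. This is a minimal sketch with hypothetical function names; it uses an in-memory io.StringIO stream so that it runs without a real file, but an object returned by open() can be passed in the same way.

```python
import io

def longest_line_streaming(stream):
    """Line-by-line style: only one line is held in memory at a time."""
    longest = ""
    for line in stream:
        line = line.rstrip("\n")
        if len(line) > len(longest):
            longest = line
    return longest

def longest_line_at_once(stream):
    """Read-everything style: the whole content is loaded, then processed."""
    lines = stream.read().splitlines()
    return max(lines, key=len, default="")

sample = "a\nbbb\ncc\n"
streamed = longest_line_streaming(io.StringIO(sample))
whole = longest_line_at_once(io.StringIO(sample))
```

Both calls give the same answer for the sample text; the difference is only in how much of the file is in memory at once.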

6.3. List operations

Mapping and filtering of lists can be performed in these languages as follows. Python borrows list comprehensions from Haskell, and Ruby's way is similar to that of Lisp.

Example: return the list of square roots of the items that are non-negative.
[-3,-2,-1,0,1,2,3] ⇒ [0.0, 1.0, 1.4142135623731, 1.73205080756888]

    # Perl 5
    my @ls0=(-3,-2,-1,0,1,2,3);
    my @ls1=();
    for  (@ls0){
      push @ls1, sqrt($_) if $_ >= 0;
    }
    print "$_\n"  for (@ls1);

    # Ruby
    p [-3,-2,-1,0,1,2,3].select{|x| x>=0}.map{|x| Math.sqrt(x)}

    # Python 2
    import math
    print [math.sqrt(x) for x in [-3,-2,-1,0,1,2,3] if x>=0]

6.4. Passing parameters to functions

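Perl uses an unusual way to pass parameters to functions: all arguments arrive in the single array @_, and the function unpacks them itself. Python and Ruby adopt the conventional way of naming the parameters in the function definition.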
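
The difference can be imitated within Python 3 alone. In the sketch below (both function names are hypothetical), the first function uses the conventional named parameters, and the second mimics the Perl style by accepting one argument sequence and unpacking it by hand.

```python
def photo_dir(number, prefix="photo", width=2):
    """Conventional style: parameters and defaults are named in the signature."""
    return "%s%0*d" % (prefix, width, number)

def photo_dir_perl_style(*args):
    """Perl-like style: all arguments arrive in one sequence, unpacked by hand."""
    args = list(args)
    number = args.pop(0)                       # like Perl's `shift`
    prefix = args.pop(0) if args else "photo"  # defaults are handled manually
    width = args.pop(0) if args else 2
    return "%s%0*d" % (prefix, width, number)
```

With named parameters a caller can write photo_dir(7, width=3), while in the Perl-like version the meaning of each argument depends only on its position.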

6.5. Function creating code

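Functions of Perl and Ruby can return a function that updates a variable captured from the enclosing function. A Python function can also return a function, but in Python 2 the inner function cannot rebind a variable of the enclosing function, so a class with a __call__ method is used instead.

Example: an accumulator. The function takes an initial number (n) and returns a function that takes an increment (i), adds it to the stored value, and returns the new value.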

    # Perl 5
    sub foo {
      my ($n) = @_;
      sub {$n += shift}
    }

    #  Python
    class foo:
        def __init__(self, n):
            self.n = n
        def __call__(self, i):
            self.n += i
            return self.n

    # Ruby
    def foo (n)
        lambda {|i| n += i }
    end

The following Python session shows how it works.

    >>> a=foo(10)
    >>> a(3)
    13
    >>> a(5)
    18
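
In Python 3 the accumulator can also be written as a closure, much like the Perl and Ruby versions. The following is a minimal sketch assuming Python 3, because the nonlocal statement does not exist in Python 2; the name make_accumulator is chosen only for this example.

```python
def make_accumulator(n):
    """Return a function that adds its argument to n and returns the new total."""
    def add(i):
        nonlocal n  # rebind the variable of the enclosing function (Python 3 only)
        n += i
        return n
    return add

a = make_accumulator(10)
first = a(3)   # 13
second = a(5)  # 18
```

Each call to make_accumulator creates an independent counter, just as each instance of the class above holds its own n.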

6.6. Higher order functions

Perl and Python support higher-order functions.
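
A higher-order function is one that takes a function as an argument or returns one. The following Python 3 sketch shows both cases; the names apply_to_nonnegative and compose are hypothetical and chosen only for this illustration.

```python
import math

def apply_to_nonnegative(func, items):
    """Take a function as an argument and apply it to the non-negative items."""
    return [func(x) for x in items if x >= 0]

def compose(*funcs):
    """Return a new function that applies funcs from right to left."""
    def composed(x):
        for f in reversed(funcs):
            x = f(x)
        return x
    return composed

roots = apply_to_nonnegative(math.sqrt, [-1, 0, 4, 9])
distance_from_five = compose(abs, lambda x: x - 5)
```

Here math.sqrt is passed around like any other value, and compose builds a new function out of existing ones.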

6.7. Others