Python string join performance

There are a lot of articles around the web concerning Python performance. The first thing you read is that strings should not be concatenated using ‘+’: avoid s1 + s2 + s3, and use str.join instead.

I tried this by concatenating two strings as parts of a directory path, using three approaches:

  1. ‘+’, which I am not supposed to use
  2. str.join
  3. os.path.join

Here is my code:

import os, time

s1 = '/part/one/of/dir'
s2 = 'part/two/of/dir'
N = 10000

t = time.clock()
for i in xrange(N):
    s = s1 + os.sep + s2
print time.clock() - t

t = time.clock()
for i in xrange(N):
    s = os.sep.join((s1, s2))
print time.clock() - t

t = time.clock()
for i in xrange(N):
    s = os.path.join(s1, s2)
print time.clock() - t

Here are the results in seconds, in the same order as the three loops above (Python 2.5 on Windows XP):

0.0182201927899
0.0262544541275
0.120238186697

Shouldn’t it be exactly the other way around?

Answer

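It is true that you should not use ‘+’ in general, but your example is a special case: with strings this short, the measured time is mostly per-call overhead rather than the cost of copying characters. Try the same code with: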

s1 = '*' * 100000
s2 = '+' * 100000

With these inputs, the second version (str.join) is much faster.
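
To check this on a current interpreter, here is a minimal sketch of the same comparison. It assumes Python 3, where time.clock and xrange no longer exist, so it uses the standard timeit module instead. The helper names (concat_plus, concat_join, concat_path, measure) are made up for illustration, and the numbers will vary with interpreter version and platform, so none are quoted here.

```python
import os
import timeit


def concat_plus(a, b):
    # Two '+' operations; the result of a + os.sep is an intermediate string.
    return a + os.sep + b


def concat_join(a, b):
    # str.join receives all pieces at once and builds the result in one call.
    return os.sep.join((a, b))


def concat_path(a, b):
    # os.path.join also inspects its arguments (separators, absolute paths),
    # so it does more work than plain concatenation.
    return os.path.join(a, b)


def measure(func, a, b, number=1000):
    # Total time in seconds for `number` calls of func(a, b).
    return timeit.timeit(lambda: func(a, b), number=number)


if __name__ == "__main__":
    inputs = {
        "short": ("/part/one/of/dir", "part/two/of/dir"),  # from the question
        "long": ("*" * 100000, "+" * 100000),              # from the answer
    }
    for label, (s1, s2) in inputs.items():
        for func in (concat_plus, concat_join, concat_path):
            seconds = measure(func, s1, s2)
            print("%-5s %-12s %.6f" % (label, func.__name__, seconds))
```

Running it with both the short and the long inputs shows whether the ordering from the question changes as the strings grow, which is the point the answer makes.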
