Python string join performance




There are a lot of articles around the web about Python performance, and the first thing you read is that strings should not be concatenated with ‘+’: avoid s1+s2+s3 and use str.join instead.

I tried the following: concatenating two strings as part of a directory path, using three approaches:

  1. ‘+’, which I should not do
  2. str.join
  3. os.path.join

Here is my code:

import os
import time

s1 = '/part/one/of/dir'
s2 = 'part/two/of/dir'
N = 10000

# 1. '+' concatenation
t = time.clock()
for i in xrange(N):
    s = s1 + os.sep + s2
print time.clock() - t

# 2. str.join
t = time.clock()
for i in xrange(N):
    s = os.sep.join((s1, s2))
print time.clock() - t

# 3. os.path.join
t = time.clock()
for i in xrange(N):
    s = os.path.join(s1, s2)
print time.clock() - t

Here are the results (Python 2.5, Windows XP):

0.0182201927899
0.0262544541275
0.120238186697

Shouldn’t it be exactly the other way round?

Answer

It is true that you should not use ‘+’. Your example is quite special, though. Try the same code with:

s1='*'*100000
s2='+'*100000

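Then the second version (str.join) is much faster. With strings this large the cost of copying dominates: ‘+’ first builds the intermediate string s1+os.sep and then copies it again into the final result, whereas str.join computes the total length and allocates the result once.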
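
If you want to reproduce the comparison on a current interpreter, here is a minimal sketch. It assumes Python 3 and the standard timeit module (the code in the question targets Python 2.5). The function names concat_plus, concat_join, concat_path_join and benchmark are made up for illustration, and timings depend on the machine and Python version, so none are shown.

```python
import os
import timeit


def concat_plus(s1, s2):
    # Builds the intermediate string s1 + os.sep before the final result.
    return s1 + os.sep + s2


def concat_join(s1, s2):
    # str.join sizes the result up front and copies each piece once.
    return os.sep.join((s1, s2))


def concat_path_join(s1, s2):
    # os.path.join applies path-specific rules on top of concatenation.
    return os.path.join(s1, s2)


def benchmark(s1, s2, number=10000):
    """Return {function name: seconds} for `number` calls of each approach."""
    funcs = (concat_plus, concat_join, concat_path_join)
    return {
        f.__name__: timeit.timeit(lambda: f(s1, s2), number=number)
        for f in funcs
    }


if __name__ == "__main__":
    cases = (
        ("short", '/part/one/of/dir', 'part/two/of/dir', 10000),
        ("long", '*' * 100000, '+' * 100000, 1000),
    )
    for label, s1, s2, number in cases:
        for name, seconds in benchmark(s1, s2, number).items():
            print(label, name, seconds)
```

Running it with both the short path strings and the long ones lets you see whether the ranking changes with string size on your own setup.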



Source: stackoverflow