There are a lot of articles around the web concerning Python performance. The first thing you read is that strings should not be concatenated using ‘+’: avoid s1 + s2 + s3, and use str.join instead.
I tried concatenating two strings as part of a directory path, using three approaches:
- ‘+’, which I should not do
- str.join
- os.path.join
Here is my code:
    import os, time

    s1 = '/part/one/of/dir'
    s2 = 'part/two/of/dir'
    N = 10000

    t = time.clock()
    for i in xrange(N):
        s = s1 + os.sep + s2
    print time.clock() - t

    t = time.clock()
    for i in xrange(N):
        s = os.sep.join((s1, s2))
    print time.clock() - t

    t = time.clock()
    for i in xrange(N):
        s = os.path.join(s1, s2)
    print time.clock() - t
Here are the results (Python 2.5 on Windows XP):
    0.0182201927899
    0.0262544541275
    0.120238186697
Shouldn’t it be exactly the other way around?
Answer
It is true that you should not use ‘+’ in general, but your example is a special case: it joins only two short strings. Try the same code with:
    s1 = '*' * 100000
    s2 = '+' * 100000
Then the second version (str.join) is much faster.
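To check this on your own machine, here is a minimal sketch of the same comparison, written for Python 3 with timeit instead of time.clock (which was removed in Python 3.8). The helper names concat_plus, concat_join and best_time, and the piece sizes in the second case, are made up for illustration and are not part of the question. No timings are quoted because they depend heavily on the interpreter and version; some CPython versions, for instance, can optimize `s = s + t` in place.

```python
import os
import timeit


def concat_plus(parts):
    """Concatenate with one '+' operation per piece."""
    result = ''
    for part in parts:
        result = result + part
    return result


def concat_join(parts):
    """Concatenate all pieces in a single str.join call."""
    return ''.join(parts)


def best_time(func, *args, repeat=5, number=100):
    """Return the fastest of `repeat` runs, each calling func(*args) `number` times."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == '__main__':
    # Case 1: the two short path fragments from the question.
    s1 = '/part/one/of/dir'
    s2 = 'part/two/of/dir'
    print('plus, 2 short strings:', best_time(lambda: s1 + os.sep + s2))
    print('join, 2 short strings:', best_time(lambda: os.sep.join((s1, s2))))
    print('os.path.join         :', best_time(lambda: os.path.join(s1, s2)))

    # Case 2: many small pieces (sizes are arbitrary, chosen for illustration).
    pieces = ['x' * 10] * 10000
    print('plus, 10000 pieces   :', best_time(concat_plus, pieces))
    print('join, 10000 pieces   :', best_time(concat_join, pieces))
```

Keep in mind that os.path.join is not a pure concatenation: it inspects each component and, for example, discards everything before a component that is an absolute path. That extra work is a plausible reason for it being the slowest of the three in the question, and also the reason to prefer it when building real paths.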