Parse human-readable filesizes into bytes

example_strings = ["10.43 KB", "11 GB", "343.1 MB"]

I want to convert all these strings into bytes. So far I have come up with this:

def parseSize(size):
    if size.endswith(" B"):
        size = int(size.rstrip(" B"))
    elif size.endswith(" KB"):
        size = float(size.rstrip(" KB")) * 1000
    elif size.endswith(" MB"):
        size = float(size.rstrip(" MB")) * 1000000
    elif size.endswith(" GB"):
        size = float(size.rstrip(" GB")) * 10000000000
    elif size.endswith(" TB"):
        size = float(size.rstrip(" TB")) * 10000000000000
    return int(size)

But I don’t like it, and I don’t think it works either. I could only find modules that do the opposite.

Answer

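Here’s a slightly prettier version. There’s probably no module for this, so just define the function yourself; it’s very small and readable. Note that the GB and TB multipliers in the question are each off by a factor of ten (10**10 and 10**13 instead of 10**9 and 10**12).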

units = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

# Alternative unit definitions, notably used by Windows:
# units = {"B": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}

def parse_size(size):
    # Split e.g. "10.43 KB" on whitespace into the number and the unit.
    number, unit = size.split()
    # int() drops any fractional byte.
    return int(float(number) * units[unit])


example_strings = ["10.43 KB", "11 GB", "343.1 MB"]

for example_string in example_strings:
    print(parse_size(example_string))

10430
11000000000
343100000

(Note that different places use slightly different conventions for the definitions of KB, MB, etc.: either powers of 10**3 = 1000 or powers of 2**10 = 1024. If your context is Windows, you will want the latter; if your context is macOS, you will want the former. The powers-of-1024 units are formally written KiB, MiB, GiB and TiB.)
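
If the input is less tidy than the examples (lowercase units, no space between number and unit), or if both conventions are needed in one program, something like the following might help. This is only a sketch: the names `parse_size_tolerant`, `DECIMAL_UNITS`, `BINARY_UNITS`, `SIZE_PATTERN` and the `binary` flag are made up for illustration, and it rounds to the nearest byte instead of truncating, which is a design choice rather than a requirement.

```python
import re

# Multipliers for the two conventions described above (illustrative names).
DECIMAL_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
BINARY_UNITS = {"B": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}

# A number with an optional decimal part, optional whitespace, then a unit.
SIZE_PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*")


def parse_size_tolerant(size, binary=False):
    """Return the size in bytes, rounded to the nearest integer."""
    match = SIZE_PATTERN.fullmatch(size)
    if match is None:
        raise ValueError(f"cannot parse size: {size!r}")
    number, unit = match.groups()
    units = BINARY_UNITS if binary else DECIMAL_UNITS
    try:
        factor = units[unit.upper()]  # accept "kb", "Kb", "KB", ...
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None
    return round(float(number) * factor)


print(parse_size_tolerant("10.43 KB"))               # 10430
print(parse_size_tolerant("11gb"))                   # 11000000000
print(parse_size_tolerant("343.1 MB", binary=True))  # 359766426
```

Unknown units and malformed strings raise ValueError here instead of KeyError or an unpacking error, which tends to be easier to handle at the call site.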

User contributions licensed under: CC BY-SA