Improve the throughput of sending multiple files via sockets

To start with, I don’t have any errors or bugs; I’m asking this question to understand more. I want to send multiple files concurrently from a client to a server, each over a separate connection. I used threads to make the sending process concurrent on the client side, and it seems to improve the throughput a little. But I’m still confused. Below are my server and client code. I don’t understand how using threads makes this process concurrent, because the socket on the server side has a queue, and all the data ends up in that queue whether the files were sent in turn or concurrently. Can anyone explain this to me? Or, if I am wrong or my code doesn’t actually make the sending concurrent, please let me know. Thanks.

server.py

    def rcv_thread(self, conn):
        context = ''
        while True:
            try:
                recvfile = conn.recv(4096)
                if not recvfile:          # empty bytes: the client closed the connection
                    break
                context += recvfile.decode()
            except Exception:             # e.g. the 2-second timeout expired
                if context == '':
                    return
                break

        # ... save/process the received file here (omitted) ...
        conn.close()

    def receive(self):
        self.t = time.time()
        while True:
            c, addr = self.socket.accept()          # a new connected socket per client
            c.settimeout(2)
            start_new_thread(self.rcv_thread, (c,))  # start_new_thread comes from _thread


client.py

    def transfer_file(self, file_name):
        path = self.path + "/" + file_name
        sckt = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sckt.connect((self.HOST, self.PORT))
        with open(path, 'rb') as file:
            context = file.read()
        # note: decode()/encode() assumes the file content is valid UTF-8 text
        sckt.sendall((file_name + "##" + context.decode()).encode())
        sckt.close()

    def run(self):
        self.start_time = time.time()
        files = os.listdir(self.path)
        num_of_iterates = int(len(files) / self.concurrency)
        for j in range(num_of_iterates + 1):
            min_iter = min(self.concurrency, len(files) - j * self.concurrency)
            batch = []
            for i in range(min_iter):
                # args must be a tuple, not a set
                th = threading.Thread(target=self.transfer_file,
                                      args=(files[j * self.concurrency + i],))
                th.start()
                batch.append(th)
                self.connection_threads.append(th)

            # join the threads started in this batch, not always the first min_iter ones
            for th in batch:
                th.join()

Answer

the socket on the server side has a queue, and all the data ends up in that queue whether the files were sent in turn or concurrently.

There are multiple sockets involved here, not a single one. On the server side there is a listening socket, and each accept returns a new connected socket. Similarly, on the client side multiple sockets are used, one per file. The result is multiple TCP connections between client and server, each with its own send and receive buffers and its own independent flow control.
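To make that concrete, here is a minimal standalone sketch (the port number and variable names are made up for illustration): the listening socket keeps listening, and every accept hands back a brand-new socket that talks to exactly one client.

    # Minimal sketch: listening socket vs. per-connection sockets (port 5000 is arbitrary).
    import socket

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("0.0.0.0", 5000))
    listener.listen()

    while True:
        conn, addr = listener.accept()   # a brand-new socket, tied to this one client
        print("client", addr, "uses socket fd", conn.fileno(),
              "while the listener keeps fd", listener.fileno())
        conn.close()                     # closing conn does not close the listener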

Given that a TCP connection starts with a small window of in-flight data and only slowly ramps that window up, you can make better use of the available bandwidth by running many short-lived TCP connections in parallel rather than one after another. It can be even more efficient, though, to open only a few long-lived TCP connections and transfer multiple files over each of them, as sketched below.
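Here is a sketch of that idea, assuming a simple length-prefixed framing (this is not the "##" delimiter from the question, and the function name is made up): one connection is opened and each file is sent as a fixed-size header followed by its raw bytes, so the receiver always knows where one file ends and the next begins.

    # Sketch: many files over one long-lived connection with length-prefixed framing.
    # The framing and the function name are assumptions, not part of the original code.
    import os
    import socket
    import struct

    def send_files_over_one_connection(host, port, path, file_names):
        with socket.create_connection((host, port)) as sock:
            for name in file_names:
                with open(os.path.join(path, name), "rb") as f:
                    data = f.read()
                encoded_name = name.encode()
                # header: 2-byte name length + 8-byte payload length (network byte order)
                sock.sendall(struct.pack("!HQ", len(encoded_name), len(data)))
                sock.sendall(encoded_name)
                sock.sendall(data)

The receiving side would first read the fixed-size header, then exactly that many name and payload bytes, so no delimiter is needed and binary files survive unchanged.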

Apart from the network traffic, there are other reasons why multiple threads improve performance. For one, they make better use of the multiple cores in today’s CPUs. Also, each transfer has to read a file from disk, which adds its own delay (more so on slow disks). Opening and reading multiple files in parallel can be much faster than doing it in sequence, because the underlying OS can run several disk operations at once and optimize disk access, for example with a thread pool as sketched below.
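As an illustration of that point, a thread pool keeps a fixed number of transfers in flight without managing threads and batches by hand. The sketch assumes client is an instance of the client class above and that max_workers is a tuning knob you would measure for your setup.

    # Sketch: overlap disk reads and network sends with a fixed-size thread pool.
    # client.transfer_file refers to the method from client.py above; max_workers is assumed.
    import os
    from concurrent.futures import ThreadPoolExecutor

    def transfer_all(client, path, max_workers=8):
        files = os.listdir(path)
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # map blocks until every transfer has finished
            list(pool.map(client.transfer_file, files))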
