
Python 3 cgi.FieldStorage parses the file name but not the contents between the boundaries

I inherited a Python 3 project in which we are trying to parse an uploaded 70 MB file with Python 3.5.6. I am using cgi.FieldStorage.

The file I’m trying to send (named paketti.ipk):

kissakissakissa
kissakissakissa
kissakissakissa

Headers:

X-FILE: /tmp/nginx/0000000001
Host: localhost:8082
Connection: close
Content-Length: 21
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: multipart/form-data; boundary=---------------------------264635460442698726183359332565
Origin: http://172.16.8.12
Referer: http://172.16.8.12/
DNT: 1
Sec-GPC: 1
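The boundary parameter in Content-Type is what separates the parts: in the body, each part starts with a line made of "--" followed by that value, and the body ends with the same line plus a trailing "--" (RFC 2046). As a sketch: self.headers in BaseHTTPRequestHandler is an http.client.HTTPMessage, which subclasses email.message.Message, so its get_boundary() method can pull the value out. Here the message is built by hand to stand in for the request headers.

```python
from http.client import HTTPMessage

# Stand-in for self.headers, using the Content-Type from the request above.
headers = HTTPMessage()
headers["Content-Type"] = (
    "multipart/form-data; "
    "boundary=---------------------------264635460442698726183359332565"
)

boundary = headers.get_boundary()
delimiter = "--" + boundary   # line that starts every part in the body
closing = delimiter + "--"    # line that ends the whole body
print(boundary)
print(closing)
```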

Temporary file /tmp/nginx/0000000001:

-----------------------------264635460442698726183359332565
Content-Disposition: form-data; name="file"; filename="paketti.ipk"
Content-Type: application/octet-stream

kissakissakissa
kissakissakissa
kissakissakissa

-----------------------------264635460442698726183359332565--
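The temporary file is an ordinary multipart/form-data body: a delimiter line ("--" plus the boundary), the part headers, an empty line, the raw file bytes, and then CRLF followed by the closing delimiter. A minimal sketch, assuming a single file part, of streaming only those file bytes to disk in fixed-size chunks so the 70 MB upload never has to fit in memory; save_first_part and _Limited are hypothetical names, not part of the cgi module.

```python
import io


class _Limited:
    """Clamp every read so no more than `length` bytes are taken from `stream`."""

    def __init__(self, stream, length):
        self.stream = stream
        self.remaining = length

    def read(self, size):
        data = self.stream.read(min(size, self.remaining))
        self.remaining -= len(data)
        return data

    def readline(self, size=64 * 1024):
        data = self.stream.readline(min(size, self.remaining))
        self.remaining -= len(data)
        return data


def save_first_part(stream, length, boundary, out, chunk_size=64 * 1024):
    """Write the body of the first multipart part to `out`; return its headers."""
    src = _Limited(stream, length)
    opening = b"--" + boundary
    # Skip any preamble up to the first delimiter line.
    while True:
        line = src.readline()
        if not line:
            raise ValueError("opening boundary not found")
        if line.rstrip(b"\r\n") == opening:
            break
    # Part headers run until the first empty line.
    headers = {}
    while True:
        line = src.readline()
        if not line:
            raise ValueError("part headers are truncated")
        line = line.rstrip(b"\r\n")
        if not line:
            break
        name, _, value = line.decode("latin-1").partition(":")
        headers[name.strip().lower()] = value.strip()
    # The data ends just before CRLF + "--" + boundary. Keep a tail of
    # len(delimiter) - 1 bytes so a delimiter split across chunks is still found.
    delimiter = b"\r\n" + opening
    keep = len(delimiter) - 1
    buf = b""
    while True:
        chunk = src.read(chunk_size)
        buf += chunk
        end = buf.find(delimiter)
        if end != -1:
            out.write(buf[:end])
            break
        if not chunk:
            raise ValueError("closing boundary not found")
        if len(buf) > keep:
            out.write(buf[:-keep])
            buf = buf[-keep:]
    # Drain the rest of the body (closing delimiter, epilogue).
    while src.read(chunk_size):
        pass
    return headers


# Demo: rebuild the body from the question in memory and extract the file.
boundary = b"---------------------------264635460442698726183359332565"
payload = b"kissakissakissa\n" * 3
body = (b"--" + boundary + b"\r\n"
        + b'Content-Disposition: form-data; name="file"; filename="paketti.ipk"\r\n'
        + b"Content-Type: application/octet-stream\r\n\r\n"
        + payload + b"\r\n--" + boundary + b"--\r\n")
out = io.BytesIO()
headers = save_first_part(io.BytesIO(body), len(body), boundary, out, chunk_size=7)
print(out.getvalue() == payload)  # → True
print(headers["content-type"])    # → application/octet-stream
```

In a BaseHTTPRequestHandler, stream would be self.rfile, length would be int(self.headers['Content-Length']), and out an open file. The length clamp matters there because a read past the end of the body can block on the socket.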

Code:

import cgi
import logging
from http.server import BaseHTTPRequestHandler

class S(BaseHTTPRequestHandler):
    def do_POST(self):
        # nginx saved the request body to a temp file and passes its path in X-FILE
        temp_filename = self.headers['X-FILE']
        with open(temp_filename, "rb") as temp_file_pointer:
            environ = {'REQUEST_METHOD': 'POST', 'CONTENT_TYPE': self.headers['Content-Type'],
                       'CONTENT_LENGTH': self.headers['Content-Length']}
            form = cgi.FieldStorage(fp=temp_file_pointer, headers=self.headers, environ=environ)
            actual_filename = form['file'].filename
            logging.info("ACTUAL FILENAME={}".format(actual_filename))
            with open("/tmp/nginx/{}".format(actual_filename), "wb") as out:
                out.write(form['file'].file.read())
            logging.info("FORM={}".format(form))

Now the strangest part. The logs show:

INFO:root:ACTUAL FILENAME=paketti.ipk
INFO:root:FORM=FieldStorage(None, None, [FieldStorage('file', 'paketti.ipk', b'')])

Look at the /tmp/nginx directory:

root@am335x-evm:/tmp# ls -la /tmp/nginx/*
-rw-------    1 www      www            286 May 18 20:48 /tmp/nginx/0000000001
-rw-r--r--    1 root     root             0 May 18 20:48 /tmp/nginx/paketti.ipk
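A likely factor: the Content-Length header that reaches Python is 21, while the temp file holding the body is 286 bytes. 21 is exactly the length of the path string /tmp/nginx/0000000001, which suggests nginx forwarded the file path as the body (the answer below comments out proxy_set_body $request_body_file;). In CPython's cgi.py, FieldStorage appears to take its read limit from the content-length in the headers it is given, so with a limit of 21 it can still read the part headers, which would explain why the filename shows up, but it stops before reading any data. The arithmetic, using only figures from the post:

```python
# Figures copied from the question.
forwarded_content_length = 21              # Content-Length header seen by Python
temp_body_path = "/tmp/nginx/0000000001"   # value of the X-FILE header
temp_body_size = 286                       # size of that file according to ls -la

# The header equals the length of the path string, not the size of the body.
print(len(temp_body_path))                              # → 21
print(len(temp_body_path) == forwarded_content_length)  # → True
print(temp_body_size > forwarded_content_length)        # → True
```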

So it is partially working, because the name is parsed. But why does it not parse the data contents? What am I missing?

Is this even doable in Python, or should I just write a C utility? The file is 70 MB, and if I read it into memory, the OOM killer kills the python3 process (rightfully so, I’d say). But where does the data go?


Answer

There were more issues at play than I first thought.

First, /tmp was on a tmpfs with a maximum size of 120 MB.

Second, my nginx.conf was problematic. I needed to comment out lines like these to clean it up:

#client_body_in_file_only       on
#proxy_set_header               X-FILE $request_body_file;
#proxy_set_body                 $request_body_file;

Then I needed to add these:

proxy_redirect                 off; # Maybe not that important
proxy_request_buffering        off; # Very important

After this, the code

form = cgi.FieldStorage( fp=self.rfile, headers=self.headers, environ={'REQUEST_METHOD':'POST', 'CONTENT_TYPE':self.headers['Content-Type'], })

started to “work”. Monitoring /tmp usage, I see it fill first to 70 MB and then to the full 120 MB. The uploaded file is truncated to 50 MB.

So even when I read and write the parsed cgi.FieldStorage data in 4096-byte chunks, FieldStorage has already read the whole upload into a temporary file somewhere in /tmp. It then tries to write the final file and hits a “No space left on device” error.
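The 50 MB figure fits that explanation. In CPython's cgi module, FieldStorage moves any part larger than about 1000 bytes into a file created by tempfile.TemporaryFile(), which lives in tempfile.gettempdir() (normally /tmp, overridable with the TMPDIR environment variable). On a 120 MB tmpfs, a 70 MB spool copy leaves 50 MB for the output file. The sketch below does that arithmetic and adds a hypothetical check_room() helper, built on shutil.disk_usage(), that fails early instead of leaving a truncated file.

```python
import shutil
import tempfile

MB = 1024 * 1024


def check_room(directory, needed_bytes):
    """Hypothetical helper: raise OSError unless `directory` has `needed_bytes` free."""
    free = shutil.disk_usage(directory).free
    if free < needed_bytes:
        raise OSError("{} needs {} bytes but only {} are free".format(
            directory, needed_bytes, free))
    return free


# Figures from the answer.
tmpfs_size = 120 * MB
upload_size = 70 * MB

# FieldStorage's spool file and the final output share the same tmpfs.
left_for_output = tmpfs_size - upload_size
print(left_for_output // MB)  # → 50, the size the saved upload was cut to

# Directory that tempfile.TemporaryFile() (and so FieldStorage) spools into.
spool_dir = tempfile.gettempdir()
# Before parsing, a server could require room for both copies, e.g.:
# check_room(spool_dir, 2 * upload_size)
```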

To fix this, I kept the nginx.conf additions and simply read self.rfile myself in a loop, reading exactly Content-Length bytes in total (anything else makes it go bonkers). This saves the file cleanly in one pass, and /tmp never holds more than a single 70 MB copy.
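A minimal sketch of that approach, assuming the raw request body (boundaries included) is what gets written: copy_exact() reads exactly Content-Length bytes from self.rfile in fixed-size chunks, so only one chunk is in memory at a time and nothing beyond the body is requested. The handler name UploadHandler and the output path /tmp/nginx/upload.bin are hypothetical.

```python
import io
from http.server import BaseHTTPRequestHandler


def copy_exact(src, dst, length, chunk_size=64 * 1024):
    """Copy exactly `length` bytes from `src` to `dst`, one chunk at a time."""
    remaining = length
    while remaining > 0:
        # Never ask for more than the body still owes; on a socket, an
        # over-long read can block waiting for bytes that will not arrive.
        chunk = src.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError("body ended {} bytes early".format(remaining))
        dst.write(chunk)
        remaining -= len(chunk)
    return length


class UploadHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        # Hypothetical destination; the raw multipart body is saved as-is.
        with open("/tmp/nginx/upload.bin", "wb") as out:
            copy_exact(self.rfile, out, length)
        self.send_response(204)
        self.end_headers()


# Usage with in-memory streams standing in for self.rfile and the output file.
body = io.BytesIO(b"0123456789trailing")
sink = io.BytesIO()
copy_exact(body, sink, 10)
print(sink.getvalue())  # → b'0123456789'
```

The bytes after the first 10 stay unread in body, which is the point of clamping each read to the remaining length.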
