Title: Handling POST forms in WSGI
Author: Ian Bicking <email@example.com>
Discussions-To: Python Web-SIG <firstname.lastname@example.org>
This proposal suggests a way for WSGI middleware, applications, and frameworks to access POST form bodies so that there is less contention for the wsgi.input stream.
environ['wsgi.input'] points to a stream that represents the body of the HTTP request. Once this stream has been read, it cannot necessarily be read again. It may not have a seek method (none is required by the WSGI specification, and frequently none is provided by WSGI servers).
As a result any piece of a system that looks at the request body essentially takes ownership of that body, and no one else is able to access it. This is particularly problematic for POST form requests, as many framework pieces expect to have access to this. One notable case is when a request “enters” a traditional web framework which parses the POST form, then “exits” back to WSGI through some framework-specific WSGI gateway.
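The ownership problem can be seen with any stream that lacks a seek method. The sketch below uses a hypothetical OneShotInput class (not part of WSGI or of this proposal) to stand in for a server-provided wsgi.input; once the first consumer has read the body, a later consumer finds nothing left.

```python
import io


class OneShotInput:
    """Hypothetical stand-in for a server's non-seekable wsgi.input."""

    def __init__(self, body):
        self._stream = io.BytesIO(body)

    def read(self, size=-1):
        return self._stream.read(size)


body = b'name=value'
environ = {
    'REQUEST_METHOD': 'POST',
    'CONTENT_LENGTH': str(len(body)),
    'wsgi.input': OneShotInput(body),
}

first = environ['wsgi.input'].read()    # the first consumer gets the body
second = environ['wsgi.input'].read()   # a later consumer gets nothing
has_seek = hasattr(environ['wsgi.input'], 'seek')
```

Because the stream offers no way to rewind, the second consumer cannot recover the data on its own.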
The specification covers library code that multiple frameworks can implement. This is not functionality that is intended to be added to a WSGI “stack”.
This applies when certain requirements of the WSGI environment are met:
    def is_post_request(environ):
        if environ['REQUEST_METHOD'].upper() != 'POST':
            return False
        content_type = environ.get('CONTENT_TYPE',
                                   'application/x-www-form-urlencoded')
        # Accept URL-encoded forms and multipart forms (file uploads)
        return (content_type.startswith('application/x-www-form-urlencoded')
                or content_type.startswith('multipart/form-data'))
That is, it must be a POST request, and it must be a form request (generally application/x-www-form-urlencoded or, when there are file uploads, multipart/form-data).
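A few sample environments, invented here for illustration, show how the check behaves. The function is repeated so the snippet runs on its own. Since only the prefix of the content type is compared, a value that carries parameters such as a charset or a multipart boundary still matches.

```python
def is_post_request(environ):
    if environ['REQUEST_METHOD'].upper() != 'POST':
        return False
    content_type = environ.get('CONTENT_TYPE',
                               'application/x-www-form-urlencoded')
    return (content_type.startswith('application/x-www-form-urlencoded')
            or content_type.startswith('multipart/form-data'))


# Hypothetical environments; real ones carry many more keys.
samples = [
    {'REQUEST_METHOD': 'POST',
     'CONTENT_TYPE': 'application/x-www-form-urlencoded; charset=UTF-8'},
    {'REQUEST_METHOD': 'POST',
     'CONTENT_TYPE': 'multipart/form-data; boundary=xyz'},
    {'REQUEST_METHOD': 'POST', 'CONTENT_TYPE': 'application/json'},
    {'REQUEST_METHOD': 'GET'},
    {'REQUEST_METHOD': 'post'},   # no CONTENT_TYPE: treated as URL-encoded
]

results = [is_post_request(environ) for environ in samples]
```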
When this happens, the form can be parsed by cgi.FieldStorage. The result of this parsing is put in environ['wsgi.post_form'] as a tuple (new_wsgi_input, old_wsgi_input, fs), where fs is the cgi.FieldStorage instance.
new_wsgi_input can be used to check if an intermediary has replaced the input since wsgi.post_form was calculated. If the input has been changed, the wsgi.post_form data should be discarded. The old_wsgi_input can be used if you want to get access to the original input stream (which may be seekable, and so may still be readable).
The replacement wsgi.input guards against routines that access the data but don’t conform to this specification. Ideally the replacement will act like the original wsgi.input (reproducing the same data), but if not it should raise an exception. The input should not block or produce inaccurate data.
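The validity check on the tuple can be sketched as follows. The helper name cached_post_form is an assumption for illustration, and a plain dict stands in for the cgi.FieldStorage instance so that the snippet has no dependencies.

```python
import io


def cached_post_form(environ):
    """Hypothetical helper: return the cached form if still valid, else None."""
    post_form = environ.get('wsgi.post_form')
    if post_form is None:
        return None
    new_wsgi_input, old_wsgi_input, form = post_form
    if new_wsgi_input is not environ['wsgi.input']:
        # An intermediary replaced wsgi.input after parsing, so the
        # cached form no longer describes the current request body.
        del environ['wsgi.post_form']
        return None
    return form


replacement = object()           # stands in for the replacement stream
parsed = {'name': ['value']}     # stands in for a cgi.FieldStorage
environ = {
    'wsgi.input': replacement,
    'wsgi.post_form': (replacement, io.BytesIO(b''), parsed),
}

still_valid = cached_post_form(environ)

# An intermediary swaps in a different body; the cache must be dropped.
environ['wsgi.input'] = io.BytesIO(b'other=body')
after_swap = cached_post_form(environ)
```

The comparison uses identity (is) rather than equality, since the question is whether the very same stream object is still in place.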
    import cgi  # in the standard library through Python 3.12; removed in 3.13

    def get_post_form(environ):
        assert is_post_request(environ)
        input = environ['wsgi.input']
        post_form = environ.get('wsgi.post_form')
        # wsgi.post_form is (new_wsgi_input, old_wsgi_input, FieldStorage);
        # reuse it only if wsgi.input is still the replacement stream.
        if (post_form is not None
            and post_form[0] is input):
            return post_form[2]
        # This must be done to avoid a bug in cgi.FieldStorage
        environ.setdefault('QUERY_STRING', '')
        fs = cgi.FieldStorage(fp=input,
                              environ=environ,
                              keep_blank_values=1)
        new_input = InputProcessed()
        post_form = (new_input, input, fs)
        environ['wsgi.post_form'] = post_form
        environ['wsgi.input'] = new_input
        return fs

    class InputProcessed(object):
        """Replacement wsgi.input that refuses any further reads."""
        def read(self, *args):
            raise EOFError('The wsgi.input stream has already been consumed')
        readline = readlines = __iter__ = read
By using this routine, multiple consumers can parse a POST form, accessing the form data in any order (later consumers will get the already-parsed data).
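To illustrate the sharing behaviour without depending on the cgi module, the sketch below follows the same steps but swaps in urllib.parse.parse_qs as the parser. It therefore handles only application/x-www-form-urlencoded bodies and caches a plain dict rather than a cgi.FieldStorage. The name get_urlencoded_form and the sample environment are assumptions for illustration.

```python
import io
from urllib.parse import parse_qs


class InputProcessed(object):
    def read(self, *args):
        raise EOFError('The wsgi.input stream has already been consumed')
    readline = readlines = __iter__ = read


def get_urlencoded_form(environ):
    wsgi_input = environ['wsgi.input']
    post_form = environ.get('wsgi.post_form')
    if post_form is not None and post_form[0] is wsgi_input:
        return post_form[2]      # already parsed by an earlier consumer
    length = int(environ.get('CONTENT_LENGTH') or 0)
    body = wsgi_input.read(length).decode('ascii')
    form = parse_qs(body, keep_blank_values=True)
    new_input = InputProcessed()
    environ['wsgi.post_form'] = (new_input, wsgi_input, form)
    environ['wsgi.input'] = new_input
    return form


body = b'name=Ian&lang=py'
environ = {
    'REQUEST_METHOD': 'POST',
    'CONTENT_TYPE': 'application/x-www-form-urlencoded',
    'CONTENT_LENGTH': str(len(body)),
    'wsgi.input': io.BytesIO(body),
}

first = get_urlencoded_form(environ)    # e.g. called by middleware
second = get_urlencoded_form(environ)   # e.g. called by the application

# A consumer that ignores the convention fails loudly instead of
# silently reading an empty body.
try:
    environ['wsgi.input'].read()
    direct_read_failed = False
except EOFError:
    direct_read_failed = True
```

Both callers receive the same parsed object, regardless of which one runs first.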
Note that nothing in this specification touches or applies to the query string (in environ['QUERY_STRING']). This is not parsed as part of the process, and nothing in this specification applies to GET requests, or to the query string which may be present in a POST request.
While this proposal makes it more feasible for middleware to access
POST form data, it should not be read as encouraging middleware to do
so. In particular, no consumer should ever expect that
wsgi.post_form is in the request environment. Also, no
intermediary should parse the POST form data unless it actually is
interested in that data – access should be deferred until there is a
real need for the POST data.
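One way to defer access is for middleware to decide from cheap request metadata whether it needs the form at all, and to leave wsgi.input untouched otherwise. In the sketch below, the middleware name, the /login path, and the injected get_form callable (any routine that follows the algorithm above) are all assumptions for illustration.

```python
import io


class LoginAuditMiddleware:
    """Hypothetical middleware that only needs the form on POST /login."""

    def __init__(self, app, get_form):
        self.app = app
        self.get_form = get_form    # e.g. a get_post_form implementation
        self.forms = []

    def __call__(self, environ, start_response):
        wants_form = (environ['REQUEST_METHOD'] == 'POST'
                      and environ.get('PATH_INFO') == '/login')
        if wants_form:
            # Only now is the form parsed (or fetched from wsgi.post_form).
            self.forms.append(self.get_form(environ))
        # In every other case the body is passed through unread.
        return self.app(environ, start_response)


calls = []


def fake_get_form(environ):
    # Stub parser used only to record when the middleware asks for the form.
    calls.append(environ['PATH_INFO'])
    return {'user': 'ian'}


def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'ok']


def start_response(status, headers):
    return None


middleware = LoginAuditMiddleware(app, fake_get_form)

upload = io.BytesIO(b'large upload body')
middleware({'REQUEST_METHOD': 'POST', 'PATH_INFO': '/upload',
            'wsgi.input': upload}, start_response)
upload_untouched = upload.tell() == 0

middleware({'REQUEST_METHOD': 'POST', 'PATH_INFO': '/login',
            'wsgi.input': io.BytesIO(b'user=ian')}, start_response)
```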
This is closely tied to cgi.FieldStorage. This is not the only parser possible, though it is the only parser in common usage.
cgi.FieldStorage is not particularly well defined, so creating compatible parsers is difficult.
cgi.FieldStorage doesn’t have any unicode handling (it has to be done higher up).
Ideally wsgi.input would stick around, either as a temporary file or as a file that was a lazy serialization of the parsed data.
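The temporary-file idea can be sketched with tempfile.SpooledTemporaryFile, which keeps small bodies in memory and spills larger ones to disk. The helper name make_input_rereadable and the size thresholds are assumptions; nothing like it is defined by this proposal.

```python
import io
import tempfile


def make_input_rereadable(environ, max_memory=64 * 1024):
    """Hypothetical helper: copy the body into a seekable spooled file."""
    remaining = int(environ.get('CONTENT_LENGTH') or 0)
    copy = tempfile.SpooledTemporaryFile(max_size=max_memory)
    while remaining > 0:
        chunk = environ['wsgi.input'].read(min(remaining, 8192))
        if not chunk:
            break               # the client sent less than it announced
        copy.write(chunk)
        remaining -= len(chunk)
    copy.seek(0)
    environ['wsgi.input'] = copy
    return copy


body = b'name=Ian&lang=py'
environ = {
    'REQUEST_METHOD': 'POST',
    'CONTENT_LENGTH': str(len(body)),
    'wsgi.input': io.BytesIO(body),
}

stream = make_input_rereadable(environ)
first_read = stream.read()
stream.seek(0)                  # rewind so the next consumer sees the body
second_read = environ['wsgi.input'].read()
```

The cost is that the whole body is copied once, which may be unwelcome for large uploads.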
One of the simplest possibilities is to add this information to
environ['wsgi.input'] itself as a separate attribute. E.g.:
    fs = getattr(environ['wsgi.input'], 'cgi_FieldStorage', None)
    if fs is None:
        # parse and replace wsgi.input...
There’s a certain elegance to keeping wsgi.input self-describing and movable.
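A minimal sketch of this alternative follows. The wrapper class FormCarryingInput and the function get_form_from_input are hypothetical, urllib.parse.parse_qs stands in for cgi.FieldStorage, and the body is held in memory for simplicity, which would not suit large requests.

```python
import io
from urllib.parse import parse_qs


class FormCarryingInput:
    """Hypothetical wsgi.input replacement that carries its parsed form."""

    def __init__(self, body, form):
        self._stream = io.BytesIO(body)
        self.cgi_FieldStorage = form    # attribute name from the text above

    def read(self, size=-1):
        return self._stream.read(size)

    def readline(self, size=-1):
        return self._stream.readline(size)

    def readlines(self, hint=-1):
        return self._stream.readlines(hint)

    def __iter__(self):
        return iter(self._stream)


def get_form_from_input(environ):
    wsgi_input = environ['wsgi.input']
    form = getattr(wsgi_input, 'cgi_FieldStorage', None)
    if form is None:
        length = int(environ.get('CONTENT_LENGTH') or 0)
        body = wsgi_input.read(length)
        # parse_qs stands in for cgi.FieldStorage in this sketch
        form = parse_qs(body.decode('ascii'), keep_blank_values=True)
        environ['wsgi.input'] = FormCarryingInput(body, form)
    return form


body = b'q=wsgi'
environ = {'CONTENT_LENGTH': str(len(body)), 'wsgi.input': io.BytesIO(body)}
form1 = get_form_from_input(environ)

# Moving the stream into another environ carries the parsed form along.
other_environ = {'wsgi.input': environ['wsgi.input']}
form2 = get_form_from_input(other_environ)
replayed = other_environ['wsgi.input'].read()
```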
This does not cover non-form POST requests. Most of the same issues apply to such requests, except that frameworks tend not to touch the request body in that case. The body may be large, so the actual contents of the request body shouldn’t go in the environment. Perhaps they could go in a temporary file, but this too might be an unnecessary indirection in many cases. Also other kinds of request (like PUT) that have a request body are not covered, for largely the same reason. In both these cases, it is much easier to construct a new wsgi.input that accesses whatever your internal representation of the request body is.
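Constructing a new wsgi.input from an internal representation can be as small as the sketch below. The helper name replace_body and the JSON example are assumptions for illustration.

```python
import io
import json


def replace_body(environ, body):
    """Hypothetical helper: expose `body` (bytes) as the request body."""
    environ['wsgi.input'] = io.BytesIO(body)
    environ['CONTENT_LENGTH'] = str(len(body))


raw = b'{"a": 1}'
environ = {
    'REQUEST_METHOD': 'PUT',
    'CONTENT_TYPE': 'application/json',
    'CONTENT_LENGTH': str(len(raw)),
    'wsgi.input': io.BytesIO(raw),
}

# A framework consumes the body into its own representation...
document = json.loads(environ['wsgi.input'].read(int(environ['CONTENT_LENGTH'])))
document['b'] = 2

# ...and re-serializes it before handing the request to another WSGI app.
replace_body(environ, json.dumps(document).encode('utf-8'))
downstream = json.loads(
    environ['wsgi.input'].read(int(environ['CONTENT_LENGTH'])))
```

Since the stream object changes, any wsgi.post_form cached for the old stream would fail the identity check and be discarded.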
FieldStorage instance? Should all the information go in
Should wsgi.input be replaced by InputProcessed, or just left as is? Or should we look for code that serializes FieldStorage objects back to parseable strings?
Does QUERY_STRING actually have to be set for cgi not to mess up, or is that just an issue with GET requests?