Notes on Apache Processes and Python Interpreters

From https://modwsgi.readthedocs.io/en/develop/user-guides/processes-and-threading.html

The default behaviour of mod_wsgi is to create a distinct Python sub interpreter for each WSGI application. […] When Apache is run in a mode whereby there are multiple child processes, each child process will contain sub interpreters for each WSGI application.

Where shared data needs to be visible to all application instances, regardless of which child process they execute in, and changes made to the data by one application are immediately available to another, including any executing in another child process, an external data store such as a database or shared memory must be used. Global variables in normal Python modules cannot be used for this purpose.

Access to and modification of shared data in an external data store must be protected so as to prevent multiple threads in the same or different processes from interfering with each other. This would normally be achieved through a locking mechanism visible to all child processes.
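As a rough illustration of the two paragraphs above, here is a minimal sketch of a counter kept in an external store that every process can see. It assumes SQLite from the Python standard library as the store; the file location, the table name `counters`, and the helper `increment` are hypothetical names chosen for this example and do not come from the mod_wsgi documentation. SQLite's own file locking plays the role of the "locking mechanism visible to all child processes".

```python
import os
import sqlite3
import tempfile

# Hypothetical location for the example. In a real deployment every
# daemon process must be able to read and write the same file.
DB_PATH = os.path.join(tempfile.gettempdir(), "shared_counter_example.sqlite3")


def _connect(path):
    # isolation_level=None puts the connection in autocommit mode, so
    # transactions are controlled explicitly with BEGIN/COMMIT below.
    # timeout is how long to wait when another process holds the lock.
    conn = sqlite3.connect(path, timeout=10, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counters "
        "(name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
    )
    return conn


def increment(name, path=DB_PATH):
    """Atomically add 1 to the named counter and return the new value."""
    conn = _connect(path)
    try:
        # BEGIN IMMEDIATE takes the database write lock up front, so the
        # read-modify-write below is serialized across threads and processes.
        conn.execute("BEGIN IMMEDIATE")
        try:
            conn.execute(
                "INSERT OR IGNORE INTO counters (name, value) VALUES (?, 0)",
                (name,),
            )
            conn.execute(
                "UPDATE counters SET value = value + 1 WHERE name = ?",
                (name,),
            )
            (value,) = conn.execute(
                "SELECT value FROM counters WHERE name = ?", (name,)
            ).fetchone()
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
        return value
    finally:
        conn.close()
```

Calling `increment("hits")` from a request handler would give the same running total no matter which process served the request. A module-level `hits += 1` would instead keep one independent total per process.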

From https://modwsgi.readthedocs.io/en/develop/user-guides/application-issues.html#python-simplified-gil-state-api

The consequences of attempting to use a C extension module for Python which is implemented against the simplified API for GIL state management in any sub interpreter besides the first, is that the code is likely to deadlock or crash the process. The only way around this issue is to ensure that any WSGI application which makes use of C extension modules which use this API, only runs in the very first Python sub interpreter created when Python is initialised.

To force a specific WSGI application to be run within the very first Python sub interpreter created when Python is initialised, the WSGIApplicationGroup directive should be used and the group set to ‘%{GLOBAL}’:

WSGIApplicationGroup %{GLOBAL}

From https://modwsgi.readthedocs.io/en/develop/configuration-directives/WSGIApplicationGroup.html

The WSGIApplicationGroup directive can be used to specify which application group a WSGI application or set of WSGI applications belongs to. All WSGI applications within the same application group will execute within the context of the same Python sub interpreter of the process handling the request.

Setting WSGIApplicationGroup doesn’t control what processes a request is handled by, that is what the WSGIProcessGroup directive does. In other words, the WSGIProcessGroup directive operates distinct from the WSGIApplicationGroup directive, with WSGIProcessGroup dictating what named group of processes a request is handled by, and WSGIApplicationGroup dictating which named Python sub interpreter context (application group) of those processes is used. In each distinct process of a named group of processes, there will be a separate sub interpreter instance of same name, for handling the requests accepted by that process.

The argument to the WSGIApplicationGroup can be either one of four special expanding variables or an explicit name of your own choosing. The meaning of the special variables are:

%{GLOBAL}

The application group name will be set to the empty string.

Any WSGI applications in the global application group will always be executed within the context of the first interpreter created by Python when it is initialised, of the process handling the request. Forcing a WSGI application to run within the first interpreter can be necessary when a third party C extension module for Python has used the simplified threading API for manipulation of the Python GIL and thus will not run correctly within any additional sub interpreters created by Python.

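In conclusion, when WSGIApplicationGroup is set to %{GLOBAL}, each Apache process runs the application in a single Python interpreter: the first one created when Python is initialised. The threads of each process are used to handle requests.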
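Because several threads of one process handle requests inside the same interpreter, module-level state is shared between those threads but not between processes. The following is a minimal sketch of guarding such per-process state with a lock; the names `_requests_seen` and `record_request` are hypothetical and exist only for this example.

```python
import threading

# Per-process state: every process has its own copy of these two names.
_lock = threading.Lock()
_requests_seen = 0


def record_request():
    """Count a request handled by this process and return the new total."""
    global _requests_seen
    # The lock makes the read-modify-write safe when several request
    # threads of the same process call this function at once.
    with _lock:
        _requests_seen += 1
        return _requests_seen
```

The total returned here only covers the current process. A lock from `threading` is invisible to other processes, so anything that must be consistent across all of them still belongs in an external store.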

If the same group of processes runs multiple WSGI applications and they are not all assigned to the same application group, mod_wsgi creates one sub interpreter per application in each process. This use case is uncommon and can be problematic because of the C extension issues with sub interpreters described above.

Example config:

  WSGIDaemonProcess appname python-path=... processes=2 threads=15 maximum-requests=3000
  WSGIProcessGroup appname
  WSGIApplicationGroup %{GLOBAL}
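To see which process, thread, and application group actually serve a request under a configuration like this, a small diagnostic WSGI application can help. This is a sketch, not part of the quoted documentation: it assumes mod_wsgi exposes the keys `mod_wsgi.process_group` and `mod_wsgi.application_group` in the WSGI environ, and it falls back to a placeholder when they are missing so that it also runs under other servers.

```python
import os
import threading


def application(environ, start_response):
    # Assumed mod_wsgi environ keys; the defaults keep this runnable
    # under any other WSGI server where the keys are absent.
    process_group = environ.get("mod_wsgi.process_group", "<not set>")
    application_group = environ.get("mod_wsgi.application_group", "<not set>")

    lines = [
        "pid: %d" % os.getpid(),
        "thread: %s" % threading.current_thread().name,
        "process group: %r" % process_group,
        # With WSGIApplicationGroup %{GLOBAL} the group name is expected
        # to be the empty string.
        "application group: %r" % application_group,
    ]
    body = "\n".join(lines).encode("utf-8")

    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

With `processes=2`, reloading the page several times should show the pid alternating between two values, while the application group stays empty.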

Keep in mind that some options, such as maximum-requests, restart processes. So, depending on the configuration, Apache may restart processes when you don't expect it. Linux may also kill processes when the system runs out of memory. Design your Python application so that multiple instances can run side by side without concurrency problems.

See also: https://stackoverflow.com/questions/5021424/mod-wsgi-daemon-mode-wsgiapplicationgroup-and-python-interpreter-separation
