During base performance testing of the dmr+++ machinery in various parts of Hyrax, we noticed a problem in the server's response times when serving a joinNew aggregation of dmr+++ files whose binary objects are held on a remote system (in this case S3). The response times are consistent for a while (~23 requests), and then one response takes a very long time to produce. After the slow response, the next request fails. (This is a typical pattern when the BES listener dies and the OLFS discovers the problem while servicing the next request.) After this, the pattern begins again. Based on the testing I have already done (see work log below), it's pretty clear that there is a memory leak in the interaction between the ncml_handler and the dmrpp_module. The long response time preceding the failed request corresponds to kernel swap activity dominating the process list, as observed with top.
The mission: Find and fix this memory leak.