Circular references, memory leaks, fieldhashes and Mojolicious

INTRODUCTION

Yesterday a very interesting topic appeared on the Mojolicious issue tracker. It was already slightly edited by the author, but you can still get the point. Let's work out what circular references are, how they can produce memory leaks, and whether Mojolicious should use fieldhashes inside Mojo::IOLoop::Delay.

WHAT ARE CIRCULAR REFERENCES

In its simplest form, a circular reference is produced when one variable refers to another and that one refers back to the first:

my ($a, $b);
$a = \$b;
$b = \$a;

In the real world circular references usually take a more complicated form. In this example $ua holds a reference to itself inside the cb key, whose value is a reference to a subroutine. This is a simplified version of what common non-blocking, callback-based classes like Mojo::UserAgent can produce under the hood.

my $ua = {};
$ua->{cb} = sub { $ua->{foo} = 123 };

HOW CIRCULAR REFERENCES MAY PRODUCE MEMORY LEAKS

The Perl interpreter has a simple garbage collector based on reference counting: it destroys a variable when the reference count of this variable drops to zero. Right after creation a variable has 1 reference. When you assign a reference to this variable to another variable, the reference count increases to 2. And so on.

# refcount = 1
my %a = (a => 1, b => 2);
# refcount = 2
my $b = \%a;
# refcount = 3
my $c = \%a;
# refcount = 4
my $d = $c;

# refcount = 1
my $a = {a => 1, b => 2};
# refcount = 2
my $b = $a;
# refcount = 2, this will just copy to new variable
my %c = %$a;
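
If you want to check such numbers yourself without installing anything, here is a minimal sketch that reads the count through the core B module. The refs_to helper and the variable names are my own, made up for illustration. It is safer to watch how the count changes than to rely on the absolute numbers.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: reference count of whatever $ref points to.
sub refs_to { return B::svref_2object($_[0])->REFCNT }

my $data = { a => 1, b => 2 };
my $base = refs_to($data);          # only $data refers to the hash

my $alias      = $data;             # a second reference to the same hash
my $after_copy = refs_to($data);

my %clone       = %$data;           # copies the contents, adds no reference
my $after_clone = refs_to($data);

undef $alias;                       # dropping a reference lowers the count
my $after_undef = refs_to($data);

print "base=$base copy=$after_copy clone=$after_clone undef=$after_undef\n";
```

The count should go up by one for the alias, stay the same for the copied hash, and return to the starting value after undef.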

To decrease the reference count you should assign some other value (like undef) to the variable which holds the reference, or let that variable go out of scope. Let's see what the problem is with our first example of circular references:

use Devel::Refcount 'refcount';
{
my ($a, $b); # just two undefined variables
$a = \$b; # now $a contains a reference to $b; if we change the value of $b, we'll see the change through $$a
warn refcount($a); # reference count to $b=2!
$b = \$a;  # now $b contains reference to $a which contains reference to $b: cycle!
warn refcount($b); # reference count to $a=2!
}

# here $a and $b go out of scope and you may think that both variables should be destroyed,
# but this only decreases the reference count of each to 1, which is not 0, and because of this
# perl will not destroy them

So, you've got two variables somewhere in memory and you have even lost access to them, which means that you can't delete either of them. This is a memory leak! You can change the program like this to see how it leaks megabytes of memory:

for (1..10_000_000) {
        my ($a, $b);
        $a = \$b;
        $b = \$a;
}

warn "See my memory usage - My pid is $$";
<>;
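
Watching process memory is not the only way to notice the leak. Here is a rough sketch, assuming a made-up Tracked class whose DESTROY method simply counts its calls: objects which are really freed bump the counter, leaked ones do not (at least not before the program exits).

```perl
use strict;
use warnings;

package Tracked;                    # hypothetical class, only counts destructions
our $destroyed = 0;
sub new     { return bless {}, shift }
sub DESTROY { $destroyed++ }

package main;

{
    my $lonely = Tracked->new;      # no cycle: freed at the end of the block
}
my $after_plain = $Tracked::destroyed;

{
    my $x = Tracked->new;
    my $y = Tracked->new;
    $x->{peer} = $y;                # $x refers to $y
    $y->{peer} = $x;                # $y refers to $x: cycle!
}
my $after_cycle = $Tracked::destroyed;

print "destroyed after plain block:  $after_plain\n";
print "destroyed after cyclic block: $after_cycle\n";
```

The counter moves after the plain block but not after the cyclic one, because both objects of the cycle are still alive.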

Let's change this first example a little:

{
        my ($a, $b);
        $a = \$b;
        $b = $a; # now we assign plain $a instead of a reference to $a
}

How does this change things? Does it still leak? Yes, it leaks, but less! See: after the first line of this example the reference count of both variables is 1; on the second line we increase the reference count of $b to 2 ($a is still 1); on the third line we increase the reference count of $b once more, because $a is a reference to $b and we copy it into $b (now $b has 3 references and $a still has 1). So, at the end of the scope the reference count of $a drops to zero and it is destroyed, while $b stays alive with a reference count of 1: of its 3 references, one was held by $a and one by the scope, and the remaining one is the reference $b holds to itself. And with 10_000_000 iterations you can see that this example eats half as much memory.
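
The counting above can be checked with a small sketch based on the core B module. The refs_to helper and the names $x and $y (standing in for $a and $b) are my own choices for illustration.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: reference count of whatever $ref points to.
sub refs_to { return B::svref_2object($_[0])->REFCNT }

my ($x, $y);
$x = \$y;                           # $x refers to $y
my $before = refs_to($x);           # count of $y: the variable itself plus $x

$y = $x;                            # $y now holds a reference to itself
my $after = refs_to($x);            # one more reference to $y

print "references to \$y before: $before, after: $after\n";
```

The self-reference adds exactly one to the count of $y, and nothing can release it once both variables go out of scope.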

Let's see our second example:

use Devel::Refcount 'refcount';
{
my $ua = {}; # $ua holds a reference to a hash with reference count = 1
$ua->{cb} = sub { $ua->{foo} = 123 }; # assign closure
warn refcount($ua); # reference count = 1, so does it leak?!
}

We can see that the reference count of the hash behind $ua at the end of the block is 1. But try to run it with 10_000_000 iterations. It eats gigabytes of memory! What's going on? Well, in this example we used a closure. The main principle of a closure is to keep references to the variables that are used inside it but created outside of it, even if these variables go out of scope. If we change our example like this, the leak goes away:

{
my $ua = {};
my $x = sub { $ua->{foo} = 123 }; # assign closure
}
#1 closure inside $x garbage collected
#2 $ua garbage collected

But in our case the closure kept a reference to $ua while being assigned to a field inside $ua. Cycle! So, this is the second case which can produce memory leaks.

Let's change the second example this way:

{
my $ua = {};
$ua->{cb} = sub { my $x = 1; $ua->{foo} = sub { $x } };
$ua->{cb}->();
delete $ua->{cb}; # remove cycle
}

And this example leaks only with perl < 5.18! As before, we have a circular reference through the closure assigned to $ua->{cb}, but we also have a nested closure, which has no $ua inside (and therefore no circular reference). Then we invoke the closure in $ua->{cb}, which creates the $ua->{foo} field holding our nested closure. At the end we remove $ua->{cb}. With this action we break the circular reference created by the outer closure. But if we removed all circular references, why does it still leak? And why does it leak only on older perls? Well, until version 5.18 nested closures in perl captured all the variables that their outer closures captured. So, although our inner closure doesn't use $ua, it still captures it (with perl < 5.18), because the outer closure uses it. Be careful with nested closures on perl < 5.18!

HOW TO AVOID MEMORY LEAKS CAUSED BY CIRCULAR REFERENCES

For our first example we can break the cycle by assigning $a = undef at the end of the block. For the second example, by assigning $ua = undef at the end of the block.
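
Here is a small sketch of breaking the closure cycle by hand. The Agent class is hypothetical and exists only to count destructor calls. The second block shows another option, removing the stored closure itself.

```perl
use strict;
use warnings;

package Agent;                      # hypothetical class, only counts destructions
our $destroyed = 0;
sub new     { return bless {}, shift }
sub DESTROY { $destroyed++ }

package main;

{
    my $ua = Agent->new;
    $ua->{cb} = sub { $ua->{foo} = 123 };
    $ua->{cb}->();
    $ua = undef;                    # drop the only reference to the object
}
my $after_undef = $Agent::destroyed;

{
    my $ua = Agent->new;
    $ua->{cb} = sub { $ua->{foo} = 123 };
    $ua->{cb}->();
    delete $ua->{cb};               # or remove the closure which captured $ua
}
my $after_delete = $Agent::destroyed;

print "destroyed: $after_undef after undef, $after_delete after delete\n";
```

Both objects are destroyed, so either way of breaking the cycle works. The hard part in real code is remembering to do it on every path.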

The common practice is to use Scalar::Util::weaken, which turns the reference stored in the passed variable into a weak one that does not count towards the reference count. Our second example rewritten this way will not leak memory:

use Scalar::Util 'weaken';

my $ua = {};
my $weaken_ua = $ua;
weaken $weaken_ua;
$ua->{cb} = sub { $weaken_ua->{foo} = 123 }; # use weakened $ua inside closure
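
To compare the two variants side by side, here is a sketch with a hypothetical Agent class that counts its destructor calls. The guard inside the second callback is my own addition: a weak reference becomes undef once the object is gone, so it is worth checking before use.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

package Agent;                      # hypothetical class, only counts destructions
our $destroyed = 0;
sub new     { return bless {}, shift }
sub DESTROY { $destroyed++ }

package main;

{
    my $ua = Agent->new;
    $ua->{cb} = sub { $ua->{foo} = 123 };   # strong capture: cycle, leaks
}
my $freed_strong = $Agent::destroyed;

my $was_weak;
{
    my $ua      = Agent->new;
    my $weak_ua = $ua;
    weaken $weak_ua;                # this copy no longer keeps the object alive
    $was_weak = isweak($weak_ua);
    $ua->{cb} = sub { $weak_ua->{foo} = 123 if $weak_ua };
    $ua->{cb}->();
}
my $freed_weak = $Agent::destroyed - $freed_strong;

print "freed with strong capture: $freed_strong\n";
print "freed with weak capture:   $freed_weak\n";
```

Only the object whose callback captured the weakened copy is destroyed at the end of its block.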

There are some tools which can help you find circular references and memory leaks:

Devel::Cycle, Devel::Leak, Devel::TrackObjects, ...

CIRCULAR REFERENCES AND MOJOLICIOUS

Mojolicious is full of callback-style code, which can potentially cause memory leaks. Yesterday sri posted several examples which should prove or refute the existence of a memory leak. Let's look at the example with Mojo::UserAgent:

use 5.10.1;
use strict;
use warnings;

use Mojo::IOLoop;
use Mojo::Server::Daemon;
use Mojo::UserAgent;

my $daemon = Mojo::Server::Daemon->new;
$daemon->start;

for my $i (1 .. 1_000_000) {
        my $ua = Mojo::UserAgent->new;
        $ua->get(
                'http://127.0.0.1:3000' => sub {              # 1
                        my $tx = pop;
                        say $tx->res->code;
                        $ua->get(                             # 2
                                'http://127.0.0.1:3000' => sub {
                                        my $tx = pop;
                                        say $tx->res->code;
                                        Mojo::IOLoop->stop;
                                }
                        );
                }
        );
        Mojo::IOLoop->start;
}

At first glance this example has a potential leak: $ua refers to closure (1), which refers to $ua (2). Looks like a cycle. But if we run this example we will not get a memory leak. What's going on? Time to look at the Mojo::UserAgent implementation. All the interesting things start inside sub _connection:

$self->{connections}{$id} = {cb => $cb, nb => $nb, tx => $tx};

This code really does create a circular reference in our case, which in simplified form can be written as:

$ua->{connections}{$id}{cb} = sub { $ua }

The other interesting place is sub _finish, which is called when the response has been received. Here we can see:

$self->_remove($id, $close);

And inside sub _remove:

my $c = delete $self->{connections}{$id} || {};

Here it is: the place where the circular reference is broken by Mojo::UserAgent. So, there are no more circular references and Mojo::UserAgent can be successfully garbage collected when it goes out of scope.

Another example combined Mojo::UserAgent and Mojo::IOLoop::Delay:
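
The same idea can be modelled in a few lines of plain Perl. This is only a sketch: TinyAgent, its get and finish methods and its fields are made up for illustration and are not the real Mojo::UserAgent code. Only the delete line mirrors the one quoted above.

```perl
use strict;
use warnings;

package TinyAgent;                  # hypothetical stand-in, NOT Mojo::UserAgent
our $destroyed = 0;
sub new { return bless { connections => {}, last_id => 0 }, shift }

# Remember the callback; this creates the cycle if it captures the agent.
sub get {
    my ($self, $cb) = @_;
    my $id = ++$self->{last_id};
    $self->{connections}{$id} = { cb => $cb };
    return $id;
}

# Forget the connection with delete, then call the callback one last time.
sub finish {
    my ($self, $id) = @_;
    my $c = delete $self->{connections}{$id} || {};
    $c->{cb}->($self) if $c->{cb};
    return;
}
sub DESTROY { $destroyed++ }

package main;

{
    my $ua = TinyAgent->new;
    my $id = $ua->get(sub { $ua->{responses}++ });
    $ua->finish($id);               # "response received": the cycle is gone
}
my $finished = $TinyAgent::destroyed;

{
    my $ua = TinyAgent->new;
    $ua->get(sub { $ua->{responses}++ });   # never finished: the cycle stays
}
my $unfinished = $TinyAgent::destroyed - $finished;

print "freed after finish: $finished, freed without finish: $unfinished\n";
```

In this model the agent is freed only when the stored callback has been removed, which is why it matters that every request eventually reaches the cleanup code.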

use 5.10.1;
use strict;
use warnings;

use Mojo::IOLoop;
use Mojo::Server::Daemon;
use Mojo::UserAgent;

my $daemon = Mojo::Server::Daemon->new;
$daemon->start;

for my $i (1 .. 1_000_000) {
        my $ua = Mojo::UserAgent->new;
        Mojo::IOLoop->delay(
                sub {
                        my $delay = shift;
                        $ua->get('http://127.0.0.1:3000' => $delay->begin);
                },
                sub {
                        my ($delay, $tx) = @_;
                        say $tx->res->code;
                        $ua->get('http://127.0.0.1:3000' => $delay->begin);
                },
                sub {
                        my ($delay, $tx) = @_;
                        say $tx->res->code;
                }
        )->wait;
}

Here we also will not get any memory leak. In the current implementation Mojo::IOLoop::Delay uses a fieldhash to store the passed callbacks outside of the Mojo::IOLoop::Delay object. So, with this example we don't get circular references at all. But if Mojo::IOLoop::Delay stored the passed callbacks inside the object, as it did before version 4.95, we would get a circular reference like:

$delay->{remaining}[0] = sub { $ua }
$ua->{connections}{$id}{cb} = sub { $delay }

But this will also NOT produce memory leaks, because Mojo::IOLoop::Delay deletes executed steps, which breaks the circular reference.
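
A rough model of that situation, assuming made-up TinyAgent and TinyDelay classes (with a hypothetical next_step method) that keep the callbacks inside the objects and count destructor calls. It is not the Mojolicious code, just the same shape of references.

```perl
use strict;
use warnings;

package Counted;                    # hypothetical base class: counts destructions
our %destroyed;
sub new     { return bless {}, shift }
sub DESTROY { $destroyed{ ref $_[0] }++ }

package TinyAgent;                  # stand-in for the user agent
our @ISA = ('Counted');

package TinyDelay;                  # stand-in for a delay keeping steps inside
our @ISA = ('Counted');

# Take the next step out of the object, then run it.
sub next_step {
    my $self = shift;
    my $step = shift @{ $self->{remaining} } or return;
    $step->($self);
    return;
}

package main;

{
    my $ua    = TinyAgent->new;
    my $delay = TinyDelay->new;
    $delay->{remaining}[0] = sub { $ua->{done}++ };      # delay -> closure -> $ua
    $ua->{cb}              = sub { $delay->next_step };  # ua -> closure -> $delay
    $ua->{cb}->();                  # the step runs and is removed from $delay
}
my $agents = $Counted::destroyed{TinyAgent} || 0;
my $delays = $Counted::destroyed{TinyDelay} || 0;

print "agents destroyed: $agents, delays destroyed: $delays\n";
```

Once the executed step has been shifted off, the delay no longer points at the agent, so both objects can be freed at the end of the block.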

FIELDHASHES

Hash::Util::FieldHash::fieldhash provides a way to build a class which stores all its attributes inside a package variable (a hash) instead of storing them inside the object, as we usually do with hash-based classes. It automatically removes all attributes of a registered object when the reference count of this object drops to zero.
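
A minimal sketch of the technique, assuming a made-up TinyDelay class with a single steps attribute. The object itself stays an empty hash, and the attribute lives in a fieldhash keyed by the object.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

fieldhash my %steps_for;            # attribute storage outside of the objects

package TinyDelay;                  # hypothetical class, not Mojo::IOLoop::Delay
sub new { return bless {}, shift }

# Store the callbacks in the fieldhash, keyed by the object itself.
sub steps {
    my $self = shift;
    $steps_for{$self} = [@_];
    return $self;
}

package main;

my ($inside_object, $during, $after);
{
    my $delay = TinyDelay->new;
    $delay->steps(sub { 'first' }, sub { 'second' });
    $inside_object = keys %$delay;      # nothing is stored in the object
    $during        = keys %steps_for;   # one entry while the object is alive
}
$after = keys %steps_for;               # entry removed together with the object

print "in object: $inside_object, entries during: $during, after: $after\n";
```

Nobody had to delete the entry by hand: it disappears when the object is destroyed.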

As I said before, Mojo::IOLoop::Delay uses this technique in the current implementation. And as we saw in the Mojo::UserAgent + Mojo::IOLoop::Delay example, this helped to prevent a circular reference. But this circular reference was not dangerous, because Mojo::IOLoop::Delay breaks it when it executes the next callback, by deleting it from internal storage.

Let's try to imagine a situation in which a fieldhash may help to prevent a leak. The first thing that comes to my mind is a situation where the delay is garbage collected before all steps have been executed. But this is impossible, because Mojo::IOLoop::Delay always keeps a second reference to the object. And even if we undef $delay after the steps call, it will still continue to work.

In fact this fieldhash stuff only prevents circular references when an object used inside a callback stores this callback inside itself, as in the example with Mojo::UserAgent. And Mojo::IOLoop::Delay will always break these cycles.

FINDINGS

The FieldHash technique shouldn't be used inside Mojo::IOLoop::Delay because it doesn't provide any useful additional functionality. A benchmark showed about a 5% speedup with steps stored inside the object instead of a fieldhash.

Comments (4)

alexbyk says:
Sat, 14 Mar 2015
Hi. Nice investigation) Useful article)
alexbyk says:
Sat, 14 Mar 2015
It wasn't edited by me) I just got a ban by sri in the repo trying to convince that everybody are mistaken) Hope your article will do it better than me)
Brian Manning says:
Sat, 14 Mar 2015
Good post, the explanations were clear, thanks. One nit-pick:
to brake = [за]тормозить
to break = [с]ломать
;)
Oleg says:
Sun, 15 Mar 2015
Thanks, Brian
Fixed :)
