Subverting control with weak references
Weak references are neat. The best language features unlock different kinds of abstractions, and weak references do exactly that. Let me show you why.
In JavaScript we have two APIs to work with weak references: `WeakMap` and `WeakRef`. (Before I wrote this article I thought `WeakRef` was only a proposal, but it turns out most browsers have already implemented it.)
One of the more common use cases uses `WeakMap`. This data structure keeps a weak reference to the keys in the map, and a strong reference between the keys and values. I guess I should explain what a "weak reference" is: usually if you have a reference to an object in a variable, it stops the garbage collector from deleting it (which makes sense). It'd be weird if suddenly your variable pointed to nothing, right?
A "weak reference" doesn't stop the object from being garbage collected. However, language semantics usually don't allow a variable to change out from under you in the middle of execution:
// this doesn't exist, but what if we could create a weak reference like this?
let weak trans = new Transaction();
// do a bunch of things...
// error! trans is... nothing? it got garbage collected
trans.transfer();
Wouldn't it be weird if `trans` changed in the middle of execution? If that were possible, all bets would be off for everything in the entire program, because these weak references could be passed anywhere.
Instead, APIs for weak references force you to call a function to get the value. The `WeakRef` class has a `deref` method to get the object.
const ref = new WeakRef(obj)
// get the object
ref.deref()
// do a bunch of things...
ref.deref()
In the above example, if nothing else holds a strong reference to `obj`, it'll eventually get garbage collected. That means `deref` might return `obj` the first time, but the second time it might return `undefined`.
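Since `deref` can return `undefined` at any moment, every use of the value needs a guard. Here's a minimal sketch of that pattern (the `WeakLogger` class and its `messages` target are made up for illustration):

```javascript
class WeakLogger {
  constructor(target) {
    // Hold the target weakly: this logger won't keep it alive.
    this.ref = new WeakRef(target);
  }

  log(message) {
    // deref() returns the target, or undefined if it was collected.
    const target = this.ref.deref();
    if (target === undefined) {
      return false; // target is gone; nothing to do
    }
    target.messages.push(message);
    return true;
  }
}

const obj = { messages: [] };
const logger = new WeakLogger(obj);
logger.log("hello"); // obj is still strongly referenced here, so this succeeds
```

The important part is that the guard has to run on *every* access; caching the result of `deref` in a long-lived variable would quietly turn the weak reference back into a strong one.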
Let's get back to `WeakMap`, which is the more common usage of weak references. (Direct weak references have a lot of weird behaviors and should be considered very low-level.) A `WeakMap` has nearly the same API as `Map` (minus anything that iterates, like `size` or `forEach`), but it holds a weak reference to the keys:
const map = new WeakMap();
map.set(obj, new Thing());
// later on
const thing = map.get(obj)
This `map` will not retain `obj`. Once `obj` gets garbage collected, it will no longer be in the map. Note that the map holds a strong reference to the value `thing`, meaning that as long as `obj` exists, so will `thing`. However, because `obj` is weakly referenced, once it gets garbage collected the map no longer references `thing` either. That means the strong reference to the values will never result in a memory leak; it only means that `thing` will stay alive as long as `obj` is.
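Notably, this holds even if the value points back at its key. With a plain `Map`, that cycle plus the map's strong reference to the key would pin both objects forever; a `WeakMap` entry is collected as a unit once the key is unreachable. A small sketch (the objects here are made up):

```javascript
const metadata = new WeakMap();

let obj = { name: "some transaction" };

// The value references the key. With WeakMap this is still leak-free,
// because the whole entry only lives as long as `obj` itself does.
metadata.set(obj, { owner: obj, detail: "extra data" });

console.log(metadata.get(obj).detail); // "extra data"

// Dropping the last strong reference makes the key AND the value collectible.
obj = null;
```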
This brings me to my favorite use of weak references: subverting control of abstractions.
Let's say you have a `Transaction` class:
class Transaction {
  // implementation
}
Now let's say we have a function that builds a textual representation of a transaction. To do this, it fetches a bunch of extra information from a remote server.
async function describe(transaction) {
  const customer = await fetchCustomer(...)
  const conversions = await fetchCurrencyConversions(...)
  // etc
  // The rest of the implementation sets `description`
  return description;
}
This brings up a lot of interesting questions about the shape of your abstractions. You probably don't want to re-fetch all of these details every single time you call `describe`. Even if you're sure `describe` is only called once, at some point in the future you'll say "oh shoot, I want to call that again".
So we need to figure out how to avoid all the refetching work. One solution is to have a caching layer that owns all of that work. You'd delegate it to a network abstraction that ensures everything gets cached.
That forces the `describe` function to buy into your entire networking layer, though. Maybe that's not possible for any number of reasons: this is an experimental feature and the backend APIs don't work well with the frontend caching yet, or this `describe` is used in other products that don't have the same networking layer. Or maybe you're doing a lot of CPU-intensive work and simply want to cache the result.
Using `WeakMap`, you can cache the result keyed off of `transaction`:
const CACHE = new WeakMap();

async function describe(transaction) {
  let cached = CACHE.get(transaction);
  if (cached) {
    return cached;
  }

  // Not cached, do all the work...
  CACHE.set(transaction, description);
  return description;
}
What's cool about this is you never have to worry about memory leaks. The `CACHE` map will never accidentally keep growing with references to objects we don't need anymore. When a `transaction` gets GCed, it simply won't be in the map anymore.
You might recognize this as simple memoization, but that's only one use case. You can use this trick anywhere you want to cache something, whether or not it's keyed on the function's arguments:
function describe(transaction) {
  let currency = getCurrency(transaction);
  let converter = CACHE.get(currency);
  if (!converter) {
    // get it...
  }
  // keep doing stuff...
}
Maybe "converter" here is some kind of currency converter, and getting it requires an expensive API call that you want to cache.
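Fleshed out, that might look like the following sketch. (`fetchConverter` is a made-up stand-in for the expensive call; also note that `WeakMap` keys must be objects, so this assumes `getCurrency` returns a currency object rather than a string.)

```javascript
const CONVERTER_CACHE = new WeakMap();

// Stand-in for an expensive API call that builds a converter for a currency.
function fetchConverter(currency) {
  return { currency, convert: (amount) => amount * currency.rate };
}

function converterFor(currency) {
  let converter = CONVERTER_CACHE.get(currency);
  if (!converter) {
    converter = fetchConverter(currency);
    // Keyed weakly: when the currency object is collected,
    // its cached converter goes with it.
    CONVERTER_CACHE.set(currency, converter);
  }
  return converter;
}
```

Every call with the same currency object gets the same converter back, so the expensive fetch runs at most once per currency.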
Again, you could lift this caching up into a network layer. But this technique is useful when you're stuck in a place where, for some reason, you just can't. Another strategy would be to have the `transaction` object itself handle the caching. You could add something on the `Transaction` class:
class Transaction {
  _cachedCurrencyConverter = null;

  // rest of the implementation
}
Now in `describe` we could set it on the `Transaction` instance:
function describe(transaction) {
  let currency = getCurrency(transaction);
  let converter = transaction._cachedCurrencyConverter;
  if (!converter) {
    converter = /* get it */;
    transaction._cachedCurrencyConverter = converter;
  }
}
This could work. However, we're assuming the `Transaction` class is something we control. Similar to the networking layer problem, we might not be able to change it. We could get spicy and set it on `transaction` ourselves without it being defined in the class, but now you're monkeypatching instances, and those properties could clash across abstractions (maybe another dev sets something with the same name).
I also believe stuffing classes with semi-related data leads to bloated abstractions. There are better ways to coordinate.
If you're in a weird situation where you really want to write a `describe` function with these caching semantics, but you don't have control over `Transaction` and you can't rely on the networking layer, what are you to do?
This `WeakMap` technique subverts control of the program: it gives you the power to add these semantics yourself. You need to track state somewhere, and if you can't control `Transaction` or the networking layer, you're kinda stuck. You could use a normal `Map`, but then you'd need to figure out an eviction strategy, and the semantics just wouldn't be as good.
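For contrast, here's roughly what a plain `Map` cache commits you to: holding every key strongly means you need an explicit eviction policy. The crude "drop the oldest entry" cap below is just one made-up strategy; the constants and names are illustrative:

```javascript
const MAX_ENTRIES = 100;
const cache = new Map();

function cacheSet(key, value) {
  if (cache.size >= MAX_ENTRIES) {
    // Map iterates in insertion order, so this evicts the oldest entry --
    // possibly one that's still in use. A WeakMap never forces this tradeoff:
    // entries disappear exactly when their keys become unreachable.
    const oldest = cache.keys().next().value;
    cache.delete(oldest);
  }
  cache.set(key, value);
}
```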
Tracking more state
This made me wonder: does this open up other interesting patterns? What if we used it to track a bunch of other state?
Let's say I'm on a team working on currency conversions. We are writing a lot of code for this, and we want to structure it in a class to provide all the functionality:
class CurrencyConversion {
  // implementation
}
We want this code to live as if it were an extension of the `Transaction` class: the decisions of which systems to use all apply here. Anywhere you can use `Transaction`, you can also use this class to manage currency conversions. In fact, if we controlled `Transaction`, we would just add all of this code in there. But let's assume we don't own it and the owning team is pushing back on our APIs.
We kind of want to "extend" `Transaction` with our APIs and treat it like this:
+------------------------------------+
|+-------------+ |
|| Transaction | Currency conversion |
|+-------------+ |
+------------------------------------+
We won't go so far as monkeypatching `Transaction`, so the APIs will stay separate. But how can we make it as seamless as possible? What if we had a `getConverter` function that created our APIs?
const CONVERTERS = new WeakMap();

function getConverter(transaction) {
  let converter = CONVERTERS.get(transaction);
  if (!converter) {
    converter = new CurrencyConversion(transaction, {
      /* options */
    });
    CONVERTERS.set(transaction, converter);
  }
  return converter;
}
Now we can get access to our APIs whenever we need to:
function describe(transaction) {
  if (!getConverter(transaction).isReady()) {
    // do something...
  }

  let amount = getConverter(transaction).convert();
  // generate a description
}
This is a silly example. In real code, you have shared systems that make it easy to write new abstractions that coordinate with each other. However, after working at a large company for several years, I've seen how often you want more control yourself. It's too easy to discover that the existing systems don't work for you.
What is so different between the above technique and simply doing this?
function describe(transaction) {
  const converter = new CurrencyConversion(transaction);
  // Use the converter throughout this code
}
Hopefully by now the difference is obvious: here `converter` only exists for the lifetime of the `describe` function call. When we store it in a weak map, it exists for the entire lifetime of the `transaction` object. That allows us to write code that assumes the same lifetime (caching things, using instance equality, etc.).
Admittedly, this is more of an edge case, but I encourage you to be intentional about how you shape your abstractions. How you structure code and coordinate abstractions can make or break your codebase, so while there's never a single best solution, the more tools we have the better.