Lately, I’ve been thinking about what the package manager for Austral is going to look like. Dependency resolution is a surprisingly complex problem, so I did a deep dive, and this post explains how to solve the problem in a way that is tractable and doesn’t require reinventing the wheel too much.
Contents
 The Problem
 Propositional Logic
 Translating the Problem
 A Simple SAT Solver
 Example Run
 Heuristics
 External Solvers
 See Also
The Problem
If packages specified the exact version of every one of their dependencies, there would be no dependency resolution problem. There would however be constant dependency integration problems.
You’re writing an application that depends on libraries A and B. Library A depends on Foo v1.0.2, library B depends on Foo v1.0.3. Now you have a dependency conflict^{1}. The bruteforce solution is to vendor one of A and B and adjust their dependencies. And you may have to do this recursively. Then you miss out on improvements, security fixes, etc.
“Fine”, you said. “I’ll allow depending on major versions. Then A and B can just depend on Foo v.1.*.*, and the solver picks the latest version.”
And this mainly solves the problem, assuming everyone sticks to semver. But there are two problems:

Packages can hold back from upgrading because a new (minor or patch) version introduces either a bug or a performance regression that affects their usage of that package. The developers of A might want to exclude everything above 1.7.*, while the developers of B are indifferent as long as the major version is 1. So picking e.g. 1.9.0 satisfies B but is bad for A.

Sometimes you want to depend on a library across major versions (e.g. the standard library) because you’re only using the parts of the code that are highly conserved across major versions.
Even with strict semver enforcement (like Elm does), there are observable features of libraries that fall outside of what an API spec can describe.
In other words, there’s a tradeoff:
 Less expressive version ranges make things easier for the package manager, while complicating the user experience and requiring lots of manual work from the user.
 More expressive version ranges allow users greater freedom to allow or deny specific package versions, while complicating the work that the package manager has to perform.
The dependency resolution problem is:
Given a set of package dependency constraints, find an assignment—a map of package names to specific package versions—that satisfies every constraint without having multiple versions of the same package.
This turns out to be a surprisingly hard problem. NPcomplete, actually. Manifest/lock files are orthogonal to this: manifest files are about making builds reproducible, but manifest files are a product of dependency resolution, not an input to it.
The dependency resolution problem is solved, in wildly different ways, by different package managers. There exists a simple and satisfactory solution that involves translating the problem into logic, applying a SAT solver (a mature, wellunderstood piece of technology), and translating the solution back to the original problem domain. This approach is correct, it is free of adhockery, and it benefits from the fact that SAT solvers have been optimized to death over decades.
Propositional Logic
Propositional logic is the simplest kind of logic. It has two components:
 Syntax: the rules for writing logical sentences.
 Semantics: the rules for evaluating logical expressions to a truth value—true or false—just as arithmetic expressions can be evaluated to a number
The syntax of propositional logic is simple: expressions are built up, starting from constants (true/false) and variables, by joining them through logical operators. More formally, an expression in propositional logic can be one of:
 $\false$, which stands for false.
 $\true$, which stands for true.
 A variable (typically $A$, $B$, $C$, etc.) is an expression that can be replaced with true or false. Variables represent the inputs to a problem in logic.
 $\neg P$, read “not $P$”.
 $P \land Q$, read “$P$ and $Q$”.
 $P \lor Q$, read “$P$ and or $Q$”.
 $P \implies Q$, read “$P$ implies $Q$”.
 $P \iff Q$, read “$P$ if and only if $Q$”.
Or, in Haskell:
data Expr = CFalse
 CTrue
 Var String
 Not Expr
 And Expr Expr
 Or Expr Expr
Implication and ifandonlyif don’t need their own constructors, because they are not primitive:
 Implication should be thought of as a predicate. $P \implies Q$ means:
 If $P$ is true and $Q$ is true, then $P \implies Q$ is true.
 If $P$ is true, and $Q$ is false, then $P \implies Q$ is false.
 If $P$ is false, the value of $Q$ doesn’t matter, and the implication $P \implies Q$ as a whole evaluates to true. If you draw the truth table this is the same as writing $(\neg P) \lor Q$.
 $P \iff Q$ is true when $P$ and $Q$ have the same truth value. This is the same as writing $(P \implies Q) \land (Q \implies P)$.
An assignment is map from variables to truth values. Given an assignment $v$, expressions can be evaluated like so:
eval :: (String > Bool) > Expr > Bool
eval v expr = case expr of
CFalse > False
CTrue > True
Var s > env s
Not p > not (eval v p)
And p q > (eval v p) && (eval v q)
Or p q > (eval v p)  (eval v q)
For example, given:
\[((A \land B) \lor C) \rightarrow (\lnot B \land C)\]And an assignment $\{ A \rightarrow \true, B \rightarrow \false, C \rightarrow \true \}$, we can evaluate the expression in a step by step manner like so:
Expression  Step 

$((A \land B) \lor C) \rightarrow (\lnot B \land C)$  Original. 
$(\neg ((A \land B) \lor C)) \lor (\lnot B \land C)$  Simplify implication. 
$(\neg ((\true \land \false) \lor \true)) \lor (\lnot \false \land \true)$  Replace variables. 
$(\neg (\false \lor \true)) \lor (\lnot \false \land \true)$  $\true \land \false$ is false. 
$(\neg \true) \lor (\lnot \false \land \true)$  $\false \lor \true$ is true. 
$\false \lor (\lnot \false \land \true)$  $\neg \true$ is false. 
$\false \lor (\true \land \true)$  $\neg \false$ is true. 
$\false \lor \true$  $\true \land \true$ is true. 
$\true$  $\false \lor \true$ is true. 
The Boolean satisfiability problem is this:
Given a logical expression, find an assignment of variables to truth values that makes it true.
Solutions to this problem have surprisingly many useful applications. You’ll notice this is worded similarly to the statement of the dependency resolution problem above.
Boolean satisfiability is pretty thoroughly solved. By translating from one problem to the other, we can profit from prior work in this area. The next section explains how.
Translating the Problem
We will translate the set of dependency constraints to a logical expression, according to the following rules:
 Package Versions are Variables: a variable “foov1.2.0” being assigned
$\true$ means all packages in this build DAG will use
v.1.2.0
forfoo
, $\false$ means we’re using some other version offoo
. 
Version Ranges are Disjunctions: suppose you have a version constraint like
foo >= 1.2 && foo < 1.5
. Then you consult the package index for all known versions offoo
. Say it looks like this:1.1
1.2
1.3
1.4
1.5
Then the versions that satisfy this constraint are (
\[\text{foov1.2} \lor \text{foov.1.3} \lor \text{foov1.4}\]1.2
,1.3
,1.4
). This constrain gets translated into the following logical sentence:Arbitrarily complex version range checks can be implemented, by simply treating the check as a predicate that returns either true or false for every version in the package index. Then you make a disjunction of all the versions for which it returned true.

Dependencies are Implications:
\[\text{foov1} \implies (\text{barv1} \lor \text{barv2})\]foov1
depends onbarv2
orbar3
means “one ofbarv2
orbarv3
must be used, conditional onfoov1
being used”. So we translate this as an implication: 
Consistency: the constraint that we can’t have different versions of the same package is expressed by negating the conjunction (and) of every pair of versions of the same package. So if
\[\begin{align*} (\neg (\text{foov1} & \land \text{foov2})) \, \land \\ (\neg (\text{foov1} & \land \text{foov3})) \, \land \\ (\neg (\text{foov1} & \land \text{foov4})) \, \land \\ (\neg (\text{foov2} & \land \text{foov3})) \, \land \\ (\neg (\text{foov2} & \land \text{foov4})) \, \land \\ (\neg (\text{foov3} & \land \text{foov4})) \end{align*}\]foo
has versions[1,2,3,4]
the expression:Ensures we only have one version of
foo
.The procedure here is to take the set of all combinations of length two from the set of all versions of
foo
. 
The Root of the DAG: the package version we’re resolving dependencies for is the root of the build DAG. The variable corresponding to this package version must be true on all assignments, so we have to $\land$ it together with every other term.
Intuitively, the reason we have to add this is the variable being true is what “kicks off” the implications that represent the dependencies and gives us the assignment we want.
A Simple SAT Solver
In this section I’ll show you how to build a simple SAT solver in Python. This is the simplest possible SAT solver: there is not one optimization and the time complexity is exponential. For reallife use, you’ll either want to use a commercial offtheshelf solver or optimize this further (there’s a great deal of literature on how to do this). The point is to demystify the process somewhat (SAT solving sounds very ivory tower) and also to provide an actual, working implementation of this post as a software object you can experiment with.
The interface to the solver will be a function that takes a logical expression
and returns an assignment of variables to Booleans (if the sentence is
satisfiable, that is, if there exists a solution), or None
if the sentence is
unsatisfiable.
First, some types to represent logic expressions:
class Expr:
pass
class FalseExpr(Expr):
pass
class TrueExpr(Expr):
pass
class Var(Expr):
def __init__(self, name: str):
self.name = name
class Not(Expr):
def __init__(self, expr: Expr):
self.expr = expr
class And(Expr):
def __init__(self, exprs: list[Expr]):
self.exprs = exprs
class Or(Expr):
def __init__(self, exprs: list[Expr]):
self.exprs = exprs
class Impl(Expr):
def __init__(self, p: Expr, q: Expr):
self.p = p
self.q = q
The basic algorithm (and this is almost too stupid to be considered) is:
 Start: we have a logical expression.
 Does it have any variables?
 If so: pick a variable arbitrarily (but not randomly, because we want the
algorithm to be deterministic). Then fork:
 In one branch, replace that variable with $\true$, and go to the start.
 In another, replace that variable with $\false$, and go to the start.
 If there are no variables, evaluate the expression. If it evaluates to $\true$, return the list of variable replacements it took to get here.
 If so: pick a variable arbitrarily (but not randomly, because we want the
algorithm to be deterministic). Then fork:
The core logic is implemented like this:
Bindings = dict[str, bool]
def solve(e: Expr) > Bindings  None:
return solver(e, {})
def solver(e: Expr, bs: Bindings) > Bindings  None:
free_var = any_var(e)
if free_var is None:
if eval_expr(e):
return bs
else:
return None
else:
# Replace with True.
t: Expr = replace(e, free_var, True)
t_bs: Bindings = dict(bs)
t_bs[free_var] = True
# Replace with False.
f: Expr = replace(e, free_var, False)
f_bs: Bindings = dict(bs)
f_bs[free_var] = False
# Solve both branches, and return the first one that works.
return solver(t, t_bs) or solver(f, f_bs)
any_var
is a function that takes an expression, and returns an
arbitrarilychosen variable. This is implemented by recursively building up the
set of free variables, then sorting the variables (for determinism)
alphabetically and picking the first one:
def free(e: Expr) > set[str]:
if isinstance(e, FalseExpr) or isinstance(e, TrueExpr):
return set()
elif isinstance(e, Var):
return {e.name}
elif isinstance(e, Not):
return free(e.expr)
elif isinstance(e, And):
return set().union(*[free(expr) for expr in e.exprs])
elif isinstance(e, Or):
return set().union(*[free(expr) for expr in e.exprs])
elif isinstance(e, Impl):
return free(e.p).union(free(e.q))
else:
raise TypeError("Invalid expression type")
def any_var(e: Expr) > str  None:
variables: list[str] = sorted(list(free(e)))
if len(variables) == 0:
return None
else:
return variables[0]
Evaluation is done in the base case of the recursion, when all variables have been replaced:
def eval_expr(e: Expr) > bool:
if isinstance(e, FalseExpr):
return False
elif isinstance(e, TrueExpr):
return True
elif isinstance(e, Var):
raise ValueError(f"eval: the variable {e.name} has not been replaced.")
elif isinstance(e, Not):
return not eval_expr(e.expr)
elif isinstance(e, And):
return all(eval_expr(expr) for expr in e.exprs)
elif isinstance(e, Or):
return any(eval_expr(expr) for expr in e.exprs)
elif isinstance(e, Impl):
return (not eval_expr(e.p)) or eval_expr(e.q)
else:
raise TypeError("Invalid expression type")
Finally, the replace
function takes an expression, and replaces all instances
of the variable with the given name with a Boolean constant:
def replace(e: Expr, name: str, value: bool) > Expr:
if isinstance(e, FalseExpr):
return FalseExpr()
elif isinstance(e, TrueExpr):
return TrueExpr()
elif isinstance(e, Var):
if e.name == name:
return TrueExpr() if value else FalseExpr()
else:
return Var(e.name)
elif isinstance(e, Not):
return Not(replace(e.expr, name, value))
elif isinstance(e, And):
return And([replace(expr, name, value) for expr in e.exprs])
elif isinstance(e, Or):
return Or([replace(expr, name, value) for expr in e.exprs])
elif isinstance(e, Impl):
return Impl(replace(e.p, name, value), replace(e.q, name, value))
else:
raise TypeError("Invalid expression type")
Example Run
Consider the following package database:
Package  Versions 

app 
0 
sql 
0, 1, 2 
threads 
0, 1, 2 
http 
0, 1, 2, 3, 4 
stdlib 
0, 1, 2, 3, 4 
We have an application (the root of the build DAG) and four dependencies, one of
which (threads
) will be transitive. And we have the following constraints:
from dataclasses import dataclass
@dataclass(frozen=True)
class Dependency:
name: str
minimum: int
maximum: int
@dataclass(frozen=True)
class Package:
name: str
version: int
depends_on: list[Dependency]
packages: list[Package] = [
Package(
"app",
0,
[
Dependency("sql", 2, 2),
Dependency("threads", 2, 2),
Dependency("http", 3, 4),
Dependency("stdlib", 4, 4),
],
),
Package("sql", 0, []),
Package("sql", 1, [Dependency("stdlib", 1, 4), Dependency("threads", 1, 1)]),
Package("sql", 2, [Dependency("stdlib", 2, 4), Dependency("threads", 1, 2)]),
Package("threads", 0, [Dependency("stdlib", 2, 4)]),
Package("threads", 1, [Dependency("stdlib", 2, 4)]),
Package("threads", 2, [Dependency("stdlib", 3, 4)]),
Package("http", 0, [Dependency("stdlib", 0, 3)]),
Package("http", 1, [Dependency("stdlib", 0, 3)]),
Package("http", 2, [Dependency("stdlib", 1, 4)]),
Package("http", 3, [Dependency("stdlib", 2, 4)]),
Package("http", 4, [Dependency("stdlib", 3, 4)]),
]
We can translate this into a logic formula as follows (using some helper functions):
from itertools import combinations
def convert(root: str, packages: list[Package]) > Expr:
"""
Given a packageversion to use as the root of the build DAG, and a list of
package dependency constraints, convert them into a logical expression.
"""
# First things first: we need the root package to be part of the assignment.
terms: list[Expr] = [Var(root)]
# Add implications.
for p in packages:
# Package versions imply their dependencies.
for dep in p.depends_on:
versions: list[int] = list(range(dep.minimum, dep.maximum + 1))
deps: list[Expr] = [package_var(dep.name, v) for v in versions]
impl: Impl = Impl(package_var(p.name, p.version), Or(deps))
terms.append(impl)
# Exclude every pair of versions. We do this by taking the set.of free
# variables in the expression built up so far, and for each package depended
# upon, we find all the versions mentioned for it and pairwise exclude them.
variables: set[str] = free(And(terms))
varnames: set[str] = set([var_name(v) for v in variables])
for name in varnames:
vers: set[int] = {var_version(v) for v in variables if var_name(v) == name}
for a, b in all_combinations(vers):
terms.append(Not(And([package_var(name, a), package_var(name, b)])))
# Finally, return the built up expression as a conjunction.
return And(terms)
def package_var(name: str, version: int) > Var:
return Var(f"{name}v{version}")
def var_name(var: str) > str:
return var.split("v")[0]
def var_version(var: str) > int:
return int(var.split("v")[1])
def all_combinations(lst: set[int]) > list[tuple[int, int]]:
return list(combinations(lst, 2))
Converting our dependency constraints into a formula:
formula: Expr = convert("appv0", packages)
The full formula looks like this:
\[\begin{align*} \land ~~ &\text{appv0} \\ \land ~~ &(\text{appv0}\implies(\text{sqlv2})) \\ \land ~~ &(\text{appv0}\implies(\text{threadsv2})) \\ \land ~~ &(\text{appv0}\implies(\text{httpv3} \lor \text{httpv4})) \\ \land ~~ &(\text{appv0}\implies(\text{stdlibv4})) \\ \land ~~ &(\text{sqlv1}\implies(\text{stdlibv1} \lor \text{stdlibv2} \lor \text{stdlibv3} \lor \text{stdlibv4})) \\ \land ~~ &(\text{sqlv1}\implies(\text{threadsv1})) \\ \land ~~ &(\text{sqlv2}\implies(\text{stdlibv2} \lor \text{stdlibv3} \lor \text{stdlibv4})) \\ \land ~~ &(\text{sqlv2}\implies(\text{threadsv1} \lor \text{threadsv2})) \\ \land ~~ &(\text{threadsv0}\implies(\text{stdlibv2} \lor \text{stdlibv3} \lor \text{stdlibv4})) \\ \land ~~ &(\text{threadsv1}\implies(\text{stdlibv2} \lor \text{stdlibv3} \lor \text{stdlibv4})) \\ \land ~~ &(\text{threadsv2}\implies(\text{stdlibv3} \lor \text{stdlibv4})) \\ \land ~~ &(\text{httpv0}\implies(\text{stdlibv0} \lor \text{stdlibv1} \lor \text{stdlibv2} \lor \text{stdlibv3})) \\ \land ~~ &(\text{httpv1}\implies(\text{stdlibv0} \lor \text{stdlibv1} \lor \text{stdlibv2} \lor \text{stdlibv3})) \\ \land ~~ &(\text{httpv2}\implies(\text{stdlibv1} \lor \text{stdlibv2} \lor \text{stdlibv3} \lor \text{stdlibv4})) \\ \land ~~ &(\text{httpv3}\implies(\text{stdlibv2} \lor \text{stdlibv3} \lor \text{stdlibv4})) \\ \land ~~ &(\text{httpv4}\implies(\text{stdlibv3} \lor \text{stdlibv4})) \\ \land ~~ &\neg(\text{stdlibv0} \land \text{stdlibv1}) \\ \land ~~ &\neg(\text{stdlibv0} \land \text{stdlibv2}) \\ \land ~~ &\neg(\text{stdlibv0} \land \text{stdlibv3}) \\ \land ~~ &\neg(\text{stdlibv0} \land \text{stdlibv4}) \\ \land ~~ &\neg(\text{stdlibv1} \land \text{stdlibv2}) \\ \land ~~ &\neg(\text{stdlibv1} \land \text{stdlibv3}) \\ \land ~~ &\neg(\text{stdlibv1} \land \text{stdlibv4}) \\ \land ~~ &\neg(\text{stdlibv2} \land \text{stdlibv3}) \\ \land ~~ &\neg(\text{stdlibv2} \land \text{stdlibv4}) \\ \land ~~ &\neg(\text{stdlibv3} \land \text{stdlibv4}) \\ \land ~~ &\neg(\text{threadsv0} \land \text{threadsv1}) \\ \land ~~ &\neg(\text{threadsv0} \land \text{threadsv2}) \\ \land ~~ &\neg(\text{threadsv1} \land \text{threadsv2}) \\ \land ~~ &\neg(\text{httpv0} \land \text{httpv1}) \\ \land ~~ &\neg(\text{httpv0} \land \text{httpv2}) \\ \land ~~ &\neg(\text{httpv0} \land \text{httpv3}) \\ \land ~~ &\neg(\text{httpv0} \land \text{httpv4}) \\ \land ~~ &\neg(\text{httpv1} \land \text{httpv2}) \\ \land ~~ &\neg(\text{httpv1} \land \text{httpv3}) \\ \land ~~ &\neg(\text{httpv1} \land \text{httpv4}) \\ \land ~~ &\neg(\text{httpv2} \land \text{httpv3}) \\ \land ~~ &\neg(\text{httpv2} \land \text{httpv4}) \\ \land ~~ &\neg(\text{httpv3} \land \text{httpv4}) \\ \land ~~ &\neg(\text{sqlv1} \land \text{sqlv2}) \\ \end{align*}\]Running the solver on this formula:
bs: Bindings  None = solve(formula)
if bs is not None:
for k, v in sorted(bs.items(), key=lambda p: p[0]):
print(k, v)
Yields the following assignment:
Variable  Value 

appv0 
$\true$ 
httpv0 
$\false$ 
httpv1 
$\false$ 
httpv2 
$\false$ 
httpv3 
$\true$ 
httpv4 
$\false$ 
sqlv1 
$\false$ 
sqlv2 
$\true$ 
stdlibv0 
$\false$ 
stdlibv1 
$\false$ 
stdlibv2 
$\false$ 
stdlibv3 
$\false$ 
stdlibv4 
$\true$ 
threadsv0 
$\false$ 
threadsv1 
$\false$ 
threadsv2 
$\true$ 
Or, put another way:
Package  Version 

http 
3 
sql 
2 
stdlib 
4 
threads 
2 
Heuristics
One reason to roll your own SAT solver would be to introduce domainspecific heuristics. For example: we’d generally want a package manager to prefer the latest version of a package that satisfies the constraints.
We can implement this by redefining the function that chooses a variable, to pick variables representing higher version numbers first:
def any_var_latest(e: Expr) > str  None:
# Sort variables alphabetically, for determinism.
variables: list[str] = sorted(list(free(e)))
if len(variables) == 0:
return None
else:
# Return the last one, the highest version number for the
# (alphabetically) last package.
return variables[1]
And using this in the resolver:
Bindings = dict[str, bool]
def solve_latest(e: Expr) > Bindings  None:
return solver_latest(e, {})
def solver_latest(e: Expr, bs: Bindings) > Bindings  None:
free_var = any_var_latest(e)
if free_var is None:
if eval_expr(e):
return bs
else:
return None
else:
# Replace with True.
t: Expr = replace(e, free_var, True)
t_bs: Bindings = dict(bs)
t_bs[free_var] = True
# Replace with False.
f: Expr = replace(e, free_var, False)
f_bs: Bindings = dict(bs)
f_bs[free_var] = False
# Solve both branches, and return the first one that works.
return solver_latest(t, t_bs) or solver_latest(f, f_bs)
Running the same formula over this new solver yields:
Package  Version 

http 
4 
sql 
2 
stdlib 
4 
threads 
2 
Which simply upgrades http
from version 3
to 4
.
Other heuristics might include:
 Package Frequncy: pick a version of the package name which appears most often in the free variable list. This could mean the package is widely dependedupon (e.g. the standard library). If there is no assignment that satisfies the constraints, trying versions of this package will allow us to get to failure faster.
 Simultaneous Assignment: since we now, from the nature of the problem, that different versions of the same package are mutually exclusive, we can speed up the solver by assigning all variables with the same package name at once, assigning one of them to true and all the others to false.
External Solvers
The solver described in this post can be easily optimized, but implementing every last technique in the Handbook of Satisfiability is probably overkill. Depending on how portable you want to be, you might want to shell out to an external SAT solver like MiniSat or Clasp. The DIMACS format provides a standard file format for SAT solver input.
The tradeoff here is that using an external solver means you get optimizations like the DPLL algorithm and CDCL for free, at the cost of losing error reporting, because the interface to the SAT solver is very narrow and doesn’t communicate much except “here’s an assignment that works, or solving failed”. Using a homespun solver means you get to apply domainspecific heuristics and optimizations, but you have to reimplement a lot from the literature.
See Also
Footnotes

This depends on the specifics of the programming language and package manager you’re working on. Some languages are perfectly fine with different libraries depending on different versions of the same package. But it’s a recipe for subtle bugs. Generally it makes things more tractable if the set of all transitive dependencies for an application has exactly one version for each package. ↩