How Many Paths of Length K are There Between A and B? : And I Show You How Deep the Rabbit Hole Goes...

Here's a (surprisingly interesting) programming problem: Given a directed unweighted graph with V vertices and E edges, how many paths of length K are there from node A to node B? Paths may visit the same node or edge multiple times[1]. To avoid dealing with very large numbers, assume that we're computing our answer modulo a large prime.

Figure There are 2 paths of length 2 from node A to node B cluster_first Graph cluster_second Path 1 cluster_third Path 2 A1 A C1 A1->C1 D1 A1->D1 B1 B C1->B1 D1->B1 A2 A C2 A2->C2 D2 A2->D2 B2 B C2->B2 D2->B2 A3 A C3 A3->C3 D3 A3->D3 B3 B C3->B3 D3->B3

This problem is fairly standard - many of you may have seen it or heard it in an interview. Personally, I've seen this problem on Hackernews in some form at least three times, here, here, and here.

I found this comment from the last link particularly interesting.

He states of the most advanced level:

I've never seen anyone get this and I only learned about it after months of asking this question.

And that's where he left the problem. Seems like a pretty cool problem with plenty of depth, doesn't it?

As it turns out, this problem has even more depth than that Google interviewer thought. What if I told you, that drawing on concepts from coding theory, abstract algebra, and signal processing, we could solve this problem in O(EV+VlogVlogK)O(EV + V \log V \log K) time?

To my knowledge, despite being a common problem, I have not seen this faster solution presented anywhere.

Let's dive down the rabbit hole of solutions to this problem, starting from the top.

The Brute Force Solution (V5,K5V \leq 5, K\leq 5)

The most straightforward solution to this problem is to enumerate all the paths and stopping once our path reaches K nodes. We can implement it with a breadth first search, like so:

ans = 0
queue = [(A, 0)] # We track (length of walk, current node)
while not queue.empty():
    curNode, curLength = queue.front()
    if curLength == K:
        if curNode == B:
            ans += 1
    for neighbor in neighbors(curNode):
        queue.push((neighbor, curLength + 1))

However, this solution is exponential. Take this simple graph

G 0 0 0->0 1 1 0->1 1->0 1->1

Let's count the number of paths of length K from node 0 to node 1. We see that any sequence of 0s and 1s that ends at node 1 is a valid path, implying that there are 2K2^K valid paths. As this solution counts each valid path separately, it must be doing exponential work as well.

An interviewer (like that Googler) wouldn't be too impressed.

Dynamic Programming (E5000,K5000E \leq 5000, K \leq 5000)

Looking at the above solution, we notice that there is a lot of wasted work - our queue will often visit the same state many times. For example, we'll visit node B with a path of length K as many times as our answer. Noticing that the same state is visited multiple times naturally leads to a dynamic programming solution.

In this case, we choose the same state as we did in our above problem: (node, length). This time, however, we consolidate all of our redundant states.

Thus, dp[node][length] = sum(dp[neighbors(node)][length-1]).

dp[A][0] = 1
for length in 0..K:
    for node in 0..N:
        for neighbor in neighbors(node):
            dp[node][length] += dp[neighbor][length-1]

We can either do this iteratively (which allows us to reduce memory complexity by only storing one layer of DP at a time), or recursively with memoization. I've presented the iterative solution above.

The complexity of this solution is O(EK)O(EK). So far, this seems like a standard DP problem. Most interviewers would probably be satisfied with this solution.

We aren't. :^)

A Neat Trick With Adjacency Matrices (V500,K1 billionV \leq 500, K \leq 1 \text{ billion})

If there aren't many nodes but KK is extremely large, then we need to give up on the above DP approach. The naive approach above finds the answer for "how many paths of length K1K-1" before it finds the answer to "how many paths of length KK". As a result, even if we could find the answer for KK from the answer for K1K-1 in constant time, we wouldn't be able to solve this problem with the above constraints.

There's 2 different ways to proceed. The first is to note that we don't actually need the answer for K1K-1 before we can find the answer for KK. For example, if we know that that there are 3 paths of length 50 from A to C and 4 paths of length 50 from C to B, then there are 343 \cdot 4 paths of length 100 from A to B with C at the midpoint. More generally, consider any node C. The number of paths from A to B of length KK that include C at the midpoint is the number of paths from A to C of half length multiplied by the number of paths from C to B of half length. If we sum over all possible nodes for C, then we have our answer for KK.

G 12 paths from A to B of length 100 if each edge is of length 50 A A C C A->C A->C A->C B B C->B C->B C->B C->B

This allows us to remove our linear dependence on KK and transforms it into a logarithmic one, for a V3logKV^3 \log K algorithm.

Using graph theory, however, there's an even easier way to come up with this algorithm. One way to represent graphs is as an adjacency matrix AA. If one views the values in AijA_{ij} as the number of edges between ii and jj, then AijkA_{ij}^k represents the number of paths between ii and jj of length kk.

Thus, this problem has been reduced to computing AkA^k (i.e: matrix power). This is a standard problem that can be done in V3logKV^3 \log K time through binary exponentiation.

The two approaches outlined above end up being identical, but use radically different approaches. The difficulty of the first approach lies in thinking about first principles, while the difficulty of the second approach lies in abstracting the problem to matrices. However, once you've abstracted the problem appropriately, the solution becomes obvious.

Going Even Deeper... (V10000,E10000,K1 billionV \leq 10000, E \leq 10000, K \leq 1 \text{ billion})

For the vast majority of people (including the aforementioned HN commenter), their ability to solve this problem stops here. And thus far, I've covered nothing that existing articles haven't already done (see this or this).

At this point, unfortunately, it's not clear how to proceed with standard approaches. We've already used dynamic programming to reduce unneeded work, and then we even reduced the problem to matrix exponentiation. But how can we perform matrix exponentiation even faster?

One approach when we're stuck is to convert our problem into another abstraction. In this case, a concept that's closely related to matrix exponentiation is linear recurrences.

Linear Recurrences

A linear recurrence is a recurrence like: ak=3ak1+2ak2ak3a_k = 3a_{k-1} + 2a_{k-2} - a_{k-3}. One famous example of a linear recurrence is the Fibonacci series (itself an inexplicably popular interview problem)[2]. The linear recurrence for Fibonacci can be written as Ak=Ak1+Ak2A_k = A_{k-1} + A_{k-2}. The order of a linear recurrence is the number of terms it depends on. So, the first example would have order 3, and Fibonacci would have order 2.

How are linear recurrences tied to matrix exponentiation? Well, you might know that finding the KK-th Fibonacci number can be done in log(K)\log(K) time using matrix exponentiation. In fact, this is a special case of the more general algorithm to find the KK-th term of any order NN linear recurrence using matrix exponentiation in O(N3logK)O(N^3 \log K) time. This is a good resource if you're unfamiliar.

So, we can convert a linear recurrence problem to a matrix exponentiation problem. Clearly, there is some connection here. Sadly, as we currently have a matrix exponentiation problem, this doesn't immediately help. Could we be lucky enough to have a way to turn a matrix exponentiation problem into a linear recurrence problem?

Enter Cayley Hamilton

Wikipedia writes that

If A is a given n×n matrix and InI_n is the n×n identity matrix, then the characteristic polynomial of A is defined as
p(λ)=det(λInA) {\displaystyle p(\lambda )=\det(\lambda I_{n}-A)~}

Not immediately helpful (to me at least). However, several lines down we see that

The [Cayley Hamilton] theorem allows A^n to be expressed as a linear combination of the lower matrix powers of A.

In other words, we know that this equation holds true for some values of xix_i.

An=x0I+x1A+x2A2...A^n = x_0I + x_1A + x_2A^2 ...

In other words, we are guaranteed that the powers of AA form a linear recurrence! This is not obvious at all, but it does highlight some of the powers of math. Good job, Cayley Hamilton.[3]

Now that we know that matrix exponentiation problems can be converted to a linear recurrence problem, this doesn't help us unless we can calculate the k-th term of a linear recurrence faster than we compute the k-th power of a matrix. So... can we?

As you may have inferred, yes! But to do so, we must first take a small detour into polynomials and generating functions.

Polynomials and Generating Functions

Let's define a (kinda weird) function GG, which takes in a polynomial and replaces xkx^k with the kk-th term in our linear recurrence. More formally, given a polynomial f(x)=cixif(x) = \sum c_i x^i, G(f)G(f) returns ciai\sum c_i a_i.

So, for Fibonacci, G(x0)=1G(x^0)=1, G(x1)=1G(x^1) = 1, G(x2)=2G(x^2) = 2, G(x3)=3G(x^3)=3, G(x4)=5G(x^4)=5, G(x5)=8G(x^5)=8, and G(xk)=kG(x^k) = k-th Fibonacci element (ie: AkA_k). G(x2+x3)=A2+A3=5G(x^2 + x^3) = A_2 + A_3 = 5. Finally, GG is also a linear function, which means that G(f+g)=G(f)+G(g)G(f+g) = G(f) + G(g).

Some more examples:
G(x(x2+2x3))=G(x3+2x4)=A3+2A4=3+25=13G(x(x^2 + 2x^3)) = G(x^3 + 2x^4) = A_3 + 2A_4 = 3 + 2\cdot 5 = 13
G(x20+3)=A20+3A0=6765+3=6768G(x^{20} + 3) = A_{20} + 3A_0 = 6765 + 3 = 6768

If someone gave us a magical black box to evaluate GG, we could simply evaluate G(xk)G(x^k) and get our answer! Unfortunately, no such box exists (I wish). If kk was small enough, we could compute the terms up to kk ourselves. But needing to computing the terms up to kk ourselves puts us back where we started.

Another way we could evaluate G(xk)G(x^k) is to find a polynomial equivalent to G(xk)G(x^k) and evaluate that instead. And if this polynomial had low degree, then evaluating this function would be easy.

For example, this is one way to find an equivalent polynomial of lower degree:
G(x5)=G(x3+x4)=G(x3+(x2+x3))=G(2x2+3x3)=2F2+3F3=8G(x^5) = G(x^3 + x^4) = G(x^3 + (x^2 + x^3)) = G(2x^2 + 3x^3) = 2F_2 + 3F_3 = 8

Note that despite the fact that G(2x2+3x3)G(2x^2 + 3x^3) represents G(x5)G(x^5), we only needed to know Fibonacci values up to G(x3)G(x^3).

Thus, if we could easily compute a polynomial hh with low degree such that G(xk)=G(h)G(x^k) = G(h), we would be done! In order to do so, however, we need one final detour into the ominously named "annihilator" polynomials.


An annihilator is a non-zero polynomial ff such that G(f)=0G(f) = 0. On Fibonacci, for example, examples of annihilators would be x3x2x1x^3 - x^2 - x^1 or x6x5x4x^6 - x^5 - x^4. Remember that GG turns polynomial terms into Fibonacci terms. So, plugging in the last annihilator into GG provides us G(x6x5x4)    F6F5F4=0    F6=F4+F5G(x^6 - x^5 - x^4) \implies F_6 - F_5 - F_4 = 0 \implies F_6 = F_4 + F_5. In other words, the 6th Fibonacci term is equal to the 5th and 4th Fibonacci terms added together.

Note that the last statement is clearly true. After all, that's the definition of Fibonacci.

This observation leads to an easy way of generating annihilators: We just use the definition of our linear recurrence! If the n-th term of our linear recurrence is some combination of the previous terms, then the n-th term minus those previous terms is equal to 00.

For illustration, let's take the linear recurrence an=an12an2+3an3a_n = a_{n-1} - 2a_{n-2} + 3a_{n-3}. Specifically, a3=a22a1+3a0a_3 = a_2 - 2a_1 + 3a_0. This implies that a3a2+2a13a0=0a_3 - a_2 + 2a_1 - 3a_0 = 0. Thus, one annihilator is x3x2+2x3x^3 - x^2 + 2x - 3. We can repeat this process with a4a_4 to get the annihilator x4x3+2x23xx^4 - x^3 + 2x^2 - 3x or a100a_{100} to get the annihilator x100x99+2x983x97x^{100} - x^{99} + 2x^{98} - 3x^{97}. Note that we can also generate these other annihilators by multiplying the annihilator for a3a_3 by xnx^n.

Since G(fx)=G(fx2)=G(fx3)=...=0G(fx) = G(fx^2) = G(fx^3) = ...= 0, this means that G(fg)=0G(fg) = 0, where gg can be any polynomial. For example, G(f(x3+x7))=G(fx3+fx7)=G(fx3)+G(fx7)=0G(f(x^3 + x^7)) = G(fx^3 + fx^7) = G(fx^3) + G(fx^7) = 0.

Computing the K-th term of a linear recurrence with polynomials

Now, we know how to represent linear recurrences with generating functions, and we even know what an annihilator is. But how does that allow us to do anything useful?

Let's take one final digression into normal integer arithmetic. For any integer aa and bb where a<ba < b, we can write bb as da+(bmoda)d\cdot a + (b \mod a) for some integer dd. For example, for a=7a=7 and b=30b=30, we can write 30=47+(30mod7)30 = 4\cdot 7 + (30 \mod 7).

We can apply a similar concept to polynomials [4]. For any polynomial aa and bb where aa has lower degree than bb, we can write bb as da+(bmoda)d\cdot a + (b\mod a), where dd is some polynomial.

Now, this is where it all comes together. Let's plug in a=fa = f (our annihilator) and b=xkb = x^k (the term we're looking for).

xk=df+(xkmodf)x^k = d\cdot f + (x^k \mod f)

We don't know what dd is, but remember that ff is an annihilator. Thus, no matter what dd is, G(df)=0G(d\cdot f) = 0 and we can ignore it! This also implies that G(xk)=G(xkmodf)G(x^k) = G(x^k \mod f), and since xkmodfx^k \mod f is a polynomial with low degree, we can evaluate G(xkmodf)G(x^k \mod f) easily!

Calculating xkmodfx^k \mod f can be done with logk\log k polynomial multiplication and polynomial modulo. Using the Fast Fourier Transform, we can do each of these in nlognn \log n time.[5]

Now that we've computed xkmodfx^k \mod f, which has low degree, we can easily evaluate G(xkmodf)G(x^k \mod f).

Thus, we can find the KK-th term of an order NN linear recurrence in NlogNlogKN \log N \log K time.

Recovering the Linear Recurrence with Berlekamp-Massey

As this point, an astute reader might point out that although, due to Cayley Hamilton, we know that a linear recurrence exists that corresponds to our problems, and that, due to polynomials and annihilators, we know how to compute the K-th term of a linear recurrence quickly, we don't know how to actually recover the sequence. Enter Berlekamp-Massey.

Wikipedia states that:

The Berlekamp–Massey algorithm is an algorithm that will find the shortest linear feedback shift register (LFSR) for a given binary output sequence.

I couldn't tell you what a Linear Feedback Shift Register is, but I can tell you what Berlekamp Massey does. Given a sequence of terms, it finds the shortest linear recurrence that fits all of the terms. If you pass in 2L2L terms, it's guaranteed that the shortest linear recurrence is of order L\leq L. For example, if you pass 1,1,2,31, 1, 2, 3 it will provide you the Fibonacci sequence. And it'll do it in N2N^2 time! It's basically magic.

There's only one last wrinkle - Berlekamp Massey doesn't operate on matrices. However, note that Cayley-Hamilton doesn't merely imply that the matrices satisfy some linear recurrence, it also implies that each individual element satisfies a linear recurrence. [6] For example, A0,0n=x0+x1A0,0+x2A0,02...A^n_{0,0} = x_0 + x_1A_{0,0} + x_2A^2_{0,0}...

Thus, if we're trying to count the number of paths from node A to node B, we can simply pass Ia,b,Aa,b,Aa,b2...I_{a,b}, A_{a,b},A^2_{a,b}... to Berlekamp-Massey. As our linear recurrence is order NN, we need to pass in the first 2N2N terms of AiA^i.

As our DP approach lets us find all the values of AiA^i up to AkA^k in EKEK time, this allows us to compute all of the necessary terms in EVEV time and the corresponding linear recurrence in V2V^2 time.

And that's it! Finally, we can compute the number of paths of length K from node A to node B!

Pseudocode for this entire process.

Compute dp matrix
initial_terms = [dp[i][B] for i in 0..2*N]
linear_recurrence = berlekamp_massey(initial_terms)
ans = get_kth_term(linear_recurrence, initial_terms, K)

Code for this last approach can be found here. The code templates are taken from the amazing ICPC template library KACTL (which deserves a whole separate article).

Concluding Thoughts

Hope you found this a neat rabbit hole to dive into! This problem is one that many may have encountered, whether in class or interviews. Although many of you may have written it off as a basic DP problem (or perhaps after encountering the adjacency matrix interpretation), there's actually a lot more depth to this problem than you may have expected.

Besides the problem-specific lessons, this problem also illustrates a lesson on how math can become quite helpful even on problems where math didn't originally seem relevant. At the beginning, this problem seemed like your typical interview fare (zzz), but as we kept pushing the limits of this problem we ended up needing more and more math. In the end, we needed to start citing cutting-edge research papers (See Further Extensions).


I'd like to thank the competitive programming discord AC, and in particular pajenegod, who both came up with this technique and explained it to me! I'd also like to thank the competitive programming community on Codeforces.

Further Extensions

Q: What if you wanted all the terms of AkA^k, instead of just one?

Our approach currently only provides us one of the N2N^2 terms. Naively, we could try using the linear recurrence we found for a single element and use it for all of the other terms. Unfortunately, this doesn't work. The most obvious counterexample is when the graph is disconnected. Obviously, the paths in one component provides no information about paths in a different connected component. However, even if we restrict ourselves to the case where the graph is connected, it's not guaranteed that a linear recurrence found for one element in AA will generalize to the whole matrix. For example, take this undirected graph

%0 0 0 1 1 0--1 2 2 0--2

The number of paths from 0->1 follow the sequence 0,1,0,2,0,4..., a linear recurrence of order 2, while the number of paths from 2 -> 1 follow the sequence 0,0,1,0,2,0,4..., a linear recurrence of order 3.

One intuition for why this doesn't work is that information about a single node isn't enough for the whole graph. Information might not reach that node or it might be cancelled out. One thing we can try is to find the linear recurrence of aA0,0n+bA1,1n=(aI0,0+bI1,1)+(aA0,0+bA1,1)...a\cdot A^n_{0,0} + b\cdot A^n_{1,1} = (aI_{0,0} + bI_{1,1}) + (aA_{0,0} + bA_{1,1}).... In other words, we take a random linear combination of the elements of our matrix, and hope that this allows us to recover the linear recurrence for the whole matrix. But how well does this work?

As it turns out, the procedure we use here is actually very similar to something called the Coppersmith algorithm. Corollary 22 of this paper states that the procedure outlined above succeeds with probability (11/q)2n(1-1/q)^{2n}, where qq is the cardinality of our finite field (how big our mod is). Thus, with high probability our method will recover the linear recurrence for the whole graph!

Thus, we can actually use this method to find mm different values of AkA^k (assuming that qnq \gg n) in V3+V^3 +m \cdot V \log V \log K$.

Q: Why don't we just diagonalize the matrix and compute AkA^k that way?

If the matrix was diagonalizable (e.g: undirected graph), and we weren't computing over a finite field, we could - if we didn't care about floating point precision. We can't solve this problem by diagonalizing your matrix over a finite field either. Symmetric matrices are not guaranteed to be diagonalizable over finite fields.

For directed graphs, we could also compute the Jordan Normal Form, but we run into the same issues.

  1. Some computer scientists will tell you that the correct terminology should be walk, but I suspect most programmers are more familiar with the term "path". Solving this problem for actual (simple) paths is NP-Complete, as setting K=NK=N reduces this problem to Hamiltonian Path. ↩︎

  2. My suspicion is that programmers love to pretend they're mathematicians. ↩︎

  3. I suppose this is the right place to confess that when Cayley Hamilton was first presented in my math classes, I did not understand my textbook's enthusiasm about the beauty of the Cayley Hamilton theorem. However, I found this application much more convincing. ↩︎

  4. This is a general property of Euclidean rings. ↩︎

  5. If you don't understand how to use FFT to compute polynomial multiplication in NlogNN \log N time, I highly recommend learning about it here. Not only is it a very cool concept that'll teach you things about polynomials and complex numbers, it will also teach you about the difficulty of writing efficient array code when your naive FFT runs 3 orders of magnitude slower than a good one, despite having the same complexity. ↩︎

  6. Two matrices are equal iff each element is equal. For example, if you have A=B+CA = B + C, then you know that A0,0=B0,0+C0,0A_{0,0} = B_{0,0} + C_{0,0}. ↩︎