For every quantum walk there is a (classical) lifted Markov chain with faster mixing time

PDF

Quantum walks on graphs have been shown in certain cases to mix quadratically faster than their classical counterparts. Lifted Markov chains, consisting of a Markov chain on an extended state space which is projected back down to the original state space, also show considerable speedups in mixing time. Here, we construct a lifted Markov chain on a graph with $n^2 D(G)$ vertices that mixes exactly to the average mixing distribution of a quantum walk on the graph $G$ with $n$ vertices, where $D(G)$ is the diameter of $G$. Moreover, the mixing time of this chain is $D(G)$ timesteps, and we prove that computing the transition probabilities for the lifted chain takes time polynomial in $n$. As an immediate consequence, for every quantum walk there is a lifted Markov chain with a faster mixing time that is polynomial-time computable, as the quantum mixing time is trivially lower bounded by the graph diameter. The result is based on a lifting presented by Apers, Ticozzi and Sarlette (arXiv:1705.08253).
Submitted 6 Dec 2017 to Quantum Physics [quant-ph]
Published 7 Dec 2017
Updated 21 Mar 2018
Author comments: 19 pages, 4 figures. Significant update from v1 -- smaller lifting and complexity considerations
http://arxiv.org/abs/1712.02318
http://arxiv.org/pdf/1712.02318.pdf

6 comments

Simone Severini Dec 07 2017 02:51 UTC (1 points)

Closely related to

Simon Apers, Alain Sarlette, Francesco Ticozzi, Simulation of Quantum Walks and Fast Mixing with Classical Processes, https://scirate.com/arxiv/1712.01609

In my opinion, lifting is a good opportunity to put on a rigorous footing the relationship between classical and quantum discrete dynamics on graphs.

Simon Apers Dec 09 2017 07:54 UTC

Thanks for the comment, Simone. A couple of observations:

- We noticed that Danial's result can in fact be proved more directly using the theorem that is used from ([arXiv:1705.08253][1]): by choosing the quantum walk Cesaro average as the goal distribution, it can be attained with a lifted Markov chain in a number of steps equal to the diameter.
- In the more recent arxiv paper ([arXiv:1712.01609][2]) that you mention (remarkable that it came out a day before without coordination!), we strengthen theorem 2 of ([arXiv:1705.08253][1]), the same that is used by Danial, to show that a lifted Markov chain can simulate a quantum walk for an arbitrary initial state, as opposed to a fixed one, and mix as fast for arbitrary/unbounded time intervals. This in turn leads to a new conductance-type bound for quantum walks, and in fact any local stochastic process.

- Also, we provide an efficient construction of the stochastic bridges using a hidden variables result by Aaronson, answering the question that Danial raises towards the end of his paper.

Let us keep in touch about this common topic :-)

[1]: https://arxiv.org/abs/1705.08253
[2]: https://arxiv.org/abs/1712.01609
[3]: https://arxiv.org/abs/1705.08253

Edited Dec 09 2017 07:58 UTC by Simon Apers

Danial Dervovic in reply to Simon Apers Dec 10 2017 15:25 UTC

Thank you for the insightful observations, Simon.

In response to the first point, there is a very short comment in the Discussion section to this effect. I felt an explicit dependence on $T$ as opposed to the diameter would make the implications of the result more clear. Namely, lifting can mix precisely in the same time as quantum walks.

Points 2 and 3 are very exciting results! I look forward to digesting them in more detail over the coming days, in particular the efficient construction of the stochastic bridge.

In any case, further discussion on this topic would be great :-)

Māris Ozols Feb 01 2018 17:53 UTC (1 points)

This paper considers the problem of using "lifted" Markov chains to simulate the mixing of coined quantum walks. The Markov chain has to approximately (in the total variational distance) sample from the distribution obtained by running the quantum walk for a randomly chosen time $t \in [0,T]$ followed by a standard basis measurement, where $T$ is sufficiently large for the quantum walk to mix well ($T$ should be at least the diameter of the underlying graph). The Markov chain is allowed to have more states than the quantum walk (in fact, the number of states can even scale in $T$). The state space of such *lifted* Markov chain is simply a product of the original state space of the quantum walk and an additional work space register which is discarded at the end of the walk.

The main result is a construction of a lifted Markov chain with $n^2 T^3$ states that can simulate an $n$-state quantum walk for time $T$. The main technical ingredient in the proof is a certain lifting $c_\gamma$ (see Def 2) that produces a Markov chain (similar to one used in the Metropolis algorithm) whose stationary distribution approximates well the distribution sampled by the quantum walk. It is not clear how long it takes for this Markov chain to mix, so another lifting is applied from [ATS17] which speeds up the mixing to be at most the diameter of the underlying graph.

I think the result is conceptually interesting because it shows that classical Markov chains can in some sense reproduce the statistics of quantum walks. However, there are two downsides. First, the size of the classical Markov chain scales in the running time $T$ of the quantum walk. Second, this is not a clear-cut simulation result because it only shows the *existence* of a classical Markov chain but it does not tell how hard it is to construct and implement the chain (say, to compute the actual transition probabilities). My understanding is that an infinite amount of pre-computation is needed to produce the desired Markov chain in the first place.

These caveats make the result less surprising to me. They also make me wonder about the optimality of the construction and whether the same result can be obtained by simpler means. I have two questions on this:

**Question 1**

Is there any lower bound on the minimal possible number of states for a lifted Markov chain with the desired properties? E.g., can it be shown that removing the $T$ dependence is impossible? This would give a better idea on where this result stands in the spectrum of what is possible. Also, I wonder if Theorem 1 can be improved in the sense of making the number of states smaller than $n^2 T^3$? E.g., can one design a single lifting from scratch rather than obtain it by composing two other liftings?

**Question 2**

Your construction is somewhat complicated. I wonder what happens if you instead try to prove the result along the following lines (this is inspired by how one shows that probabilistic as well as quantum finite state automata can be simulated by deterministic ones). Let the lifted Markov chain act on $H = C \times V(G) \times A$ where $C$ is the coin register and $A$ is a discretized set of amplitudes, i.e., complex numbers of absolute value at most 1 (you simply store the real and imaginary parts up to $k$ bits of precision). If $U$ denotes one step of the quantum walk, you can approximate its action by a deterministic map on $H$. Besides, finding this approximation is computationally cheap (you basically just need to round the matrix entries of $U$). If this approximation incurs error $\delta$ then $U^T$ will incur error $T \delta$, so you take $\delta = O(1/T)$. In other words, you choose the bit precision $k$ so that it scales in $T$. The size of the lifted chain then will also be some function of $n$ and $T$. I wonder what scaling does this simple construction gives and how does it compare to your $n^2 T^3$.

One issue with this approach though is that I'm not sure how the initial probability distribution should be chosen because this is more like a deterministic simulation (or "strong simulation" in the language of quantum circuits) where one actually computes the output probabilities rather than samples from the same distribution as the quantum walk.

----------

##Comments##

**From $j$ to $i$**

According to your conventions, $P$ is column-stochastic, probability distributions are updated according to Eq. (3), and $P_{ij}$ is the probability of transition from $j$ to $i$, see Eq. (1). This means you should always write $(j,i) \in E$ instead of $(i,j) \in E$ (in particular, you should change this at the bottom of page 3 and in Eq. (2)). Also, at the bottom of page 3 it should be "transition probability from vertex $j$ to vertex $i$".

**Definition 1**

Condition 1 is not a property of a lifted Markov chain but rather part of the setup. I would suggest adding "Let $G^c$ be another graph and $c$ a homomorphism from $G^c$ to $G$." as the second sentence and removing condition 1. Otherwise it is not clear what $c$ and $G^c$ stand for in the next sentence.

Condition 2 is not correct as written because $P$ is not a square matrix acting on $V(G)$. Instead it is a map from $V(G^c)$ to $V(G)$. I believe, what you mean is

$$P C = C P^c.$$

Or, if you draw it as a diagram with maps $P^c$ and $P$ acting horizontally and $C$ vertically, using the category theory jargon one could say that "the diagram commutes".

You should not say "*the* lifted Markov chain" because it is not unique (as you later say, there are many ways of lifting a Markov chain, even when both graphs and the homomorphism are fixed). Just say "a lifted Markov chain". Btw, I've also seen people using the term "coarse-grained Markov chain" when referring to the original chain with respect to the lifted one.

**Section 2.4: Mixing Times**

You seem to be mixing up two claims:

1. $P \pi = \pi$ and
2. $\lim_{t \to \infty} P^t p^{(0)} = \pi$.

It is the second claim that is known as the convergence theorem, not the first. The first one is a consequence of the Perron-Frobenius theorem applied to ergodic Markov chains (ergodic = irreducible + aperiodic).

You later say that $\pi$ is independent of $p^{(0)}$, but this is self-evident since you defined $\pi$ as an eigenvalue-1 eigenvector of $P$ (claim 1 above) and this clearly has nothing to do with $p^{(0)}$. To clear this up, I suggest you also include claim 2 and refer to appropriate claims afterwards (including the proof of Lemma 3).

**Inequality $\mathcal{M}^c \leq \mathcal{M}$**

You don't specify the set $\mathcal{P}^c$ of allowed starting distributions of the lifted chain in Eq. (12). I assume $\mathcal{P}$ and $\mathcal{P}^c$ simply consist of all standard basis vectors. In particular, $\mathcal{P}$ should not be artificially restricted compared to $\mathcal{P}^c$ otherwise you won't get the inequality.

I really don't like the term "marginal mixing time", especially when you say that the chain "marginally mixes" (these terms are used throughout the paper). This is very confusing because "marginally" also means "a little bit". Can you say instead "mixing time of the marginal" and "the marginal of the chain mixes" or something like that? You can also refer to the "marginal chain" as "projection" or "coarse-graining".

Is the mixing time inequality $\mathcal{M}^c \leq \mathcal{M}$ supposed to be obvious? It is not obvious to me because of the max and min in the definition. In fact, to me both quantities seem equal because

$$C (P^c)^t p^{c(0)} = P^t C p^{c(0)} = P^t p^{(0)}$$

and if you optimize in both cases only over starting distributions that are standard basis vectors, you should end up with the same quantity. Am I missing something?

**Theorem 1**

In the theorem statement I'm not sure you need to go into details about what the inner workings of the lifting are. Writing $c_\gamma \circ d_\delta$ makes one only wonder what these are and how does $\gamma$ and $\delta$ relate to $\epsilon$. You should either spell these things out or just denote the composed lifting by some other letter and thus simplify the statement.

**Minor things**

- page 2, Preliminaries: Add something like "For a set $S$, we will write $2^S$ to denote the set of all subsets of $S$." You're using this notation on page 4 without explanation.
- page 5, below Eq. (6): The shift operator $S$ acts on vertices, not edges, so it is better to say "$j$th neighbor" instead of "$j$th arc". Also, coined quantum walks and the shift operator have been considered much before [GZ17].
- page 5, footnote: Capitalize the first word.
- page 5, below Eq. (8): You can add somewhere "in the worst case" to account for the maximization over the input distribution.
- page 7, top: Is it obvious that the minimizing set should always be small?
- page 7, below Observation 1: It would be better to say "Markov chain $M_G^c$ lifted from $M_G$" otherwise it seems that the lifted chain acts on $M_G$. The same applies to Corollary 1 on page 19.
- page 7, Eq. (13): Can you write $\Phi(P)$ to make it clear which chain's conductance you're talking about. Same applies to the equation in the following sentence.
- page 8, Section 3: Strictly speaking here [n] stands for {0,...,n-1} because you work mod n.
- page 8, Section 3: "mixing time of this Markov chain is [given by a] quadratic in $n$".
- page 8, bottom: it should be $j$ instead of $k$ in the edge set definition.
- page 9, Fig 1: The +/- signs should come first in each tuple.
- page 12, Eq. (23): You use the same notation for the all-ones vector and an indicator function. A Kronecker delta might be more appropriate for an indicator function.
- page 12, Eq. (25): For completeness, you can add that $\pi^{c_\gamma}$ denotes the stationary distribution of $P^{c_\gamma}$.
- page 13, Fig 2: The transition probabilities between $x$ and $y$ are missing the 1 in the min, i.e., it should be $\min$ {..., 1}.
- page 14, Lemma 2: "$\pi$ is a stationary *distribution*".
- page 17, Section 4.6: The order of the two liftings can be easily confused. You say that you apply the $d_\delta$ lifting to the $c_\gamma$ lifting, whereas you denote the composed lifting by $c_\gamma \circ d_\delta$. After being confused for a while I realized that homomorphisms go in the reverse direction and thus compose in the opposite order. It would be helpful if you could include a remark clarifying this.
- page 17, bottom: that that
- page 19, top: Your induced $\ell_1$ matrix norm can be confused with Schatten 1-norm.
- page 19, Eq. (30): Use $\texttt{\cdot}$ instead of colon.
- page 19, above Cor 1: $D(G^{c_\gamma})$ is not a mixing time but a lower bound on it.
- page 19, Section 5: Write "$\texttt{i.e.\ bla}$" to get the right spacing after the period.
- page 19, Section 5: You say "polynomially larger", which implicitly assumes $T$ is polynomial in $n$. I'm not sure if this is generally the case.
- page 19, Section 5: You say "arbitrary quantum walk" while your paper considers only coined walks. Not all quantum walks have a coin (e.g., see Szegedy's walk which takes place on the edges of the graph). Is it important that your walk is coined or can your construction be applied to more general quantum walks (i.e., any unitary operator $U$ that somehow respects the locality of the graph $G$)?
- page 20, [AST17]: Should this reference be updated to arXiv:1712.01609 ?

Danial Dervovic Feb 05 2018 15:03 UTC

Thank you Māris for the extremely well thought-out and articulated points here.

I think this very clearly highlights the need to think explicitly about the precompute time if using the lifting to directly simulate the quantum walk, amongst other things.

I wish to give a well-considered response so I will comment again once I have collected my thoughts more clearly.

Danial Dervovic Mar 01 2018 12:08 UTC

Hello again Māris, many thanks for your patience. Your comments and questions have given me much food for thought, and scope for an amended version of the paper -- please see my responses below.

Please if any of the authors of [AST17 [arXiv:1712.01609](https://arxiv.org/abs/1712.01609)] have any further insight (@Simon?) I would be most grateful.

# Summary

I will briefly summarise the main points of reply here, going into more detail below.

**Question 1.** We can make a lifting with only $n^2 D(G)$ states by applying the $d_\delta$-lifting to $G$ with the quantum average mixing distribution as the imposed stationary distribution, as per Simon's earlier comment. This removes the $T$ factor in the number of states.
In fact, we can use the *average mixing matrix* of Godsil for discrete-time walks (see [[arXiv:1701.04474]](https://arxiv.org/pdf/1701.04474.pdf)) in this way to sample directly from the infinite time average mixing distribution, as opposed to a chosen finite $T$.

In terms of a lower bound on the number of states required, for a particular $G$ under the $d_\delta$-lifting one can prune some vertices from the lifted state space (Intuitively, I feel you can get one of the $n$ factors down to $\log n$ -- I can't prove this though). I can't immediately see how to go about proving a general lower bound.

**Question 2.** By my calculation the construction you describe has $O(n^2 T)$ states, (asymptotically) fewer than the lifted Markov chain in the paper, but (assuming $T \geq D(G)$ as in the paper) more than the lifting described above. As you say, the scheme described accomplishes a different task than the lifted chain, i.e. computes the transition probabilities (strong simulation) as opposed to sampling from the distribution. One could use Metropolis sampling here but then we run into the issue that the $d_\delta$-lifting was originally brought in to solve.

**Complexity of computing transition probabilities.** As opposed to the statement in the paper saying that computing the probabilities is intractable, it is polynomial by my calculations -- the runtime of computing the transition probabilities of a general $d_\delta$-lifting is $O(n^4 D(G))$, applying this to a coined walk using the $(c_\gamma \circ d_\delta)$-lifting I get $O(n^6 + n^4 T^5)$. This runtime can even be improved to $O((nd)^4)$ for a coined walk on a $d$-regular graph, by not using a composition of liftings and going direct to the $d_\delta$-lifting with the infinite-time quantum average mixing distribution, as described above.

**General quantum walks.** This simulation method will work for *any* unitary $U$ that is repeatedly applied and respects the structure of the graph $G$, i.e., a *general quantum walk* as defined at the end of section 2.3 -- computing the transition probabilities then requires $O(n^8)$ time.

----

## Question 1.

I have said as much as I can already about the lower bound. For the complexity of computing the transition probabilities, see below.

------

## Question 2

If I understand correctly, the register $A$ will require $\Theta(2^k)$ states, since we are storing all complex amplitudes with absolute value at most $1$ to $k$ bits of precision as an individual state. We need $k = \Theta(\log \delta)$ bit precision for the computation to be accurate to error $\delta$. With $\delta = O(1/T)$ this gives us that the register $A$ is of size $O(T)$ and the state space $H$ has $O(n^2 T)$ states (if we use $d \leq n$ for the size of register $C$).

That means that the described construction has asymptotically fewer states than the construction described in the paper. If we assume the deterministic map on $H$ is efficient we have a potentially more efficient method of computing the average mixing distribution. We also can do this by direct diagonalisation using the average mixing matrix in time $O((nd)^4) = O(n^8)$ (see [[arXiv:1701.04474](https://arxiv.org/pdf/1701.04474.pdf), Section 7]). In either case we have a scenario where we don't have a sampling method, but can compute probabilities -- a 'strong simulation' in the language of quantum circuits -- this still performs a qualitatively different task to what we are trying to do.

----

## Complexity of computing transition probabilities.

We can break this computation down into a number of stages:

1. Compute the stationary distribution probabilities for the $c_\gamma$-lifting, $[\pi^{c_\gamma}]_{x}$ for $x \in V(G \boxtimes P_T)$.

2. Using the stationary distribution probabilities, compute the $c_\gamma$-lifting transition probabilities $[P^{c_\gamma}]_{{y, x}}$ for $x, y \in V(G \boxtimes P_T)$.

3. For every vertex $x \in V(G \boxtimes P_T)$, produce the $d_\delta$-lifting transition probabilities. This amounts to computing the elements of the stochastic bridge matrices $\{P^{c_\gamma}(1), P^{c_\gamma}(2), \ldots, P^{c_\gamma}(T)\}$.
The stochastic bridge must satisfy
$P^{c_{\gamma}}(T) P^{c_{\gamma}}(T−1) \, \cdots \, P^{c_{\gamma}}(2) P^{c_{\gamma}}(1) e_x = \tilde{\pi}^{c_{\gamma}},$

where $\tilde{\pi}^{c_{\gamma}}$ is $\delta$-close to $\pi^{c_{\gamma}}$ in $\ell_1$-distance.

### Stochastic Bridge

Let us first try and come up with the explicit computation of the stochastic bridge applied to a general Markov chain on a graph $G$. We turn to Lemma 5 in [AST17 [arXiv:1712.01609](https://arxiv.org/abs/1712.01609)] where they prove existence of such a bridge, taking inspiration from Aaronson [[10.1103/PhysRevA.71.032325](https://journals.aps.org/pra/abstract/10.1103/PhysRevA.71.032325)]. Their proof involves showing that the transition probability matrix $P(t)$ between 'times' $t$ and $t+1$ are given by the solution of a maximum flow problem. One then solves this maximum flow problem for each $t \in (0,1,\ldots, D(G)-1)$ to obtain the transition probabilities $P(1), P(2), \ldots P(D(G))$.

This proof is lacking one ingredient to be completely constructive, a 'schedule' of flows for the edges adjacent to the source and sink vertices at a given pair of times $(t,t+1)$. To use notation of [Lemma 5, AST17], we need to set a schedule of $p_t$, i.e. $\{ p_0^{(i)}, p_1^{(i)}, \ldots, p_{D(G)}^{(i)} \}$
for a given vertex $i$.

For a given vertex $i$, we can find the stochastic bridge taking $p_0 = e_i$ to $p_{D(G)} = \pi$ as follows:

Find a spanning tree $T_i$ of $G$ rooted at vertex $i$, using breadth-first search. We shall now modify $T_i$ using the following procedure: walk through $T_i$ using breadth-first search. Every time you reach any vertex $j$ that has already been visited, append a new child vertex also labelled $j$. We call this modified graph $T'_{i}$ ; it is still a tree. Moreover, we denote the $t^{\text{th}}$ *level* of $T_i'$, $\ell_t(T_i')$, the set of vertices (in $V(G)$) at distance $t$ from the root $i$ in $T_i'$.

As an example, consider the graph $G$ below, with the spanning tree $T_1$ (rooted at vertex 1) and the related tree $T'_{1}$.

![Example G, T_i, Tprime_i](https://gist.githubusercontent.com/ddervs/41f89b924ecbc52b75a088fe3a2bc22e/raw/d165a3821e57f930ae5c9a986e139e4078e4f290/spanning_trees.png)

Also, let $\mathcal{D}(j, T)$ be the set of descendent leaves of the vertex $j$ in a tree $T$.
We then set our 'bridge schedule' as follows:

$[p_t^{(i)}]$$_j$
$=\sum_{k \in \mathcal{D}(j, T_i')}[\pi]_{k}, \ \ j \in \ell_t(T_i');$

$\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0, \ \ \ \ \ \text{otherwise}.$

This schedule effectively routes probability mass through the lifted graph at each timestep. We then solve each max-flow problem with ([Lemma 5, AST17] notation) $y = p_t^{(i)}$, $z = p_{t+1}^{(i)}$ for each $t \in \{0,1,\ldots, D(G)-1\}$ to get the transition probabilities.

Notice here, it is possible 'prune' the vertices in the $d_\delta$-lifted state space that do not occur at the $t^{\text{th}}$ level of each $T_i'$, that is, vertices for which $[p_t^{(i)}]$$_j$$= 0$. To avoid complications I shan't take this into account in the analysis that follows.

For the full $d_\delta$-lifting, we construct a stochastic bridge for each vertex $i \in V(G)$ and take their disjoint union. For each bridge the 'final' transition matrix is the original transition matrix $P$; the end result is for all times $t\geq D(G)$ the state $p^{(t)} = \pi$ ; iterating $P$ leaves this distribution unchanged.

Having completed explicit construction for the $d_\delta$-lifting, we can evaluate the complexity of constructing a $d_\delta$-lifting of a Markov chain on a graph $G$ with $n$ vertices.
We require $n$ stochastic bridges, each containing $D(G)$ $n\times n$ transition matrices. Thus, this lifting requires storing $O(n^3 D(G))$ reals. Solving for each transition matrix requires solving a max-flow problem on $2n$ vertices, where certain flows are given by 'schedule' probabilities.
Computing the schedule probabilities $[p_t^{(i)}]$$_j$ involves finding $n$ spanning trees, walking through them, appending vertices, then for each $t\in [D(G) -1]$ summing up the values of the children. The complexity of this task is washed out by the complexity of solving the max-flow problems.
Maximum-flow problems can be solved in time $O(|V|^3)$ (see [Malhotra, Pramodh Kumar, Maheshwari, [https://doi.org/10.1016/0020-0190(78)90016-9](https://www.sciencedirect.com/science/article/pii/0020019078900169?via%3Dihub)), so the runtime for computing the $d_\delta$-lifting on a graph $G$ with $n$ vertices is $O(n^4 D(G))$ as we solve $nD(G)$ max-flow problems on graphs with $2n$ vertices.

### Transition probabilities

Let's consider the runtime complexity of computing the $(c_\gamma \circ d_\delta)$-lifting transition probabilities.

First we need to compute the $[\pi^{c_\gamma}]_{x}$ for $x \in V(G \boxtimes P_T)$.

$[\pi^{c_\gamma}]$$_x = (1-\gamma)\frac{Q_{t_x}(k_x | \psi(0) )}{T} + \frac{\gamma}{nT}$

where $x = (k_x, t_x)$ for $k_x \in V(G)$, $t_x \in [T]$ and $\gamma > 0$ is a small positive constant we may choose.

For a coined walk this is $[\pi^{c_\gamma}]$$_x = (1-\gamma)\frac{1}{T}\sum_{j \in [d]}| \langle \psi(0)|
U^{t_x} |j,k_x \rangle |^2 + \frac{\gamma}{nT}$.

Since $|\psi(0)\rangle = \sum_{k\in V(G),\ j\in[d]} \alpha_{k,j} |j,k\rangle$, we are going to need the values

$$|\langle j', k'| U^t | j, k \rangle |^2\ \ \ \text{for all}\ \ \ \ k,k'\in V(G),\ j,j' \in [d],\ t\in [T - 1].$$

To compute these values we need to diagonalise $U$, at a cost of $O((nd)^3)$, then we compute the coefficients at a cost of $O(ndT)$.

Summing over the coin basis for the different $[\pi^{c_\gamma}]$$_{x}$ incurs an additional $O(d)$ overhead, so the complexity of computing the stationary probabilities is $O(n^3 d^3 + n d^2 T)$.

Computing the Metropolis transition probabilities once we have the stationary distribution gives us an (additive) complexity of $O((nT)^2)$.

The primary cost will come from the stochastic bridge, which will cost $O(n^4 T^5)$. Now, in the worst case $d=O(n)$ and so the total runtime complexity for simulating the walk will be $O(n^6 + n^4T^5)$. The memory complexity essentially comes from storing the lifted transition probabilities, i.e. the number of states squared $O((n^2 T^3)^2)$ (note in practise this transition matrix is very sparse so can be made smaller).

### Simpler lifting

Now let us consider the case where we apply the $d_\delta$-lifting directly to the graph $G$, where we impose the infinite-time average-mixing distribution of the given quantum walk as the stationary distribution (see Simon's earlier comment, first bullet point). More specifically, we want

$[\pi]$$_k = \pi^q(k | \psi(0) ) = \lim_{T \to \infty} \frac{1}{T} \sum^{T-1}_{t=0} Q_t(k | \psi(0) ).$

From [[arXiv:1701.04474]](https://arxiv.org/pdf/1701.04474.pdf), we have that

$\pi^q(k | \psi(0) ) = \sum_r \langle \psi(0) | F_r^\dagger D_k F_r |\psi(0)\rangle ,$

where the $F_r$ are the idempotents in the spectral decomposition of the walk matrix $U$ and $D_k \in \mathbb{R}^{dn \times dn}$ is the diagonal matrix with a $1$ in positions corresponding to vertex $k$.

Computing the spectral decomposition takes time $O((dn)^3)$. Each term $F_r^\dagger D_k F_r$ is $O((nd)^2 \cdot d)$ since $D_k$ has only $d$ non-zero terms. Taking $\langle \psi(0) | \cdots |\psi(0)\rangle$ is an additive $O(d)$ factor. We then have that $r$ can range up to $O(nd)$ and we perform the sum for $k \in [n]$ for a total $O((nd)^4)$. Now, taking $d= O(n)$ we have that computing $\pi^q$ requires runtime $O(n^8)$.

Computing the $d_\delta$-lifting takes time $O(n^4 D(G))$ and memory $O(n^3 D(G))$, so the runtime complexity here is dominated by the $\pi^q$ computation.

This gives us the new result:

**Result.** Let $(U, |\psi \rangle )$ be a general quantum walk on a graph $G$.
Then there exists a lifted Markov chain on $n^2 D(G)$ vertices that mixes to the quantum average mixing distribution $\pi^q( \cdot | \psi(0))$ after $D(G)$ timesteps, where $D(G)$ is the diameter of the graph. Computing the transition probabilities for the lifted Markov chain requires $O(n^8)$ time and $O(n^4)$ memory.

For a general quantum walk, the $O(nd)$ factors become $m = O(n^2)$, where $m$ is the number of arcs in the graph under consideration.

## General Quantum walks

The above new result applies to a general quantum walk, as defined at the end of section 2.3 in the paper.

----

## Comments

Thank you Māris for pointing out inconsistencies and places I have been sloppy -- I agree with all of your comments. I have one clarification to make:

**Inequality $\mathcal{M}^c \leq \mathcal{M}$**

I wasn't clear in the paper where this inequality comes from. You rightly said that if we take $\mathcal{P}^c$ to be all basis states, then we have $\mathcal{M}^c = \mathcal{M}$. The inequality comes from the fact that we are allowed to choose $\mathcal{P}^c$ to effectively 'prune' the bad starting states away from the maximisation and achieve a superior mixing time.

One of the main results in [ATS17, [arXiv:1705.08253](https://arxiv.org/abs/1705.08253)] is the fact that we can only have diameter-time mixing if we are allowed to choose a mapping from the initial state on the coarse-grained Markov chain to an initial state on the lifted chain. The set $\mathcal{P}^c$ is then the image of $\mathcal{P}$ under this mapping, giving the inequality. This should have been included in the paper.

Edited Mar 01 2018 12:41 UTC by Danial Dervovic