Revision history for comment 1053

Edited by Danial Dervovic Mar 01 2018 12:41 UTC

Hello again Māris, many thanks for your patience. Your comments and questions have given me much food for thought, and scope for an amended version of the paper -- please see my responses below.

Please if any of the authors of [AST17 [arXiv:1712.01609](https://arxiv.org/abs/1712.01609)] have any further insight (@Simon?) I would be most grateful.

# Summary

I will briefly summarise the main points of reply here, going into more detail below.

**Question 1.** We can make a lifting with only $n^2 D(G)$ states by applying the $d_\delta$-lifting to $G$ with the quantum average mixing distribution as the imposed stationary distribution, as per Simon's earlier comment. This removes the $T$ factor in the number of states.
In fact, we can use the *average mixing matrix* of Godsil for discrete-time walks (see [[arXiv:1701.04474]](https://arxiv.org/pdf/1701.04474.pdf)) in this way to sample directly from the infinite time average mixing distribution, as opposed to a chosen finite $T$.

In terms of a lower bound on the number of states required, for a particular $G$ under the $d_\delta$-lifting one can prune some vertices from the lifted state space (Intuitively, I feel you can get one of the $n$ factors down to $\log n$ -- I can't prove this though). I can't immediately see how to go about proving a general lower bound.

**Question 2.** By my calculation the construction you describe has $O(n^2 T)$ states, (asymptotically) fewer than the lifted Markov chain in the paper, but (assuming $T \geq D(G)$ as in the paper) more than the lifting described above. As you say, the scheme described accomplishes a different task than the lifted chain, i.e. computes the transition probabilities (strong simulation) as opposed to sampling from the distribution. One could use Metropolis sampling here but then we run into the issue that the $d_\delta$-lifting was originally brought in to solve.

**Complexity of computing transition probabilities.** As opposed to the statement in the paper saying that computing the probabilities is intractable, it is polynomial by my calculations -- the runtime of computing the transition probabilities of a general $d_\delta$-lifting is $O(n^4 D(G))$, applying this to a coined walk using the $(c_\gamma \circ d_\delta)$-lifting I get $O(n^6 + n^4 T^5)$. This runtime can even be improved to $O((nd)^4)$ for a coined walk on a $d$-regular graph, by not using a composition of liftings and going direct to the $d_\delta$-lifting with the infinite-time quantum average mixing distribution, as described above.

**General quantum walks.** This simulation method will work for *any* unitary $U$ that is repeatedly applied and respects the structure of the graph $G$, i.e., a *general quantum walk* as defined at the end of section 2.3 -- computing the transition probabilities then requires $O(n^8)$ time.

----

## Question 1.

I have said as much as I can already about the lower bound. For the complexity of computing the transition probabilities, see below.

------

## Question 2

If I understand correctly, the register $A$ will require $\Theta(2^k)$ states, since we are storing all complex amplitudes with absolute value at most $1$ to $k$ bits of precision as an individual state. We need $k = \Theta(\log \delta)$ bit precision for the computation to be accurate to error $\delta$. With $\delta = O(1/T)$ this gives us that the register $A$ is of size $O(T)$ and the state space $H$ has $O(n^2 T)$ states (if we use $d \leq n$ for the size of register $C$).

That means that the described construction has asymptotically fewer states than the construction described in the paper. If we assume the deterministic map on $H$ is efficient we have a potentially more efficient method of computing the average mixing distribution. We also can do this by direct diagonalisation using the average mixing matrix in time $O((nd)^4) = O(n^8)$ (see [[arXiv:1701.04474](https://arxiv.org/pdf/1701.04474.pdf), Section 7]). In either case we have a scenario where we don't have a sampling method, but can compute probabilities -- a 'strong simulation' in the language of quantum circuits -- this still performs a qualitatively different task to what we are trying to do.

----

## Complexity of computing transition probabilities.

We can break this computation down into a number of stages:

1. Compute the stationary distribution probabilities for the $c_\gamma$-lifting, $[\pi^{c_\gamma}]_{x}$ for $x \in V(G \boxtimes P_T)$.

2. Using the stationary distribution probabilities, compute the $c_\gamma$-lifting transition probabilities $[P^{c_\gamma}]_{{y, x}}$ for $x, y \in V(G \boxtimes P_T)$.

3. For every vertex $x \in V(G \boxtimes P_T)$, produce the $d_\delta$-lifting transition probabilities. This amounts to computing the elements of the stochastic bridge matrices $\{P^{c_\gamma}(1), P^{c_\gamma}(2), \ldots, P^{c_\gamma}(T)\}$.
The stochastic bridge must satisfy
$P^{c_{\gamma}}(T) P^{c_{\gamma}}(T−1) \, \cdots \, P^{c_{\gamma}}(2) P^{c_{\gamma}}(1) e_x = \tilde{\pi}^{c_{\gamma}},$

where $\tilde{\pi}^{c_{\gamma}}$ is $\delta$-close to $\pi^{c_{\gamma}}$ in $\ell_1$-distance.

### Stochastic Bridge

Let us first try and come up with the explicit computation of the stochastic bridge applied to a general Markov chain on a graph $G$. We turn to Lemma 5 in [AST17 [arXiv:1712.01609](https://arxiv.org/abs/1712.01609)] where they prove existence of such a bridge, taking inspiration from Aaronson [[10.1103/PhysRevA.71.032325](https://journals.aps.org/pra/abstract/10.1103/PhysRevA.71.032325)]. Their proof involves showing that the transition probability matrix $P(t)$ between 'times' $t$ and $t+1$ are given by the solution of a maximum flow problem. One then solves this maximum flow problem for each $t \in (0,1,\ldots, D(G)-1)$ to obtain the transition probabilities $P(1), P(2), \ldots P(D(G))$.

This proof is lacking one ingredient to be completely constructive, a 'schedule' of flows for the edges adjacent to the source and sink vertices at a given pair of times $(t,t+1)$. To use notation of [Lemma 5, AST17], we need to set a schedule of $p_t$, i.e. $\{ p_0^{(i)}, p_1^{(i)}, \ldots, p_{D(G)}^{(i)} \}$
for a given vertex $i$.

For a given vertex $i$, we can find the stochastic bridge taking $p_0 = e_i$ to $p_{D(G)} = \pi$ as follows:

Find a spanning tree $T_i$ of $G$ rooted at vertex $i$, using breadth-first search. We shall now modify $T_i$ using the following procedure: walk through $T_i$ using breadth-first search. Every time you reach any vertex $j$ that has already been visited, append a new child vertex also labelled $j$. We call this modified graph $T'_{i}$ ; it is still a tree. Moreover, we denote the $t^{\text{th}}$ *level* of $T_i'$, $\ell_t(T_i')$, the set of vertices (in $V(G)$) at distance $t$ from the root $i$ in $T_i'$.

As an example, consider the graph $G$ below, with the spanning tree $T_1$ (rooted at vertex 1) and the related tree $T'_{1}$.

![Example G, T_i, Tprime_i](https://gist.githubusercontent.com/ddervs/41f89b924ecbc52b75a088fe3a2bc22e/raw/d165a3821e57f930ae5c9a986e139e4078e4f290/spanning_trees.png)

Also, let $\mathcal{D}(j, T)$ be the set of descendent leaves of the vertex $j$ in a tree $T$.
We then set our 'bridge schedule' as follows:

$[p_t^{(i)}]$$_j$
$=\sum_{k \in \mathcal{D}(j, T_i')}[\pi]_{k}, \ \ j \in \ell_t(T_i');$

$\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0, \ \ \ \ \ \text{otherwise}.$

This schedule effectively routes probability mass through the lifted graph at each timestep. We then solve each max-flow problem with ([Lemma 5, AST17] notation) $y = p_t^{(i)}$, $z = p_{t+1}^{(i)}$ for each $t \in \{0,1,\ldots, D(G)-1\}$ to get the transition probabilities.

Notice here, it is possible 'prune' the vertices in the $d_\delta$-lifted state space that do not occur at the $t^{\text{th}}$ level of each $T_i'$, that is, vertices for which $[p_t^{(i)}]$$_j$$= 0$. To avoid complications I shan't take this into account in the analysis that follows.

For the full $d_\delta$-lifting, we construct a stochastic bridge for each vertex $i \in V(G)$ and take their disjoint union. For each bridge the 'final' transition matrix is the original transition matrix $P$; the end result is for all times $t\geq D(G)$ the state $p^{(t)} = \pi$ ; iterating $P$ leaves this distribution unchanged.

Having completed explicit construction for the $d_\delta$-lifting, we can evaluate the complexity of constructing a $d_\delta$-lifting of a Markov chain on a graph $G$ with $n$ vertices.
We require $n$ stochastic bridges, each containing $D(G)$ $n\times n$ transition matrices. Thus, this lifting requires storing $O(n^3 D(G))$ reals. Solving for each transition matrix requires solving a max-flow problem on $2n$ vertices, where certain flows are given by 'schedule' probabilities.
Computing the schedule probabilities $[p_t^{(i)}]$$_j$ involves finding $n$ spanning trees, walking through them, appending vertices, then for each $t\in [D(G) -1]$ summing up the values of the children. The complexity of this task is washed out by the complexity of solving the max-flow problems.
Maximum-flow problems can be solved in time $O(|V|^3)$ (see [Malhotra, Pramodh Kumar, Maheshwari, [https://doi.org/10.1016/0020-0190(78)90016-9](https://www.sciencedirect.com/science/article/pii/0020019078900169?via%3Dihub)), so the runtime for computing the $d_\delta$-lifting on a graph $G$ with $n$ vertices is $O(n^4 D(G))$ as we solve $nD(G)$ max-flow problems on graphs with $2n$ vertices.

### Transition probabilities

Let's consider the runtime complexity of computing the $(c_\gamma \circ d_\delta)$-lifting transition probabilities.

First we need to compute the $[\pi^{c_\gamma}]_{x}$ for $x \in V(G \boxtimes P_T)$.

$[\pi^{c_\gamma}]$$_x = (1-\gamma)\frac{Q_{t_x}(k_x | \psi(0) )}{T} + \frac{\gamma}{nT}$

where $x = (k_x, t_x)$ for $k_x \in V(G)$, $t_x \in [T]$ and $\gamma > 0$ is a small positive constant we may choose.

For a coined walk this is $[\pi^{c_\gamma}]$$_x = (1-\gamma)\frac{1}{T}\sum_{j \in [d]}| \langle \psi(0)|
U^{t_x} |j,k_x \rangle |^2 + \frac{\gamma}{nT}$.

Since $|\psi(0)\rangle = \sum_{k\in V(G),\ j\in[d]} \alpha_{k,j} |j,k\rangle$, we are going to need the values

$$|\langle j', k'| U^t | j, k \rangle |^2\ \ \ \text{for all}\ \ \ \ k,k'\in V(G),\ j,j' \in [d],\ t\in [T - 1].$$

To compute these values we need to diagonalise $U$, at a cost of $O((nd)^3)$, then we compute the coefficients at a cost of $O(ndT)$.

Summing over the coin basis for the different $[\pi^{c_\gamma}]$$_{x}$ incurs an additional $O(d)$ overhead, so the complexity of computing the stationary probabilities is $O(n^3 d^3 + n d^2 T)$.

Computing the Metropolis transition probabilities once we have the stationary distribution gives us an (additive) complexity of $O((nT)^2)$.

The primary cost will come from the stochastic bridge, which will cost $O(n^4 T^5)$. Now, in the worst case $d=O(n)$ and so the total runtime complexity for simulating the walk will be $O(n^6 + n^4T^5)$. The memory complexity essentially comes from storing the lifted transition probabilities, i.e. the number of states squared $O((n^2 T^3)^2)$ (note in practise this transition matrix is very sparse so can be made smaller).

### Simpler lifting

Now let us consider the case where we apply the $d_\delta$-lifting directly to the graph $G$, where we impose the infinite-time average-mixing distribution of the given quantum walk as the stationary distribution (see Simon's earlier comment, first bullet point). More specifically, we want

$[\pi]$$_k = \pi^q(k | \psi(0) ) = \lim_{T \to \infty} \frac{1}{T} \sum^{T-1}_{t=0} Q_t(k | \psi(0) ).$

From [[arXiv:1701.04474]](https://arxiv.org/pdf/1701.04474.pdf), we have that

$\pi^q(k | \psi(0) ) = \sum_r \langle \psi(0) | F_r^\dagger D_k F_r |\psi(0)\rangle ,$

where the $F_r$ are the idempotents in the spectral decomposition of the walk matrix $U$ and $D_k \in \mathbb{R}^{dn \times dn}$ is the diagonal matrix with a $1$ in positions corresponding to vertex $k$.

Computing the spectral decomposition takes time $O((dn)^3)$. Each term $F_r^\dagger D_k F_r$ is $O((nd)^2 \cdot d)$ since $D_k$ has only $d$ non-zero terms. Taking $\langle \psi(0) | \cdots |\psi(0)\rangle$ is an additive $O(d)$ factor. We then have that $r$ can range up to $O(nd)$ and we perform the sum for $k \in [n]$ for a total $O((nd)^4)$. Now, taking $d= O(n)$ we have that computing $\pi^q$ requires runtime $O(n^8)$.

Computing the $d_\delta$-lifting takes time $O(n^4 D(G))$ and memory $O(n^3 D(G))$, so the runtime complexity here is dominated by the $\pi^q$ computation.

This gives us the new result:

**Result.** Let $(U, |\psi \rangle )$ be a general quantum walk on a graph $G$.
Then there exists a lifted Markov chain on $n^2 D(G)$ vertices that mixes to the quantum average mixing distribution $\pi^q( \cdot | \psi(0))$ after $D(G)$ timesteps, where $D(G)$ is the diameter of the graph. Computing the transition probabilities for the lifted Markov chain requires $O(n^8)$ time and $O(n^4)$ memory.

For a general quantum walk, the $O(nd)$ factors become $m = O(n^2)$, where $m$ is the number of arcs in the graph under consideration.

## General Quantum walks

The above new result applies to a general quantum walk, as defined at the end of section 2.3 in the paper.

----

## Comments

Thank you Māris for pointing out inconsistencies and places I have been sloppy -- I agree with all of your comments. I have one clarification to make:

**Inequality $\mathcal{M}^c \leq \mathcal{M}$**

I wasn't clear in the paper where this inequality comes from. You rightly said that if we take $\mathcal{P}^c$ to be all basis states, then we have $\mathcal{M}^c = \mathcal{M}$. The inequality comes from the fact that we are allowed to choose $\mathcal{P}^c$ to effectively 'prune' the bad starting states away from the maximisation and achieve a superior mixing time.

One of the main results in [ATS17, [arXiv:1705.08253](https://arxiv.org/abs/1705.08253)] is the fact that we can only have diameter-time mixing if we are allowed to choose a mapping from the initial state on the coarse-grained Markov chain to an initial state on the lifted chain. The set $\mathcal{P}^c$ is then the image of $\mathcal{P}$ under this mapping, giving the inequality. This should have been included in the paper.

Edited by Danial Dervovic Mar 01 2018 12:36 UTC

Hello again Māris, many thanks for your patience. Your comments and questions have given me much food for thought, and scope for an amended version of the paper -- please see my responses below.

Please if any of the authors of [AST17 [arXiv:1712.01609](https://arxiv.org/abs/1712.01609)] have any further insight (@Simon?) I would be most grateful.

# Summary

I will briefly summarise the main points of reply here, going into more detail below.

----

## Question 1.

I have said as much as I can already about the lower bound. For the complexity of computing the transition probabilities, see below.

------

## Question 2

----

## Complexity of computing transition probabilities.

We can break this computation down into a number of stages:

1. Compute the stationary distribution probabilities for the $c_\gamma$-lifting, $[\pi^{c_\gamma}]_{x}$ for $x \in V(G \boxtimes P_T)$.

2. Using the stationary distribution probabilities, compute the $c_\gamma$-lifting transition probabilities $[P^{c_\gamma}]_{{y, x}}$ for $x, y \in V(G \boxtimes P_T)$.

### Stochastic Bridge

For a given vertex $i$, we can find the stochastic bridge taking $p_0 = e_i$ to $p_{D(G)} = \pi$ as follows:

As an example, consider the graph $G$ below, with the spanning tree $T_1$ (rooted at vertex 1) and the related tree $T'_{1}$.

![Example G, T_i, Tprime_i](https://gist.githubusercontent.com/ddervs/41f89b924ecbc52b75a088fe3a2bc22e/raw/d165a3821e57f930ae5c9a986e139e4078e4f290/spanning_trees.png)

Also, let $\mathcal{D}(j, T)$ be the set of descendent leaves of the vertex $j$ in a tree $T$.
We then set our 'bridge schedule' as follows:

$[p_t^{(i)}]$$_j$
$=\sum_{k \in \mathcal{D}(j, T_i')}[\pi]_{k}, \ \ j \in \ell_t(T_i');$

$\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0 \ \ \ \ \ \text{otherwise}.$