Data Science and Machine Learning
Formally, data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data (Wikipedia/Data_science). The core object of research in this field is data, and the goal is knowledge. This approach differs from theoretical study by deductive reasoning, as in geometry. A nice example is Kepler's laws; below we list two approaches to them to show the difference between the two paradigms.
- Kepler type
  - Analyze the data from Tycho
  - Summarize the pattern of the data in a simple way (the first and second laws, Occam's razor)
  - Fit the data and generalize the quantitative rule (the third law)
- Newton type (also used in modern textbooks)
  - Admit some axioms of the system (Newton's laws)
  - With a mathematical approach, write down the equations and solve them
  - Discuss the properties of the solution
We call the Kepler type data-driven, and this paradigm is the core of data science.
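To make the data-driven paradigm concrete, here is a minimal sketch of the Kepler-type workflow in Python. The orbital data are standard textbook values; the power-law model is our own illustrative choice, and the fitted exponent recovers the third law purely from data:

```python
import numpy as np

# Standard orbital parameters: semi-major axis a in AU, period T in years
# (Mercury, Venus, Earth, Mars, Jupiter, Saturn).
a = np.array([0.387, 0.723, 1.000, 1.524, 5.203, 9.537])
T = np.array([0.241, 0.615, 1.000, 1.881, 11.862, 29.457])

# Kepler-type step: summarize the data with a simple power law T = C * a^p,
# fitted in log-log space.
p, logC = np.polyfit(np.log(a), np.log(T), 1)

print(f"fitted exponent p = {p:.3f}")  # close to 3/2, i.e. T^2 proportional to a^3
```

No axioms of mechanics were assumed; the quantitative rule (the exponent 3/2) emerges from the data alone, which is exactly the data-driven paradigm.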
Roughly speaking, there are three parts to data science: data collection, data storage, and learning from data. Though the other two are also important, our main goal here is the last one. The algorithms for this purpose are machine learning. Most machine learning tasks can be classified into the following three frameworks.
- Supervised learning
  - Data: a set of inputs $\{\vec x_i\}$ and their labels $\{y_i\}$
  - Goal: find a model for the map from $\vec x$ to $y$
  - There are two main applications of supervised learning: regression (continuous $y$) and classification (discrete $y$)
- Unsupervised learning
  - Data: a set of inputs $\{\vec x_i\}$, without labels
  - Goal: find hidden patterns or groupings in the data, e.g., dimensionality reduction, clustering, generative modeling
- Reinforcement learning
  - Data: a delayed feedback signal (reward) after each action, like the win/lose in a chess game
  - Goal: learn an optimal policy (i.e., how to act), like learning how to play chess
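As a minimal illustration of the supervised framework above, the following sketch (with made-up toy data and a hypothetical hidden rule) fits the map from inputs to labels by least squares:

```python
import numpy as np

# Toy supervised-learning data (our own): inputs x_i, labels y_i generated
# by a hidden linear rule plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * x[:, 0] - 0.5 + 0.01 * rng.normal(size=100)  # hidden rule + noise

# "Learn" the map x -> y: least-squares fit of weight and bias.
X = np.hstack([x, np.ones((100, 1))])      # append a bias column
w, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w)  # approximately [3.0, -0.5]: the hidden rule recovered from data
```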
Though we list these classes of machine learning, modern developments do not stay within such clean boundaries. Solving complex real-world problems requires the cooperation of different techniques, each playing its own part, so classifying such mixed projects is difficult and not very meaningful. As a popular and growing technology, machine learning will keep absorbing new content. Just open your mind and be prepared.
Quantum Tech to Data Science
Quantum technology has become one of the most important concepts in the world today. Some people call the control and application of quantum systems the Second Quantum Revolution, to distinguish it from the establishment of quantum mechanics in the early 20th century. For data science, the impact of quantum technology is well described by the following figure from wiki/Quantum-machine-learning.
As quantum technology comes into our sight, a new type of data needs to be considered. Classical data is encoded in classical physical systems, and its value is deterministic at each stage of processing. The information we touch in daily life is all classical data: this report, logical variables given by whether the voltage in a wire of a computer exceeds a threshold, and much other analog data. Quantum data is encoded in quantum systems, like entities in a Hilbert space (the state of a qubit); such data is quite different from the classical kind, especially in its read/write rules. Thanks to the high representability of digits, we can still simulate quantum data on a classical computer. But since the dimension of the Hilbert space grows exponentially with the number of qubits, the number of classical variables required explodes, and this approach is prohibitively expensive.
Another aspect of the figure above is the type of algorithm. This means not only the software with which we handle the data, but also the hardware used to implement the program. This part contains storage and processing. The former is qubits and quantum gates, which have been proven universal for simulating any quantum system and its evolution, just like classical bits and classical gates. Moreover, the quantum nature of qubits makes it possible to handle quantum data with polynomially scaling cost (if we can prepare and hold qubits as easily, in the scaling sense, as we do classical bits). The latter concept (processing), i.e., algorithms or software, is the analog of classical algorithms and programs running on a classical computer. There are two paradigms for quantum algorithms: one represents the computation as a combination of a set of quantum gates, while the other is based on the adiabatic theorem (wiki/Adiabatic-quantum-computation). These two approaches have been proven equivalent (D. Aharonov 2005, H. Yu 2018).
Quantum Speedup for Classical Machine Learning
Quantum computing has been proven significantly more powerful than classical machines on certain problems. In the past decades, the theory of both the hardware (architecture) of quantum processors and the software (algorithms) on quantum computers has developed rapidly. Though experimental issues still stand between us and large-scale quantum computers, the theoretical preparation for applying quantum technology to machine learning has largely been made. In this section, we focus on the quantum speedup for classical machine learning, i.e., the outperformance of quantum algorithms in the field of classical machine learning.
Essential Techs for Applied Quantum Computing
There are two essential technologies required in quantum machine learning. One is the quantum Random Access Memory (qRAM), usually regarded as hardware; the other is the quantum linear algebra algorithms at the software level. One might doubt whether the former is truly essential, since many quantum machine learning algorithms are free of qRAM. Recently, new approaches have been studied for handling classical data on a quantum machine, like quantum embedding (S. Lloyd 2020) and quantum feature maps (V. Havlicek 2018). However, as a generic data loader for quantum machine learning on classical data, qRAM is still valuable to discuss mathematically here.
Data Loader: qRAM
The Random-Access Memory (RAM) on classical computers essentially provides a tree structure for addressing. Such addressing with the conventional implementation costs an amount of energy that grows linearly with the memory size, i.e., exponentially in the address length.
Conventional Classical RAM architecture
Basically, a classical RAM is composed of a memory array of size $N = 2^n$, an input (address) register, and an output register.

- An $n$-bit string is read into the input register. These bits store the address of the memory cell to be called, as the path along the tree. For the bifurcation (binary) case, it is a series of bits $a_1 a_2 \dots a_n$, and at the $k$-th level of the tree, the value of the $k$-th bit selects the left or right edge.
- Along the addressing path, the output register stores the content of the memory cell, or a write circuit modifies its value for the write operation.
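The addressing procedure above can be sketched in a few lines of Python (a toy model of our own, not the actual circuit-level implementation): the address bits steer a path down the binary tree, and the reached leaf is the memory cell.

```python
# Toy model of conventional RAM addressing: the k-th address bit chooses
# the left (0) or right (1) edge at the k-th level of the bifurcation tree.
def read_cell(memory, address_bits):
    """Follow the addressing path through the binary tree to a leaf."""
    index = 0
    for bit in address_bits:      # one branching decision per level
        index = 2 * index + bit
    return memory[index]          # leaf = content of the addressed cell

memory = [10, 11, 12, 13, 14, 15, 16, 17]   # size N = 2^3
print(read_cell(memory, [1, 0, 1]))          # path right-left-right -> cell 5
```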
Within this procedure, the first addressing bit controls a single gate at the root node, but the bit for the $k$-th level must fan out to $2^{k-1}$ gates, so a full memory call activates $O(N)$ transistors in total, as illustrated by the conventional-architecture figure from the paper (V. Giovannetti 2008).

Thus, though classical RAM serves the classical computer well in practice, its conventional addressing scheme is wasteful in the scaling sense.
However, as they said in the introduction of the paper (V. Giovannetti 2008):
A classical RAM that uses the bucket-brigade (ps. introduced in this paper) addressing scheme need only activate $O(\log N)$ transistors in the course of a memory call, in contrast with a conventional RAM that activates $O(N)$ transistors. As a result, a RAM that uses our design might operate with less dissipation and power consumption than a conventional RAM. Note, however, that energy costs in the memory addressing are not sufficiently high in current RAM chips to justify an immediate adoption of the bucket brigade. Other sources of inefficiencies and dissipations are currently predominant (mostly in the memory cells themselves). However, new promising memory cell technologies are being developed (e.g., the "memristor" cells), which would drastically cut back cell dissipation, so that cutting back dissipation in the addressing may become important in the future.
Their architecture can also offer advantages for classical RAM, but it cannot resolve the dominant dissipation problem of classical RAM. According to this (stackexchange answer):

I think that it would indeed work in classical RAM, but the hardware constraints didn't supply the 'evolutionary pressure' required for it to be developed and implemented.

Currently, bucket-brigade research is still focused on resolving quantum issues, like resource efficiency and fault tolerance (A. Paler 2020).
The bucket-brigade architecture is proposed to resolve the issue that, in the conventional one, a few addressing bits are each wired to exponentially many transistors to control the path; the bit controlling the last level of the tree needs to control $2^{n-1}$ transistors.

The bucket-brigade architecture modifies every node in the bifurcation tree into a (qu)trit, which has three possible states.
(The figure is cited from github.com/qsharp-community/qram.) This architecture allows a memory call with only $O(\log N)$ activated nodes. Acting on a superposed address register, the qRAM call reads

$$\sum_i \alpha_i |i\rangle_a |0\rangle_d \to \sum_i \alpha_i |i\rangle_a |x_i\rangle_d,$$

where $|i\rangle_a$ is the addressing register and $x_i$ is the content of the $i$-th memory cell.

One important application is amplitude encoding of a classical vector (assuming the data are rescaled so that $|x_i| \le 1$). We begin with the uniform superposition over addresses with the data loaded, $\frac{1}{\sqrt N}\sum_i |i\rangle|x_i\rangle$. Now we apply a rotation conditioned on the content of the memory cells to an ancilla qubit:

$$\frac{1}{\sqrt N}\sum_i |i\rangle |x_i\rangle \left(\sqrt{1-x_i^2}\,|0\rangle + x_i |1\rangle\right).$$

Thus, with proper post-selection on the ancilla being $|1\rangle$ (and uncomputing the data register), we can prepare the state $\propto \sum_i x_i |i\rangle$, the amplitude encoding of $\vec x$.
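This post-selection scheme is easy to check numerically; the following sketch (with a made-up 4-cell memory, assuming $|x_i|\le 1$) simulates the ancilla amplitudes and the post-selected state:

```python
import numpy as np

# Made-up memory contents, rescaled so that |x_i| <= 1.
x = np.array([0.1, 0.5, 0.7, 0.3])
N = len(x)

# State after the conditioned rotation (data register omitted for brevity):
# (1/sqrt(N)) sum_i |i> ( sqrt(1-x_i^2)|0> + x_i|1> )
amp0 = np.sqrt(1 - x**2) / np.sqrt(N)   # ancilla |0> branch
amp1 = x / np.sqrt(N)                   # ancilla |1> branch

p_success = np.sum(amp1**2)             # probability to post-select |1>
encoded = amp1 / np.sqrt(p_success)     # state of the index register afterwards

print(np.allclose(encoded, x / np.linalg.norm(x)))  # True: amplitude encoding of x
```

Note the trade-off visible here: the success probability is $\sum_i x_i^2/N$, so data concentrated on few large entries makes the post-selection expensive.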
Though this fast amplitude encoding is very exciting, unfortunately no one has actually built such a device yet. As for whether we need a qRAM, the following comment from github.com/qsharp-community/qram gives a nice answer:
Sometimes. You’ll need a qRAM, or some more general means of quantum state preparation in quantum machine learning (QML) algorithms that require you to load in classical data, or query an oracle that returns classical data. I’ve heard a number of stories of people working on QML being actively discouraged from doing so because “QML won’t work without a qRAM”. That’s just not true, because many QML algorithms do not need a qRAM. Now, whether or not they yield any quantum advantage is a separate question, and won’t be discussed here. The key point is that some QML algorithms need a qRAM, and they will potentially run into trouble as per the next question.
The next question mentioned there is:
Can we design an efficient qRAM?
Maybe. In the primer we’ll take a look at proposals that will in principle run in polynomial depth, and others that scale far worse. There are some very interesting qubit-time tradeoffs one can explore, in particular if the data being stored has some sort of underlying structure. Regardless, even if we can design an efficient circuit, we’d also like something that is efficient in a fault-tolerant setting, and this is potentially very expensive.
Another discussion can be found in the paper (C. Ciliberto 2017), in which the authors list three issues for current qRAM research:

- Do all components of the qRAM need to be error-corrected? If the answer is yes, such exponential physical resources could not be afforded in an experimental setting.
- The comparison is thought to be unfair (D. Steiger's talk in 2016). The argument states that the benchmark should be done with the same scaling of hardware resources, which might shrink the quantum speedup; qRAM-based algorithms should therefore be considered carefully.
- As pointed out in (S. Aaronson 2015), fast state preparation based on qRAM requires the classical data to be distributed relatively uniformly. However, in that case a classical randomized algorithm can be quite fast and reliable, which also makes the exponential speedup disappear.
Quantum Linear Algebra
Though the idea of using quantum resources to enhance learning algorithms had attracted much attention since the 1990s, the rapid growth of quantum machine learning actually began in 2009, with the help of the quantum algorithm for linear systems. In this section we discuss this algorithm, named the HHL algorithm after its inventors Harrow, Hassidim, and Lloyd (HHL 2009).
The HHL algorithm tries to solve the linear system

$$A\vec x = \vec b.$$

By the map $A \mapsto \begin{pmatrix} 0 & A \\ A^\dagger & 0 \end{pmatrix}$, we can restrict to the case of a Hermitian coefficient matrix. The pseudo-code for the HHL algorithm reads:
- Input: the Hermitian matrix $A$ (accessed through the unitary $e^{iAt}$) and the state $|b\rangle$, the amplitude encoding of $\vec b$
- Output: the amplitude-encoded solution $|x\rangle \propto A^{-1}|b\rangle$
- Notation: the spectral decomposition $A = \sum_j \lambda_j |u_j\rangle\langle u_j|$, in which $|b\rangle = \sum_j \beta_j |u_j\rangle$
- Start:
  - Prepare the initial state $|0\rangle_a |0\rangle^{\otimes t}_c |b\rangle$, with the subscripts denoting the ancilla and control registers
  - Apply Hadamard gates to make a uniform superposition on the control register
  - Apply a controlled evolution $\sum_\tau |\tau\rangle\langle\tau|_c \otimes e^{iA\tau t_0/2^t}$, in which we assume the rescaled eigenvalues $\lambda_j t_0/2\pi$ lie in $[0,1)$
  - Apply the inverse quantum Fourier transform on the control register. Note that the amplitude on $|\tilde\lambda\rangle_c$ is significant only when $\tilde\lambda \approx \lambda_j$, so the state is approximately $\sum_j \beta_j |\lambda_j\rangle_c |u_j\rangle$
  - Apply a controlled rotation on the ancilla qubit, controlled by the eigenvalue register: $|0\rangle_a \to \sqrt{1 - C^2/\lambda^2}\,|0\rangle_a + (C/\lambda)\,|1\rangle_a$
  - Uncompute (re-initialize) the control register and post-select the ancilla qubit on $|1\rangle_a$. Then the output register holds $\sum_j (C\beta_j/\lambda_j)|u_j\rangle \propto A^{-1}|b\rangle = |x\rangle$
- End
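The net effect of these steps can be verified with plain linear algebra (a sketch with a toy Hermitian matrix of our own, ignoring the discretization error of phase estimation): rescaling each eigencomponent of $|b\rangle$ by $1/\lambda_j$ and renormalizing reproduces the direction of the classical solution.

```python
import numpy as np

# Toy Hermitian system A x = b (values are our own choice).
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 1.0])

lam, U = np.linalg.eigh(A)               # A = sum_j lam_j |u_j><u_j|
beta = U.T @ (b / np.linalg.norm(b))     # |b> = sum_j beta_j |u_j>

# Steps 5-6 of HHL: rescale each component by 1/lambda_j, then renormalize.
x_amp = U @ (beta / lam)
x_amp /= np.linalg.norm(x_amp)

# Compare with the classical solution of A x = b (same direction).
x_cl = np.linalg.solve(A, b)
print(np.allclose(abs(x_amp @ (x_cl / np.linalg.norm(x_cl))), 1.0))  # True
```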
Phase Estimation
The phase estimation algorithm tries to find an eigenvalue of a given unitary operator $U$, i.e., the phase $\varphi$ in $U|u\rangle = e^{2\pi i\varphi}|u\rangle$.

The operation can be implemented by the following circuit. Given $t$ control qubits in the uniform superposition and the target register in the eigenstate $|u\rangle$, apply the controlled powers $U^{2^k}$ (the $k$-th control qubit controls $U^{2^k}$):

$$\frac{1}{\sqrt{2^t}}\sum_{j=0}^{2^t-1}|j\rangle|u\rangle \;\to\; \frac{1}{\sqrt{2^t}}\sum_{j=0}^{2^t-1} e^{2\pi i j\varphi}\,|j\rangle|u\rangle.$$

Then, with the inverse quantum Fourier transformation on the control register, we obtain a state sharply peaked at $|2^t\varphi\rangle$, i.e., a $t$-bit estimate of $\varphi$.

Thus, the core of the HHL algorithm is a parallel version of phase estimation for $U = e^{iAt_0}$, run on the superposition $|b\rangle = \sum_j \beta_j|u_j\rangle$, which writes the eigenvalues $\lambda_j$ into the control register.

The key of the HHL algorithm is the quality of the phase estimation subroutine. Ignoring the cost of the preparation of $|b\rangle$, the overall complexity is $\tilde O(\log(N)\,s^2\kappa^2/\epsilon)$ for an $s$-sparse $A$ with condition number $\kappa$ and error tolerance $\epsilon$.
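The phase-estimation circuit can be simulated directly with NumPy (a sketch with a toy single-qubit phase of our own, $\varphi = 3/8$, which is exactly representable with 3 bits):

```python
import numpy as np

# Phase estimation on U = diag(1, e^{2 pi i phi}), target in the eigenstate.
phi, t = 0.375, 3           # phi = 3/8, t = 3 control qubits
T = 2**t

# Control register after the controlled-U^{2^k} layer:
# (1/sqrt(T)) sum_j e^{2 pi i j phi} |j>
state = np.exp(2j * np.pi * np.arange(T) * phi) / np.sqrt(T)

# Inverse quantum Fourier transform on the control register.
F = np.exp(-2j * np.pi * np.outer(np.arange(T), np.arange(T)) / T) / np.sqrt(T)
probs = np.abs(F @ state)**2

print(np.argmax(probs), T * phi)   # the peak sits exactly at 2^t * phi = 3
```

For a $\varphi$ not representable in $t$ bits, the distribution is peaked near $2^t\varphi$ instead of concentrated on it, which is the source of the discretization error mentioned above.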
The HHL algorithm does have some caveats which keep it from being an unconditionally efficient quantum linear solver. As discussed in (S. Aaronson 2015), there are at least four critical issues:

- The preparation of $|b\rangle$.
- The simulation of the quantum evolution $e^{iAt}$ for a general $A$. (In (L. Wossnig 2018), an algorithm for dense matrices with $\tilde O(\sqrt N)$ scaling in the dimension is given; for the soft-O notation see Wikipedia/Big_O_notation.)
- The robustness of inverting the matrix, i.e., the dependence on the condition number $\kappa$.
- How to extract the information encoded in $|x\rangle$. (For applications that only need some summary statistic of $\vec x$ rather than the full vector, a classical computer can often make an estimate within comparable time.)
Despite these caveats, the HHL algorithm works well as an approximate method for preparing the solution of a linear system in amplitude encoding.
Quantum Machine Learning Algorithms
Quantum SVM
The Support Vector Machine (SVM) is one of the most important classical learning algorithms, mainly used for data classification. One of its advantages is that we understand how it works better than for most black-box algorithms.
The training set has $M$ labeled points $(\vec x_j, y_j)$ with $y_j = \pm 1$. The SVM finds a hyperplane $\vec w\cdot\vec x + b = 0$ that separates the two classes,

$$y_j(\vec w\cdot\vec x_j + b) \ge 1,$$

and makes the distance between these two classes large enough, i.e., maximizes the margin $2/\|\vec w\|$.
Note: Solution of SVM and time complexity
With the help of Lagrangian duality, the SVM training problem reduces to a quadratic program whose cost we can estimate.

Given a convex optimization problem

$$\min_x f_0(x) \quad \text{s.t.}\quad f_i(x)\le 0,\ i=1,\dots,m, \qquad h_j(x)=0,\ j=1,\dots,p,$$

in which the $f_i$ are convex functions and the equality constraint functions $h_j$ are affine, Slater's condition holds iff there exists an $x$ such that $f_i(x) < 0$ for all $i$ and $h_j(x) = 0$ for all $j$. Slater's theorem states that strong duality then holds, i.e., the minimum of the original problem (primal problem) equals the maximum of the (Lagrangian) dual problem, which reads

$$\max_{\lambda\ge 0,\,\nu}\ g(\lambda,\nu), \qquad g(\lambda,\nu) = \inf_x L(x,\lambda,\nu)$$

(note that the infimum does not constrain $x$), in which

$$L(x,\lambda,\nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \sum_j \nu_j h_j(x)$$

is the Lagrangian of the primal problem.

An important relation between the optima of the primal and dual problems is that, if strong duality holds and a dual optimal solution $(\lambda^\star,\nu^\star)$ is known, then any primal optimal point minimizes $L(x,\lambda^\star,\nu^\star)$ over $x$. That is, solving the dual problem essentially gives us the optimum of the primal problem.
Now let us consider the optimization problem of SVM training. The objective function reads

$$\min_{\vec w,\,b}\ \frac12\|\vec w\|^2 \quad \text{s.t.}\quad y_j(\vec w\cdot\vec x_j + b)\ge 1,$$

in which we optimize a quadratic form of $\vec w$ under affine constraints, so strong duality holds. The primal optimization problem now transforms into the dual problem

$$\max_{\vec\alpha}\ \sum_{j=1}^M \alpha_j - \frac12\sum_{j,k=1}^M \alpha_j\alpha_k\, y_j y_k\, K_{jk} \quad\text{s.t.}\quad \alpha_j\ge 0,\ \ \sum_j \alpha_j y_j = 0,$$

in which $K_{jk} = \vec x_j\cdot\vec x_k$ is the kernel matrix, and the optimal weight is recovered as $\vec w = \sum_j \alpha_j y_j \vec x_j$.
The time complexity of classical training is made up of the following three steps:

- $O(M^2 N)$: construction of the dual problem, $M^2$ inner products of $N$-vectors.
- $O(M^3)$ (typical for convex quadratic programming): solving the dual problem.
- $O(MN)$: recovery of $\vec w$ from the dual optimum.

Thus, the total time complexity is polynomial in $M$ and $N$, roughly $O(M^2(M+N))$ for the steps above.
In (P. Rebentrost 2014), a quantum speedup of SVM for big-data classification is introduced. The authors use the least-squares support vector machine introduced in (J.A.K. Suykens 1999), in which the model is trained by solving a linear system instead of a convex quadratic program. The training problem (together with the kernel trick) reads

$$\min_{\vec w,\,b,\,\vec e}\ \frac12\|\vec w\|^2 + \frac{\gamma}{2}\sum_{j=1}^M e_j^2 \quad\text{s.t.}\quad y_j(\vec w\cdot\vec x_j + b) = 1 - e_j.$$

Such an optimization problem with equality constraints can be solved by the linear system

$$\begin{pmatrix} 0 & \vec 1^T \\ \vec 1 & K + \gamma^{-1}\mathbb{1} \end{pmatrix}\begin{pmatrix} b \\ \vec\alpha \end{pmatrix} = \begin{pmatrix} 0 \\ \vec y \end{pmatrix},$$

in which $K_{jk} = \vec x_j\cdot\vec x_k$ is the kernel matrix and the $\alpha_j$ are the Lagrange multipliers. The quantum SVM uses the HHL algorithm to find the solution of the above linear system in a normalized form as $|b,\vec\alpha\rangle \propto b\,|0\rangle + \sum_k \alpha_k |k\rangle$.
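Classically, the LS-SVM training is just this linear solve; the following sketch (toy 1-D data of our own choosing) builds the matrix and the decision function that the quantum algorithm would obtain via HHL:

```python
import numpy as np

# Toy training set (our own): 1-D points with labels +-1.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
M, gamma = len(y), 10.0

K = X @ X.T                               # linear kernel matrix
F = np.zeros((M + 1, M + 1))
F[0, 1:] = F[1:, 0] = 1.0                 # the "star" border of ones
F[1:, 1:] = K + np.eye(M) / gamma         # K + gamma^{-1} * identity

sol = np.linalg.solve(F, np.concatenate([[0.0], y]))
b, alpha = sol[0], sol[1:]

# Decision function y(x) = sign( sum_k alpha_k (x_k . x) + b )
predict = lambda x: np.sign(alpha @ (X @ x) + b)
print(predict(np.array([1.5])), predict(np.array([-0.5])))
```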
Simulate quantum evolution driven by $\hat F$
To implement the HHL algorithm, we need to simulate the unitary operator

$$e^{-i\hat F\Delta t},$$

in which

$$\hat F = \begin{pmatrix} 0 & \vec 1^T \\ \vec 1 & K + \gamma^{-1}\mathbb 1 \end{pmatrix} = J + K_\gamma, \qquad J = \begin{pmatrix} 0 & \vec 1^T \\ \vec 1 & 0 \end{pmatrix}.$$

The evolution driven by $\hat F$ can be split by the Lie product formula,

$$e^{-i\hat F\Delta t} = e^{-iJ\Delta t}\, e^{-iK_\gamma\Delta t} + O(\Delta t^2).$$

The basis transformation to diagonalize the matrix $J$ (the adjacency matrix of a star graph) is simple: its only nontrivial eigenvalues are $\pm\sqrt M$, with eigenstates

$$|\pm\rangle = \frac{1}{\sqrt 2}\Big(|0\rangle \pm \frac{1}{\sqrt M}\sum_{k=1}^M |k\rangle\Big),$$

in which $|k\rangle$ are the index states, so $e^{-iJ\Delta t}$ is efficiently implementable, and the $\gamma^{-1}\mathbb 1$ part contributes only an easily implemented phase. The evolution driven by the kernel part $K$ is the nontrivial piece, with the density-matrix trick below doing the work.
Now, if we discard the dataset register, i.e., partially trace out the $|\vec x_i\rangle$ part of the qRAM-prepared state $|\chi\rangle = \frac{1}{\sqrt{N_\chi}}\sum_i |\vec x_i|\,|i\rangle|\vec x_i\rangle$ (with $N_\chi = \sum_i|\vec x_i|^2$), we obtain the density matrix

$$\mathrm{tr}_2\,|\chi\rangle\langle\chi| = \frac{1}{N_\chi}\sum_{i,j}\vec x_i\cdot\vec x_j\,|i\rangle\langle j|.$$

If the kernel function is the inner product (the linear kernel), this is exactly the normalized kernel matrix. Now we have the kernel matrix as a density matrix,

$$\hat K = \frac{K}{\mathrm{tr}\,K}.$$
To fix the normalization we also need $\mathrm{tr}\,K = \sum_i |\vec x_i|^2 = N_\chi$, which can be estimated as follows:

- Generate the uniform superposition $\frac{1}{\sqrt M}\sum_i |i\rangle$ and query the qRAM for the norms $|\vec x_i|$.
- Apply a rotation conditioned on $|\vec x_i|$ to an ancilla qubit. When the rotation angle is small enough, the probability to measure the ancilla qubit in $|1\rangle$ is proportional to $\frac{1}{M}\sum_i |\vec x_i|^2$, which allows the estimation of the trace of the kernel matrix.
The time evolution driven by the normalized kernel matrix $\hat K$ is then implemented by the density-matrix-exponentiation trick (S. Lloyd 2014):

$$\mathrm{tr}_1\!\left\{ e^{-iS\Delta t}\,(\hat K\otimes\sigma)\,e^{iS\Delta t}\right\} = \sigma - i\Delta t\,[\hat K,\sigma] + O(\Delta t^2) \approx e^{-i\hat K\Delta t}\,\sigma\,e^{i\hat K\Delta t},$$

in which we have the swap operator $S = \sum_{j,k}|j\rangle\langle k|\otimes|k\rangle\langle j|$; repeating this with $O(t^2/\epsilon)$ copies of $\hat K$ realizes $e^{-i\hat K t}$ to accuracy $\epsilon$.
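The swap-trick identity can be checked numerically (a sketch of our own with random single-qubit density matrices, not from the paper); since $S^2 = \mathbb 1$, the exponential $e^{-iS\Delta t}$ has the closed form $\cos(\Delta t)\mathbb 1 - i\sin(\Delta t) S$:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_density(d):
    """Random d x d density matrix (positive, unit trace)."""
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    return rho / np.trace(rho)

d = 2
rho, sigma = random_density(d), random_density(d)

# Swap operator on C^d (x) C^d; S^2 = 1 gives e^{-iS dt} = cos(dt) - i sin(dt) S.
S = np.zeros((d*d, d*d))
for j in range(d):
    for k in range(d):
        S[j*d + k, k*d + j] = 1.0
dt = 1e-3
Udt = np.cos(dt) * np.eye(d*d) - 1j * np.sin(dt) * S

out = Udt @ np.kron(rho, sigma) @ Udt.conj().T
out = out.reshape(d, d, d, d).trace(axis1=0, axis2=2)   # partial trace over copy 1

target = sigma - 1j * dt * (rho @ sigma - sigma @ rho)   # sigma - i dt [rho, sigma]
print(np.max(np.abs(out - target)))   # O(dt^2), i.e. tiny
```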
After the training process, i.e., once we obtain the amplitude-encoded vector $|b,\vec\alpha\rangle$, the classification of a new item $\vec x$ proceeds as follows:

- Query the qRAM of the training set with the addressing register being $|b,\vec\alpha\rangle$, then prepare the state

$$|\tilde u\rangle = \frac{1}{\sqrt{N_u}}\Big(b\,|0\rangle|0\rangle + \sum_{k=1}^M \alpha_k|\vec x_k|\,|k\rangle|\vec x_k\rangle\Big).$$

- Prepare the state of the new item:

$$|\tilde x\rangle = \frac{1}{\sqrt{N_x}}\Big(|0\rangle|0\rangle + \sum_{k=1}^M |\vec x|\,|k\rangle|\vec x\rangle\Big).$$

- With an ancilla qubit, prepare the state $\frac{1}{\sqrt 2}\big(|0\rangle_a|\tilde u\rangle + |1\rangle_a|\tilde x\rangle\big)$ (this seems to be a non-trivial process, but it could be efficient on a well-structured qRAM).
- Make a measurement on the ancilla qubit in the basis $|\pm\rangle = \frac{1}{\sqrt 2}(|0\rangle\pm|1\rangle)$; the probability to find $|-\rangle$ is $P = \frac12\big(1-\langle\tilde u|\tilde x\rangle\big)$. So the prediction of the SVM is

$$y(\vec x) = \mathrm{sgn}\big(\langle\tilde u|\tilde x\rangle\big),$$

i.e., $+1$ when $P < \frac12$ and $-1$ otherwise.
Quantum PCA
Principal Component Analysis (PCA) is a typical unsupervised learning algorithm in classical machine learning. For a given dataset $\{\vec x_i\}$, PCA finds the orthogonal directions (principal components) along which the variance of the data is largest, i.e., the leading eigenvectors of the covariance matrix.

It reduces to an eigensystem problem; by omitting the principal components whose eigenvalues (the variance along them) fall below a threshold, PCA can also be applied for dimensionality reduction and low-rank approximation.
In 2014, S. Lloyd et al. proposed the quantum version of PCA (S. Lloyd 2014). The algorithm is built on the techniques of the HHL algorithm, thus the authors claim their algorithm is exponentially faster than the classical case (but it still suffers the caveats of the HHL algorithm).
The algorithm begins in a purely quantum way; unlike for the SVM, qRAM is optional for qPCA. We follow the paper (S. Lloyd 2014); the relation between qPCA and the classical case will be discussed at the end of this part. Given many copies of a density matrix $\rho$, the swap-based density-matrix exponentiation (the same trick used for the kernel matrix above) lets us apply

$$e^{-i\rho t}$$

efficiently. Thus, with the controlled time evolution and phase estimation, we can prepare the eigenvalue estimates of $\rho$ (with the spectral decomposition $\rho = \sum_i r_i|\chi_i\rangle\langle\chi_i|$).

Applying this subroutine to the initial state $\rho$ itself yields approximately

$$\sum_i r_i\,|\tilde r_i\rangle\langle\tilde r_i| \otimes |\chi_i\rangle\langle\chi_i|.$$

The first part is the register storing the estimated eigenvalues of $\rho$, and the second part the corresponding eigenstates; sampling from this state reveals the principal components within exponentially shorter time than the classical case, if the number of copies of $\rho$ is sufficient and $\rho$ is dominated by a few large eigenvalues.
Implement qPCA on classical dataset
For a classical dataset, qPCA requires qRAM to load the data and encode it into amplitudes efficiently. When the center-shifted data (i.e., $\sum_i \vec x_i = 0$) are loaded as the mixed state obtained by preparing $|\vec x_i\rangle = \vec x_i/|\vec x_i|$ with probability $|\vec x_i|^2/\sum_j|\vec x_j|^2$, the result is

$$\rho = \frac{1}{\sum_j |\vec x_j|^2}\sum_i \vec x_i\vec x_i^T,$$

where the right-hand side is exactly the normalized covariance matrix of the centered data. Thus, using this state as the initial state, we can implement qPCA for classical data.
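This correspondence is easy to verify classically (a sketch with made-up anisotropic toy data): the density matrix built from the centered data is the normalized covariance matrix, and its top eigenvector is the classical first principal component.

```python
import numpy as np

# Made-up toy data: 200 points in 3-D, stretched strongly along the first axis.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.1])
X -= X.mean(axis=0)                       # center-shift: sum_i x_i = 0

# rho = sum_i x_i x_i^T / sum_j |x_j|^2  (unit trace, positive semidefinite)
rho = X.T @ X / np.sum(X**2)

evals, evecs = np.linalg.eigh(rho)        # eigenvalues in ascending order
top = evecs[:, -1]                        # first principal component
print(np.abs(top))                        # dominated by the first coordinate
```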
In a recent paper (T. Xin 2021), an experimental protocol for quantum PCA has been proposed. Instead of the above approach for a general density matrix, they use a parameterized quantum circuit, trained on a training set, to find the decomposition of a fixed matrix. With such a variational, hybrid classical-quantum approach, they claim their protocol could work well on near-term quantum devices.
The goal of their qPCA is to find the diagonalization of a given matrix $\rho$, i.e., a unitary $U$ such that $U\rho U^\dagger$ is diagonal.

The theoretical support is:

[Theorem]: Given a semidefinite positive and non-degenerate Hermitian operator $\rho$ and a fixed non-degenerate diagonal Hermitian operator $H = \sum_j E_j|j\rangle\langle j|$, the unitary $U$ minimizing the objective function $L = \mathrm{tr}\,(U\rho U^\dagger H)$ would diagonalize $\rho$, i.e., $U\rho U^\dagger$ is diagonal in the eigenbasis of $H$.
Proof

Given the objective function $L = \mathrm{tr}\,(U\rho U^\dagger H)$, where $\rho = \sum_i r_i|\chi_i\rangle\langle\chi_i|$ with non-degenerate $r_i$, we can write

$$L = \sum_{i,j} r_i E_j\,|\langle j|U|\chi_i\rangle|^2 = \sum_{i,j} r_i E_j\, D_{ji},$$

in which $D_{ji} = |\langle j|U|\chi_i\rangle|^2$ is a doubly stochastic matrix. Over the doubly stochastic matrices, this linear function attains its minimum at a permutation matrix (Birkhoff's theorem); since the $r_i$ and $E_j$ are non-degenerate, the minimizing permutation pairs the largest $r_i$ with the smallest $E_j$, and so on. A permutation matrix $D$ means $U|\chi_i\rangle = |j(i)\rangle$ up to phases, hence we have $U\rho U^\dagger = \sum_i r_i\,|j(i)\rangle\langle j(i)|$, which is diagonal. $\square$

Note that with this protocol, the eigenvalues of $\rho$ come out sorted against the spectrum of $H$, so the principal components can be read off directly.
The qPCA via VQC (variational quantum circuit) is to minimize the objective function $L(\vec\theta) = \mathrm{tr}\big(U(\vec\theta)\,\rho\,U^\dagger(\vec\theta)\,H\big)$ with the parameterized circuit $U(\vec\theta)$.
Fast-grad for VQC
In the paper, they used a VQC of the form

$$U(\vec\theta) = \prod_k U_k(\theta_k), \qquad U_k(\theta_k) = e^{-i\theta_k P_k/2},$$

in which the set of generators $P_k$ are Pauli operators, so $P_k^2 = \mathbb 1$. For such circuits the gradient can be measured by the parameter-shift rule,

$$\frac{\partial L}{\partial\theta_k} = \frac12\Big(L\big(\theta_k + \tfrac{\pi}{2}\big) - L\big(\theta_k - \tfrac{\pi}{2}\big)\Big).$$

This can be proved by noticing $e^{-i\theta P/2} = \cos(\theta/2)\,\mathbb 1 - i\sin(\theta/2)\,P$, so the objective is a trigonometric polynomial of degree one in each $\theta_k$. The gradient with respect to each parameter thus costs only two extra circuit evaluations, and the full gradient can be obtained efficiently.
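The parameter-shift rule is easy to verify for a single gate (a toy check of our own with $P = X$ and observable $Z$, for which the objective is $L(\theta) = \cos\theta$):

```python
import numpy as np

P = np.array([[0, 1], [1, 0]])    # Pauli X generator, P^2 = 1
H = np.array([[1, 0], [0, -1]])   # observable (Pauli Z)
psi0 = np.array([1.0, 0.0])

def U(theta):
    # exp(-i theta P / 2) = cos(theta/2) I - i sin(theta/2) P, since P^2 = 1
    return np.cos(theta/2) * np.eye(2) - 1j * np.sin(theta/2) * P

def L(theta):
    psi = U(theta) @ psi0
    return np.real(psi.conj() @ H @ psi)   # here L(theta) = cos(theta)

theta = 0.7
shift = 0.5 * (L(theta + np.pi/2) - L(theta - np.pi/2))   # parameter-shift rule
numeric = (L(theta + 1e-6) - L(theta - 1e-6)) / 2e-6      # finite difference

print(shift, numeric)   # both equal dL/dtheta = -sin(theta)
```

Unlike the finite difference, the shifted evaluations use a macroscopic shift of $\pi/2$, so the rule is exact and robust to shot noise on real hardware.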