All posts by Emil Björnson

How Distortion from Nonlinear Massive MIMO Transceivers is Radiated Spatially

While the research literature is full of papers that design wireless communication systems under constraints on the maximum transmitted power, in practice, it might be constraints on the equivalent isotropically radiated power (EIRP) or the out-of-band radiation that limit the system operation.

Christopher Mollén recently defended his doctoral thesis entitled High-End Performance with Low-End Hardware: Analysis of Massive MIMO Base Station Transceivers. In the following video, he explains the basics of how the non-linear distortion from Massive MIMO transceivers is radiated in space.

A Basic Way to Quantify the Massive MIMO Gain

Several people have recently asked me for a simple way to quantify the spectral efficiency gains that we can expect from Massive MIMO. In theory, going from 4 to 64 antennas is just a matter of changing a parameter value. However, many practical issues need to be solved to bring the technology into reality, and the solutions might only be developed if we can convince ourselves that the gains are sufficiently large.

While there is no theoretical upper limit on how spectrally efficient Massive MIMO can become when adding more antennas, we need to set some reasonable first goals.  Currently, many companies are trying to implement analog beamforming in a cost-efficient manner. That will allow for narrow beamforming, but not spatial multiplexing.

By following the methodology in Section 3.3.3 in Fundamentals of Massive MIMO, a simple formula for the downlink spectral efficiency is:

\begin{equation*}K \cdot \left( 1 - \frac{K}{\tau_c} \right) \cdot \log_2 \left( 1+ \frac{ c_{ \textrm{\tiny CSI}} \cdot M \cdot \frac{\mathrm{SNR}}{K}}{\mathrm{SNR}+ 1} \right) \tag{1}\end{equation*}

where $M$ is the number of base-station antennas, $K$ is the number of spatially multiplexed users, $c_{ \textrm{\tiny CSI}}  \in [0,1]$ is the quality of the channel estimates, and $\tau_c$ is the number of channel uses per channel coherence block. For simplicity, I have assumed the same pathloss for all the users. The variable $\mathrm{SNR}$ is the nominal signal-to-noise ratio (SNR) of a user,  achieved when $M=K=1$. Eq. (1) is a rigorous lower bound on the sum capacity, achieved under the assumptions of maximum ratio precoding, i.i.d. Rayleigh fading channels, and equal power allocation. With better processing schemes, one can achieve substantially higher performance.

To get an even simpler formula, let us approximate (1) as

\begin{equation*}K \log_2 \left( 1+ \frac{ c_{ \textrm{\tiny CSI}} M}{K} \right) \tag{2}\end{equation*}

by assuming a large channel coherence and negligible noise.

What does the formula tell us?

If we increase $M$ while $K$ is fixed, we will observe a logarithmic improvement in spectral efficiency. This is what analog beamforming can achieve for $K=1$ and, hence, I am a bit concerned that the industry will be disappointed with the gains that they will obtain from such beamforming in 5G.

If we instead increase $M$ and $K$ jointly, so that  $M/K$ stays constant, then the spectral efficiency will grow linearly with the number of users. Note that the same transmit power is divided between the $K$ users, but the power-reduction per user is compensated by increasing the array gain $M$ so that the performance per user remains the same.

The largest gains come from spatial multiplexing

To give some quantitative numbers, consider a baseline system with $M=4$ and $K=1$ that achieves 2 bit/s/Hz. If we increase the number of antennas to $M=64$, the spectral efficiency will become 5.6 bit/s/Hz. This is the gain from beamforming. If we also increase the number of users to $K=16$ users, we will get 32 bit/s/Hz. This is the gain from spatial multiplexing. Clearly, the largest gains come from spatial multiplexing and adding many antennas is a necessary way to facilitate such multiplexing.
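The numbers above can be reproduced with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the value $c_{ \textrm{\tiny CSI}} = 0.75$ is not stated in the text but inferred so that Eq. (2) gives the 2 bit/s/Hz baseline for $M=4$ and $K=1$.

```python
import math


def se_lower_bound(M, K, snr, c_csi, tau_c):
    """Eq. (1): downlink sum spectral efficiency in bit/s/Hz."""
    pre_log = K * (1 - K / tau_c)
    effective_sinr = c_csi * M * (snr / K) / (snr + 1)
    return pre_log * math.log2(1 + effective_sinr)


def se_approx(M, K, c_csi):
    """Eq. (2): large coherence block and negligible noise."""
    return K * math.log2(1 + c_csi * M / K)


# Assumption: c_csi = 0.75 is chosen so that the baseline equals 2 bit/s/Hz,
# since log2(1 + 0.75 * 4) = 2.
C_CSI = 0.75

baseline = se_approx(M=4, K=1, c_csi=C_CSI)
beamforming = se_approx(M=64, K=1, c_csi=C_CSI)
multiplexing = se_approx(M=64, K=16, c_csi=C_CSI)

print(f"Baseline     (M=4,  K=1):  {baseline:.2f} bit/s/Hz")
print(f"Beamforming  (M=64, K=1):  {beamforming:.2f} bit/s/Hz")
print(f"Multiplexing (M=64, K=16): {multiplexing:.2f} bit/s/Hz")
```

Under this assumption, the three values come out as 2.00, 5.61, and 32.00 bit/s/Hz. Calling `se_lower_bound` with a very large `snr` and `tau_c` gives nearly the same values, which is the regime where approximation (2) applies.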

This analysis has implicitly assumed full digital beamforming. An analog or hybrid beamforming approach may achieve most of the array gain for $K=1$. However, although hybrid beamforming allows for spatial multiplexing, I believe that the gains will be substantially smaller than with full digital beamforming.

Holographic Beamforming versus Massive MIMO

Last year, the startup company Pivotal Commware secured venture capital (e.g., from Bill Gates) to bring its holographic beamforming technology to commercial products. Despite the word “holographic”, this is not a technology focused on visible-light communications. Instead, the company uses passive electronically steered antennas (PESAs) that are designed for radio-frequencies (RFs) in the micro- and millimeter-wave bands. It is the impedance pattern created in the distribution network over the array that is called a “hologram” and different holograms lead to beamforming in different spatial directions. The company reportedly aims at having commercial products ready this year.

Will the futuristic-sounding holographic beamforming make Massive MIMO obsolete? Not at all, because this is a new implementation architecture, not a new beamforming scheme or spatial multiplexing method. According to the company’s own white paper, the goal is to deliver “a new dynamic beamforming technique using a Software Defined Antenna (SDA) that employs the lowest C-SWaP (Cost, Size, Weight, and Power)”. Simply speaking, it is a way to implement a phased array in a thin, conformable, and affordable way. The PESAs are constructed using high volume commercial off-the-shelf components. Each PESA has a single RF-input and a distribution network that is used to vary the directivity of the beamforming. With a single RF-input, only single-user single-stream beamforming is possible. As explained in Section 1.3 in my recent book, such single-user beamforming can improve the SINR, but the rate only grows logarithmically with the number of antennas. Nevertheless, cost-efficient single-stream beamforming from massive arrays is one of the first issues that the industry tries to solve, in preparation for a full-blown Massive MIMO deployment.

The largest gains from multiple antenna technologies come from spatial multiplexing of many users, using a Massive MIMO topology where the inter-user interference is reduced by making the beams narrower as more users are to be multiplexed. The capacity then grows linearly with the number of users, as also explained in Section 1.3 of my book.

Can holographic beamforming be used to implement Massive MIMO with spatial multiplexing of tens of users? Yes, similar to hybrid beamforming, one could deploy an array of PESAs, where each PESA is used to transmit to one user. Eric J. Black, CTO and founder of Pivotal Commware, refers to this as “sub-aperture based SDMA”. If you want the capability of serving ten users simultaneously, you will need ten PESAs.

If the C-SWaP of holographic beamforming is as low as claimed, the technology might have the key to cost-efficient deployment of Massive MIMO. The thin and conformable form factor also makes me think about the recent concept of Distributed Large Intelligent Surface, where rooms are decorated with small antenna arrays to provide seamless connectivity.

Origin of the “Massive MIMO” Name

“A dear child has many names” is a Swedish saying and it certainly applies to Massive MIMO. It is commonly claimed that the Massive MIMO concept originates from the seminal paper “Noncooperative Cellular Wireless with Unlimited Numbers of Base Station Antennas” by Thomas Marzetta, published in 2010. This is basically correct, except for the fact that the paper only talks about “Multi-user MIMO systems with very large antenna arrays”. Marzetta then published several papers using the large-scale antenna systems (LSAS) terminology, before switching to calling it Massive MIMO in more recent years. Over the years, various papers have also called it “very large multiuser MIMO” and “large-scale MIMO”. Nowadays, Massive MIMO is used by almost everyone in the research community, and even by marketing people.

If you search on IEEE Xplore, the origin of the name remains puzzling. The earliest papers are “Massive MIMO: How many antennas do we need?” by Hoydis/ten Brink/Debbah and “Achieving Large Spectral Efficiency with TDD and Not-so-Many Base-Station Antennas” by Huh/Caire/Papadopoulos/Ramprashad, both from 2011. However, these papers are referring to Marzetta’s seminal paper, which doesn’t call it “Massive MIMO”.

If you instead read the news reports by ZDNet and Silicon from the 2010 Bell Labs Open Days in Paris, the origin of “Massive MIMO” becomes clearer. Marzetta presented his concept and reportedly said that “We haven’t been able to come up with a catchy name”, but told ZDNet that “massive MIMO” and “large-scale MIMO” were two candidates. To the Massive MIMO blog, Marzetta now explains why he initially abandoned these potential names, in favor of LSAS:

When I explained the concept to the Bell Labs Director of Research, he commented that it didn’t sound at all like MIMO to him. He recommended strongly that I think of a name that didn’t contain the acronym “MIMO”, hence, LSAS. Eventually (after everyone else called it Massive MIMO) I abandoned “LSAS” and started to call it “Massive MIMO”.

In conclusion, the Massive MIMO name came originally from Marzetta, who used it when first describing the concept to the public, but the name was popularized by other researchers.

Relax and Conquer

Many radio resource allocation tasks are combinatorial in nature. It might be to associate a user equipment (UE) to a base station (BS) from a set of BSs, to select a set of time-frequency resources for transmission to a particular UE, or to assign pilot sequences to a set of users. The unfortunate thing with discrete combinatorial optimization problems is that the number of combinations grows very rapidly with the number of UEs and the number of discrete choices that can be made for each of them. For example, suppose there are $K$ UEs and you have to pick one out of $D$ options for each of them; then there are $D^K$ different combinations. Hence, the worst-case computational complexity grows exponentially with $K$.
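To get a feeling for how quickly the search space grows, the following small Python sketch enumerates every assignment by brute force and compares the count with $D^K$. The function name and the values of `D` and `K` are arbitrary choices for illustration.

```python
import itertools


def enumerate_assignments(num_ues, num_options):
    """Yield every way to pick one of num_options for each of num_ues UEs."""
    return itertools.product(range(num_options), repeat=num_ues)


D = 3  # hypothetical number of options per UE (e.g., candidate BSs)
for K in (2, 4, 8):
    # Counting by exhaustive enumeration is only feasible for small K.
    count = sum(1 for _ in enumerate_assignments(K, D))
    print(f"K={K}: {count} combinations (D**K = {D ** K})")
```

Already with $D=3$ and $K=8$ there are 6561 combinations, and each additional UE multiplies the count by $D$.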

Interestingly, some radio resource allocation problems that appear to have exponential complexity can be relaxed to a form that is much easier to solve – this is what I call “relax and conquer”. In optimization theory, relaxation means that you widen the set of permissible solutions to the problem, which in this context means that the discrete optimization variables are replaced with continuous optimization variables. In many cases, it is easier to solve optimization problems with variables that take values in continuous sets than problems with a mix of continuous and discrete variables.

A basic example of this principle arises when communicating over a single-user MIMO channel. To maximize the achievable rate, you first need to select how many data streams to spatially multiplex and then determine the precoding and power allocation for these data streams. This appears to be a mixed-integer optimization problem, but Telatar showed in his seminal paper that it can be solved by the water-filling algorithm. More precisely, you relax the problem by assuming that the maximum number of data streams is transmitted and then you let the solution to a convex optimization problem determine how many of the data streams are assigned non-zero power; this is the optimal number of data streams. Despite the relaxation, the global optimum to the original problem is obtained.
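The power allocation step can be sketched in Python as follows. This is a textbook-style water-filling routine, not code from any of the cited papers. The function name and the example gains are made up, and `gains` is assumed to hold the squared singular values of the channel normalized by the noise power, all strictly positive.

```python
import math


def water_filling(gains, total_power):
    """Maximize sum(log2(1 + g_i * p_i)) subject to sum(p_i) = total_power, p_i >= 0.

    The solution has the form p_i = max(0, mu - 1/g_i), where mu is the water level.
    """
    if not gains:
        return []
    # Sort the streams from strongest to weakest.
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    inv = [1.0 / gains[i] for i in order]
    # Start with all streams active and drop the weakest one until the
    # water level lies above the "floor" 1/g of every remaining stream.
    for n in range(len(inv), 0, -1):
        mu = (total_power + sum(inv[:n])) / n
        if mu > inv[n - 1]:
            break
    powers = [0.0] * len(gains)
    for rank, i in enumerate(order):
        if rank < n:
            powers[i] = mu - inv[rank]
    return powers


example_gains = [2.0, 1.0, 0.1]  # hypothetical normalized stream gains
powers = water_filling(example_gains, total_power=1.0)
num_streams = sum(p > 0 for p in powers)
rate = sum(math.log2(1 + g * p) for g, p in zip(example_gains, powers))
print(powers, num_streams, rate)
```

In this example, the weakest stream is assigned zero power, so two streams are transmitted. The number of streams is never selected explicitly: it falls out of the continuous power allocation, which is the point of the relaxation.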

There are other, less known examples of the “relax and conquer” method. Some years ago, I came across the paper “Jointly optimal downlink beamforming and base station assignment”, which has received much less attention than it deserves. The UE-BS association problem, considered in this paper, is non-trivial since some BSs might have many more UEs in their vicinity than other BSs. Nevertheless, the paper shows that one can solve the problem by first relaxing it so that all BSs transmit to all the UEs. The author formulates a relaxed optimization problem where the beamforming vectors (including power allocation) are selected to satisfy each UE’s SINR constraint, while minimizing the total transmit power. This problem is solved by convex optimization and, importantly, the optimal solution is always such that each UE only receives a non-zero signal power from one of the BSs. Hence, the seemingly difficult combinatorial UE-BS association problem is relaxed to a convex optimization problem, which provides the optimal solution to the original problem!

I have reused this idea in several papers. The first example is “Massive MIMO and Small Cells: Improving Energy Efficiency by Optimal Soft-cell Coordination”, which considers a similar setup but with a maximum transmit power per BS. The consequence of including this practical constraint is that it might happen that some UEs are served by multiple BSs at the optimal solution. These BSs send different messages to the UE, which decodes them by successive interference cancelation; thus, the solution is still practically achievable.

One practical weakness with the two aforementioned papers is that they take small-scale fading realizations into account in the optimization; thus, the problem must be solved once per coherence interval, requiring extremely high computational power. More recently, in the paper “Joint Power Allocation and User Association Optimization for Massive MIMO Systems”, we applied the same “relax and conquer” method to Massive MIMO, but targeting lower bounds on the downlink ergodic capacity. Since the capacity bounds are valid as long as the channel statistics are fixed (and the same UEs are active), our optimized BS-UE association can be utilized for a relatively long time period. This makes the proposed algorithm practically relevant, in contrast to the prior works that are more of academic interest.

Another example of the “relax and conquer” method is found in the paper “Joint Pilot Design and Uplink Power Allocation in Multi-Cell Massive MIMO Systems”. We consider the assignment of orthogonal pilot sequences to users, which appears to be a combinatorial problem. Instead of assigning a pilot sequence to each UE and then allocating power, we relax the problem by allowing each user to design its own pilot sequence, which is a linear combination of the original orthogonal sequences. Hence, a pair of UEs might have partially overlapping sequences, instead of either identical or orthogonal sequences (as in the original problem). The relaxed problem even allows for pilot contamination within a cell. The sequences are then optimized to maximize the performance of the worst-off user (max-min fairness). The resulting problem is non-convex, but the combinatorial structure has been relaxed so that there are only optimization variables from continuous sets. A local optimum to the joint pilot assignment and power control problem is found with polynomial complexity, using standard methods from the optimization literature. The optimization might not lead to a set of orthogonal pilot sequences, but the solution is practically implementable and gives better performance.

The Common SINR Mistake

We are used to measuring performance in terms of the signal-to-interference-and-noise ratio (SINR), but this is seldom the actual performance metric in communication systems. In practice, we might be interested in a function of the SINR, such as the data rate (a.k.a. spectral efficiency), bit-error-rate, or mean-squared error in the data detection. When the receiver has perfect channel state information (CSI), the aforementioned metrics are all functions of the same SINR expression, where the power of the received signal is divided by the power of the interference plus noise. Details can be found in Examples 1.6-1.8 of the book Optimal Resource Allocation in Coordinated Multi-Cell Systems.

In most cases, the receiver only has imperfect CSI and then it is harder to measure the performance. In fact, it took me years to understand this properly. To explain the complications, consider the uplink of a single-cell Massive MIMO system with K single-antenna users and M antennas at the base station. The received M-dimensional signal is

    $$\mathbf{y} = \sum_{i=1}^{K} \mathbf{h}_{i} x_{i} + \mathbf{n}$$

where $x_{i}$ is the unit-power information signal from user $i$, $\mathbf{h}_{i} \in \mathbb{C}^{M}$ is the fading channel from this user, and $\mathbf{n}\in \mathbb{C}^{M}$ is unit-power additive Gaussian noise. In general, the base station will only have access to an imperfect estimate $\hat{\mathbf{h}}_{i} \in \mathbb{C}^{M}$ of $\mathbf{h}_{i}$, for $i=1,\ldots,K$.

Suppose the base station uses  $\hat{\mathbf{h}}_{1},\ldots,\hat{\mathbf{h}}_{K}$ to select a receive combining vector $\mathbf{v}_k\in \mathbb{C}^{M}$ for user $k$. The base station then multiplies it with $\mathbf{y}$ to form a scalar that is supposed to resemble the information signal $x_{k}$:

    $$\mathbf{v}_k^H \mathbf{y} = \underbrace{\mathbf{v}_k^H \mathbf{h}_{k} x_{k}}_{\textrm{Desired signal}} + \underbrace{\sum_{i=1, i \neq k}^{K} \mathbf{v}_k^H\mathbf{h}_{i} x_{i}}_{\textrm{Interference}} + \underbrace{\mathbf{v}_k^H \mathbf{n}}_{\textrm{Noise}}.$$

From this expression, a common mistake is to directly say that the SINR is

    $$\mathrm{SINR}_k^\textrm{wrong} = \frac{| \mathbf{v}_k^H \mathbf{h}_{k}|^2}{ \sum_{i=1, i \neq k}^{K}  | \mathbf{v}_k^H \mathbf{h}_{i}|^2 + \| \mathbf{v}_k \|^2},$$

which is obtained by computing the power of each of the terms (averaged over the signal and noise), and then to claim that $\mathbb{E}\{\log_2(1+\mathrm{SINR}_k^\textrm{wrong} )\}$ is an achievable rate (where the expectation is with respect to the random channels). You can find this type of argument in many papers, without proof of the information-theoretic achievability of this rate value. Clearly, $\mathrm{SINR}_k^\textrm{wrong} $ is an SINR, in the sense that the numerator contains the total signal power and the denominator contains the interference power plus noise power. However, this doesn’t mean that you can plug $\mathrm{SINR}_k^\textrm{wrong} $ into “Shannon’s capacity formula” and get something sensible. This will only yield a correct result when the receiver has perfect CSI.

A basic (but non-conclusive) test of the correctness of a rate expression is to check that the receiver can compute the expression based on its available information (i.e., estimates of random variables and deterministic quantities). Any expression containing $\mathrm{SINR}_k^\textrm{wrong}$ fails this basic test since you need to know the exact channel realizations $\mathbf{h}_{1},\ldots,\mathbf{h}_{K}$ to compute it, although the receiver only has access to the estimates.

What is the right approach?

Remember that the SINR is not important by itself, but we should start from the performance metric of interest and then we might eventually interpret a part of the expression as an effective SINR. In Massive MIMO, we are usually interested in the ergodic capacity. Since the exact capacity is unknown, we look for rigorous lower bounds on the capacity. There are several bounding techniques to choose between, whereof I will describe the two most common ones.

The first lower bound on the uplink capacity can be applied when  the channels are Gaussian distributed and $\hat{\mathbf{h}}_{1}, \ldots, \hat{\mathbf{h}}_{K}$ are the MMSE estimates with the corresponding estimation error covariance matrices $\mathbf{C}_{1},\ldots,\mathbf{C}_{K}$. The ergodic capacity of user $k$ is then lower bounded by

$$R_k^{(1)} = \mathbb{E} \left\{ \log_2 \left(  1 + \frac{| \mathbf{v}_k^H \hat{\mathbf{h}}_{k}|^2}{ \sum_{i=1, i \neq k}^{K}  | \mathbf{v}_k^H \hat{\mathbf{h}}_{i}|^2 + \sum_{i=1}^{K}   \mathbf{v}_k^H \mathbf{C}_{i} \mathbf{v}_k  + \| \mathbf{v}_k \|^2}   \right) \right\}.$$

Note that this expression can be computed at the receiver using only the available channel estimates (and deterministic quantities). The ratio inside the logarithm can be interpreted as an effective SINR, in the sense that the rate is equivalent to that of a fading channel where the receiver has perfect CSI and an SNR equal to this effective SINR. A key difference from $\mathrm{SINR}_k^\textrm{wrong}$ is that only the part of the desired signal that is received along the estimated channel appears in the numerator of the SINR, while the rest of the desired signal appears as $\mathbf{v}_k^H \mathbf{C}_{k} \mathbf{v}_k$ in the denominator. This is the price to pay for having imperfect CSI at the receiver, according to this capacity bound, which has been used by Hoydis et al. and Ngo et al., among others.

The second lower bound on the uplink capacity is

$$R_k^{(2)} =  \log_2 \left(  1 + \frac{ | \mathbb{E}\{ \mathbf{v}_k^H \mathbf{h}_{k} \} |^2}{ \sum_{i=1}^{K}  \mathbb{E} \{ | \mathbf{v}_k^H \mathbf{h}_{i}|^2 \}  - | \mathbb{E}\{ \mathbf{v}_k^H \mathbf{h}_{k} \} |^2+ \mathbb{E}\{\| \mathbf{v}_k \|^2\} }   \right),$$

which can be applied for any channel fading distribution. This bound provides a value close to $R_k^{(1)}$ when there is substantial channel hardening in the system, while $R_k^{(2)}$ will greatly underestimate the capacity when $\mathbf{v}_k^H \mathbf{h}_{k}$ varies a lot between channel realizations. The reason is that to obtain this bound, the receiver detects the signal as if it is received over a non-fading channel with gain $\mathbb{E}\{ \mathbf{v}_k^H \mathbf{h}_{k} \}$ (which is deterministic and thus known in theory, and easy to measure in practice), but there are no approximations involved so $R_k^{(2)}$ is always a valid bound.

Since all the terms in $R_k^{(2)}$ are deterministic, the receiver can clearly compute it using its available information. The main merit of $R_k^{(2)}$ is that the expectations in the numerator and denominator can sometimes be computed in closed form; for example, when using maximum-ratio and zero-forcing combining with i.i.d. Rayleigh fading channels or maximum-ratio combining with correlated Rayleigh fading. Two early works that used this bound are by Marzetta and by Jose et al.
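Both bounds can be evaluated numerically. The following Python sketch estimates $R_k^{(1)}$ and $R_k^{(2)}$ by Monte Carlo simulation for maximum-ratio combining, $\mathbf{v}_k = \hat{\mathbf{h}}_k$, under simplifying assumptions that are not taken from the text: every user has an i.i.d. Rayleigh fading channel with variance `beta` per antenna, the MMSE estimate has variance `gamma` per antenna with an independent estimation error (so that $\mathbf{C}_i = (\beta-\gamma)\mathbf{I}$), and there is no pilot contamination. Under these assumptions, the expectations in $R_k^{(2)}$ work out to the effective SINR $M\gamma/(K\beta+1)$, which the simulation should approach. All function names and parameter values are hypothetical.

```python
import math
import random


def crandn(rng, var):
    """One sample from a circularly symmetric complex Gaussian CN(0, var)."""
    s = math.sqrt(var / 2)
    return complex(rng.gauss(0, s), rng.gauss(0, s))


def inner(a, b):
    """Inner product a^H b of two complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def simulate(M=16, K=4, beta=1.0, gamma=0.8, trials=2000, seed=1):
    """Return Monte Carlo estimates (R1, R2) for user k = 0 with MR combining."""
    rng = random.Random(seed)
    k = 0
    sum_gain = 0j    # accumulates v^H h_k
    sum_power = 0.0  # accumulates sum_i |v^H h_i|^2
    sum_norm = 0.0   # accumulates ||v||^2
    sum_log = 0.0    # accumulates the log2 term inside R1
    for _ in range(trials):
        # Assumed model: true channel = estimate + independent estimation error.
        h_hat = [[crandn(rng, gamma) for _ in range(M)] for _ in range(K)]
        h = [[x + crandn(rng, beta - gamma) for x in row] for row in h_hat]
        v = h_hat[k]  # maximum-ratio combining
        norm2 = inner(v, v).real
        # Terms of the second bound (expectations over the true channels).
        sum_gain += inner(v, h[k])
        sum_power += sum(abs(inner(v, h[i])) ** 2 for i in range(K))
        sum_norm += norm2
        # Terms of the first bound (uses only the estimates and C_i).
        interference = sum(abs(inner(v, h_hat[i])) ** 2 for i in range(K) if i != k)
        error_term = K * (beta - gamma) * norm2  # sum_i v^H C_i v
        sum_log += math.log2(1 + norm2 ** 2 / (interference + error_term + norm2))
    mean_gain2 = abs(sum_gain / trials) ** 2
    sinr2 = mean_gain2 / (sum_power / trials - mean_gain2 + sum_norm / trials)
    return sum_log / trials, math.log2(1 + sinr2)


r1, r2 = simulate()
closed_form = math.log2(1 + 16 * 0.8 / (4 * 1.0 + 1))
print(f"R1 (simulated):   {r1:.3f} bit/s/Hz")
print(f"R2 (simulated):   {r2:.3f} bit/s/Hz")
print(f"R2 (closed form): {closed_form:.3f} bit/s/Hz")
```

Note that the first bound uses only the estimates and the error covariances inside the logarithm, while the second bound averages over the true channels before taking the logarithm, which is why it collapses to a deterministic number.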

The two uplink rate expressions can be proved using capacity bounding techniques that have been floating around in the literature for more than a decade; the main principle for computing capacity bounds for the case when the receiver has imperfect CSI is found in a paper by Médard from 2000. The first concise description of both bounds (including all the necessary conditions for using them) is found in Fundamentals of Massive MIMO. The expressions that are presented above can be found in Section 4 of the new book Massive MIMO Networks. In these two books, you can also find the right ways to compute rigorous lower bounds on the downlink capacity in Massive MIMO.

In conclusion, to avoid mistakes, always start with rigorously computing the performance metric of interest. If you are interested in the ergodic capacity, then you start from one of the canonical capacity bounds in the above-mentioned books and verify that all the required conditions are satisfied. Then you may interpret part of the expression as an SINR.