
When Normalization is Dangerous

The signal-to-noise ratio (SNR) generally depends on the transmit power $P$, the channel gain $\beta$, and the noise power $\sigma^2$:

\begin{equation*}\mathrm{SNR} = \frac{P \beta}{\sigma^2}.\end{equation*}

Since the spectral efficiency (bit/s/Hz) and many other performance metrics of interest depend on the SNR, and not the individual values of the three parameters, it is a common practice to normalize one or two of the parameters to unity. This habit makes it easier to interpret performance expressions, to select reasonable SNR ranges, and to avoid mistakes in analytical derivations.
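
To make the normalization concrete, here is a small numerical sketch (the parameter values are examples of my own, not taken from any particular paper):

```python
# Minimal sketch: the SNR is the same whether we use physical parameters
# or a normalized channel gain, as long as the ratio P*beta/sigma2 is kept.
P = 1.0            # transmit power: 1 W (example value)
beta = 1e-7        # channel gain: -70 dB (example value)
sigma2 = 1e-13     # noise power: -100 dBm = 1e-13 W (example value)

snr_physical = P * beta / sigma2

# Normalized parametrization: beta = sigma2 = 1 and a "normalized transmit power"
P_normalized = snr_physical
snr_normalized = P_normalized * 1.0 / 1.0

print(snr_physical, snr_normalized)  # both equal 1e6 (60 dB)
```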

There are, however, situations where it is the absolute value of the transmitted/received signal power that matters, and not the relative value with respect to the noise power, as measured by the SNR. In these situations, it is easy to make mistakes if you use normalized parameters. I see this type of error far too often, both as a reviewer and in published papers. I will give some specific examples below, but I won’t tell you who has made these mistakes, so as not to point the finger at anyone specifically.

Wireless energy transfer

Electromagnetic radiation can be used to transfer energy to wireless receivers. In such wireless energy transfer, it is the received signal energy that is harvested by the receiver, not the SNR. Since the noise power is extremely small, the numerical value of the SNR is (at least) a billion times larger than the received signal power measured in Watts. Hence, a normalization error can lead to crazy conclusions, such as being able to transfer energy at a rate of 1 W instead of 1 nW. The former is enough to keep a wireless transceiver on continuously, while the latter requires you to harvest energy for a long time before you can turn the transceiver on for a brief moment.
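
To illustrate the orders of magnitude involved, here is a small sketch with example values of my own choosing:

```python
# Received power vs. SNR in a wireless energy transfer scenario (example values).
P = 1.0          # transmit power: 1 W
beta = 1e-9      # channel gain: -90 dB
sigma2 = 1e-13   # noise power: -100 dBm

P_received = P * beta          # 1e-9 W = 1 nW, what the harvester actually gets
snr = P_received / sigma2      # 1e4 (40 dB), a dimensionless number

# Mistaking the SNR value for the received power would suggest about 10 kW of
# harvested power instead of 1 nW in this example.
print(P_received, snr)
```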

Energy efficiency

The energy efficiency (EE) of a wireless transmission is measured in bit/Joule. The EE is computed as the ratio between the data rate (bit/s) and the power consumption (Watt = Joule/s). While the data rate depends on the SNR, the power consumption does not. The same SNR value can be achieved over a long propagation distance by using a high transmit power or over a short distance by using a low transmit power. The EE will be vastly different in these cases. If a “normalized transmit power” is used instead of the actual transmit power when computing the EE, one can get EEs that are a million times smaller than they should be. As a rule of thumb, if you compute things correctly, you will get EE numbers in the range of 10 kbit/Joule to 10 Mbit/Joule.
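
Here is a minimal sketch of the EE computation, with example values of my own choosing. It shows how the same SNR (and thus the same rate) gives very different EE depending on the actual transmit power, and how misusing the normalized transmit power shrinks the EE further:

```python
import math

# EE (bit/Joule) = data rate (bit/s) / power consumption (Joule/s). Example values only.
B = 10e6             # bandwidth: 10 MHz
sigma2 = 4e-14       # noise power over 10 MHz (about -104 dBm)
P_circuit = 10.0     # circuit power consumption: 10 W

def energy_efficiency(P_tx, beta):
    snr = P_tx * beta / sigma2                # dimensionless
    rate = B * math.log2(1 + snr)             # bit/s
    return rate / (P_tx + P_circuit)          # bit/Joule

# The same SNR (and thus the same rate) at two different distances:
ee_far = energy_efficiency(P_tx=40.0, beta=2.5e-13)   # long distance, high power
ee_near = energy_efficiency(P_tx=0.04, beta=2.5e-10)  # short distance, low power

# Normalization error: using the SNR value (here 250) as if it were a power in W
snr = 250.0
ee_wrong = B * math.log2(1 + snr) / (snr + P_circuit)

print(ee_far, ee_near, ee_wrong)  # roughly 1.6 and 8 Mbit/Joule, versus a much smaller value
```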

Noise power depends on the bandwidth

The noise power is proportional to the communication bandwidth. When working with a normalized noise power, it is easy to forget that a given SNR value only applies for one particular value of the bandwidth.

Some papers normalize the noise variance and channel gain, but then make the SNR equal to the unnormalized transmit power (measured in W). This may greatly overestimate the SNR, but the achievable rates might still be in a reasonable range if you operate the system in an interference-limited regime.

Some papers contain an alternative EE definition where the spectral efficiency (bit/s/Hz) is divided by the power consumption (Joule/s). This leads to the alternative EE unit bit/Joule/Hz. This definition is not formally wrong, but it gives the misleading impression that one can multiply the EE value by any bandwidth of choice to get the desired number of bit/Joule. That is not the case, since the SNR only holds for one particular value of the bandwidth.
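
To see the bandwidth dependence numerically, here is a small sketch (my own example values; the noise figure, channel gain, and circuit power are arbitrary assumptions):

```python
import math

# The noise power grows linearly with the bandwidth: N = k_B * T * B * F.
k_B, T = 1.380649e-23, 290.0              # Boltzmann constant and room temperature
F = 10 ** (7 / 10)                        # 7 dB receiver noise figure (example)
P_tx, beta, P_circuit = 1.0, 1e-10, 9.0   # example transmit power, channel gain, circuit power

def ee_bit_per_joule(B):
    snr = P_tx * beta / (k_B * T * B * F)    # the SNR itself depends on the bandwidth
    return B * math.log2(1 + snr) / (P_tx + P_circuit)

ee_per_hz_at_10MHz = ee_bit_per_joule(10e6) / 10e6   # bit/Joule/Hz measured at 10 MHz

naive_100MHz = ee_per_hz_at_10MHz * 100e6   # naive rescaling to 100 MHz
true_100MHz = ee_bit_per_joule(100e6)       # correct value at 100 MHz
print(naive_100MHz, true_100MHz)            # the naive rescaling overestimates the EE
```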

Knowing when to normalize

In summary, even if it is convenient to normalize system parameters in wireless communications, you should only do it if you understand when normalization is possible and when it is not. Otherwise, you can make embarrassing mistakes, such as submitting a paper where the results are off by six orders of magnitude. And, unfortunately, several such papers have been published, and these create a vicious circle by tricking others into making the same mistakes.

Further Differences Between Massive MIMO for Sub-6 GHz and mmWave

One of the most read posts on this blog is Six differences between Massive MIMO for sub-6 GHz and mmWave, where we briefly outlined the key differences between how the Massive MIMO technology would be implemented and utilized in different frequency bands. Motivated by the great feedback and interest in this topic, we joined forces with Liesbet Van der Perre and Stefano Buzzi to write a full-length magazine article. It has recently been submitted to IEEE Wireless Communications and a pre-print can be found on ArXiv.org:

Massive MIMO in Sub-6 GHz and mmWave: Physical, Practical, and Use-Case Differences

How Distortion from Nonlinear Massive MIMO Transceivers is Radiated Spatially

While the research literature is full of papers that design wireless communication systems under constraints on the maximum transmitted power, in practice, it might be constraints on the equivalent isotropically radiated power (EIRP) or the out-of-band radiation that limit the system operation.

Christopher Mollén recently defended his doctoral thesis entitled High-End Performance with Low-End Hardware: Analysis of Massive MIMO Base Station Transceivers. In the following video, he explains the basics of how the non-linear distortion from Massive MIMO transceivers is radiated in space.

A Basic Way to Quantify the Massive MIMO Gain

Several people have recently asked me for a simple way to quantify the spectral efficiency gains that we can expect from Massive MIMO. In theory, going from 4 to 64 antennas is just a matter of changing a parameter value. However, many practical issues need to be solved to bring the technology into reality, and the solutions might only be developed if we can convince ourselves that the gains are sufficiently large.

While there is no theoretical upper limit on how spectrally efficient Massive MIMO can become when adding more antennas, we need to set some reasonable first goals.  Currently, many companies are trying to implement analog beamforming in a cost-efficient manner. That will allow for narrow beamforming, but not spatial multiplexing.

Following the methodology in Section 3.3.3 of Fundamentals of Massive MIMO, one obtains a simple formula for the downlink spectral efficiency:

(1)   \begin{equation*}K \cdot \left( 1 - \frac{K}{\tau_c} \right) \cdot \log_2 \left( 1+ \frac{ c_{ \textrm{\tiny CSI}} \cdot M \cdot \frac{\mathrm{SNR}}{K}}{\mathrm{SNR}+ 1} \right)\end{equation*}

where $M$ is the number of base-station antennas, $K$ is the number of spatially multiplexed users, $c_{ \textrm{\tiny CSI}}  \in [0,1]$ is the quality of the channel estimates, and $\tau_c$ is the number of channel uses per channel coherence block. For simplicity, I have assumed the same pathloss for all the users. The variable $\mathrm{SNR}$ is the nominal signal-to-noise ratio (SNR) of a user,  achieved when $M=K=1$. Eq. (1) is a rigorous lower bound on the sum capacity, achieved under the assumptions of maximum ratio precoding, i.i.d. Rayleigh fading channels, and equal power allocation. With better processing schemes, one can achieve substantially higher performance.
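For reference, here is a minimal implementation of Eq. (1); the code is my own and the example parameter values are arbitrary:

```python
import math

def dl_spectral_efficiency(M, K, snr, c_csi=1.0, tau_c=200):
    """Sum spectral efficiency [bit/s/Hz] according to Eq. (1):
    maximum ratio precoding, i.i.d. Rayleigh fading, equal power allocation."""
    prelog = K * (1 - K / tau_c)                        # pilot/estimation overhead
    effective_sinr = c_csi * M * (snr / K) / (snr + 1)  # per-user effective SINR
    return prelog * math.log2(1 + effective_sinr)

# Example: M = 64 antennas, K = 16 users, 10 dB nominal SNR, perfect CSI
print(dl_spectral_efficiency(M=64, K=16, snr=10.0, c_csi=1.0, tau_c=200))
```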

To get an even simpler formula, let us approximate (1) as

(2)   \begin{equation*}K \log_2 \left( 1+ \frac{ c_{ \textrm{\tiny CSI}} M}{K} \right)\end{equation*}

by assuming a large channel coherence and negligible noise.

What does the formula tell us?

If we increase $M$ while $K$ is fixed, we will observe a logarithmic improvement in spectral efficiency. This is what analog beamforming can achieve for $K=1$ and, hence, I am a bit concerned that the industry will be disappointed with the gains that it will obtain from such beamforming in 5G.

If we instead increase $M$ and $K$ jointly, so that $M/K$ stays constant, then the spectral efficiency will grow linearly with the number of users. Note that the same transmit power is divided between the $K$ users, but the power reduction per user is compensated by the larger array gain $M$, so that the performance per user remains the same.

The largest gains come from spatial multiplexing

To give some quantitative numbers, consider a baseline system with $M=4$ and $K=1$ that achieves 2 bit/s/Hz. If we increase the number of antennas to $M=64$, the spectral efficiency becomes 5.6 bit/s/Hz. This is the gain from beamforming. If we also increase the number of users to $K=16$, we get 32 bit/s/Hz. This is the gain from spatial multiplexing. Clearly, the largest gains come from spatial multiplexing, and adding many antennas is necessary to facilitate such multiplexing.
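
These numbers can be reproduced from the approximation (2) if we set $c_{ \textrm{\tiny CSI}} = 0.75$, which is the value that makes the baseline come out at exactly 2 bit/s/Hz:

```python
import math

def se_approx(M, K, c_csi=0.75):
    # Approximate sum spectral efficiency from Eq. (2)
    return K * math.log2(1 + c_csi * M / K)

print(se_approx(M=4, K=1))    # baseline:           2.0 bit/s/Hz
print(se_approx(M=64, K=1))   # beamforming gain:   ~5.6 bit/s/Hz
print(se_approx(M=64, K=16))  # + multiplexing:     32.0 bit/s/Hz
```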

This analysis has implicitly assumed full digital beamforming. An analog or hybrid beamforming approach may achieve most of the array gain for $K=1$. However, although hybrid beamforming allows for spatial multiplexing, I believe that the gains will be substantially smaller than with full digital beamforming.

Are Link Simulations Needed Anymore?

One reason why capacity lower bounds are so useful is that they are accurate proxies for link-level performance with modern coding. But this fact, well known to information and coding theorists, is often contested by practitioners. I will discuss some possible reasons for that here.

The recipe is to compute the capacity bound and then, depending on the code blocklength, add a dB or a few to the required SNR. That gives the link performance prediction. The coding literature is full of empirical results showing how far from capacity a code of a given blocklength is for the AWGN channel, and this gap is usually not extremely different for other channel models – although one should always check this.
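
As a minimal sketch of this recipe for an AWGN-like channel (the 1.5 dB gap below is just a placeholder; the actual value depends on the code and the blocklength):

```python
import math

def required_snr_dB(rate_bits_per_channel_use, gap_dB=1.5):
    """Predict the SNR needed to support a given rate: invert the AWGN capacity
    log2(1 + SNR) and add an empirical gap for the finite-blocklength code."""
    snr_capacity = 2 ** rate_bits_per_channel_use - 1
    return 10 * math.log10(snr_capacity) + gap_dB

print(required_snr_dB(2.0))   # e.g. 2 bit/s/Hz: 4.77 dB + gap
```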

But there are three main caveats with this recipe:

  1. First, the capacity bound, or the “SINR” that it often contains, must be information-theoretically correct. A great many papers get this wrong. Emil explained some common errors in his blog post last week. The recommended approach is to map the channel onto one of the canonical cases in Figure 2.9 in Fundamentals of Massive MIMO, verify that the technical conditions are satisfied, and use the corresponding formula.
  2. When computing expressions of the type E[log(1+”SINR”)], the average should be taken over all quantities that are random within the duration of a codeword. Typically, this means averaging over the randomness incurred by the noise, the channel estimation errors, and in many cases the small-scale fading. All other parameters must be kept fixed. Typically, user positions, path losses, shadow fading, scheduling, and pilot assignments are fixed, so the expectation is conditional on those. (Yet, the interference statistics may vary substantially if other users are dropping in and out of the system.) This, in turn, means that many “drops” have to be generated, where these parameters are drawn at random, and then CDF curves with respect to that second level of randomness need to be computed numerically; see the sketch after this list. Think of the expectation E[log(1+”SINR”)] as a “link simulation”: every codeword sees many independent noise realizations, and typically many small-scale fading realizations, but the same realization of the user positions. Also, neat (and tight) closed-form bounds on E[log(1+”SINR”)] are often available.
  3. Care is advised when working with relatively short blocks (less than a few hundred bits) and at rates close to the constrained capacity of the foreseen modulation format. In this case, many of the “standard” capacity bounds become overoptimistic. As a rule of thumb, compare the capacity of an AWGN channel with the constrained capacity of the chosen modulation at the spectral efficiency of interest: if the gap is small, the capacity bounds will be useful. If not, then reconsider the choice of modulation format! (See also homework problem 1.4.)
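
To make the two levels of randomness in item 2 explicit, here is a schematic Monte-Carlo sketch. The SINR model is a deliberately simplified placeholder of my own (single-antenna links, Rayleigh fading, no channel estimation errors); only the simulation structure matters here:

```python
import numpy as np

rng = np.random.default_rng(0)

def sinr_one_realization(gains, noise_power=1.0):
    # Simplified placeholder: single-antenna Rayleigh fading links, unit transmit
    # powers; user 0 is the desired user, the rest are interferers.
    h = np.sqrt(gains / 2) * (rng.standard_normal(gains.size)
                              + 1j * rng.standard_normal(gains.size))
    desired = np.abs(h[0]) ** 2
    interference = np.sum(np.abs(h[1:]) ** 2)
    return desired / (interference + noise_power)

num_drops, num_fading = 200, 1000
ergodic_rates = np.zeros(num_drops)

for d in range(num_drops):                        # second level: user positions, path losses
    gains = 10 ** (-rng.uniform(0, 3, size=4))    # hypothetical large-scale fading gains
    # first level: average over what is random within a codeword (fading, noise)
    rates = [np.log2(1 + sinr_one_realization(gains)) for _ in range(num_fading)]
    ergodic_rates[d] = np.mean(rates)             # approximates E[log2(1 + "SINR")]

# CDF over the drops, i.e., over the second level of randomness
cdf_x = np.sort(ergodic_rates)
cdf_y = np.arange(1, num_drops + 1) / num_drops
```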

How far are the bounds from the actual capacity typically? Nobody knows, but there are good reasons to believe they are extremely close. Here (Figure 1) is a nice example that compares a decoder that uses the measured channel likelihood, instead of assuming a Gaussian (which is implied by the typical bounding techniques). From correspondence with one of the authors: “The dashed and solid lines are the lower bound obtained by Gaussianizing the interference, while the circles are the rate achievable by a decoder exploiting the non-Gaussianity of the interference, painfully computed through days-long Monte-Carlo. (This is not exactly the capacity, because the transmit signals here are Gaussian, so one could deviate from Gaussian signaling and possibly do slightly better — but the difference is imperceptible in all the experiments we’ve done.)”

Concerning Massive MIMO and its capacity bounds, I have long encountered arguments that these capacity formulas aren’t useful estimates of actual performance. But in fact, they are: in one simulation study, we were less than one dB from the capacity bound by using QPSK and a standard LDPC code (albeit with fairly long blocks). This bound accounts for noise and channel estimation errors. Such examples are given in Chapter 1 of Fundamentals of Massive MIMO and in the ten-myth paper.

(I wrote the simulation code, and can share it, in case anyone would want to reproduce the graphs.)

So, in summary: while capacity bounds are sometimes done wrong, when done right they give pretty good estimates of actual link performance with modern coding.

(With thanks to Angel Lozano for discussions.)

Relax and Conquer

Many radio resource allocation tasks are combinatorial in nature. It might be to associate a user equipment (UE) to a base station (BS) from a set of BSs, to select a set of time-frequency resources for transmission to a particular UE, or to assign pilot sequences to a set of users. The unfortunate thing with discrete combinatorial optimization problems is that the number of combinations grows very rapidly with the number of UEs and the number of discrete options available for each of them. For example, suppose there are $K$ UEs and you have to pick one out of $D$ options for each of them; then there are $D^K$ different combinations. Hence, the worst-case computational complexity grows exponentially with $K$.

Interestingly, some radio resource allocation problems that appear to have exponential complexity can be relaxed to a form that is much easier to solve – this is what I call “relax and conquer”. In optimization theory, relaxation means that you widen the set of permissible solutions to the problem, which in this context means that the discrete optimization variables are replaced with continuous optimization variables. In many cases, it is easier to solve optimization problems with variables that take values in continuous sets than problems with a mix of continuous and discrete variables.

A basic example of this principle arises when communicating over a single-user MIMO channel. To maximize the achievable rate, you first need to select how many data streams to spatially multiplex and then determine the precoding and power allocation for these data streams. This appears to be a mixed-integer optimization problem, but Telatar showed in his seminal paper that it can be solved by the water-filling algorithm. More precisely, you relax the problem by assuming that the maximum number of data streams are transmitted and then you let the solution to a convex optimization problem determine how many of the data streams are assigned non-zero power; this is the optimal number of data streams. Despite the relaxation, the global optimum to the original problem is obtained.
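
For completeness, here is a minimal water-filling sketch (my own illustration, using bisection on the water level), showing how the relaxed continuous problem automatically switches off the weak data streams:

```python
import numpy as np

def water_filling(channel_gains, total_power):
    """Allocate power p_i = max(0, mu - 1/g_i) over the eigenmodes with gains g_i,
    where the water level mu is chosen so that the powers sum to total_power.
    Streams that end up with zero power are simply not used."""
    g = np.asarray(channel_gains, dtype=float)
    lo, hi = 0.0, total_power + 1.0 / g.min()   # the water level lies in this range
    for _ in range(100):                        # bisection on the water level
        mu = (lo + hi) / 2
        p = np.maximum(0.0, mu - 1.0 / g)
        if p.sum() > total_power:
            hi = mu
        else:
            lo = mu
    return np.maximum(0.0, (lo + hi) / 2 - 1.0 / g)

# Example: four eigenmodes, but only the strong ones get non-zero power
p = water_filling([2.0, 1.0, 0.2, 0.05], total_power=1.0)
print(p, p.sum())
```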

There are other, less well-known examples of the “relax and conquer” method. Some years ago, I came across the paper “Jointly optimal downlink beamforming and base station assignment”, which has received much less attention than it deserves. The UE-BS association problem considered in this paper is non-trivial, since some BSs might have many more UEs in their vicinity than other BSs. Nevertheless, the paper shows that one can solve the problem by first relaxing it so that all BSs transmit to all the UEs. The author formulates a relaxed optimization problem where the beamforming vectors (including power allocation) are selected to satisfy each UE’s SINR constraint, while minimizing the total transmit power. This problem is solved by convex optimization and, importantly, the optimal solution is always such that each UE only receives a non-zero signal power from one of the BSs. Hence, the seemingly difficult combinatorial UE-BS association problem is relaxed to a convex optimization problem, which provides the optimal solution to the original problem!

I have reused this idea in several papers. The first example is “Massive MIMO and Small Cells: Improving Energy Efficiency by Optimal Soft-cell Coordination”, which considers a similar setup but with a maximum transmit power per BS. The consequence of including this practical constraint is that some UEs might be served by multiple BSs at the optimal solution. These BSs send different messages to the UE, which decodes them by successive interference cancellation, so the solution is still practically achievable.

One practical weakness with the two aforementioned papers is that they take small-scale fading realizations into account in the optimization; thus, the problem must be solved once per coherence interval, which requires extremely high computational power. More recently, in the paper “Joint Power Allocation and User Association Optimization for Massive MIMO Systems”, we applied the same “relax and conquer” method to Massive MIMO, but targeting lower bounds on the downlink ergodic capacity. Since the capacity bounds are valid as long as the channel statistics are fixed (and the same UEs are active), our optimized BS-UE association can be utilized for a relatively long time period. This makes the proposed algorithm practically relevant, in contrast to the prior works, which are more of academic interest.

Another example of the “relax and conquer” method is found in the paper “Joint Pilot Design and Uplink Power Allocation in Multi-Cell Massive MIMO Systems”. We consider the assignment of orthogonal pilot sequences to users, which appears to be a combinatorial problem. Instead of assigning a pilot sequence to each UE and then allocating power, we relax the problem by allowing each user to design its own pilot sequence as a linear combination of the original orthogonal sequences. Hence, a pair of UEs might have partially overlapping sequences, instead of either identical or orthogonal sequences (as in the original problem). The relaxed problem even allows for pilot contamination within a cell. The sequences are then optimized to maximize the max-min performance. The resulting problem is non-convex, but the combinatorial structure has been relaxed away so that all optimization variables belong to continuous sets. A local optimum to the joint pilot assignment and power control problem is found with polynomial complexity, using standard methods from the optimization literature. The optimization might not lead to a set of orthogonal pilot sequences, but the solution is practically implementable and gives better performance.
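
To illustrate the relaxation itself (not the optimization algorithm from the paper), here is a small sketch where each UE's pilot is a linear combination of the orthogonal basis sequences. The coefficients are drawn at random purely for illustration, whereas the paper optimizes them for max-min performance:

```python
import numpy as np

tau_p, K = 8, 12                      # pilot length and number of UEs (example values)
basis = np.eye(tau_p)                 # tau_p orthogonal pilot sequences

# Original combinatorial problem: assign one column of `basis` to each UE.
# Relaxed problem: each UE uses a linear combination of the basis sequences,
# i.e., a continuous coefficient vector (random here, optimized in the paper).
rng = np.random.default_rng(1)
W = rng.standard_normal((tau_p, K))
W /= np.linalg.norm(W, axis=0)        # unit pilot energy per UE
pilots = basis @ W                    # tau_p x K matrix of pilot sequences

# Pairwise pilot overlap: 1 on the diagonal, values between 0 and 1 off the
# diagonal, so two UEs can be partially (rather than fully or not at all) contaminated.
overlap = np.abs(pilots.T @ pilots)
print(np.round(overlap, 2))
```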