Multi-user MIMO (MU-MIMO) is not a new technology, but the basic concept of using multi-antenna base stations (BSs) to serve a multitude of users has been around since the late 1980s.

I sometimes get the question “Isn’t Massive MIMO just MU-MIMO with more antennas?” My answer is no, because the key benefit of Massive MIMO over conventional MU-MIMO is not only about the number of antennas. Marzetta’s Massive MIMO concept is the way to deliver the theoretical gains of MU-MIMO under practical circumstances. To achieve this goal, we need to acquire accurate channel state information, which in general can only be done by exploiting uplink pilots and channel reciprocity in TDD mode. Thanks to the channel hardening and favorable propagation phenomena, one can also simplify the system operation in Massive MIMO.

Six key differences between conventional MU-MIMO and Massive MIMO are provided below.

Conventional MU-MIMO

Massive MIMO

Relation between number of BS antennas (M) and users (K)

M ≈ K and both are small (e.g., below 10)

M ≫ K and both can be large (e.g., M=100 and K=20).

Duplexing mode

Designed to work with both TDD and FDD operation

Designed for TDD operation to exploit channel reciprocity

Channel acquisition

Mainly based on codebooks with set of predefined angular beams

Based on sending uplink pilots and exploiting channel reciprocity

Link quality after precoding/combining

Varies over time and frequency, due to frequency-selective and small-scale fading

Almost no variations over time and frequency, thanks to channel hardening

Resource allocation

The allocation must change rapidly to account for channel quality variations

The allocation can be planned in advance since the channel quality varies slowly

Cell-edge performance

Only good if the BSs cooperate

Cell-edge SNR increases proportionally to the number of antennas, without causing more inter-cell interference

Footnote: TDD stands for time-division duplex and FDD stands for frequency-division duplex.

I’ve got an email with this question last week. There is not one but many possible answers to this question, so I figured that I write a blog post about it.

One answer is that beamforming and precoding are two words for exactly the same thing, namely to use an antenna array to transmit one or multiple spatially directive signals.

Another answer is that beamforming can be divided into two categories: analog and digital beamforming. In the former category, the same signal is fed to each antenna and then analog phase-shifters are used to steer the signal emitted by the array. This is what a phased array would do. In the latter category, different signals are designed for each antenna in the digital domain. This allows for greater flexibility since one can assign different powers and phases to different antennas and also to different parts of the frequency bands (e.g., subcarriers). This makes digital beamforming particularly desirable for spatial multiplexing, where we want to transmit a superposition of signals, each with a separate directivity. It is also beneficial when having a wide bandwidth because with fixed phases the signal will get a different directivity in different parts of the band. The second answer to the question is that precoding is equivalent to digital beamforming. Some people only mean analog beamforming when they say beamforming, while others use the terminology for both categories.

A third answer is that beamforming refers to a single-user transmission with one data stream, such that the transmitted signal consists of one main-lobe and some undesired side-lobes. In contrast, precoding refers to the superposition of multiple beams for spatial multiplexing of several data streams.

A fourth answer is that beamforming refers to the formation of a beam in a particular angular direction, while precoding refers to any type of transmission from an antenna array. This definition essentially limits the use of beamforming to line-of-sight (LoS) communications, because when transmitting to a non-line-of-sight (NLoS) user, the transmitted signal might not have a clear angular directivity. The emitted signal is instead matched to the multipath propagation so that the multipath components that reach the user add constructively.

A fifth answer is that precoding consists of two parts: choosing the directivity (beamforming) and choosing the transmit power (power allocation).

I used to use the word beamforming in its widest meaning (i.e., the first answer), as can be seen in my first book on the topic. However, I have since noticed that some people have a more narrow or specific interpretation of beamforming. Therefore, I nowadays prefer only talking about precoding. In Massive MIMO, I think that precoding is the right word to use since what I advocate is a fully digital implementation, where the phases and powers can be jointly designed to achieve high capacity through spatial multiplexing of many users, in both NLoS and LOS scenarios.

The “Massive MIMO” name is currently being used for both sub-6 GHz and mmWave applications. This can be very confusing because the multi-antenna technology has rather different characteristics in these two applications.

The sub-6 GHz spectrum is particularly useful to provide network coverage, since the pathloss and channel coherence time are relatively favorable at such frequencies (recall that the coherence time is inversely proportional to the carrier frequency). Massive MIMO at sub-6 GHz spectrum can increase the efficiency of highly loaded cells, by upgrading the technology at existing base stations. In contrast, the huge available bandwidths in mmWave bands can be utilized for high-capacity services, but only over short distances due to the severe pathloss and high noise power (which is proportional to the bandwidth). Massive MIMO in mmWave bands can thus be used to improve the link budget.

Six key differences between sub-6 GHz and mmWave operation are provided below:

Sub-6 GHz

mmWave

Deployment scenario

Macro cells with support for high user mobility

Small cells with low user mobility

Number of simultaneous users per cell

Up to tens of users, due to the large coverage area

One or a few users, due to the small coverage area

Main benefit from having many antennas

Spatial multiplexing of tens of users, since the array gain and ability to separate users spatially lead to great spectral efficiency

Beamforming to a single user, which greatly improves the link budget and thereby extends coverage

Channel characteristics

Rich multipath propagation

Only a few propagation paths

Spectral efficiency and bandwidth

High spectral efficiency due to the spatial multiplexing, but small bandwidth

Low spectral efficiency due to few users, large pathloss, and large noise power, but large bandwidth

Transceiver hardware

Fully digital transceiver implementations are feasible and have been prototyped

Hybrid analog-digital transceiver implementations are needed, at least in the first products

Since Massive MIMO was initially proposed by Tom Marzetta for sub-6 GHz applications, I personally recommend to use the “Massive MIMO” name only for that use case. One can instead say “mmWave Massive MIMO” or just “mmWave” when referring to multi-antenna technologies for mmWave bands.

Prof. Erik. G. Larsson gave a 2.5 hour tutorial on the fundamentals of Massive MIMO, which is highly recommended for anyone learning this topic. You can then follow up by reading his book with the same topic.

When you have viewed Erik’s introduction, you can learn more about the state-of-the-art signal processing schemes for Massive MIMO from another talk at the summer school. Dr. Emil Björnson gave a 3 hour tutorial on this topic:

The received signal power is proportional to the number of antennas in Massive MIMO systems. This property is known as the array gain and it can basically be utilized in two different ways.

One option is to let the signal power become times larger than in a single-antenna reference scenario. The increase in SNR will then lead to higher data rates for the users. The gain can be anything from bit/s/Hz to almost negligible, depending on how interference-limited the system is. Another option is to utilize the array gain to reduce the transmit power, to maintain the same SNR as in the reference scenario. The corresponding power saving can be very helpful to improve the energy efficiency of the system.

In the uplink, with single-antenna user terminals, we can choose between these options. However, in the downlink, we might not have a choice. There are strict regulations on the permitted level of out-of-band radiation in practical systems. Since Massive MIMO uses downlink precoding, the transmitted signals from the base station have a stronger directivity than in the single-antenna reference scenario. The signal components that leak into the bands adjacent to the intended frequency band will then also be more directive.

For example, consider a line-of-sight scenario where the precoding creates an angular beam towards the intended user (as illustrated in the figure below). The out-of-band radiation will then get a similar angular directivity and lead to larger interference to systems operating in adjacent bands, if their receivers are close to the user (as the victim in the figure below). To counteract this effect, our only choice might be to reduce the downlink transmit power to keep the worst-case out-of-band radiation constant.

Another alternative is that the regulations are made more flexible with respect to precoded transmissions. The probability that a receiver in an adjacent band is hit by an interfering out-of-band beam, such that the interference becomes times larger than in the reference scenario, reduces with an increasing number of antennas since the beams are narrower. Hence, if one can allow for beamformed out-of-band interference if it occurs with sufficiently low probability, the array gain in Massive MIMO can still be utilized to increase the SNRs. A third option will then be to (partially) reduce the transmit power to also allow for relaxed linearity requirements of the hardware.

I am borrowing the title from a column written by my advisor two decades ago, in the array signal processing gold rush era.

Asymptotic analysis is a popular tool within statistical signal processing (infinite SNR or number of samples), information theory (infinitely long blocks) and more recently, [massive] MIMO wireless communications (infinitely many antennas).

Some caution is strongly advisable with respect to the latter. In fact, there are compelling reasons to avoid asymptotics in the number of antennas altogether:

First, elegant, rigorous and intuitively comprehensible capacity bound formulas are available in closed form.
The proofs of these expressions use basic random matrix theory, but no asymptotics at all.

Second, the notion of “asymptotic limit” or “asymptotic behavior” helps propagate the myth that Massive MIMO somehow relies on asymptotics or “infinite” numbers (or even exorbitantly large numbers) of antennas.

Third, many approximate performance results for Massive MIMO (particularly “deterministic equivalents”) based on asymptotic analysis are complicated, require numerical evaluation, and offer little intuitive insight. (And, the verification of their accuracy is a formidable task.)

Finally, and perhaps most importantly, careless use of asymptotic arguments may yield erroneous conclusions. For example in the effective SINRs in multi-cell Massive MIMO, the coherent interference scales with M (number of antennas) – which yields the commonly held misconception that coherent interference is the main impairment caused by pilot contamination. But in fact, in many relevant circumstances it is not (see case studies here): the main impairment for “reasonable” values of M is the reduction in coherent beamforming gain due to reduced estimation quality, which in turn is independent of M.

In addition, the number of antennas beyond which the far-field assumption is violated is actually smaller than what one might first think (problem 3.14).

Many researchers have analyzed pilot contamination over the six years that have passed since Marzetta uncovered its importance in Massive MIMO systems. We now have a quite good understanding of how to mitigate pilot contamination. There is a plethora of different approaches, whereof many have complementary benefits. If pilot contamination is not mitigated, it will both reduce the array gain and create coherent interference. Some approaches mitigate the pilot interference in the channel estimation phase, while some approaches combat the coherent interference caused by pilot contamination. In this post, I will try to categorize the approaches and point to some key references.

Interference-rejecting precoding and combining

Pilot contamination makes the estimate of a desired channel correlated with the channel from pilot-sharing users in other cells. When these channel estimates are used for receive combining or transmit precoding, coherent interference typically arise. This is particularly the case if maximum ratio processing is used, because it ignores the interference. If multi-cell MMSE processing is used instead, the coherent interference is rejected in the spatial domain. In particular, recent work from Björnson et al. (see also this related paper) have shown that there is no asymptotic rate limit when using this approach, if there is just a tiny amount of spatial correlation in the channels.

Data-aided channel estimation

Another approach is to “decontaminate” the channel estimates from pilot contamination, by using the pilot sequence and the uplink data for joint channel estimation. This have the potential of both improving the estimation quality (leading to a stronger desired signal) and reducing the coherent interference. Ideally, if the data is known, data-aided channel estimation increase the length of the pilot sequences to the length of the uplink transmission block. Since the data is unknown to the receiver, semi-blind estimation techniques are needed to obtain the channel estimates. Ngo et al. and Müller et al. did early works on pilot decontamination for Massive MIMO. Recent work has proved that one can fully decontaminate the estimates, as the length of the uplink block grows large, but it remains to find the most efficient semi-blind decontamination approach for practical block lengths.

Pilot assignment and dimensioning

Which subset of users that share a pilot sequence makes a large difference, since users with large pathloss differences and different spatial channel correlation cause less contamination to each other. Recall that higher estimation quality both increases the gain of the desired signal and reduces the coherent interference. Increasing the number of orthogonal pilot sequences is a straightforward way to decrease the contamination, since each pilot can be assigned to fewer users in the network. The price to pay is a larger pilot overhead, but it seems that a reuse factor of 3 or 4 is often suitable from a sum rate perspective in cellular networks. The joint spatial division and multiplexing (JSDM) provides a basic methodology to take spatial correlation into account in the pilot reuse patterns.

Alternatively, pilot sequences can be superimposed on the data sequences, which gives as many orthogonal pilot sequences as the length of the uplink block and thereby reduces the pilot contamination. This approach also removes the pilot overhead, but it comes at the cost of causing interference between pilot and data transmissions. It is therefore important to assign the right fraction of power to pilots and data. A hybrid pilot solution, where some users have superimposed pilots and some have conventional pilots, may bring the best of both worlds.

If two cells use the same subset of pilots, the exact pilot-user assignment can make a large difference. Cell-center users are generally less sensitive to pilot contamination than cell-edge users, but finding the best assignment is a hard combinatorial problem. There are heuristic algorithms that can be used and also an optimization framework that can be used to evaluate such algorithms.

Multi-cell cooperation

A combination of network MIMO and macro diversity can be utilized to turn the coherent interference into desired signals. This approach is called pilot contamination precoding by Ashikhmin et al. and can be applied in both uplink and downlink. In the uplink, the base stations receive different linear combinations of the user signals. After maximum ratio combining, the coefficients in the linear combinations approach deterministic numbers as the number of antennas grow large. These numbers are only non-zero for the pilot-sharing users. Since the macro diversity naturally creates different linear combinations, the base stations can jointly solve a linear system of equations to obtain the transmitted signals. In the downlink, all signals are sent from all base stations and are precoded in such a way that the coherent interference sent from different base stations cancel out. While this is a beautiful approach for mitigating the coherent interference, it relies heavily on channel hardening, favorable propagation, and i.i.d. Rayleigh fading. It remains to be shown if the approach can provide performance gains under more practical conditions.