# Applying compressive sensing to TEM video: a substantial frame rate increase on any camera

Andrew Stevens^{1,2}, Libor Kovarik^{1}, Patricia Abellan^{1}, Xin Yuan^{2}, Lawrence Carin^{2} and Nigel D. Browning^{1}

**1**:10

https://doi.org/10.1186/s40679-015-0009-3

© Stevens et al. 2015

**Received:** 11 June 2015

**Accepted:** 11 June 2015

**Published:** 13 August 2015

## Abstract

One of the main limitations of imaging at high spatial and temporal resolution during *in-situ* transmission electron microscopy (TEM) experiments is the frame rate of the camera being used to image the dynamic process. While the recent development of direct detectors has provided the hardware to achieve frame rates approaching 0.1 ms, the cameras are expensive and must replace existing detectors. In this paper, we examine the use of coded aperture compressive sensing (CS) methods to increase the frame rate of any camera with simple, low-cost hardware modifications. The coded aperture approach allows multiple sub-frames to be coded and integrated into a single camera frame during the acquisition process, and then extracted upon readout using statistical CS inversion. Here we describe the background of CS and statistical methods in depth and simulate the frame rates and efficiencies for *in-situ* TEM experiments. Depending on the resolution and signal/noise of the image, it should be possible to increase the speed of any camera by more than an order of magnitude using this approach.

**Mathematics Subject Classification:** (2010) 94A08 · 78A15

## Background

*In-situ* transmission electron microscopy (TEM) has established itself as a very powerful analytical technique for its ability to provide a direct insight into the nature of materials under a broad range of environmental conditions. With the recent development of a wide range of *in-situ* TEM stages and dedicated environmental TEM, it is now possible to image materials under high-temperature, gas, and liquid conditions, as well as in other complex electrochemical, optical, and mechanical settings [1–4]. In many of these applications, it is often critical to capture the dynamic evolution of the microstructure with a very high spatial and temporal resolution. While enormous developments in electron optics and the design of *in-situ* cells have been made, leading to significant improvements in achievable resolution [5–7], there still exist many challenges associated with capturing dynamic processes with high temporal-resolution.

At the present time, a majority of *in-situ* TEM video capture is performed with charge-coupled device (CCD) cameras. High-performance commercially available CCD cameras have readout rates in the range of a few tens of MB/s [8], which under appropriate binning conditions can provide video acquisition rates (∼30 ms acquisition rate) [8]. Important progress has been made recently by the introduction of the direct detection camera (DDC), which utilizes CMOS technology, and thus provides an order of magnitude increase of the readout rate—it has been demonstrated that these cameras can be operated in the ms range [9]. Importantly, DDCs provide a new approach by directly recording the incoming electrons without the use of a scintillator. By avoiding the electron-to-light conversion, the DDC achieves unprecedented sensitivity. While improving temporal resolution, the DDC also enables electron dose reduction, another key challenge for *in-situ* TEM imaging. The limitation in implementing this technology (or any other hardware-based acquisition system), however, is that as the frame rates increase, reading out the images becomes a challenge—the issue then becomes a data transfer problem rather than an electron detection problem.

CS combines sensing and compression in one operation, and thus provides an approach that could further improve the temporal resolution of any detector (both CCDs and DDCs). Because the signal is measured in a compressive manner, fewer total measurements are required, which, when applied to TEM video capture, improves the acquisition speed and reduces the electron dose rate. CS is a recent concept and has come to the forefront due to the seminal works of Candès et al. [10] and Donoho [11]. Since those publications, there has been enormous growth in the application of CS and the development of CS variants. The concept of CS has also recently been applied to electron tomography [12] and to the reduction of electron dose in scanning transmission electron microscopy (STEM) imaging [13].

The approach proposed in this paper increases the effective frame rate of any camera by adding a mask/aperture between the sample and the imaging sensor. The mask is moving at a fixed rate so that a sequence of coded images is integrated into a single frame on the sensor. Once the experiment has concluded, the data can be decompressed by the algorithm presented here or by other methods such as GAP [14] or TwIST [15]. The approach presented here is also useful for imaging dose-limited materials. A traditional camera would capture a single image that has integrated a sequence of undamaged and damaged images, whereas CS-TEM would capture a sequence of coded images that can be reconstructed to determine the precise onset of beam damage.

In addition to presenting new results, this article is meant to serve as a general introduction to CS and also to the methods behind the algorithm presented herein, which is fundamentally different from previous approaches. Many of the references are tutorials and reviews (e.g., [16–21]), while others highlight recent developments (e.g., [22–24]). We hope that the descriptions and illustrations provide a starting point for microscopists to enter the related literature.

Before presenting the experimental results, CS theory and a probabilistic recovery approach are reviewed. Next, the *inexpensive* microscope modifications needed to achieve this imaging approach are outlined. In the experiments section, simulated recovery results are shown for palladium nanoparticle oxidation and silver nanoparticle coalescence. Finally, for both simulations, image degradation is quantified as a function of compression level, and an estimate for a reasonable compression level is given.

## Methods

CS has quickly become one of the most important discoveries in the digital age. The theory of CS, and numerous implementations, shows that a signal can be compressed at the time of measurement and accurately recovered at a later time in software. In imaging applications, the compression can be applied spatially to reduce the number of pixels that need to be measured. This can lead to an increase in sensing speed, a decrease in data size, and dose reduction in the case of electron microscopy [13]. In video applications, the time dimension can be compressed. By compressing the sensed data in time, the total frame rate of a camera system is multiplied by integrating a sequence of coded images into a single frame from the camera. In this section, the statistical models and microscope hardware for an approach to compressively sensing and recovering videos will be described.

### CS background

^{1}

[…]^{2}, but in practical applications and in this paper the sensing scheme will simply omit pixels randomly. The measurements are linear so they can be represented as a matrix Φ and the true signal as a vector x (flattened from the two-dimensional image). Expressed mathematically,

\[ \boldsymbol{y} = \boldsymbol{\Phi}\boldsymbol{x}, \qquad \boldsymbol{\Phi}\in\mathbb{R}^{Q\times P}, \tag{1} \]

where *Q* is the dimension of the compressed measurement, \(\boldsymbol {y}\in \mathbb {R}^{Q}\), and *P* is the dimension of the signal \(\boldsymbol {x}\in \mathbb {R}^{P}\). The inverse problem of recovering x from y is underdetermined, so further assumptions must be imposed to guarantee a solution.
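As a minimal sketch of the random pixel omission scheme described above, the sensing matrix can be built from rows of the identity matrix, one row per retained pixel. The function name `random_omission_matrix`, the toy 4×4 image, and the 25% retention rate are illustrative assumptions, not settings from this paper.

```python
import numpy as np

def random_omission_matrix(num_pixels, keep_fraction, rng):
    """Build Phi (Q x P) whose rows are identity rows for the retained pixels."""
    num_kept = max(1, int(round(keep_fraction * num_pixels)))
    kept = np.sort(rng.choice(num_pixels, size=num_kept, replace=False))
    phi = np.zeros((num_kept, num_pixels))
    phi[np.arange(num_kept), kept] = 1.0
    return phi, kept

rng = np.random.default_rng(0)
image = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"
x = image.ravel()                                  # flattened signal, P = 16
phi, kept = random_omission_matrix(x.size, keep_fraction=0.25, rng=rng)
y = phi @ x                                        # compressed measurement, Q = 4
```

With this choice of Φ the measurement is simply the list of retained pixel values, which makes it clear why recovering the full x from y needs additional assumptions.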

[…] {y_{1},…,y_{N}}, a set of sensing matrices, {Φ_{1},…,Φ_{N}}, and a set of signals {x_{1},…,x_{N}},^{3} with the index *i* added to Eq. (1),

\[ \boldsymbol{y}_{i} = \boldsymbol{\Phi}_{i}\boldsymbol{x}_{i}, \qquad i = 1,\dots,N, \tag{2} \]

where {x_{1},…,x_{N}} are patches from the full image. Usually the patches are overlapping so that each pixel has a corresponding patch, except for the right and bottom regions of the image. Figure 4 is an illustration of the patches and how they overlap. The sensing matrices, measurements, and signals are all obtained by extracting patches from the corresponding full-size images. In the case of the signal, the CS algorithm will recover the patches x_{i} and then the patches are put back together and the overlapping pixels are averaged.
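The patch extraction and the averaging of overlapping pixels can be sketched as follows. This is an illustrative two-dimensional version under assumed settings (an 8×8 image, 4×4 patches, a step of 2); the helper names are hypothetical and the paper's own implementation is not shown here.

```python
import numpy as np

def extract_patches(image, size, step):
    """Return a list of ((row, col), patch) for patches on a regular grid."""
    rows, cols = image.shape
    patches = []
    for r in range(0, rows - size + 1, step):
        for c in range(0, cols - size + 1, step):
            patches.append(((r, c), image[r:r + size, c:c + size].copy()))
    return patches

def assemble_patches(patches, shape, size):
    """Put patches back in place and average pixels covered by several patches."""
    total = np.zeros(shape)
    count = np.zeros(shape)
    for (r, c), patch in patches:
        total[r:r + size, c:c + size] += patch
        count[r:r + size, c:c + size] += 1
    return total / np.maximum(count, 1)  # pixels covered by no patch stay zero

image = np.arange(64, dtype=float).reshape(8, 8)
patches = extract_patches(image, size=4, step=2)
restored = assemble_patches(patches, image.shape, size=4)
```

If the patches are left unchanged, reassembly returns the original image; in a CS setting each patch would first be replaced by its recovered estimate.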

#### Dictionary learning and sparse-CS

[…]^{4} from the data, which is referred to as the dictionary. The learned dictionary allows every patch to be represented by a weighted sum of a few^{5} dictionary *elements* or vectors (assuming overcompleteness). Because the overcomplete dictionary model enforces the use of only a few basis patches, the data is sparse under the dictionary. This approach is advantageous because the learned dictionary can guarantee a sparse representation, whereas choosing a Fourier basis, for example, does not guarantee sparsity. Two learned dictionaries are depicted in Fig. 5.

The first algorithm for dictionary learning was based on human vision [26]. More recently, a much faster variant was proposed, the *K*-SVD algorithm [27], and Mairal et al. have further improved the *K*-SVD-based approach and given a thorough review of dictionary learning [16]. Another approach, a part of the approach in this paper, is beta-process factor analysis (BPFA) [25]. BPFA has been used in compressive sensing of STEM images [13]. The relationship between optimization/maximum likelihood (*K*-SVD) approaches and Bayesian/sampling (BPFA) approaches is discussed after the details of the BPFA model are introduced.

Another approach that has been applied in image restoration tasks, and specifically to STEM image restoration is the non-local means algorithm [16, 28]. Non-local means uses all of the image patches simultaneously to find a reweighting of the central pixel of each patch. Sparse representation, on the other hand, finds a subset of elements from a dictionary and the corresponding weights to reconstruct an entire patch (dictionary learning simultaneously finds a dictionary). Non-local means is a kernel density estimation method, and when employing the Gaussian kernel, it is closely related to the GMM, which will be explained in detail.

[…] D = [d_{1},…,d_{K}] are the dictionary elements. The number of non-zero elements in w_{i} is much less than the size of the basis *K* (number of columns in D), nnz(w_{i}) ≪ *K*. The choice of basis is important since it should induce sparsity in the w_{i}. The issue of the CS inverse being underdetermined is alleviated by finding solutions w_{i} that are also sparse. In practical applications, the noise ε_{i} must also be considered:

\[ \boldsymbol{y}_{i} = \boldsymbol{\Phi}_{i}\boldsymbol{D}\boldsymbol{w}_{i} + \boldsymbol{\epsilon}_{i}. \]

In the Fourier example above, the signal is recoverable as long as the noise amplitude is not larger than the amplitude of the smallest signal component. The same idea holds for sparse CS.

There are a few applications of sparse-CS in electron microscopy. The first was using *ℓ*_{1} and total variation (TV) regularization to simulate compressive sensing on STEM images and speculate about the application to STEM tomography [12]. It has also been shown that TV regularization is useful in electron tomography [29]. Tomography is closely related to CS, and even more so in electron tomography where it is common to have a missing *wedge* of data due to the inability to acquire all of the projections. More recently, BPFA has been applied to STEM compressive sensing [13], and an optimization approach is reported for compressed STEM imaging and tomography in [24].

#### Manifold-CS

A more recent approach in CS is to assume that the signal is a manifold embedded in a high-dimensional space [30]. Essentially, the intrinsic dimension of the data is smaller than the ambient dimension. Manifold-CS enjoys higher accuracy because the model is more flexible than sparse-CS [31] (sparse-CS is a special case of manifold-CS). A simple example of a manifold is a tube or a sheet through a three-dimensional space that is not self-intersecting. The concept of two-dimensional materials, such as graphene, is similar to the concept of a manifold in an *N*-dimensional space. Another example of a manifold is face images [32]. As the face image changes from happy to angry, as the lighting changes from light to dark, or as the face turns from right to left, the coordinates of the data move along constrained sections of the ambient space—the face manifold. This is not the same as moving along the principal dimensions defined by a principal components analysis (PCA). Manifold approaches learn local structures, whereas PCA-like methods learn global structures.

[…] *N*-dimensional space.^{6} There is no specific structure required for the covering sets, so they can be assumed to be Gaussian, i.e., ellipsoids. Figure 6 shows the covering of a one-dimensional manifold (a curve) through a two-dimensional space. It can be seen that in order to use this approach the centers, orientations, and radii of the ellipsoids must be determined. Furthermore, any point on the manifold can be approximated arbitrarily well by this method simply by increasing the number of ellipsoids and also shrinking them to have a tighter fit. Statistically, having too many ellipsoids can cause undesirable overfitting effects, and mathematically, the number of ellipsoids (if it can be determined) is closely related to the manifold condition number.

[…] *x*- and *y*-axes. The CS inversion process (recovering the signal from compressed measurements) must take compressed measurements and map them back to the signal manifold. The model parameters learned by the MFA make this feasible by constraining the inversion procedure to the manifold.

One difficulty with the standard version of the GMM and factor analysis is that the number of clusters and dimension of the basis must be set *a priori*. Cross-validation can be employed to determine the parameter settings, but it requires splitting the data into several sections and learning the model on each section. Bayesian nonparametrics [34] offers a solution to this problem by including these parameters in the inference of the model. The rest of this section will describe the mathematical details of the GMM, factor analysis, their nonparametric extensions, the MFA, and a description of the hardware needed for a TEM to collect data that can be inverted by CS-MFA.

### Gaussian mixture model

[…] *K*-means. But the GMM goes beyond the primary goal by also finding the uncertainty parameters in the cluster assignments. In Fig. 8, several points lie in the overlap of two ellipses; with *K*-means they would simply be assigned to the nearest ellipse. In some applications, it may be important to know how strongly the algorithm believes a data point belongs to a cluster; this information can be inferred with the GMM.

[…]^{7} In the GMM, the probability of a data point given the means *μ*_{1},…,*μ*_{T}, precisions (inverse variances) *τ*_{1},…,*τ*_{T}, and cluster weights *λ*_{1},…,*λ*_{T} is

\[ p(x_{i}\mid \mu_{1},\dots,\mu_{T},\tau_{1},\dots,\tau_{T},\lambda_{1},\dots,\lambda_{T}) = \sum_{t=1}^{T}\lambda_{t}\,\mathcal{N}\!\left(x_{i}\mid \mu_{t},\tau_{t}^{-1}\right), \tag{6} \]

where *T* is the number of clusters and *t* is a specific cluster number. This says that the data point could lie in any of the clusters, so the probability is the sum over the probability of *x*_{i} being in each cluster. The rest of the hierarchy is defined as

where *t*(*i*) is the cluster number of the *i*th data point and \(\mathcal {G}(\cdot,\cdot)\) is the gamma distribution, the conjugate prior for the precision of a normal distribution. The weight *λ*_{t} determines the proportion of the data in cluster *t*. In Eq. (7), the cluster is known, so the probability is simply defined by the statistics of that cluster. The mean and precision of each cluster are given by Eqs. (8)–(9). The hyperparameters *a*, *b*, *c*, *d* are usually determined using the mean and precision of the entire data set. The cluster proportions are sampled jointly from a symmetric Dirichlet distribution in Eq. (10). The Dirichlet distribution is a multivariate extension of the beta distribution, where each *λ*_{t} ∈ [0,1] and \(\sum _{t=1}^{T} \lambda _{t} = 1\). The parameter *α* > 0 determines the decay rate of *λ*_{1},…,*λ*_{T} and will be discussed more below. Finally, the latent cluster assignments are drawn from a multinomial distribution based on the cluster proportions. The multinomial distribution is a generalization of the Bernoulli distribution; *n* trials (data points) are performed with a chance of success in exactly one of *k* different categories (clusters).
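To illustrate how a GMM expresses uncertainty in cluster membership, the sketch below evaluates the mixture density for scalar data and the posterior probability of each cluster for a given point. The two-cluster parameters are made-up values for illustration, and the function names are hypothetical.

```python
import math

def normal_pdf(x, mean, precision):
    """Density of a scalar normal distribution parameterised by its precision."""
    return math.sqrt(precision / (2.0 * math.pi)) * math.exp(-0.5 * precision * (x - mean) ** 2)

def gmm_pdf(x, means, precisions, weights):
    """Mixture density: sum over clusters of weight times cluster density."""
    return sum(w * normal_pdf(x, m, tau) for m, tau, w in zip(means, precisions, weights))

def responsibilities(x, means, precisions, weights):
    """Posterior probability that x belongs to each cluster (soft assignment)."""
    joint = [w * normal_pdf(x, m, tau) for m, tau, w in zip(means, precisions, weights)]
    total = sum(joint)
    return [j / total for j in joint]

means, precisions, weights = [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5]
resp_mid = responsibilities(0.0, means, precisions, weights)    # point between the clusters
resp_right = responsibilities(3.0, means, precisions, weights)  # point near the second cluster
```

A point midway between two equal clusters receives equal membership probabilities, whereas *K*-means would have to commit it to one cluster.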

[…] *mixes*; a model has mixed when the predicted distribution reaches a steady state. The samples taken before the model mixes are called *burn-in* and are thrown away. Samples taken after the burn-in phase can be used to compute statistical approximations, which will be used later. For the cluster assignments, the probability of *t*(*i*) can be analytically averaged over all possible *λ*_{1},…,*λ*_{T}. This is done by integrating the product of the distributions in Eqs. (10)–(11) with respect to *λ*_{1},…,*λ*_{T}. The result is that the probability of a data item being assigned to a particular cluster is proportional to the number of data items already assigned to that cluster:

where t(−*i*) is the list of all cluster assignments except the *i*th and *n*_{−i,j} is the number of items in cluster *j*, excluding item *i*.
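The conditional assignment probability can be sketched numerically. The formula used below is an assumption: it is the standard result for a symmetric Dirichlet prior with parameter *α*/*T* on each proportion, namely (*n*_{−i,j} + *α*/*T*)/(*n* − 1 + *α*), and it may differ in parameterization from Eq. (12). The counts and the value of *α* are illustrative.

```python
def assignment_probabilities(counts_without_i, alpha):
    """Prior probability of item i joining each of T clusters, given all other items.

    Assumes a symmetric Dirichlet(alpha / T) prior on the cluster proportions:
    p(t(i) = j | t(-i), alpha) = (n_{-i,j} + alpha / T) / (n - 1 + alpha).
    """
    num_clusters = len(counts_without_i)
    n_minus_one = sum(counts_without_i)  # number of items other than item i
    return [(n_j + alpha / num_clusters) / (n_minus_one + alpha)
            for n_j in counts_without_i]

probs = assignment_probabilities([5, 3, 0], alpha=1.0)
```

Larger clusters attract new items more strongly, while an empty cluster keeps a small but non-zero probability controlled by *α*.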

Returning to the number of clusters, it was previously mentioned that it is possible to infer the number of clusters using Bayesian nonparametrics. For the GMM, the nonparametric model is known as the infinite GMM and is produced by modifying the Dirichlet distribution to be a Dirichlet process (DP). There are a few analogies for the DP that have been well circulated in the statistics literature, the Chinese restaurant process (CRP) and the stick breaking process (SBP). In this paper, the CRP and SBP, which are equivalent to the DP, will be introduced; theoretical details of DP mixture models can be found in [17, 35, 36].

[…] *n* is the current number of customers, *n*_{t} is the number of customers at table *t*, and *α* is the parameter related to the rate new tables are set up. To form a draw from a CRP, the infinity of customers are seated at their tables sequentially and after every customer has been seated the proportion of customers at each table determines \(\{\lambda _{t}\}_{t=1}^{\infty }\). The CRP representation clearly shows the influence of *α* on the thickness of the tail of the proportions; increasing *α* increases the tail thickness. This countably infinite set of proportions replaces the finite number of proportions in the GMM. Informally, if *T* → *∞* in Eq. (12), then limiting cases are given by Eq. (13). Once the proportions have decayed past a certain level, the remaining proportions are set to zero and the number of tables (clusters) can be determined. Figure 9 depicts the seating arrangement and assignment probabilities for a new customer after several customers have been seated.
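A finite simulation of the seating process gives a feel for how *α* controls the number of tables. The sketch assumes the usual CRP rule: with *n* customers already seated, a new customer joins table *t* with probability *n*_{t}/(*n* + *α*) and opens a new table with probability *α*/(*n* + *α*). The number of customers, *α*, and the seed are arbitrary illustrative choices.

```python
import random

def chinese_restaurant_process(num_customers, alpha, seed=0):
    """Seat customers one at a time; return the number of customers at each table."""
    rng = random.Random(seed)
    table_counts = []
    for _ in range(num_customers):
        # Existing table t has weight n_t; a new table has weight alpha.
        weights = table_counts + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(table_counts):
            table_counts.append(1)      # open a new table
        else:
            table_counts[choice] += 1   # join an existing table
    return table_counts

counts = chinese_restaurant_process(200, alpha=2.0)
proportions = [c / sum(counts) for c in counts]
```

The resulting proportions play the role of the cluster weights; rerunning with a larger `alpha` tends to produce more tables with a heavier tail.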

[…] Beta(1, *α*) and broken off a stick of unit length. Proportions are drawn from Beta(1, *α*) and broken from the remaining stick until the stick is gone (infinitely small). This approach achieves the same result as the CRP, but the SBP samples the proportions directly. Mathematically, the SBP is defined as

\[ v_{t} \sim \text{Beta}(1,\alpha), \qquad \lambda_{t} = v_{t}\prod_{s=1}^{t-1}\left(1-v_{s}\right), \qquad t = 1,2,\dots \]
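The stick-breaking construction can be sketched with a finite truncation: each step draws a fraction from Beta(1, *α*) and breaks that fraction off whatever remains of the stick. The truncation level of 50 pieces, the value of *α*, and the seed are illustrative assumptions.

```python
import random

def stick_breaking(alpha, num_sticks, seed=0):
    """Draw the first num_sticks proportions of a truncated stick-breaking process."""
    rng = random.Random(seed)
    remaining = 1.0            # length of stick not yet broken off
    proportions = []
    for _ in range(num_sticks):
        fraction = rng.betavariate(1.0, alpha)   # fraction of the remaining stick
        proportions.append(fraction * remaining)
        remaining *= 1.0 - fraction
    return proportions, remaining

proportions, leftover = stick_breaking(alpha=2.0, num_sticks=50)
```

The pieces and the leftover always add up to the unit stick, so truncating once `leftover` is negligible corresponds to setting the remaining proportions to zero.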

### Factor analysis

[…] ε_{i} are Gaussian noise, and I_{N} is the *N*×*N* identity matrix. In PCA, the data \(\{\boldsymbol {x}_{i}\}_{i=1}^{N}\) is used to discover the matrix D whose column vectors span the space of the data (up to noise) and w_{i} are the transformed representations of x_{i}. The algorithm has two parameters that need to be set: *K*, the number of dictionary-elements/factors, and *γ*_{ε}, the noise precision (inverse variance). The noise precision can also be modeled by a gamma random variable, so that it can also be inferred. Because the d_{k} are Gaussian, the space discovered is ellipsoidal. This can be seen through the following reparameterization:

[…]_{k} are orthonormal and the singular values *σ*_{k} > 0. The singular values are the radii of a *K*-dimensional ellipsoid and the singular vectors determine the orientation of each dimension (assuming \(\gamma _{\epsilon }^{-1} < \sigma _{K}\)). Figure 11 illustrates the singular values and the mean. Note that probabilistic PCA is different from PCA, which is simply a projection onto the top *K* principal components (either via SVD of the data or eigen-decomposition of the data covariance matrix) [37].

[…]_{i}, and second, it allows information to be shared across the weights during inference. The finite Beta-Bernoulli hierarchy is defined as follows

where *K* is the number of dictionary elements and *a*, *b* are hyperparameters. For each \(\boldsymbol {x}_{i}\in \mathbb {R}^{P}\), the latent binary vector \(\boldsymbol {z}_{i}\in \{0,1\}^{K}\) encodes which dictionary elements are used by x_{i}. The proportion *π*_{k} is the sharing mechanism and encodes the average use of basis vector *k* across all of the selection vectors z_{i}.

[…] Poisson(*a*) dishes. The *i*th customer samples each old dish with probability #(previous samples)/*i* and samples Poisson(*a*/*i*) new dishes. This is the single parameter IBP with *b* = 1. Figure 12 illustrates the process. As the number of customers *i* tends to infinity, the number of new dishes tends to zero. In practice, the IBP is truncated to a number of dishes sufficiently large (i.e., large enough that some dishes are unused with high probability; this is data dependent) and any dishes that are unused can be removed from the representation. Details about the IBP and BeBP can be found in [17, 25].
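The buffet process described above can be simulated directly. The sketch below follows the single-parameter rule from the text (old dishes with probability #(previous samples)/*i*, plus Poisson(*a*/*i*) new dishes); the Poisson sampler uses Knuth's multiplication method because the Python standard library has no built-in one. The function names, the number of customers, and the value of *a* are illustrative assumptions.

```python
import math
import random

def sample_poisson(rate, rng):
    """Knuth's method for a Poisson draw; adequate for the small rates used here."""
    threshold, count, product = math.exp(-rate), 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def indian_buffet_process(num_customers, a, seed=0):
    """Return a binary customer-by-dish matrix (list of lists) drawn from the IBP."""
    rng = random.Random(seed)
    dish_counts = []   # how many previous customers sampled each dish
    rows = []
    for i in range(1, num_customers + 1):
        # Old dishes: sample dish k with probability (previous samples) / i.
        row = [1 if rng.random() < count / i else 0 for count in dish_counts]
        for k, taken in enumerate(row):
            dish_counts[k] += taken
        # New dishes: Poisson(a / i) of them, all sampled by customer i.
        num_new = sample_poisson(a / i, rng)
        row.extend([1] * num_new)
        dish_counts.extend([1] * num_new)
        rows.append(row)
    num_dishes = len(dish_counts)
    return [row + [0] * (num_dishes - len(row)) for row in rows]

Z = indian_buffet_process(num_customers=10, a=3.0)
```

Each row of `Z` corresponds to one selection vector z_{i}: a 1 in column *k* means that dictionary element *k* is used by data point *i*.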

where Eqs. (23)–(25) have replaced the expression for w_{i} in the PCA model, ∘ is the element-wise Hadamard product, and the product notation in Eqs. (26) and (27) denotes independent draws. The mean μ has been omitted in Eq. (19), since in the case of a single factor analyzer, the mean can simply be subtracted from the data as a pre-processing step. When implementing the algorithm, the hyper-parameters *a*,…,*f* are set to so-called non-informative values.

*K*-SVD), the negative log likelihood is

which is minimized to find the latent parameters. The first term is the least square error between the inferred parameters and the data while the second and third terms are commonly used as smoothing regularizers. The fourth term is the sparsifying regularizer, similar to the *ℓ*_{1} norm. The BPFA model is commonly implemented using Gibbs sampling or variational Bayesian methods [25, 30]. It must be emphasized that Eq. (28) is not used by sampling algorithms and cannot be optimized with traditional approaches. For more details about beta process dictionary learning including the application to three-dimensional data, see [38].

### Mixture of factor analyzers

where *γ*_{ε,t}, *γ*_{s,t}, *τ*_{tk}, and *τ*_{0} all have gamma hyperpriors. Equation (30) says that data point *i* is in a cluster with statistics given by factor analyzer *t*(*i*). Equations (31)–(33) give a basis representation where Σ_{t(i)} is a diagonal matrix similar to a singular value matrix that weights the contributions of each basis vector. If some of the (diagonal) elements of Σ_{t} are small relative to the noise variance, then that component *t*(*i*) will be low rank.

where only one of the vectors w_{t} is non-zero. In this way, only a single block or group is active, which also makes the representation sparse. If there is only a single ellipsoid in the model, then the sparse-CS formulation is recovered as a special case.

In addition to having a block-sparse structure, the nonparametric MFA usually infers bases that are low-rank, *K* < *P*. Low-rank Gaussian bases correspond to localized tubular manifolds. In [30] the fact that the signal is 1-block sparse is used to prove the reconstruction guarantee. Theorems for the separability of the components and satisfaction of the restricted isometry property (RIP) can also be found in [30]. Essentially, the number of measurements should be greater than a constant times the largest rank among all of the D_{t} plus the log of the number of components. The largest rank is the intrinsic manifold dimension, while the number of components *T* is related to the manifold condition number.

### CS-MFA

[…] *p*(x|y); this requires the posterior predictive probability *p*(x) and the probability of the measurements given the signal *p*(y|x). The posterior predictive distribution is the expected value of a new (predicted) data point with the expectation taken over the posterior

The prior predictive distribution is obtained when ξ_{t} = 0 and Λ_{t} = I_{P}; however, this is usually inaccurate, so the posterior parameters are obtained by calculating the mean and covariance of the Gibbs samples. The bases \(\tilde {\boldsymbol {D}}_{t}\) are also taken as the mean of the Gibbs samples.

The representation in Eq. (44) admits an analytic CS inversion procedure, that is, once the model parameters are learned (either offline or online [22, 41]), new signals are recovered by matrix–vector operations.
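To make the idea of an analytic inversion concrete, the sketch below computes the posterior mean of a signal under a single Gaussian component using the standard Gaussian conditioning formula. This is a generic illustration under assumed values (two correlated pixels, one measured), not a transcription of Eq. (44); for a mixture, the same operation would be applied per component and the results combined with posterior component weights.

```python
import numpy as np

def gaussian_cs_inversion(y, phi, mean, cov, noise_var):
    """Posterior mean of x given y = phi @ x + noise, with x ~ N(mean, cov)."""
    gram = phi @ cov @ phi.T + noise_var * np.eye(phi.shape[0])
    gain = cov @ phi.T @ np.linalg.inv(gram)
    return mean + gain @ (y - phi @ mean)

# Two strongly correlated pixels; only the first one is measured.
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
phi = np.array([[1.0, 0.0]])
y = np.array([2.0])
x_hat = gaussian_cs_inversion(y, phi, mean, cov, noise_var=1e-6)
```

The unmeasured pixel is filled in from the measured one through the learned covariance, and the whole recovery consists of matrix and vector operations.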

### Description of CS-TEM hardware

[…]_{ijℓ} are binary indicators of whether pixel *ij* is blocked in compressed frame *ℓ*, and *X* is the image. This representation can be consolidated as

where the image size is *N*_{x}×*N*_{y} pixels. As previously mentioned, the images are broken down into patches so the data points x_{i} in the MFA model are of size 4×4×*L*.

[…] *x*- or *y*-axis according to a triangle wave. During an up-stroke, a set of coded images are integrated and then another set are integrated during the down-stroke. A function generator is used to drive the piezo stage and trigger the image capture on the camera at the troughs and peaks of the triangle wave. The same setup is possible in TEM. The major difficulty in moving this approach to TEM is designing an aperture to block electrons rather than photons. Figure 13 shows an illustration of the TEM-CACTI system.

The benefit of placing the mask on a moving stage is that moving the mask creates a new encoding—essentially a new mask. If the position of the mask is known, then the encoding is known. This overcomes a difficulty in CS of using a new mask for every measurement. The compression ratio is determined by the range of motion of the mask. Effectively, moving *n* pixels (mask feature size) will give a factor of *n* compression, or *n* frames from 1.
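The coded integration performed by a moving mask can be sketched as a forward model: each sub-frame is multiplied by a shifted copy of one mask, and the coded sub-frames are summed into a single camera frame. Several choices here are assumptions for illustration: a mask value of 1 means the pixel is open (not blocked), the mask moves one pixel per sub-frame along *x*, and a periodic shift via `np.roll` stands in for the physical translation.

```python
import numpy as np

def cacti_measurement(frames, base_mask, shifts):
    """Integrate coded sub-frames into one camera frame.

    frames:    array (L, Ny, Nx) of sub-frames
    base_mask: array (Ny, Nx), 1 where the mask is open, 0 where it blocks
    shifts:    mask displacement in pixels (along x) for each sub-frame
    """
    measurement = np.zeros(frames.shape[1:])
    codes = []
    for frame, shift in zip(frames, shifts):
        code = np.roll(base_mask, shift, axis=1)  # shifted mask = new encoding
        codes.append(code)
        measurement += code * frame
    return measurement, np.stack(codes)

rng = np.random.default_rng(0)
frames = rng.random((4, 8, 8))                         # L = 4 sub-frames
base_mask = (rng.random((8, 8)) < 0.5).astype(float)   # random binary mask
y, codes = cacti_measurement(frames, base_mask, shifts=range(4))
```

Because the mask position is known for every sub-frame, the stack `codes` is known as well, which is the information the inversion needs.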

## Results and discussion

The results in this section show the efficacy of the CS approach to TEM video. First, the algorithm settings and simplifications are given. Second, two example videos are discussed. Third, the relationship between the compression ratio and reconstruction quality is shown to be approximately logarithmic. The reconstruction quality decreases more slowly as the compression factor increases. The standard deviation of the average PSNR is also well-behaved. The simulation used real TEM video and sampled it according to the CACTI scheme. The CS reconstruction is then compared against the original for a quantitative error analysis. The images are the direct output of the CS algorithm and have *not* been post processed. *Note:* The images are best viewed digitally and full image resolution is available via the zoom function in most PDF readers.

Sampling approaches are computationally expensive (and usually scale poorly with respect to the data size), so we relax the factor analysis constraint and simply use a (finite) GMM. The GMM can be fit very efficiently by expectation-maximization ([19], chapter 9). The development above shows that this simplification is well-founded and the results below show that the simplification still produces adequate results. For training the GMM, we use the algorithm supplied by the MATLAB statistics toolbox with *T*=20 and regularization parameter 10^{−8}. The only other parameters are the patch size and patch spacing.

For all three experiments, the patch size was 4×4×2, and these were extracted half-overlapping (the spacing between the patches was 2×2×1). In the first two experiments, the compression factor was 10 frames, so the rate is 10 to 1. To train the GMM model, the first few frames were used, specifically frames 1,4,7,…,3*N*+1, where *N* is the number of frames compressed in 1 measurement. Training the GMM model on other data also works well (and is more practical), but those results are not reported here. The reconstruction also proceeded by shifting 5 frames at a time (or half of the compression ratio in the last experiment). This adds temporal stability by averaging nearby reconstructions. The silver nanoparticle video has over 900 frames each with 1024×1024 pixels, or roughly 235 million half-overlapping patches that were reconstructed in a few hours on a workstation.
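The quoted patch count can be checked with a short calculation. The sketch assumes exactly 900 frames (the text says "over 900"), a 4×4×2 patch, and a 2×2×1 spacing, and counts patch positions on a regular grid along each axis.

```python
def patches_along_axis(length, size, step):
    """Number of patch positions along one axis for a regular grid."""
    return (length - size) // step + 1

per_spatial_axis = patches_along_axis(1024, size=4, step=2)  # 511 positions
along_time = patches_along_axis(900, size=2, step=1)         # 899 positions
total_patches = per_spatial_axis ** 2 * along_time
```

Under these assumptions the total is a little under 235 million, in line with the figure quoted above.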

### Palladium nanoparticle oxidation

To demonstrate the applicability of coded aperture CS video reconstruction for atomic resolution imaging, we show observations from Pd nanoparticles during exposure to elevated temperature and an oxidizing environment. Supported Pd nanoparticles are used extensively in catalytic applications under high temperatures and in reactive gas environments. The ability to visualize and characterize morphological, structural, and surface transformations associated with environmental exposure under *in-situ* conditions at high temporal resolution is critical for rationalization of structure–property relationships and thus essential for future advancement of catalytic technologies.

The observations here focus on characterization of atomic level processes associated with a formation of a surface oxide in the initial stage of oxidation. In particular, the observations show how adsorption of oxygen and interaction with a SiN_{x} support lead to subtle morphological changes, and subsequently to a formation of surface oxides. The observations were performed with an environmental FEI Titan 80–300. The microscope is equipped with CEOS aberration corrector for the image-forming lens, which allows imaging with Ångström resolution. The images were acquired with Gatan’s Ultra-Scan 1000S CCD camera, and the acquisition was performed in Digital Micrograph (DM) at the frame rate of 1.1 frames/s. The observations were performed at oxygen partial pressure of 10^{−2} mbar at 500 °C. Heating of the samples was done with an Aduro Protochips heating holder.

[…] *looks* good, Fig. 18) is due to the fact that the reconstructed image is denoised as a side effect of reconstruction. Moreover, the top and left edges of the image (10 pixels) are mostly lost due to the coding process.

### Silver nanoparticle coalescence

Using aberration-corrected environmental TEM, the surface restructuring of heterogeneous catalysts by gas molecules [43], the sintering mechanisms of supported metal catalysts [44], and other structural changes in a gaseous environment [45] can be studied at the atomic scale under gas pressures of up to 20 Torr. For gas pressures closer to catalytic conditions, up to 1 atm, subnanometer resolution can be achieved by using dedicated gas cell holders [46, 47]. In order to gain *in-situ* information at the atomic level, highly magnified imaging is required. Typically, an increase in magnification results in the electron beam having to be focused onto a smaller area in order to keep the number of electrons per pixel constant. This increase in the electron dose will ultimately lead to an increase of possible beam damage effects that can influence the process. Here we show an example of metallic particle coalescence induced merely by parallel electron beam illumination in TEM. While our experiments have been done for 60 nm Ag particles, we expect additional or more pronounced beam effects for the case of smaller particles. This is most relevant for catalysis applications, since particle mobility during sintering will be higher.

*In-situ* TEM videos were acquired using an 80–300 keV FEI Titan environmental TEM equipped with an objective-lens spherical aberration corrector, operating at 300 keV in high-vacuum mode. Changes in image contrast are observed, indicating particular dynamic processes: the formation of cavities, which appear as localized areas of lighter contrast within the particles and adjacent to areas displaying surface expansion, and diffraction contrast due to recrystallization, apparent as broad linear contours. Mass transport is apparent as progressive changes in contrast from the darker particles to the lighter inter-particle regions and the surface of newly formed areas. After about 10 min of electron-beam irradiation, a recrystallization front forms and advances downward from the top left corner of the forming crystal. After 13 min of irradiation, facets are also observed to form on the recrystallized surface. The mass transport during irradiation, as shown in the snapshots for the first 2 min of the process, occurs first at the sintering neck between particles and homogeneously around the surfaces of the outermost particles. This indicates that surface diffusion is a main mechanism driving the coalescence process under the electron beam, in good agreement with previous work [48, 49].

### Compression versus reconstruction quality

*average* image, thus it cannot have a very low PSNR.

For comparison, the average PSNR of linear interpolation is also plotted. This is simply a baseline; it would be difficult to do worse with a principled approach. For example, when the video has been subsampled with compression factor 2 (i.e., every other frame is missing), the interpolated result is the average of the previous frame and the next frame. The compressed video used for the interpolation results is simply subsampled at the rate corresponding to the compression factor. To compute the average PSNR, all of the sampled frames are omitted, since their PSNR is infinite. Therefore, the comparison is between the inferred frames of both methods.
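The interpolation baseline described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used for the figures: the function names, the `(n_frames, height, width)` array layout, and the `peak` value of 1.0 are all assumptions made for the example.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite when the frames are identical."""
    diff = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def interpolation_baseline(video, factor):
    """Keep every `factor`-th frame and linearly interpolate the frames between.

    video: array of shape (n_frames, height, width) (assumed layout).
    Returns the interpolated video and the indices of the inferred frames.
    Frames after the last kept frame cannot be interpolated and are not
    counted as inferred.
    """
    video = np.asarray(video, dtype=float)
    kept = np.arange(0, video.shape[0], factor)
    out = video.copy()
    inferred = []
    for a, b in zip(kept[:-1], kept[1:]):
        for t in range(a + 1, b):
            w = (t - a) / (b - a)  # fractional position between kept frames
            out[t] = (1.0 - w) * video[a] + w * video[b]
            inferred.append(t)
    return out, inferred

def average_inferred_psnr(video, factor, peak=1.0):
    """Average PSNR over inferred frames only; sampled frames are omitted."""
    recon, inferred = interpolation_baseline(video, factor)
    return float(np.mean([psnr(video[t], recon[t], peak) for t in inferred]))
```

With `factor=2`, each inferred frame is the mean of its two neighbours, which matches the example given in the text.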

The comparison between CS-MFA and interpolation shows that the compressed frames contain significant information. CS-MFA is able to exploit this information to achieve accurate results for a wide range of compression factors. Moreover, the variance in the reconstruction PSNR is relatively small and does not increase with the compression factor.

It is difficult to decide from Figs. 22 and 23 what the maximum compression factor should be. The reconstructed images degrade very smoothly. Upon inspection of the reconstructed videos (included in the supplementary material), a compression of about 15 × seems feasible for the palladium nanoparticle video and about 20 × for the silver nanoparticle video. The compression factor also depends on what image features are important; this is a tradeoff between speed and image clarity.
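As a rough worked example of what these factors mean for speed: because each camera frame encodes several sub-frames, the recovered frame rate scales with the compression factor. The symbols \(f_{\text{cam}}\) (camera readout rate), \(f_{\text{eff}}\) (recovered frame rate) and \(T\) (compression factor) are introduced here only for illustration. The value of 1.1 frames/s is the CCD acquisition rate quoted earlier, used here as an assumed camera rate for both cases.

```latex
f_{\text{eff}} = T \, f_{\text{cam}}, \qquad
T = 15:\; f_{\text{eff}} = 15 \times 1.1 = 16.5~\text{frames/s}, \qquad
T = 20:\; f_{\text{eff}} = 20 \times 1.1 = 22~\text{frames/s}.
```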

## Conclusions

In this paper, we have provided an overview of CS and CS recovery via MFA. By using real TEM data to simulate the effects of compression, we were able to show the feasibility of video CS for TEM. The videos that were recovered from the simulated CS measurements exhibit the salient features of the material dynamics being studied at a compression factor of 10–20 ×. Balancing the information required, the signal-to-noise ratio of the image, and the desired resolution suggests that the compression could be increased even further for other experiments, dramatically improving the temporal resolution of observations in the TEM. Work to build a prototype aperture to collect compressively sensed video is currently underway. If successful, such an approach will improve the ability to observe materials dynamics in any TEM imaging system.

## Endnotes

^{1} An *N*-dimensional basis can be formed by taking the Kronecker product of *N* copies of the 1-dimensional basis.

^{2} The sensing scheme must satisfy the restricted isometry property or be incoherent with the measurement basis [50].

^{3} The form used in equation (1) is built by stacking all of the x_{i}, y_{i} into single vectors and placing the Φ_{i} into a block diagonal matrix.

^{4} Frames are a generalization of bases. A frame can have a different number of elements than a basis. If the dimension of the space is *N*, then a basis will have *N* elements of dimension *N*, whereas a frame will have *K*≠*N* elements of dimension *N*. When *K*>*N* the frame is sometimes referred to as an “overcomplete basis”.

^{5} If the dictionary is in \(\mathbb {R}^{N\times K},\, K>N\), the number of dictionary elements used is much smaller than *K*. The actual number of elements used depends on the compressibility of the signal.

^{6} More formally, if {*A*_{*i*}} is an open cover of a set *S* in a metric space, then *S* is compact if a finite subcollection still covers it, \(S \subseteq \bigcup_{i=1}^{n} A_{i}\), where *n* is finite.

^{7} The one-dimensional version is presented for simplicity and is easily generalized with the Wishart distribution.
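The constructions in endnotes 1 and 3 can be sketched numerically. The example below is only an illustration under stated assumptions: a DCT is chosen as the 1-dimensional basis, equation (1) is taken to have the usual linear form y = Φx (noise omitted), and all function names are hypothetical.

```python
import numpy as np

def dct_basis_1d(n):
    """Orthonormal 1-D DCT-II basis; column k is the k-th basis vector.

    The DCT is an assumed choice of 1-D basis for this illustration.
    """
    k = np.arange(n)[None, :]
    t = np.arange(n)[:, None]
    basis = np.cos(np.pi * (t + 0.5) * k / n)
    basis[:, 0] *= np.sqrt(1.0 / n)
    basis[:, 1:] *= np.sqrt(2.0 / n)
    return basis

def kron_basis(basis_1d, n_dims):
    """Endnote 1: N-dimensional basis as the Kronecker product of N copies."""
    out = basis_1d
    for _ in range(n_dims - 1):
        out = np.kron(out, basis_1d)
    return out

def stack_measurements(phis, xs):
    """Endnote 3: block-diagonal Phi and stacked x, so that y = Phi @ x."""
    rows = sum(p.shape[0] for p in phis)
    cols = sum(p.shape[1] for p in phis)
    big_phi = np.zeros((rows, cols))
    r = c = 0
    for p in phis:
        big_phi[r:r + p.shape[0], c:c + p.shape[1]] = p
        r += p.shape[0]
        c += p.shape[1]
    return big_phi, np.concatenate(xs)
```

Multiplying the stacked vector by the block diagonal matrix gives the same result as computing each Φ_{i}x_{i} separately and concatenating, which is why the two forms are interchangeable.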

## Declarations

### Acknowledgements

This work was supported in part by the United States Department of Energy Grant No. DE-FG02-03ER46057. This research is also part of the Chemical Imaging Initiative conducted under the Laboratory Directed Research and Development Program at Pacific Northwest National Laboratory (PNNL) and was performed using EMSL, a national scientific user facility sponsored by the Department of Energy’s Office of Biological and Environmental Research located at PNNL. PNNL is a multi-program national laboratory operated by Battelle Memorial Institute under Contract DE-AC05-76RL01830 for the U.S. Department of Energy.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

## References

- Ferreira, PJ, Mitsuishi, K, Stach, EA: *In-situ* transmission electron microscopy. MRS Bull. 33, 83–90 (2008).
- Jinschek, JR: Advances in the environmental transmission electron microscope (etem) for nanoscale *in-situ* studies of gas-solid interactions. Chem. Commun. 50, 2696–2706 (2014).
- Huang, JY, Zhong, L, Wang, CM, Sullivan, JP, Xu, W, Zhang, LQ, Mao, SX, Hudak, NS, Liu, XH, Subramanian, A, Fan, H, Qi, L, Kushima, A, Li, J: *In-situ* observation of the electrochemical lithiation of a single SnO_{2} nanowire electrode. Science. 330(6010), 1515–1520 (2010).
- Evans, JE, Jungjohann, KL, Browning, ND, Arslan, I: Controlled growth of nanoparticles from solution with *in-situ* liquid transmission electron microscopy. Nano Lett. 11(7), 2809–2813 (2011).
- Krivanek, OL, Dellby, N, Lupini, AR: Towards sub-Å electron beams. Ultramicroscopy. 78(14), 1–11 (1999).
- Haider, M, Rose, H, Uhlemann, S, Kabius, B, Urban, K: Towards 0.1 nm resolution with the first spherically corrected transmission electron microscope. J Electron. Microsc. (Tokyo). 47(5), 395–405 (1998).
- Jinschek, JR, Helveg, S: Image resolution and sensitivity in an environmental transmission electron microscope. Micron. 43(11), 1156–1168 (2012).
- Gatan: TEM Imaging & Spectroscopy. http://www.gatan.com/products/tem-imaging-spectroscopy. Accessed: 19 Dec 2014.
- McMullan, G, Faruqi, AR, Clare, D, Henderson, R: Comparison of optimal performance at 300 kev of three direct electron detectors for use in low dose electron microscopy. Ultramicroscopy. 147, 156–163 (2014).
- Candès, EJ, Romberg, J, Tao, T: Uncertainty principles: exact signal reconstruction from highly incomplete frequency information. Inform. Theory IEEE Trans. 52(2), 489–509 (2006).
- Donoho, DL: Compressed sensing. Inform. Theory IEEE Trans. 52(4), 1289–1306 (2006).
- Binev, P, Dahmen, W, DeVore, R, Lamby, P, Savu, D, Sharpley, R: Compressed sensing and electron microscopy. In: Vogt, T, Dahmen, W, Binev, P (eds.) Modeling Nanoscale Imaging in Electron Microscopy. Nanostructure Science and Technology, pp. 73–126. Springer (2012).
- Stevens, A, Yang, H, Carin, L, Arslan, I, Browning, ND: The potential for Bayesian compressive sensing to significantly reduce electron dose in high-resolution STEM images. Microscopy. 63(1), 41–51 (2013).
- Liao, X, Li, H, Carin, L: Generalized alternating projection for weighted-*ℓ*_{2,1} minimization with applications to model-based compressive sensing. SIAM J. Imaging Sci. 7(2), 797–823 (2014).
- Bioucas-Dias, JM, Figueiredo, MAT: A new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration. Image Process. IEEE Trans. 16(12), 2992–3004 (2007).
- Mairal, J, Bach, F, Ponce, J: Sparse modeling for image and vision processing (2014). arXiv preprint arXiv:1411.3230.
- Griffiths, T, Ghahramani, Z: The Indian buffet process: an introduction and review. J. Mach. Learn. Res. 12, 1185–1224 (2011).
- Baraniuk, RG: Compressive sensing. IEEE Signal Process. Mag. 24(4) (2007).
- Bishop, CM, et al: Pattern Recognition and Machine Learning. Springer, New York (2006).
- Foucart, S, Rauhut, H: A Mathematical Introduction to Compressive Sensing. Springer, New York (2013).
- Gill, J: Bayesian Methods: A Social and Behavioral Sciences Approach. CRC press (2014).
- Yuan, X, Yang, J, Llull, P, Liao, X, Sapiro, G, Brady, DJ, Carin, L: Adaptive temporal compressive sensing for video. In: Image Processing (ICIP), 2013 20th IEEE International Conference On, pp. 14–18, Melbourne, Australia (2013).
- Yuan, X, Llull, P, Liao, X, Yang, J, Brady, D, Sapiro, G, Carin, L: Low-cost compressive sensing for color video and depth. In: Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference On. IEEE (2014). arXiv:1402.6932v1.
- Saghi, Z, Benning, M, Leary, R, Macias-Montero, M, Borras, A, Midgley, PA: Reduced-dose and high-speed acquisition strategies for multi-dimensional electron microscopy. Adv. Struct. Chem. Imaging (2015).
- Zhou, M, Chen, H, Paisley, J, Ren, L, Li, L, Xing, Z, Dunson, D, Sapiro, G, Carin, L: Nonparametric Bayesian dictionary learning for analysis of noisy and incomplete images. Image Process. IEEE Trans. 21(1), 130–144 (2012).
- Olshausen, B, et al: Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature. 381(6583), 607–609 (1996).
- Aharon, M, Elad, M, Bruckstein, A: K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. Signal Process. IEEE Trans. 54(11), 4311–4322 (2006).
- Binev, P, Blanco-Silva, F, Blom, D, Dahmen, W, Lamby, P, Sharpley, R, Vogt, T: High-quality image formation by nonlocal means applied to high-angle annular dark-field scanning transmission electron microscopy (HAADF–STEM). In: Vogt, T, Dahmen, W, Binev, P (eds.) Modeling Nanoscale Imaging in Electron Microscopy. Nanostructure Science and Technology, pp. 127–145. Springer (2012).
- Goris, B, den Broek, WV, Batenburg, KJ, Mezerji, HH, Bals, S: Electron tomography based on a total variation minimization reconstruction technique. Ultramicroscopy. 113, 120–130 (2012).
- Chen, M, Silva, J, Paisley, J, Wang, C, Dunson, D, Carin, L: Compressive sensing on manifolds using a nonparametric mixture of factor analyzers: algorithm and performance bounds. Signal Process. IEEE Trans. 58(12), 6140–6155 (2010).
- Wakin, MB: Manifold-based signal recovery and parameter estimation from compressive measurements (2010). arXiv preprint arXiv:1002.1247.
- He, X, Yan, S, Hu, Y, Niyogi, P, Zhang, H-J: Face recognition using laplacianfaces. Pattern Anal. Mach. Intell. IEEE Trans. 27(3), 328–340 (2005).
- Munkres, JR: Topology: A First Course. Prentice-Hall, Englewood Cliffs, NJ (1975).
- Gershman, SJ, Blei, DM: A tutorial on Bayesian nonparametric models. J. Math. Psychol. 56(1), 1–12 (2012).
- Rasmussen, C: The infinite Gaussian mixture model. In: NIPS, pp. 554–560, Denver, CO (1999).
- Neal, RM: Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Stat. 9(2), 249–265 (2000).
- Tipping, ME, Bishop, CM: Probabilistic principal component analysis. J. R. Stat. Soc. Series B (Stat. Methodol.) 61(3), 611–622 (1999).
- Xing, Z, Zhou, M, Castrodad, A, Sapiro, G, Carin, L: Dictionary learning for noisy and incomplete hyperspectral images. SIAM J. Imaging Sci. 5(1), 33–56 (2012).
- Ghahramani, Z, Hinton, GE, et al: The EM algorithm for mixtures of factor analyzers (1996). Technical report, Technical Report CRG-TR-96-1, University of Toronto.
- Tipping, M, Bishop, C: Mixtures of probabilistic principal component analyzers. Neural Comput. 11(2), 443–482 (1999).
- Yang, J, Yuan, X, Liao, X, Llull, P, Sapiro, G, Brady, DJ, Carin, L: Gaussian mixture model for video compressive sensing. In: Image Processing (ICIP), 2013 20th IEEE International Conference On, pp. 19–23 (2013).
- Llull, P, Liao, X, Yuan, X, Yang, J, Kittle, D, Carin, L, Sapiro, G, Brady, D: Coded aperture compressive temporal imaging. Opt. Express. 21(9), 10526–10545 (2013).
- Yoshida, H, Kuwauchi, Y, Jinschek, JR, Sun, K, Tanaka, S, Kohyama, M, Shimada, S, Haruta, M, Takeda, S: Visualizing gas molecules interacting with supported nanoparticulate catalysts at reaction conditions. Science. 335(6066), 317–319 (2012).
- DeLaRiva, AT, Hansen, TW, Challa, SR, Datye, AK: *In-situ* transmission electron microscopy of catalyst sintering. J. Catalysis. 308, 291–305 (2013).
- Jinschek, J: Advances in the environmental transmission electron microscope (ETEM) for nanoscale *in-situ* studies of gas-solid interactions. Chem. Commun. 50(21), 2696–2706 (2014).
- Creemer, J, Helveg, S, Hoveling, G, Ullmann, S, Molenbroek, A, Sarro, P, Zandbergen, H: Atomic-scale electron microscopy at ambient pressure. Ultramicroscopy. 108(9), 993–998 (2008).
- Mehraeen, S, McKeown, JT, Deshmukh, PV, Evans, JE, Abellan, P, Xu, P, Reed, BW, Taheri, ML, Fischione, PE, Browning, ND: A (S)TEM gas cell holder with localized laser heating for *in-situ* experiments. Microscopy Microanal. 19(02), 470–478 (2013).
- Tsyganov, S, Kästner, J, Rellinghaus, B, Kauffeldt, T, Westerhoff, F, Wolf, D: Analysis of Ni nanoparticle gas phase sintering. Phys. Rev. B. 75(4), 045421 (2007).
- Surrey, A, Pohl, D, Schultz, L, Rellinghaus, B: Quantitative measurement of the surface self-diffusion on Au nanoparticles by aberration-corrected transmission electron microscopy. Nano Lett. 12(12), 6071–6077 (2012).
- Candès, EJ: The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathematique. 346(9), 589–592 (2008).