 Review
 Open Access
Deep data analysis via physically constrained linear unmixing: universal framework, domain examples, and a community-wide platform
Advanced Structural and Chemical Imaging volume 4, Article number: 6 (2018)
Abstract
Many spectral responses in materials science, physics, and chemistry experiments can be characterized as resulting from the superposition of a number of more basic individual spectra. In this context, unmixing is defined as the problem of determining the individual spectra, given measurements of multiple spectra that are spatially resolved across samples, as well as the determination of the corresponding abundance maps indicating the local weighting of each individual spectrum. Matrix factorization is a popular linear unmixing technique that assumes the mixture model between the individual spectra and the spatial maps is linear. Here, we present a tutorial paper targeted at domain scientists to introduce linear unmixing techniques, to facilitate greater understanding of spectroscopic imaging data. We detail a matrix factorization framework that can incorporate different domain information through various parameters of the matrix factorization method. We demonstrate many domain-specific examples to explain the expressivity of the matrix factorization framework and show how the appropriate use of domain-specific constraints such as nonnegativity and sum-to-one abundance results in physically meaningful spectral decompositions that are more readily interpretable. Our aim is not only to explain the off-the-shelf available tools, but also to show how to add additional constraints when ready-made algorithms are unavailable for the task. All examples use the scalable open source implementation from https://github.com/ramkikannan/nmflibrary that can run from small laptops to supercomputers, creating a user-wide platform for rapid dissemination and adoption across scientific disciplines.
Introduction
The development of physical and spectroscopic imaging methods in the last two decades has given rise to large multidimensional datasets, with examples including electron energy loss spectroscopy imaging in (scanning) transmission electron microscopy [1,2,3,4], bias and time spectroscopies in scanning probe microscopy [5,6,7,8], hyperspectral Raman and optical imaging [9,10,11,12], and spatially resolved mass spectrometry measurements [13,14,15].
In many of these techniques, the measured signal can be presented (to a good approximation) as a linear combination of spectra, i.e.,
\[ S({\mathbf{x}},{\mathbf{R}}) = \sum\nolimits_{i = 1}^{k} {a_{i} \left( {\mathbf{x}} \right)w_{i} \left( {\mathbf{R}} \right)} + N \quad (1) \]
where x is the spatial variable, x = (x,y), R is the vector parameter variable, \(w_{i} \left( {\mathbf{R}} \right)\) are the individual spectra (sometimes called ‘endmembers,’ ‘factors,’ or ‘components’), a_{i}(x) are the corresponding spatial maps (also called abundance maps or loading maps), and N denotes the noise (not considered here). For example, w_{i}(R) can be optical spectra in Raman and hyperspectral imaging, mass spectra, energy loss spectra in electron microscopy, force–distance curves in atomic force microscopy, etc. The loading maps a_{i}(x) then correspond to the local weighting of each spectrum, with examples such as the concentration of relevant chemical species, phases, etc.
A special case of linear mixing is the linear imaging technique, for which the measured image \(I(\varvec{x})\) is given by the convolution of an ideal image (representing material properties) \(I_{0} \left( {\varvec{x} - \varvec{y}} \right)\) with the resolution function dependent on probe geometry, \(F(\varvec{y})\):
\[ I(\varvec{x}) = \int {I_{0} \left( {\varvec{x} - \varvec{y}} \right)F(\varvec{y})\,d\varvec{y}} + N(\varvec{x}) \quad (2) \]
where \(N(\varvec{x})\) is the noise function. While in general the linearity of a particular imaging mode needs to be proven, it is considered to be a reasonable approximation in the case of many optical [16], mass spectrometry [17], scanning probe [18,19,20,21], and electron microscopy techniques [22]. The important aspect of Eq. (2) is that finite spatial resolution does not affect the linearity of the mixture, making analysis via Eq. (1) universal.
In certain cases, the elementary contributions w_{i}(R) in Eq. (1) are known, for example from tabulated data for the specific system. In this case, the problem is reduced to the determination of the unknown weight coefficients a_{i}(x) via least-squares regression. Since least squares is a convex optimization problem, there exists a unique a_{i}(x) given w_{i}(R) [23], provided the endmembers are linearly independent. At other times, it is necessary to solve a constrained least-squares problem [23, 24], with constraints such as nonnegativity [25], box constraints [26, 27], etc. But in all cases the separation of a spectrum into a linear combination of known components with unknown coefficients presents a relatively straightforward problem.
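As a minimal sketch of the known-endmember case, the snippet below recovers the abundances by ordinary least squares with NumPy. The matrix names and sizes are illustrative assumptions (U holds the known spectra as columns, V the abundances), and the data are synthetic and noise-free.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, k = 50, 20, 3            # spectral channels, pixels, endmembers (illustrative sizes)
U = rng.random((m, k))         # known endmember spectra w_i(R), one per column
V_true = rng.random((k, n))    # true abundances a_i(x), one column per pixel
X = U @ V_true                 # noise-free measurements following Eq. (1)

# Solve min ||U V - X||_F for all pixels at once.
V_ls, residuals, rank, _ = np.linalg.lstsq(U, X, rcond=None)

print("rank of U:", rank)
print("max abundance error:", np.abs(V_ls - V_true).max())
```

Because the columns of U are linearly independent here, the solution is unique. With noisy data the unconstrained solution can contain negative abundances, which is where constrained solvers such as nonnegative least squares come in.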
However, in many cases the functional form of the endmembers is unknown, leading to a paradoxical problem where we need to determine both loading maps \(a_{i} \left( {\mathbf{x}} \right)\) and endmember spectra w_{ i }(R) from multiple realizations of the experimental observations S(x,R). This constitutes the classical linear unmixing problem [28, 29].
The classical tool to address it is principal component analysis (PCA), known since the work by Pearson [30] in the early twentieth century. PCA has become popular with the increase in data size, e.g., from internet applications [31], as a first step of exploratory data analysis for visualizing high-dimensional data. Multiple applications of PCA to hyperspectral optical imaging [32], EELS [33,34,35,36], mass spectrometry [37, 38], and scanning probe microscopy [39,40,41,42] have been reported. However, while it is an extremely powerful exploratory data analysis tool, and is well defined from the information theory perspective, PCA-derived components lack physical constraints. For example, PCA components of the (positively defined) EELS signal will have negative regions, automatically precluding physical interpretation. This consideration highlights the (to-date) limited applicability of linear unmixing techniques in physical imaging.
However, developments in matrix factorization have enabled a considerably broader spectrum of linear unmixing techniques that allow superimposing a large number of constraints on either loading maps or endmembers. It can be argued that in cases when the statistically imposed constraints match the anticipated physics of the system, the unmixing will directly provide the insight to the latter.
In this manuscript, we present a review of matrix factorization (MF) approaches, as well as a tutorial for domain experts on how these new approaches can be applied to a variety of imaging modalities. We discuss the different physical constraints that can be placed on the endmembers and the spatial maps to obtain more physically meaningful results, and show test cases with examples ranging from spatially resolved mass spectrometry to electron microscopy, scanning tunneling, and X-ray microscopy. Conventions are introduced in the “Notations” section. An overview of matrix factorization is provided in the “Matrix factorization” section, constraints are discussed in the “Matrix factorization framework (MFF)” section, and examples of hyperspectral imaging and MF-based image analysis are presented in the “Domain-specific applications” section.
Notations
We begin by introducing the conventions used in the equations. We use capital letters such as A to denote matrices and lower-case letters such as a for vectors. A singly indexed lower-case symbol such as a_{i} is a scalar and represents the vector element at position ‘i.’ Similarly, a doubly indexed upper- or lower-case symbol such as A_{ij} or a_{ij} represents a scalar, namely the element of the matrix at location (i,j). We often require a single scalar value for an entire matrix or vector, and one example that can be computed is the so-called matrix or vector norm. More formally, a norm is represented as \(\left\| A \right\|_{q} :A \in {\mathcal{R}}^{m \times n} \to {\mathcal{R}}\). The typical values for q are 1, 2, and F, called the ℓ1-norm, ℓ2-norm, and Frobenius norm, respectively. Table 1 defines each of these norms, and also offers a quick reference for many of the terms used in this paper. Also, if a comparison is written between a matrix or vector and a scalar, the relation is applied to every element of the matrix or vector. For example, A ≥ 0 means every element in the matrix is nonnegative, and similarly for a vector it is written as a ≥ 0.
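The conventions above can be illustrated with NumPy. This sketch assumes that Table 1 uses the standard definitions (sum of absolute values, Euclidean length, and square root of the sum of squared entries); the values of a and A are made up for the example.

```python
import numpy as np

a = np.array([3.0, -4.0])
A = np.array([[1.0, -2.0],
              [3.0, 4.0]])

l1 = np.linalg.norm(a, 1)        # |3| + |-4| = 7
l2 = np.linalg.norm(a, 2)        # sqrt(3^2 + 4^2) = 5
fro = np.linalg.norm(A, "fro")   # sqrt(1 + 4 + 9 + 16) = sqrt(30)

# "A >= 0" is an elementwise statement; it holds only if every entry satisfies it.
all_nonneg = bool((A >= 0).all())

print(l1, l2, fro, all_nonneg)
```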
Matrix factorization
In this section, we will introduce the matrix factorization problem and its connection with the linear unmixing explained above. Subsequently, we explain our matrix factorization framework (MFF), which offers a pragmatic way of incorporating many real-world physical constraints. We introduce the popular linear unmixing techniques principal component analysis (PCA) and nonnegative matrix factorization (NMF) under this framework and finally discuss two examples of real-world constraints, sparsity and spatial smoothness, as preferential soft constraints with nonnegativity on endmembers. The aim of this section is to provide domain scientists with sufficient information to extend the existing off-the-shelf algorithms with additional domain constraints they will encounter during their experiments, hopefully facilitating better understanding and use of multidimensional spectral data.
Matrix factorization is the problem of decomposing the input matrix into two or more matrices, called factors, such that the product of these factors is close to the input matrix. Typically, the rank of these factors will be much less than the rank of the input matrix, and the result is termed a “low-rank approximation” in numerical computing. The rank is similar to the number of principal components in PCA. However, in the Big Data literature [24, 43], as opposed to low-rank approximation, the community liberally calls this problem a “matrix factorization” as it determines the factors for the input matrix, leading to an overlap between low-rank approximations and matrix factorization techniques. Overall, it is a popular tool for many real-world problems in both scientific [44, 45] and enterprise domains, such as clustering [46, 47], imputation [43, 48], background separation [49, 50], etc.
Here, we provide an overview of the framework for understanding matrix factorization (“low-rank approximation”) and tuning the various parameters of this framework for the day-to-day needs of handling different domain observations. For the latter, we use the concept of physical constraints such as sparsity, spatial smoothness, robustness to noise, symmetry, etc. that match the physics of the specific problem. We further provide some examples of physical imaging where these constraints are used to match the physics of the imaging process and material properties.
As a starting point, consider an input matrix \(X\) of size m × n, where ‘m’ is the number of features and ‘n’ is the number of samples, and a very small number ‘k’ called the ‘low rank.’ Typically, k ≪ min(m,n): k may be on the order of 50 for matrices with dimensions in the millions, while k less than 10 is typical for matrices with dimensions in the few thousands. It is common in the machine-learning literature to use features, attributes, dimensions, and metrics interchangeably; here, we will consistently use the term ‘features.’ Figure 1 gives a pictorial representation of the matrix factorization process with two low-rank factors.
In the case of scientific data, the input matrix can be the hyperspectral data acquired by a wide range of spectroscopic techniques, where the signal in each of the n spatial points represents a spectrum of length m, containing information about local properties. The samples in this case correspond to the spatial grid on which measurements are performed (i.e., (x,y) or (x,y,z)), whereas the features correspond to wavelength, energy, voltage, mass-to-charge ratio, etc. In the case of linear unmixing, the matrix U will be interpreted as consisting of k endmembers w_{i}(R) and V as the loading maps a_{i}(x).
There are many interpretations of matrix factorization. One consistent view among researchers is the equivalence of matrix factorization to soft clustering [51] with k representatives and a distribution of every sample over these representatives. Given a matrix X of size m × n with n samples of data, where each sample has m dimensions, matrix factorization generates k representatives as the left low-rank factor U of size m × k, and the right low-rank factor V of size k × n provides the distribution of every sample among these k representatives. That is, for a sample j, if the 2nd entry of column j of V is larger than the 5th entry, sample j is associated more with the 2nd cluster than with the 5th cluster. This definition is also consistent with the soft clustering task of determining ‘k’ clusters [51]. Matrix factorization is also a dimensionality reduction technique, as it reduces the sample dimension from m to k in the space of U. That is, given the input matrix X of size m × n, we produce a matrix V of size k × n where k ≪ m, hence the name “dimensionality reduction.” For the rest of the paper, we will address matrix factorization mainly as a “dimensionality reduction” [52, 53] technique.
One challenging problem in unmixing is the determination of the number of endmembers k. Ideally, a good choice of k is one for which the spectrum at every point x is exactly representable as a combination of the k endmembers w_{i}(R) weighted by the loading maps a_{i}(x). The trivial solution that satisfies this condition is k = rank(X), where the rank is the number of nonzero singular values of the matrix X. We are looking for a nontrivial k ≪ min(m,n) that best fits the matrix X. Typically, in practice, we increment k until we find the results meaningful. Incrementally updating the number of endmembers and obtaining the loading maps for a small number of endmembers is not computationally expensive. In the scientific domain, we expect the number of endmembers typically to be small, i.e., < ~ 10. To statistically evaluate the quality of the unmixing, we may utilize the dispersion coefficient method explained by Kim and Park [54] in the matrix factorization context. There are also other approaches [55] based on information criteria such as the Akaike information criterion (AIC) or Bayesian information criterion (BIC), and the elbow method based on the law of diminishing returns [56]. For domain scientists, this problem is akin to that of fitting a model (e.g., a polynomial of order n) to data: in those cases, information criterion approaches apply a penalty to polynomials of higher order (due to their larger number of degrees of freedom) that must be overcome for models with higher n to be preferred over those with lower n.
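One simple way to look for an elbow is sketched below. It is an unconstrained proxy only: it uses the singular values of X, so it gives the error of the best rank-k approximation rather than of a constrained factorization such as NMF. The synthetic data, sizes, and noise level are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k_true = 40, 60, 3
X = rng.random((m, k_true)) @ rng.random((k_true, n))   # exactly rank 3
X += 1e-3 * rng.standard_normal((m, n))                 # small additive noise

s = np.linalg.svd(X, compute_uv=False)                  # singular values, descending

# tail[k] = sqrt(sum of s_i^2 for i >= k) is the Frobenius error of the
# best rank-k approximation (Eckart-Young theorem); tail[0] = ||X||_F.
tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]
rel_err = tail / np.linalg.norm(X, "fro")

for k in range(1, 6):
    print(f"k={k}: relative error {rel_err[k]:.4f}")
```

In this synthetic case the error drops sharply up to k = 3 and then flattens at the noise level, which is the elbow one would look for.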
Matrix factorization framework (MFF)
The key questions that arise from the previous sections are (a) How does one define the approximation X ≈ UV? (b) How can the properties of the input data X, e.g., positive numbers, be incorporated? (c) How can specific domain knowledge (e.g., that the representative spectra should be spatially correlated, or that the data form a matrix of signals) be incorporated? Most of these questions are addressed in the matrix factorization process through one of the following (refer to Table 1 for details of notations or definitions in this section).

1. Similarity function X ≈ UV. Even though UV corresponds to the linear unmixing \(\sum\nolimits_{i = 1}^{k} {a_{i} \left( {\mathbf{x}} \right)w_{i} \left( {\mathbf{R}} \right)}\), defining the similarity of UV to X is important. For example, it can be an entrywise closeness of UV to X or, alternatively, closeness at the level of the individual spectra, that is, of every row of UV along the vector parameter variable R.

2. Properties of the input data can be a hard constraint on U and V. For example, the product of two nonnegative matrices will always be nonnegative.

3. Characteristics of the data will be either a hard constraint or a soft constraint imposed as a regularization. In practice, hard constraints are computationally expensive, and regularization provides good interpretability. Sometimes, for very large matrices, enforcing a hard constraint might take days to weeks and would require running on distributed supercomputing clusters [24]. The importance of a regularization is always defined through a positive regularization constant: the higher the value, the higher the importance. The preference among conflicting soft constraints is expressed through the values of the corresponding regularization constants. There are scientific libraries such as mlrmbo [57] and hyperopt [58] that help domain scientists determine the values of these regularization constants based on grid search, line search, random search, or Bayesian optimization techniques.

4. The product of factors can be transformed using a transformation function f. For example, a sigmoid function for a Boolean input matrix, or a rounding function in the case of an integer input matrix.

5. Preprocessing of the input matrix to generate X. For example, a standard practice for microscopy images is to apply a Fast Fourier Transform (FFT). Mean centering is another popular preprocessing step for PCA. Similarly, normalization to generate the matrix X in the range [−1,1] or [0,1] is another common preprocessing technique.

6. Finally, a less common but observed practice is providing different weights to the samples. For example, as part of the preprocessing step, some engineered features may be added to provide better information. Such augmented features will have a different weight from the observed or measured features.
Figure 2 presents these different control knobs, which are parameters of the matrix factorization process.
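Two of the preprocessing options named in item 5 can be sketched in a few lines of NumPy. The raw matrix A below is synthetic, and the choice of scaling the whole matrix (rather than each feature separately) is an assumption made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(loc=5.0, scale=2.0, size=(6, 10))   # raw data: m features x n samples

# Mean centering: subtract each feature's mean over all samples (typical before PCA).
X_centered = A - A.mean(axis=1, keepdims=True)

# Min-max scaling of the whole matrix into [0, 1]; the result is nonnegative,
# which makes it a valid input for NMF.
X_unit = (A - A.min()) / (A.max() - A.min())

print(X_centered.mean(axis=1).round(12))
print(X_unit.min(), X_unit.max())
```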
The above framework [59] offers a unified way of understanding many dimensionality reduction techniques such as singular value decomposition (SVD), principal component analysis (PCA), nonnegative matrix factorization (NMF), and others needed for multivariate analysis of various multidimensional data. It also provides the ability to incorporate the physical constraints that govern the underlying process using the parameters defined above. As an example, we will explain standard PCA and NMF, which are used in the interpretation of microscopy data.
Below in Table 2 we provide some common realizations of the different parameters encountered in Fig. 2.
Principal component analysis (PCA)
Principal component analysis (PCA) [60] is a simple, nonparametric method for visualizing high dimensional data. Classical PCA is a linear transform that maps the data into a lower dimensional space by preserving as much data variance as possible. With minimal effort PCA reduces a complex dataset to a lower dimension to reveal the sometimes hidden, simplified structures that often underlie it.
The principal components are obtained from the top-k eigenvectors of the mean-subtracted data matrix. That is, given a matrix A of size m × n, an input matrix X is constructed by subtracting the mean of each of the m features from every one of the n samples. We then perform the singular value decomposition (SVD) of the matrix X. The k singular vectors with the largest singular values (equivalently, the eigenvectors of XX^{T} with the largest eigenvalues) are taken as the principal components of matrix A. The above process can be explained in the matrix factorization framework as below.
From the above formulation (3), we can map PCA onto the parameters of the MFF: the optimization problem has the Frobenius norm as the similarity measure with orthogonality constraints on the factors, where I is an identity matrix of appropriate size. PCA performs mean subtraction as preprocessing and considers uniform weights for all the data points.
In PCA, the orthogonality of the factors is rigid and can result in having negative values on the factors restricting its interpretability. For example, V cannot be interpreted as probability distribution, because of negative values. In such scenarios, we consider using nonnegative matrix factorization (NMF).
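The procedure described above can be sketched with NumPy's SVD. The data are synthetic and nonnegative, the variable names are illustrative, and the split of the singular values into V (rather than U) is a convention chosen here, not something the text prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 30, 200, 2
A = rng.random((m, 3)) @ rng.random((3, n))      # nonnegative data: m features x n samples

X = A - A.mean(axis=1, keepdims=True)            # subtract each feature's mean
U_full, s, Vt = np.linalg.svd(X, full_matrices=False)

U = U_full[:, :k]              # top-k principal directions (orthonormal columns)
V = s[:k, None] * Vt[:k]       # coordinates of every sample in that basis, k x n

explained = (s[:k] ** 2).sum() / (s ** 2).sum()  # fraction of variance retained
has_negative = bool((U < 0).any() and (V < 0).any())

print(f"variance explained by {k} components: {explained:.3f}")
print("factors contain negative entries:", has_negative)
```

Even though A is entirely nonnegative, both factors contain negative entries after centering and orthogonalization, which is the interpretability problem noted above.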
Nonnegative matrix factorization (NMF)
NMF [61] is the problem of decomposing the input matrix X into two nonnegative factors U and V such that X ≈ UV. NMF is popular among scientists for spatially resolved spectral analysis, defined as finding k ≪ m basic spectra (basis functions that change gradually with composition, in terms of structure and intensity), such that all the n measurements can be explained as a mixture of the k basic spectra.
Formally, NMF can be defined as
\[ \mathop {\min }\limits_{U \ge 0,\;V \ge 0} \left\| {X - UV} \right\|_{F}^{2} \quad (4) \]
In the case of NMF, the common similarity measures are the Frobenius norm, as in the above formulation (4), and the KL-divergence. We enforce a hard nonnegativity constraint, which means every element in the factors U and V will be zero or above, and all the samples are uniformly weighted.
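A minimal sketch of Frobenius-norm NMF is given below using the classical multiplicative update rules. This is a textbook algorithm chosen for brevity and is not claimed to be the solver used by the nmflibrary implementation; the function name nmf_mu, the iteration count, and the small constant eps are assumptions.

```python
import numpy as np

def nmf_mu(X, k, n_iter=200, seed=0, eps=1e-12):
    """Approximate nonnegative X (m x n) by U @ V with U (m x k) >= 0 and V (k x n) >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((k, n))
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep entries nonnegative and never increase the error.
        V *= (U.T @ X) / (U.T @ U @ V + eps)
        U *= (X @ V.T) / (U @ V @ V.T + eps)
        errors.append(np.linalg.norm(X - U @ V, "fro"))
    return U, V, errors

rng = np.random.default_rng(1)
X = rng.random((30, 4)) @ rng.random((4, 50))    # synthetic nonnegative data of rank 4
U, V, errors = nmf_mu(X, k=4)

print("relative error:", errors[-1] / np.linalg.norm(X, "fro"))
```

The result depends on the random initialization, so in practice it is common to run several seeds and compare the endmembers.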
Sparsity
We often know that the number of endmembers that participate at a particular point of the abundance map is sparse, i.e., limited. For example, the distribution over 4 endmembers for a particular pixel, say pixel 3, in the abundance map from matrix V might be [0.48 0.49 0.015 0.015]. The NMF model allocates the insignificant value 0.015 to endmembers 3 and 4 so that it can reduce the overall error of the objective function. But for the domain scientist it can be difficult to interpret these insignificant values. We can overcome this difficulty by enforcing a maximum number of participating endmembers for every pixel in the abundance map. However, it is computationally very expensive to enforce this hard constraint, and instead we use an \(\ell 1\)-regularizer [25], a soft constraint that encourages the model to ignore insignificant values in the V matrix, as follows.
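One common soft-constraint form is sketched below, under stated assumptions: the objective is taken to be 0.5·‖X − UV‖_F² + λ·ΣV + 0.5·η·‖U‖_F², which may differ from the exact formulation used in the paper. The ridge term on U and the names sparse_nmf, lam, and eta are additions for this example; the ridge only prevents U from growing without bound while V shrinks.

```python
import numpy as np

def sparse_nmf(X, k, lam=0.1, eta=0.1, n_iter=300, seed=0, eps=1e-12):
    """Multiplicative updates for
    0.5*||X - U V||_F^2 + lam*sum(V) + 0.5*eta*||U||_F^2 with U, V >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U, V = rng.random((m, k)), rng.random((k, n))
    history = []
    for _ in range(n_iter):
        V *= (U.T @ X) / (U.T @ U @ V + lam + eps)       # l1 penalty adds lam to the denominator
        U *= (X @ V.T) / (U @ V @ V.T + eta * U + eps)   # ridge term adds eta*U
        fit = 0.5 * np.linalg.norm(X - U @ V, "fro") ** 2
        history.append(fit + lam * V.sum() + 0.5 * eta * np.linalg.norm(U, "fro") ** 2)
    return U, V, history

rng = np.random.default_rng(1)
V_true = rng.random((4, 60)) * (rng.random((4, 60)) < 0.3)   # mostly-zero abundances
X = rng.random((25, 4)) @ V_true
U, V, history = sparse_nmf(X, k=4)

print("fraction of abundances below 1e-3:", float((V < 1e-3).mean()))
```

Since V is nonnegative, the sum of its entries equals its entrywise ℓ1 norm, so the penalty is differentiable on the feasible set and fits naturally into multiplicative updates.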
Spatial smoothing
It is generally observed that the mixture of endmembers around a particular point will be similar. That is, in a 128 × 128 target, the mixture among the neighboring pixels such as (x − 1,y), (x + 1,y), etc. around a given (x,y) is likely to be similar. To enforce this spatial smoothness, we utilize the spatial regularization [62] in MFF. The NMF with spatial regularization can be formally defined as
In the above formulation (6), L is a similarity matrix constructed from the input matrix over its 16,384 pixels. That is, we compute the pairwise similarity among the rows of the 16,384 × 1535 matrix, which results in a 16,384 × 16,384 symmetric matrix with zero diagonal elements. By providing this additional information, we incorporate the neighborhood information implicitly into the matrix factorization process through the regularization constants λ_{1} and λ_{2}.
Further, if all the data are normalized and in a similar range and if λ_{2} > λ_{1}, we are informing the MFF that spatial properties are more important than sparsity. On the one hand, choosing a very low λ may not have any impact on the model at all. On the other hand, a high λ can cause numerical errors and produce infinite or undefined values, or yield the same value across all matrix elements in the factors. In practice it is better to start with relatively low regularization values such as 0.001 and increase them in steps until we obtain a desired result. For example, in this model (6) with spatial smoothness and sparsity, sparsity is a relatively easier constraint than spatial smoothness. Thus, it is preferable to start with a nonzero λ_{1}, identify a good value for it, and only then tune λ_{2}. It is important to observe that the λ’s are always nonnegative. Additionally, there are scientific libraries such as mlrmbo [57] and hyperopt [58] that can aid this determination, with automated approaches to determine the values of these regularization constants.
MFF can incorporate different physical constraints during matrix factorization, such as sparsity, spatial smoothness, nonnegativity, etc. In this paper, we use the open source implementation from https://github.com/ramkikannan/nmflibrary. Kannan et al. [50] provide the details about the implementation in their paper. We conclude by summarizing how different popular matrix factorization techniques are modeled under MFF in Table 3.
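A pairwise similarity matrix with the stated properties (symmetric, zero diagonal) can be built as sketched below. Cosine similarity is an assumption made for illustration, since the similarity measure is not specified here, and the sizes are a tiny stand-in for the 16,384 × 1535 case.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_channels = 16, 8                  # stand-in for 16,384 pixels x 1535 channels
P = rng.random((n_pixels, n_channels))        # one spectrum per row

# Cosine similarity between every pair of pixel spectra.
unit = P / np.linalg.norm(P, axis=1, keepdims=True)
L = unit @ unit.T
L = 0.5 * (L + L.T)                           # remove rounding asymmetry
np.fill_diagonal(L, 0.0)                      # zero diagonal, as described in the text

print(L.shape, float(L.max()))
```

At full size a dense 16,384 × 16,384 matrix of 64-bit floats occupies roughly 2.1 GB, so a sparse matrix restricted to neighboring pixels may be preferable in practice.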
Domain-specific applications
In this section, we begin with the illustrative workflow in Fig. 3 of the unmixing process followed by scientists.
The process begins when a scientist generates some multidimensional imaging data, typically (but not always) in a spatially resolved fashion. Each point or pixel consists of a spectrum, and the aim is to unmix this multidimensional dataset into a smaller number of constituent spectra, to aid in interpretation and to speed up visualization with minimal information loss. After preprocessing of the data (which can be either simple or elaborate), the unmixing algorithm is applied and produces endmembers and abundance maps, which are then interpreted by the domain expert. When the abundance maps and the components lack physical meaning, scientists may retry the unmixing by imposing physical constraints as necessary. For example, if the spectra from PCA have negative values, they will introduce nonnegativity constraints through NMF. This process is iterated until the obtained endmembers and spatial maps are physically justifiable.
Listed in Table 4 below are some examples of scientific applications and the potential constraints of matrix factorization approaches. The approach lends itself directly to applications where measured spectra necessarily arise from the mixing of multiple components in an additive fashion. Given variations in the strengths of these mixings, e.g., across spatial or temporal domains, the captured spectra will constitute the matrix to be factored using MFF approaches. The goal in these tasks is usually to determine the constituent (‘purest’) spectra, corresponding to, e.g., ideal crystal phases (X-ray crystallography), particular chemical species (chemical imaging such as time-of-flight secondary ion mass spectrometry, ToF-SIMS), specific electronic structures (scanning tunneling microscopy and current imaging tunneling spectroscopy, STM and CITS), etc.
Specific constraints are applied based on known physical facts; for instance, chemical mass spectra in ToF-SIMS are always positive (a negative concentration of a species is not defined). Similarly, analysis of electron energy loss spectra (EELS) also implies positivity of all factors and abundances. The sum-to-one constraint on the abundances also arises from basic scientific considerations. Assuming that the measured spectra are linear superpositions of constituent spectra, each abundance is effectively a fractional spectral weight, with the coefficients summing to one. This is true for chemical spectra, X-ray diffraction, etc.
Note that for the qualitative analysis of features commonly seen in CITS curves (such as presence/absence of kinks, inter-peak separation, and ratio of peak heights) the sum-to-one requirement may be omitted, as long as a nonnegativity constraint is imposed. An additional complication arises in determining the optimum number of components. In many cases this value is unknown a priori, but it can be estimated from the similarity of the resulting components when the unmixing is computed for increasingly more components: beyond some threshold of k components, additional components will begin to appear similar to other components.
In addition, sparsity and smoothness constraints can be used for analysis of the spatial distribution of defects and, in some specific cases, the shapes of spectral curves. The main idea behind applying sparsity constraints to abundance maps is the relatively low probability of several phases being observed simultaneously in one pixel. For example, it is very unlikely that more than one type of structure or chemical phase can be present within a pixel whose size is around several angstroms. By the same token, there are certain scenarios, for example in chemical and STM spectroscopies, in which the chemical or electronic state associated with one endmember (e.g., a defect-induced localized state) may not appear at the same value of energy in other endmembers (e.g., in a gapped superconducting phase). The smoothness constraints, meanwhile, imply that the mixture of endmembers around a particular pixel in the abundance maps does not vary strongly.
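A simple way to read nonnegative abundances as fractions is to rescale each pixel's column so that it sums to one, as sketched below. This is a post-hoc normalization and not a constraint enforced during the optimization; it is only meaningful when the endmember spectra are on comparable scales. The matrix V here is random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((4, 10))                 # nonnegative abundances: k endmembers x n pixels
V[:, 0] = 0.0                           # an "empty" pixel, to show the zero-sum guard

col_sums = V.sum(axis=0, keepdims=True)
safe_sums = np.where(col_sums > 0, col_sums, 1.0)   # avoid dividing by zero
fractions = V / safe_sums

print(fractions.sum(axis=0))
```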
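The redundancy check described above can be sketched as the largest cosine similarity between two different endmembers. The helper name max_endmember_similarity and the toy spectra are assumptions for illustration; a value close to 1 suggests that k has passed the useful threshold.

```python
import numpy as np

def max_endmember_similarity(U):
    """Largest cosine similarity between two different columns (endmembers) of U."""
    unit = U / np.linalg.norm(U, axis=0, keepdims=True)
    C = unit.T @ unit
    np.fill_diagonal(C, -np.inf)        # ignore each endmember's similarity with itself
    return float(C.max())

distinct = np.eye(5)[:, :3]             # three mutually orthogonal toy spectra
near_copy = distinct[:, 0] + 0.01 * distinct[:, 1]
redundant = np.column_stack([distinct, near_copy])

print(max_endmember_similarity(distinct))    # no overlap between components
print(max_endmember_similarity(redundant))   # fourth component nearly repeats the first
```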
For a microscopic experiment, smoothness is generally expected to be obeyed when the achievable lateral resolution in the imaging data is larger than the pixel size in the same dataset. That is, it is generally not possible that individual pixels can be surrounded by pixels of a different factor, given finite probe size and associated convolution of the signal across multiple pixels. At the same time, the imposition of the sparsity constraint requires domain knowledge. In some cases, multiple mechanisms (spectra) can coexist, but in many cases, they cannot. As one example, unmixing distinct electronic phases from I–V data with sparsity constraint implies that at any one pixel, there cannot be contribution from multiple competing transport phenomena (such as Ohmic and Schottky emission). Moreover, from a fundamental physics perspective smoothness is enforced because interfaces separating distinct phases tend to be smooth to lower energy, and sparsity comes from the fact that, e.g., multiple structural phases cannot coexist in the same location.
In the section below, we deal with the various scientific applications of the MF approach.
Time-of-flight secondary ion mass spectrometry (ToF-SIMS) data
Time-of-flight secondary ion mass spectrometry (ToF-SIMS) is a chemical imaging technique widely used for chemical characterization of organic and inorganic systems. In ToF-SIMS, focused ion beams are used to release material species from the studied sample. The released ions are then accelerated in an electric field and analyzed using a mass detector [15, 67]. Using multiple ion guns, ToF-SIMS allows investigations in the bulk of the sample; in this case the results represent a 4-dimensional data cube with three spatial (X, Y, and Z) and one spectral (mass-to-charge) dimension. Nonnegative matrix factorization (NMF) can be used as a basis for automated interpretation of this data. In this case, each mass spectrum is considered as a mathematical vector X_{i} at spatial point i, which is decomposed into a linear combination of a limited number of nonnegative endmembers w_{j} and a noise term N_{i}:
\[ X_{i} = \sum\nolimits_{j = 1}^{k} {A_{ij} w_{j} } + N_{i} \]
where A_{ij} are the abundance coefficients.
Nonnegative matrix factorization can be used for automated analysis and interpretation of the hyperspectral data acquired by a wide range of spectroscopic techniques, where the signal at each point represents a spectrum containing information about local properties. In this case, the multidimensionality and size of the resulting data make more traditional methods of data analysis substantially more difficult.
ToFSIMS 2D imaging
In this section, we compare the output of application of NMF and PCA algorithms on ToFSIMS experimental data. The details about the experiment and the procedure of the ToFSIMS data preparation for factorization can be found in ref [68]. Briefly, ToFSIMS chemical imaging was performed on an Arabidopsis root sample placed on an SiO_{2} substrate. After necessary relevant preprocessing, we obtained a mass spectrum of length 1535 over 128 × 128 pixel target. We constructed this a matrix of size 1535 × 16,384 as a spectrum of every pixel of the target image. The maps of the spatial distribution of various elements, along with the averaged mass spectrum, are shown in Fig. 4.
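The rearrangement of an image-shaped data cube into the features × samples matrix, and of an abundance matrix back into per-component maps, can be sketched as follows. The array layout (row, column, channel) and the small placeholder sizes are assumptions; in the experiment described above the sizes would be 128, 128, and 1535.

```python
import numpy as np

ny, nx, m = 8, 8, 15                   # placeholders for 128, 128, 1535
rng = np.random.default_rng(0)
cube = rng.random((ny, nx, m))         # stand-in data cube: (row, column, mass channel)

# One column per pixel: X has shape m x (ny * nx), i.e. 1535 x 16,384 at full size.
X = cube.reshape(ny * nx, m).T

# After factorization X ~ U @ V, each row of V becomes one abundance map.
k = 4
V = rng.random((k, ny * nx))           # placeholder for a computed abundance matrix
maps = V.reshape(k, ny, nx)

print(X.shape, maps.shape)
```

With this layout, pixel (y, x) corresponds to column y * nx + x of X, which is what makes the reshape back to images consistent.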
We first performed PCA analysis of this data, with the results shown in Fig. 5. This analysis shows there exists significant deviations in the chemistry within the root. To understand these results, we note that the mass spectrum in each point represents a linear combination of eigenvectors (Fig. 5b, c) with loading coefficients coded by color on the loading abundance (Fig. 5a). For example, component #1 shows averaged mass spectrum of the root, without the characteristic Si peaks. On the other hand, component #2 shows only peaks characteristic for Si (Si^{+}, Si^{2+}, Si_{2}^{+}, etc.), which can be found outside the root (see (Fig. 5a, map #2)). Component #6 most likely is responsible for some kind of contamination, which is sparsely distributed over the root and substrate and contains higher concentrations of Na. However, analysis of other components is hampered by the view of their eigenvectors, which show both positive and negative values. This is one the fundamental shortcomings of the PCA, where eigenvectors are built to be orthogonal. However, this is physically meaningless, since the count signal in mass spectrum is nonnegative.
The results of NMF on the ToF-SIMS data are presented in Fig. 6. The best output was found for unmixing into 4 components. Unlike PCA, endmembers in NMF are presented in the form of classical mass spectra (Fig. 6a), with abundance maps (Fig. 6b–e) showing their concentration at each point. To check the accuracy of the unmixing, we compare the real data with the data restored from the four NMF components. Component #1 clearly shows the mass spectrum of the SiO_{2} substrate, and all peaks can be easily identified. This agrees with its spatial distribution outside the root (Fig. 4d). In contrast, the other components are mostly localized inside the root and show variations in its chemistry. Component #2 shows regions with significant amounts of the base inorganic elements (Mg^{+}, Ca^{+}, K^{+}, etc.). Much higher intensities of small molecules (mass range 150–350 u), as well as of Cs_{2}O^{+}, Cs_{2}OH^{+}, and CNCs_{2}^{+}, were found in component #3, which is most likely related to regions where organic compounds and growth hormones are concentrated. Finally, component #4 shows regions with higher Na concentrations within the root, which is in good agreement with the map of its spatial distribution (Fig. 4e).
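The check of restored data against real data, and the dependence of the result on the number of components, can be sketched in a few lines. The example below is not the algorithm of the nmflibrary implementation used for Fig. 6; it swaps in the classical multiplicative-update rule for the Frobenius-norm objective because it fits in a few lines of NumPy. The function names, the synthetic 4-endmember data, and the iteration count are assumptions for illustration.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=500, seed=0, eps=1e-12):
    """Approximate nonnegative X (channels x pixels) by W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps  # columns: endmember spectra
    H = rng.random((k, n)) + eps  # rows: flattened abundance maps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def relative_error(X, W, H):
    """Relative Frobenius error between the data and the data restored from W @ H."""
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)

# Synthetic stand-in for a spectral image: 4 nonnegative endmembers on a 30 x 30 grid.
rng = np.random.default_rng(1)
n_channels, ny, nx, k_true = 80, 30, 30, 4
axis = np.arange(n_channels)
S = np.stack([np.exp(-0.5 * ((axis - c) / 2.0) ** 2) for c in (10, 30, 50, 70)], axis=1)
A = rng.dirichlet(np.ones(k_true), size=ny * nx).T
X = S @ A  # shape (80, 900)

# Restore the data from k components and record the relative error for each k.
errors = {}
for k in range(2, 7):
    W, H = nmf_multiplicative(X, k)
    errors[k] = relative_error(X, W, H)
    print(k, round(errors[k], 4))

# Abundance map of the first component of the last fit, reshaped to an image.
abundance_map = H[0].reshape(ny, nx)
```

Plotting the error against k is one simple way to support a choice such as 4 components: the error typically drops sharply until the true number of endmembers is reached and flattens afterwards.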
After exploring the differences between NMF and PCA, we further explore the possibility of incorporating two common physical constraints, (a) sparsity and (b) spatial smoothing, in the MFF for this dataset.
In Fig. 7, we present the NMF result for a particular component of the ToF-SIMS data with and without spatial smoothness. We can observe that the number of different nonzero values around a particular pixel is smaller in Fig. 7b than in Fig. 7a. That is, in Fig. 7b, the probability that the pixels neighboring a given pixel (x,y) have the same value is higher.
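One way to make this comparison quantitative is to count how often adjacent pixels share the same value. The helper below is a hypothetical diagnostic written for this illustration (it is neither part of the MFF nor how Fig. 7 was produced); the function name, the tolerance, and the two toy maps are assumptions.

```python
import numpy as np

def neighbor_agreement(abundance, tol=1e-8):
    """Fraction of horizontally or vertically adjacent pixel pairs with equal values."""
    a = np.asarray(abundance, dtype=float)
    same_h = np.abs(a[:, 1:] - a[:, :-1]) <= tol
    same_v = np.abs(a[1:, :] - a[:-1, :]) <= tol
    return (same_h.sum() + same_v.sum()) / (same_h.size + same_v.size)

# Two toy 8 x 8 abundance maps with the same number of nonzero pixels.
yy, xx = np.mgrid[0:8, 0:8]
rough = ((yy + xx) % 2).astype(float)   # checkerboard: every neighbor differs
smooth = (xx < 4).astype(float)         # left half filled: neighbors mostly agree

print(neighbor_agreement(rough), neighbor_agreement(smooth))
```

A spatially smoothed abundance map should score closer to 1 on such a measure than its unsmoothed counterpart.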
In the following sections, we will study enforcing nonnegativity constraints in detail for different types of spectroscopic experiments.
ToF-SIMS 3D
Linearity and nonnegativity of endmembers make physical sense for ToF-SIMS, as for any mass spectrometry technique, because measured mass spectra are a linear combination of the responses of the various chemical species in the studied sample.
Here we demonstrate NMF for investigating the chemical composition of an 80-nm-thick BiFeO_{3} (BFO) ferroelectric thin film grown on a 10 nm LaSr_{0.5}Co_{0.5}O_{3} (LSCO) buffer layer on a LaAlO_{3} (LAO) substrate. ToF-SIMS investigations of the film were performed using a TOF.SIMS 5 (ION-TOF, Germany) instrument with a Bi ion primary gun and a Cs sputtering gun. Measurements were performed in positive ion detection mode, which allowed the detection of metal ions; in addition, clusters formed with cesium were used to identify some negative species (e.g., Cs_{2}O^{+} for O^{−}, Cs_{2}OH^{+} for OH^{−}, and Cs_{2}Cl^{+} for Cl^{−}).
Investigations were performed in the bulk of the sample, which allowed us to study the local distribution of the chemical composition through the thickness of the BFO film, the LSCO layer, and part of the substrate. Details about the film properties and the corresponding ToF-SIMS investigations can be found in refs. [69, 70].
Figure 8 shows the mass spectrum averaged over the whole dataset, which reveals the presence of all base elements of BFO, LSCO, and LAO (Al^{+}, Fe^{+}, Sr^{+}, La^{+}, Bi^{+}), as well as species from the adsorption layer (Na^{+}, K^{+}, and Cs_{2}Cl^{+}). We performed NMF to interpret the 3D spatial distribution of all detected chemical species. The procedure for preparing the ToF-SIMS data for factorization can be found in ref. [68].
Our analysis showed the best results for factorization with 4 endmembers; the corresponding endmembers and cross sections of the 3D abundance maps are plotted in Fig. 9. Each component can be interpreted from its mass spectrum and its spatial localization. Specifically, the mass spectrum of component #1 shows pronounced peaks of Al^{+}, La^{+}, and LaO^{+} and is localized at the bottom of the scan (Fig. 9e); it therefore corresponds to the LAO substrate. Component #3 represents the LSCO buffer layer: it shows peaks of La^{+}, Sr^{+}, and LaO^{+} and is confined to a narrow stripe between BFO and LAO (Fig. 9c). Bi^{+} and Fe^{+} from the BFO thin film are found in both components #2 and #4; however, their mass spectra are significantly different.
Component #2 corresponds to the bulk BFO signal (Fig. 9d) and shows weaker signals of pure Fe^{+} and Bi^{+} than component #4, which is related to the BFO surface. This difference stems from the measurement technique: Cs is used for sputtering and forms clusters with many of the released species. Consequently, in bulk scans some Fe^{+} and Bi^{+} ions form CsFe^{+} and CsBi^{+} clusters, which decreases the signal of the pure ions in the mass spectra. In addition, component #4 shows the presence of elements from the adsorption layer (Na^{+}, K^{+}, Cs_{2}Cl^{+}), which are localized on the sample surface (Fig. 9b); this is in good agreement with previous studies [68].
To summarize, enforcing the nonnegativity constraint in the MFF provides powerful capabilities for automated analysis of mass spectrometry data acquired from multicomponent systems. In this case, data analysis is reduced to the interpretation of a limited number of endmembers with known mass spectra and maps of their spatial distribution.
Scanning transmission electron microscopy (STEM)
Modern scanning transmission electron microscopy (STEM) allows atomically resolved imaging of multiple structural and/or chemical phases within a single image, as well as observation of transitions between different phases in a series of images [71, 72]. Such experimental capabilities demand the development of analytical methods for rapid extraction and identification of different phases and for mapping their spatial distribution. Here we describe how the NMF technique can be combined with a sliding-window fast Fourier transform (FFT) to allow accurate identification and mapping of different structural and chemical phases.
The application of the sliding FFT to atomically resolved microscopic images has been discussed in our earlier publications [73, 74]. Briefly, a stack of 2D FFT maps is generated by shifting a window of a selected size across an experimental STEM image such that the entire image is scanned. At each step, an FFT map is computed from the region bounded by the sliding window. If we assume that the image structure factor is a linear superposition of the individual constitutive elements, then applying NMF to the sliding FFT data allows identification of local structure factors (endmembers) and loading maps [73].
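The sliding FFT step can be sketched as follows. This is a minimal NumPy illustration rather than the implementation of refs [73, 74]: the function name, the window and step sizes, the Hann taper, the use of the FFT magnitude (which keeps the data nonnegative for a subsequent NMF), and the synthetic two-lattice test image are all assumptions.

```python
import numpy as np

def sliding_fft(image, window, step):
    """Return (stack, positions); stack[i] is the 2D FFT magnitude of the i-th window."""
    ny, nx = image.shape
    # Hann taper to reduce edge leakage (an assumed, common choice).
    taper = np.outer(np.hanning(window), np.hanning(window))
    stack, positions = [], []
    for y in range(0, ny - window + 1, step):
        for x in range(0, nx - window + 1, step):
            patch = image[y:y + window, x:x + window]
            fft_mag = np.abs(np.fft.fftshift(np.fft.fft2(patch * taper)))
            stack.append(fft_mag)
            positions.append((y, x))
    return np.array(stack), positions

# Synthetic image: two "phases" with lattice periods of 4 and 8 pixels.
yy, xx = np.mgrid[0:64, 0:64]
period = np.where(xx < 32, 4.0, 8.0)
image = 2.0 + np.cos(2 * np.pi * xx / period) + np.cos(2 * np.pi * yy / period)

stack, positions = sliding_fft(image, window=16, step=8)

# Flatten each FFT map into a column: a nonnegative matrix ready for NMF.
X_fft = stack.reshape(len(stack), -1).T
print(stack.shape, X_fft.shape)
```

The loading maps obtained from a factorization of `X_fft` live on the coarse grid of window positions, so `positions` is needed to place them back onto the original image.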
As a model system, we consider a previously published atomically resolved image of an oxide catalyst, shown in Fig. 10a [75]. The results of the NMF analysis of the sliding FFT data obtained from this image are shown in Fig. 10b–g. The two chemical phases are clearly identified in the first and second components (Fig. 10b, e and c, f), whereas the third component can be interpreted as arising from interface regions. Therefore, the use of NMF allows us to match the physics of diffraction (in the absence of dynamical effects), i.e., that spectra can be deconvoluted linearly and that the fractions must sum to 1. Moreover, it shows that image segmentation is possible, although in the future this should be done with symmetry-based constraints on the unmixing process (to determine the space group for each phase). This ability to accurately map different chemical phases within a single STEM frame (image) could become especially valuable in the analysis of phase transitions observed via STEM in a frame-by-frame manner (STEM ‘movies’). We also foresee that a combination of sliding FFT and NMF tools can in the future be applied to scanning tunneling microscopy of quasiparticle interference patterns in strongly correlated electronic materials, in which different coexisting phases (and/or different scattering centers) may produce several interference patterns with distinct symmetries within an experimental field of view.
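The statement that fractions sum to 1 can be illustrated with a post-hoc rescaling of an existing factorization. The sketch below is an assumption-laden illustration, not a constrained solver and not the procedure used for Fig. 10: it takes nonnegative factors W (endmembers in columns) and H (loadings in rows), moves the scale of each endmember into H without changing the product, and then converts the loadings at each position into fractions. The function name and the random test factors are hypothetical.

```python
import numpy as np

def loadings_to_fractions(W, H, eps=1e-12):
    """Rescale W @ H so endmembers have unit sum, then normalize loadings per position."""
    scale = W.sum(axis=0)                      # total intensity of each endmember
    W_unit = W / (scale + eps)                 # endmembers with unit sum
    H_scaled = H * scale[:, None]              # W_unit @ H_scaled reproduces W @ H
    totals = H_scaled.sum(axis=0, keepdims=True)
    fractions = H_scaled / (totals + eps)      # columns sum to one
    return W_unit, H_scaled, fractions

# Hypothetical nonnegative factors standing in for an NMF result.
rng = np.random.default_rng(0)
W = rng.random((256, 3)) + 0.1
H = rng.random((3, 49)) + 0.1

W_unit, H_scaled, fractions = loadings_to_fractions(W, H)
print(fractions.sum(axis=0)[:5])
```

Normalizing after the fact is only a reading aid; imposing the sum-to-one condition during the factorization itself is a different, stronger constraint.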
Current imaging tunneling spectroscopy (CITS)
We next illustrate an application of NMF methods to extracting physics from current imaging tunneling spectroscopy (CITS) of a strongly correlated electronic system. CITS is a mode of operation of a scanning tunneling microscope that allows extraction of three-dimensional (3D) maps of the differential tunneling conductance G = dI/dU with subnanometer resolution. The value of G(x, y, U) at each recorded point (pixel) reflects the electronic density of states on the surface at energy E = eU [76]. We specifically focus our attention on a CITS dataset obtained from the surface of a BaFe_{2}As_{2} compound with hole doping by Mo substitution (x ≈ 0.026) on the Fe sites. This compound could play an important role in discussing the mechanisms behind unconventional superconductivity in FeAs-based systems, since superconducting behavior in these materials is observed only for electron doping of the Fe sites by 3d and 4d transition metal atoms, but not for hole doping [77, 78].
Figure 11a shows a representative STM topographic image of an in situ cleaved Mo-doped BaFe_{2}As_{2} surface obtained at T = 4 K. The topographic data immediately reveal several characteristic surface features, such as regions with and without a stripe-like surface reconstruction, as well as point-like (lateral size ~ 1 nm) bright blobs and depressions dispersed across the entire field of view. As in the earlier analysis of STEM data, our assumption here is that the CITS signal can be represented as a linear superposition of currents flowing through each of the available “channels” during the experiment. We next apply NMF to the CITS dataset of dimensions x × y × U = 80 × 100 × 220 recorded over the area shown in Fig. 11a. The results of the NMF-based decomposition (endmembers and loading maps) into 3 components are shown in Fig. 11c–h. We note in passing that NMF decomposition into a larger number of components adds only components associated with noise. Analysis of the loading map in Fig. 11c suggests that the first component is primarily connected to regions without surface reconstruction. The corresponding spectral curve (endmember 1) in Fig. 11f has a characteristic bump at ≈ −100 meV and a vanishing density of states around the Fermi level, likely associated with the formation of a spin density wave gap below T = 119 K [77]. The second component clearly originates from the point-like protrusions on the surface (Fig. 11d, g). These point impurities produce a well-defined peak in the density of states at ≈ +100 meV, seen in endmember 2 (Fig. 11g). Notably, such a well-defined feature in the experimental electronic density of states, together with the information obtained about its distribution on the surface, makes it possible to significantly narrow the range of defect structures to be considered in either theoretical modeling of the sample’s surface or spatially averaged spectroscopic experiments.
Finally, the third component can be linked to certain depressions on the sample’s surface (albeit not all of them) (Fig. 11e, h). There are no pronounced localized states associated with these depressions in the energy range of interest, although they do modify the character of the electronic structure around the Fermi level, as seen in endmember 3 (Fig. 11h). Overall, such insight into the details of the spatial localization of various electronic features, acquired through application of the NMF method, can be crucial in future studies for better understanding the mechanisms behind the emergence and suppression of superconductivity in FeAs systems. It further shows the utility of the method in segmentation into distinct electronic phases (for example, for determining metal–insulator transitions [79]), which is only possible because positivity is enforced.
Structural X-ray imaging
The accurate determination of structural phases and of the evolution of epitaxial strain in crystalline thin-film heterostructures is one of the most active research areas in structural imaging. The most commonly employed structural probe, X-ray diffraction (XRD), provides crucial information on the crystalline state of thin films, ranging from the atomic unit cell configuration in each thin-film layer to the crystalline quality or mosaic spread of a thin film. The structural information from XRD is, however, spatially averaged over macroscopic distances of the sample [80]. As such, the structural state determined by XRD is more suitably described as an ensemble average. Various extensions of XRD into a spatially resolved probe have been pursued in the past, ranging from single-crystal X-ray diffraction topography [81] to microdiffraction [82], the ultimate goal being the determination of the individual structural microstates present in a system. With the advent of third-generation synchrotron sources and considerable advances in optics that operate in the hard X-ray regime [83] (from angstrom to sub-angstrom wavelengths), numerous X-ray diffraction imaging techniques have emerged [84,85,86], whose spatially resolving capabilities are well suited to probing the crystal structure of epitaxial thin films. Despite the photon flux limitations of these techniques, a general consequence of the weak scattering cross sections of hard X-rays from matter, the exquisite sensitivity of X-ray diffraction imaging to atomic structure all but guarantees datasets of unprecedented complexity and richness of information. Extracting the salient structural microstates of materials from these datasets invariably requires advanced data mining techniques such as matrix factorization.
Here, we demonstrate the potential of matrix factorization, in particular nonnegative matrix factorization, in determining epitaxial strain inheritance in an oxide heterostructure from full-field hard X-ray diffraction microscopy (XDM).
XDM is a dark-field imaging technique that employs a combination of hard X-ray optics to form a real-space image of the sample with diffraction contrast. By operating in a Bragg reflection geometry, XDM is sensitive to the full three-dimensional atomic structure of a material, with a lateral spatial resolution of ~ 70 nm [87] and structural imaging contrast that is diffraction limited (sub-Å) [86]. One of the simplest operation modes of XDM is to scan one of the crystal truncation rods of the substrate in order to resolve the spatial distribution of the epitaxial strain induced in the different crystalline layers of a heterostructure (Fig. 12). The XDM dataset originating from the rod scan consists of real-space images (Fig. 12b) taken at different Q_{z} positions along the truncation rod (Fig. 12a), where Q_{z} is the momentum transfer along the surface normal z (see Fig. 12 caption). The resultant XDM dataset, X(x,y,Q_{z}), therefore depends on image pixel position (x,y) and Q_{z}, with the image pixels (x,y) corresponding to lateral sample positions with an effective pixel size of 15 nm (Fig. 12c). As such, X(x,y,Q_{z}) can be simply interpreted as spatially resolved XRD, with an XRD intensity I(Q_{z}) associated with each sample position (x,y).
The studied oxide heterostructure is composed of (80 nm) Pb(Zr_{0.2}Ti_{0.8})O_{3}/(50 nm) SrRuO_{3}/SrTiO_{3} (001), with Bragg diffraction peaks (103 reflection) indicated in Fig. 12a. Due to the large thickness of the SrRuO_{3} (SRO) layer and its in-plane lattice mismatch with the single-crystal SrTiO_{3} (STO) substrate (SRO: a_{pc} ~ 3.93 Å, STO: a_{pc} = 3.905 Å), considerable strain relaxation is expected through the formation of threading dislocations and inhomogeneous spatial distributions in the in-plane lattice constant of SRO [88], resulting in a broadening of its Bragg peak. The presence of these threading dislocation networks in the SRO film is clearly visible in XDM (image taken at Q_{z} = SRO 103), where they appear as dark lines because rotations of the crystal lattice planes near the dislocations move the Bragg condition away from its nominal position for the dislocation-free regions of the thin film.
The different structural signatures of strain-relieving mechanisms and the spatial distributions of structural phases present in the SRO and PZT layers are encoded in X(x,y,Q_{z}) and can be extracted by nonnegative matrix factorization (NMF). In light of the discussion above, the constraints of orthogonality (SVD, PCA) and linear convexity (pLSI) are not justifiable for an XDM rod scan, since the signal from different structural configurations does not satisfy these constraints; it does, however, satisfy the constraint of nonnegativity, which motivates our application of NMF.
Prior to the application of NMF, the XDM dataset X(x,y,Q_{z}) in Fig. 12b is reshaped into a matrix X(samples, features), where each sample is a spatial position (samples = 700 × 700 pixels) associated with a feature vector given by the diffracted intensity I(Q_{z}) (features = 56 Q_{z} points). The nonnegative matrix factorization of X into low-rank factors (V_{k}) and sample distributions (U_{k}) is shown in Fig. 13 (note that size(X) = 490,000 × 56 and k = 6 representatives). The low-rank factors V_{k} can be readily interpreted as XRD scans associated with different structural “phases” in the SRO and PZT films, while their associated U_{k} show the spatial configurations of such phases (note that each U_{k} is reshaped from a vector of length n, the number of samples, to an x × y image).
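Written out, the factorization described above takes the following form. This is a sketch in our own notation, assuming the common Frobenius-norm objective: n = N_x N_y denotes the number of samples (pixels), m = 56 the number of Q_{z} points, and K = 6 the number of components; the symbols n, m, K, N_x, and N_y are introduced here for illustration.

```latex
\[
X \in \mathbb{R}_{\ge 0}^{\,n \times m}, \qquad
X \;\approx\; U V^{\mathsf{T}} \;=\; \sum_{k=1}^{K} U_k V_k^{\mathsf{T}}, \qquad
U \in \mathbb{R}_{\ge 0}^{\,n \times K}, \quad
V \in \mathbb{R}_{\ge 0}^{\,m \times K},
\]
\[
(U, V) \;=\; \operatorname*{arg\,min}_{U \ge 0,\; V \ge 0} \;
\bigl\lVert X - U V^{\mathsf{T}} \bigr\rVert_F^{2} .
\]
```

Each column V_{k} is then a nonnegative diffraction profile I(Q_{z}), and the matching column U_{k}, reshaped from length n to an N_x × N_y image, is the map showing where that profile contributes.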
Closer inspection of the low-rank factors indicates that k = 1–3 represent SRO domains with different d_{103} (where d_{HKL} is the spacing between (HKL) Bragg planes), as can be clearly seen from the shift in Q_{z} of their Bragg peak positions (Fig. 13a) with respect to the spatially averaged 103 reflection. The spatial distributions of SRO domains with different epitaxial strain states are given by their corresponding sample distributions (U_{k}, with k = 1–3), as shown in Fig. 13b. Note that the intensity of each U_{k} image is directly proportional to how strongly a particular region of the sample is associated with the structural state characterized by the X-ray diffraction scan in V_{k}. In essence, NMF provides the spatial distributions of different classes of SRO lattice configuration (given by U_{k}), whose atomic positions, occupancies, etc. can be extracted through structural refinement of the XRD scan given by V_{k}.
The presence of SRO domains with different lattice constants is consistent with the broadening of the spatially averaged Bragg peak (Fig. 12a) and is a direct consequence of relieving the misfit strain imposed by the STO substrate. In addition to a coherent relaxation of strain, with spatial variations in d_{103} that are localized around the misfit dislocation lines, as can be seen in V_{2}, there is a significant amount of incoherent strain relaxation leading to SRO domain segregation with no discernible preference for principal crystallographic directions (seen in V_{1} and V_{3}). Such domain segregation in SRO could be associated with the presence of RuO_{2} precipitates [89] and can be directly checked through traditional structural refinement of (U_{1}, V_{1}) and (U_{3}, V_{3}) to obtain atomic occupancies of the unit cell in these different SRO domains, buried underneath the PZT layers. As with the structural states of SrRuO_{3}, one can directly associate k = 4–6 with structural deviations of PZT domains from the ensemble-averaged lattice configuration (c = 4.19 ± 10^{−2} Å, a = 3.97 ± 10^{−2} Å, as determined in [86]).
Without additional structural refinement, the NMF decomposition allows us to arrive at a qualitative understanding of the epitaxial strain transfer in this heterostructure. For instance, inspection of V_{3} (SRO) and V_{6} (PZT) shows that SRO domains with lower-than-average d_{103} spacing induce a minor change in the d-spacing of PZT at the exact same lateral position. Furthermore, the changes in the d-spacing of PZT shown in V_{5,6} are found to be largely concentrated near the misfit dislocations. These two observations indicate that strain transfer from one film to the next is mainly mediated by misfit dislocations of SRO that extend through PZT.
The power of matrix factorization techniques applied to structural imaging techniques such as XDM resides in their ability to facilitate the extraction of key qualitative structural information, which can be further refined through model-based interpretations (e.g., crystal structure factor calculations). Applications of NMF and other matrix factorization techniques to other X-ray diffraction imaging techniques promise to reveal a wealth of structural information.
Conclusion
In this tutorial paper, we discussed the utility of matrix factorization for performing linear unmixing of imaging and spectroscopic data commonly acquired via microscopy modalities. We presented a matrix factorization framework that implements different physical constraints, such as sparsity, spatial smoothness, and nonnegativity, to constrain the unmixing, leading to more meaningful and interpretable endmembers and abundance maps. Using ToF-SIMS data, we compared the effects of different physical constraints, namely nonnegativity (NMF), orthogonality without nonnegativity (PCA), spatial smoothness, and sparsity, on the resulting spectra and abundance maps. Finally, we presented detailed examples of the use of constrained matrix factorization approaches on different spectroscopy data, including X-ray microscopy and scanning probe microscopy datasets. This paper uses the open source NMF implementation from https://github.com/ramkikannan/nmflibrary. The imposition of such physical constraints here and in other machine learning algorithms will be critical to better understand physical mechanisms in the large multidimensional datasets commonly acquired in modern imaging facilities.
References
 1.
Pennycook, S.J., Varela, M., Lupini, A.R., Oxley, M.P., Chisholm, M.F.: Atomicresolution spectroscopic imaging: past, present and future. J. Electron Microsc. 58, 87–97 (2009)
 2.
Zhou, W., Kapetanakis, M.D., Prange, M.P., Pantelides, S.T., Pennycook, S.J., Idrobo, J.C.: Direct determination of the chemical bonding of individual impurities in graphene. Phys. Rev. Lett. 109, 206803 (2012)
 3.
Suenaga, K., Koshino, M.: Atombyatom spectroscopy at graphene edge. Nature 468, 1088–1090 (2010)
 4.
Varela, M., Gazquez, J., Pennycook, S.J.: STEMEELS imaging of complex oxides and interfaces. MRS Bull. 37, 29–35 (2012)
 5.
Kumar, A., Ehara, Y., Wada, A., Funakubo, H., Griggio, F., TrolierMcKinstry, S., et al.: Dynamic piezoresponse force microscopy: spatially resolved probing of polarization dynamics in time and voltage domains. J. Appl. Phys. 112, 052021 (2012)
 6.
Guo, S., Jesse, S., Kalnaus, S., Balke, N., Daniel, C., Kalinin, S.V.: Direct mapping of ion diffusion times on LiCoO(2) surfaces with nanometer resolution. J. Electrochem. Soc. 158, A982–A990 (2011)
 7.
Kalinin, S., Balke, N., Jesse, S., Tselev, A., Kumar, A., Arruda, T.M., et al.: Liion dynamics and reactivity on the nanoscale. Mater. Today 14, 548–558 (2011)
 8.
Jesse, S., Balke, N., Eliseev, E., Tselev, A., Dudney, N.J., Morozovska, A.N., et al.: Direct mapping of ionic transport in a si anode on the nanoscale: time domain electrochemical strain spectroscopy study. ACS Nano 5, 9682–9695 (2011)
 9.
Kano, H., Segawa, H., Okuno, M., Leproux, P., Couderc, V.: Hyperspectral coherent Raman imaging—principle, theory, instrumentation, and applications to life sciences. J. Raman Spectrosc. 47, 116–123 (2016)
 10.
Wabuyele, M.B., Yan, F., Griffin, G.D., VoDinh, T.: Hyperspectral surfaceenhanced Raman imaging of labeled silver nanoparticles in single cells. Rev. Sci. Instrum. 76, 063710 (2005)
 11.
Fu, D., Holtom, G., Freudiger, C., Zhang, X., Xie, X.S.: Hyperspectral imaging with stimulated raman scattering by chirped femtosecond lasers. J. Phys. Chem. B 117, 4634–4640 (2013)
 12.
Bouillard, J.S.G., Dickson, W., Wurtz, G.A., Zayats, A.V.: Nearfield hyperspectral optical imaging. ChemPhysChem 15, 619–629 (2014)
 13.
Jung, S., Foston, M., Kalluri, U.C., Tuskan, G.A., Ragauskas, A.J.: 3D chemical image using TOFSIMS revealing the biopolymer component spatial and lateral distributions in biomass. Angew. Chem. Int. Ed. 51, 12005–12008 (2012)
 14.
Ievlev, A.V., Maksymovych, P., Trassin, M., Seidel, J., Ramesh, R., Kalinin, S.V., et al.: Chemical state evolution in ferroelectric films during tipinduced polarization and electroresistive switching. ACS Appl. Mater. Interfaces. 8, 29588–29593 (2016)
 15.
McDonnell, L.A., Heeren, R.M.A.: Imaging mass spectrometry. Mass Spectrom. Rev. 26, 606–643 (2007)
 16.
Zimmermann, T.: Spectral imaging and linear unmixing in light microscopy. In: Rietdorf, T., Denert, E. (eds.) Microscopy Techniques: −/−, pp. 245–265. Springer, Berlin (2005)
 17.
Peckner, R., Myers, S.A., Egertson, J.D., Johnson, R.S., Carr, S.A., MacCoss, M.J., et al.: Specter: linear deconvolution as a new paradigm for targeted analysis of dataindependent acquisition mass spectrometry proteomics. bioRxiv (2017). https://doi.org/10.1101/152744
 18.
Kalinin, S.V., Jesse, S., Rodriguez, B.J., Shin, J., Baddorf, A.P., Lee, H.N., et al.: Spatial resolution, information limit, and contrast transfer in piezoresponse force microscopy. Nanotechnology 17, 3400 (2006)
 19.
Collins, L., Okatan, M.B., Li, Q., Kravenchenko, I.I., Lavrik, N.V., Kalinin, S.V., et al.: Quantitative 3DKPFM imaging with simultaneous electrostatic force and force gradient detection. Nanotechnology 26, 175707 (2015)
 20.
Collins, L., Belianinov, A., Somnath, S., Balke, N., Kalinin, S.V., Jesse, S.: Full data acquisition in Kelvin probe force microscopy: mapping dynamic electric phenomena in real space. Sci. Rep. 6, 30557 (2016)
 21.
Cohen, G., Halpern, E., Nanayakkara, S.U., Luther, J.M., Held, C., Bennewitz, R., et al.: Reconstruction of surface potential from Kelvin probe force microscopy images. Nanotechnology 24, 295702 (2013)
 22.
Kirkland, E.J.: Linear image approximations. In: Kirkland, E.J. (ed.) Advanced Computing in Electron Microscopy, pp. 29–60. Springer, Boston (2010)
 23.
Björck, Å: Numerical Methods for Least Squares Problems. SIAM (1996)
 24.
Kannan, R.: Scalable and Distributed Constrained Low Rank Approximations. Georgia Institute of Technology, Atlanta (2016)
 25.
Kim, J., He, Y., Park, H.: Algorithms for nonnegative matrix and tensor factorizations: a unified view based on block coordinate descent framework. J. Glob. Optim. 58, 285–319 (2014)
 26.
Kannan, R., Ishteva, M., Drake, B., Park, H.: Bounded matrix low rank approximation. In: Nonnegative Matrix Factorization Techniques, pp. 89–118. Springer, Berlin (2016)
 27.
Kannan, R., Ishteva, M., Park, H.: Bounded matrix factorization for recommender system. Knowl. Inf. Syst. 39, 491–511 (2014)
 28.
Keshava, N., Mustard, J.F.: Spectral unmixing. IEEE Signal Process. Mag. 19, 44–57 (2002)
 29.
Dobigeon, N., Moussaoui, S., Coulon, M., Tourneret, J.Y., Hero, A.O.: Joint Bayesian endmember extraction and linear unmixing for hyperspectral imagery. IEEE Trans. Signal Process. 57, 4355–4368 (2009)
 30.
Pearson, K.: LIII. On lines and planes of closest fit to systems of points in space. In: Philosophical Magazine Series 6, vol. 2, pp. 559–572. (1901)
 31.
Jolliffe, I.: Principal component analysis. In: Wiley StatsRef: Statistics Reference Online. Wiley, London (2014)
 32.
Medina, J.M., Pereira, L.M., Correia, H.T., Nascimento, S.M.C.: Hyperspectral optical imaging of human iris in vivo: characteristics of reflectance spectra. J. Biomed. Opt. 16, 076001 (2011)
 33.
Bonnet, N.: Artificial intelligence and pattern recognition techniques in microscope image processing and analysis. In: Hawkes, P.W. (ed.) Advances in Imaging and Electron Physics, vol. 114, pp. 1–77. Elsevier Academic Press Inc, San Diego (2000)
 34.
Bonnet, N.: Multivariate statistical methods for the analysis of microscope image series: applications in materials science. J. Microsc. Oxf. 190, 2–18 (1998)
 35.
Serin, V., Andrieu, S., Serra, R., Bonell, F., Tiusan, C., Calmels, L., et al.: TEM and EELS measurements of interface roughness in epitaxial Fe/MgO/Fe magnetic tunnel junctions. Phys. Rev. B 79, 144413 (2009)
 36.
Bosman, M., Watanabe, M., Alexander, D.T.L., Keast, V.J.: Mapping chemical and bonding information using multivariate analysis of electron energyloss spectrum images. Ultramicroscopy 106, 1024–1032 (2006)
 37.
Biesinger, M.C., Paepegaey, P.Y., McIntyre, N.S., Harbottle, R.R., Petersen, N.O.: Principal component analysis of TOFSIMS images of organic monolayers. Anal. Chem. 74, 5711–5716 (2002)
 38.
Race, A.M., Steven, R.T., Palmer, A.D., Styles, I.B., Bunch, J.: Memory efficient principal component analysis for the dimensionality reduction of large mass spectrometry imaging data sets. Anal. Chem. 85, 3071–3078 (2013)
 39.
Kalinin, S.V., Rodriguez, B.J., Budai, J.D., Jesse, S., Morozovska, A.N., Bokov, A.A., et al.: Direct evidence of mesoscopic dynamic heterogeneities at the surfaces of ergodic ferroelectric relaxors. Phys. Rev. B 81, 064107 (2010)
 40.
Jesse, S., Kalinin, S.V.: Principal component and spatial correlation analysis of spectroscopicimaging data in scanning probe microscopy. Nanotechnology 20, 085714 (2009)
 41.
Kalinin, S.V., Rodriguez, B.J., Jesse, S., Morozovska, A.N., Bokov, A.A., Ye, Z.G.: Spatial distribution of relaxation behavior on the surface of a ferroelectric relaxor in the ergodic phase. Appl. Phys. Lett. 95, 142902 (2009)
 42.
Ovchinnikov, O.S., Jesse, S., Bintacchit, P., TrolierMcKinstry, S., Kalinin, S.V.: Disorder identification in hysteresis data: recognition analysis of the randombondrandomfield ising model. Phys. Rev. Lett. 103, 157203 (2009)
 43.
Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009). https://doi.org/10.1109/MC.2009.263
 44.
Shiga, M., Muto, S., Tatsumi, K., Tsuda, K.: Matrix factorization for automatic chemical mapping from electron microscopic spectral imaging datasets. Trans. Mater. Res. Soc. Jpn 41, 333–336 (2016)
 45.
Shiga, M., Tatsumi, K., Muto, S., Tsuda, K., Yamamoto, Y., Mori, T., et al.: Sparse modeling of EELS and EDX spectral imaging data by nonnegative matrix factorization. Ultramicroscopy 170, 43–59 (2016)
 46.
Kuang, D., Park, H.: Fast rank2 nonnegative matrix factorization for hierarchical document clustering. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 739–747. (2013)
 47.
Xu, W., Liu, X., Gong, Y.: Document clustering based on nonnegative matrix factorization. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 267–273. (2003)
 48.
Candes, E., Recht, B.: Exact matrix completion via convex optimization. Commun. ACM 55, 111–119 (2012)
 49.
Zhou, T., Tao, D.: Godec: randomized lowrank & sparse matrix decomposition in noisy case. In: International Conference on Machine Learning. (2011)
 50.
Kannan, R., Ballard, G., Park, H.: MPIFAUN: an MPIbased framework for alternatingupdating nonnegative matrix factorization. IEEE Trans. Knowl. Data Eng. 30(3), 544–558 (2018)
 51.
Ding, C., He, X., Simon, H.D.: On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of the 2005 SIAM International Conference on Data Mining, pp. 606–610. (2005)
 52.
Choo, J., Lee, C., Clarkson, E., Liu, Z., Lee, H., Chau, D.H.P., et al.: VisIRR: interactive visual information retrieval and recommendation for largescale document data. Georgia Institute of Technology, Atlanta (2013)
 53.
Choo, J., Lee, C., Kim, H., Lee, H., Liu, Z., Kannan, R., et al.: VisIRR: visual analytics for information retrieval and recommendation with largescale document data. In: Visual Analytics Science and Technology (VAST), 2014 IEEE Conference on, pp. 243–244. (2014)
 54.
Kim, J., Park, H.: Sparse nonnegative matrix factorization for clustering. Georgia Institute of Technology, Atlanta (2008)
 55.
Bishop, C.M.: Pattern recognition and machine learning. Springer, Berlin (2006)
 56.
Wit, E., Heuvel, E.V.D., Romeijn, J.W.: ‘All models are wrong…’: an introduction to model uncertainty. Stat. Neerlandica 66, 217–236 (2012)
 57.
Bischl, B., Richter, J., Bossek, J., Horn, D., Thomas, J., Lang, M.: mlrMBO: a modular framework for model-based optimization of expensive black-box functions. arXiv preprint arXiv:1703.03373 (2017)
 58.
Bergstra, J., Yamins, D., Cox, D.D.: Hyperopt: a Python library for optimizing the hyperparameters of machine learning algorithms. (2013)
 59.
Singh, A., Gordon, G.: A unified view of matrix factorization models. In: Machine Learning and Knowledge Discovery in Databases, pp. 358–373. (2008)
 60.
Collins, M., Dasgupta, S., Schapire, R.E.: A generalization of principal component analysis to the exponential family
 61.
Lee, D.D., Seung, H.S.: Learning the parts of objects by nonnegative matrix factorization. Nature 401, 788–791 (1999)
 62.
Cai, D., He, X., Han, J., Huang, T.S.: Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 33, 1548–1560 (2011)
 63.
Golub, G.H., Van Loan, C.F.: Matrix Computations. JHU Press, Baltimore (2012)
 64.
Collins, M., Dasgupta, S., Schapire, R.E.: A generalization of principal components analysis to the exponential family. In: Advances in Neural Information Processing Systems, pp. 617–624. (2001)
 65.
Lee, D.D., Seung, H.S.: Learning the parts of objects by nonnegative matrix factorization. Nature 401, 788–791 (1999)
 66.
Singh, A.P., Gordon, G.J.: A unified view of matrix factorization models. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 358–373, Berlin (2008)
 67.
Pacholski, M.L., Winograd, N.: Imaging with mass spectrometry. Chem. Rev. 99, 2977 (1999)
 68.
Ievlev, A.V., Belianinov, A., Jesse, S., Allison, D.P., Doktycz, M.J., Retterer, S.T., et al.: Automated interpretation and extraction of topographic information from time of flight secondary ion mass spectrometry data. Sci. Rep. 7, 17099 (2017)
 69.
Seidel, J., Trassin, M., Zhang, Y., Maksymovych, P., Uhlig, T., Milde, P., et al.: Electronic properties of isosymmetric phase boundaries in highly strained Ca-doped BiFeO_{3}. Adv. Mater. 26, 4376–4380 (2014)
 70.
Ievlev, A.V., Maksymovych, P., Trassin, M., Seidel, J., Ramesh, R., Kalinin, S.V., et al.: Chemical state evolution in ferroelectric films during tip-induced polarization and electroresistive switching. ACS Appl. Mater. Interfaces 8, 29588–29593 (2016)
 71.
Kalinin, S.V., Pennycook, S.J.: Microscopy: hasten high resolution. Nature 515, 487 (2014)
 72.
He, Q., Woo, J., Belianinov, A., Guliants, V.V., Borisevich, A.Y.: Better catalysts through microscopy: mesoscale M1/M2 intergrowth in Molybdenum–Vanadium based complex oxide catalysts for propane ammoxidation. ACS Nano 9, 3470–3478 (2015)
 73.
Vasudevan, R.K., Ziatdinov, M., Jesse, S., Kalinin, S.V.: Phases and interfaces from real space atomically resolved data: physics-based deep data image analysis. Nano Lett. 16, 5574–5581 (2016)
 74.
Ziatdinov, M., Fujii, S., Kiguchi, M., Enoki, T., Jesse, S., Kalinin, S.V.: Data mining graphene: correlative analysis of structure and electronic degrees of freedom in graphenic monolayers with defects. Nanotechnology 27, 495703 (2016)
 75.
He, Q., Woo, J., Belianinov, A., Guliants, V.V., Borisevich, A.Y.: Better catalysts through microscopy: mesoscale M1/M2 Intergrowth in Molybdenum–Vanadium based complex oxide catalysts for propane ammoxidation. ACS Nano 9, 3470–3478 (2015)
 76.
Ziatdinov, M., Maksov, A., Li, L., Sefat, A.S., Maksymovych, P., Kalinin, S.V.: Deep data mining in a real space: separation of intertwined electronic responses in a lightly doped BaFe_{2}As_{2}. Nanotechnology 27, 475706 (2016)
 77.
Sefat, A.S., Marty, K., Christianson, A.D., Saparov, B., McGuire, M.A., Lumsden, M.D., et al.: Effect of molybdenum 4d hole substitution in BaFe_{2}As_{2}. Phys. Rev. B 85, 024503 (2012)
 78.
Li, L., Cao, H., McGuire, M.A., Kim, J.S., Stewart, G.R., Sefat, A.S.: Role of magnetism in superconductivity of BaFe_{2}As_{2}: study of 5d Au-doped crystals. Phys. Rev. B 92, 094504 (2015)
 79.
Fäth, M., Freisem, S., Menovsky, A.A., Tomioka, Y., Aarts, J., Mydosh, J.A.: Spatially inhomogeneous metal-insulator transition in doped manganites. Science 285(5433), 1540–1542 (1999)
 80.
Holt, M., Harder, R., Winarski, R., Rose, V.: Nanoscale hard X-ray microscopy methods for materials studies. Ann. Rev. Mater. Res. 43, 183–211 (2013)
 81.
Tanner, B.K.: X-ray Diffraction Topography, vol. 10. Pergamon (1976)
 82.
Larson, B.C., Yang, W., Ice, G.E., Budai, J.D., Tischler, J.Z.: Three-dimensional X-ray structural microscopy with submicrometre resolution. Nature 415, 887–890 (2002)
 83.
Ice, G.E., Budai, J.D., Pang, J.W.L.: The race to X-ray microbeam and nanobeam science. Science 334, 1234 (2011)
 84.
Hofmann, F., Abbey, B., Liu, W., Xu, R., Usher, B.F., Balaur, E., et al.: X-ray microbeam characterization of lattice rotations and distortions due to an individual dislocation. Nat. Commun. 4, 2774 (2013)
 85.
Hruszkewycz, S.O., Highland, M.J., Holt, M.V., Kim, D., Folkman, C.M., Thompson, C., et al.: Imaging local polarization in ferroelectric thin films by coherent X-ray Bragg projection ptychography. Phys. Rev. Lett. 110, 177601 (2013)
 86.
Laanait, N., Zhang, Z., Schlepütz, C.M.: Imaging nanoscale lattice variations by machine learning of X-ray diffraction microscopy data. Nanotechnology 27, 1–10 (2016)
 87.
Laanait, N., Zhang, Z., Schlepütz, C.M., Vila-Comamala, J., Highland, M.J., Fenter, P.: Full-field X-ray reflection microscopy of epitaxial thin-films. J. Synchrotron Radiat. 21, 1252–1261 (2014)
 88.
Oh, S.H., Park, C.G.: Misfit strain relaxation by dislocations in SrRuO_{3}/SrTiO_{3} (001) heteroepitaxy. J. Appl. Phys. 95, 4691–4704 (2004)
 89.
Koster, G., Klein, L., Siemons, W., Rijnders, G., Dodge, J.S., Eom, C.B., et al.: Structure, physical properties, and applications of SrRuO_{3} thin films. Rev. Mod. Phys. 84, 253–298 (2012)
Authors’ contributions
RK prepared the manuscript and assembled the detailed MFF, its implementation, and its computation on the scientific data. AI prepared the sections on ToF-SIMS 2D and 3D analysis. MAZ and RKV prepared the sections on STEM and CITS. NL prepared the section on structural X-ray imaging and the analysis of the XDM dataset. SVK contributed to the introduction and the discussion targeting the audience, and led the team in writing this paper. SJ contributed heavily to the overall writing and to the domain discussions. All authors read and approved the final manuscript.
Acknowledgements
A portion of this research related to the Matrix Factorization library was partially funded by the Oak Ridge National Laboratory Director’s Research and Development fund (RK). A portion of this research was sponsored by the U.S. Department of Energy (DOE), Office of Science (OS), Basic Energy Sciences, Materials Sciences and Engineering Division (RKV, SVK, MAZ). A portion of this research was conducted and partially supported (SJ, AVI) at the Center for Nanophase Materials Sciences, which is a US DOE Office of Science User Facility. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy. NL acknowledges support from the Eugene P. Wigner Fellowship Program (ORNL). XDM data were acquired at the Advanced Photon Source, a US DOE User Facility at Argonne National Laboratory. MAZ thanks P. Maksymovych (ORNL) and J. Wang (LANL) for their assistance in STM measurements. RKV gratefully acknowledges A. Borisevich (ORNL) and Q. He (Cardiff University) for use of the STEM image of the oxide catalyst. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Competing interests
The authors declare that they have no competing interests.
Availability of data and materials
Not applicable.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Funding
We have acknowledged the relevant funding agencies in the acknowledgements.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords
 Unmixing
 Image segmentation
 Scanning probe microscopy
 Matrix factorization
 Big data
 High performance