Feature extraction via similarity search: application to atom finding and denoising in electron and scanning probe microscopy imaging
© The Author(s) 2018
Received: 4 October 2017
Accepted: 27 January 2018
Published: 1 March 2018
We develop an algorithm for feature extraction based on structural similarity and demonstrate its application for atom and pattern finding in high-resolution electron and scanning probe microscopy images. The use of the combined local identifiers formed from an image subset and appended Fourier, or other transform, allows tuning selectivity to specific patterns based on the nature of the recognition task. The proposed algorithm is implemented in Pycroscopy, a community-driven scientific data analysis package, and is accessible through an interactive Jupyter notebook available on GitHub.
Recent advances in (scanning) transmission electron microscopy (STEM) and scanning probe microscopy (SPM) have made atomically resolved imaging of solids and surfaces routine [1–5]. STEM enables the visualization of the atomic structure in a broad range of materials from oxides to semiconductors and metals, and in many cases allows observation of the evolution of structure under reactive conditions or thermal stimulation [6–11]. Similarly, ultra-high vacuum (UHV) and liquid SPM modes have been used to resolve atomic structures of metals and semiconductors, adatom structures, etc. [12–16].
For both (S)TEM and SPM, the goal is to extract fundamental materials physics and chemistry from imaging data. Indeed, until recently atomically resolved images were used solely to establish the local structure of materials and make qualitative observations on its quality, the presence of specific defects, etc. The progress in spatial resolution and the related information limit has enabled quantitative description of images, where (for STEM) atomic coordinates for some (or all) constitutive atoms can be extracted with picometer precision. Once available, this information can be used to reconstruct physical order parameter fields such as polarization [17–20], octahedral tilts [21–23], or chemical expansion. Alternatively, local atomic configurations can be analyzed in an unbiased manner via statistical methods, providing information on local crystallography [25, 26]. Moreover, crystallographic information can be extracted from the shape of the atomic column [22, 27]. Parenthetically, the extraction of physical information from atomically resolved imaging data, along with 3D imaging, provides the primary stimulus for development of progressive high-resolution STEM platforms. Similar approaches can be applied to scanning probe microscopy data [25, 28], although in this case the origin of the contrast is more complex.
These applications necessitate the development of robust and reliable techniques to extract atomic coordinates from atomically resolved images, requiring little or no human supervision. These generally require a combination of feature extraction methods with physics-based deconvolution. For STEM data, especially annular dark or bright field images, the analysis can be significantly simplified under the assumption that the regions with maximum contrast correspond to atomic coordinates. For more complex cases, the target is the development of feature classification and feature extraction schemes. Here, we develop an approach based on sliding window decomposition and similarity search that enables fast and robust analysis of images with multiple periodic textures and can be used for denoising, feature extraction, and segmentation of data. We further note that in certain cases the proposed algorithm leads to physically significant decompositions; however, we defer these studies to follow-on work focused on specific imaging phenomena.
As a second step, each window in the stack is flattened from an m × m two-dimensional matrix to a one-dimensional array containing m² elements. When it is desired to emphasize subtle changes in the atomic periodicity in certain atomically resolved images, additional information, such as the magnitude of the fast Fourier transform (FFT) of the window, can be appended to the window. Combining the real-space window with its FFT, scaled by a chosen weighting factor, allows one to balance the relative weight of real- and reciprocal-space features and greatly improves the flexibility of the analysis. We further note that for different preponderant textures, other image transforms can be used, e.g., the Radon transform for analysis of linear domain structures. Overall, this operation transforms the three-dimensional stack of windows into a large two-dimensional matrix [(N − m)² windows, each with m² pixels].
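The flattening step can be illustrated with a minimal NumPy sketch. The function name build_window_matrix and the parameter fft_weight are hypothetical names introduced here for illustration and are not taken from Pycroscopy. The sketch assumes a window step of one pixel, for which NumPy returns N − m + 1 window positions per axis; the exact window count depends on the step and edge convention used.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def build_window_matrix(image, m, fft_weight=0.0):
    """Flatten every m x m window of a 2D image into one row of a matrix.

    If fft_weight > 0, the FFT magnitude of each window, scaled by
    fft_weight, is appended to that window's row (illustrative choice).
    Returns the matrix and the (rows, cols) grid of window positions.
    """
    # All m x m windows with a step of one pixel: shape (rows, cols, m, m)
    windows = sliding_window_view(image, (m, m))
    n_rows, n_cols = windows.shape[:2]

    # One flattened window per row
    flat = windows.reshape(n_rows * n_cols, m * m)

    if fft_weight > 0:
        # Magnitude of the 2D FFT of each window, flattened the same way
        fft_mag = np.abs(np.fft.fft2(windows, axes=(-2, -1)))
        fft_flat = fft_mag.reshape(n_rows * n_cols, m * m)
        flat = np.hstack([flat, fft_weight * fft_flat])

    return flat, (n_rows, n_cols)
```

Increasing fft_weight in this sketch gives the reciprocal-space part of each row more influence on any subsequent decomposition, which is one simple way to realize the weighting described above.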
Consequently, the first few SVD components contain the most important correlations while the last few components contain the least correlated information and are typically considered to be noise. Figures S3–S5 in Additional file 1 show results from SVD applied to a windowed dataset. The original dataset, A, can be reconstructed exactly from the above equation or partially by using a subset of the r components. Thus, the 2D (N − m)² × m² dataset can be reconstructed using only the most informationally significant components from SVD. Subsequently, the windowing process shown in Fig. 1a–c can be used in reverse to generate the filtered image. Figure S6 in Additional file 1 shows the reconstruction of an image using the first few SVD components. Though the windowing process is shown for a square N × N pixel image, the same procedure can be used on rectangular images as well.
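A minimal sketch of the truncated reconstruction is given below, assuming the standard factorization A = U S Vᵀ as returned by numpy.linalg.svd. The function names are hypothetical, and reversing the windowing by averaging all windows that overlap a pixel is an assumption made for this illustration; the text does not specify how overlapping windows are combined.

```python
import numpy as np


def svd_denoise_windows(window_matrix, n_components):
    """Rebuild the window matrix from its leading SVD components only."""
    u, s, vt = np.linalg.svd(window_matrix, full_matrices=False)
    r = n_components
    # Keep the r largest singular values and their vectors
    return (u[:, :r] * s[:r]) @ vt[:r]


def windows_to_image(window_matrix, grid_shape, m):
    """Reverse the windowing by averaging every window covering a pixel.

    Assumes a window step of one pixel. Only the first m*m columns are
    used, so any appended transform columns are ignored.
    """
    n_rows, n_cols = grid_shape
    total = np.zeros((n_rows + m - 1, n_cols + m - 1))
    count = np.zeros_like(total)
    patches = window_matrix[:, :m * m].reshape(n_rows, n_cols, m, m)
    for i in range(n_rows):
        for j in range(n_cols):
            total[i:i + m, j:j + m] += patches[i, j]
            count[i:i + m, j:j + m] += 1
    return total / count
```

Keeping all components returns the original image, while a small n_components retains only the most strongly correlated structure across windows.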
The results from SVD are further used to identify atoms or atomic columns via a pattern matching approach outlined in Fig. 1d–e. We begin the atom finding process by performing k-means clustering on U or a subset of U. k-Means clustering classifies data points into k clusters by Euclidean distance such that the variance within each cluster is minimized. In other words, k-means groups data points or pixels such that pixels within the same cluster are more similar to each other than to those in other clusters. Applying k-means to U results in a (spatial) map of labels where the value at each pixel is the index of the cluster that the pixel belongs to. k-Means can also be applied to a subset of U to discount SVD components whose eigenvectors do not exhibit regular patterns. These components often contain information regarding long-range features (e.g., drift), instrument noise (e.g., 60 Hz noise), etc. Such manual selection of components in U can better enable k-means to capture the desired features. k-Means requires k to be specified a priori, and it is a challenge to determine an appropriate value of k that best represents the data. Hence, we ‘over-cluster’ the dataset, i.e., choose a large value for k (e.g., 24 to 60), to allow k-means to capture the finer nuances, such as phase boundaries, in the original image.
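The clustering step can be sketched as follows. To keep the example self-contained, a plain NumPy Lloyd iteration stands in for whichever k-means implementation is used in practice; the function names, the fixed random seed, and the random choice of initial centers are all assumptions of this sketch.

```python
import numpy as np


def kmeans_labels(features, k, n_iter=50, seed=0):
    """Assign each row of `features` to one of k clusters (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Initial centers: k distinct rows picked at random
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for c in range(k):
            members = features[labels == c]
            if len(members):  # leave an empty cluster's center unchanged
                new_centers[c] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_label_map(u, grid_shape, components, k):
    """Cluster the selected columns of U and reshape labels to the window grid."""
    labels = kmeans_labels(u[:, components], k)
    return labels.reshape(grid_shape)
```

Passing only a subset of column indices through components corresponds to discarding SVD components dominated by drift or instrument noise before clustering.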
Next, we manually select t square or rectangular windows in the denoised image that are centered on repeating patterns, such as atoms or atomic columns. Separate motifs are selected to represent each of the families of atoms or atomic columns. The coordinates of these windows are used to extract a corresponding set of t motifs from the spatial map of cluster labels obtained from k-means. The spatial abundance of each motif is calculated by scanning the motif across the spatial map of cluster labels, image column by image column and then image row by image row. For a given motif at a given location on the cluster label map, the ‘matching score’ is calculated as the number of pixels in the motif that match with the current window in the cluster label map. This matching score is divided by the number of pixels in the motif such that the score always ranges from 0 to 1. In the event that two motifs identify the same set of atoms, one of the motifs is removed. Additional motifs may be required to identify those atoms that are not captured by the original set of motifs. One example is the case where drift in the microscope results in distortions in the shapes of atoms in certain sections of the image.
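The matching score described above can be written compactly with NumPy. This is a minimal sketch under the assumption that the motif is scanned with a step of one pixel and only over positions where it fits entirely inside the label map; matching_score_map is a hypothetical name used for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def matching_score_map(label_map, motif):
    """Score every motif position on a map of cluster labels.

    The score is the number of motif pixels whose cluster label equals
    the label underneath, divided by the number of motif pixels, so it
    always lies between 0 and 1.
    """
    # Every motif-sized window of the label map: (rows, cols, h, w)
    windows = sliding_window_view(label_map, motif.shape)
    # Fraction of matching labels in each window
    return (windows == motif).mean(axis=(-2, -1))
```

For example, extracting a motif with motif = label_map[r0:r1, c0:c1] and calling matching_score_map(label_map, motif) gives a score of 1 at the location the motif was taken from and high scores wherever the same arrangement of cluster labels repeats.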
This matching process results in t spatial maps of matching scores corresponding to each of the t motifs. These continuous-valued spatial maps of matching scores are manually thresholded to generate t binary maps where the score is set to 1 if the matching score is greater than the threshold and 0 otherwise. The threshold values for each pattern are manually chosen such that the number of matched areas is maximized while minimizing any overlap with segments from other patterns. For atomically resolved images, the coordinates of atoms can be estimated by calculating the centroid of each segment from the thresholded maps. When the same atom is identified for multiple motifs, supervised machine learning techniques such as K-nearest neighbors are used to remove duplicates and assign the atom to the correct motif. Subsequently, a variety of approaches can be used to refine the positions of the atoms for further analysis as necessary [23, 25, 28].
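The thresholding and centroid estimate can be sketched as below. The example assumes that a segment is a group of 4-connected pixels above the threshold, a convention the text does not specify, and uses a simple flood fill so that it depends only on NumPy; segment_centroids is a hypothetical name. Duplicate removal across motifs and position refinement are not covered by this sketch.

```python
import numpy as np


def segment_centroids(score_map, threshold):
    """Threshold a matching-score map and return one (row, col) centroid
    for each 4-connected segment of pixels above the threshold."""
    mask = score_map > threshold
    visited = np.zeros(mask.shape, dtype=bool)
    centroids = []
    for start in zip(*np.nonzero(mask)):
        if visited[start]:
            continue
        # Flood fill to collect all pixels of this segment
        stack, pixels = [start], []
        visited[start] = True
        while stack:
            r, c = stack.pop()
            pixels.append((r, c))
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                if inside and mask[nr, nc] and not visited[nr, nc]:
                    visited[nr, nc] = True
                    stack.append((nr, nc))
        # Centroid of the segment as the mean pixel coordinate
        centroids.append(tuple(np.mean(pixels, axis=0)))
    return centroids
```

The returned coordinates are in the frame of the score map; mapping them back to image coordinates requires adding the offset of the motif center, which depends on the window and motif sizes chosen.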
The microscopy community has developed and used a variety of techniques for denoising images and finding the positions of atoms and atomic columns in images since the inception of SPMs and STEMs. Among the many techniques for denoising atomically resolved images, Gaussian blurring, filtering in Fourier space, averaging over multiple unit cells, and averaging over a stack of images are some of the most commonly used methods. Most of these techniques are fast and simple but have major shortcomings. For example, filtering in Fourier space is prone to adding atoms at vacancies, removing displacements in atomic positions, etc. Furthermore, it is challenging to fine-tune the filter since the frequencies with significant information vary from image to image and are not known a priori. Cross-correlation and phase-correlation methods typically require a stack of multiple images and, unlike our technique, cannot work on a single image. The current state-of-the-art technique for image denoising is a non-local means (NLM) technique called block-matching and 3D filtering (BM3D), which identifies windows or patches that are similar, performs 3D wavelet denoising on similar patches, and finally applies a Wiener filter. BM3D shares the same shortcomings as frequency-space filtering. Additional file 1: Figures S7–S10 show the results from popular image filtering techniques. Additional file 1: Figure S11 compares the best results from five image denoising techniques. We observe that our technique is substantially better than all other techniques, including BM3D, at removing the majority of the noise while retaining all the important information regarding the lattice structure. All other techniques only remove the short-range features or high-frequency components, while the noise at low-frequency components remains in the image. Furthermore, all the other techniques tend to erase atoms with relatively low intensities as the strength of the filter is increased.
We also compared our atom finding technique with other conventional alternatives. Similar to the image denoising alternatives, the atom finding techniques also have their own advantages and limitations. For example, though Gaussian convolution is fast and simple, it is unable to differentiate regions with different lattice structures and often deletes atoms with lower intensities along with the noise especially in images with low signal-to-noise ratio. Window-based convolutions on the other hand perform slightly better than Gaussian convolutions at identifying atoms with relatively low intensities but at the cost of a significant number of false-positives since the method is dominated by atoms or atomic columns with relatively high intensities. Moreover, similar to Gaussian convolution, window-based convolutions are also poor at distinguishing regions with different lattice structures. We find that our technique is superior to the aforementioned techniques since it has consistently been able to correctly identify all atoms and distinguish regions. See Additional file 1: Figure S12 for more information.
Clearly, each denoising and atom finding technique has its own merits and disadvantages and a single technique may not be ideal for every image. For instance, while our technique is consistent in denoising and atom identification, our (currently) computationally intensive algorithm requires modifications to make it suitable for real-time denoising of images and identification of atoms. We find that the best solution is for such algorithms to be made available freely via open-source packages to allow researchers to adopt the algorithms that suit them the best.
We implemented methods for finding atoms and patterns in high-resolution images based on similarity search on sliding transforms of images. This approach is universally applicable to STEM, STM, and AFM data, and can also be applied to other feature-finding problems. The use of an identification object comprising an image subset and an appended Fourier transform allows tuning for increased detectability of periodic structures, and can be adapted to other characteristic morphologies, e.g., via use of Hough transforms.
All the image denoising and atom finding algorithms presented in this paper are freely available in our open-source, community-driven Python package, Pycroscopy (https://github.com/pycroscopy/pycroscopy). The scientific workflow presented in this paper is available via a Jupyter notebook (http://nbviewer.jupyter.org/github/pycroscopy/pycroscopy/blob/master/jupyter_notebooks/Image_Cleaning_Atom_Finding.ipynb) that allows straightforward application of the presented methodology to arbitrary images.
The image in Fig. 3i is a simulated image.
The image in Fig. 3m is WSe2 irradiated with He ions, acquired via a Nion Company UltraSTEM 100 at 100 keV. Image courtesy of Nicholas Cross and Gerd Duscher, The University of Tennessee, Knoxville.
SJ, SK, and SS conceived the original ideas for the algorithms. SS and CRS implemented the algorithms in Python, analyzed the data, and generated the figures. SS and SK prepared the manuscript. MC, AB, SJ, NC, and GD provided the atomically resolved images. All authors read and approved the final manuscript.
This research was sponsored by the Division of Materials Sciences and Engineering, Basic Energy Sciences, US Department of Energy (SVK and SS). This research was conducted and partially supported (CRS) at the Center for Nanophase Materials Sciences, which is a US DOE Office of Science User Facility. This research was supported by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC for the US Department of Energy (S.J.). We gratefully acknowledge Ondrej Dyck’s assistance in identifying the filtering parameters for an atomically resolved image.
Notice: This manuscript has been authored by UT-Battelle, LLC, under Contract No. DE-AC05-00OR22725 with the US Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
The authors declare that they have no competing interests.
Availability of data and materials
All the data and algorithms used in this paper are available in the public software repository, Pycroscopy, at https://github.com/pycroscopy/pycroscopy.
Ethics approval and consent to participate
This research was sponsored by the Division of Materials Sciences and Engineering, Basic Energy Sciences, US Department of Energy. Part of this research was funded by the Center for Nanophase Materials Sciences, which is a US DOE Office of Science User Facility. This research was supported by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC for the US Department of Energy.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- Pennycook, S.J., Kalinin, S.V.: Microscopy: hasten high resolution. Nature 515, 487–488 (2014)
- Pennycook, S.J., Nellist, P.D. (eds.): Scanning transmission electron microscopy: imaging and analysis. Springer, New York (2011)
- Shibata, N., Findlay, S.D., Kohno, Y., Sawada, H., Kondo, Y., Ikuhara, Y.: Differential phase-contrast microscopy at atomic resolution. Nat. Phys. 8, 611–615 (2012)
- Huang, P.Y., Kurasch, S., Alden, J.S., Shekhawat, A., Alemi, A.A., McEuen, P.L., Sethna, J.P., Kaiser, U., Muller, D.A.: Imaging atomic rearrangements in two-dimensional silica glass: watching silica’s dance. Science 342, 224–227 (2013)
- Morgenstern, K., Lorente, N., Rieder, K.H.: Controlled manipulation of single atoms and small molecules using the scanning tunnelling microscope. Phys. Status Solidi B-Basic Solid State Phys. 250, 1671–1751 (2013)
- Pennycook, S.J., Chisholm, M.F., Lupini, A.R., Varela, M., van Benthem, K., Borisevich, A.Y., Oxley, M.P., Luo, W., Pantelides, S.T.: Advances in imaging and electron physics, vol. 153, p. 327. Elsevier Academic Press Inc., San Diego (2008)
- Ishikawa, R., Mishra, R., Lupini, A.R., Findlay, S.D., Taniguchi, T., Pantelides, S.T., Pennycook, S.J.: Direct observation of dopant atom diffusion in a bulk semiconductor crystal enhanced by a large size mismatch. Phys. Rev. Lett. 113, 155501 (2014)
- Yang, Z.Q., Yin, L.C., Lee, J., Ren, W.C., Cheng, H.M., Ye, H.Q., Pantelides, S.T., Pennycook, S.J., Chisholm, M.F.: Direct observation of atomic dynamics and silicon doping at a topological defect in graphene. Angew. Chem. Int. Edit. 53, 8908–8912 (2014)
- Komsa, H.P., Kotakoski, J., Kurasch, S., Lehtinen, O., Kaiser, U., Krasheninnikov, A.V.: Two-dimensional transition metal dichalcogenides under electron irradiation: defect production and doping. Phys. Rev. Lett. 109, 035503 (2012)
- Susi, T., Meyer, J.C., Kotakoski, J.: Manipulating low-dimensional materials down to the level of single atoms with electron irradiation. Ultramicroscopy 180, 163–172 (2017)
- Zheng, H.M., Rivest, J.B., Miller, T.A., Sadtler, B., Lindenberg, A., Toney, M.F., Wang, L.W., Kisielowski, C., Alivisatos, A.P.: Observation of transient structural-transformation dynamics in a Cu2S nanorod. Science 333, 206–209 (2011)
- Gerber, C., Lang, H.P.: How the doors to the nanoworld were opened. Nat. Nanotechnol. 1, 3–5 (2006)
- Binnig, G., Quate, C.F., Gerber, C.: Atomic force microscope. Phys. Rev. Lett. 56, 930–933 (1986)
- Binnig, G., Rohrer, H., Gerber, C., Weibel, E.: 7 × 7 reconstruction on Si(111) resolved in real space. Phys. Rev. Lett. 50, 120–123 (1983)
- Fukuma, T., Kobayashi, K., Matsushige, K., Yamada, H.: True molecular resolution in liquid by frequency-modulation atomic force microscopy. Appl. Phys. Lett. 86, 193108 (2005)
- Fukuma, T., Onishi, K., Kobayashi, N., Matsuki, A., Asakawa, H.: Atomic-resolution imaging in liquid by frequency modulation atomic force microscopy using small cantilevers with megahertz-order resonance frequencies. Nanotechnology 23, 135706 (2012)
- Nelson, C.T., Winchester, B., Zhang, Y., Kim, S.J., Melville, A., Adamo, C., Folkman, C.M., Baek, S.H., Eom, C.B., Schlom, D.G., Chen, L.Q., Pan, X.Q.: Spontaneous vortex nanodomain arrays at ferroelectric heterointerfaces. Nano Lett. 11, 828–834 (2011)
- Jia, C.L., Mi, S.B., Urban, K., Vrejoiu, I., Alexe, M., Hesse, D.: Atomic-scale study of electric dipoles near charged and uncharged domain walls in ferroelectric films. Nat. Mater. 7, 57–61 (2008)
- Jia, C.L., Nagarajan, V., He, J.Q., Houben, L., Zhao, T., Ramesh, R., Urban, K., Waser, R.: Unit-cell scale mapping of ferroelectricity and tetragonality in epitaxial ultrathin ferroelectric films. Nat. Mater. 6, 64–69 (2007)
- Borisevich, A.Y., Chang, H.J., Huijben, M., Oxley, M.P., Okamoto, S., Niranjan, M.K., Burton, J.D., Tsymbal, E.Y., Chu, Y.H., Yu, P., Ramesh, R., Kalinin, S.V., Pennycook, S.J.: Suppression of octahedral tilts and associated changes in electronic properties at epitaxial oxide heterostructure interfaces. Phys. Rev. Lett. 105, 087204 (2010)
- Jia, C.L., Mi, S.B., Faley, M., Poppe, U., Schubert, J., Urban, K.: Oxygen octahedron reconstruction in the SrTiO3/LaAlO3 heterointerfaces investigated using aberration-corrected ultrahigh-resolution transmission electron microscopy. Phys. Rev. B 79, 081405 (2009)
- Borisevich, A., Ovchinnikov, O.S., Chang, H.J., Oxley, M.P., Yu, P., Seidel, J., Eliseev, E.A., Morozovska, A.N., Ramesh, R., Pennycook, S.J., Kalinin, S.V.: Mapping octahedral tilts and polarization across a domain wall in BiFeO3 from Z-contrast scanning transmission electron microscopy image atomic column shape analysis. ACS Nano 4, 6071–6079 (2010)
- Kim, Y.M., Kumar, A., Hatt, A., Morozovska, A.N., Tselev, A., Biegalski, M.D., Ivanov, I., Eliseev, E.A., Pennycook, S.J., Rondinelli, J.M., Kalinin, S.V., Borisevich, A.Y.: Interplay of octahedral tilts and polar order in BiFeO3 films. Adv. Mater. 25, 2497–2504 (2013)
- Kumar, A., Leonard, D., Jesse, S., Ciucci, F., Eliseev, E.A., Morozovska, A.N., Biegalski, M.D., Christen, H.M., Tselev, A., Mutoro, E., Crumlin, E.J., Morgan, D., Shao-Horn, Y., Borisevich, A., Kalinin, S.V.: Spatially resolved mapping of oxygen reduction/evolution reaction on solid-oxide fuel cell cathodes with sub-10 nm resolution. ACS Nano 7, 3808–3814 (2013)
- Lin, W.Z., Li, Q., Belianinov, A., Sales, B.C., Sefat, A., Gai, Z., Baddorf, A.P., Pan, M.H., Jesse, S., Kalinin, S.V.: Local crystallography analysis for atomically resolved scanning tunneling microscopy images. Nanotechnology 24, 415707 (2013)
- He, Q., Woo, J., Belianinov, A., Guliants, V.V., Borisevich, A.Y.: Better catalysts through microscopy: mesoscale M1/M2 intergrowth in molybdenum-vanadium based complex oxide catalysts for propane ammoxidation. ACS Nano 9, 3470–3478 (2015)
- He, Q., Ishikawa, R., Lupini, A.R., Qiao, L., Moon, E.J., Ovchinnikov, O., May, S.J., Biegalski, M.D., Borisevich, A.Y.: Towards 3D mapping of BO6 octahedron rotations at perovskite heterointerfaces, unit cell by unit cell. ACS Nano 9, 8412–8419 (2015)
- Gai, Z., Lin, W.Z., Burton, J.D., Fuchigami, K., Snijders, P.C., Ward, T.Z., Tsymbal, E.Y., Shen, J., Jesse, S., Kalinin, S.V., Baddorf, A.P.: Chemically induced Jahn–Teller ordering on manganite surfaces. Nat. Commun. 5, 4528 (2014)
- Vasudevan, R.K., Ziatdinov, M., Jesse, S., Kalinin, S.V.: Phases and interfaces from real space atomically resolved data: physics-based deep data image analysis. Nano Lett. 16, 5574–5581 (2016)
- Voyles, P.M.: Informatics and data science in materials microscopy. Curr. Opin. Solid State Mater. Sci. 21, 141–158 (2017)
- Buades, A., Coll, B., Morel, J.-M.: A non-local algorithm for image denoising. IEEE 2, 60–65 (2005)
- Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising with block-matching and 3D filtering. IEEE 6064, 606414 (2006)