Mechanical assessment of defects in welded joints: morphological classification and data augmentation


We develop a methodology for classifying defects based on their morphology and induced mechanical response. The proposed approach is fairly general and relies on morphological operators (Angulo and Meyer in 9th international symposium on mathematical morphology and its applications to signal and image processing, pp. 226-237, 2009) and spherical harmonic decomposition as a way to characterize the geometry of the pores, and on the Grassmann distance evaluated on FFT-based computations (Willot in C. R., Méc. 343(3):232–245, 2015) for the predicted elastic response. We implement and detail our approach on a set of trapped gas pores observed in X-ray tomography of welded joints, which significantly alter the mechanical reliability of these materials (Lacourt et al. in Int. J. Numer. Methods Eng. 121(11):2581–2599, 2020). The space of morphological and mechanical responses is first partitioned into clusters using the “k-medoids” criterion and associated distance functions. Second, we use multiple-layer perceptron neural networks to associate a defect and corresponding morphological representation to its mechanical response. It is found that the method provides accurate mechanical predictions if the training data contains a sufficient number of defects representing each mechanical class. To do so, we supplement the original set of defects by data augmentation techniques. Artificially-generated pore shapes are obtained using the spherical harmonic decomposition and a singular value decomposition performed on the pores' signed distance transform. We discuss possible applications of the present method, and how medoids and their associated mechanical response may be used to provide a natural basis for reduced-order models and hyper-reduction techniques, in which the mechanical effects of defects and structures are decorrelated (Ryckelynck et al. in C. R., Méc. 348(10–11):911–935, 2020).

1 Introduction

Our ability to design and produce materials with desired properties has dramatically improved, and commonly integrates sensors, control, simulation data and computerized predictions (see reviews in [4–6]). These techniques combine material imaging [7] and digital-twins frameworks [8] for manufacturing and evaluating material properties. In mechanics, numerous applications concern control, defects and anomaly detection [9] or fatigue-design of materials [10, 11]. Machine learning algorithms, notably, have been proposed in the aeronautical field [12], fabrication process [13] or pipelines applications [14], and have been combined with numerical computations to study defects in ball bearings [15]. Simulation-driven machine learning methods based on existing mechanical models are especially attractive as they avoid the explicit parametrization of defects being modeled. Assessing and certifying the mechanical properties of structures containing defects nevertheless requires advanced micro-mechanical models, as well as non-destructive imaging techniques such as X-ray tomography [16] or ultrasonic measurements [17].

This is due to the recognition that composite (or, for that matter, porous) microstructures often exhibit widely-varying effective responses, as demonstrated by homogenization theories [18, 19] and in optimal-design problems [20]. A broad range of mechanical properties may be achieved by tailoring the inner geometrical arrangement of microstructures, as surveyed in e.g. [21–23]. Aside from a few rigorous results obtained for particular geometries, e.g. the Eshelby [24] or Vigdergauz [25] inclusions, the effect of the shapes of pores on the overall mechanical response is difficult to quantify even for linearly-elastic media, and usually involves sophisticated mathematical tools. In plane strain, the presence of corners [26], up to the limiting case of a crack tip [27], bottlenecks [28, 29], and high aspect ratios are known to be mechanically-determining factors, as highlighted by studies based on conformal mappings [30] or Radon transform techniques [31].

Although these rigorous and (semi-)analytical results are useful as guides, they are restricted to linear media under plane strain or stress, with notable exceptions [32, 33]. They do not allow one to explore the links between morphology and mechanical response, which are important in industrial problems. The latter often involve inverse-design problems within a given class of microstructures or morphologies, which results from manufacturing constraints [34, 35]. In energetic granular materials, for instance, particle shape and size depend on crystallography and may be controlled by surface treatment, to some extent [36]. Furthermore, the overall material response alone, characterized by an effective stiffness tensor, is insufficient for determining the full mechanical response. The local response, sensitive to the internal microstructure arrangement of a given material system, must be accounted for. Damage localization, which leads to brittle or ductile fracture in composite materials, is driven by the local stress state in the microstructure, which is itself a complex result of the load distribution within the material.

Lately, shape statistics based on morphological operators have been devised [1, 37] and so-called shape spaces [38] have gained traction as a versatile method for quantifying shapes, seen as points in a high-dimensional metric space representing Fourier-based expansions. On a sphere, the Laplace–Beltrami eigenfunctions have an explicit form in terms of spherical harmonics [39, 40], which can then be used to represent continuous shapes [41], seen as deformations of the sphere, or as mappings between a sphere and an arbitrary shape. This decomposition is especially useful for modeling data on a regular grid (i.e. on images), in computer graphics [42], medical image analysis [43] or material science [44]. In the context of mechanics, sophisticated image analysis approaches based on machine-learning methods have already been employed to detect, and more generally classify, “critical” defects as exemplified in several industrial problems [13, 14]. Other approaches have sought to infer the mechanical response of materials using temperature fields [45]. These methods can be supplemented by transfer learning [46] and shape exploration techniques to determine mechanically-relevant criteria for assessing the effect or criticality of defects.

The purpose of the present work is twofold. First, a methodology is developed to explore and classify defects based on shapes and mechanical response, using morphological transforms, spherical harmonics, and the Grassmann distance. To illustrate our approach, a set of defects previously observed in tomography images of welded joints is analyzed to serve as a model problem. Second, use is made of machine learning methods to compare and correlate the resulting classifications. The adequacy of the method for detecting critical defects, and other possible applications, are discussed, as well as future work, such as exploring the dependency of our results on the chosen metrics.

The present article is organized as follows. Section 2 presents the set of defects used as the basis for the present study, whereas Sect. 3 deals with the various distances used for clustering, including full-field mechanical computations. Our main results, which concern the mechanically-based clustering of shapes, are given in Sect. 4. These results are compared to those obtained after data augmentation of the initial set of defects in Sect. 5. We conclude in Sect. 6.

2 Data set of defects and goal of the present work

The present work is based on a data set of defects obtained in L. Lacourt’s PhD thesis [47]. These defects have been extracted from a segmented X-ray tomography image of welded joints, see [2]. The data set consists of 1288 defects in total, each containing between 500 and \(100\text{,}000\) voxels. Smaller defects present in the original image have been discarded in the present study. Slightly more than half of the defects are close to spheres, whereas the rest of them display various convex and non-convex shapes (Fig. 1), see [48]. After segmentation, each defect is embedded in a bounding box in 3D, with edges aligned with the axes (\(\mathrm{e}_{1}\), \(\mathrm{e}_{2}\), \(\mathrm{e}_{3}\)) of a Cartesian coordinate system. The shape has been rotated so that its first and second principal axes are aligned with \(\mathrm{e}_{1}\) and \(\mathrm{e}_{2}\). A reflection with respect to the plane (\(\mathrm{e}_{1}\), \(\mathrm{e}_{2}\)) is carried out so that the highest absolute coordinate along \(\mathrm{e}_{3}\) is positive. Finally, a homothety is performed so that the dimension of the shape along axis \(\mathrm{e}_{1}\) is \(1/4\) that of the embedding box. For all shapes, the embedding box is a cube containing \(L^{3}=80^{3}\) voxels. Accordingly, the shapes have varying volume fractions, but the same diameter with respect to their bounding box. This is so that cracks or pores with very high aspect ratios can be discretized with similar resolution.
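The normalization steps above (rotation, reflection, homothety) can be sketched as follows. This is only an illustration under stated assumptions: the principal axes are taken here as the eigenvectors of the covariance of the voxel coordinates, the function name `normalize_shape` is hypothetical, and the sketch returns transformed point coordinates rather than a re-voxelized image.

```python
import numpy as np

def normalize_shape(mask, box=80):
    """Align a binary defect with its principal axes and rescale it (sketch)."""
    pts = np.argwhere(mask).astype(float)          # voxel coordinates, shape (n, 3)
    pts -= pts.mean(axis=0)                        # center on the barycenter
    # Assumed definition of principal axes: eigenvectors of the covariance
    # matrix, sorted by decreasing variance.
    eigval, eigvec = np.linalg.eigh(np.cov(pts.T))
    axes = eigvec[:, np.argsort(eigval)[::-1]]
    pts = pts @ axes                               # coordinates along e1, e2, e3
    # Reflection: make the largest absolute coordinate along e3 positive.
    k = np.argmax(np.abs(pts[:, 2]))
    if pts[k, 2] < 0:
        pts[:, 2] *= -1.0
    # Homothety: the extent along e1 becomes one quarter of the box edge.
    extent = pts[:, 0].max() - pts[:, 0].min()
    return pts * (box / 4.0) / extent
```

With `box=80`, every normalized shape spans 20 voxels along \(\mathrm{e}_{1}\), whatever its original size.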

Figure 1
Views in three dimensions of three non-spherical defects, segmented from the tomography image of welded joints

The effects of such defects on the mechanical response of a structure can be efficiently estimated using the two-scale hyper-reduction method proposed in [2] for fatigue. In this method, schematized in Fig. 2, the effects of the overall structure and of the defects are dissociated, whereas interactions between the two are taken into account by the far-field [49, 50]. In practice, a reduced basis is computed for the structure without defect and another one for the defect. A “global reduced basis” is then computed by transferring both reduced bases onto the real mesh containing the defect and concatenating them. Using this reduced basis, a “hyper-reduced” simulation is performed on a reduced domain of integration (Fig. 2, orange and grey regions). As such, numerical computations are carried out on a sound structure without pores, and on isolated defects, rather than on the entire structure containing defects. Generally speaking, the method is most efficient when dealing with complex and time-consuming constitutive laws. Speed-ups as high as \(10^{2}\) and \(10^{3}\) have been obtained in fatigue for elastoplastic behavior, and about 10 in linear-elastic cases [3].

Figure 2
Schematic view of the hyper-reduction method. The structure and defects are treated separately. Orange: fusion zone; grey: base metal. Areas in white are not meshed. In the classical method [2], numerical computations are performed for each new defect (rectangle a). In the proposed approach, the pre-computed mechanical response of a nearest defect is used (rectangle b)

While the hyper-reduced method improves on standard finite element techniques, computing the reduced basis of defects can be time-consuming. This point needs to be addressed in industrial applications where the effect of defects must be quantified in near real-time. Often, the pore shape is random, but follows a certain probability distribution that needs to be estimated. The mechanical responses of shapes close to one another need not be computed twice, in general. However, as noted in Sect. 1, while different shapes may yield similar mechanical responses, small differences such as the presence of corners could induce different mechanical responses. The goal of this work is to investigate whether one may pick an appropriate reduced basis for a defect by learning the mechanical responses of a set of other defects, and how they relate to their shapes. To do this, the mechanical computations for the fields around a defect (rectangle a, Fig. 2) are replaced by statistical learning, making use of pre-computed mechanical fields used as training data (rectangle b, Fig. 2). The full scheme in Fig. 2 will not be implemented in the present work. Instead we focus on the task in rectangle (b) of the same figure, and consider linear elasticity as a proof of concept for our approach.

3 Mechanical and morphological distances

In the following, we make use of a Fourier-based scheme with rotated discrete Green operator [51] to carry out mechanical computations. The method uses periodic boundary conditions, relevant for quasi-isolated defects, and has been found to be efficient when compared to finite elements in terms of memory usage, accuracy and CPU time [52]. For each defect, six FFT computations with prescribed overall strain ε are carried out, corresponding to the six independent strain loadings, in our case \(E_{i}=\langle \varepsilon _{i}\rangle \), (\(i=xx\), xy, xz, yy, yz, zz) with \(E_{j}=0\) (\(j\neq i\)). Accordingly, the data consists of a fourth-order tensorial field, called the localization tensor in homogenization theories, which has both minor and major symmetry. Figure 3 shows as an example two strain components obtained under uniaxial extension. The fluctuation of the strain field inside the pore depends on the choice of the Green operator and has no physical meaning, except for the mean of the strain in the pore. Accordingly, the strain field inside the pore is replaced by its mean in all mechanical computations.
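As an illustration of the post-processing described above, the sketch below replaces the strain inside the pore by its mean and stacks the six computed fields into a tall matrix with one column per loading. The array layout (loading, component, then the three voxel indices) and the function name `response_matrix` are assumptions made for this example; the FFT solver itself is not reproduced.

```python
import numpy as np

def response_matrix(strain, pore):
    """strain: (6 loadings, 6 components, L, L, L); pore: boolean (L, L, L)."""
    fields = strain.copy()
    # Replace the strain inside the pore by its mean, per loading and component.
    fields[:, :, pore] = fields[:, :, pore].mean(axis=-1, keepdims=True)
    n_load = fields.shape[0]
    # One column per loading, stacking the 6 components over all voxels.
    return fields.reshape(n_load, -1).T            # shape (6 L^3, 6)
```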

Figure 3
2D cut of two longitudinal and shear strain components \(\varepsilon _{yy}\) (a) and \(\varepsilon _{yz}\) (b) for the middle defect in Fig. 1, with axis \(\text{e}_{y}\) and \(\text{e}_{z}\) vertical and normal to the figure. Macroscopic strain loading: \(E_{yy}=\langle \varepsilon _{yy}\rangle =1\)% (color scale in percent)

In the rest of this study, use is made of the Grassmann distance [53, 54], schematized in Fig. 4, for evaluating the dissimilarity between mechanical responses. Consider two matrices \(V_{1}\), \(V_{2}\in \mathbb{R}^{\mathcal{N}\times N}\) representing the full-field mechanical response of two defects, where \(N=6\) is the number of applied macroscopic loadings and \(\mathcal{N}=6L^{3}\) is the number of strain components in all voxels for a given loading. The Grassmann distance between \(V_{1}\) and \(V_{2}\) is given by:

$$ d^{g}(V_{1}, V_{2})= \Vert \Theta \Vert _{\mathcal{F}}=\sqrt{\sum_{i}\theta _{i}^{2}}, $$

where \(\|\cdot \|_{\mathcal{F}}\) is the Frobenius norm and Θ is a diagonal matrix with diagonal entries \(\theta _{i}\), the principal angles between the two subspaces, obtained from the singular value decomposition:

$$ V_{1}^{t}\cdot V_{2}=W_{1} \cdot \cos (\Theta )\cdot W_{2}^{t},\qquad W_{1}^{t} \cdot W_{1}=W_{2}^{t}\cdot W_{2}=I, $$

where I is the identity and \(W_{1}\), \(W_{2}\in \mathbb{R}^{N\times N}\) are orthogonal matrices. Distance (1) measures the dissimilarity between two defects by considering the subspaces (points on a Grassmann manifold) generated by the set of strain responses of the two defects to each loading [55]. The distance is appropriate for mechanical responses with the same number of applied loadings. In the more general case of time-varying loadings, with a different number of time steps for each defect, the Schubert distance [56] may be used instead.
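A minimal NumPy sketch of Eqs. (1) and (2) is given below. It assumes that the columns of \(V_{1}\) and \(V_{2}\) are first orthonormalized (here by a QR factorization, a step added for this example), so that the singular values of the product are the cosines of the principal angles; the function name is hypothetical.

```python
import numpy as np

def grassmann_distance(V1, V2):
    # Orthonormal bases of the column spaces (assumes full column rank).
    Q1, _ = np.linalg.qr(V1)
    Q2, _ = np.linalg.qr(V2)
    # Singular values of Q1^T Q2 are the cosines of the principal angles.
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    theta = np.arccos(np.clip(cosines, -1.0, 1.0))
    return np.sqrt(np.sum(theta**2))
```

Two matrices spanning the same subspace are at distance zero (up to round-off), whereas two mutually orthogonal subspaces of dimension N are at the maximal distance \(\sqrt{N}\,\pi /2\).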

Figure 4
Schematic representation of the Grassmann distance for the mechanical clustering

The Grassmann distance involves a singular value decomposition performed on matrix \(V_{1}^{t}\cdot V_{2}\) (see Eq. (2)) of size \(N\times N\). These computations become time-consuming when a large number of objects (more than 1200 here) must be compared to one another. To speed up the computation of the Grassmann distances, we define a subdomain Ω of size \((L/2)^{3}\), included in the bounding box, and containing all defects. We define two alternative pseudo-Grassmann distances, computed as in (1) with the data for \(V_{1}\) and \(V_{2}\) restricted to either Ω or its boundary ∂Ω. The computation of the pseudo-Grassmann distances on Ω and ∂Ω is much more efficient, as the subdomain Ω has a volume eight times smaller than that of the entire domain. Histograms for the (pseudo-)Grassmann distances between 400 defects are represented in Fig. 5(a). The distribution of distances for the pseudo-Grassmann distance computed using ∂Ω is strongly different from that of the Grassmann distance, indicating that the former cannot be substituted for the latter. However, this is not so for the pseudo-Grassmann distance computed on the entire subdomain Ω, which is close to the results obtained for the Grassmann distance, see Fig. 5(b). Accordingly, in the rest of this study, the Grassmann distance is evaluated on the subdomain Ω only.
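The restriction of the data to the subdomain or to its faces can be sketched as below. The subdomain is assumed here to be the cube of edge L/2 centered in the bounding box, and the rows of the response matrix are assumed to be ordered component by component; both are choices made for this example, and the helper names are hypothetical.

```python
import numpy as np

def subdomain_masks(L):
    """Boolean masks of the centered cube Omega of edge L/2 and of its faces."""
    lo, hi = L // 4, L // 4 + L // 2               # Omega = [lo, hi)^3
    omega = np.zeros((L, L, L), dtype=bool)
    omega[lo:hi, lo:hi, lo:hi] = True
    inner = np.zeros_like(omega)
    inner[lo + 1:hi - 1, lo + 1:hi - 1, lo + 1:hi - 1] = True
    return omega, omega & ~inner                   # (Omega, boundary of Omega)

def restrict(V, mask, n_comp=6):
    """Keep the rows of V (n_comp * L^3, N) whose voxel lies in the mask."""
    rows = np.tile(mask.ravel(), n_comp)           # component-major row ordering
    return V[rows]
```

The restricted matrices are then passed to the same distance computation as the full ones; only the number of rows changes.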

Figure 5
Histograms of the Grassmann distances for defects 1–400 (a) and corresponding point-cloud representation (b). The distance is estimated using the volume of the surrounding box Ω, the faces ∂Ω of the surrounding box or computed using the entire domain

The shape of defects is quantified by two means, a morphological and spectral decomposition. Consider first the morphological transform based on the signed distance function:

$$ f(x)= \textstyle\begin{cases} d(x,\partial \mathcal{P}) & \text{if } x \in \mathcal{P}^{c}, \\ -d(x,\partial \mathcal{P}) & \text{if } x \in \mathcal{P}, \end{cases} $$

for a pore \(\mathcal{P}\) with boundary \(\partial \mathcal{P}\) and complement \(\mathcal{P}^{c}\). This distance is obtained by propagating a distance function with quasi-Euclidean metric, and leads to spherical iso-lines far from the defect. The signed distance fields for two defects \(\mathcal{P}_{1}\), \(\mathcal{P}_{2}\) are vectorized into two arrays denoted \(F_{1}\) and \(F_{2}\in \mathbb{R}^{L^{3}}\), and we call “morphological distance” the distance:

$$ d^{m}(\mathcal{P}_{1}, \mathcal{P}_{2})= \Vert F_{1}-F_{2} \Vert _{2}, $$

where \(\|\cdot \|_{2}\) is the Euclidean norm.
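Equations (3) and (4) can be illustrated with SciPy's exact Euclidean distance transform. Note that this swaps in `scipy.ndimage.distance_transform_edt` for the quasi-Euclidean propagation used in the text, and that distances are measured between voxel centers, so values may differ slightly from those of the original implementation. The function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(pore):
    """Positive outside the pore, negative inside (voxel units)."""
    outside = distance_transform_edt(~pore)        # matrix voxels: distance to the pore
    inside = distance_transform_edt(pore)          # pore voxels: distance to the matrix
    return outside - inside

def morphological_distance(pore1, pore2):
    """Euclidean norm of the difference of the vectorized signed distance fields."""
    f1 = signed_distance(pore1).ravel()
    f2 = signed_distance(pore2).ravel()
    return np.linalg.norm(f1 - f2)
```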

We also define a distance based on the spectral decomposition for the Laplace–Beltrami expansion [44, 57]. This expansion can be conveniently written in terms of spherical harmonics in the case of the sphere [58]. The latter form a basis for square-integrable functions on the unit sphere and this decomposition can accordingly be used to characterize star-shaped defects. We briefly recall how this spectral decomposition is estimated on digital images (the reader is referred to [58] for a detailed discussion). The decomposition reads, in spherical coordinates (θ, ϕ):

$$ x_{k}(\theta ,\phi )= \sum _{\ell \geq 0,\ |m|\leq \ell } c_{k\ell }^{m}Y_{\ell }^{m}( \theta ,\phi ),\quad Y_{\ell }^{m}(\theta , \phi )=\sqrt{ \frac{(2\ell +1)(\ell -m)!}{4\pi (\ell +m)!}}P_{\ell }^{m}\bigl(\cos \theta \bigr) \mathrm{e}^{im\phi }, $$

where \(P_{\ell }^{m}\) are the associated Legendre polynomials and \(x_{k}\) are the coordinates (\(k=1\), 2, 3) of points along the surface of the defect. In practice, a set of \(25\times 25\) points is picked on the surface of the object, distributed uniformly along all directions from the center, providing values for the \(x_{k}(\theta ,\phi )\). The center is the minimum of the signed distance function. The double sum in (5) is truncated to \(|m|\leq \ell \leq \ell _{\max }=10\) and a least-squares optimization procedure is used to determine the coefficients \(c_{k\ell }^{m}\). The latter are used to define the distance:

$$ d^{sh}(\mathcal{P}_{1},\mathcal{P}_{2})= \sqrt{\sum_{k,\ell ,m} \bigl\Vert c^{m}_{k\ell }( \mathcal{P}_{1})-c^{m}_{k\ell }( \mathcal{P}_{2}) \bigr\Vert ^{2}}. $$

Conversely, Eq. (5) can be used to reconstruct a shape, for a given set of values \(c^{m}_{k\ell }\). Two shapes and their associated reconstructions are shown in Fig. 6. The differences between the two are a consequence of the truncation of the spectral decomposition, and of the way interpolation points on the surface are chosen, i.e. uniformly distributed along all directions on the sphere rather than uniformly distributed on the surface of the object. This reconstruction is imperfect and only captures some of the features of each shape.
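The least-squares fit of the coefficients and the distance (6) can be sketched as follows. The sketch uses the standard normalization of the spherical harmonics and builds them from `scipy.special.lpmv`; the sampling of directions on a regular (θ, ϕ) grid and all function names are assumptions of this example, not necessarily the choices made in [58].

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def sph_harm_basis(theta, phi, l_max):
    """Columns Y_l^m(theta, phi), 0 <= l <= l_max, -l <= m <= l (theta: polar angle)."""
    cols = []
    for l in range(l_max + 1):
        for m in range(-l, l + 1):
            am = abs(m)
            norm = np.sqrt((2 * l + 1) * factorial(l - am)
                           / (4 * np.pi * factorial(l + am)))
            y = norm * lpmv(am, l, np.cos(theta)) * np.exp(1j * am * phi)
            # Negative orders follow from Y_l^{-m} = (-1)^m conj(Y_l^m).
            cols.append(y if m >= 0 else (-1) ** am * np.conj(y))
    return np.stack(cols, axis=1)

def fit_coefficients(points, theta, phi, l_max=10):
    """Least-squares coefficients for the three coordinates x_k (columns of points)."""
    Y = sph_harm_basis(theta, phi, l_max)
    coeffs, *_ = np.linalg.lstsq(Y, points.astype(complex), rcond=None)
    return coeffs                                  # shape ((l_max + 1)^2, 3)

def sh_distance(c1, c2):
    """Distance (6) between two sets of coefficients."""
    return np.sqrt(np.sum(np.abs(c1 - c2) ** 2))
```

A shape is reconstructed from its coefficients as `(sph_harm_basis(theta, phi, l_max) @ coeffs).real`; for a sphere, only the \(\ell =1\) coefficients are non-zero.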

Figure 6
Two shapes and their reconstruction with spherical harmonics (right). (a) A cube. (b) The middle defect in Fig. 1

4 Clustering analysis

In this section, we consider the k-medoids clustering algorithm, which provides us with a set of classes as well as a most-central point (the “medoid”) in each class, that is present in the data set. The classification algorithm, which minimizes distances to the medoids, is based on the matrix of distances between points, and does not require the coordinates of each point [59]. Additionally, since the medoid is present in the data set, its pre-computed reduced basis can be used in hyper-reduced methods for taking into account defects that belong to a known mechanical class.
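To make the role of the distance matrix concrete, here is a minimal alternating k-medoids iteration that only uses pairwise distances. It is a simple variant written for illustration and is not necessarily the algorithm of [59]; the random initialization and the function name are assumptions.

```python
import numpy as np

def k_medoids(D, k, n_iter=100, seed=0):
    """Alternating k-medoids on a precomputed (n, n) distance matrix D."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # nearest medoid for each point
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:                       # keep the old medoid if a cluster empties
                # New medoid: member minimizing the summed distance to its cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    cost = D[np.arange(D.shape[0]), medoids[labels]].sum()
    return medoids, labels, cost
```

The returned `cost` is the sum of the distances of each point to its medoid, so running the function for several values of `k` gives an intra-cluster distance curve.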

We split the data set into two groups, a training set of 936 defects and a testing set containing 508 defects. The training set corresponds to the data collected on two-thirds of the welded joint and has been obtained on two tomography images. The test data corresponds to the rest of the welded joint, and has been obtained by a third tomography. Accordingly, the data in the two sets are not randomly drawn from a collection of defects, but instead are obtained from different sources, as would be expected in industrial applications. Our results are shown in Fig. 7 for the Grassmann distance as well as the two shape distances. The point-cloud representation in three dimensions is obtained by multidimensional scaling [60]. Partly-overlapping clusters in this representation may actually be separated when additional dimensions are considered. In this view with reduced dimension, the data form a continuous cloud of points. The medoids (right) exhibit spherical, oblate and non-convex shapes. The amount of information contained in the multidimensional scaling is plotted as a function of the dimension in Fig. 8(a) in the case of the Grassmann distance, showing a strong decrease of the amount of unknown information up to \(d\approx 8\) and a slower decrease after that. The effect of the number of clusters is shown in Fig. 8(b), which represents the intra-cluster distance, i.e. the sum of the distances of each shape to its medoid. This distance decreases with the number of clusters. The “typical” number of clusters corresponding to this decrease is about 5, at which point the curve displays an elbow.
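A point-cloud embedding and an "amount of information" curve can both be derived from a distance matrix. The sketch below uses classical (Torgerson) multidimensional scaling, which is an assumption: the text does not specify which variant of [60] is used, nor how the recovered information is measured.

```python
import numpy as np

def classical_mds(D, dim=3):
    """Embed n points in `dim` dimensions from an (n, n) distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                    # Gram matrix of the centered points
    eigval, eigvec = np.linalg.eigh(B)
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    pos = np.clip(eigval, 0.0, None)               # negative eigenvalues: non-Euclidean part
    coords = eigvec[:, :dim] * np.sqrt(pos[:dim])
    explained = np.cumsum(pos) / pos.sum()         # share of information kept vs. dimension
    return coords, explained
```

Here `explained[d - 1]` is one possible measure of the information recovered with d dimensions.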

Figure 7
Clustering provided by the k-medoids algorithm using the Grassmann, morphological and spherical harmonics distances. Medoids on right

Figure 8
(a) Amount of information recovered by the multi-dimensional scaling vs. number of dimensions, for the Grassmann-based distance (see Fig. 7). (b) Intra-cluster distance vs. number of clusters

Shape clustering as determined by the k-medoids analysis cannot be used directly to assign a defect to its mechanical cluster, as shown in Fig. 9. This figure represents the confusion matrix that summarizes the number of shapes that belong to a given mechanical cluster and to a cluster based on either the morphological or spherical harmonics distance. Cluster labels are the same as in Fig. 7. The color scale indicates a concentration of shapes from a geometrical cluster into a specific mechanical cluster. Assigning a mechanical cluster to a shape based on its morphological or spherical-harmonics cluster would result in 74% and 87% erroneous labeling, respectively. Instead, we consider a classifier based on a dense neural network (Fig. 10). The inputs to the network are the distances to the medoids based on the morphological distance. The network is trained to predict the label of the cluster corresponding to the Grassmann distance. It contains three hidden layers of 15 neurons each and is optimized on the log-loss function:

$$ \mathcal{L}=-\sum_{i=1}^{M}y^{j}_{i} \log \bigl(p^{j}_{i}\bigr), $$

where M is the number of classes, \(y^{j}_{i}\) is a binary indicator equal to 1 if class label i is the correct classification for observation j, and 0 otherwise, and \(p^{j}_{i}\) is the predicted probability of the observation j being of class i. The activation function is a rectified linear function. The training data is split into two different sets: a standard training set to fit parameters, representing 90% of the initial set, and a validation set representing the remaining 10% of the initial set of data. The validation set provides a stop criterion. Loss and accuracy curves, computed using (7), are shown in Fig. 11.
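The network architecture and the loss (7) can be illustrated by a forward pass written in NumPy. The softmax output layer, the weight initialization and the averaging of the loss over observations are assumptions of this sketch, and training (back-propagation, early stopping on the validation set) is deliberately left out.

```python
import numpy as np

def init_mlp(n_in, n_classes, hidden=(15, 15, 15), seed=0):
    """Random weights for a dense network with three hidden layers of 15 neurons."""
    rng = np.random.default_rng(seed)
    sizes = (n_in, *hidden, n_classes)
    return [(rng.normal(0.0, np.sqrt(2.0 / a), (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def predict_proba(params, x):
    """Class probabilities for inputs x of shape (n_obs, n_in)."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)             # rectified linear activation
    W, b = params[-1]
    z = h @ W + b
    z -= z.max(axis=1, keepdims=True)              # numerically stable softmax (assumed output)
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def log_loss(p, labels):
    """Eq. (7) averaged over observations; labels are integer class indices."""
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))
```

In this setting each row of `x` would hold the distances of one defect to the medoids, and `labels` the index of its Grassmann-based cluster.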

Figure 9
Confusion matrix showing the number of shapes in Grassmann-based clusters with respect to clusters based on the morphological distance (left) and the spherical harmonics distance (right)

Figure 10
Classifier methodology

Figure 11
Loss function and accuracy vs. number of epochs for the training of the neural network, with (b, d) and without (a, c) data augmentation, and using the morphological (a, b) and spherical harmonics (c, d) distances

Figure 12 shows the confusion matrix representing the predicted and true labels, which summarizes the assignment by the network of a mechanical cluster to the various shapes, either in the training or test sets. The percentage of misclassified shapes by cluster is given in the right column. To quantify these results, we define an error on the training set \(\mathrm{e}_{\mathrm{tr}}=M_{\mathrm{tr}}/N_{ \mathrm{tr}}\) as the ratio of the number of incorrect label predictions \(M_{\mathrm{tr}}\) to the total number of predictions \(N_{\mathrm{tr}}\) in the training set. We consider likewise a similarly-defined error \(\mathrm{e}_{\mathrm{te}}\) for the testing set. We also introduce a second error criterion \(\mathrm{e}'_{\mathrm{tr}}\), equal to the mean of the proportion of misclassified shapes in each (non-empty) mechanical cluster for the training data, and likewise \(\mathrm{e}'_{\mathrm{te}}\) for the testing set. These various errors highlight sub-optimal performances of \(\mathrm{e}_{\mathrm{tr}}=13\%\), \(\mathrm{e}'_{\mathrm{tr}}=10\%\) for the training data, as well as \(\mathrm{e}_{\mathrm{te}}=18\%\) and \(\mathrm{e}'_{\mathrm{te}}=26\%\) for the testing set. Higher errors \(\mathrm{e}_{\mathrm{tr}}=16\%\), \(\mathrm{e}'_{\mathrm{tr}}=19\%\), \(\mathrm{e}_{\mathrm{te}}=29\%\) and \(\mathrm{e}'_{\mathrm{te}}=34\%\) are observed when using spherical harmonics instead of the morphological distance (Fig. 12, rows c and d). These results may be attributed to the small number of defects in some classes, as will be investigated in the next section.
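Both error criteria can be read off a confusion matrix, as in the sketch below. It assumes that rows index the true (Grassmann-based) cluster and columns the predicted one, and that the errors count misclassified shapes; the function name is hypothetical.

```python
import numpy as np

def classification_errors(confusion):
    """confusion[i, j]: number of shapes of true cluster i predicted as cluster j."""
    confusion = np.asarray(confusion, dtype=float)
    per_class_total = confusion.sum(axis=1)
    wrong = per_class_total - np.diag(confusion)
    e = wrong.sum() / confusion.sum()              # error averaged over all shapes
    nonempty = per_class_total > 0
    # Error averaged over the non-empty clusters.
    e_prime = np.mean(wrong[nonempty] / per_class_total[nonempty])
    return e, e_prime
```

For the toy matrix `[[90, 10], [5, 5]]`, the first error is 15/110 while the second is 0.3: a small, poorly predicted cluster weighs much more in the cluster-averaged criterion than in the shape-averaged one.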

Figure 12
Left column: confusion matrices for the training (rows a, c) and testing sets (rows b, d) between Grassmann clustering (true label) and the label predicted by the dense neural network using the morphological, i.e. signed-distance-based (rows a, b), and spherical harmonics (rows c, d) distances. Right: percentage of misclassified shapes in each cluster

5 Data augmentation

To improve on the results presented in Sect. 4, we focus on data augmentation. Both the morphological and spherical harmonics distances are defined as Euclidean distances of vectors in multi-dimensional spaces. As explored in [1], these types of representations can be used for data interpolation as well. Let us consider the linear interpolations (\(0\leq s\leq 1\)):

$$ F(s)=s F(\mathcal{P}_{1})+(1-s)F( \mathcal{P}_{2}),\qquad c_{k\ell }^{m}(s)=s c_{k\ell }^{m}(\mathcal{P}_{1})+(1-s)c_{k\ell }^{m}( \mathcal{P}_{2}), $$

with respect to two shapes \(\mathcal{P}_{1}\) and \(\mathcal{P}_{2}\), where F and \(c_{k\ell }^{m}\) are defined in Eqs. (3)–(4) and (5)–(6), respectively. The vectors \(F(s)\) and \(c_{k\ell }^{m}(s)\) provide continuous interpolations between the two shapes, as illustrated in Fig. 13. An alternative approach consists in using a singular value decomposition on the matrix containing as columns the spherical harmonics decomposition \(c_{k\ell }^{m}(\mathcal{P}_{i})\) for all defects \(\mathcal{P}_{i}\). Considering as an example the first three singular values, a new shape may be represented as a point in a three-dimensional space. By paving this space with a set of points, one generates new defects that interpolate between the shapes corresponding to the three singular values. The set of points representing shapes in the coordinate system corresponding to the first three singular values is represented in Fig. 14(a). Figure 14(b) shows random shapes generated in this space.
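Both augmentation strategies can be sketched in a few lines. The interpolation follows Eq. (8) directly. For the SVD-based generation, the mean-centering of the coefficient matrix and the uniform sampling inside the bounding box of the data are hypothetical choices made for this example, and the descriptors are assumed real-valued (e.g. real and imaginary parts of the coefficients stacked in one vector).

```python
import numpy as np

def interpolate(v1, v2, s):
    """Eq. (8): linear interpolation between two shape descriptors, 0 <= s <= 1."""
    return s * v1 + (1.0 - s) * v2

def svd_generate(C, n_new, n_modes=3, seed=0):
    """Draw new descriptors in the span of the leading singular vectors.

    C has one column per defect (stacked, real-valued coefficients).
    """
    rng = np.random.default_rng(seed)
    mean = C.mean(axis=1, keepdims=True)           # assumed centering step
    U, S, Vt = np.linalg.svd(C - mean, full_matrices=False)
    coords = S[:n_modes, None] * Vt[:n_modes]      # coordinates of the defects, (n_modes, n)
    lo, hi = coords.min(axis=1), coords.max(axis=1)
    # Hypothetical sampling rule: uniform draws inside the bounding box of the data.
    new_coords = rng.uniform(lo, hi, size=(n_new, n_modes)).T
    return mean + U[:, :n_modes] @ new_coords      # one new descriptor per column
```

Each generated column can be turned back into a shape with the reconstruction formula (5).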

Figure 13
Interpolation between two shapes. (a) Morphological distance. (b) Spherical harmonics distance

Figure 14
(a) Shapes in the three first singular components space. (b) Examples of shapes in the three first singular components space

We now generate 3128 artificial shapes with the above data augmentation techniques. The linear interpolation method in Eq. (8) is used preferentially on sets of shapes that belong to mechanical clusters with few shapes. We then classify the shapes according to the k-medoids method, as described in Sect. 4. Results corresponding to the mechanical, morphological and spherical harmonics clustering are shown in Fig. 15. The point clouds are much denser and more homogeneous than those obtained without data augmentation (Fig. 7), suggesting that the latent space is better represented. Despite this, mechanical clusters cannot be predicted using either the morphological or spherical harmonics clustering (Fig. 16): their respective errors read \(e_{\mathrm{te}}=77\)% and \(e_{\mathrm{te}}=67\)%.

Figure 15
Clustering provided by the k-medoids algorithm using the Grassmann, morphological and spherical harmonics distances, after data augmentation. Medoids on right

Figure 16
Confusion matrices showing the number of shapes in the clustering based on the Grassmann distance and that based on either the morphological (a) or spherical harmonics (b) distance, after data augmentation

Again, use is made of a classifier based on a dense neural network that is trained to predict the mechanical cluster using distances to the medoids. Figure 17 shows the confusion matrix obtained for the training and testing sets, when either the morphological or spherical harmonics distances are considered. The proportion of misclassified shapes by cluster is shown on the right. In the case of the morphological distance, the errors read \(e_{\mathrm{tr}}=6\)%, \(e'_{\mathrm{tr}}=5\)% for the training set and \(e_{\mathrm{te}}=6\)%, \(e'_{\mathrm{te}}=19\)% for the testing set. When spherical harmonics are considered, errors are slightly higher. They read \(e_{\mathrm{tr}}=8\)%, \(e'_{\mathrm{tr}}=8\)% for the training data and \(e_{\mathrm{te}}=9\)%, \(e'_{\mathrm{te}}=27\)% for the testing set (see Table 1 for a summary of the various errors). In any case, these errors are significantly lower than those obtained without data augmentation (Fig. 12), highlighting the benefits of data augmentation. Furthermore, the errors of the neural network consist most often in predicting a label which is a neighbor of the correct mechanical cluster, with similar mechanical response. This materializes into a band-diagonal structure for the confusion matrices.

Figure 17
Left column: confusion matrices for the training (rows 1, 3) and testing sets (rows 2, 4) showing the true Grassmann clustering label vs. the label predicted by the neural network using the morphological (rows 1, 2) and spherical-harmonics distance (rows 3, 4). Right: percentage of misclassified shapes in each cluster after data augmentation

Table 1 Percentage of wrongly-assigned labels, averaged over all shapes, or over mechanical clusters, for the training data, with either the morphological or spherical harmonics distances, using data augmentation

6 Concluding remarks

In the present work, data-analysis and clustering methods have been proposed for classifying the mechanical properties of porous defects. Although the approach is restricted here to linear elasticity, it is fairly general and can be adapted to nonlinear or time-dependent mechanical responses. While we use conventional clustering and data-analysis tools, the methods rely on distances defined in the space of the defects' mechanical responses (i.e. 3D tensorial fields) and on geometrical distances based on a morphological transform and a spectral decomposition. Such distances allow us to explore a wide space of defects and to perform data augmentation without the need for an explicit parametrization of shapes. Our methodology is detailed on a set of defects observed in welded joints. It is found that reliable clustering results require a large number of shapes in each mechanical class. Furthermore, a simple neural network was able to link mechanical and geometrical clusters with satisfactory accuracy, within the space of defects close to those observed in welded joints. Nevertheless, the method applies to arbitrary shapes and may be extended to other types of defects. These results should be useful in particular for a refined two-scale hyper-reduction method, as outlined in the introduction, where the mechanical properties of defects may be selected on the fly, without solving balance equations.

Possible improvements and future work include hierarchical clustering, extension of the spherical harmonics decomposition to non-star-shaped defects, and data augmentation with shape extrapolation instead of interpolation. In particular, the spherical harmonics decomposition provides a natural basis for data augmentation as well as mechanical clustering.

Availability of data and materials

The data for the present work has presently not been made available.


  1. Angulo J, Meyer F. Morphological exploration of shape spaces. In: 9th international symposium on mathematical morphology and its applications to signal and image processing. Lecture notes in computer science. vol. 5720. Groningen: Springer; 2009. p. 226–37.

    Chapter  Google Scholar 

  2. Lacourt L, Ryckelynck D, Forest S, de Rancourt V, Flouriot S. Hyper-reduced direct numerical simulation of voids in welded joints via image-based modeling. Int J Numer Methods Eng. 2020;121(11):2581–99.

    Article  MathSciNet  Google Scholar 

  3. Ryckelynck D, Goessel T, Nguyen F. Mechanical dissimilarity of defects in welded joints via Grassmann manifold and machine learning. C R, Méc. 2020;348(10–11):911–35.

    Google Scholar 

  4. Osterrieder P, Budde L, Friedli T. The smart factory as a key construct of industry 4.0: a systematic literature review. Int J Prod Econ. 2020;221:107476.

    Article  Google Scholar 

  5. Kusiak A. Smart manufacturing. Int J Prod Res. 2018;56(1–2):508–17.

    Article  Google Scholar 

  6. Rüb J, Bahemia H. A review of the literature on smart factory implementation. In: 2019 IEEE international conference on engineering, technology and innovation (ICE/ITMC). 2019. p. 1–9.

    Google Scholar 

  7. Wang B, Zhong S, Lee T-L, Fancey KS, Mi J. Non-destructive testing and evaluation of composite materials/structures: a state-of-the-art review. Adv Mech Eng. 2020;12(4):1687814020913761.

    Article  Google Scholar 

  8. Wang J, Ye L, Gao R, Li C, Zhang L. Digital twin for rotating machinery fault diagnosis in smart manufacturing. Int J Prod Res. 2019;57(12):3920–34.

    Article  Google Scholar 

  9. Gunasegaram D, Murphy A, Matthews M, DebRoy T. The case for digital twins in metal additive manufacturing. J Phys, Mater. 2021;4(4):040401.

    Article  Google Scholar 

  10. Murakami Y. Material defects as the basis of fatigue design. Int J Fatigue. 2012;41:2–10.

    Article  Google Scholar 

  11. Murakami Y, Endo M. Effects of defects, inclusions and inhomogeneities on fatigue strength. Int J Fatigue. 1994;16(3):163–82.

    Article  Google Scholar 

  12. San Biagio M, Beltran-Gonzalez C, Giunta S, Del Bue A, Murino V. Automatic inspection of aeronautic components. Mach Vis Appl. 2017;28:1–15.

    Google Scholar 

  13. Escobar C, Morales Menendez R. Machine learning techniques for quality control in high conformance manufacturing environment. Adv Mech Eng. 2018;10:168781401875551.

    Article  Google Scholar 

  14. Layouni M, Hamdi M, Tahar S. Detection and sizing of metal loss defects in oil and gas pipelines using pattern-adapted wavelets and machine learning. Appl Soft Comput. 2017;52:247–61.

    Article  Google Scholar 

  15. Sobie C, Freitas C, Nicolai M. Simulation driven machine learning: bearing fault classification. Mech Syst Signal Process. 2018;99:403–19.

    Article  Google Scholar 

  16. Dinda S, Warnett J, Williams M, Roy G, Srirangam P. 3D imaging and quantification of porosity in electron beam welded dissimilar steel to Fe–Al alloy joints by X-ray tomography. Mater Des. 2016;96:224–31.

    Article  Google Scholar 

  17. Lin S, Shams S, Choi H, Azari H. Ultrasonic imaging of multi-layer concrete structures. NDT E Int. 2018;98:101–9.

    Article  Google Scholar 

  18. Milton GW. The theory of composites. Cambridge: Cambridge University Press; 2003.

    Google Scholar 

  19. Milton GW. Some open problems in the theory of composites. Philos Trans R Soc Lond A. 2021;379(2201):20200115.

    MathSciNet  Google Scholar 

  20. Allaire G, Bonnetier E, Francfort G, Jouve F. Shape optimization by the homogenization method. Numer Math. 1997;76(1):27–68.

    Article  MathSciNet  MATH  Google Scholar 

  21. Jikov VV, Kozlov SM, Oleinik OA. Homogenization of differential operators and integral functionals. Berlin: Springer; 2012.

    Google Scholar 

  22. Torquato S. Random heterogeneous materials: microstructure and macroscopic properties. vol. 16. New York: Springer; 2013.

    MATH  Google Scholar 

  23. Tartar L. The general theory of homogenization: a personalized introduction. vol. 7. Berlin: Springer; 2009.

    MATH  Google Scholar 

  24. Liu L. Solutions to the Eshelby conjectures. Proc R Soc A, Math Phys Eng Sci. 2008;464(2091):573–94.

    MathSciNet  MATH  Google Scholar 

  25. Grabovsky Y, Kohn RV. Microstructures minimizing the energy of a two phase elastic composite in two space dimensions. II: the Vigdergauz microstructure. J Mech Phys Solids. 1995;43(6):949–72.

    Article  MathSciNet  MATH  Google Scholar 

  26. Mantič V, Barroso A, París F. Singular elastic solutions in anisotropic multimaterial corners: applications to composites. In: Mantič V, editor. Mathematical methods and models in composites. London: Imperial College Press; 2014. p. 425–95.

    MATH  Google Scholar 

  27. Williams ML. On the stress distribution at the base of a stationary crack. J Appl Mech. 1957;24:109–14.

    Article  MathSciNet  MATH  Google Scholar 

  28. Moschovidis Z, Mura T. Two-ellipsoidal inhomogeneities by the equivalent inclusion method. J Appl Mech. 1975;42(4):847–52.

    Article  MATH  Google Scholar 

  29. Fond C, Riccardi A, Schirrer R, Montheillet F. Mechanical interaction between spherical inhomogeneities: an assessment of a method based on the equivalent inclusion. Eur J Mech A, Solids. 2001;20(1):59–75.

    Article  MATH  Google Scholar 

  30. Besson J. Effect of inclusion shape and volume fraction on the densification of particulate composites. Mech Mater. 1995;19(2–3):103–17.

    Article  Google Scholar 

  31. Franciosi P, Barboura S, Charles Y. Analytical mean green operators/eshelby tensors for patterns of coaxial finite long or flat cylinders in isotropic matrices. Int J Solids Struct. 2015;66:1–19.

    Article  Google Scholar 

  32. Rice J. A path independent integral and the approximate analysis of strain concentration by notches and cracks. J Appl Mech. 1968;35(2):379–86.

    Article  Google Scholar 

  33. Nádai A. Theory of flow and fracture of solids. vol. 2. New York: McGraw-Hill; 1963.

    Google Scholar 

  34. Wang H, Pietrasanta A, Jeulin D, Willot F, Faessel M, Sorbier L, Moreaud M. Modeling of mesoporous alumina microstructure by 3D random models of platelets. J Microsc. 2015;260(3):287–301.

    Article  Google Scholar 

  35. Abdallah B, Willot F, Jeulin D. Morphological modeling of three-phase microstructures of anode layers using sem images. J Microsc. 2016;263(1):51–63.

    Article  Google Scholar 

  36. Kaeshammer E, Borne L, Willot F, Dokládal P, Belon S. Morphological characterization and elastic response of a granular material. Comput Mater Sci. 2021;190:110247.

    Article  Google Scholar 

  37. Velasco-Forero S, Angulo J. Statistical shape modeling using morphological representations. In: 20th international conference on pattern recognition. New York: IEEE; 2010. p. 3537–40.

    Google Scholar 

  38. Kilian M, Mitra NJ, Pottmann H. Geometric modeling in shape space. ACM Trans Graph. 2007;26:64.

    Article  Google Scholar 

  39. Lévy B. Laplace–Beltrami eigenfunctions towards an algorithm that “understands” geometry. In: IEEE international conference on shape modeling and applications 2006 (SMI’06). 2006. p. 13.

    Chapter  Google Scholar 

  40. Jakobson D, Nadirashvili N, Toth J. Geometric properties of eigenfunctions. Russ Math Surv. 2001;56(6):1085.

    Article  MathSciNet  MATH  Google Scholar 

  41. Shen L, Farid H, McPeek M. Modeling three-dimensional morphological structures using spherical harmonics. Evolution, Int J Org Evolution 2009;63(4):1003–16.

    Article  Google Scholar 

  42. Zhou K, Bao H, Shi J. 3D surface filtering using spherical harmonics. Comput Aided Des. 2004;36(4):363–75.

    Article  Google Scholar 

  43. Gerig G, Styner M, Shenton M, Lieberman J. Shape versus size: improved understanding of the morphology of brain structures. In: International conference on medical image computing and computer-assisted intervention. 2001. p. 24–32.

    Google Scholar 

  44. Feinauer J, Spettl A, Manke I, Strege S, Kwade A, Pott A, Schmidt V. Structural characterization of particle systems using spherical harmonics. Mater Charact. 2015;106:123–33.

    Article  Google Scholar 

  45. Daniel T, Casenave F, Akkari N, Ryckelynck D. Model order reduction assisted by deep neural networks (ROM-net). Adv Model Simul Eng Sci. 2020;7(1):1–27.

    Article  Google Scholar 

  46. Pan S, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2010;22(10):1345–59.

    Article  Google Scholar 

  47. Lacourt L. Étude numérique de la nocivité des défauts dans les soudures [PhD thesis]. Mines ParisTech—Université PSL; 2019.

  48. Lacourt L, Forest S, Ryckelynck D, Willot F, Flouriot S, de Rancourt V. Étude numérique de la nocivité des défauts dans les soudures (Presqu’île de Giens), Computational Structural Mechanics Association 2019. Colloque National en Calcul des Structures, Giens, France, May 13–17, 2019.

  49. Launay H, Besson J, Ryckelynck D, Willot F. Hyper-reduced arc-length algorithm for stability analysis in elastoplasticity. Int J Solids Struct. 2021;208–209:167–80.

    Article  Google Scholar 

  50. Fauque J, Ramiere I, Ryckelynck D. Hybrid hyper-reduced modeling for contact mechanics problems. Int J Numer Methods Eng. 2018;115(1):117–39.

    Article  MathSciNet  Google Scholar 

  51. Willot F. Fourier-based schemes for computing the mechanical response of composites with accurate local fields. C R, Méc. 2015;343(3):232–45.

    Article  MathSciNet  Google Scholar 

  52. Gasnier J, Willot F, Trumel H, Jeulin D, Besson J. Thermoelastic properties of microcracked polycrystals. Part I: adequacy of Fourier-based methods for cracked elastic bodies. Int J Solids Struct. 2018;155:248–56.

    Article  Google Scholar 

  53. Amsallem D, Farhat C. Interpolation method for adapting reduced-order models and application to aeroelasticity. AIAA J. 2008;46(7):1803–13.

    Article  Google Scholar 

  54. Mosquera R, Hamdouni A, El Hamidi A, Allery C. POD basis interpolation via inverse distance weighting on Grassmann manifolds. Discrete Contin Dyn Syst, Ser S. 2018;12(6):1743–59.

    MathSciNet  MATH  Google Scholar 

  55. Shigenaka R, Raytchev B, Tamaki T, Kaneda K. Face sequence recognition using Grassmann distances and Grassmann kernels. In: The 2012 international joint conference on neural networks (IJCNN). New York: IEEE; 2012. p. 1–7.

    Google Scholar 

  56. Ye K, Lim L-H. Schubert varieties and distances between subspaces of different dimensions. SIAM J Matrix Anal Appl. 2016;37(3):1176–97.

    Article  MathSciNet  MATH  Google Scholar 

  57. Garboczi E. Three-dimensional mathematical analysis of particle shape using X-ray tomography and spherical harmonics: application to aggregates used in concrete. Cem Concr Res. 2002;32(10):1621–38.

    Article  Google Scholar 

  58. Shen L, Farid H, McPeek M. Modeling three-dimensional morphological structures using spherical harmonics. Evolution. 2009;63(4):1003–16.

    Article  Google Scholar 

  59. Park H, Jun C. A simple and fast algorithm for K-medoids clustering. Expert Syst Appl. 2009;36(2, Part 2):3336–41.

    Article  Google Scholar 

  60. Borg I, Groenen P. Modern multidimensional scaling: theory and applications. Berlin: Springer; 2005.

    MATH  Google Scholar 


The authors wish to thank L. Lacourt for providing the images of defects used in the present study, and F. N’Guyen for help in segmenting the numerical images, and the European Consortium for Mathematics in Industry (ECMI) for supporting publishing costs. F. Willot and H. Launay are grateful to J. Angulo for fruitful discussions.


This research was funded by a grant from Mines Paris.

FW, DR and JB proposed the research idea and formulated the model problem. The numerical computations and data analysis were carried out by HL. The manuscript was drafted by HL and subsequently extended and edited by FW. All authors read and approved the final manuscript.

Corresponding author

Correspondence to François Willot.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Launay, H., Willot, F., Ryckelynck, D. et al. Mechanical assessment of defects in welded joints: morphological classification and data augmentation. J.Math.Industry 11, 18 (2021).
