Yangfan Wang, Ping Ni, Marc Sturrock, Qifan Zeng, Bo Wang, Zhenmin Bao, Jingjie Hu. 2024: Deep learning for genomic selection of aquatic animals. Marine Life Science & Technology, 6(4): 631-650. DOI: 10.1007/s42995-024-00252-y

Deep learning for genomic selection of aquatic animals

  • Genomic selection (GS) applied to the breeding of aquatic animals has been of great interest in recent years due to its higher accuracy and faster genetic progress than pedigree-based methods. The genetic analysis of complex traits in GS does not escape the current excitement around artificial intelligence, including a renewed interest in deep learning (DL), such as deep neural networks (DNNs), convolutional neural networks (CNNs), and autoencoders. This article reviews the current status and potential of DL applications in phenotyping, genotyping and genomic estimated breeding value (GEBV) prediction of GS. The review shows that CNNs obtain phenotype data of aquatic animals efficiently and without injury; DNNs used as single nucleotide polymorphism (SNP) variant callers have shown higher accuracy in genotyping assessments for next-generation sequencing (NGS); autoencoder-based genotype imputation approaches are capable of highly accurate genotype imputation by encoding complex genotype relationships in easily portable inference models; and sparse DNNs capture nonlinear relationships among genes to improve the accuracy of GEBV prediction for aquatic animals. Future directions of DL in aquaculture are also discussed, which should expand the application to more aquaculture species. We believe that DL will be applied increasingly to molecular breeding of aquatic animals in the future.

  • Human demand for marine aquatic protein has been growing steadily, resulting in a widespread focus on aquaculture. Over the past few decades, advances in breeding technology have led to a significant increase in aquaculture production, but this was accompanied by an increase in disease incidence and environmental pressures, all of which posed a major challenge to the sustainable growth of the aquaculture industry (Gao et al. 2023; Jiang et al. 2023). Similarly, climate change, another important factor affecting aquaculture, has been correlated with the etiology of ulcerative dermal necrosis (Stokowski et al. 2023). Changes in ocean conditions caused by climate change have wreaked havoc on aquatic animals by inducing physiological stress and increasing sensitivity to various stressors, leading, both directly and indirectly, to mass mortality and the spread of diseases in aquatic animals (Defeo et al. 2013; Ortega et al. 2016; Soon and Ransangan 2019; Tan and Ransangan 2015; Tan and Zheng 2020; Zannella et al. 2017). Climate change and disease present significant threats to the sustainability of aquaculture (DeWeerdt 2020). Therefore, the development of improved species of aquatic animals with disease resistance, stress resistance, and fast growth rate is urgent, necessitating the use of modern breeding techniques.

    Approximately 30 years ago, the stage of Breeding 3.0 started with marker-assisted selection (MAS) (Collard and Mackill 2008), then moved to quantitative trait locus (QTL) mapping for complex traits (Wallace et al. 2018). The development of high-throughput genotyping then expanded the quantitative genetics toolkit to reveal the genetic basis for phenotypic variation in breeding populations using genome-wide association study (GWAS) (Hayes 2013) and to select genomic estimated breeding values (GEBVs) in genomic selection (GS) (Meuwissen et al. 2001). GS estimates parental and offspring-specific genetic effects on an offspring outcome based on genetic markers covering the whole genome, not based on a few markers as in MAS. As shown in Fig. 1, GS needs to use the phenotyping and genotyping data of the training population, which are used to derive GEBVs for all the individuals of the breeding population from their genomic profile. The GEBVs allow us to select individuals to be retained for breeding at an early stage of growth, greatly reducing generation intervals and time costs. Recently, many GS studies have been used on aquatic animals, showing that GS produces a higher prediction accuracy of GEBVs than traditional pedigree-based methods (Zenger et al. 2019).

    Figure  1.  Applications of deep learning (DL) in genomic selection (GS) for aquatic animals. A Applications of DL models, such as deep neural networks (DNNs), convolutional neural networks (CNNs), and autoencoders in phenotyping, genotyping, and genomic estimated breeding value (GEBV) prediction of GS. B Showing the different steps of GS

    With the explosion of data and the development of big data analytics, we are ushering in a new round of breeding technology marked by the convergence of high-throughput phenotyping and genotyping, and artificial intelligence (AI), and are stepping into the Breeding 4.0 Era (Wallace et al. 2018). AI that replicates certain features of human characteristics in machines has become more popular thanks to advanced machine learning (ML) algorithms (Currie et al. 2019), and improvements in computing power and storage for processing the increased big data. In ML algorithms, there are two basic approaches: supervised and unsupervised learning. The aim of supervised learning is to make a mathematical model that maps its input (e.g., single nucleotide polymorphisms (SNPs), insertion and deletion (InDels) or DNA sequences) to target variables (e.g., stress-resistance traits or growth traits). Target variables can be either continuous (regression) or categorical (classification). Some examples of supervised learning applications are predicting the functional activity of DNA sequences (de Almeida et al. 2022), learning the effects of noncoding DNA (Zhou and Troyanskaya 2015), and predicting the genomic effects of complex human traits (Bellot et al. 2018). If there is neither classified nor labeled information about the outcome, the problem becomes unsupervised learning. An example of an unsupervised learning application is the genotype imputation with denoising autoencoders (Dias et al. 2022). Neural networks (NNs) comprised of node layers, containing an input layer, one or more hidden layers, and an output layer, are well-known methods in ML and have been studied since the 1940s (Tarassenko 1995). Deep learning (DL) (e.g., deep neural networks (DNNs), convolutional neural networks (CNNs), and autoencoders) as a new type of NNs is a relatively young branch of ML (Kriegeskorte and Golan 2019). DL models are distinguished from NNs by having many more hidden layers. 
Clearly, as prediction power increases with DL, data requirement increases as well.

    In recent years, DL has been widely used in GS to capture the genomic information for the prediction of the genetic merit of each candidate animal for breeding purposes (Montesinos-López et al. 2019b). Although GS has been demonstrated to be useful in the genomic prediction of traits of interest, its success in genomic breeding firstly depends on reliable phenotyping. Using DL-based visual recognition and detection, many high-throughput phenotyping techniques have been used for obtaining detailed measurements of trait characteristics that collectively provide reliable estimates of phenotypic traits in breeding (Yang et al. 2020). Another important factor for its successful implementation in genomic breeding is the availability of genome-wide high-throughput, cost-effective and flexible markers suitable for species with or without the reference genome sequence (Liu et al. 2020). The next-generation sequencing (NGS) technologies (Slatko et al. 2018), which provided novel SNP genotyping platforms especially the genotyping by sequencing, have drastically reduced the cost and time of sequencing as well as SNP discovery. In spite of the remarkable advancements in sequencing technologies, precisely identifying genetic variants from billions of short and potentially erroneous reads continues to pose a significant challenge (Nielsen et al. 2011). The utilization of DL-based genotyping techniques (Poplin et al. 2018) has enabled the detection of genetic variations within aligned NGS data, achieved through the analysis of statistical relationships between images of read pileups around putative and true variants. In GS, many prediction models (e.g., genomic best linear unbiased prediction (GBLUP) (Clark and van der Werf 2013), Bayes B (Wolc and Dekkers 2022), and reproducing kernel Hilbert space regression (RKHS) (Gianola and van Kaam 2008)) have also been developed by integrating the phenotyping and genotyping data of the training population. 
These are subsequently used to calculate GEBVs for all the individuals of the breeding population. DL-based prediction models (e.g., DNNs (Lv et al. 2022) and CNNs (Zhu et al. 2021)), which can account for the non-linearities present in real data, have been proposed as promising tools for marker-based genomic prediction of complex traits in aquatic animal breeding. This review summarizes the current state of research related to the application of DL in GS for aquatic animal breeding, including phenotyping, genotyping and GEBV prediction, and discusses future directions of DL in aquaculture (Fig. 1).

    The subsequent sections follow a logical progression, commencing with the internal components depicted in Fig. 1A and proceeding outwards. We begin with an introduction to DL and GS. Subsequently, we review in detail the application of DL models to three critical aspects of GS: phenotyping, genotyping and GEBV prediction. Finally, we discuss future directions of DL in aquaculture and present the conclusions of this article. The present review provides aquaculture scholars and breeding companies with a general overview of the potential and gaps of DL in GS for aquaculture breeding.

    This section introduces the background on DL models in GS, including the basic concepts of DL and the common DL models currently applied in GS of aquatic animals that are highlighted in this review.

    The concept of DL originates from the research of NNs and is a collection of algorithms of NNs with multilayer nonlinear transformation for modeling high-complexity data. The computational model of DL consists of multiple processing layers capable of learning data representations with multiple levels of abstraction (LeCun et al. 2015). DL architectures are multi-layered stacks of simple modules, many of which compute nonlinear input–output mappings. By adding deeper layers to the structure and using various nonlinear mapping functions, DL effectively transforms data into a higher and more abstract representation. DL is well-adapted and flexible to address highly complicated challenges in terms of data analysis. These abilities arise from the intricate hierarchical architecture and robust learning capabilities and enable DL to deliver superior prediction and classification results in signal and information processing tasks.

    Multilayer perceptron neural networks (MLPs) with many hidden layers, which are often referred to as DNNs, are good examples of DL models with deep architecture (Kriegeskorte and Golan 2019). DNNs are considered an extension of conventional artificial neural networks (ANNs). The layers within DNNs can be categorized into three types: input, hidden and output layers. The training process of DNNs can be divided into two steps as follows: (1) each layer of the DNN is trained to ensure that the input feature is mapped to different feature spaces of the next layer; (2) a back-propagation algorithm is used to adjust or fine-tune the network weights of each layer. There are many hidden layers in the DNN architecture, and the more layers there are, the more complex the network becomes. Consequently, more resources and time are needed for training, making DNNs best suited for graphics processing unit (GPU)-based architectures. DNNs have good properties when working with high-dimensional data, as is the case in genome-enabled predictions.

    In genomic prediction, n animals were genotyped on m SNPs and measured for the trait of interest. The SNP effects for each individual trait were then estimated using the model as follows:

    $$y_i = \mu + \sum_{j=1}^{m} x_{ij}\beta_j + e_i \tag{1}$$

    where $y_i$ is a quantitative trait measured on the individual $i$ ($i = 1, \dots, n$); $\mu$ is the mean of all observations; $\beta_j$ denotes the additive effect of SNP $j$; the genotype of SNP $j$ for the individual $i$ is $x_{ij}$; and $e_i \sim N(0, \sigma_e^2)$ is the sum of random environmental effects (where $\sigma_e^2$ is the residual variance).

    Equation (2) shows a more general form of the regression model (1):

    $$y_i = \mu + g(\mathbf{x}_i) + e_i \tag{2}$$

    where function $g(\cdot)$ serves as a mapping from the p-dimensional input space to the real line (p = m when the inputs are the m SNP genotypes), $\mathbf{x}_i$ represents the genotypic codes observed on individual $i$, and $e_i$ is the same as in Eq. (1). For example, when considering the application of single hidden layer feed forward neural networks to the GS framework, Eq. (2) is transformed into the subsequent representation (Gianola et al. 2011):

    $$y_i = \mu + \sum_{s=1}^{S} w_s\, g_s\!\left(b_s + \sum_{j=1}^{p} x_{ij} u_j^{[s]}\right) + e_i \tag{3}$$

    1. Hidden layer: $x_{ij}$ ($j = 1, \dots, p$) are the genomic covariates of an individual $i$ ($i = 1, \dots, n$); $\mathbf{u}^{[s]} = (u_1^{[s]}, \dots, u_p^{[s]})'$ is a vector of input weights specified in the training phase; $b_s$ ($s = 1, \dots, S$) is the bias in NN terminology, where $s$ indexes the neuron; $g_s(\cdot)$ is an activation function generating the output of the single hidden neuron.

    2. Output layer: $w_s$ ($s = 1, \dots, S$) is the weight used to linearly combine the genotype-derived basis functions generated by the hidden layer.
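    The single hidden layer model described above (a bias plus weighted genotypes passed through an activation in each hidden neuron, followed by a linear combination in the output layer) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of Gianola et al. (2011): the tanh activation, the function name and all numeric values are assumptions chosen for the example, and the residual term is omitted because only the prediction is computed.

```python
import math

def predict_single_hidden_layer(x, mu, biases, input_weights, output_weights):
    """Prediction for one animal: mu + sum_s w_s * g(b_s + sum_j x_j * u_j[s]).

    x              : genotype codes of one animal (e.g., 0/1/2 per SNP)
    biases         : one bias b_s per hidden neuron
    input_weights  : one weight vector u[s] per hidden neuron (same length as x)
    output_weights : one weight w_s per hidden neuron
    """
    total = mu
    for b_s, u_s, w_s in zip(biases, input_weights, output_weights):
        linear = b_s + sum(x_j * u_j for x_j, u_j in zip(x, u_s))
        total += w_s * math.tanh(linear)  # tanh is an assumed choice of activation
    return total

# Hypothetical toy values: one animal, 4 SNPs, 2 hidden neurons.
x_i = [0, 1, 2, 1]
y_hat = predict_single_hidden_layer(
    x_i,
    mu=10.0,
    biases=[0.0, 0.0],
    input_weights=[[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.0]],
    output_weights=[1.0, 2.0],
)
print(round(y_hat, 4))
```

    With a linear activation and a single neuron the function reduces to the additive SNP model, which is one way to see the neural network as a generalization of the linear regression.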

    MLPs, which are renowned as the most prevalent NN architectures, are structured with a sequence of fully connected layers designated as input, hidden and output layers (LeCun et al. 2015). In genomic prediction, SNP genotypes are the input to the first layer. The first layer is represented as Eq. (4):

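    $$\mathbf{z}^{(1)} = \mathbf{b}_0 + \mathbf{W}^{(0)} f(\mathbf{x}) \tag{4}$$

    where $\mathbf{x}$ represents the genotypes of each individual; $f$ is a nonlinear activation function; $\mathbf{W}^{(0)}$ are the weights; and $\mathbf{b}_0$ is the bias. The output of the previous layer is used as the input of the subsequent layer, and the expression for each successive layer is the same as Eq. (4). See Eq. (5):

    $$\mathbf{z}^{(k)} = \mathbf{b}_{k-1} + \mathbf{W}^{(k-1)} f(\mathbf{z}^{(k-1)}) \tag{5}$$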

    In the final layer of the model, a vector of numbers is generated, which characterizes the Gaussian trait. If the trait is binary or categorical, an array with probabilities for each level will be produced.
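    The layer-by-layer computation of an MLP can be illustrated with a short forward pass. The sketch below assumes that every layer has the form "bias plus weights times the activated output of the previous layer", with the genotype vector as the first input; the ReLU activation (which leaves 0/1/2 genotype codes unchanged), the softmax output for categorical traits, and all weights are assumptions made for the example rather than values from any cited study.

```python
import math

def relu(v):
    """Element-wise ReLU; non-negative genotype codes pass through unchanged."""
    return [max(0.0, a) for a in v]

def affine(W, b, v):
    """Return b + W v for a weight matrix W given as a list of rows."""
    return [b_r + sum(w * a for w, a in zip(row, v)) for row, b_r in zip(W, b)]

def mlp_forward(x, layers, activation=relu):
    """Apply z <- b + W f(z) for each (W, b) pair, starting from z = x."""
    z = list(x)
    for W, b in layers:
        z = affine(W, b, activation(z))
    return z

def softmax(z):
    """Turn final-layer scores into class probabilities (categorical traits)."""
    m = max(z)
    exps = [math.exp(a - m) for a in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical network: 3 SNPs -> 2 hidden units -> 1 output (Gaussian trait).
x = [0, 1, 2]
layers = [
    ([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], [0.5, -1.0]),
    ([[2.0, 1.0]], [0.0]),
]
gaussian_pred = mlp_forward(x, layers)
print(gaussian_pred)  # [3.0]

# For a binary trait the last layer would emit one score per class instead.
class_probs = softmax([0.0, 0.0])
```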

    Figure 2 illustrates the structure of the MLPs with additive and dominance components for genomic prediction in GS (Pérez-Rodríguez et al. 2013).

    Figure  2.  Graphical representation of the MLPs for genomic prediction in genomic selection (GS) with inputs of the entries of additive and dominance matrices, respectively (Pérez-Rodríguez et al. 2013)

    CNNs (Derry et al. 2023), which constitute a class of deep, feed-forward ANNs, first achieved remarkably good results in image classification competitions and, with continuous optimization and improvement by researchers, then obtained superior performance in pattern recognition tasks. CNNs allow for massive parallelization through weight sharing and complex models, thus allowing for rapid learning of complex problems. In turn, this learning capability and hierarchical structure give CNNs the ability to flexibly respond and adapt to complex challenges (Kamilaris and Prenafeta-Boldú 2018). CNNs consist mainly of convolutional, pooling and fully connected layers. As shown in Fig. 3, CNNs begin with several convolutional layers, in which a series of small matrices referred to as filters slide across the input data (e.g., SNPs or images) and extract patches from it to produce an output feature map (Zhu et al. 2021). Each convolutional layer is often followed by a pooling layer, which reduces the dimension of the output feature map while retaining the critical discriminatory information that the filters have captured. Eventually, these matrices are converted into one-dimensional (1D) vectors as inputs to fully connected single- or multi-layer neural networks, which use the learned high-level features to classify the input images into predefined classes or yield the predicted response value.

    Figure  3.  Schematics of a convolutional neural network design used for genomic prediction in genomic selection (GS) (Zhu et al. 2021)
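    The convolution, pooling and flattening steps described above can be made concrete with a small one-dimensional example over a SNP vector. This is only a sketch: the two filters, the ReLU step between convolution and pooling, and the genotype values are assumptions for illustration, and in a trained CNN the filter weights would be learned rather than fixed by hand.

```python
def conv1d(signal, kernel):
    """Slide a filter across the input and return the feature map (stride 1)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Keep positive responses, set negative ones to zero."""
    return [max(0, v) for v in values]

def max_pool1d(feature_map, size=2):
    """Non-overlapping max pooling: keep the strongest response per window."""
    return [max(feature_map[i:i + size])
            for i in range(0, len(feature_map) - size + 1, size)]

# Hypothetical genotype codes (0/1/2) for 8 SNPs of one animal.
snps = [0, 1, 2, 1, 0, 2, 2, 0]

# Two hand-written filters standing in for learned ones.
filters = [[1, 0, -1], [1, 1, 1]]

# Convolution -> ReLU -> pooling for each filter, then flatten into a
# single 1D feature vector that a fully connected layer could consume.
features = []
for kernel in filters:
    features.extend(max_pool1d(relu(conv1d(snps, kernel))))

print(features)  # [0, 2, 2, 4, 3, 4]
```

    Because the same filter weights are reused at every position, the number of parameters does not grow with the number of SNPs, which is the weight sharing referred to above.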

    Also, CNNs have achieved many practical successes in computer vision, image classification, image processing and other fields. LeNet (Wan et al. 2020) is the first CNN architecture for the detection of handwritten digits. Subsequently, with the continuous development of CNN technologies, deeper and more complex network architectures have been proposed, such as VGGNet (Khan et al. 2022), GoogLeNet (Balagourouchetty et al. 2020) and ResNet (He et al. 2020).

    Hyperparameter optimization is a basic step of DL implementation because it may seriously affect the prediction performance of DL models, e.g., DNNs and CNNs. An improved genetic algorithm implemented in DeepEvolve (Liphardt 2017) could be used to evolve a population of DL models with the goal of obtaining optimized hyperparameters in a more efficient way than traditional grid or random search.
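    The idea of evolving a population of hyperparameter settings can be sketched with a generic genetic algorithm. The code below is not DeepEvolve's code or API; it is a minimal stand-alone illustration in which the search space, the `toy_fitness` function (a stand-in for the validation accuracy that a real run would obtain by training a model) and all settings are hypothetical.

```python
import random

# Hypothetical search space for a small DNN.
SEARCH_SPACE = {
    "n_layers": [1, 2, 3, 4],
    "n_units": [16, 32, 64, 128],
    "learning_rate": [0.1, 0.01, 0.001],
}

def toy_fitness(h):
    """Stand-in for validation accuracy; a real run would train a model here."""
    return (-abs(h["n_layers"] - 3)
            - abs(h["n_units"] - 64) / 64
            - abs(h["learning_rate"] - 0.01) * 10)

def random_individual(rng):
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def crossover(a, b, rng):
    """Each hyperparameter is inherited from one of the two parents."""
    return {name: rng.choice([a[name], b[name]]) for name in SEARCH_SPACE}

def mutate(h, rng, rate=0.2):
    """With probability `rate`, resample a hyperparameter from the search space."""
    return {name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
            for name, value in h.items()}

def evolve(generations=10, pop_size=12, n_keep=4, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        population.sort(key=toy_fitness, reverse=True)
        history.append(toy_fitness(population[0]))
        parents = population[:n_keep]  # elitism: the best settings survive
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - n_keep)]
        population = parents + children
    return max(population, key=toy_fitness), history

best, history = evolve()
print(best, history[-1])
```

    Keeping the best individuals unchanged between generations guarantees that the best fitness never decreases; unlike grid search, only a fraction of the possible combinations is ever evaluated.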

    Autoencoders are a special type of DL model trained without class labels, whose output has the same dimensionality as the input. They employ unsupervised learning to obtain a feature representation or effective encoding of the original data, in the form of input vectors, at the hidden layers (Baur et al. 2020), and reconstruct the inputs of the network from this compressed, model-intrinsic data representation. Because autoencoders use nonlinear methods for feature extraction without using class labels, the extracted features aim at conserving and better representing information rather than at performing prediction or classification. Autoencoders are trained to find latent variables in the input data that, while not directly observable, may inform how the data are distributed, and to learn which latent variables could be used to accurately reconstruct the original data (Ilnicka and Schneider 2023).

    The dimension of the hidden layers of autoencoders may be either smaller (when the goal is feature compression) or larger (when the goal is mapping the feature to a higher-dimensional space) than the input dimension. Therefore, autoencoders have the ability to accomplish objectives, such as dimensionality reduction or compression, as well as denoising or de-masking. Autoencoders, trained to predict the original, uncorrupted output from corrupted or masked input, are designated as denoising autoencoders. The characteristics of autoencoders are highly applicable to genotype imputation. Specifically, bi-allelic SNPs are converted into binary representations, indicating the presence (1) or absence (0) of the reference allele A and alternative allele B. Binary input nodes representing the allele presence are combined into a vector then rescaled to 0–1 as output nodes through a sigmoid function, with the existence of three genotype outputs (homozygous reference, homozygous alternate and heterozygous) normalized with Softmax function. Autoencoder outputs can represent probabilities and have applications in calculating alternative allele dosages and imputation quality. It is of interest to note that this representation can be extended to other classes of genetic variants. Figure 4 illustrates a structure of autoencoders for genotype imputation (Dias et al. 2022).

    Figure  4.  Schematic overview of the autoencoder training workflow for genotype imputation (Dias et al. 2022)
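    The genotype representation described above for autoencoder-based imputation can be illustrated without any neural network code. The sketch below shows only the data handling around the model: two binary nodes per bi-allelic SNP, random masking to create corrupted inputs for a denoising autoencoder, a softmax over the three genotype classes, and the alternative allele dosage derived from those probabilities. Encoding a masked or missing genotype as (0, 0), the function names and the example values are assumptions made for illustration; the network that would map inputs to genotype scores is not shown.

```python
import math
import random

# Presence (1) / absence (0) of the reference allele A and alternative allele B.
# None marks a missing or masked genotype (assumed to be encoded as (0, 0)).
ENCODE = {0: (1, 0), 1: (1, 1), 2: (0, 1), None: (0, 0)}

def encode_genotypes(genotypes):
    """Turn alt-allele counts (0/1/2 or None) into a flat binary input vector."""
    vector = []
    for g in genotypes:
        vector.extend(ENCODE[g])
    return vector

def mask_genotypes(genotypes, mask_rate, rng):
    """Randomly hide genotypes to build corrupted training inputs."""
    return [None if rng.random() < mask_rate else g for g in genotypes]

def softmax(scores):
    """Normalize scores for (hom. reference, heterozygous, hom. alternate)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def alt_dosage(probs):
    """Expected number of alternative alleles: P(het) + 2 * P(hom. alternate)."""
    _, p_het, p_alt = probs
    return p_het + 2.0 * p_alt

rng = random.Random(1)
true_genotypes = [0, 1, 2, 1, 0]
corrupted = mask_genotypes(true_genotypes, mask_rate=0.4, rng=rng)
model_input = encode_genotypes(corrupted)   # what the autoencoder would see
target = encode_genotypes(true_genotypes)   # what it is trained to reconstruct

# Hypothetical output scores for one SNP, converted to probabilities and a dosage.
probs = softmax([0.2, 1.5, 0.1])
print(round(alt_dosage(probs), 3))
```

    The dosage is a continuous value between 0 and 2, so it carries the uncertainty of the imputed genotype into downstream analyses instead of forcing a hard call.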

    This section describes the application and the current research status of DL in genomic selection for aquatic animals.

    The genetic architecture of traits ranges from simple to complex. Simple traits are under the control of a small number of major genes or QTLs, while complex traits are influenced by numerous minor-effect genes or QTLs scattered across the genome (Watanabe et al. 2019). The majority of economic traits in aquatic animals, including yield, quality and disease resistance, are complex quantitative traits that remain difficult to improve incrementally due to their low heritability and sensitivity to the environment, as well as costly and labor-intensive phenotyping (Allal and Nguyen 2022; Song et al. 2022). Due to a shortage of funds and labor, the improvement of economic traits in aquatic animals is at times not feasible.

    In aquaculture breeding, it is important to have the ability to recognize and track individuals over space and time, traditionally by capturing the animals and placing visible and unique markers on them (Jepsen et al. 2015). As digital photography and image processing software have advanced, photographic mark-recapture (PMR) has gained significant traction, driven by the development of image-based visual recognition and detection technologies (Bolger et al. 2012). Due to the abundance of species with variable natural marking patterns, PMR has gained considerable attention, especially in the field of aquatic animal research (Fearnbach et al. 2012; Forcada and Aguilar 2000; Langtimm et al. 2004). For example, Xing et al. (2017) proposed a novel multiscale image processing approach that incorporates matched filters with Gaussian kernels and partial differential equation (PDE) multiscale hierarchical decomposition, which may be efficiently used for segmentation of the small tubular and periodic structures in scallop shell images. As shown in Fig. 5, utilizing the Space-based Depth-First Search (SDFS) algorithm, the periodic patterns within the structures, comprising bifurcation nodes, intersections of rings and ribs, as well as their connecting lines, can be detected in the computed tomography (CT) and digital scallop images. The results confirmed that shell cyclic structure patterns may serve as a reliable biomarker for the identification of biological individuals due to the genetically specific information contained therein. The method could also be used to calculate the growth rate as a complex growth trait by characterizing shell images over a period of scallop growth. Meanwhile, using an image-based measurement method, Wang et al. (2022d) estimated genetic parameters of a muscle imaging trait with 2b-RAD SNP markers in Zhikong scallop (Chlamys farreri). This study demonstrated that the adductor muscle area percentage (AMAP), obtained by dividing the adductor muscle area by the shell area in an X-ray scallop image, could be used as a novel, non-invasive, high-throughput phenotype for the scallop muscle trait, with high estimation accuracies of GEBVs in Zhikong scallop.
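    Once the adductor muscle and the shell have been segmented in an image, the AMAP itself is a simple ratio of pixel counts. The sketch below assumes that segmentation has already produced two binary masks of the same size and that the percentage is the ratio multiplied by 100; the masks and function names are hypothetical and the segmentation step is not shown.

```python
def area(mask):
    """Number of foreground pixels in a binary mask given as a list of rows."""
    return sum(sum(1 for pixel in row if pixel) for row in mask)

def amap(muscle_mask, shell_mask):
    """Adductor muscle area percentage: 100 * muscle area / shell area."""
    shell_area = area(shell_mask)
    if shell_area == 0:
        raise ValueError("shell mask is empty")
    return 100.0 * area(muscle_mask) / shell_area

# Hypothetical 4 x 4 masks (1 = pixel belongs to the structure).
shell_mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
muscle_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

print(amap(muscle_mask, shell_mask))  # 25.0
```

    Because the trait is a ratio of two areas measured in the same image, it does not depend on the pixel size, which makes it convenient for comparing animals photographed at different magnifications.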

    Figure  5.  Flowchart of a multiscale PDE-based method for segmentation and identification of scallops (Xing et al. 2017)

    Recently, DL has emerged as a new image-based visual recognition and detection technology in phenotyping for the recognition and measurement of aquatic animals (Álvarez-Ellacuría et al. 2020; Monkman et al. 2019; Wang et al. 2022e). For example, for long-term tracking and behavioral analysis of aquatic animals, Wang et al. (2022e) presented an image processing method combined with the deep CNN models ResNet50 and VGG19 to segment the shape and texture features of leopard coral grouper (Plectropomus leopardus). Then, they used the shape and texture features for individual recognition on sequential images of leopard coral grouper captured for 50 days (Fig. 6). This study showed that ResNet50 obtained a maximum accuracy of 0.985 ± 0.045 on the test set for long-term tracking of leopard coral grouper. Meanwhile, for measurement of aquatic animals, Monkman et al. (2019) trained regional convolutional neural networks (R-CNNs) on public images to estimate the full length of a fusiform fish and obtained high accuracy with percent mean bias error. Based on the R-CNN model, Álvarez-Ellacuría et al. (2020) proposed to adopt an improved Mask R-CNN, which used pictures from the auction center and provided a segmentation of the images, to deal with partial occlusions. Considering that fish are highly active in water and that individuals are prone to overlapping when photographed, Zhao et al. (2018) proposed a semi-supervised learning model based on modified deep convolutional generative adversarial networks to extract the moving fish and improve recognition accuracy. For estimating the swimming performance of fish, Zeng et al. (2024) proposed an automatic High-Resolution Network (HRNet) based on a deep learning approach to effectively detect the morphological traits of juvenile large yellow croaker. In echinoderms, Li et al. (2020) presented a real-time underwater object recognition algorithm based on a deep residual network for the real-time detection of sea cucumbers, which facilitates farm management. In shrimp, Liu et al. (2020) presented an improved AlexNet, called Deep-Shrimp Net, to distinguish different shrimp species with very similar appearance. This study showed that soft-shell shrimp with defects could be identified at a mean accuracy precision of 0.972 by using the Deep-Shrimp Net for quality measurement.

    Figure  6.  The essential architecture of a PDE-based image processing method combined with deep CNN models for fully automated segmentation and identification of Plectropomus leopardus (Wang et al. 2022e)

    Prior to the advent of NGS, SNPs generation primarily relied on array-based marker systems (de Moraes et al. 2018), which were expensive and laborious, limiting the use of SNPs on GEBV prediction for complex traits to critical genomic regions. The emergence of NGS (Slatko et al. 2018) became a turning point, allowing access to SNPs suitable and affordable for GS in many species (Barabaschi et al. 2016; Bhat et al. 2016; de Los Ríos-Pérez et al. 2020). Nowadays, several variant callers for NGS-based genotyping of SNPs have been developed (Hu et al. 2021; Sato et al. 2019). For example, Genome Analysis Toolkit (GATK), which identifies variants based on how the reads align to the reference, has been the most widely used in many aquaculture species, such as large yellow croaker (Larimichthys crocea) (Zhou et al. 2023), Pacific white shrimp (Litopenaeus vannamei) (Luo et al. 2022) and sea cucumber (Apostichopus japonicus) (Lv et al. 2022). Also, FreeBayes has been applied successfully to genotyping for SNPs using short-read sequencing data in European sea bass (Dicentrarchus labrax) (Hillen et al. 2017), gilthead seabream (Sparus aurata) (Peñaloza et al. 2021), Nile tilapia (Oreochromis niloticus) (Barría et al. 2023) and sterlet sturgeon (Acipenser ruthenus) (Du et al. 2020). SAMtools, which applies a hidden Markov mapping and assembly quality (MAQ) model, has been adopted to identify SNPs in Pacific oysters (Crassostrea gigas) (Delomas et al. 2023).

    However, accurate variant calling remains challenging for GATK, FreeBayes and SAMtools when using long-read sequencing data (Poplin et al. 2018), such as Single Molecule Real-Time (SMRT) sequencing (Yuan et al. 2021). In order to improve the accuracy of variant calling, some researchers attempted to apply DL models in NGS-based genotyping. For example, Poplin et al. (2018) proposed a SNP caller, called DeepVariant, which replaced the assortment of statistical modeling components with DNNs. DeepVariant adopted an inception architecture, which used a stacked image of the reference and reads around each candidate variant to give the probability of each of the three diploid or polyploid genotypes at a locus. The results showed that DeepVariant outperformed existing state-of-the-art tools GATK, FreeBayes and SAMtools in variant calling of SNPs and InDels, which means that DeepVariant has an extremely broad range of applications, even in polyploid aquatic animals (e.g., tetraploid Pacific oyster (Crassostrea gigas) and tetraploid rainbow trout (Oncorhynchus mykiss)). Considering the difficulty of variant identification in the long-read data of SMRT, Luo et al. (2019) developed a multi-task five-layer CNN model, called Clairvoyante, for predicting variant type, zygosity, alternative allele and indel length from aligned reads.

    In addition to SNPs, DL has also been used for calling structural variants (SVs). DeepSV (Cai et al. 2019) calls SVs using a novel visualization technique based on CNNs and works with noisy training data. When applied to 1000 Genomes Project data, DeepSV outperformed other extant methods in calling deletions. Cue (Popic et al. 2023) was introduced as an extensive framework built on a fourth-order stacked hourglass convolutional neural network (HN) (Newell et al. 2016) to call and genotype SVs of diverse size and type. The scalability and sustainability of Cue allow for its future development to accommodate different sequencing platforms and SV data types. For SNP calling, Luo et al. (2019) showed that, as a variant caller specialized for SMRT data, the CNN-based Clairvoyante outperformed the DNN-based DeepVariant in terms of accuracy and speed for SMRT genotyping, and also demonstrated better performance compared to GATK on Illumina data. For genetic variants that are more complex than SNPs, DeepSV and Cue based on DL can effectively call SVs.

    Despite the rapid advancement and reduced cost of NGS-based sequencing and genotyping techniques, incorporation of GS in breeding requires SNP genotyping of thousands of aquatic animals per generation, which is still prohibitively expensive. The most straightforward strategy is to genotype the target population with low-coverage sequencing, and obtain genome-wide genotypes through imputation methods that can infer the alleles of 'hidden' variants and use those inferences to test the hidden variants for GS (Li et al. 2009). This strategy is particularly relevant for aquatic animals, most of which lack commercially available SNP arrays. Many traditional genotype imputation tools (e.g., Beagle (Browning et al. 2018), Glimpse (Rubinacci et al. 2021), and FImpute (Sargolzaei et al. 2014)) use genetic variants that are shared between the to-be-imputed genomes and the reference panel, and apply Hidden Markov Models (HMMs) to impute the missing genotypes per sample. These genotype imputation tools have been widely used in aquatic animals, such as Atlantic salmon (Salmo salar) (Tsairidou et al. 2020), large yellow croaker (Larimichthys crocea) (Zhang et al. 2021) and Pacific white shrimp (Litopenaeus vannamei) (Wang et al. 2022a).

    Recently, autoencoders, a class of DL models, have garnered significant interest in functional genomics due to their proficiency in handling missing data, particularly for image restoration and inpainting tasks (Ayub et al. 2020; Guo et al. 2021). Specifically, denoising autoencoders, which are trained to reconstruct original, uncorrupted data from corrupted or masked inputs, exhibit promising potential for genotype imputation. The autoencoder characteristics are particularly advantageous as they have the capacity to address some limitations of traditional HMM-based imputation methods. By eliminating the reliance on reference panel and capturing non-linear relationships in complex linkage disequilibrium genomic regions, autoencoders offer a promising alternative for accurate genotype imputation. As shown in Fig. 4, Dias et al. (2022) proposed an autoencoder-based denoising approach for SNP genotype imputation. The results showed that both across the allele-frequency spectrum and across genomes of diverse ancestry, this approach attained remarkable imputation accuracy, while significantly reducing the inference runtime by at least four times compared to conventional imputation tools [e.g., Beagle, FImpute and Minimac (Das et al. 2016)] using default parameters. Meanwhile, considering a high-quality reference panel is an important prerequisite for genotype imputation and plays a crucial role in the quality of genotype imputation, Shi et al. (2021) proposed an imputation reference panel reconstruction method RefRGim, which is based on CNNs, to improve imputation performance by generating a study-specified reference panel for each input data with the genetic similarity of individuals.

    Traditional parametric models for GEBV prediction in GS impose stringent assumptions regarding the functional form and the statistical distribution of SNP effects. For example, GBLUP assumes that the trait of interest is affected by all markers and that the marker effects share a common variance (Meuwissen et al. 2001; Piepho 2009). Bayes B allows for the estimation of SNP effects with differential shrinkage, but posterior inferences, particularly SNP variance, depend heavily on the prior assumptions used in the model (Meuwissen et al. 2001). In aquatic animals, GBLUP and Bayesian methods have been widely used to improve various economic traits, such as growth traits in Zhikong scallop (Chlamys farreri) (Wang et al. 2018b), growth traits in large yellow croaker (Larimichthys crocea) (Dong et al. 2016), streptococcosis resistance in Nile tilapia (Oreochromis niloticus) (Joshi et al. 2021b), growth and survival in common carp (Cyprinus carpio) (Palaiokostas et al. 2018, 2019) and bacterial cold water disease resistance in rainbow trout (Oncorhynchus mykiss) (Vallejo et al. 2016) (Supplementary Table S1). Dou et al. (2016) evaluated the predictability of six parametric models, including GBLUP, Bayes A, Bayes B, least absolute shrinkage and selection operator (LASSO) and Bayesian LASSO (BL) in Yesso scallop (Patinopecten yessoensis) by analyzing a real data set of growth traits (e.g., shell length, shell height and shell weight). Conversely, non-parametric models [e.g., RKHS (Gianola and van Kaam 2008), radial basis function model (Long et al. 2010), support-vector machine (SVM) (Wang et al. 2022c) and the ML method random forest (RF) (Gianola et al. 2010)] make no or weaker assumptions. Consequently, the latter approaches are more flexible in characterizing complex relationships between genomic information and quantitative traits of interest. Wang et al. (2022c) evaluated the predictability of two non-parametric models, RKHS and SVM, and a parametric model, Bayes B, for genomic prediction and breeding in a Pacific oyster (Crassostrea gigas) dataset. This study showed that RKHS and SVM achieved relatively high prediction accuracies in most of the tested traits.
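
    To make the GBLUP assumption concrete, the sketch below builds a genomic relationship matrix in the form given by VanRaden (2008) and predicts breeding values as G_test,train (G_train,train + λI)^(-1) (y − μ), with λ = σ²e/σ²g. It is a simplified illustration rather than any cited implementation: heritability is assumed known instead of being estimated (e.g., by REML), the mean is taken as the sample mean, and the function names are hypothetical.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix from an n x m matrix of 0/1/2 allele dosages."""
    p = M.mean(axis=0) / 2.0                 # allele frequency of each SNP
    Z = M - 2.0 * p                          # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y_train, train, test, h2):
    """Predict values for `test` individuals from phenotyped `train` individuals.

    h2 is an assumed heritability; lam = sigma_e^2 / sigma_g^2 = (1 - h2) / h2.
    """
    lam = (1.0 - h2) / h2
    mu = y_train.mean()                      # simplification: sample mean as fixed effect
    G_tt = G[np.ix_(train, train)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y_train - mu)
    return mu + G[np.ix_(test, train)] @ alpha
```

    All markers enter G with equal weight, which is how the common-variance assumption shows up in this formulation.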

    Broadly speaking, traditional parametric and non-parametric models often ignore complicated interactions between genes (i.e., epistasis) as well as higher-order non-linearities, a simplification that may not hold in reality. To meet this challenge, NNs have been proposed as a new tool for genomic prediction of complex traits in the context of animal breeding (Ehret et al. 2015; Gianola et al. 2011; Luo et al. 2024; Okut et al. 2013; Pérez-Rodríguez et al. 2013; Wang et al. 2018a, b). In aquatic animals, Wang et al. (2018b) compared NNs with two linear parametric models, GBLUP and Bayes B, for genomic prediction of four growth traits in Zhikong scallop (Chlamys farreri) (Supplementary Table S1). This study showed that the nonlinear NNs outperformed both linear parametric models. Meanwhile, Luo et al. (2024) used NNs to evaluate the accuracy of genomic prediction for growth traits of Pacific white shrimp (Litopenaeus vannamei). The results showed that NNs implemented with the NeuralNet package increased the prediction accuracy by about 10% compared to GBLUP and Bayes B.

    More recently, DL models (e.g., DNNs and CNNs) have emerged as potentially powerful tools for genomic prediction of complex quantitative traits (Bellot et al. 2018; Cuevas et al. 2019; Montesinos-López et al. 2019a, b). Among the DL models, CNNs seem to be more promising for genomic prediction of complex traits than DNNs (Bellot et al. 2018). As pattern recognition algorithms, CNNs do not require a predefined feature vector and can automatically discover useful features of the input for predicting the output (LeCun et al. 2015). CNNs implement high-dimensional regressions by using nonlinear methods to uncover complex patterns that map the inputs (e.g., SNPs) to the outputs (e.g., complex traits). In aquatic animals, Zhu et al. (2021) proposed a CNN-based GS method for genomic prediction of growth traits in a Bay scallop (Argopecten irradians irradians) population. The results showed that CNNs outperformed GBLUP, Bayes B and DNNs on shell height, shell width and total weight, suggesting that CNNs could discover valuable features from input SNP data and extract useful information, such as complicated epistasis and higher-order non-linearities, from SNP features for prediction. They also found that DNNs gave the least accurate predictions, because DNNs had far more parameters to estimate than CNNs, GBLUP and Bayes B, and their predictions were affected by over-fitting.
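
    As a rough illustration of how a CNN maps a SNP vector to a trait value, the sketch below runs one forward pass of a one-dimensional convolution, a ReLU, max-pooling and a linear output layer. It does not reproduce the architecture of Zhu et al. (2021); the layer sizes, the function names and the absence of training code are assumptions chosen to keep the example short.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid 1-D cross-correlation: x has shape (m,), kernels (f, k); returns (f, m-k+1)."""
    windows = np.lib.stride_tricks.sliding_window_view(x, kernels.shape[1])
    return kernels @ windows.T + bias[:, None]

def cnn_forward(x, kernels, bias, w_out, b_out, pool=2):
    """Convolution -> ReLU -> non-overlapping max-pool -> linear read-out."""
    fmap = np.maximum(conv1d(x, kernels, bias), 0.0)
    f, length = fmap.shape
    trimmed = (length // pool) * pool              # drop a ragged tail, if any
    pooled = fmap[:, :trimmed].reshape(f, -1, pool).max(axis=2)
    return float(pooled.ravel() @ w_out + b_out)
```

    Because each kernel is shared across all positions, the number of parameters depends on the kernel size and filter count rather than on the number of SNPs, which is one reason CNNs need fewer parameters than fully connected DNNs on the same input.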

    Because of the over-fitting problem that arises when whole-genome SNPs are used as input, it is difficult to design an efficient DL model for genomic prediction in GS (LeCun et al. 2015). To address over-fitting in DL models, many researchers have adopted dimensionality reduction strategies. For example, genome-wide relationships between individuals (VanRaden 2008) or singular value decomposition (SVD) of genome-wide SNPs (Guzelbulut et al. 2022) were used as DL inputs. Constructing optimal sparse DL structures is also considered a key method for overcoming over-fitting. For example, Wang et al. (2018a) developed a sparse NN model, named SNNs, which finds the optimal sparse structure of NNs by minimizing the squared error subject to an L1-norm penalty on the parameters (weights and biases), thus mitigating the over-fitting problem (Supplementary Table S1). Zeng et al. (2022) used SNNs to predict GEBV in real data analyses of growth traits of Atlantic salmon (Salmo salar) and compared the performance with two linear models, GBLUP and Bayesian LASSO. The results showed that SNNs had advantages in handling complex traits among the GS models. Although L1-norm regularization has several good properties for sparse optimization of DL models, it is sensitive to outliers and may cause serious bias in estimation. To overcome this defect, many non-convex surrogates have been proposed and analyzed, including the smoothly clipped absolute deviation (SCAD) (Fan and Li 2001) and the minimax concave penalty (MCP) (Zhang 2010). Lv et al. (2022) developed a new DNN-based model, called DNN-MCP, which employed MCP regularization for DNNs to overcome over-fitting, and applied it to genomic prediction in sea cucumber (Apostichopus japonicus). DNN-MCP could find an optimal sparse structure of DNNs (see Fig. 7) and showed the best prediction accuracy compared with DNNs, GBLUP, and Bayes B.

    Figure 7. Schematic overview of the DNNs and DNN-MCP for genomic prediction in genomic selection (GS) (Lv et al. 2022)
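
    The difference between the L1 penalty and MCP can be seen from their standard definitions. The sketch below uses the usual form of MCP from Zhang (2010), in which the penalty on a weight w equals λ|w| − w²/(2γ) for |w| ≤ γλ and stays constant at γλ²/2 beyond that. How SNNs and DNN-MCP combine the penalty with the squared error in detail is not specified here, so the `penalised_loss` function and its argument names are illustrative assumptions.

```python
import numpy as np

def l1_penalty(w, lam):
    """L1 penalty: grows linearly with |w|, so large weights keep being shrunk."""
    return lam * np.sum(np.abs(w))

def mcp_penalty(w, lam, gamma):
    """Minimax concave penalty: matches L1 near zero, then flattens at gamma*lam^2/2."""
    a = np.abs(w)
    inner = lam * a - a ** 2 / (2.0 * gamma)
    outer = 0.5 * gamma * lam ** 2
    return np.sum(np.where(a <= gamma * lam, inner, outer))

def penalised_loss(y, y_hat, weights, lam, gamma=None):
    """Mean squared error plus an L1 penalty (gamma=None) or an MCP penalty."""
    mse = np.mean((y - y_hat) ** 2)
    if gamma is None:
        return mse + l1_penalty(weights, lam)
    return mse + mcp_penalty(weights, lam, gamma)
```

    Because the MCP term stops growing once |w| exceeds γλ, large weights are no longer pulled towards zero, which is the property usually cited for its lower estimation bias relative to L1.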

    So far, there is no single model that can be consistently superior in GS applications. Comparison of the accuracy of genomic prediction between linear and nonlinear models on different species and traits presented diverse results. Liu et al. (2019) compared predictive performances of different GS methods for seven traits in yellow drum (Nibea albiflora), and found that CNN had the greatest prediction ability only for gonad weight index. The models with the highest prediction accuracy for the other six traits (body length, body length/body height ratio, swimming bladder index, swimming bladder weight, body thickness and body height) were BayesCπ, MMixp, BayesB, GBLUP, BayesCπ and MMixp, respectively. Luo et al. (2024) used different GS models to estimate GEBV for growth traits in Pacific white shrimp (Litopenaeus vannamei) and the results showed that NN consistently outperformed other machine learning models (RF, k-nearest neighbors (KNN), light gradient boosting machine (LightGBM), eXtreme gradient boosting (XGBoost), categorical features gradient boosting (CatBoost) and weighted ensemble (WE)) and linear models (GBLUP, BayesB and KAML). Different from the above-mentioned results, Wang et al. (2022c) performed genomic prediction for different traits of Atlantic salmon (Salmo salar) and found that the best prediction performance was achieved by ML models RKHS for amoebic load and RF for mean gill, while ANN performed mediocrely for both traits. Supplementary Table S1 summarizes common GS models for GEBV prediction in aquatic animals.

    DL models can be integrated into all phases of the GS process of aquatic animals to advance breeding in aquaculture. They can obtain phenotypes efficiently, accurately, and non-invasively, and can acquire information about low-frequency and rare SNPs and SNPs in low-LD regions through genotyping and imputation, with or without reference panels. They may also achieve highly accurate estimates of breeding values by combining nonlinear models with sparsification methods. The advantages of DL models are as follows:

    1. In aquatic phenotyping research, DL models are more objective because they replace manual detection and eliminate the subjectivity inherent in human work (Segebarth et al. 2020). Some aquatic animals are small but highly mobile, and their traits are complex to measure. Moreover, the number of individuals is often large in aquaculture breeding studies. Using a machine that works without interruption for image collection reduces labor and time costs. Feature extraction may be difficult for poor-quality images (Sun et al. 2018); in such cases, DL models can use the network framework to recognize the image directly, which improves efficiency. Aquatic animals are diverse but share similar living environments, so the strong learning and transfer learning abilities of DL make it possible to learn the features of an object quickly, mine the information, and transfer the trained model to new species. All of the above benefit the accuracy of phenotype acquisition in aquatic organisms.

    2. In genotyping, CNNs account for complex dependencies between reads by virtue of their role as universal approximators, thereby improving variant calling performance (Hornik et al. 1989). There are two important elements in genotype imputation: (1) the reference panel, which affects the accuracy of the imputation to a certain extent; and (2) the imputation method, which is the core part of genotype imputation and also affects the reference panel's contribution to the imputation process (Shi et al. 2018). Using CNN models, corresponding reference panels are constructed according to genotype data from different researchers, and then adapted to different SNP datasets through program transferability (Shi et al. 2021). This provides a new high-quality tool for improving imputation accuracy. Autoencoders can be used to impute genotypes of aquatic animals that lack haplotype libraries without the need for reference panels, and their high efficiency in imputing low-LD regions is very useful for the subsequent study of related diseases (Dias and Torkamani 2019).

    3. The hierarchical function of DL systems enables machine learning to process data with a highly nonlinear approach. DL modeling for genomic prediction of complex traits in animal breeding takes into account complex epistasis and higher-order non-linearities (Wang et al. 2018a). In particular, the DL model is advantageous for traits with non-additive genetic architecture and unclear genetic structure (Wang et al. 2018b). On this basis, finding the optimal sparse structure of the model is an effective means of overcoming over-fitting. Weight regularization is a general technique that helps mitigate over-fitting when training neural networks and often improves both performance and the ability to isolate important features. In our previous study, the DNN-MCP model (Lv et al. 2022) could effectively address the over-fitting problem of DNNs in genomic prediction of GS by minimizing the squared error subject to a penalty on the non-convex MCP norm of the parameters (weights and biases).

    Although DL has made great advances in the field of breeding for aquatic animals, there are still limitations in solving many real challenges. DL models, while highly capable of learning, require humans to bring relevant domain expertise into the model in order to learn more accurately and efficiently. Although joint DL modeling is the latest direction in breeding and has yielded promising results, researchers have not yet explored how DL models make predictions. Thus, the exploration of explanatory information in neural networks is a major challenge for the application of complex DL models in breeding (Novakovsky et al. 2023).

    1. When DL models are used for detection, recognition and measurement of aquatic animals in underwater environments, object detection is the key to solving advanced recognition tasks, such as segmentation and tracking of aquatic animals as well as understanding underwater scenes (Chang et al. 2022). For example, the shapes of individuals in underwater images show various degrees of distortion even in similar underwater environments. Thus, phenotyping in diverse real underwater environments remains an essential challenge. Furthermore, high-quality public datasets are important for researchers worldwide to test the performance of DL models or to pre-train them. However, aquatic animal datasets are still lacking both in quantity and quality for detection, recognition and measurement of aquatic animals. Unlike in terrestrial environments, the acquisition of aquatic animal images requires waterproof equipment and is harder to arrange, which greatly increases the cost of data collection. Models with good recognition and detection capabilities in clean water may not adapt well to complex, real underwater environments. In addition, the scattering and refraction of light at different locations underwater may lead to noise in the images. All of the above are current challenges faced by DL in aquatic animal phenotyping.

    2. Currently, GS remains primarily focused on SNP variation for genomic prediction of complex traits, with a growing recognition of the importance of SVs (Ho et al. 2020). These are classified into four major types: inversions, insertions, deletions, and copy number variations (defined as encompassing at least 50 bp). SVs more frequently regulate gene expression, and have been shown to contribute more to variation in complex traits. However, it is precisely the diversity and complexity of SVs that make them difficult to study. Moreover, progress in the study of SVs has lagged significantly behind that of SNPs. Combining multiple SV algorithms improves their detection while improving the concordance of SV calls (Ho et al. 2020). The precision and recall of overlapping calls depend on the combination of specific algorithms, and they cannot detect small insertions well (Kosugi et al. 2019). Therefore, the choice of the most appropriate combination of algorithms is also one of the difficulties in accurately calling SVs. Numerous emerging technologies (Falconer et al. 2012; Kitzman et al. 2011; Lee and Schatz 2012; Lieberman-Aiden et al. 2009) attempt to enhance the ability to detect long reads, which will greatly increase the utilization of SVs for researchers. However, to fully analyze SVs, it is necessary to integrate the functions of multiple platforms, which remains challenging both technically and in terms of cost. Additionally, the impact of SVs on genome architecture and their functional effects still need to be elucidated. In summary, SVs remain challenging to type accurately using NGS data, which limits our understanding of their biological functions and their utilization as genetic markers.

    3. Cutting-edge DL models for GEBV prediction encompass tens of millions of free parameters in order to learn complex combinations of predictive features from data. While the advanced capacity of DL models to encode latent feature representations leads to remarkable prediction accuracy, it simultaneously makes their predictions prone to over-fitting. A DL predictor can take very different forms depending on its configuration. When applying DL models in GS, besides choosing the network configuration, it is necessary to tune the hyperparameters that control the regularization. Therefore, finding the optimal configuration for DL models may be quite challenging. Although DL models can surpass other commonly used models in GEBV prediction, they offer few explainable inferences about the effects of SNPs on phenotypes, which is referred to as black-box behavior (Bellot et al. 2018). The challenge lies in identifying which features and combinations of features are learned by the model. When dealing with complex feature sets, DL models learn tens of millions of parameters that collectively determine the predictions but do not explain how they arise. Thus, interpreting complex models remains a challenge when DL is used to analyze complex datasets.

    Generative adversarial networks (GANs) (Cong et al. 2023) generate realistic synthetic images by forcing the generated images to be statistically almost indistinguishable from real ones. They can be used for image augmentation and for addressing the distortion problems of underwater images.

    Currently, most existing state-of-the-art DL models are still based on 2D images for detection, recognition and measurement of aquatic animals. However, 3D scanning technologies [e.g., binocular vision (Read 2021) and depth cameras (Xu et al. 2021)] provide tools for 3D image acquisition to obtain more spatial and volumetric traits. More useful features extracted from 3D images may help researchers to further study the phenotyping of aquatic animals. For example, researchers may obtain more precise individual volumes that could benefit genetic gain research in the future. Because aquatic animals are three-dimensional in nature, nondestructive phenotyping of 3D image-based traits allows efficient and accurate acquisition of complex growth traits and body conformation (Liao et al. 2021). Meanwhile, researchers may also make timely decisions to reduce losses in aquaculture by identifying abnormal behavior of aquatic animals from 3D images. In addition, 3D videos (Honari et al. 2023) and point clouds (Wang et al. 2022b) raise a new challenge for DL models in phenotyping of complex traits. Therefore, constructing more comprehensive DL models that can more effectively detect and acquire data (e.g., 3D images) of aquatic animals in underwater environments is one of the future research directions.

    It is clear that due to the complexities of the marine environment, image collection of aquatic animals is much more difficult than that of terrestrial animals. Therefore, there is an urgent need for the construction of high-quality public image datasets on different aquatic animals.

    Reliable SV genotyping for GS is urgently needed in many species to allow for more complete genetic studies (e.g., acquisition of more variants, study of gene functions and phenotypic changes, missing heritability, and genomic prediction). In aquatic animals, Bertolotti et al. (2020) performed whole genome sequencing (WGS) on 492 Atlantic salmon (Salmo salar) samples and obtained 15,483 SVs with high confidence. This research offered fresh perspectives on the significance of SVs in genome evolution and the genetic architecture of domestication traits, in addition to providing valuable resources for the accurate detection of SVs in aquatic animals. DL models have already been successfully used in genotyping of SNP variation, and have acquired the ability to learn the functional role of long DNA sequences (Kelley et al. 2016). Therefore, establishing new DL-based models for reliable SV genotyping is one of the future research directions.

    Despite an increasing shift towards the use of long-read sequencing for SV discovery and application, these technologies remain prohibitively expensive for large-scale genetic studies. In GS, genotype imputation has been the most widely used approach to reduce SNP genotyping costs. However, the approach has yet to be tested extensively for SVs. Gundappa et al. (2023) investigated combined single nucleotide variant (SNV) and SV imputation with low-coverage WGS data in Atlantic salmon (Salmo salar). This study provided novel insights into utilizing SVs to enhance genomic selection. Autoencoders have already been used successfully in SNP imputation (Dias et al. 2022). Moreover, GANs can take long sequence noise (missing SV genotypes) as input, generate sample data, and judge its authenticity against real SV genotypes. Autoencoders combined with GANs may therefore be well suited for SV genotype imputation.

    Attention is a complex cognitive function that is indispensable for human beings (Chaudhari et al. 2021). An important feature of attention is that humans do not tend to process information as a whole at one time. An attention mechanism can act as a form of weight regularization, in which weights are imposed on input sequences so that the model selectively concentrates on the part of the input more likely to contain information crucial for the processing task (Novakovsky et al. 2023). During the learning of latent features, attention weights serve to focus DL models on a limited portion of the input. This focused portion may then be examined directly, helping to pinpoint the parts that were instrumental in the model's internal representation. This helps address the over-parameterization of DL models for genomic prediction. With the development of sequencing technology, GS is facing an explosion in the amount of input data. WGS can produce millions or even tens of millions of variants, and the data volume remains huge even after quality control. It is challenging and time-consuming for a DL model to handle such a vast amount of data. Efficiency may be greatly improved if an attention mechanism is introduced to draw global dependencies between inputs and outputs. This also allows the model to assign different weights to different positions of the input sequences, focusing on the most relevant parts.
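
    One common way to compute such weights is scaled dot-product attention, sketched below in NumPy. Treating each row of K and V as the embedding of one SNP position, and reading the averaged weights as a rough indication of which positions the model attends to, are assumptions made for illustration; no specific GS model from the literature is reproduced here.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q: (q, d) queries, K: (p, d) keys, V: (p, v) values for p input positions."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # similarity of each query to each position
    weights = softmax(scores)           # each row is non-negative and sums to 1
    return weights @ V, weights

def top_positions(weights, k):
    """Indices of the k positions with the highest average attention weight."""
    return np.argsort(weights.mean(axis=0))[::-1][:k]
```

    Note that the score matrix grows with the number of input positions, so for whole-genome data the positions would usually be windows or pre-selected markers rather than individual variants.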

    GS aims to use genomic information for faster genetic gain than using phenotypic information (Meuwissen et al. 2001), where an essential component is the inference of the relationship between genotype and phenotype in model training. The DL model is a good choice for genomic prediction, but the explanatory information for the predictions it makes is often missing. For genetics researchers, this missing explanatory information would frequently be of greater value than the predictions themselves, as it can enable new insights into genetic processes. The genetic links between phenotype and genomic variations are too complex to determine directly at the genome sequencing level (Costanzo et al. 2019; Lehner 2013). At present, multi-omics data (Subramanian et al. 2020) (e.g., genomics, transcriptomics, proteomics and metabolomics) provide technical support to uncover the causality and underlying molecular mechanisms of genotype–phenotype relationships. This provides a bridge between phenotype and genomic variation that cannot be easily captured at the genome sequence level. New DL models and learning algorithms that integrate multi-omics data are urgently needed to uncover the causality and molecular mechanisms. For example, based on the causality and molecular mechanisms of genotype–phenotype relationships, we can use transfer learning (TL) (Xiao et al. 2022) to train DL models on a small subset of SNP variants and then predict the effects of all other variants; similarly, models trained on some genes can be used to make predictions on other genes. These variants include not only common alleles but also low-frequency and rare variants, irrespective of the magnitude of their effects. In addition, the environment underlies phenotypic variation, and more comprehensive inferences about the mechanisms of genotype–phenotype relationships can be made in DL models by introducing environmental influences or even predicting environmental changes. Moreover, epigenetic inheritance is an important factor influencing phenotype. DL has been successfully applied to the analysis of high-throughput epigenomic data to predict epigenetic effects (Talukder et al. 2021). It is worth considering whether genomic variations and epigenetic effects can be connected to explore their relationships with phenotype.

    Most current DL software for phenotyping, genotyping, GEBV prediction and other genetic analyses requires a specialized background in bioinformatics and quantitative genetics. This poses difficulties for general geneticists and biologists in executing these analyses. Hence, a user-friendly web server needs to be developed to fill the gap. In our previous study, we developed the Aquaculture Molecular Breeding Platform (AMBP) (Zeng et al. 2022) as the first web server dedicated to genetic data analysis in aquatic species with value to aquaculture (Fig. 8). AMBP incorporates haplotype reference panels for 18 aquaculture species, significantly enhancing the precision of genotype imputation. Moreover, AMBP offers a comprehensive set of tools to infer genetic structures, dissect the genetic architecture of performance traits, and estimate breeding values with existing state-of-the-art genomic prediction models, including a sparse DNN model. The web interface integrates all the tools seamlessly, enabling users to generate interpretable results and assess statistical appropriateness with ease. With the development of web server and database technologies, more intelligent DL-based GS systems should be possible in the future.

    Figure 8. Functions of aquaculture molecular breeding platform (AMBP) including imputation, population characterizing, and genetic breeding (Zeng et al. 2022)

    DL is going to play an increasingly important role in marine exploration. It provides a more convenient and efficient way for researchers to further explore aquatic organisms and environments. This article reviews the applications of DL in genomic selection, including individual detection and classification, genotype calling and imputation, and genomic prediction of aquatic animals. In the future, novel phenotyping (e.g., machine vision) and genomic technologies (e.g., NGS), coupled with advances in DL models and improved computational efficiency, will further facilitate the use of GS in aquaculture. In addition, the use of SVs, which could be more common than SNPs in the genome, will become more widespread as phenotyping and genomic technologies become more affordable and as DL-based approaches bring low-cost imputation methods to aquaculture species. Making sense of genomic variation by using DL models to extrapolate from association to causality will help us obtain new insights into genomic prediction. Using attention mechanism weights in DL models offers a new strategy to mitigate over-fitting when training DL models in GS. Clearly, it is essential to develop a user-friendly web server for DL-based GS analysis in aquaculture. We have reason to believe that DL will promote the rapid development of molecular breeding in aquaculture and provide support for the development of the aquaculture industry.

    The online version contains supplementary material available at https://doi.org/10.1007/s42995-024-00252-y.

    We acknowledge the grant support from the Key R & D Project of Shandong Province (2022ZLGX01), the STI2030-Major Projects (2023ZD0405506), the National Natural Science Foundation of China (32072976), Hainan Seed Industry Laboratory (B23H10002), and the Key Research and Development Project of Hainan Province (ZDYF2023XDNY182).

    YW and JH designed the study. PN analyzed the data. YW, MS, QZ, BW and ZB conducted the manuscript writing. All the authors have read and approved the final manuscript.

    Data availability is not applicable to this article as no new data were created or analyzed in this study.

    The authors declare that they have no conflict of interest.

    This article does not contain any studies with human participants or animals performed by any of the authors.

    Edited by Xin Yu.

    Special Topic: Fishery Science and Technology.

  • Allal F, Nguyen NH (2022) Genomic selection in aquaculture species. Methods Mol Biol 2467: 469–491 doi: 10.1007/978-1-0716-2205-6_17
    Álvarez-Ellacuría A, Palmer M, Catalán IA, Lisani JL (2020) Image-based, unsupervised estimation of fish size from commercial landings using deep learning. ICES J Mar Sci 77: 1330–1339 doi: 10.1093/icesjms/fsz216
    Aslam ML, Carraro R, Bestin A, Cariou S, Sonesson AK, Bruant JS, Haffray P, Bargelloni L, Meuwissen THE (2018) Genetics of resistance to photobacteriosis in gilthead sea bream (Sparus aurata) using 2b-RAD sequencing. BMC Genet 19: 43 doi: 10.1186/s12863-018-0631-x
    Aslam ML, Carraro R, Sonesson AK, Meuwissen T, Tsigenopoulos CS, Rigos G, Bargelloni L, Tzokas K (2020) Genetic variation, GWAS and accuracy of prediction for host resistance to Sparicotyle chrysophrii in farmed gilthead sea bream (Sparus aurata). Front Genet 11: 594770 doi: 10.3389/fgene.2020.594770
    Ayub R, Zhao Q, Meloy MJ, Sullivan EV, Pfefferbaum A, Adeli E, Pohl KM (2020) Inpainting cropped diffusion MRI using deep generative models. Predict Intell Med 12329: 91–100 doi: 10.1007/978-3-030-59354-4_9
    Balagourouchetty L, Pragatheeswaran JK, Pottakkat B, Ramkumar G (2020) GoogLeNet-based ensemble FCNet classifier for focal liver lesion diagnosis. IEEE J Biomed Health Inform 24: 1686–1694 doi: 10.1109/JBHI.2019.2942774
    Barabaschi D, Tondelli A, Desiderio F, Volante A, Vaccino P, Valè G, Cattivelli L (2016) Next generation breeding. Plant Sci 242: 3–13 doi: 10.1016/j.plantsci.2015.07.010
    Bargelloni L, Tassiello O, Babbucci M, Ferraresso S, Franch R, Montanucci L, Carnier P (2021) Data imputation and machine learning improve association analysis and genomic prediction for resistance to fish photobacteriosis in the gilthead sea bream. Aquac Rep 20: 100661 doi: 10.1016/j.aqrep.2021.100661
    Barría A, Peñaloza C, Papadopoulou A, Mahmuddin M, Doeschl-Wilson A, Benzie JAH, Houston RD, Wiener P (2023) Genetic differentiation following recent domestication events: a study of farmed Nile tilapia (Oreochromis niloticus) populations. Evol Appl 16: 1220–1235 doi: 10.1111/eva.13560
    Baur C, Denner S, Wiestler B, Albarqouni S, Navab N (2020) Autoencoders for unsupervised anomaly segmentation in brain MR images: a comparative study. Med Image Anal 69: 101952 doi: 10.1016/j.media.2020.101952
    Bellot P, de los Campos G, Pérez-Enciso M (2018) Can deep learning improve genomic prediction of complex human traits? Genetics 210: 809–819 doi: 10.1534/genetics.118.301298
    Bertolotti AC, Layer RM, Gundappa MK, Gallagher MD, Pehlivanoglu E, Nome T, Robledo D, Kent MP, Røsæg LL, Holen MM, Mulugeta TD, Ashton TJ, Hindar K, Sægrov H, Florø-Larsen B, Erkinaro J, Primmer CR, Bernatchez L, Martin SAM, Johnston IA et al (2020) The structural variation landscape in 492 Atlantic salmon genomes. Nat Commun 11: 5176 doi: 10.1038/s41467-020-18972-x
    Bhat JA, Ali S, Salgotra RK, Mir ZA, Dutta S, Jadon V, Tyagi A, Mushtaq M, Jain N, Singh PK, Singh GP, Prabhu KV (2016) Genomic selection in the era of next generation sequencing for complex traits in plant breeding. Front Genet 7: 221 doi: 10.3389/fgene.2016.00221
    Bolger DT, Morrison TA, Vance B, Lee D, Farid H (2012) A computer-assisted system for photographic mark–recapture analysis. Methods Ecol Evol 3: 813–822 doi: 10.1111/j.2041-210X.2012.00212.x
    Browning BL, Zhou Y, Browning SR (2018) A one-penny imputed genome from next-generation reference panels. Am J Hum Genet 103: 338–348 doi: 10.1016/j.ajhg.2018.07.015
    Cai L, Wu Y, Gao J (2019) DeepSV: accurate calling of genomic deletions from high-throughput sequencing data using deep convolutional neural network. BMC Bioinform 20: 665 doi: 10.1186/s12859-019-3299-y
    Chang CC, Ubina NA, Cheng SC, Lan HY, Chen KC, Huang CC (2022) A two-mode underwater smart sensor object for precision aquaculture based on AIoT technology. Sensors 22: 7603 doi: 10.3390/s22197603
    Chaudhari S, Mithal V, Polatkan G, Ramanath R (2021) An attentive survey of attention models. ACM Trans Intell Syst Technol 12: 1–32 doi: 10.1145/3465055
    Clark SA, van der Werf J (2013) Genomic best linear unbiased prediction (gBLUP) for the estimation of genomic breeding values. Methods Mol Biol 1019: 321–330 doi: 10.1007/978-1-62703-447-0_13
    Collard BC, Mackill DJ (2008) Marker-assisted selection: an approach for precision plant breeding in the twenty-first century. Philos Trans R Soc Lond B Biol Sci 363: 557–572 doi: 10.1098/rstb.2007.2170
    Cong R, Yang W, Zhang W, Li C, Guo CL, Huang Q, Kwong S (2023) PUGAN: physical model-guided underwater image enhancement using GAN with dual-discriminators. IEEE Trans Image Process 32: 4472–4485 doi: 10.1109/TIP.2023.3286263
    Correa K, Bangera R, Figueroa R, Lhorente JP, Yáñez JM (2017) The use of genomic information increases the accuracy of breeding value predictions for sea louse (Caligus rogercresseyi) resistance in Atlantic salmon (Salmo salar). Genet Sel Evol 49: 15 doi: 10.1186/s12711-017-0291-8
    Costanzo M, Kuzmin E, van Leeuwen J, Mair B, Moffat J, Boone C, Andrews B (2019) Global genetic networks and the genotype-to-phenotype relationship. Cell 177: 85–100 doi: 10.1016/j.cell.2019.01.033
    Cuevas J, Montesinos-López O, Juliana P, Guzmán C, Pérez-Rodríguez P, González-Bucio J, Burgueño J, Montesinos-López A, Crossa J (2019) Deep kernel for genomic and near infrared predictions in multi-environment breeding trials. G3 9: 2913–2924 doi: 10.1534/g3.119.400493
    Currie G, Hawk KE, Rohren E, Vial A, Klein R (2019) Machine learning and deep learning in medical imaging: intelligent imaging. J Med Imaging Radiat Sci 50: 477–487 doi: 10.1016/j.jmir.2019.09.005
    Dai P, Kong J, Liu J, Lu X, Sui J, Meng X, Luan S (2020) Evaluation of the utility of genomic information to improve genetic evaluation of feed efficiency traits of the Pacific white shrimp Litopenaeus vannamei. Aquaculture 527: 735421 doi: 10.1016/j.aquaculture.2020.735421
    Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, Vrieze SI, Chew EY, Levy S, McGue M, Schlessinger D, Stambolian D, Loh PR, Iacono WG, Swaroop A, Scott LJ, Cucca F, Kronenberg F, Boehnke M, Abecasis GR et al (2016) Next-generation genotype imputation service and methods. Nat Genet 48: 1284–1287 doi: 10.1038/ng.3656
    de Almeida BP, Reiter F, Pagani M, Stark A (2022) DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat Genet 54: 613–624 doi: 10.1038/s41588-022-01048-5
    de los Ríos-Pérez L, Brunner RM, Hadlich F, Rebl A, Kühn C, Wittenburg D, Goldammer T, Verleih M (2020) Comparative analysis of the transcriptome and distribution of putative SNPs in two rainbow trout (Oncorhynchus mykiss) breeding strains by using next-generation sequencing. Genes 11: 841 doi: 10.3390/genes11080841
    de Moraes BFX, dos Santos RF, de Lima BM, Aguiar AM, Missiaggia AA, da Costa DD, Rezende GDPS, Gonçalves FMA, Acosta JJ, Kirst M, Resende MFR, Muñoz PR (2018) Genomic selection prediction models comparing sequence capture and SNP array genotyping methods. Mol Breed 38: 115 doi: 10.1007/s11032-018-0865-3
    Defeo O, Castrejon M, Ortega L, Kuhn AM, Gutierrez NL, Castilla JC (2013) Impacts of climate variability on Latin American small-scale fisheries. Ecol Soc 18: 30 doi: 10.5751/ES-05971-180430
    Delomas TA, Hollenbeck CM, Matt JL, Thompson NF (2023) Evaluating cost-effective genotyping strategies for genomic selection in oysters. Aquaculture 562: 738844 doi: 10.1016/j.aquaculture.2022.738844
    Derry A, Krzywinski M, Altman N (2023) Convolutional neural networks. Nat Methods 20: 1269–1270 doi: 10.1038/s41592-023-01973-1
    DeWeerdt S (2020) Can aquaculture overcome its sustainability challenges? Nature 588: S60–S62 doi: 10.1038/d41586-020-03446-3
    Dias R, Evans D, Chen SF, Chen KY, Loguercio S, Chan L, Torkamani A (2022) Rapid, reference-free human genotype imputation with denoising autoencoders. Elife 11: e75600 doi: 10.7554/eLife.75600
    Dias R, Torkamani A (2019) Artificial intelligence in clinical and genomic diagnostics. Genome Med 11: 70 doi: 10.1186/s13073-019-0689-8
    Dong L, Xiao S, Wang Q, Wang Z (2016) Comparative analysis of the GBLUP, emBayesB, and GWAS algorithms to predict genetic values in large yellow croaker (Larimichthys crocea). BMC Genom 17: 460 doi: 10.1186/s12864-016-2756-5
    Dou J, Li X, Fu Q, Jiao W, Li Y, Li T, Wang Y, Hu X, Wang S, Bao Z (2016) Evaluation of the 2b-RAD method for genomic selection in scallop breeding. Sci Rep 6: 19244 doi: 10.1038/srep19244
    Du K, Stöck M, Kneitz S, Klopp C, Woltering JM, Adolfi MC, Feron R, Prokopov D, Makunin A, Kichigin I, Schmidt C, Fischer P, Kuhl H, Wuertz S, Gessner J, Kloas W, Cabau C, Iampietro C, Parrinello H, Tomlinson C et al (2020) The sterlet sturgeon genome sequence and the mechanisms of segmental rediploidization. Nat Ecol Evol 4: 841–852 doi: 10.1038/s41559-020-1166-x
    Ehret A, Hochstuhl D, Gianola D, Thaller G (2015) Application of neural networks with back-propagation to genome-enabled prediction of complex traits in Holstein-Friesian and German Fleckvieh cattle. Genet Sel Evol 47: 22 doi: 10.1186/s12711-015-0097-5
    Falconer E, Hills M, Naumann U, Poon SS, Chavez EA, Sanders AD, Zhao Y, Hirst M, Lansdorp PM (2012) DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution. Nat Methods 9: 1107–1112 doi: 10.1038/nmeth.2206
    Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96: 1348–1360 doi: 10.1198/016214501753382273
    Fearnbach H, Durban J, Parsons K, Claridge D (2012) Photographic mark-recapture analysis of local dynamics within an open population of dolphins. Ecol Appl 22: 1689–1700 doi: 10.1890/12-0021.1
    Forcada J, Aguilar A (2000) Use of photographic identification in capture-recapture studies of Mediterranean monk seals. Mar Mamm Sci 16: 767–793 doi: 10.1111/j.1748-7692.2000.tb00971.x
    Gao Y, Wang Q, Liu Y, Ma Y, Jin H, Liu J, Wang H, Yan Y, Li J (2023) Epidemiology of turbot bacterial diseases in China between October 2016 and December 2019. Front Mar Sci 10: 1145083 doi: 10.3389/fmars.2023.1145083
    Gianola D, Okut H, Weigel KA, Rosa GJM (2011) Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat. BMC Genet 12: 87
    Gianola D, van Kaam JBCHM (2008) Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics 178: 2289–2303 doi: 10.1534/genetics.107.084285
    Gianola D, Wu XL, Manfredi E, Simianer H (2010) A non-parametric mixture model for genome-enabled prediction of genetic value for a quantitative trait. Genetica 138: 959–977 doi: 10.1007/s10709-010-9478-4
    Griot R, Allal F, Phocas F, Brard-Fudulea S, Morvezen R, Haffray P, François Y, Morin T, Bestin A, Bruant JS, Cariou S, Peyrou B, Brunier J, Vandeputte M (2021) Optimization of genomic selection to improve disease resistance in two marine fishes, the European sea bass (Dicentrarchus labrax) and the gilthead sea bream (Sparus aurata). Front Genet 12: 665920 doi: 10.3389/fgene.2021.665920
    Gundappa MK, Robledo D, Hamilton A, Houston R, Prendergast J, Macqueen D (2023) High performance imputation of structural and single nucleotide variants in Atlantic salmon using low-coverage whole genome sequencing. bioRxiv. https://doi.org/10.1101/2023.03.05.531147
    Guo Y, Lu M, Zuo W, Zhang C, Chen Y (2021) Deep likelihood network for image restoration with multiple degradation levels. IEEE Trans Image Process 30: 2669–2681 doi: 10.1109/TIP.2021.3051767
    Gutierrez AP, Matika O, Bean TP, Houston RD (2018) Genomic selection for growth traits in Pacific oyster (Crassostrea gigas): potential of low-density marker panels for breeding value prediction. Front Genet 9: 391 doi: 10.3389/fgene.2018.00391
    Gutierrez AP, Symonds J, King N, Steiner K, Bean TP, Houston RD (2020) Potential of genomic selection for improvement of resistance to ostreid herpesvirus in Pacific oyster (Crassostrea gigas). Anim Genet 51: 249–257 doi: 10.1111/age.12909
    Guzelbulut C, Suzuki K, Shimono S (2022) Singular value decomposition-based gait characterization. Heliyon 8: e12006 doi: 10.1016/j.heliyon.2022.e12006
    Hayes B (2013) Overview of statistical methods for genome-wide association studies (GWAS). Methods Mol Biol 1019: 149–169 doi: 10.1007/978-1-62703-447-0_6
    He F, Liu T, Tao D (2020) Why ResNet works? Residuals generalize. IEEE Trans Neural Netw Learn Syst 31: 5349–5362 doi: 10.1109/TNNLS.2020.2966319
    Hillen JEJ, Coscia I, Vandeputte M, Herten K, Hellemans B, Maroso F, Vergnet A, Allal F, Maes GE, Volckaert FAM (2017) Estimates of genetic variability and inbreeding in experimentally selected populations of European sea bass. Aquaculture 479: 742–749 doi: 10.1016/j.aquaculture.2017.07.012
    Ho SS, Urban AE, Mills RE (2020) Structural variation in the sequencing era. Nat Rev Genet 21: 171–189 doi: 10.1038/s41576-019-0180-9
    Honari S, Constantin V, Rhodin H, Salzmann M, Fua P (2023) Temporal representation learning on monocular videos for 3D human pose estimation. IEEE Trans Pattern Anal Mach Intell 45: 6415–6427 doi: 10.1109/TPAMI.2022.3215307
    Horn SS, Meuwissen THE, Moghadam H, Hillestad B, Sonesson AK (2020) Accuracy of selection for omega-3 fatty acid content in Atlantic salmon fillets. Aquaculture 519: 734767 doi: 10.1016/j.aquaculture.2019.734767
    Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2: 359–366 doi: 10.1016/0893-6080(89)90020-8
    Hu T, Chitnis N, Monos D, Dinh A (2021) Next-generation sequencing technologies: an overview. Hum Immunol 82: 801–811 doi: 10.1016/j.humimm.2021.02.012
    Ilnicka A, Schneider G (2023) Designing molecules with autoencoder networks. Nat Comput Sci 3: 922–933 doi: 10.1038/s43588-023-00548-6
    Jepsen N, Thorstad EB, Havn T, Lucas MC (2015) The use of external electronic tags on fish: an evaluation of tag retention and tagging effects. Anim Biotelem 3: 49 doi: 10.1186/s40317-015-0086-z
    Jiang H, Li S, Dong X, Soowannayan C (2023) Editorial: the transmission and prevention of infectious diseases in aquatic animals. Front Mar Sci 10: 1259722 doi: 10.3389/fmars.2023.1259722
    Joshi R, Almeida DB, da Costa AR, Skaarud A, de Pádua PU, Knutsen TM, Moen T, Alvarez AT (2021a) Genomic selection for resistance to Francisellosis in commercial Nile tilapia population: genetic and genomic parameters, correlation with growth rate and predictive ability. Aquaculture 537: 736515 doi: 10.1016/j.aquaculture.2021.736515
    Joshi R, Skaarud A, Alvarez AT, Moen T, Ødegård J (2021b) Bayesian genomic models boost prediction accuracy for survival to Streptococcus agalactiae infection in Nile tilapia (Oreochromus nilioticus). Genet Sel Evol 53: 37 doi: 10.1186/s12711-021-00629-y
    Joshi R, Skaarud A, Tola Alvarez A (2021c) Experimental validation of genetic selection for resistance against Streptococcus agalactiae via different routes of infection in the commercial Nile tilapia breeding programme. J Anim Breed Genet 138: 338–348
    Kamilaris A, Prenafeta-Boldú FX (2018) A review of the use of convolutional neural networks in agriculture. J Agric Sci 156: 312–322 doi: 10.1017/S0021859618000436
    Ke Q, Wang J, Bai Y, Zhao J, Gong J, Deng Y, Qu A, Suo N, Chen J, Zhou T, Xu P (2022) GWAS and genomic prediction revealed potential for genetic improvement of large yellow croaker adapting to high plant protein diet. Aquaculture 553: 738090 doi: 10.1016/j.aquaculture.2022.738090
    Kelley DR, Snoek J, Rinn JL (2016) Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res 26: 990–999 doi: 10.1101/gr.200535.115
    Khan A, Khan A, Ullah M, Alam MM, Bangash JI, Suud MM (2022) A computational classification method of breast cancer images using the VGGNet model. Front Comput Neurosci 16: 1001803 doi: 10.3389/fncom.2022.1001803
    Kitzman JO, Mackenzie AP, Adey A, Hiatt JB, Patwardhan RP, Sudmant PH, Ng SB, Alkan C, Qiu R, Eichler EE, Shendure J (2011) Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nat Biotechnol 29: 59–63 doi: 10.1038/nbt.1740
    Kjetså MH, Ødegård J, Meuwissen THE (2020) Accuracy of genomic prediction of host resistance to salmon lice in Atlantic salmon (Salmo salar) using imputed high-density genotypes. Aquaculture 526: 735415 doi: 10.1016/j.aquaculture.2020.735415
    Kosugi S, Momozawa Y, Liu X, Terao C, Kubo M, Kamatani Y (2019) Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing. Genome Biol 20: 117 doi: 10.1186/s13059-019-1720-5
    Kriegeskorte N, Golan T (2019) Neural network models and deep learning. Curr Biol 29: R231–R236 doi: 10.1016/j.cub.2019.02.034
    Langtimm CA, Beck CA, Edwards HH, Fick-Child KJ, Ackerman BB, Barton SL, Hartley WC (2004) Survival estimates for Florida manatees from the photo-identification of individuals. Mar Mamm Sci 20: 438–463 doi: 10.1111/j.1748-7692.2004.tb01171.x
    LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521: 436–444 doi: 10.1038/nature14539
    Lee H, Schatz MC (2012) Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score. Bioinformatics 28: 2097–2105 doi: 10.1093/bioinformatics/bts330
    Lehner B (2013) Genotype to phenotype: lessons from model organisms for human genetics. Nat Rev Genet 14: 168–178 doi: 10.1038/nrg3404
    Li J, Xu C, Jiang L, Xiao Y, Deng L, Han Z (2020) Detection and analysis of behavior trajectory for sea cucumbers based on deep learning. IEEE Access 8: 18832–18840 doi: 10.1109/ACCESS.2019.2962823
    Li Y, Willer C, Sanna S, Abecasis G (2009) Genotype imputation. Annu Rev Genom Hum Genet 10: 387–406 doi: 10.1146/annurev.genom.9.081307.164242
    Liao YH, Zhou CW, Liu WZ, Jin JY, Li DY, Liu F, Fan DD, Zou Y, Mu ZB, Shen J, Liu CN, Xiao SJ, Yuan XH, Liu HP (2021) 3DPhenoFish: application for two- and three-dimensional fish morphological phenotype extraction from point cloud analysis. Zool Res 42: 492–501 doi: 10.24272/j.issn.2095-8137.2021.141
    Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, Sandstrom R, Bernstein B, Bender MA, Groudine M, Gnirke A, Stamatoyannopoulos J, Mirny LA, Lander ES, Dekker J (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326: 289–293 doi: 10.1126/science.1181369
    Liphardt J (2017) DeepEvolve: rapid hyperparameter discovery for neural nets using genetic algorithms. Available at: https://github.com/jliphard/DeepEvolve/. Accessed: January 2018
    Liu G, Dong L, Gu L, Han Z, Zhang W, Fang M, Wang Z (2019) Evaluation of genomic selection for seven economic traits in yellow drum (Nibea albiflora). Mar Biotechnol 21: 806–812 doi: 10.1007/s10126-019-09925-7
    Liu T, Luo C, Ma J, Wang Y, Shu D, Su G, Qu H (2020) High-throughput sequencing with the preselection of markers is a good alternative to SNP chips for genomic prediction in broilers. Front Genet 11: 108 doi: 10.3389/fgene.2020.00108
    Liu Z (2020) Soft-shell shrimp recognition based on an improved AlexNet for quality evaluations. J Food Eng 266: 109698 doi: 10.1016/j.jfoodeng.2019.109698
    Long N, Gianola D, Rosa GJ, Weigel KA, Kranis A, González-Recio O (2010) Radial basis function regression methods for predicting quantitative traits using SNP markers. Genet Res 92: 209–225 doi: 10.1017/S0016672310000157
    Luo R, Sedlazeck FJ, Lam T-W, Schatz MC (2019) A multi-task convolutional deep neural network for variant calling in single molecule sequencing. Nat Commun 10: 998 doi: 10.1038/s41467-019-09025-z
    Luo Z, Yu Y, Bao Z, Li F (2024) Evaluation of machine learning method in genomic selection for growth traits of Pacific white shrimp. Aquaculture 581: 740376 doi: 10.1016/j.aquaculture.2023.740376
    Luo Z, Yu Y, Bao Z, Xiang J, Li F (2022) Evaluation of genomic selection for high salinity tolerance traits in Pacific white shrimp Litopenaeus vannamei. Aquaculture 557: 738320 doi: 10.1016/j.aquaculture.2022.738320
    Lv J, Wang Y, Ni P, Lin P, Hou H, Ding J, Chang Y, Hu J, Wang S, Bao Z (2022) Development of a high-throughput SNP array for sea cucumber (Apostichopus japonicus) and its application in genomic selection with MCP regularized deep neural networks. Genomics 114: 110426 doi: 10.1016/j.ygeno.2022.110426
    Meuwissen TH, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829 doi: 10.1093/genetics/157.4.1819
    Monkman GG, Hyder K, Kaiser MJ, Vidal FP (2019) Using machine vision to estimate fish length from images using regional convolutional neural networks. Methods Ecol Evol 10: 2045–2056 doi: 10.1111/2041-210X.13282
    Montesinos-López OA, Martín-Vallejo J, Crossa J, Gianola D, Hernández-Suárez CM, Montesinos-López A, Juliana P, Singh R (2019a) A benchmarking between deep learning, support vector machine and Bayesian threshold best linear unbiased prediction for predicting ordinal traits in plant breeding. G3 9: 601–618 doi: 10.1534/g3.118.200998
    Montesinos-López OA, Martín-Vallejo J, Crossa J, Gianola D, Hernández-Suárez CM, Montesinos-López A, Juliana P, Singh R (2019b) New deep learning genomic-based prediction model for multiple traits with binary, ordinal, and continuous phenotypes. G3 9: 1545–1556 doi: 10.1534/g3.119.300585
    Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. In Proceedings of the European conference on computer vision, Springer, pp 483–499
    Nguyen NH, Premachandra HKA, Kilian A, Knibb W (2018) Genomic prediction using DArT-Seq technology for yellowtail kingfish Seriola lalandi. BMC Genom 19: 107 doi: 10.1186/s12864-018-4493-4
    Nielsen R, Paul JS, Albrechtsen A, Song YS (2011) Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet 12: 443–451 doi: 10.1038/nrg2986
    Novakovsky G, Dexter N, Libbrecht MW, Wasserman WW, Mostafavi S (2023) Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat Rev Genet 24: 125–137 doi: 10.1038/s41576-022-00532-2
    Okut H, Wu X-L, Rosa GJM, Bauck S, Woodward BW, Schnabel RD, Taylor JF, Gianola D (2013) Predicting expected progeny difference for marbling score in Angus cattle using artificial neural networks and Bayesian regression models. Genet Sel Evol 45: 34 doi: 10.1186/1297-9686-45-34
    Ortega L, Celentano E, Delgado E, Defeo O (2016) Climate change influences on abundance, individual size and body abnormalities in a sandy beach clam. Mar Ecol Prog Ser 545: 203–213 doi: 10.3354/meps11643
    Palaiokostas C (2021) Predicting for disease resistance in aquaculture species using machine learning models. Aquac Rep 20: 100660 doi: 10.1016/j.aqrep.2021.100660
    Palaiokostas C, Ferraresso S, Franch R, Houston RD, Bargelloni L (2016) Genomic prediction of resistance to pasteurellosis in gilthead sea bream (Sparus aurata) using 2b-RAD sequencing. G3 6: 3693–3700 doi: 10.1534/g3.116.035220
    Palaiokostas C, Kocour M, Prchal M, Houston RD (2018) Accuracy of genomic evaluations of juvenile growth rate in common carp (Cyprinus carpio) using genotyping by sequencing. Front Genet 9: 82 doi: 10.3389/fgene.2018.00082
    Palaiokostas C, Vesely T, Kocour M, Prchal M, Pokorova D, Piackova V, Pojezdal L, Houston RD (2019) Optimizing genomic prediction of host resistance to Koi Herpesvirus disease in carp. Front Genet 10: 543 doi: 10.3389/fgene.2019.00543
    Peñaloza C, Manousaki T, Franch R, Tsakogiannis A, Sonesson AK, Aslam ML, Allal F, Bargelloni L, Houston RD, Tsigenopoulos CS (2021) Development and testing of a combined species SNP array for the European seabass (Dicentrarchus labrax) and gilthead seabream (Sparus aurata). Genomics 113: 2096–2107 doi: 10.1016/j.ygeno.2021.04.038
    Pérez-Rodríguez P, Gianola D, Weigel KA, Rosa GJ, Crossa J (2013) Technical note: an R package for fitting Bayesian regularized neural networks with applications in animal breeding. J Anim Sci 91: 3522–3531 doi: 10.2527/jas.2012-6162
    Piepho HP (2009) Ridge regression and extensions for genomewide selection in maize. Crop Sci 49: 1165–1176 doi: 10.2135/cropsci2008.10.0595
    Popic V, Rohlicek C, Cunial F, Hajirasouliha I, Meleshko D, Garimella K, Maheshwari A (2023) Cue: a deep-learning framework for structural variant discovery and genotyping. Nat Methods 20: 559–568 doi: 10.1038/s41592-023-01799-x
    Poplin R, Chang P-C, Alexander D, Schwartz S, Colthurst T, Ku A, Newburger D, Dijamco J, Nguyen N, Afshar PT, Gross SS, Dorfman L, McLean CY, DePristo MA (2018) A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 36: 983–987 doi: 10.1038/nbt.4235
    Read JCA (2021) Binocular vision and stereopsis across the animal kingdom. Annu Rev Vis Sci 7: 389–415 doi: 10.1146/annurev-vision-093019-113212
    Rubinacci S, Ribeiro DM, Hofmeister RJ, Delaneau O (2021) Efficient phasing and imputation of low-coverage sequencing data using large reference panels. Nat Genet 53: 120–126 doi: 10.1038/s41588-020-00756-0
    Sargolzaei M, Chesnais JP, Schenkel FS (2014) A new approach for efficient genotype imputation using information from relatives. BMC Genom 15: 478 doi: 10.1186/1471-2164-15-478
    Sato M, Hosoya S, Yoshikawa S, Ohki S, Kobayashi Y, Itou T, Kikuchi K (2019) A highly flexible and repeatable genotyping method for aquaculture studies based on target amplicon sequencing using next-generation sequencing technology. Sci Rep 9: 6904 doi: 10.1038/s41598-019-43336-x
    Segebarth D, Griebel M, Stein N, von Collenberg CR, Martin C, Fiedler D, Comeras LB, Sah A, Schoeffler V, Lüffe T, Dürr A, Gupta R, Sasi M, Lillesaar C, Lange MD, Tasan RO, Singewald N, Pape HC, Flath CM, Blum R (2020) On the objectivity, reliability, and validity of deep learning enabled bioimage analyses. Elife 9: e59780 doi: 10.7554/eLife.59780
    Shi S, Qian Q, Yu S, Wang Q, Wang J, Zeng J, Du Z, Xiao J (2021) RefRGim: an intelligent reference panel reconstruction method for genotype imputation with convolutional neural networks. Brief Bioinform 22: bbab326 doi: 10.1093/bib/bbab326
    Shi S, Yuan N, Yang M, Du Z, Wang J, Sheng X, Wu J, Xiao J (2018) Comprehensive assessment of genotype imputation performance. Hum Hered 83: 107–116 doi: 10.1159/000489758
    Slatko BE, Gardner AF, Ausubel FM (2018) Overview of next-generation sequencing technologies. Curr Protoc Mol Biol 122: e59 doi: 10.1002/cpmb.59
    Song H, Dong T, Yan X, Wang W, Tian Z, Hu H (2023) Using Bayesian threshold model and machine learning method to improve the accuracy of genomic prediction for ordered categorical traits in fish. Agric Commun 1: 100005
    Song H, Dong T, Yan X, Wang W, Tian Z, Sun A, Dong Y, Zhu H, Hu H (2022) Genomic selection and its research progress in aquaculture breeding. Rev Aquac 15: 274–291 doi: 10.1111/raq.12716
    Soon TK, Ransangan J (2019) Extrinsic factors and marine bivalve mass mortalities: an overview. J Shellfish Res 38: 223–232 doi: 10.2983/035.038.0202
    Stokowski M, Sobiegraj W, Kulczykowska E (2023) Potential role of climate change on the spread of salmonid skin condition: the biogeochemical hypothesis on ulcerative dermal necrosis on the Słupia River - Poland. Front Mar Sci 10: 1104436 doi: 10.3389/fmars.2023.1104436
    Subramanian I, Verma S, Kumar S, Jere A, Anamika K (2020) Multi-omics data integration, interpretation, and its application. Bioinform Biol Insights 14: 1177932219899051 doi: 10.1177/1177932219899051
    Sun X, Shi J, Liu L, Dong J, Plant C, Wang X, Zhou H (2018) Transferring deep knowledge for object recognition in low-quality underwater videos. Neurocomputing 275: 897–908 doi: 10.1016/j.neucom.2017.09.044
    Talukder A, Barham C, Li X, Hu H (2021) Interpretation of deep learning in genomics and epigenomics. Brief Bioinform 22: bbaa177 doi: 10.1093/bib/bbaa177
    Tan K, Zheng H (2020) Ocean acidification and adaptive bivalve farming. Sci Total Environ 701: 134794 doi: 10.1016/j.scitotenv.2019.134794
    Tan KS, Ransangan J (2015) Factors influencing the toxicity, detoxification and biotransformation of paralytic shellfish toxins. Rev Environ Contam Toxicol 235: 1–25
    Tarassenko L (1995) Neural networks. Lancet 346: 1712 doi: 10.1016/S0140-6736(95)92880-4
    Tsairidou S, Hamilton A, Robledo D, Bron JE, Houston RD (2020) Optimizing low-cost genotyping and imputation strategies for genomic selection in Atlantic salmon. G3 10: 581–590 doi: 10.1534/g3.119.400800
    Vallejo RL, Leeds TD, Fragomeni BO, Gao G, Hernandez AG, Misztal I, Welch TJ, Wiens GD, Palti Y (2016) Evaluation of genome-enabled selection for bacterial cold water disease resistance using progeny performance data in rainbow trout: insights on genotyping methods and genomic prediction models. Front Genet 7: 96 doi: 10.3389/fgene.2016.00096
    Vallejo RL, Leeds TD, Gao G, Parsons JE, Martin KE, Evenhuis JP, Fragomeni BO, Wiens GD, Palti Y (2017) Genomic selection models double the accuracy of predicted breeding values for bacterial cold water disease resistance compared to a traditional pedigree-based model in rainbow trout aquaculture. Genet Sel Evol 49: 17 doi: 10.1186/s12711-017-0293-6
    VanRaden PM (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91: 4414–4423 doi: 10.3168/jds.2007-0980
    Wallace JG, Rodgers-Melnick E, Buckler ES (2018) On the road to breeding 4.0: unraveling the good, the bad, and the boring of crop quantitative genomics. Annu Rev Genet 52: 421–444 doi: 10.1146/annurev-genet-120116-024846
    Wan L, Chen Y, Li H, Li C (2020) Rolling-element bearing fault diagnosis using improved LeNet-5 network. Sensors 20: 1693 doi: 10.3390/s20061693
    Wang H, Teng M, Liu P, Zhao M, Wang S, Hu J, Bao Z, Zeng Q (2022a) Selection signatures of Pacific white shrimp Litopenaeus vannamei revealed by whole-genome resequencing analysis. Front Mar Sci 9: 844597 doi: 10.3389/fmars.2022.844597
    Wang J, Chakraborty R, Yu SX (2022b) Transformer for 3D point clouds. IEEE Trans Pattern Anal Mach Intell 44: 4419–4431
    Wang K, Yang B, Li Q, Liu S (2022c) Systematic evaluation of genomic prediction algorithms for genomic prediction and breeding of aquatic animals. Genes 13: 2247
    Wang Q, Yu Y, Li F, Zhang X, Xiang J (2017a) Predictive ability of genomic selection models for breeding value estimation on growth traits of Pacific white shrimp Litopenaeus vannamei. Chin J Oceanol Limnol 35: 1221–1229 doi: 10.1007/s00343-017-6038-0
    Wang Q, Yu Y, Yuan J, Zhang X, Huang H, Li F, Xiang J (2017b) Effects of marker density and population structure on the genomic prediction accuracy for growth trait in Pacific white shrimp Litopenaeus vannamei. BMC Genet 18: 45 doi: 10.1186/s12863-017-0507-5
    Wang Q, Yu Y, Zhang Q, Zhang X, Huang H, Xiang J, Li F (2019) Evaluation on the genomic selection in Litopenaeus vannamei for the resistance against Vibrio parahaemolyticus. Aquaculture 505: 212–216 doi: 10.1016/j.aquaculture.2019.02.055
    Wang Y, Mi X, Rosa GJM, Chen Z, Lin P, Wang S, Bao Z (2018a) Technical note: an R package for fitting sparse neural networks with application in animal breeding. J Anim Sci 96: 2016–2026 doi: 10.1093/jas/sky071
    Wang Y, Ren Q, Zhao L, Li M, Kong X, Xu Y, Hu X, Hu J, Bao Z (2022d) Estimating genetic parameters of muscle imaging trait with 2b-RAD SNP markers in Zhikong scallop (Chlamys farreri). Aquaculture 549: 737715
    Wang Y, Sun G, Zeng Q, Chen Z, Hu X, Li H, Wang S, Bao Z (2018b) Predicting growth traits with genomic selection methods in Zhikong scallop (Chlamys farreri). Mar Biotechnol 20: 769–779 doi: 10.1007/s10126-018-9847-z
    Wang Y, Xin C, Zhu B, Wang M, Wang T, Ni P, Song S, Liu M, Wang B, Bao Z, Hu J (2022e) A new non-invasive tagging method for leopard coral grouper (Plectropomus leopardus) using deep convolutional neural networks with PDE-based image decomposition. Front Mar Sci 9: 1093623
    Watanabe K, Stringer S, Frei O, Umićević Mirkov M, de Leeuw C, Polderman TJC, van der Sluis S, Andreassen OA, Neale BM, Posthuma D (2019) A global overview of pleiotropy and genetic architecture in complex traits. Nat Genet 51: 1339–1348 doi: 10.1038/s41588-019-0481-0
    Wolc A, Dekkers JCM (2022) Application of Bayesian genomic prediction methods to genome-wide association analyses. Genet Sel Evol 54: 31 doi: 10.1186/s12711-022-00724-8
    Xiao Y, Liang F, Liu B (2022) A transfer learning-based multi-instance learning method with weak labels. IEEE Trans Cybern 52: 287–300 doi: 10.1109/TCYB.2020.2973450
    Xing Q, Wei T, Chen Z, Wang Y, Lu Y, Wang S, Zhang L, Bao Z (2017) Using a multiscale image processing method to characterize the periodic growth patterns on scallop shells. Ecol Evol 7: 1616–1626 doi: 10.1002/ece3.2789
    Xu L, Cheng W, Guo K, Han L, Liu Y, Fang L (2021) FlyFusion: realtime dynamic scene reconstruction using a flying depth camera. IEEE Trans Vis Comput Graph 27: 68–82 doi: 10.1109/TVCG.2019.2930691
    Yang W, Feng H, Zhang X, Zhang J, Doonan JH, Batchelor WD, Xiong L, Yan J (2020) Crop phenomics and high-throughput phenotyping: past decades, current challenges, and future perspectives. Mol Plant 13: 187–214 doi: 10.1016/j.molp.2020.01.008
    Yoshida GM, Bangera R, Carvalheiro R, Correa K, Figueroa R, Lhorente JP, Yáñez JM (2018) Genomic prediction accuracy for resistance against Piscirickettsia salmonis in farmed rainbow trout. G3 8: 719–726 doi: 10.1534/g3.117.300499
    Yoshida GM, Lhorente JP, Correa K, Soto J, Salas D, Yáñez JM (2019) Genome-wide association study and cost-efficient genomic predictions for growth and fillet yield in Nile tilapia (Oreochromis niloticus). G3 9: 2597–2607 doi: 10.1534/g3.119.400116
    Yuan X, Wang Q, Yan B, Zhang J, Xue C, Chen J, Lin Y, Zhang X, Shen W, Chen X (2021) Single-molecule real-time and Illumina-based RNA sequencing data identified vernalization-responsive candidate genes in faba bean (Vicia faba L.). Front Genet 12: 656137 doi: 10.3389/fgene.2021.656137
    Zannella C, Mosca F, Mariani F, Franci G, Folliero V, Galdiero M, Tiscar PG, Galdiero M (2017) Microbial diseases of bivalve mollusks: infections, immunology and antimicrobial defense. Mar Drugs 15: 182 doi: 10.3390/md15060182
    Zeng J, Feng M, Deng Y, Jiang P, Bai Y, Wang J, Qu A, Liu W, Jiang Z, He Q, Wang Z, Xu P (2024) Deep learning to obtain high-throughput morphological phenotypes and its genetic correlation with swimming performance in juvenile large yellow croaker. Aquaculture 578: 740051 doi: 10.1016/j.aquaculture.2023.740051
    Zeng Q, Zhao B, Wang H, Wang M, Teng M, Hu J, Bao Z, Wang Y (2022) Aquaculture Molecular Breeding Platform (AMBP): a comprehensive web server for genotype imputation and genetic analysis in aquaculture. Nucleic Acids Res 50: W66–W74 doi: 10.1093/nar/gkac424
    Zenger KR, Khatkar MS, Jones DB, Khalilisamani N, Jerry DR, Raadsma HW (2019) Genomic selection in aquaculture: application, limitations and opportunities with special reference to marine shrimp and pearl oysters. Front Genet 9: 693 doi: 10.3389/fgene.2018.00693
    Zhang C-H (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38: 894–942 doi: 10.1214/09-AOS729
    Zhang W, Li W, Liu G, Gu L, Ye K, Zhang Y, Li W, Jiang D, Wang Z, Fang M (2021) Evaluation for the effect of low-coverage sequencing on genomic selection in large yellow croaker. Aquaculture 534: 736323 doi: 10.1016/j.aquaculture.2020.736323
    Zhao J, Li Y, Zhang F, Zhu S, Liu Y, Lu H, Ye Z (2018) Semi-supervised learning-based live fish identification in aquaculture using modified deep convolutional generative adversarial networks. T ASABE 61: 699–710 doi: 10.13031/trans.12684
    Zhou J, Troyanskaya OG (2015) Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 12: 931–934 doi: 10.1038/nmeth.3547
    Zhou M, Yuan Y, Zhang Y, Zhang W, Zhou R, Ji J, Wu H, Zhao Y, Zhang D, Liu B, Jiang D, Wang Z, Fang M (2023) The study of the genomic selection of white gill disease resistance in large yellow croaker (Larimichthys crocea). Aquaculture 574: 739682 doi: 10.1016/j.aquaculture.2023.739682
    Zhu X, Ni P, Xing Q, Wang Y, Huang X, Hu X, Hu J, Wu X, Bao Z (2021) Genomic prediction of growth traits in scallops using convolutional neural networks. Aquaculture 545: 737171 doi: 10.1016/j.aquaculture.2021.737171