Navegando por Autor "Sousa, Ithalo Coelho de"

Agora exibindo 1 - 3 de 3

Computational intelligence and statistical learning applied to Coffea canephora
(Universidade Federal de Viçosa, 2022-05-02) Sousa, Ithalo Coelho de; Nascimento, Moysés; Sant’anna, Isabela de Castro; Cruz, Cosme Damião; Azevedo, Camila Ferreira; Nascimento, Ana Carolina Campana
Genomic prediction in Coffee breeding has shown good potential in predictive ability (PA), genetic gains and reduction of the selection cycle time. Many methodologies are used to predict the genetic merit, but some of them require priori assumptions that may increase the complexity of the model. Artificial neural network (ANN) has advantage to not require priori assumptions about the relationships between inputs and the output allowing great flexibility to handle different types of complex non-additive effects, such as dominance and epistasis. Despite this advantage, the biological interpretability of ANNs is still limited. In the elaboration of this research project, two basic questions were formulated. The first question, is it possible to estimate genetic parameters using ANNs? The second, is it possible to reduce the panel marker size with no penalty in predictive ability? For this, the analyzes were divided into two articles. In the first article, the aim was to estimate the heritability and markers effects for two traits in Coffea canephora using an additive-dominance architecture ANN and to compare it with genomic best linear unbiased prediction (GBLUP). In the second article, the aim was to evaluate the trade-off between density marker panels size and the PA for eight agronomic traits in Coffea canephora using machine learning (bagging and random forest) algorithms and comparing them with BLASSO (Bayesian Least Absolute Shrinkage and Selection Operator) method. For both article, the data set consisted of 165 genotypes of Coffea canephora genotyped for 14,387 snp markers, after quality control analysis. For the first article the phenotypic data used was rust (Rus) and yield (Y). For the second article the phenotypic data is composed by vegetative vigor (Vig), rust (Rus) and cercosporiose incidence (Cer), fruit maturation time (Mat), fruit size (FS), plant height (PH), diameter of the canopy projection (DC) and yield (Y). In the first article we reduced the dimensionality of the data using bagging decision tree and then run 64,000 neural networks for each trait selecting the best architecture based on predictive ability for estimating the heritability, obtained results compatibles with those in literature. In the second article, 12 different density market panels were used to evaluate the effect of dimensionality reduction in PA. The common trend observed in the analysis shows an increase of the PA as the number of markers decreases, having a peak in most of the cases when used between 500 and 1,000 markers. In general, the worst results were obtained when used the full SNP panel density. The results of the second article indicate that the reduction of the number of markers can improve the selection of individuals at a lower cost. Computational Intelligence methods prove to be powerful tools for predicting genetic values, to estimate genetic parameters and to select markers. Keywords: GBLUP. BLASSO. BAGGING. Random forest. GEBV. Marker effect. Heritability.
Genomic prediction of leaf rust resistance to Arabica coffee using machine learning algorithms
(Escola Superior de Agricultura "Luiz de Queiroz", 2021) Sousa, Ithalo Coelho de; Nascimento, Moysés; Silva, Gabi Nunes; Nascimento, Ana Carolina Campana; Cruz, Cosme Damião; Silva, Fabyano Fonseca e; Almeida, Dênia Pires de; Pestana, Kátia Nogueira; Azevedo, Camila Ferreira; Zambolim, Laércio; Caixeta, Eveline Teixeira
Genomic selection (GS) emphasizes the simultaneous prediction of the genetic effects of thousands of scattered markers over the genome. Several statistical methodologies have been used in GS for the prediction of genetic merit. In general, such methodologies require certain assumptions about the data, such as the normality of the distribution of phenotypic values. To circumvent the non-normality of phenotypic values, the literature suggests the use of Bayesian Generalized Linear Regression (GBLASSO). Another alternative is the models based on machine learning, represented by methodologies such as Artificial Neural Networks (ANN), Decision Trees (DT) and related possible refinements such as Bagging, Random Forest and Boosting. This study aimed to use DT and its refinements for predicting resistance to orange rust in Arabica coffee. Additionally, DT and its refinements were used to identify the importance of markers related to the characteristic of interest. The results were compared with those from GBLASSO and ANN. Data on coffee rust resistance of 245 Arabica coffee plants genotyped for 137 markers were used. The DT refinements presented equal or inferior values of Apparent Error Rate compared to those obtained by DT, GBLASSO, and ANN. Moreover, DT refinements were able to identify important markers for the characteristic of interest. Out of 14 of the most important markers analyzed in each methodology, 9.3 markers on average were in regions of quantitative trait loci (QTLs) related to resistance to disease listed in the literature.
Predição genômica da resistência à ferrugem alaranjada em café arábica via algoritmos de aprendizagem de máquina
(Universidade Federal de Viçosa, 2018-02-26) Sousa, Ithalo Coelho de; Nascimento, Moysés
A seleção genômica (SG) foi proposta como uma forma de aumentar a eficiência e acelerar o melhoramento genético. A SG enfatiza a predição simultânea dos efeitos genéticos de milhares de marcadores dispersos em todo o genoma de um organismo. Algumas metodologias estatísticas têm sido utilizadas em SG para a predição do mérito genético, como por exemplo a Ridge Regression Best Linear Unbiased Prediction (RR- BLUP), Bayesian Lasso (BLASSO). Porém tais metodologias exigem algumas pressuposições a respeito dos dados tais como normalidade da distribuição dos valores fenotípicos. Além disto, a presença de fatores complicadores tais como epistasia e dominância atrapalham a utilização destes modelos, uma vez que exigem que tais efeitos sejam estabelecidos à priori pelo pesquisador. Visando contornar a não normalidade dos valores fenotípicos a literatura sugere o uso dos modelos lineares generalizados sob o enfoque bayesiano (BGLR). Outra alternativa são os modelos baseados em aprendizagem de máquina (AM), representados por metodologias tais como Redes Neurais (RNA), Árvores de Decisão (AD) e seus possíveis refinamentos (Bagging, Random Forest e Boosting) as quais podem incorporar a epistasia e a dominância no modelo além de não exigirem pressuposições quanto ao modelo e a distribuição dos valores fenotípicos. Diante disso, o objetivo deste trabalho foi utilizar AD e seus refinamentos Bagging, Random Forest e Boosting para predição da resistência a ferrugem alaranjada no café arábica. Além disso, AD e seus refinamentos foram utilizadas para identificar a importância dos marcadores relacionados a característica de interesse. Os resultados foram comparados com aqueles provenientes do GBLASSO (Lasso Bayesiano Generalizado) e RNA. Foram utilizados dados da resistência a ferrugem do café de 245 plantas derivadas do cruzamento do Híbrido de Timor e do Catuaí Amarelo, genotipados para 137 marcadores. A AD e seus refinamentos obtiveram resultados satisfatórios, visto que apresentaram valores iguais ou inferiores de Taxa de Erro Aparente comparados com aqueles obtidos pelo GBLASSO e RNA. Ademais, os refinamentos da AD demonstraram ser capazes de identificar marcadores importantes para característica de interesse, visto que dentre os 10 marcadores mais importantes analisados em cada metodologia, 3-4 viimarcadores estavam próximos a QTL’s relacionados a resistência a doença listados na literatura. Por fim, a AD e seus refinamentos mostraram um melhor desempenho em relação ao GBLASSO e a RNA quanto ao custo computacional.