Application of Machine Learning Methods for Mining Association Rules in Plant and Animal Data Sets Containing Molecular Genetic Markers, Followed by Classification or Prediction Utilizing Features Created from these Association Rules
Main authors: | , , |
---|---|
Format: | Patent |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | A method for predicting the presence of at least one continuous target feature in a plant, comprising: determining by direct DNA sequencing the genotype of the plant for at least one molecular genetic marker selected from the group consisting of a DNA molecular marker and an RNA molecular marker; providing a data set comprising a set of variables, wherein at least one of the variables in the data set comprises a value representing the genotype of the plant for the molecular genetic marker(s); determining at least one association rule from the data set utilizing a computer and one or more association rule mining algorithms; utilizing the association rule(s) to create one or more new variables to the data set; adding the new variable(s) to the data set to produce a larger data set; developing a plurality of models for prediction or classification of the continuous target feature(s) using at least one new variable added to produce the larger data set; utilizing cross-validation to compare the predictive value of each of the plurality of models, and selecting the model that gives the most accurate prediction of the presence of the continuous target feature(s); utilizing the selected model to predict the presence of the continuous target feature(s) in the plant; utilizing the predicted presence of the continuous target feature(s) in the plant to select a DNA segment for introgression into an elite inbred plant line; and breeding a plant comprising the selected DNA segment with an inbred line to introgress the selected DNA segment into the inbred line. 7002536_1 (GHMatters) P88820.AU.1 AJM 15/10/2015. Fig. 1: Area under the ROC curve for REPTree, before and after adding the new features from step (b) (original data vs. original data + new features from step (b)). |
---|
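The computational steps in the summary (mine association rules, turn them into new variables, build several models, compare them by cross-validation, keep the best) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation. The marker names `M1` and `M2`, the toy genotype data and all function names are invented for the example. The continuous target is discretized at the training mean before rule mining, which is an assumption the record does not specify. A one-split regression stump stands in for a tree learner such as REPTree.

```python
from itertools import combinations
from statistics import mean

def mine_rules(rows, y, min_support=0.2, min_confidence=0.8):
    """Brute-force miner: antecedents of 1-2 (marker, allele) items => 'target above mean'."""
    high = [t > mean(y) for t in y]  # assumed discretization of the continuous target
    items = sorted({(m, a) for r in rows for m, a in r.items()})
    rules = []
    for size in (1, 2):
        for antecedent in combinations(items, size):
            if len({m for m, _ in antecedent}) < size:
                continue  # skip contradictory antecedents (same marker twice)
            covered = [h for r, h in zip(rows, high)
                       if all(r[m] == a for m, a in antecedent)]
            if not covered:
                continue
            support = sum(covered) / len(rows)        # antecedent and consequent together
            confidence = sum(covered) / len(covered)  # consequent given antecedent
            if support >= min_support and confidence >= min_confidence:
                rules.append(antecedent)
    return rules

def encode(rows, items, rules):
    """One-hot marker genotypes, followed by one binary variable per mined rule."""
    return [[int(r[m] == a) for m, a in items] +
            [int(all(r[m] == a for m, a in rule)) for rule in rules]
            for r in rows]

def fit_stump(X, y):
    """One-split regression stump: returns (feature index, mean if 0, mean if 1)."""
    overall = mean(y)
    best = (float("inf"), 0, overall, overall)
    for j in range(len(X[0])):
        left = [t for x, t in zip(X, y) if x[j] == 0]
        right = [t for x, t in zip(X, y) if x[j] == 1]
        if not left or not right:
            continue
        m0, m1 = mean(left), mean(right)
        sse = sum((t - m0) ** 2 for t in left) + sum((t - m1) ** 2 for t in right)
        if sse < best[0]:
            best = (sse, j, m0, m1)
    return best[1:]

def cv_mse(rows, y, use_rules, k=3):
    """k-fold cross-validated mean squared error; rules are mined on training folds only."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(rows)) if i % k != fold]
        test = [i for i in range(len(rows)) if i % k == fold]
        tr_rows, tr_y = [rows[i] for i in train], [y[i] for i in train]
        items = sorted({(m, a) for r in tr_rows for m, a in r.items()})
        rules = mine_rules(tr_rows, tr_y) if use_rules else []
        j, m0, m1 = fit_stump(encode(tr_rows, items, rules), tr_y)
        for x, i in zip(encode([rows[i] for i in test], items, rules), test):
            errors.append(((m1 if x[j] else m0) - y[i]) ** 2)
    return mean(errors)

# Invented toy data: the trait is high only when M1 == "A" and M2 == "B".
genotypes = [("A", "B")] * 3 + [("A", "A")] * 3 + [("B", "B")] * 3 + [("B", "A")] * 3
rows = [{"M1": g1, "M2": g2} for g1, g2 in genotypes]
y = [9.0, 9.5, 10.0, 5.0, 5.5, 4.5, 5.2, 4.8, 5.0, 5.1, 4.9, 5.3]

scores = {
    "original variables": cv_mse(rows, y, use_rules=False),
    "original + rule variables": cv_mse(rows, y, use_rules=True),
}
best = min(scores, key=scores.get)  # model with the lowest cross-validated error
print(mine_rules(rows, y), scores, best)
```

Mining the rules inside each training fold keeps the held-out plants from influencing the new variables, so the comparison between the two candidate models is not inflated. The introgression and breeding steps that follow model selection are laboratory and field work and are outside the scope of this sketch.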