Research on Model Selection-Based Weighted Averaged One-Dependence Estimators (2024)

1. Introduction

Naive Bayes, within the realm of Bayesian network classifiers, has garnered significant interest and ranks among the top ten traditional algorithms in data mining [1,2,3,4]. Naive Bayes assumes that the attributes of a given category are independent of each other. This assumption simplifies the computation of the likelihood function and makes it easy to predict the sample category by maximizing the posterior probability. Given a test sample x with vector x 1 , , x d , Naive Bayes predicts the class of the given test sample as follows:

$$y(\mathbf{x}) = \arg\max_{y \in Y} P(y) \prod_{j=1}^{d} P(x_j \mid y)$$

where d is the number of attributes, x_j (j = 1, 2, ..., d) is the value of the jth attribute, y is a specific value of the random variable Y, and y(x) is the class label of x predicted by the Bayesian network classifier.
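As a concrete illustration of the formula above, the following Python sketch (a minimal example with hypothetical probability tables, not the authors' implementation) returns the class that maximizes P(y) ∏_j P(x_j | y), working in log space for numerical stability:

```python
import numpy as np

def naive_bayes_predict(x, class_prior, cond_prob):
    """Return the class maximizing P(y) * prod_j P(x_j | y).

    class_prior[y]       -- estimate of P(y)
    cond_prob[j][y][x_j] -- estimate of P(x_j | y) for the j-th attribute
    """
    scores = np.log(np.asarray(class_prior, dtype=float))  # log P(y)
    for j, x_j in enumerate(x):
        for y in range(len(class_prior)):
            scores[y] += np.log(cond_prob[j][y][x_j])       # add log P(x_j | y)
    return int(np.argmax(scores))
```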

Despite its popularity, the Naive Bayes algorithm ignores correlations between features, which can lead to inaccurate classification. In response to this limitation, the AODE (Averaged One-Dependence Estimators) algorithm emerged [5]. AODE is built upon the Bayesian network framework and takes the relationships between features into account when constructing the model; unlike Naive Bayes, it does not assume that the features are independent. To capture dependencies between attributes within a limited scope while keeping the network structure simple, AODE assumes that all attributes depend on a common parent attribute, forming a One-Dependence Estimator (ODE) [6]. By rotating every attribute into the role of parent attribute and averaging the resulting posterior probabilities, AODE predicts the class of the sample and achieves good results in classification tasks.

To enhance both the performance and robustness of AODE-based classification algorithms, some researchers have proposed cross-validation risk minimization strategies, among which leave-one-out cross-validation (LOOCV) [7] is a commonly used technique. For example, Chen et al. [8] pointed out that the performance of classification algorithms can be evaluated more accurately by a cross-validation risk minimization strategy, avoiding overfitting of the training data. The cross-validation risk minimization strategy is a technique for evaluating and selecting models: the generalization error of each candidate model is estimated by cross-validation during training, and the model with the lowest estimated error is selected. By introducing this strategy, the AODE algorithm can better adapt to the characteristics of different datasets and improve the generalization ability of the classifier. However, existing cross-validation risk minimization strategies do not consider the differences between attributes in classification decisions, so this paper proposes a Model Selection-based Weighted AODE (SWAODE) algorithm. The SWAODE algorithm uses mutual information as the weight of each ODE and evaluates the weighted sub-models with leave-one-out cross-validation (LOOCV) to determine the best model. This technique greatly enhances the classification performance of the AODE algorithm while retaining strong robustness and broad applicability.

The main contributions of this paper are as follows:

1. The variability between ODEs and between sub-models is fully taken into account by weighting each ODE and selecting among the sub-models. In this way, the quality of each ODE can be evaluated more finely and the optimal set of models can be selected, which provides a new perspective for the optimization of the AODE algorithm.

2. We propose a new Model Selection-based Weighted AODE (SWAODE) algorithm, which effectively combines the advantages of weighting and model selection. The goal of the SWAODE algorithm is to enhance the performance and robustness of the AODE classification algorithm. By integrating weighting and model selection strategies, SWAODE classifies data more accurately and generalizes better.

3. This paper compares the SWAODE algorithm with other advanced algorithms on 70 datasets from the UCI repository [9] and conducts ablation experiments. The experimental results indicate the superiority of the SWAODE algorithm over the other advanced algorithms.

The remainder of this paper is structured as follows: Section 2 reviews related work on improving AODE. Section 3 discusses AODE and the process of model selection. The SWAODE algorithm is presented in Section 4. Section 5 provides a detailed description of the experimental setup and results. Finally, our conclusions are given in Section 6.

2. Related Work

In recent years, various strategies have been suggested to alleviate the effects of the attribute independence assumption. Current research can be broadly divided into three types: attribute weighting, attribute selection, and structure extension.

2.1. Attribute Weighting

Jiang and Zhang [10] first proposed assigning different weights to each attribute in AODE. Jiang et al. [11] then argued that it is not reasonable for every One-Dependence Estimator (ODE) in AODE to carry the same weight, so they proposed the classification model WAODE, which assigns different weights to different ODEs. Wu et al. [12] introduced an adaptive SPODE named SODE, which leverages immunity principles from the artificial immune system to autonomously and flexibly determine the weight of each SPODE.

2.2. Attribute Selection

Zheng et al. [13] introduced attribute selection methods for AODE, including Backward Sequential Elimination (BSE) and Forward Sequential Selection (FSS), but these techniques are not very practical for large datasets. Meanwhile, Yang et al. [14,15] compared attribute selection and weighting techniques in AODE. Chen et al. [16] introduced an innovative attribute selection method suitable for searching an extensive model space with only a single extra pass over the training data. The experimental results indicated that the new technique markedly reduced the bias of AODE at the cost of a slightly increased training time. Its low bias and efficient computation make it suitable for big data learning, but the article did not examine the effect of model selection.

2.3. Structure Extension to NB

Friedman et al. [17] introduced the Tree-Augmented Naive Bayes (TAN) method as an enhancement to Naive Bayes (NB), incorporating a tree structure to relax the independence assumption of NB. TAN requires that the class variable has no parent nodes, and each attribute has the class variable and at most one other attribute as parents. The algorithm acquires the necessary probability distributions from the training samples in a single pass to construct the network structure and conditional probability tables.

The K-dependence Bayesian classifier (KDB) [18] is another method to improve Naive Bayes (NB). It relaxes the independence assumption of NB by allowing each attribute to have at most k parent attributes. As a result, NB can be viewed as a zero-dependence Bayesian classifier, whereas KDB can capture a higher degree of attribute dependence by increasing the value of k. KDB can construct classifiers for any value of k, retaining most of the computational properties of NB while selecting for each attribute a network structure with up to k parent attributes.

Another notable enhancement to NB is AODE [5], which relaxes the independence assumption of Naive Bayes by allowing a limited degree of dependence between features. AODE constructs multiple One-Dependence Estimators by considering the relationship between each feature and the category, and then averages them to obtain the final classification result. This approach makes more effective use of the correlations between features and ultimately improves classification accuracy.

3. AODE and Model Selection Analysis

This section discusses the Averaged One-Dependence Estimators (AODE) algorithm and its model selection process.

To make the paper more readable, Table 1 summarizes all the symbols defined in the paper for quick reference.

3.1. Constructing the AODE Model

AODE allows only one dependence between attributes; attribute X_i can only depend on some attribute X_j and the category Y, where X_j is called the parent attribute of X_i. At the same time, to keep the computation simple, it is assumed that all attributes depend on a common parent attribute X_j, which constitutes a Bayesian network called a One-Dependence Estimator (ODE) [6]. Based on this ODE, the joint probability P(y, x) can be estimated as:

$$P_{ODE}(y, \mathbf{x}) = P(y, x_j) \prod_{i=1}^{d} P(x_i \mid y, x_j)$$

To eliminate the bias introduced by the choice of parent attribute, all attributes are used as the parent attribute in turn, yielding d ODEs; finally, the posterior probabilities estimated from these d ODEs are averaged to obtain the posterior probability estimate for the sample. Thus, the AODE algorithm calculates the joint probability as:

$$P_{AODE}(y, \mathbf{x}) = \frac{1}{d} \sum_{j=1}^{d} P(y, x_j) \prod_{i=1}^{d} P(x_i \mid y, x_j)$$

where P(x_i | y, x_j) can be obtained as the ratio of P(x_i, y, x_j) to P(y, x_j), so only the base probabilities P(y, x_j) and P(x_i, y, x_j) need to be estimated, which can be obtained by M-estimation:

$$\hat{P}(y, x_j) = \frac{F(y, x_j) + \frac{m}{c\,v_j}}{n + m}$$

$$\hat{P}(x_i, y, x_j) = \frac{F(x_i, x_j, y) + \frac{m}{c\,v_i v_j}}{n + m}$$

where F(·) is the frequency of the argument combination in the training dataset, v_i is the number of values of attribute X_i, c is the number of categories, and m is the smoothing parameter of the M-estimation, a commonly used parameter estimation method. By introducing the smoothing parameter m, the M-estimation prevents probability estimates from being zero and improves the robustness of the estimates.
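The two M-estimates above are simple to compute once the frequencies are available. The following sketch (our own helper names, assuming the frequency counts have already been collected) mirrors the estimates of P̂(y, x_j) and P̂(x_i, y, x_j) and shows how P(x_i | y, x_j) is obtained as their ratio:

```python
def m_estimate_parent(F_y_xj, n, m, c, v_j):
    # M-estimate of P(y, x_j): (F(y, x_j) + m/(c*v_j)) / (n + m)
    return (F_y_xj + m / (c * v_j)) / (n + m)

def m_estimate_triple(F_xi_xj_y, n, m, c, v_i, v_j):
    # M-estimate of P(x_i, y, x_j): (F(x_i, x_j, y) + m/(c*v_i*v_j)) / (n + m)
    return (F_xi_xj_y + m / (c * v_i * v_j)) / (n + m)

def cond_estimate(F_xi_xj_y, F_y_xj, n, m, c, v_i, v_j):
    # P(x_i | y, x_j) as the ratio of the two M-estimates
    return (m_estimate_triple(F_xi_xj_y, n, m, c, v_i, v_j)
            / m_estimate_parent(F_y_xj, n, m, c, v_j))
```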

In a practical implementation, the frequencies F of class labels and attribute-value combinations can be stored in a three-dimensional table, where the first and second dimensions index the values of the first and second attributes, the third dimension indexes the category, and each cell records the corresponding frequency. Assuming that there are two attributes X_1, X_2 and two categories class_1, class_2, where X_1 has two attribute values and X_2 has three attribute values, the frequency table is shown in Table 2.

The training process of AODE is described by Algorithm 1.

Algorithm 1 AODE training process

Require: Set of training data D
Ensure: Frequency table F containing the combinations of class labels and attribute-value pairs

1: Initialize all frequencies in the frequency table F to zero
2: for each training sample x from D do
3:   Obtain the class label y of sample x and set i to zero
4:   while i < d do
5:     Read the value x_i of attribute X_i in sample x and set j to zero
6:     while j < i do
7:       Read the value x_j of attribute X_j in sample x
8:       Find the frequency at position (x_i, x_j, y) in the frequency table F and increase it by one
9:       j = j + 1
10:    end while
11:    i = i + 1
12:  end while
13: end for
14: return the frequency table F
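A compact Python rendering of Algorithm 1 is given below. It is a sketch under the assumption that attribute values and class labels have already been encoded as integer indices; the variable names are ours and do not come from the paper's C++ implementation.

```python
import numpy as np

def train_aode(samples, labels, n_values, n_classes):
    """Build the joint frequency table F[(i, j)][x_i, x_j, y] for all attribute pairs i > j."""
    d = len(n_values)
    F = {(i, j): np.zeros((n_values[i], n_values[j], n_classes), dtype=np.int64)
         for i in range(d) for j in range(i)}
    for x, y in zip(samples, labels):          # one pass over the training data
        for i in range(d):
            for j in range(i):
                F[(i, j)][x[i], x[j], y] += 1  # increment the joint count
    return F
```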

Algorithm 1 shows that the time complexity of the AODE training process depends on the numbers of samples and attributes: with d candidate parent and child attributes, the total time complexity of training is O(nd^2), where d denotes the number of attributes and n denotes the number of training samples. AODE typically has a marginally greater time complexity than NB, since AODE represents one-dependence relations between attributes and therefore aligns more closely with real data; as a result, its classification performance is significantly better than that of NB.

3.2. Model Selection-Based AODE

To fully present the Model Selection-based AODE (SAODE) algorithm in this section, the model space is first constructed. Then, the attributes are ranked based on mutual information. Finally, the best model is selected using the leave-one-out cross-validation error.

3.2.1. Building the Model Space

When constructing the AODE model space, we introduce a threshold m′. When a particular value x_j of a parent attribute occurs in the training data at least m′ times, the ODE model corresponding to that value is included in the computation of the AODE model. If we choose the first r attributes as parent attributes and the first s attributes as child attributes, where 1 ≤ r, s ≤ d, the AODE model is approximated by:

$$P_{AODE}(y, \mathbf{x})^{r,s} = \frac{\sum_{j:\,1 \le j \le r \,\wedge\, F(x_j) \ge m'} P(y, x_j) \prod_{i=1}^{s} P(x_i \mid y, x_j)}{\left|\{\, j : 1 \le j \le r \wedge F(x_j) \ge m' \,\}\right|}$$

where F(x_j) is the frequency of x_j and m′ is the minimum frequency required for the value x_j. By ensuring that the number of samples for each parent attribute value is sufficient, the thresholded AODE avoids the high variance and unreliable conditional probability estimates caused by data sparsity, thereby improving the overall performance and prediction accuracy of the model. This mechanism enables AODE to maintain high predictive stability and reliability in the face of uneven data distributions. When both r and s are equal to d, the formula shows that computing P_AODE(y, x)^{d,d} creates at most d^2 attribute subsets as sub-models.

Each of these approximate AODE models is only a small extension of the previous one. For example, P_AODE(y, x)^{1,2} is obtained by adding the child attribute x_2 to P_AODE(y, x)^{1,1}. All of these models can therefore be applied to a test instance in a single nested computation and evaluated efficiently.
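The following sketch shows how one such approximate model could be scored for a single class. It assumes the attributes are already ordered and that joint, cond, and freq are hypothetical helpers returning the estimates P(y, x_j), P(x_i | y, x_j), and the training frequency F(x_j); it illustrates the thresholded sum, not the nested incremental evaluation used in practice.

```python
def p_aode_rs(x, y, r, s, joint, cond, freq, m_prime):
    """Approximate AODE score using the first r parents and the first s children."""
    total, used = 0.0, 0
    for j in range(r):
        if freq(j, x[j]) < m_prime:            # skip parent values that are too rare
            continue
        score = joint(y, j, x[j])
        for i in range(s):
            score *= cond(i, x[i], y, j, x[j])
        total += score
        used += 1
    return total / used if used else 0.0       # average over the admitted parents
```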

3.2.2. Attribute Sorting

When constructing the AODE model space, the model built on later attributes depends on the model built on earlier attributes, so this nesting of models depends on the order of the attributes. Here, mutual information is used to sort the attributes. The mutual information is calculated as:

$$MI(X, Y) = H(X) - H(X \mid Y) = \sum_{y \in Y} \sum_{x \in X} P(x, y) \log_2 \frac{P(x, y)}{P(x) P(y)}$$

where H(X) is the entropy of X, H(X | Y) is the conditional entropy, P(x, y) is the joint probability of x and y, and P(x) and P(y) are the probabilities of x and y, respectively. The MI is used as an indicator of the correlation between attribute X and category Y: the larger the value of the MI, the stronger the correlation between attribute X and category Y.

An advantage of employing the MI is that the MI between each attribute and the class can be computed efficiently within a single training pass. While the MI can identify the discriminative power of individual attributes, it cannot directly assess the discriminative power of combinations of attributes. However, this shortcoming is compensated for by the fact that the MI-based ranking allows a wide model space to be searched.
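A minimal sketch of the MI computation, assuming the joint distribution P(x, y) of one attribute and the class has been tabulated (e.g., normalized from the frequency table of Section 3.1):

```python
import numpy as np

def mutual_information(joint):
    """MI(X, Y) in bits from a 2-D table joint[x, y] = P(X = x, Y = y)."""
    px = joint.sum(axis=1, keepdims=True)      # marginal P(x)
    py = joint.sum(axis=0, keepdims=True)      # marginal P(y)
    mask = joint > 0                           # treat 0 * log 0 as 0
    return float((joint[mask] * np.log2(joint[mask] / (px @ py)[mask])).sum())
```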

3.2.3. Model Selection

To evaluate the distinctiveness of the different models and prevent overfitting, leave-one-out cross-validation errors are employed. Through incremental cross-validation, the contribution of the held-out sample in each fold is subtracted from the frequency table to create a model that excludes that sample. This technique offers a low-bias estimate of the generalization error and assesses the models with a single pass over the training data.

In addition, as shown in Equation (6), these models are nested together, with each model being a straightforward extension of another, providing an effective means of evaluating them. That is, all models can be evaluated simultaneously during their construction for the training sample left out in each fold.
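A sketch of this incremental LOOCV scheme is shown below. The helpers update_counts (subtracting or restoring one sample's contribution to the frequency table) and predict_proba (scoring a sample under a given candidate model) are hypothetical placeholders for whatever concrete representation is used.

```python
def loocv_squared_errors(samples, labels, F, update_counts, predict_proba, n_models):
    """Accumulate the squared LOOCV error of every candidate model in one pass."""
    sse = [0.0] * n_models
    for x, y in zip(samples, labels):
        update_counts(F, x, y, delta=-1)              # leave the current sample out
        for k in range(n_models):
            p_true = predict_proba(F, x, model=k)[y]  # P(true class | x) under model k
            sse[k] += (1.0 - p_true) ** 2
        update_counts(F, x, y, delta=+1)              # restore the sample
    return sse
```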

Common criteria for evaluating model selection include the 0–1 loss, the Root-Mean-Square Error (RMSE), LogLoss, and the AUC value. For example, Chen et al. [19] proposed the RMSE as a criterion for model evaluation, where a lower RMSE indicates a better model. Therefore, we also use the RMSE as the model evaluation criterion for selecting the optimal model in Section 4.

4. Model Selection-Based Weighted AODE

Having described the construction of the AODE model in detail in Section 3, this section focuses on the weighting strategy for AODE and the methodology for model selection on the weighted AODE model.

4.1. Weighting the AODE Model

In the AODE algorithm, the contribution of each ODE to the final classification result may differ: certain sub-models may discriminate particular categories more accurately while others perform weakly. Therefore, weighting each sub-model can more accurately reflect its importance in the overall classification process, thus improving overall model performance [11].

The classification ability of ODEs built on different parent attributes X_j differs, so different weights can be applied to different ODEs [11]. Thus, the formula becomes:

$$P_{WAODE}(y, \mathbf{x}) = \frac{1}{d} \sum_{j=1}^{d} w_j\, P(y, x_j) \prod_{i=1}^{d} P(x_i \mid y, x_j)$$

where w_j is the weight of the jth (j = 1, 2, ..., d) ODE, obtained by calculating the MI through Equation (8). When attribute X and category Y are completely independent, the MI is 0, indicating that there is no information sharing or dependency between them.

Weighting each ODE improves the performance and robustness of the overall model, thus enhancing its reliability and validity in practical applications.
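A sketch of the weighted score, assuming the MI-based weights have been precomputed and that joint and cond are the same hypothetical estimate helpers as in the earlier sketches:

```python
def p_waode(x, y, weights, joint, cond):
    """Weighted AODE score: (1/d) * sum_j w_j P(y, x_j) prod_i P(x_i | y, x_j)."""
    d = len(x)
    total = 0.0
    for j in range(d):
        score = weights[j] * joint(y, j, x[j])   # w_j is MI(X_j, Y)
        for i in range(d):
            score *= cond(i, x[i], y, j, x[j])
        total += score
    return total / d
```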

4.2. Model Selection for WAODE

In this subsection, we first construct the model space of WAODE. Then, the attributes are ranked according to the MI. Finally, we use the RMSE as the cross-validation error and select the optimal sub-model by minimizing the RMSE.

4.2.1. Building the Model Space

As in Equation (6), we also introduce the threshold m′ for the WAODE algorithm, and the joint probability is given by:

$$P_{WAODE}(y, \mathbf{x})^{r,s} = \frac{\sum_{j:\,1 \le j \le r \,\wedge\, F(x_j) \ge m'} w_j\, P(y, x_j) \prod_{i=1}^{s} P(x_i \mid y, x_j)}{\left|\{\, j : 1 \le j \le r \wedge F(x_j) \ge m' \,\}\right|}$$

Similar to the construction of the AODE model in Section 3, all of these approximate WAODE models constitute the model space depicted in Table 3, since each model is simply a small extension of the previous one. Thus, every model can be assessed efficiently.

4.2.2. Attribute Sorting

In constructing the WAODE model space, the model for later attributes depends on the model for earlier attributes, as shown in Table 3. This nesting of models is therefore influenced by the order in which the attributes are considered. To address this, we use the MI, calculated by Equation (8), to rank the attributes. We also note that the sorting process implicitly selects attributes: by sorting first, we can more easily identify the attributes that have a significant impact on categorization.

4.2.3. Model Selection

To make the results more objective, we used a 10-fold CV in our experiments, and we used the LOOCV error as the criterion for model selection. Figure 1 describes the relationship between LOOCV and 10-fold CV: the test set of the 10-fold CV loops through the 10 folds of samples, while the test instance of LOOCV loops through all the LOOCV instances.

LOOCV errors were used to evaluate model distinctiveness and prevent overfitting by excluding one sample at a time from the training data and assessing the model without it. This method provides a low-bias estimate of the generalization error and evaluates the model using nearly all available data for training.

The 0–1 loss (ZOL) and the Root-Mean-Square Error (RMSE) are the most common evaluation criteria for model selection. The 0–1 loss simply assigns "0" to correct classifications and "1" to misclassifications, treating all misclassifications as equally undesirable. The RMSE, in contrast, is sensitive to the severity of misclassification and therefore supports more fine-grained probabilistic predictions. The RMSE can be expressed as:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(1 - P(y(\mathbf{x}_i) = y_i \mid \mathbf{x}_i)\right)^2}$$

where y_i is the true class of sample x_i. The smaller the RMSE, the smaller the discrepancy between the model's predictions and the true labels. Compared to the 0–1 loss, the RMSE assesses model uncertainty on a continuous scale rather than simply indicating whether a sample is classified correctly. At the same time, the RMSE penalizes model uncertainty more strictly, so it provides a more fine-grained calibration metric for probability estimation. Consequently, the RMSE was employed to assess candidate models in our study.
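The contrast between the two criteria can be made concrete with a short sketch, assuming proba is an n × c array of predicted class probabilities and y_true holds the true class indices:

```python
import numpy as np

def zero_one_loss(proba, y_true):
    """Fraction of samples whose most probable class is not the true class."""
    return float(np.mean(np.argmax(proba, axis=1) != np.asarray(y_true)))

def rmse_loss(proba, y_true):
    """RMSE over 1 - P(true class | x): also penalizes under-confident correct predictions."""
    p_true = proba[np.arange(len(y_true)), y_true]
    return float(np.sqrt(np.mean((1.0 - p_true) ** 2)))
```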

Therefore, the process of choosing the best model can be framed as the following optimization problem:

$$\langle r, s \rangle^{*} = \arg\min_{r,s} \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(1 - P_{WAODE}^{LOO}\big(y(\mathbf{x}_i) = y_i \mid \mathbf{x}_i\big)^{r,s}\right)^2}$$

where P_WAODE^LOO(y | x_i)^{r,s} can be computed by first estimating P_WAODE^LOO(y, x_i)^{r,s} from the training set with the instance ⟨y_i, x_i⟩ removed, as in Equation (9), and then normalizing over all possible values of y.

4.3. Algorithm Description

Using the method described above, we formulated a training algorithm for the Model Selection-based Weighted AODE (SWAODE) model, as shown in Algorithm 2.

Algorithm 2 Training algorithm for Model Selection-based Weighted AODE (SWAODE)

1: First pass: form the table of joint frequencies of all combinations of attribute values and the class label, as in Algorithm 1
2: Compute the mutual information
3: Weight each ODE and rank the attributes
4: Second pass: perform LOOCV on the sample set
5: for each sample x in D do
6:   Remove sample x from the frequency table
7:   Obtain the category of sample x and set i and j to zero
8:   Build the d^2 models for AODE
9:   for j < d do
10:    for i < d do
11:      Predict sample x using all models in Equation (9)
12:      Accumulate the squared error for each model
13:      i = i + 1
14:    end for
15:    j = j + 1
16:  end for
17:  Add sample x back to the frequency table
18: end for
19: Compute the root-mean-square error for each model
20: Select the model with the lowest RMSE
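The selection step of Algorithm 2 can be sketched as follows. This brute-force version re-scores every (r, s) model for each left-out sample instead of exploiting the nested structure, and the helpers update_counts and predict_rs (returning the normalized P_WAODE(y | x)^{r,s} for all classes) are hypothetical placeholders:

```python
import numpy as np

def select_best_model(samples, labels, F, weights, order, update_counts, predict_rs, m_prime):
    """Pick the (r, s) pair minimizing the LOOCV RMSE (a sketch of Algorithm 2)."""
    d = len(order)
    sse = np.zeros((d, d))                        # squared error of each (r, s) model
    for x, y in zip(samples, labels):
        update_counts(F, x, y, delta=-1)          # leave the current sample out
        for r in range(1, d + 1):
            for s in range(1, d + 1):
                p = predict_rs(F, x, weights, order, r, s, m_prime)
                sse[r - 1, s - 1] += (1.0 - p[y]) ** 2
        update_counts(F, x, y, delta=+1)          # restore the sample
    rmse = np.sqrt(sse / len(samples))
    r_best, s_best = np.unravel_index(np.argmin(rmse), rmse.shape)
    return r_best + 1, s_best + 1
```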

The SWAODE algorithm must account for the time complexity of computing the MI and of the LOOCV separately. The total time complexity of calculating the MI, as in the WAODE-MI algorithm, is O(cd^2), and the time complexity of model selection with LOOCV is O(cnd^2), so the total time complexity of the SWAODE algorithm is O(cnd^2) + O(cd^2), which is almost the same as that of the SAODE algorithm, where d is the number of attributes, n is the number of samples, and c is the number of categories.

5. Experiments and Discussion

We ran the above algorithms on 70 datasets from the UCI repository [9]. The characteristics of the datasets are presented in Table 4, arranged in increasing order of the number of instances. The experiments were carried out on the high-performance computing platform of Nanjing Audit University; each computing node had an Intel E5 CPU and 188 GB of memory, and the operating system was CentOS 7.9-x64. The algorithms were implemented in C++ on the Petal machine learning platform [19]. Compared to the well-known machine learning platform Weka [20], Petal differs in one significant respect: missing values are treated as a distinct value in Petal, whereas Weka replaces them with means (numerical attributes) or modes (discrete attributes).

5.1. Comparison on ZOL

In this experiment, to verify the performance of the SWAODE algorithm, we compared it with classical algorithms such as NB [1], KDB (k = 1) [18], AODE [5], WAODE-MI [11], and WAODE-KL [21]. We adopted ZOL as the evaluation index, where the loss is one when a sample is misclassified and zero when it is correctly classified; we then calculated the proportion of misclassified test samples in order to comprehensively assess the performance of the different algorithms on the classification task. The W/D/L metric records the number of wins, draws, and losses of one algorithm against another across the datasets, allowing their performance to be compared on the same data. For instance, SWAODE demonstrated strong performance with 52 wins, 5 draws, and 13 losses when compared to NB, providing an objective assessment of the algorithms' respective strengths and weaknesses. Through this evaluation method, we can more comprehensively and objectively assess the advantages of the SWAODE algorithm over the other algorithms, as shown in Table 5. To make SWAODE's results easy to locate, we bolded the row where SWAODE appears in all subsequent tables and present the per-dataset results in Appendix Table A1.

The analysis in Table 5 reveals that the SWAODE algorithm outperformed the other advanced algorithms. Compared to the AODE algorithm, the SWAODE algorithm achieved 39 wins, 8 draws, and 23 losses, a significant improvement. The weighted AODE classifiers also improved when different weights were assigned: the WAODE-KL algorithm, which uses KL divergence as weights, achieved 37 wins, 13 draws, and 20 losses against the AODE algorithm, demonstrating a clear advantage. However, even the strong WAODE-KL algorithm did not surpass our new SWAODE algorithm, which achieved 32 wins, 14 draws, and 24 losses against it. Overall, the SWAODE algorithm demonstrated strong performance and brought significant improvement to the classification task.

Figure 2 shows the scatter plot of SWAODE against WAODE-MI in terms of ZOL. Points above the diagonal represent datasets on which SWAODE achieves a lower ZOL than WAODE-MI. It can be seen that SWAODE consistently provided better predictions than the regular WAODE-MI in a statistically significant way.

5.2. Comparison on LogLoss

LogLoss is also commonly used to assess the effectiveness of the SWAODE algorithm. LogLoss is a widely used metric for evaluating the predictive accuracy of a classification model; it measures the deviation between the model's predicted probability for each sample and the actual label. To compare the SWAODE algorithm with the other advanced algorithms, their LogLoss values on the test data were calculated, and the W/D/L (win/draw/loss) metric was again used to analyze the strengths and weaknesses of the different algorithms. The comparison of the LogLoss values of the SWAODE algorithm with those of the other algorithms is shown in Table 6, and the detailed results are given in Appendix Table A2.

According to the analysis in Table 6, the SWAODE algorithm showed excellent performance on LogLoss. In the comparison with AODE, it achieved 48 wins/1 draw/21 losses. In addition, the SWAODE algorithm also performed outstandingly against the weighted AODE algorithms, beating both the WAODE-MI and the WAODE-KL algorithm on 42 of the 70 datasets. These results show that the SWAODE algorithm adapts to various datasets and outperforms the other algorithms in most cases. Therefore, the SWAODE algorithm is a very effective improvement of AODE.

Figure 3 shows the corresponding scatter plot of SWAODE against WAODE-MI in terms of LogLoss. Again, SWAODE consistently provided better predictions than the regular WAODE-MI algorithm in a statistically significant way.

5.3. Ablation Studies

To examine the necessity of weighting and model selection for the AODE classification algorithm, we conducted two ablation experiments in this section, again using W/D/L (win/draw/loss) as the measure. These experiments dissect the performance of the SWAODE algorithm in the absence of weighting or model selection, highlighting the crucial role of both in improving its classification performance. In our experiments, we implemented the WAODE-MI algorithm, which uses the MI as weights, and the SAODE algorithm, which performs model selection on AODE; the SWAODE algorithm was compared with these two algorithms in terms of the ZOL and LogLoss metrics.

According to Table 7, the SWAODE algorithm achieved 34 wins/12 draws/24 losses on ZOL and 42 wins/7 draws/21 losses on LogLoss against the WAODE-MI algorithm. It also performed well against SAODE, achieving 27 wins/22 draws/21 losses and 40 wins/4 draws/26 losses, respectively. Therefore, both weighting and model selection are necessary and indispensable in the SWAODE algorithm, which fully exploits their combined advantages to greatly improve the classification performance of AODE.

6. Conclusions

This study proposed a new AODE classification algorithm, SWAODE, which addresses the problem that existing cross-validation risk minimization strategies do not consider the differences between attributes in classification decisions. The core idea is to first weight each ODE in AODE using the MI values as weights, and then to use leave-one-out cross-validation (LOOCV) to perform model selection over these weighted sub-models and select the optimal model. Experimental results indicated that the SWAODE algorithm markedly surpassed other well-known classification algorithms on multiple datasets, exhibiting higher classification performance and generalization ability.

However, we recognize that this is only one aspect of model selection and that many potential extensions deserve further exploration. The next step of our work will focus on exploring the extension of attribute-weighted AODE classification models. Overall, further exploration of attribute-weighted AODE classification models is a challenging but promising research direction. By delving into this area, we hope to bring innovative ideas and tools to research related to machine learning and data mining.

Author Contributions

Conceptualization, C.Z. and S.C.; methodology, C.Z.; software, C.Z. and S.C.; validation, C.Z. and H.K.; formal analysis, C.Z. and S.C.; investigation, C.Z. and H.K.; resources, S.C.; data curation, C.Z.; writing—original draft preparation, C.Z.; writing—review and editing, C.Z.; visualization, S.C.; supervision, S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX23-1105), National Social Science Fund of China (23AJY018), and National Science Fund of China (62276136).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this paper will be provided by the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AODE	Averaged One-Dependence Estimators
ODE	One-Dependence Estimator
SWAODE	Model Selection-based Weighted AODE
NB	Naive Bayes
LOOCV	Leave-one-out cross-validation
KDB	K-dependence Bayesian classifier
WAODE-MI	Weighted Average of One-Dependence Estimators by Mutual Information
WAODE-KL	Weighted Average of One-Dependence Estimators by Kullback–Leibler divergence
SAODE	Model Selection-based AODE (AODE with model selection under leave-one-out cross-validation)

Appendix A

Detailed results for 0–1 loss and LogLoss ± standard deviation are shown in Table A1 and Table A2.

Table A1. ZOL.

Data Set | SWAODE | NB | KDB | AODE | WAODE-MI | WAODE-KL | SAODE
contact-lenses0.3750+/−0.34250.3750+/−0.34250.2917+/−0.35430.4167+/−0.35740.3333+/−0.35810.3333+/−0.35810.3750+/−0.3425
lung-cancer0.3750+/−0.31130.4375+/−0.26840.5938+/−0.30820.4688+/−0.28850.4688+/−0.28850.4688+/−0.28850.3750+/−0.3113
labor-negotiations0.0702+/−0.09660.0351+/−0.04220.1053+/−0.11460.0526+/−0.06750.0702+/−0.09660.0877+/−0.12690.0702+/−0.0966
post-operative0.2889+/−0.17410.3444+/−0.19660.3444+/−0.17480.3444+/−0.18820.3333+/−0.14010.3333+/−0.14010.2889+/−0.1741
zoo0.0297+/−0.06000.0297+/−0.04770.0495+/−0.06140.0198+/−0.03840.0198+/−0.03840.0198+/−0.03840.0198+/−0.0384
promoters0.0472+/−0.07480.0755+/−0.06170.1321+/−0.08910.1038+/−0.06480.0849+/−0.06560.0849+/−0.06560.0660+/−0.0992
echocardiogram0.3511+/−0.11290.2748+/−0.13470.3664+/−0.15110.3435+/−0.11430.3359+/−0.11200.3282+/−0.11520.3664+/−0.1073
lymphography0.1554+/−0.11290.1486+/−0.09790.1757+/−0.07910.1486+/−0.09910.1351+/−0.10560.1419+/−0.10260.1554+/−0.1183
iris0.0600+/−0.06550.0733+/−0.06930.0733+/−0.05050.0600+/−0.06550.0600+/−0.06550.0600+/−0.06550.0600+/−0.0655
teaching-ae0.4636+/−0.09180.5298+/−0.15790.4834+/−0.10790.4834+/−0.11790.4702+/−0.12140.4636+/−0.11860.4636+/−0.0918
hepatitis0.2000+/−0.11440.1613+/−0.11510.2194+/−0.12050.1935+/−0.12440.1871+/−0.12010.1871+/−0.12010.2129+/−0.1244
wine0.0281+/−0.04040.0225+/−0.03470.0674+/−0.06330.0281+/−0.04040.0281+/−0.04040.0281+/−0.04040.0225+/−0.0332
autos0.1756+/−0.14200.3902+/−0.16480.2293+/−0.13740.2537+/−0.11040.2537+/−0.12160.2585+/−0.12070.1854+/−0.1376
sonar0.1731+/−0.09780.2452+/−0.08890.2548+/−0.09140.1394+/−0.08880.1587+/−0.08490.1346+/−0.09180.1490+/−0.1027
glass-id0.1869+/−0.05750.2570+/−0.10190.2383+/−0.07200.1589+/−0.05760.1636+/−0.06640.1636+/−0.06640.1776+/−0.0580
new-thyroid0.0651+/−0.04100.0419+/−0.04870.0651+/−0.04540.0512+/−0.05440.0512+/−0.04680.0512+/−0.04680.0698+/−0.0492
audio0.2301+/−0.08170.2389+/−0.05480.3097+/−0.10540.2301+/−0.06490.2345+/−0.07010.2434+/−0.06710.2345+/−0.0805
hungarian0.1667+/−0.05200.1565+/−0.06980.2075+/−0.06250.1429+/−0.06760.1565+/−0.07730.1565+/−0.07730.1667+/−0.0667
heart-disease-c0.1848+/−0.10620.1683+/−0.08030.2178+/−0.14280.1848+/−0.10670.1848+/−0.10220.1848+/−0.10220.1848+/−0.1054
haberman0.2549+/−0.10700.2647+/−0.12850.2778+/−0.10240.2712+/−0.11880.2941+/−0.11520.2941+/−0.11520.2386+/−0.1068
primary-tumor0.5221+/−0.10280.5162+/−0.08830.5841+/−0.11190.5162+/−0.09840.5251+/−0.09140.5251+/−0.09140.5133+/−0.1031
ionosphere0.0798+/−0.03990.1197+/−0.08540.0684+/−0.04410.0826+/−0.04050.0826+/−0.04050.0826+/−0.04050.0798+/−0.0497
dermatology0.0191+/−0.03100.0191+/−0.02420.0301+/−0.02580.0219+/−0.02750.0191+/−0.02820.0191+/−0.02820.0246+/−0.0318
horse-colic0.1522+/−0.06270.2065+/−0.09280.2120+/−0.06150.2038+/−0.05900.1984+/−0.05910.1984+/−0.05910.1603+/−0.0596
house-votes-840.0552+/−0.04350.0943+/−0.02560.0690+/−0.03530.0529+/−0.03460.0506+/−0.03580.0506+/−0.03580.0552+/−0.0435
cylinder-bands0.2167+/−0.03550.2093+/−0.03260.2074+/−0.05750.1611+/−0.04210.1574+/−0.04090.1574+/−0.04290.2167+/−0.0355
chess0.0907+/−0.05000.1125+/−0.05510.0998+/−0.03540.1053+/−0.06310.1053+/−0.05980.0998+/−0.06130.0889+/−0.0515
syncon0.0200+/−0.01360.0483+/−0.03980.0200+/−0.01560.0200+/−0.01630.0200+/−0.01630.0200+/−0.01630.0200+/−0.0136
balance-scale0.1168+/−0.01190.0832+/−0.02070.1424+/−0.03070.1120+/−0.01590.1168+/−0.01190.1168+/−0.01190.1184+/−0.0174
soybean0.0556+/−0.01910.0893+/−0.02440.0644+/−0.02050.0542+/−0.01840.0542+/−0.01840.0542+/−0.01840.0556+/−0.0191
credit-a0.1217+/−0.03090.1449+/−0.03030.1696+/−0.04170.1261+/−0.02100.1203+/−0.02510.1203+/−0.02510.1261+/−0.0292
breast-cancer-w0.0386+/−0.02750.0258+/−0.02230.0486+/−0.01810.0386+/−0.02480.0372+/−0.02350.0372+/−0.02350.0401+/−0.0274
pima-ind-diabetes0.2461+/−0.06550.2591+/−0.07070.2578+/−0.05830.2513+/−0.06360.2539+/−0.06630.2539+/−0.06630.2409+/−0.0584
vehicle0.3132+/−0.05330.4090+/−0.04770.3026+/−0.06270.3132+/−0.05630.3156+/−0.05770.3156+/−0.05770.3109+/−0.0565
anneal0.0601+/−0.02620.0891+/−0.02610.0445+/−0.01560.0735+/−0.02320.0646+/−0.02420.0646+/−0.02420.0512+/−0.0250
tic-tac-toe0.2724+/−0.04060.3069+/−0.04270.2463+/−0.03820.2683+/−0.04320.2724+/−0.04060.2724+/−0.04060.2683+/−0.0432
vowel0.1131+/−0.02740.4061+/−0.05570.2162+/−0.02720.0808+/−0.02960.1131+/−0.02740.1131+/−0.02740.0778+/−0.0283
german0.2520+/−0.04510.2520+/−0.03250.2660+/−0.06340.2410+/−0.05350.2490+/−0.04740.2490+/−0.04740.2450+/−0.0515
led0.2690+/−0.06210.2670+/−0.06220.2640+/−0.06030.2700+/−0.06040.2700+/−0.06040.2700+/−0.06040.2700+/−0.0630
contraceptive-mc0.4691+/−0.04530.4949+/−0.05340.4684+/−0.02760.4671+/−0.04550.4596+/−0.03940.4582+/−0.04040.4684+/−0.0439
yeast0.4239+/−0.03700.4245+/−0.05040.4394+/−0.03260.4205+/−0.04020.4218+/−0.03850.4225+/−0.03780.4245+/−0.0400
volcanoes0.3362+/−0.02870.3421+/−0.02780.3520+/−0.02580.3539+/−0.03310.3539+/−0.03400.3539+/−0.03400.3467+/−0.0292
car0.1053+/−0.02440.1400+/−0.02550.0567+/−0.01820.0845+/−0.01930.0909+/−0.01830.0920+/−0.01730.0793+/−0.0181
segment0.0515+/−0.00840.1476+/−0.02450.0567+/−0.01580.0563+/−0.00910.0550+/−0.00780.0550+/−0.00780.0519+/−0.0079
hypothyroid0.0278+/−0.01050.0360+/−0.01120.0338+/−0.01370.0348+/−0.01180.0294+/−0.01040.0297+/−0.01020.0278+/−0.0105
splice-c4.50.0318+/−0.00720.0444+/−0.01120.0482+/−0.01520.0375+/−0.00870.0387+/−0.01010.0387+/−0.01010.0334+/−0.0102
kr-vs-kp0.0569+/−0.01250.1214+/−0.02170.0544+/−0.01710.0854+/−0.01870.0582+/−0.01150.0582+/−0.01150.0573+/−0.0109
abalone0.4556+/−0.02060.4893+/−0.02490.4656+/−0.02370.4551+/−0.02140.4549+/−0.02120.4549+/−0.02120.4558+/−0.0208
spambase0.0602+/−0.01150.1050+/−0.01490.0702+/−0.01210.0635+/−0.01140.0606+/−0.01120.0602+/−0.01150.0646+/−0.0138
phoneme0.1843+/−0.01770.2615+/−0.01290.2120+/−0.01230.2100+/−0.01440.2008+/−0.01390.2010+/−0.01450.1863+/−0.0155
wall-following0.0843+/−0.00990.1743+/−0.01490.1043+/−0.00940.1514+/−0.01010.1503+/−0.00990.1503+/−0.00990.0845+/−0.0097
page-blocks0.0479+/−0.00750.1376+/−0.01260.0590+/−0.01020.0502+/−0.00660.0495+/−0.00620.0495+/−0.00620.0477+/−0.0077
optdigits0.0274+/−0.00830.0861+/−0.01240.0454+/−0.00700.0283+/−0.00950.0285+/−0.00930.0286+/−0.00930.0281+/−0.0087
satellite0.1175+/−0.01040.2022+/−0.01680.1392+/−0.01350.1301+/−0.01310.1298+/−0.01250.1298+/−0.01250.1175+/−0.0106
musk20.1115+/−0.01380.2496+/−0.01010.0867+/−0.00970.1511+/−0.01010.1520+/−0.00950.1514+/−0.00980.1097+/−0.0138
mushrooms0.0000+/−0.00000.0196+/−0.00360.0006+/−0.00090.0002+/−0.00050.0000+/−0.00000.0000+/−0.00000.0001+/−0.0004
thyroid0.2211+/−0.01260.2754+/−0.01520.2319+/−0.01460.2421+/−0.01360.2333+/−0.01290.2332+/−0.01280.2213+/−0.0104
pendigits0.0252+/−0.00290.1447+/−0.01120.0529+/−0.00660.0254+/−0.00290.0251+/−0.00290.0251+/−0.00290.0253+/−0.0029
sign0.2957+/−0.00830.3851+/−0.01140.3055+/−0.01400.2960+/−0.01190.2977+/−0.00900.2977+/−0.00900.2936+/−0.0110
nursery0.0713+/−0.00630.0973+/−0.00660.0654+/−0.00610.0733+/−0.00590.0708+/−0.00650.0708+/−0.00650.0707+/−0.0058
magic0.1825+/−0.00810.2478+/−0.01180.1759+/−0.01070.1726+/−0.00840.1825+/−0.00810.1825+/−0.00810.1721+/−0.0082
letter-recog0.1439+/−0.01070.3226+/−0.01100.1920+/−0.01120.1514+/−0.00890.1440+/−0.01050.1440+/−0.01050.1452+/−0.0089
adult0.1631+/−0.00470.1809+/−0.00500.1638+/−0.00440.1679+/−0.00320.1640+/−0.00480.1640+/−0.00470.1631+/−0.0050
shuttle0.0095+/−0.00120.0311+/−0.00220.0163+/−0.00120.0101+/−0.00100.0093+/−0.00100.0093+/−0.00100.0095+/−0.0012
connect-40.2407+/−0.00390.2783+/−0.00590.2406+/−0.00300.2422+/−0.00470.2408+/−0.00390.2407+/−0.00390.2421+/−0.0048
waveform0.0339+/−0.00090.0432+/−0.00180.0396+/−0.00210.0343+/−0.00080.0343+/−0.00090.0343+/−0.00090.0338+/−0.0009
localization0.4556+/−0.00330.5449+/−0.00260.4642+/−0.00400.4333+/−0.00270.4314+/−0.00360.4314+/−0.00360.4556+/−0.0033
census-income0.0555+/−0.00100.2410+/−0.00170.0667+/−0.00140.1106+/−0.00150.0990+/−0.00180.0990+/−0.00180.0555+/−0.0009
poker-hand0.3302+/−0.00220.4988+/−0.00180.3291+/−0.00120.4812+/−0.00280.1758+/−0.00790.1757+/−0.00780.3302+/−0.0022
donation0.0002+/−0.00000.0002+/−0.00000.0001+/−0.00000.0002+/−0.00000.0002+/−0.00000.0002+/−0.00000.0002+/−0.0000

Table A2. LogLoss.

Data Set | SWAODE | NB | KDB | AODE | WAODE-MI | WAODE-KL | SAODE
contact−lenses0.8874+/−0.84601.0171+/−0.83531.0277+/−0.70031.1270+/−0.83171.0118+/−0.82911.0015+/−0.81960.9293+/−0.8631
lung−cancer1.9531+/−1.67324.6187+/−7.03306.7035+/−4.97084.5050+/−6.54174.5673+/−6.47654.5719+/−6.46571.9683+/−1.6907
labor−negotiations0.2764+/−0.31960.1463+/−0.15630.5502+/−0.45650.2172+/−0.24910.2435+/−0.27990.2528+/−0.29130.2402+/−0.2765
post−operative1.1787+/−0.58781.2723+/−0.80201.2896+/−0.62861.2278+/−0.66531.2174+/−0.66981.2142+/−0.66891.1865+/−0.5906
zoo0.0801+/−0.09130.1111+/−0.08540.1624+/−0.16330.0803+/−0.08230.0746+/−0.07810.0753+/−0.07850.0803+/−0.0922
promoters0.1944+/−0.21490.3347+/−0.30330.9880+/−1.30470.3969+/−0.20830.4091+/−0.27380.4097+/−0.27360.1970+/−0.2263
echocardiogram0.9884+/−0.17350.9687+/−0.48701.5034+/−1.02671.0943+/−0.62941.1142+/−0.67901.1137+/−0.67670.9764+/−0.1816
lymphography0.6838+/−0.56280.6465+/−0.61710.8154+/−0.49960.5657+/−0.53030.5665+/−0.51470.5651+/−0.51170.6847+/−0.5765
iris0.2284+/−0.18850.3460+/−0.30110.2454+/−0.20430.2319+/−0.19960.2296+/−0.19260.2297+/−0.19270.2306+/−0.1897
teaching−ae2.1672+/−0.66692.1000+/−0.67562.1076+/−0.63951.9223+/−0.51511.9909+/−0.51811.9754+/−0.51722.1666+/−0.6675
hepatitis0.7173+/−0.55950.9701+/−0.81610.9867+/−0.63710.7285+/−0.57260.7432+/−0.60070.7414+/−0.60120.7980+/−0.6568
wine0.1567+/−0.19010.1304+/−0.19760.2670+/−0.23000.1314+/−0.17950.1325+/−0.17740.1327+/−0.17760.1204+/−0.1321
autos1.4860+/−1.91714.2030+/−2.74374.8262+/−4.23313.2524+/−3.23443.2625+/−3.29163.2552+/−3.29221.5943+/−1.8917
sonar1.1577+/−0.70191.6809+/−1.11931.8069+/−0.77651.0254+/−0.74771.2091+/−0.82761.0368+/−0.72481.1754+/−0.7230
glass−id0.7369+/−0.39841.0000+/−0.39150.9401+/−0.37180.6229+/−0.20040.6192+/−0.19780.6193+/−0.19750.7352+/−0.4003
new−thyroid0.3004+/−0.21330.2465+/−0.25260.3084+/−0.21550.2648+/−0.17620.2620+/−0.18010.2619+/−0.17990.3019+/−0.2100
audio2.2635+/−1.41493.9563+/−2.66285.3522+/−2.32743.9528+/−2.68793.9795+/−2.68233.9806+/−2.68282.2886+/−1.4082
hungarian0.5854+/−0.28650.8202+/−0.44670.7913+/−0.43610.6276+/−0.31110.5994+/−0.29000.5995+/−0.29020.6182+/−0.2790
heart−disease−c0.6624+/−0.27990.7119+/−0.36460.9289+/−0.48190.6468+/−0.30140.6434+/−0.29820.6433+/−0.29810.6548+/−0.2812
haberman0.7724+/−0.22100.7815+/−0.26140.8572+/−0.26110.8325+/−0.25850.8348+/−0.26580.8349+/−0.26590.7700+/−0.2192
primary−tumor2.8134+/−0.58052.9163+/−0.61533.3812+/−0.71862.8284+/−0.57532.8250+/−0.57772.8249+/−0.57762.8192+/−0.5803
ionosphere0.7014+/−0.40001.5528+/−0.99640.7280+/−0.64980.9810+/−0.55680.9590+/−0.54370.9591+/−0.54390.6841+/−0.3761
dermatology0.0762+/−0.07780.0588+/−0.06540.1170+/−0.09910.0624+/−0.06890.0615+/−0.06940.0616+/−0.06940.0890+/−0.0782
horse−colic0.6230+/−0.16801.2551+/−0.41641.2111+/−0.42580.8826+/−0.30600.8699+/−0.27240.8696+/−0.27180.6366+/−0.1638
house−votes−840.2481+/−0.23200.9110+/−0.43230.2866+/−0.20910.2513+/−0.26170.2402+/−0.26470.2402+/−0.26480.2500+/−0.2268
cylinder−bands1.9149+/−0.80501.6171+/−0.27452.9088+/−0.87031.1335+/−0.31561.1736+/−0.31681.1321+/−0.31671.9137+/−0.8063
chess0.3455+/−0.09480.4057+/−0.10430.3380+/−0.09310.3843+/−0.09560.3612+/−0.08570.3581+/−0.08410.3397+/−0.0791
syncon0.0911+/−0.07800.4910+/−0.41110.1593+/−0.16960.0907+/−0.06630.0888+/−0.06570.0888+/−0.06560.0908+/−0.0803
balance−scale0.8296+/−0.09750.7287+/−0.06910.8618+/−0.09780.8271+/−0.09870.8296+/−0.09750.8296+/−0.09750.8321+/−0.0948
soybean0.1860+/−0.06811.0345+/−0.52770.2515+/−0.16660.2741+/−0.09970.2596+/−0.09070.2596+/−0.09070.1860+/−0.0681
credit−a0.5354+/−0.18610.6433+/−0.22100.7901+/−0.25210.5482+/−0.18600.5379+/−0.17930.5377+/−0.17950.5231+/−0.1862
breast−cancer−w0.2096+/−0.19810.4577+/−0.44310.2955+/−0.28110.2209+/−0.20630.2183+/−0.20070.2181+/−0.20050.2141+/−0.1975
pima−ind−diabetes0.7112+/−0.13650.7868+/−0.17290.7983+/−0.20340.7293+/−0.15590.7312+/−0.14820.7311+/−0.14820.7065+/−0.1371
vehicle0.9724+/−0.13473.1607+/−0.61420.9929+/−0.18861.0031+/−0.15591.0077+/−0.15871.0076+/−0.15860.9761+/−0.1338
anneal0.2316+/−0.11240.5108+/−0.19700.1882+/−0.09530.2794+/−0.11460.2450+/−0.11270.2446+/−0.11260.2183+/−0.1098
tic−tac−toe0.7191+/−0.05430.7854+/−0.06160.7077+/−0.06800.6953+/−0.05420.7191+/−0.05430.7191+/−0.05430.6953+/−0.0542
vowel0.4498+/−0.12491.5849+/−0.19541.0296+/−0.16840.3227+/−0.10280.4498+/−0.12470.4504+/−0.12470.3176+/−0.1226
german0.7635+/−0.10020.7690+/−0.10400.8958+/−0.19540.7613+/−0.09830.7632+/−0.09800.7632+/−0.09810.7509+/−0.0999
led1.1813+/−0.18341.1759+/−0.18701.2015+/−0.18771.1806+/−0.18391.1805+/−0.18321.1805+/−0.18321.1816+/−0.1841
contraceptive−mc1.4203+/−0.08541.5016+/−0.12951.4185+/−0.08131.4044+/−0.08901.3988+/−0.08601.3988+/−0.08601.4233+/−0.0885
yeast1.6929+/−0.14521.7185+/−0.13701.8312+/−0.17351.6864+/−0.13621.6899+/−0.14111.6901+/−0.14121.6889+/−0.1430
volcanoes1.1081+/−0.06181.1167+/−0.07561.1341+/−0.07261.1177+/−0.07311.1353+/−0.08221.1353+/−0.08211.1170+/−0.0623
car0.3879+/−0.02770.4640+/−0.03400.2661+/−0.03210.3988+/−0.03230.3854+/−0.02990.3857+/−0.02990.3720+/−0.0310
segment0.2568+/−0.05661.0099+/−0.25860.2876+/−0.07070.2620+/−0.05990.2630+/−0.05540.2630+/−0.05540.2577+/−0.0570
hypothyroid0.0901+/−0.02630.1892+/−0.05250.1110+/−0.03930.1297+/−0.03770.0975+/−0.03020.0976+/−0.03020.0901+/−0.0263
splice−c4.50.1661+/−0.03500.2111+/−0.06130.2206+/−0.05750.1687+/−0.03950.1684+/−0.03850.1684+/−0.03850.1676+/−0.0367
kr−vs−kp0.2394+/−0.02290.4199+/−0.03390.2386+/−0.04570.3463+/−0.02910.2899+/−0.02250.2897+/−0.02250.2400+/−0.0225
abalone1.2628+/−0.03782.6815+/−0.27531.2791+/−0.03921.2643+/−0.03811.2629+/−0.03771.2629+/−0.03771.2642+/−0.0382
spambase0.3326+/−0.09270.8490+/−0.18670.3938+/−0.11510.3535+/−0.09580.3663+/−0.11430.3328+/−0.09270.3527+/−0.1057
phoneme0.9483+/−0.10881.4351+/−0.09361.3346+/−0.12521.1686+/−0.06631.1014+/−0.06901.1008+/−0.06910.9509+/−0.1096
wall−following0.2769+/−0.02381.6069+/−0.16490.5949+/−0.06911.1436+/−0.13291.1227+/−0.13291.1228+/−0.13290.2782+/−0.0238
page−blocks0.1968+/−0.04170.7670+/−0.09130.2991+/−0.08340.2219+/−0.04710.2179+/−0.04620.2179+/−0.04620.1967+/−0.0417
optdigits0.1853+/−0.07590.9326+/−0.15750.3560+/−0.10810.1942+/−0.07720.1917+/−0.07790.1917+/−0.07790.1865+/−0.0763
satellite0.6644+/−0.08415.3687+/−0.53791.0206+/−0.16390.8222+/−0.11420.8188+/−0.11390.8189+/−0.11390.6674+/−0.0853
musk20.3730+/−0.03016.9568+/−0.49791.5495+/−0.21193.9347+/−0.40823.7331+/−0.38373.9152+/−0.41210.3723+/−0.0298
mushrooms0.0003+/−0.00040.0913+/−0.02290.0019+/−0.00360.0005+/−0.00090.0003+/−0.00040.0003+/−0.00040.0004+/−0.0007
thyroid0.7717+/−0.04361.7390+/−0.18260.8803+/−0.07530.8960+/−0.06080.8424+/−0.05830.8423+/−0.05830.7733+/−0.0435
pendigits0.1204+/−0.01521.1452+/−0.09620.2674+/−0.04390.1204+/−0.01520.1203+/−0.01520.1203+/−0.01520.1205+/−0.0152
sign0.9674+/−0.01831.2576+/−0.02421.0335+/−0.03420.9621+/−0.01850.9674+/−0.01840.9674+/−0.01840.9560+/−0.0193
nursery0.3096+/−0.01110.3766+/−0.01210.2274+/−0.01200.3136+/−0.00960.3104+/−0.01090.3104+/−0.01090.2765+/−0.0108
magic0.5786+/−0.02480.7345+/−0.02960.5755+/−0.02010.5624+/−0.02440.5786+/−0.02480.5786+/−0.02480.5609+/−0.0234
letter−recog0.6486+/−0.03271.9090+/−0.06821.0277+/−0.05080.6935+/−0.03580.6486+/−0.03280.6486+/−0.03280.6521+/−0.0342
adult0.5264+/−0.01440.6728+/−0.02000.5035+/−0.01250.5614+/−0.01230.5407+/−0.01150.5407+/−0.01150.5281+/−0.0136
shuttle0.0506+/−0.00360.1404+/−0.00510.0592+/−0.00510.0540+/−0.00370.0496+/−0.00350.0496+/−0.00350.0512+/−0.0036
connect−40.8693+/−0.00590.9840+/−0.01020.8600+/−0.00810.8766+/−0.00560.8694+/−0.00590.8694+/−0.00590.8753+/−0.0056
waveform0.0993+/−0.00230.5733+/−0.02230.1312+/−0.01110.1015+/−0.00270.1012+/−0.00270.1012+/−0.00270.0992+/−0.0022
localization1.8528+/−0.00982.1440+/−0.00541.8267+/−0.01071.7891+/−0.00831.7824+/−0.00941.7824+/−0.00941.8528+/−0.0098
census−income0.2131+/−0.00271.9789+/−0.01720.2467+/−0.00580.4898+/−0.00620.4086+/−0.00500.4086+/−0.00500.2132+/−0.0027
poker−hand1.0977+/−0.00481.4158+/−0.00481.0821+/−0.00271.2089+/−0.00341.0865+/−0.00311.0865+/−0.00301.0977+/−0.0048
donation0.0006+/−0.00010.0009+/−0.00010.0004+/−0.00010.0007+/−0.00010.0007+/−0.00010.0007+/−0.00010.0005+/−0.0001

References

  1. Wu, X.; Kumar, V.; Quinlan, J.R.; Ghosh, J.; Yang, Q.; Motoda, H.; McLachlan, G.J.; Ng, A.; Liu, B.; Yu, P.S.; et al. Top 10 algorithms in data mining. Knowl. Inf. Syst. 2008, 14, 1–37. [Google Scholar] [CrossRef]
  2. Halbersberg, D.; Wienreb, M.; Lerner, B. Joint maximization of accuracy and information for learning the structure of a Bayesian network classifier. Mach. Learn. 2020, 109, 1039–1099. [Google Scholar] [CrossRef]
  3. Zhang, W.; Zhang, Z.; Chao, H.C.; Tseng, F.H. Kernel mixture model for probability density estimation in Bayesian classifiers. Data Min. Knowl. Discov. 2018, 32, 675–707. [Google Scholar] [CrossRef]
  4. Jiang, L.; Zhang, L.; Li, C.; Wu, J. A correlation-based feature weighting filter for naive Bayes. IEEE Trans. Knowl. Data Eng. 2019, 31, 201–213. [Google Scholar] [CrossRef]
  5. Webb, G.I.; Boughton, J.R.; Wang, Z. Not so naive Bayes: Aggregating one-dependence estimators. Mach. Learn. 2005, 58, 5–24. [Google Scholar] [CrossRef]
  6. Webb, G.I.; Boughton, J.R.; Zheng, F.; Ting, K.M.; Salem, H. Learning by extrapolation from marginal to full-multivariate probability distributions: Decreasingly Naive Bayesian classification. Mach. Learn. 2012, 86, 233–272. [Google Scholar] [CrossRef]
  7. Gelfand, A.E.; Dey, D.K. Bayesian model choice: Asymptotics and exact calculations. J. R. Stat. Soc. Ser. B 1994, 56, 501–514. [Google Scholar] [CrossRef]
  8. Chen, S.; Webb, G.I.; Liu, L.; Ma, X. A novel selective naïve Bayes algorithm. Knowl.-Based Syst. 2020, 192, 105361. [Google Scholar] [CrossRef]
  9. Dua, D.; Graff, C. UCI Machine Learning Repository. Available online: http://archive.ics.uci.edu/ml (accessed on 8 June 2024).
  10. Jiang, L.; Zhang, H. Weightily averaged one-dependence estimators. In Proceedings of the 9th Pacific Rim International Conference on Artificial Intelligence, Guilin, China, 7–11 August 2006; pp. 970–974. [Google Scholar]
  11. Jiang, L.; Zhang, H.; Cai, Z.; Wang, D. Weighted average of one-dependence estimators†. J. Exp. Theor. Artif. Intell. 2012, 24, 219–230. [Google Scholar] [CrossRef]
  12. Wu, J.; Pan, S.; Zhu, X.; Zhang, P.; Zhang, C. SODE: Self-adaptive one-dependence estimators for classification. Pattern Recognit. 2016, 51, 358–377. [Google Scholar] [CrossRef]
  13. Zheng, F.; Webb, G.I. Finding the right family: Parent and child selection for averaged one-dependence estimators. In Proceedings of the 18th European Conference on Machine Learning, Warsaw, Poland, 17–21 September 2007; pp. 490–501. [Google Scholar]
  14. Yang, Y.; Webb, G.I.; Cerquides, J.; Korb, K.B.; Boughton, J.; Ting, K.M. To select or to weigh: A comparative study of linear combination schemes for superparent-one-dependence estimators. IEEE Trans. Knowl. Data Eng. 2007, 19, 1652–1665. [Google Scholar] [CrossRef]
  15. Yang, Y.; Korb, K.; Ting, K.-M.; Webb, G. Ensemble selection for superparent-one-dependence estimators. In Proceedings of the 18th Australian Joint Conference on Artificial Intelligence, Sydney, Australia, 5–9 December 2005; pp. 102–111. [Google Scholar]
  16. Chen, S.; Martinez, A.M.; Webb, G.I. Highly Scalable Attribute Selection for Averaged One-Dependence Estimators; Springer: Berlin/Heidelberg, Germany, 2014; pp. 86–97. [Google Scholar]
  17. Friedman, N.; Geiger, D.; Goldszmidt, M. Bayesian network classifiers. Mach. Learn. 1997, 29, 131–163. [Google Scholar] [CrossRef]
  18. Sahami, M. Learning limited dependence Bayesian classifiers. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; ACM: New York, NY, USA, 1996; pp. 335–338. [Google Scholar]
  19. Chen, S.; Martínez, A.M.; Webb, G.I.; Wang, L. Sample-based attribute selective AnDE for large data. IEEE Trans. Knowl. Data Eng. 2017, 29, 172–185. [Google Scholar] [CrossRef]
  20. Witten, I.H.; Frank, E.; Trigg, L.; Hall, M.A.; Holmes, G.; Cunningham, S.J. Weka: Practical Machine Learning Tools and Techniques with Java Implementations. Acm. Sigmod. Record. 1999, 31, 76–77. [Google Scholar] [CrossRef]
  21. Chen, S.; Gao, X.; Zhuo, C.; Zhu, C. Research on Averaged One-Dependence Estimators Classification Algorithm Based on Divergence Weighting. J. Nanjing Univ. Sci. Technol. 2024, 48. [Google Scholar]

Figure 1. The relationship between LOOCV and 10-fold CV.

Figure 2. Scatter plot of ZOL.

Figure 3. Scatter plot of LogLoss.

Table 1. Table of symbols.

Symbol | Definition
D | Set of training data
n | Number of training samples
X, X_i | Variable representing an attribute
x, x_i | Value of attribute variable X or X_i
v_i, v_j | Number of values of attribute X_i or X_j
x = ⟨x_1, ..., x_d⟩ | Vector representing a sample
x_i | The ith sample
x_{i,j} | The jth value of the ith attribute
Y | Variable representing the category
y | A specific value of the random variable Y
y_i | True category of the ith sample
c | Number of categories
d | Number of attributes
m | Smoothing parameter
r | Number of attributes used as parent attributes
s | Number of attributes used as child attributes
w_j | Weight of the jth (j = 1, 2, ..., d) ODE
m′ | Threshold

Table 2. Frequency table with two attributes and two class variables.

 | | class_1 | class_2
X_2 = x_{2,1} | X_1 = x_{1,1} | F(x_{1,1}, x_{2,1}, class_1) | F(x_{1,1}, x_{2,1}, class_2)
 | X_1 = x_{1,2} | F(x_{1,2}, x_{2,1}, class_1) | F(x_{1,2}, x_{2,1}, class_2)
X_2 = x_{2,2} | X_1 = x_{1,1} | F(x_{1,1}, x_{2,2}, class_1) | F(x_{1,1}, x_{2,2}, class_2)
 | X_1 = x_{1,2} | F(x_{1,2}, x_{2,2}, class_1) | F(x_{1,2}, x_{2,2}, class_2)
X_2 = x_{2,3} | X_1 = x_{1,1} | F(x_{1,1}, x_{2,3}, class_1) | F(x_{1,1}, x_{2,3}, class_2)
 | X_1 = x_{1,2} | F(x_{1,2}, x_{2,3}, class_1) | F(x_{1,2}, x_{2,3}, class_2)

Table 3. Space of approximate models of WAODE with d attributes.

Parent | Children: x_1 ... x_s ... x_d
x_1 | P_WAODE(y, x)^{1,1} ... P_WAODE(y, x)^{1,s} ... P_WAODE(y, x)^{1,d}
... | ...
x_r | P_WAODE(y, x)^{r,1} ... P_WAODE(y, x)^{r,s} ... P_WAODE(y, x)^{r,d}
... | ...
x_d | P_WAODE(y, x)^{d,1} ... P_WAODE(y, x)^{d,s} ... P_WAODE(y, x)^{d,d}

Table 4. Datasets.

No. | Name | Inst | Att | Class
1 | contact-lenses | 24 | 4 | 3
2 | lung-cancer | 32 | 56 | 3
3 | labor-negotiations | 57 | 16 | 2
4 | post-operative | 90 | 8 | 3
5 | zoo | 101 | 16 | 7
6 | promoters | 106 | 57 | 2
7 | echocardiogram | 131 | 6 | 2
8 | lymphography | 148 | 18 | 4
9 | iris | 150 | 4 | 3
10 | teaching-ae | 151 | 5 | 3
11 | hepatitis | 155 | 19 | 2
12 | wine | 178 | 13 | 3
13 | autos | 205 | 25 | 7
14 | sonar | 208 | 60 | 2
15 | glass-id | 214 | 9 | 3
16 | new-thyroid | 215 | 5 | 3
17 | audio | 226 | 69 | 24
18 | hungarian | 294 | 13 | 2
19 | heart-disease-c | 303 | 13 | 2
20 | haberman | 306 | 3 | 2
21 | primary-tumor | 339 | 17 | 22
22 | ionosphere | 351 | 34 | 2
23 | dermatology | 366 | 34 | 6
24 | horse-colic | 368 | 21 | 2
25 | house-votes-84 | 435 | 16 | 2
26 | cylinder-bands | 540 | 39 | 2
27 | chess | 551 | 39 | 2
28 | syncon | 600 | 60 | 6
29 | balance-scale | 625 | 4 | 3
30 | soybean | 683 | 35 | 19
31 | credit-a | 690 | 15 | 2
32 | breast-cancer-w | 699 | 9 | 2
33 | pima-ind-diabetes | 768 | 8 | 2
34 | vehicle | 846 | 18 | 4
35 | anneal | 898 | 38 | 6
36 | tic-tac-toe | 958 | 9 | 2
37 | vowel | 990 | 13 | 11
38 | german | 1000 | 20 | 2
39 | led | 1000 | 7 | 10
40 | contraceptive-mc | 1473 | 9 | 3
41 | yeast | 1484 | 8 | 10
42 | volcanoes | 1520 | 3 | 4
43 | car | 1728 | 6 | 4
44 | segment | 2310 | 19 | 7
45 | hypothyroid | 3163 | 25 | 2
46 | splice-c4.5 | 3177 | 60 | 3
47 | kr-vs-kp | 3196 | 36 | 2
48 | abalone | 4177 | 8 | 3
49 | spambase | 4601 | 57 | 2
50 | phoneme | 5438 | 7 | 50
51 | wall-following | 5456 | 24 | 4
52 | page-blocks | 5473 | 10 | 5
53 | optdigits | 5620 | 64 | 10
54 | satellite | 6435 | 36 | 6
55 | musk2 | 6598 | 166 | 2
56 | mushrooms | 8124 | 22 | 2
57 | thyroid | 9169 | 29 | 20
58 | pendigits | 10,992 | 16 | 10
59 | sign | 12,546 | 8 | 3
60 | nursery | 12,960 | 8 | 5
61 | magic | 19,020 | 10 | 2
62 | letter-recog | 20,000 | 16 | 26
63 | adult | 48,842 | 14 | 2
64 | shuttle | 58,000 | 9 | 7
65 | connect-4 | 67,557 | 42 | 3
66 | waveform | 100,000 | 21 | 3
67 | localization | 164,860 | 5 | 11
68 | census-income | 299,285 | 41 | 2
69 | poker-hand | 1,025,010 | 10 | 10
70 | donation | 5,749,132 | 11 | 2

Table 5. Win/draw/loss of ZOL for SWAODE.

 | NB | KDB | AODE | WAODE-MI | WAODE-KL | SAODE
SWAODE | 52/5/13 | 52/2/16 | 39/8/23 | 34/12/24 | 32/14/24 | 27/22/21
NB | | 24/2/44 | 15/4/51 | 14/3/53 | 15/3/52 | 14/3/53
KDB | | | 21/3/46 | 21/1/48 | 20/2/48 | 15/3/52
AODE | | | | 20/15/35 | 20/13/37 | 21/8/41
WAODE-MI | | | | | 9/51/10 | 25/8/37
WAODE-KL | | | | | | 26/7/37

Table 6. Win/draw/loss of LogLoss for SWAODE.

 | NB | KDB | AODE | WAODE-MI | WAODE-KL | SAODE
SWAODE | 60/0/10 | 56/0/14 | 48/1/21 | 42/7/21 | 42/6/22 | 40/4/26
NB | | 28/0/42 | 11/0/59 | 11/0/59 | 11/0/59 | 10/0/60
KDB | | | 18/0/52 | 18/0/52 | 18/0/52 | 12/0/58
AODE | | | | 25/1/44 | 22/1/47 | 17/2/51
WAODE-MI | | | | | 19/28/23 | 24/0/46
WAODE-KL | | | | | | 26/0/44

Table 7. Ablation studies.

 | | WAODE-MI | SAODE
ZOL | SWAODE | 34/12/24 | 27/22/21
 | WAODE-MI | | 25/8/37
LogLoss | SWAODE | 42/7/21 | 40/4/26
 | WAODE-MI | | 24/0/46

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).