Research article
First published online March 24, 2020

The random forest algorithm for statistical learning

Abstract

Random forests (Breiman, 2001, Machine Learning 45: 5–32) is a statistical- or machine-learning algorithm for prediction. In this article, we introduce a corresponding new command, rforest. We overview the random forest algorithm and illustrate its use with two examples: The first example is a classification problem that predicts whether a credit card holder will default on his or her debt. The second example is a regression problem that predicts the log-scaled number of shares of online news articles. We conclude with a discussion that summarizes key points demonstrated in the examples.

1 Introduction

In recent years, the use of statistical- or machine-learning algorithms has increased in the social sciences.1 For instance, to predict economic recession, Nyman and Ormerod (2017) compared ordinary least-squares regression results with random forest regression results and obtained a considerably higher adjusted R-squared value with random forest regression than with ordinary least-squares regression. In economics, a recent book overviews various statistical-learning algorithms for predicting economic growth and recession (Basuchoudhary, Bang, and Sen 2017). In environmental science, a recent article used learning algorithms, including least absolute shrinkage and selection operator regression, random forest, and neural networks, to predict ragweed pollen concentration based on 27 years of historical data and 85 predictor variables, with the best predictive performance obtained using random forest (Liu et al. 2017).
Why does random forest do better than linear regression for prediction tasks? Linear regression makes the assumption of linearity. This assumption makes the model easy to interpret but is often not flexible enough for prediction. Random decision forests easily adapt to nonlinearities found in the data and therefore tend to predict better than linear regression. More specifically, ensemble learning algorithms like random forests are well suited for medium to large datasets. When the number of independent variables is larger than the number of observations, linear regression and logistic regression algorithms will not run, because the number of parameters to be estimated exceeds the number of observations. Random forest still works in this setting because each split considers only a random subset of the predictor variables.
Random forest is one of the best-performing learning algorithms. For social scientists, such developments in algorithms are useful only to the extent that they can access an implementation of the algorithm. In this article, we introduce rforest, a command for random forests developed by the authors that is built on the Weka library (Witten et al. 2016; Hall et al. 2009).
The outline of this article is as follows: In section 2, we briefly discuss the random forest algorithm. In section 3, we give the syntax of the rforest command. In section 4, we give an example for predicting whether a given credit card user will default on his or her debt. In section 5, we give an example for estimating the log-scaled number of shares of online news articles. In section 6, we conclude with a discussion.

2 The random forest algorithm

We first discuss tree-based models because they form the building blocks of the random forest algorithm. A tree-based model involves recursively partitioning the given dataset into two groups based on a certain criterion until a predetermined stopping condition is met. At the bottom of decision trees are so-called leaf nodes or leaves.
Figure 1 illustrates a recursive partitioning of a two-dimensional input space with axis-aligned boundaries; that is, each time, the input space is partitioned in a direction parallel to one of the axes. Here the first split occurred on x_2 ≥ a_2. Then, the two subspaces were again partitioned: The left branch was split on x_1 ≥ a_4. The right branch was first split on x_1 ≥ a_1, and one of its subbranches was split on x_2 > a_3. Figure 2 is a graphical representation of the subspaces partitioned in figure 1.
Figure 1. Recursive binary partition of a two-dimensional input space
Figure 2. A graphical representation of the decision tree in figure 1
Depending on how the partition and stopping criteria are set, decision trees can be designed for both classification tasks (categorical outcome, for example, logistic regression) and regression tasks (continuous outcome).
For both classification and regression problems, the subset of predictor variables selected to split an internal node depends on predetermined splitting criteria that are formulated as an optimization problem. A common splitting criterion in classification problems is entropy, which is the practical application of Shannon’s (2001) source coding theorem that specifies the lower bound on the length of a random variable’s bit representation. At each internal node of the decision tree, entropy is given by the formula
E = -\sum_{i=1}^{c} p_i \log(p_i)
where c is the number of unique classes and p_i is the prior probability of each given class. The reduction in entropy (the information gain) is maximized at every split of the decision tree. For regression problems, a commonly used splitting criterion is the mean squared error at each internal node.
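To make the splitting criterion concrete, here is a small illustrative sketch in Python (a conceptual illustration, not the rforest implementation) of node entropy and the information gain of a candidate split:

```python
import math

def entropy(labels):
    """Shannon entropy of the class labels at a node: E = -sum_i p_i * log2(p_i)."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(parent, left, right):
    """Reduction in entropy achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# A 50-50 binary node carries 1 bit of entropy; a perfect split removes all of it.
print(entropy([0, 0, 1, 1]))                            # 1.0
print(information_gain([0, 0, 1, 1], [0, 0], [1, 1]))   # 1.0
```

At each internal node, the tree-growing procedure evaluates candidate splits and keeps the one with the largest information gain.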
A drawback of decision trees is that they are prone to overfitting, which means that the model follows the idiosyncrasies of the training dataset too closely and therefore performs poorly on new data (that is, the test data). Overfitting decision trees will lead to low general predictive accuracy, also referred to as generalization accuracy.
One way to increase generalization accuracy is to consider only a subset of the observations and build many individual trees. First introduced by Ho (1995), this idea of the random-subspace method was later extended and formally presented as the random forest by Breiman (2001). The random forest model is an ensemble tree-based learning algorithm; that is, the algorithm averages predictions over many individual trees. The individual trees are built on bootstrap samples rather than on the original sample. This is called bootstrap aggregating or simply bagging, and it reduces overfitting. The algorithm is as follows:
Algorithm 1. Random forest algorithm
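The bootstrap-aggregating step can be sketched in a few lines of Python (an illustration of the idea, not the Weka implementation):

```python
import random

def bootstrap_sample(n, rng):
    """Draw n observation indices with replacement (one bootstrap sample);
    the indices never drawn form the out-of-bag (oob) set for this tree."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = set(range(n)) - set(in_bag)
    return in_bag, oob

def bagged_prediction(tree_predictions):
    """Bagging: average the predictions of the individual trees."""
    return sum(tree_predictions) / len(tree_predictions)

rng = random.Random(12345)                  # seed for reproducibility
in_bag, oob = bootstrap_sample(1000, rng)
print(bagged_prediction([1.0, 2.0, 3.0]))   # 2.0
```

Each tree is grown on its own in-bag indices; the in-bag and oob sets together cover all observations, and averaging over trees is what reduces the overfitting of any single tree.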
Individual decision trees are easily interpretable, but this interpretability is lost in random forests because many decision trees are aggregated. However, in exchange, random forests often perform much better on prediction tasks.
The random forest algorithm more accurately estimates the error rate compared with decision trees. More specifically, the error rate has been mathematically proven to always converge as the number of trees increases (Breiman 2001).
The error of the random forest is approximated by the out-of-bag (oob) error during the training process. Each tree is built on a different bootstrap sample. Each bootstrap sample randomly leaves out about one-third of the observations. These left-out observations for a given tree are referred to as the oob sample. Finding parameters that would produce a low oob error is often a key consideration in model selection and parameter tuning. Note that in the random forest algorithm, the size of the subset of predictor variables, m, is crucial to controlling the final depth of the trees. Hence, it is a parameter that needs to be tuned during model selection, which will be discussed in the examples later.
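The "about one-third" figure follows from the bootstrap itself: the chance that a given observation is never drawn in n draws with replacement is (1 - 1/n)^n, which approaches 1/e ≈ 0.368. A quick check in Python:

```python
import math

def expected_oob_fraction(n):
    """Probability that a given observation never appears in a
    bootstrap sample of size n drawn with replacement: (1 - 1/n)^n."""
    return (1 - 1 / n) ** n

# The fraction approaches 1/e as n grows, i.e., roughly one-third
# of the observations are out of bag for any given tree.
print(expected_oob_fraction(10))       # ≈ 0.3487
print(expected_oob_fraction(30000))    # ≈ 0.3679
print(math.exp(-1))                    # ≈ 0.3679
```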
To gain some insight on the complex model, we calculate the so-called variable importance of each variable. This is calculated by adding up the improvement in the objective function given in the splitting criterion over all internal nodes of a tree and across all trees in the forest, separately for each predictor variable. In the Stata implementation of random forest, the variable importance score is normalized by dividing all scores over the maximum score: the importance of the most important variable is always 100%.
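The normalization step can be expressed in Python as follows (illustrative only; the variable names and raw scores below are hypothetical):

```python
def normalize_importance(raw_scores):
    """Rescale raw importance scores so the most important variable is 100%."""
    top = max(raw_scores.values())
    return {var: 100 * s / top for var, s in raw_scores.items()}

# Hypothetical raw scores (sums of splitting-criterion improvements):
raw = {"limit_bal": 84, "sex": 42, "age": 21}
print(normalize_importance(raw))   # {'limit_bal': 100.0, 'sex': 50.0, 'age': 25.0}
```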

3 Syntax

The syntax to fit a random forest model is
rforest depvar indepvars [if] [in] [, type(string) iterations(int)
    numvars(int) depth(int) lsize(int) variance(real) seed(int)
    numdecimalplaces(int)]
with the following postestimation command:
predict {newvar | varlist | stub*} [if] [in] [, pr]

4 Example: Credit card default

Yeh and Lien (2009) and Dheeru and Karra Taniskidou (2017) investigated the predictive accuracy of the probability of default of credit card clients. There are a total of 30,000 observations, 1 response variable, 22 explanatory variables, and no missing values. The response variable is a binary variable that encodes whether the card holder will default on his or her debt, with 0 encoded as “no default” and 1 encoded as “default”. Of the 22 explanatory variables, 10 are categorical variables containing information such as gender, education, marital status, and whether past payments have been made on time or delayed. The remaining 12 continuous explanatory variables contain information on the monthly bill amount and payment amount over 6 months. For a complete list of variables, please refer to appendix A.
In this example, we will investigate the predominant factors that affect credit card default prediction accuracy, and we will contrast the prediction accuracies obtained using random forest and logistic regression.

4.1 Model training and parameter tuning

To start the model-training process, we arrange the data points in a randomly sorted order. When the data are split into training and test data, a random sort order ensures that the training data are random as well. To allow for reproducible results, we set a seed value. Then, we split the dataset into two subsets: 50% of the data are used for training, and 50% of the data are used for testing (validation). In small datasets, a 50-50 split may reduce the size of the training data too much; for this relatively large dataset, a 50-50 split is not problematic. The randomization process mentioned previously ensures that the training data contain observations belonging to all available classes as long as the class probabilities are not heavily imbalanced. Additionally, it removes the model's potential dependency on the ordering of observations relative to the test data. Finally, because the variable for marital status uses the values 0, 1, 2, and 3 to encode unordered categorical information, we create four new binary indicator variables, one for each marital status, using the command tabulate marriage, generate(marriage_enum). The fourth indicator variable is redundant, but this does not matter to tree-based algorithms like rforest.
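The preprocessing steps above (shuffling with a fixed seed, a 50-50 split, and indicator encoding) can be illustrated in Python; this is a conceptual sketch, not the Stata session:

```python
import random

def shuffle_and_split(rows, seed, train_frac=0.5):
    """Randomly reorder the observations, then split into training and test sets."""
    rng = random.Random(seed)          # fixed seed for reproducible results
    rows = rows[:]                     # leave the caller's list untouched
    rng.shuffle(rows)
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

def indicators(value, categories):
    """One binary indicator variable per category, the analogue of
    tabulate marriage, generate(marriage_enum) in Stata."""
    return [1 if value == c else 0 for c in categories]

train, test = shuffle_and_split(list(range(30000)), seed=42)
print(len(train), len(test))           # 15000 15000
print(indicators(2, [0, 1, 2, 3]))     # [0, 0, 1, 0]
```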
Next, we tune the hyperparameters to find the model with the highest testing accuracy. Specifically, we tune the number of iterations (that is, the number of subtrees) and the number of variables to randomly investigate at each split, numvars(). The following code segment iteratively calculates the oob prediction accuracy as a function of the number of iterations and numvars(). The number of iterations starts at 10 and is incremented by 5 each time until it reaches 500. We will use both an oob error (tested against the training observations not included in a given subtree's construction) and a validation error (tested against the testing data) to determine the best possible model.
Usually, tuning parameters in statistical-learning models requires a grid search, that is, an exhaustive search on a user-specified subspace of hyperparameter values. In this case, however, because random forest oob error rates converge after the number of iterations gets large enough, we simply need to set the iterations to a value large enough for convergence to have occurred prior to tuning the numvars() parameter.
To illustrate how the oob error and validation error have similar trends as the number of iterations grows, we call the random forest function iteratively. The number of iterations variable is initialized to 10 and increments by 5 per function call until it reaches 500. Finally, the trends of oob error and validation error can be visualized by plotting those values against the number of iterations, as shown in figure 3.
The stable option ensures that the result replicates even if there are ties on the sort variable. The number of variables is investigated below; for simplicity, we set numvars(1) here.
Figure 3. oob error and validation error versus iterations plot
We can see from figure 3, generated by the above code block, that both the oob error and the validation error stabilize at around 19%. Hence, fixing the number of iterations at 500 is a good choice.
Next, we can tune the hyperparameter numvars():
Figure 4. oob error and validation error versus number of variables plot
In figure 4, we can see for how many variables the minimum error occurs. The following code automates finding the minimum error and the corresponding number of variables. (This code uses frames and requires Stata 16.)
We can see that at numvars(18), we get the lowest validation error at 0.1824. Hence, we will use numvars(18) for our final model.
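Conceptually, selecting numvars() amounts to taking the argmin over the recorded validation errors. A Python sketch (the error values other than 0.1824 at 18 variables are hypothetical):

```python
def best_hyperparameter(errors):
    """Return the candidate value with the smallest validation error.
    `errors` maps a numvars() candidate to its recorded validation error."""
    best = min(errors, key=errors.get)
    return best, errors[best]

# Hypothetical error curve; only the minimum (0.1824 at 18) comes from the text.
errors = {6: 0.1861, 12: 0.1843, 18: 0.1824, 24: 0.1830}
print(best_hyperparameter(errors))     # (18, 0.1824)
```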
In principle, the random forest algorithm can output an oob error at each iteration. However, the Weka implementation of random forest used for the Stata plugin does not output running calculations of the oob error as the algorithm runs; instead, it outputs only one final oob error for the total number of iterations. This means that tuning the iterations parameter requires running the random forest algorithm k times for every value of iterations(k). To make this process efficient, we set minimum and maximum values and a reasonable increment to see the trend of the oob error over increasing iterations.

4.2 Final model and interpretation of results

As shown in the previous section, we have set the values of the hyperparameters to be iterations(500) and numvars(18). Because convergence was reached by 500 iterations, we are free to set the number of iterations even higher; out of an abundance of caution, we set iterations(1000). The following code block gives the final model and prediction error:
The final oob error is 18.25%, which is slightly larger than the actual prediction error of 18.24%, calculated over the 15,000 test observations. We can see from figures 3 and 4 that the oob error and the validation error follow the same pattern when plotted against the two hyperparameters, iterations and number of variables.
We also would like to ascertain which factors are the most important in the prediction process. Random forests are black boxes in that they do not offer insight on how the predictions are accomplished. The variable-importance scores of each predictor provide some limited insight. The following code segment plots the variable importance:
We can see from figure 5 that the five most important predictors are basic demographic and background information such as gender, education, and marital status ("married" and "single") as well as the monthly spending limit (limit_bal). We can also see that none of the variables encoding monthly bill amounts (bill_amt) is particularly important, compared with the rest of the predictors. Surprisingly, however, the monthly spending limit (limit_bal) is the third most important predictor in the random forest model. We can overlay two histograms of the monthly spending limit to obtain more insight on how this variable affects the response variable:
Figure 5. Importance scores of predictor variables
We can see from the histograms in figure 6 that card holders who default on their debt generally have a lower monthly spending limit than those who do not default. Variable importance measures the contribution of an x variable to the model, but it depends on the full set of x variables: another x variable correlated with the first would rise in importance if the first x variable were excluded.
Figure 6. Histograms of monthly spending limit

4.3 Comparison with logistic regression

Alternatively, credit card debt default can be modeled using logistic regression. The following code returns the prediction accuracy of logistic regression using the same set of predictor variables and the same train-and-test split:
The prediction error obtained using logistic regression is 18.86%, compared with the best-so-far error rate that we have from random forest, which is 18.25%. The difference in error rate is small but might still be meaningful to prevent credit card defaults.

5 Example: Online news popularity

Fernandes et al. (2015) and Dheeru and Karra Taniskidou (2017) investigated the popularity of online news.2 The data were originally presented at a Portuguese conference on artificial intelligence in 2015. There are a total of 39,644 observations, 1 response variable, and 58 explanatory variables. For this problem, we are interested in the log-scaled number of “shares” an online article obtains based on various nominal and continuous attributes such as whether the article was published on a weekend, whether certain keywords are present, number of images in the article, etc. For a full list of variable names and descriptions, please refer to appendix B.

5.1 Model training and parameter tuning

First, we need to randomize the data as we did for the previous classification example. Then, we generate a new variable for the log-scaled number of shares:
We will use a 50-50 split to partition the data into training and testing sets as in the previous example. To tune the hyperparameters numvars() and iterations(), we use the same technique as in the previous example, where we fix the value of one hyperparameter while tuning the other. This is a viable parameter-optimization method because the error rate for random forest converges once the number of iterations is large enough. Essentially, our goal is to set a reasonably large number of iterations at which the oob and validation errors have converged so that, when we tune the number of randomly selected variables, we can ascertain that the errors differ because of the value of numvars() and not because of iterations(). We will again start with iterations(10) and increase by increments of 5 until iterations(100), approximately the largest value for which this dataset will run on a CPU, given constraints on runtime memory. At the end of the loop, we plot the oob errors and the actual root mean squared error (rmse) values, validated using the test data, against the number of iterations.
Figure 7. oob error and validation rmse versus iterations plot
We can see from the graph that the oob error and validation rmse start to converge around 80 iterations. We get the lowest value for both errors at 100 iterations, which will be used for the final model. Now we can tune the other hyperparameter, numvars(), to see which one gives the lowest validation rmse.
Figure 8. oob error and validation error versus number of variables plot
Again, we automate finding the minimum error:
For numvars(6), we get the lowest validation error at 0.8570. Hence, we will use numvars(6) for our final model. For this dataset, the model is fairly robust to changes in the number of variables, numvars(), and numvars(6) has only a slight edge compared with other values. This might not always be the case.

5.2 Final model and interpretation of results

The final model has hyperparameter values numvars(6) and iterations(100).
The final oob error is 0.6436. This is lower than the rmse calculated against the testing data, which is 0.8570. To learn which variables affect the prediction accuracy, we can generate a variable-importance plot using the same code segment as in the previous classification example. For readability, only variables whose importance score is at least 40% of that of the most important variable are shown.
Figure 9. Importance score of predictor variables
Whether the article was published on a weekend is the most important predictor. Other important explanatory variables include news channel types and the number of keywords. To obtain more insight on how the log-scaled number of article shares is related to whether the article was published on a weekend, we use the following histogram to illustrate the relationship:
Figure 10. Histograms of log-scaled number of shares
The empirical distributions of log number of shares differ for weekdays versus weekends. This clear shift in empirical distribution helps to explain why the is_weekend explanatory variable was the most important in the model.

5.3 Comparison with linear regression

The following code block fits a linear regression model over the same set of dependent and independent variables using the same train-and-test split as shown in the random forest model:
The value of e(rmse) displayed is the rmse calculated over the training data. To compare the linear model with the random forest model, we need to calculate the rmse over the testing data using the following commands:
We can see from the output that the mean squared error is 40.90379, which means the rmse is equal to √40.90379 = 6.3956, which is much higher than the rmse fitted over the training data. The testing rmse for the linear model is also much higher than the testing rmse obtained from the random forest model. This is a strong indication that random forest outperforms linear regression for this example.
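The conversion from the reported mean squared error to the rmse is simply a square root; for example:

```python
import math

def rmse_from_mse(mse):
    """The root mean squared error is the square root of the mean squared error."""
    return math.sqrt(mse)

print(round(rmse_from_mse(40.90379), 4))   # 6.3956
```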

6 Discussion

The classification and regression examples have illustrated that random forest models usually have higher prediction accuracy than corresponding parametric models such as logistic regression and linear regression. Typically, greater gains in model performance are available for multiclass (multinomial) outcomes and regression than for binary outcomes. Misclassification is a fairly insensitive performance criterion: when an improved algorithm changes the estimated classification probabilities for two classes from p1 = 0.10 and p2 = 0.90 to p1 = 0.40 and p2 = 0.60 for an observation, the resulting classification remains the same. An improvement over logistic regression with its linearity assumption can come either from nonlinearities or from interactions. Additionally, the scope of improvement is reduced when many of the variables are indicator variables; nonlinearities do not exist for indicator variables. In our experience, many of the variables in social sciences are indicator variables. For example, Ing et al. (2019) found that support-vector machines did not improve over logistic regression. Similarly, in our classification example, the improvement of random forest over logistic regression was minor.
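The insensitivity of misclassification can be checked directly: the predicted class is the argmax of the estimated class probabilities, so both probability vectors in the example above yield the same classification. A Python illustration:

```python
def predicted_class(p):
    """Classify to the class with the largest estimated probability (argmax)."""
    return max(range(len(p)), key=lambda i: p[i])

# The probabilities change substantially, but the classification does not:
print(predicted_class([0.10, 0.90]))   # 1
print(predicted_class([0.40, 0.60]))   # 1
```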
In the examples, the values of hyperparameters were determined based on which value gave the lowest testing error. In practice, when there are not enough observations to allow for a train-and-test split, the oob error can be used instead. As previously demonstrated, the oob error is a close estimation of the actual testing error and can be used on its own as a criterion for parameter tuning.
While the two examples primarily focused on the typical case of tuning the options iterations() and numvars(), depending on the dataset and software constraints, other hyperparameters such as max tree depth and minimum size of leaf nodes could be taken into consideration during parameter tuning. For instance, setting the max tree depth to a fixed value may become necessary on a machine with limited RAM.

7 Acknowledgments

The software development in Stata was built on top of the Weka Java implementation, which was developed by the University of Waikato. We are grateful to Eibe Frank for allowing us to use the Weka implementation for the plugin.

Footnotes

This research was supported by the Social Sciences and Humanities Research Council of Canada (# 435-2013-0128).
1. “Statistical learning” and “machine learning” are synonymous. We use “statistical learning” for the remainder of the article.
2. To access the exact dataset used in this example, please visit https://archive.ics.uci.edu/ml/datasets/Online+News+Popularity
3. The values of the variables pay_0–pay_6 correspond to number of months delayed.

9 References

Basuchoudhary A., Bang J. T., and Sen T. 2017. Machine-Learning Techniques in Economics: New Tools for Predicting Economic Growth. New York: Springer.
Breiman L. 2001. Random forests. Machine Learning 45: 5–32.
Dheeru D., and Karra Taniskidou E. 2017. Default of credit card clients dataset. https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset.
Fernandes K., Vinagre P., and Cortez P. 2015. A proactive intelligent decision support system for predicting the popularity of online news. In Progress in Artificial Intelligence: 17th Portuguese Conference on Artificial Intelligence, EPIA 2015, Coimbra, Portugal, September 8–11, 2015, Proceedings, 535–546. New York: Springer.
Hall M., Frank E., Holmes G., Pfahringer B., Reutemann P., and Witten I. H. 2009. The WEKA data mining software: An update. ACM SIGKDD Explorations Newsletter 11(1): 10–18.
Ho T. K. 1995. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, 278–282. Piscataway, NJ: IEEE.
Ing E., Su W., Schonlau M., and Torun N. 2019. Support vector machines and logistic regression to predict temporal artery biopsy outcomes. Canadian Journal of Ophthalmology 54: 116–118.
Liu X., Wu D., Zewdie G. K., Wijerante L., Timms C. I., Riley A., Levetin E., and Lary D. J. 2017. Using machine learning to estimate atmospheric Ambrosia pollen concentrations in Tulsa, OK. Environmental Health Insights 11: 1–10.
Nyman R., and Ormerod P. 2017. Predicting economic recessions using machine learning algorithms. ArXiv Working Paper No. arXiv:1701.01428. https://arxiv.org/abs/1701.01428.
Shannon C. E. 2001. A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review 5: 3–55.
Witten I. H., Frank E., Hall M. A., and Pal C. J. 2016. The WEKA workbench online appendix. In Data Mining: Practical Machine Learning Tools and Techniques, 4th ed. Burlington, MA: Morgan Kaufmann.
Yeh I.-C., and Lien C.-H. 2009. The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications 36: 2473–2480.

A Variable names for classification example

The column names from the variables limit_bal through defaultpaymentnextmonth appear as they do in the original documentation on UCI Machine Learning Repository’s website.
Variable name: Description
id: Row number
limit_bal: Amount of the given credit (NT dollar); includes both the individual consumer credit and his/her family (supplementary) credit
sex: Gender (1 = male, 2 = female)
education: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others)
marriage: Marital status (1 = married; 2 = single; 3 = others)
age: Age (year)
pay_0: Repayment status in September, 2005
pay_2: Repayment status in August, 2005
pay_3: Repayment status in July, 2005
pay_4: Repayment status in June, 2005
pay_5: Repayment status in May, 2005
pay_6: Repayment status in April, 2005
bill_amt1: Amount of bill statement in September, 2005
bill_amt2: Amount of bill statement in August, 2005
bill_amt3: Amount of bill statement in July, 2005
bill_amt4: Amount of bill statement in June, 2005
bill_amt5: Amount of bill statement in May, 2005
bill_amt6: Amount of bill statement in April, 2005
pay_amt1: Amount of previous payment in September, 2005
pay_amt2: Amount of previous payment in August, 2005
pay_amt3: Amount of previous payment in July, 2005
pay_amt4: Amount of previous payment in June, 2005
pay_amt5: Amount of previous payment in May, 2005
pay_amt6: Amount of previous payment in April, 2005
defaultpaymentnextmonth: Default payment (Yes = 1, No = 0); the response variable
marriage_enum1: Marital status is not "married", "single", or "other"; generated during data preprocessing
marriage_enum2: Marital status = "married"; generated during data preprocessing
marriage_enum3: Marital status = "single"; generated during data preprocessing
marriage_enum4: Marital status = "other"; generated during data preprocessing

B Variable names for regression example

The column names in this table are reproduced based on the original documentation on UCI Machine Learning Repository’s website.
Variable name: Description
url: URL of the article (non-predictive)
timedelta: Days between the article publication and the dataset acquisition (non-predictive)
n_tokens_title: Number of words in the title
n_tokens_content: Number of words in the content
n_unique_tokens: Rate of unique words in the content
n_non_stop_words: Rate of non-stop words in the content
n_non_stop_unique_tokens: Rate of unique non-stop words in the content
num_hrefs: Number of links
num_self_hrefs: Number of links to other articles published by Mashable
num_imgs: Number of images
num_videos: Number of videos
average_token_length: Average length of the words in the content
num_keywords: Number of keywords in the metadata
data_channel_is_lifestyle: Is data channel 'Lifestyle'?
data_channel_is_entertainment: Is data channel 'Entertainment'?
data_channel_is_bus: Is data channel 'Business'?
data_channel_is_socmed: Is data channel 'Social Media'?
data_channel_is_tech: Is data channel 'Tech'?
data_channel_is_world: Is data channel 'World'?
kw_min_min: Worst keyword (min. shares)
kw_max_min: Worst keyword (max. shares)
kw_avg_min: Worst keyword (avg. shares)
kw_min_max: Best keyword (min. shares)
kw_max_max: Best keyword (max. shares)
kw_avg_max: Best keyword (avg. shares)
kw_min_avg: Avg. keyword (min. shares)
kw_max_avg: Avg. keyword (max. shares)
kw_avg_avg: Avg. keyword (avg. shares)
self_reference_min_shares: Min. shares of referenced articles in Mashable
self_reference_max_shares: Max. shares of referenced articles in Mashable
self_reference_avg_sharess: Avg. shares of referenced articles in Mashable
weekday_is_monday: Was the article published on a Monday?
weekday_is_tuesday: Was the article published on a Tuesday?
weekday_is_wednesday: Was the article published on a Wednesday?
weekday_is_thursday: Was the article published on a Thursday?
weekday_is_friday: Was the article published on a Friday?
weekday_is_saturday: Was the article published on a Saturday?
weekday_is_sunday: Was the article published on a Sunday?
is_weekend: Was the article published on the weekend?
LDA_00: Closeness to LDA topic 0
LDA_01: Closeness to LDA topic 1
LDA_02: Closeness to LDA topic 2
LDA_03: Closeness to LDA topic 3
LDA_04: Closeness to LDA topic 4
global_subjectivity: Text subjectivity
global_sentiment_polarity: Text sentiment polarity
global_rate_positive_words: Rate of positive words in the content
global_rate_negative_words: Rate of negative words in the content
rate_positive_words: Rate of positive words among non-neutral tokens
rate_negative_words: Rate of negative words among non-neutral tokens
avg_positive_polarity: Avg. polarity of positive words
min_positive_polarity: Min. polarity of positive words
max_positive_polarity: Max. polarity of positive words
avg_negative_polarity: Avg. polarity of negative words
min_negative_polarity: Min. polarity of negative words
max_negative_polarity: Max. polarity of negative words
title_subjectivity: Title subjectivity
title_sentiment_polarity: Title polarity
abs_title_subjectivity: Absolute subjectivity level
abs_title_sentiment_polarity: Absolute polarity level
shares: Number of shares (target)

Biographies

Matthias Schonlau is a professor of statistics at the University of Waterloo, Canada. His interests include survey methodology and learning from text data in the context of open-ended questions.
Rosie Yuyan Zou is a recent B.CS graduate from the University of Waterloo. She will be joining Apple, Inc. as a software design engineer. Her academic passions include applied machine learning and software system design for digital hardware.

Supplementary Material

To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type


Information, rights and permissions

Information

Published In

Article first published online: March 24, 2020
Issue published: March 2020

Keywords

  1. st0587
  2. rforest
  3. random decision forest algorithm

Rights and permissions

© 2020 StataCorp LLC.

Authors

Affiliations

Matthias Schonlau
University of Waterloo, Waterloo, Canada, [email protected]
Rosie Yuyan Zou
University of Waterloo, Waterloo, Canada, [email protected]

Metrics

This article was published in The Stata Journal: Promoting communications on statistics and Stata.

  274. Acoustic detectability of whales amidst underwater noise off the west ...
    Go to citation Crossref Google Scholar
  275. Analysis of prerequisite relation in knowledge graph using ElasticNet(...
    Go to citation Crossref Google Scholar
  276. Texture Features in Prediction of Bread Edibility
    Go to citation Crossref Google Scholar
  277. Integrative quantitative and qualitative analysis for the quality eval...
    Go to citation Crossref Google Scholar
  278. Comparative Analysis of Machine Learning Algorithms for Disease Detect...
    Go to citation Crossref Google Scholar
  279. The determinants of investment fraud: A machine learning and artificia...
    Go to citation Crossref Google Scholar
  280. REMOVED: Machine learning in health condition check-up: An approach us...
    Go to citation Crossref Google Scholar
  281. Classification of Valvular Regurgitation Using Echocardiography
    Go to citation Crossref Google Scholar
  282. Recognition Method for Broiler Sound Signals Based on Multi-Domain Sou...
    Go to citation Crossref Google Scholar
  283. How do machines predict energy use? Comparing machine learning approac...
    Go to citation Crossref Google Scholar
  284. The practicality of Malaysia dengue outbreak forecasting model as an e...
    Go to citation Crossref Google Scholar
  285. Data-Driven Random Forest Models for Detecting Volcanic Hot Spots in S...
    Go to citation Crossref Google Scholar
  286. Machine Learning Model Based on Radiomic Features for Differentiation ...
    Go to citation Crossref Google Scholar
  287. Associative Prediction of Carotid Artery Plaques Based on Ultrasound S...
    Go to citation Crossref Google Scholar
  288. Regression and Machine Learning Methods to Predict Discrete Outcomes i...
    Go to citation Crossref Google Scholar
  289. Machine-Learning for Prescription Patterns: Random Forest in the Predi...
    Go to citation Crossref Google Scholar
  290. Imputing missing values for Dataset of Used Cars
    Go to citation Crossref Google Scholar
  291. Sales Prediction Based on Machine Learning Scenarios
    Go to citation Crossref Google Scholar
  292. DDSS: denge decision support system to recommend the athlete-specific ...
    Go to citation Crossref Google Scholar
  293. Machine learning models in the prediction of 1-year mortality in patie...
    Go to citation Crossref Google Scholar
  294. Improving Random Forest Algorithm for University Academic Affairs Mana...
    Go to citation Crossref Google Scholar
  295. Smart meter data classification using optimized random forest algorith...
    Go to citation Crossref Google Scholar
  296. Machine Learning Improves Prediction Over Logistic Regression on Resec...
    Go to citation Crossref Google Scholar
  297. Landslide susceptibility mapping in three Upazilas of Rangamati hill d...
    Go to citation Crossref Google Scholar
  298. Image Classification and Recognition Based on Deep Learning and Random...
    Go to citation Crossref Google Scholar
  299. Automation in competitive removal of toxic metal ions by fired and non...
    Go to citation Crossref Google Scholar
  300. Statistical Methods for the Analysis of Food Composition Databases: A ...
    Go to citation Crossref Google Scholar
  301. Baseline Elevations of Leukotriene Metabolites and Altered Plasmalogen...
    Go to citation Crossref Google Scholar
  302. Evaluating Machine Learning Algorithms to Detect Employees' Attrition
    Go to citation Crossref Google Scholar
  303. Analysis of air ticket characteristics based on random forest classifi...
    Go to citation Crossref Google Scholar
  304. Thwarting Unauthorized Voice Eavesdropping via Touch Sensing in Mobile...
    Go to citation Crossref Google Scholar
  305. A comparison of performance of SWAT and machine learning models for pr...
    Go to citation Crossref Google Scholar
  306. Well-Logging-Based Lithology Classification Using Machine Learning Met...
    Go to citation Crossref Google Scholar
  307. Development of a Diabetes Diagnosis System Using Machine Learning Algo...
    Go to citation Crossref Google Scholar
  308. Accuracy Assessment of Machine Learning Algorithm(s) in Thyroid Dysfun...
    Go to citation Crossref Google Scholar
  309. Factors Controlling the Distribution of Intermediate Host Snails of Sc...
    Go to citation Crossref Google Scholar
  310. Using Machine Learning Approach to Evaluate the Excessive Financializa...
    Go to citation Crossref Google Scholar
  311. Prediction of Rainfall in Australia Using Machine Learning
    Go to citation Crossref Google Scholar
  312. Environmental Compliance and Financial Performance of Shariah-Complian...
    Go to citation Crossref Google Scholar
  313. A Comprehensive Review of Computation-Based Metal-Binding Prediction A...
    Go to citation Crossref Google Scholar
  314. Back-Analysis of Parameters of Jointed Surrounding Rock of Metro Stati...
    Go to citation Crossref Google Scholar
  315. Ensemble Learning for 5G Flying Base Station Path Loss Modelling
    Go to citation Crossref Google Scholar
  316. Quality of Life Predictors in Patients With Melanoma: A Machine Learni...
    Go to citation Crossref Google Scholar
  317. An Experimental Comparison of Classification Algorithms for Premium Be...
    Go to citation Crossref Google Scholar
  318. A Joint Model of Random Forest and Artificial Neural Network for the D...
    Go to citation Crossref Google Scholar
  319. Satellite Maneuver Detection Using Machine Learning and Neural Network...
    Go to citation Crossref Google Scholar
  320. Financial support for unmet need for personal assistance with daily ac...
    Go to citation Crossref Google Scholar
  321. Probabilistic Analysis of Solar Cell Performance Using Gaussian Proces...
    Go to citation Crossref Google Scholar
  322. IoT and Machine Learning-Based Hypoglycemia Detection System
    Go to citation Crossref Google Scholar
  323. Survey Paper: Comparative Study of Machine Learning Techniques and its...
    Go to citation Crossref Google Scholar
  324. Prediction of subsurface oceanographic parameter using machine learnin...
    Go to citation Crossref Google Scholar
  325. A New Approach to Choke Flow Models Using Machine Learning Algorithms
    Go to citation Crossref Google Scholar
  326. Automated Detection of Rehabilitation Exercise by Stroke Patients Usin...
    Go to citation Crossref Google Scholar
  327. Analysis on Dynamic Evolution of the Cost Risk of Prefabricated Buildi...
    Go to citation Crossref Google Scholar
  328. Day-Level Forecasting for Coronavirus Disease (COVID-19)
    Go to citation Crossref Google Scholar
  329. BACS: blockchain and AutoML-based technology for efficient credit scor...
    Go to citation Crossref Google Scholar
  330. A Machine Learning Approach to Predicting Higher COVID-19 Care Burden ...
    Go to citation Crossref Google ScholarPub Med
  331. From high school to postsecondary education, training, and employment:...
    Go to citation Crossref Google Scholar
  332. Speculative Computation: Application Scenarios
    Go to citation Crossref Google Scholar
  333. Situational Awareness for Law Enforcement and Public Safety Agencies O...
    Go to citation Crossref Google Scholar
  334. Next-Generation Personalized Investment Recommendations
    Go to citation Crossref Google Scholar
  335. Performance Evaluation Using Machine Learning: Detecting Non-technical...
    Go to citation Crossref Google Scholar
  336. An Analysis of Different Machine Learning Algorithms for Image Classif...
    Go to citation Crossref Google Scholar
  337. CatBoost Encoded Tree-Based Model for the Identification of Microbes a...
    Go to citation Crossref Google Scholar
  338. Machine Learning and Biomedical Sub-Terahertz/Terahertz Technology
    Go to citation Crossref Google Scholar
  339. Farming Assistance for Soil Fertility Improvement and Crop Prediction ...
    Go to citation Crossref Google Scholar
  340. Using Machine Learning Method to Design Integrated Sustainable Bioetha...
    Go to citation Crossref Google Scholar
  341. An Integrated Model to Evaluate the Transparency in Predicting Chronic...
    Go to citation Crossref Google Scholar
  342. Analysis and prediction of second-hand house price based on random for...
    Go to citation Crossref Google Scholar
  343. An Interpretation of Long Short-Term Memory Recurrent Neural Network f...
    Go to citation Crossref Google Scholar
  344. A Survey of Ensemble Learning: Concepts, Algorithms, Applications, and...
    Go to citation Crossref Google Scholar
  345. Enhanced Sea Surface Salinity Estimates Using Machine-Learning Algorit...
    Go to citation Crossref Google Scholar
  346. Textural Analysis for Medicinal Plants Identification Using Log Gabor ...
    Go to citation Crossref Google Scholar
  347. Tracking Cloud Forests With Cloud Technology and Random Forests
    Go to citation Crossref Google Scholar
  348. Using Machine-Learning for Prediction of the Response to Cardiac Resyn...
    Go to citation Crossref Google Scholar
  349. Recent innovation in benchmark rates (BMR): evidence from influential ...
    Go to citation Crossref Google Scholar
  350. Logistic Regression with Wave Preprocessing to Solve Inverse Problem i...
    Go to citation Crossref Google Scholar
  351. What is the elasticity of sharing a ridesourcing trip?
    Go to citation Crossref Google Scholar
  352. Evaluation of Classification Algorithms for Software Defect Prediction
    Go to citation Crossref Google Scholar
  353. Analysis of Office Rooms Energy Consumption Data in Respect to Meteoro...
    Go to citation Crossref Google Scholar
  354. Comparative Study of J48 Decision Tree Classification Algorithm, Rando...
    Go to citation Crossref Google Scholar
  355. Correction of the travel time estimation for ambulances of the red cro...
    Go to citation Crossref Google Scholar
  356. Classification of Beneficial and non-Beneficial Bacteria using Random ...
    Go to citation Crossref Google Scholar
  357. Machine learning approach for prediction of status of rechargeable bat...
    Go to citation Crossref Google Scholar
  358. A Comparative Study of Machine Learning Classifiers for Electric Load ...
    Go to citation Crossref Google Scholar
  359. Machine Learning Models for Sarcopenia Identification Based on Radiomi...
    Go to citation Crossref Google Scholar
  360. Hierarchical optimization of photovoltaic device performance using mac...
    Go to citation Crossref Google Scholar
  361. Rotation forest based on multimodal genetic algorithm
    Go to citation Crossref Google Scholar
  362. Implementation of ML Rough Set in Determining Cases of Timely Graduati...
    Go to citation Crossref Google Scholar
  363. Utilization of Rough Sets Method with Optimization Genetic Algorithms ...
    Go to citation Crossref Google Scholar
  364. Electrocardiogram machine learning for detection of cardiovascular dis...
    Go to citation Crossref Google Scholar
  365. Abnormal Respiratory Sound Classification Using Hierarchical Attention...
    Go to citation Crossref Google Scholar
  366. Prediction Models for Obstructive Sleep Apnea in Korean Adults Using M...
    Go to citation Crossref Google Scholar
  367. Applications of machine learning models in the prediction of gastric c...
    Go to citation Crossref Google Scholar
  368. Development and application of random forest technique for element lev...
    Go to citation Crossref Google Scholar
  369. A Review of Recent Machine Learning Advances for Forecasting Harmful A...
    Go to citation Crossref Google Scholar
  370. A Modified Random Forest Based on Kappa Measure and Binary Artificial ...
    Go to citation Crossref Google Scholar
  371. Regression and Machine Learning Methods to Predict Discrete Outcomes i...
    Go to citation Crossref Google Scholar
  372. Food Security Dynamics in the United States, 2001-2017
    Go to citation Crossref Google Scholar
  373. K-Nearest Neighbors Algorithm (KNN)
    Go to citation Crossref Google Scholar
  374. Chest CT in patients with a moderate or high pretest probability of CO...
    Go to citation Crossref Google Scholar
  375. Prediction of the xanthine oxidase inhibitory activity of celery seed ...
    Go to citation Crossref Google Scholar
  376. Role for machine learning in sex-specific prediction of successful ele...
    Go to citation Crossref Google Scholar