Computational Improvement of Basketball Games Outcomes
Oral Presentation
Paper ID : 1081-SSRC
Authors
University of Kurdistan
Abstract
Background:
Predicting the outcomes of sports competitions has long been an important analytical task. Past literature has explored using in-game statistics, betting market data, and complex machine learning algorithms to forecast wins and losses in sports. Basketball poses unique predictive challenges with its continuous and dynamic gameplay. While simple statistical models can achieve reasonable accuracy, integrating graph theory and advanced feature engineering provides an opportunity to further improve predictive performance.
Purpose:
This paper aims to accurately predict wins and losses of basketball teams using a machine learning approach.
Methodology:
A dataset of 9,840 basketball game records was used. The NearMiss undersampling algorithm selected a balanced subset of 4,920 records for analysis. Null values were removed during data pre-processing. To determine feature relationships, Spearman rank correlations were calculated between variables, and a threshold of 0.8 converted the resulting correlations into a graph representation. New features were extracted from this graph, including degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, density, PageRank, harmonic centrality, clustering coefficient, and load centrality. These graph-derived variables were merged with the basketball dataset's existing attributes. Several machine learning models were trained on the enhanced dataset and evaluated using 10-fold cross-validation.
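The correlation-to-graph step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the column names and values are hypothetical toy data, and the choice to threshold the absolute correlation with `>=` is an assumption, since the abstract only states a threshold of 0.8. Only degree centrality and density are computed here; the other listed measures are available in graph libraries such as NetworkX.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_graph(columns, threshold=0.8):
    """Connect two features when |rho| >= threshold (assumed rule)."""
    adjacency = {name: set() for name in columns}
    for a, b in combinations(columns, 2):
        if abs(spearman(columns[a], columns[b])) >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other nodes each node is connected to."""
    n = len(adjacency)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def density(adjacency):
    """Number of edges divided by the number of possible edges."""
    n = len(adjacency)
    edges = sum(len(nbrs) for nbrs in adjacency.values()) / 2
    return 2 * edges / (n * (n - 1))


# Hypothetical per-game statistics, one list per feature.
columns = {
    "points": [98, 105, 110, 87, 120, 101],
    "field_goals": [36, 40, 42, 33, 45, 38],
    "assists": [20, 25, 22, 18, 30, 21],
    "turnovers": [14, 12, 15, 13, 11, 16],
}

graph = correlation_graph(columns, threshold=0.8)
print("edges:", {name: sorted(nbrs) for name, nbrs in graph.items()})
print("degree centrality:", degree_centrality(graph))
print("density:", density(graph))
```

Note that the nodes of this graph are features, not games, so each centrality value describes a feature. The abstract does not specify how such per-feature values were merged with the per-game records, and the sketch leaves that step out.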
Results:
Game outcome prediction was treated as a binary classification problem. Combining graph-derived features with game statistics yielded highly accurate predictions. Among all tested models, logistic regression performed best, classifying game wins and losses with 99.6% accuracy. This model and insight can enable educators to make data-driven decisions.
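To make the evaluation protocol concrete, the following is a minimal sketch of 10-fold cross-validation around a logistic regression classifier. It is an illustration only: the data are synthetic, the learning rate and epoch count are arbitrary assumptions, and the plain gradient-descent fit stands in for whatever library implementation was actually used (in practice a library such as scikit-learn would normally be preferred).

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X):
    """Predict 1 (win) when the linear score is non-negative, else 0."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0 for xi in X]


def k_fold_accuracy(X, y, k=10, seed=0):
    """Return the held-out accuracy of each of the k folds."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        w, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
        preds = predict(w, b, [X[i] for i in held_out])
        correct = sum(p == y[i] for p, i in zip(preds, held_out))
        scores.append(correct / len(held_out))
    return scores


# Synthetic stand-in for the feature matrix and win/loss labels.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [1 if x0 + 0.5 * x1 + rng.gauss(0, 0.3) > 0 else 0 for x0, x1 in X]

scores = k_fold_accuracy(X, y, k=10)
mean_accuracy = sum(scores) / len(scores)
print(f"fold accuracies: {[round(s, 2) for s in scores]}")
print(f"mean accuracy: {mean_accuracy:.3f}")
```

Each record is held out exactly once, so the mean of the ten fold accuracies estimates performance on unseen games. The accuracy printed for this synthetic data says nothing about the 99.6% figure reported above.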
Conclusion:
This research showed that machine learning models enhanced with graph-mining-based feature engineering can predict the results of basketball games with very high accuracy. Graph analysis enriched the feature space, making it possible to capture complex relational patterns in basketball data beyond simple linear correlations. Consequently, logistic regression used these richer features to effectively determine the discriminating factors between wins and losses.
Keywords