The Role of Machine Learning in Data Warehousing: Enhancing Data Integration and Query Optimization

Authors

  • Sareen Kumar Rachakatla, Lead Developer, Intercontinental Exchange Holdings, Inc., Atlanta, USA
  • Prabu Ravichandran, Sr. Data Architect, Amazon Web Services, Inc., Raleigh, USA
  • Jeshwanth Reddy Machireddy, Sr. Software Developer, Kforce Inc., Wisconsin, USA

Keywords:

machine learning, data warehousing

Abstract

Machine learning (ML) has the potential to change how data warehouses are built and operated. This research analyzes how ML can enhance data integration and query optimization in data warehouses. Large warehouses that hold heterogeneous data face limits on efficiency, scalability, and retrieval performance, and conventional integration and query optimization techniques often break down on complex data sets. ML may improve both processes.

Machine learning can automate and improve data integration by finding patterns in large datasets. ML-based schema matching, data cleaning, and transformation make it easier to integrate heterogeneous sources into a data warehouse. Supervised and unsupervised learning may reduce mapping and transformation errors, and ML-based anomaly detection may improve the quality of warehoused data.
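
As an illustration of the anomaly detection idea, the following is a minimal sketch and not the method described in the paper. It assumes a single numeric column and uses a modified z-score based on the median absolute deviation, which is a simple unsupervised statistical baseline rather than a trained model. The function name `flag_anomalies`, the threshold of 3.5, and the sample `order_totals` values are all hypothetical.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a few extreme values do not distort the baseline.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # At least half of the values equal the median; flag any that differ.
        return [i for i, d in enumerate(abs_dev) if d > 0]
    # 0.6745 rescales the MAD to be comparable to a standard deviation
    # for normally distributed data.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Hypothetical daily order totals staged for loading into a fact table.
order_totals = [102.0, 98.5, 101.2, 99.9, 100.4, 5000.0, 97.8]
suspect_rows = flag_anomalies(order_totals)
print(suspect_rows)  # [5]
```

In a loading pipeline, rows flagged this way could be routed to a review table instead of the warehouse. A learned detector would replace the scoring step while keeping the same interface.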

Published

30-06-2021

How to Cite

[1]
Sareen Kumar Rachakatla, Prabu Ravichandran, and Jeshwanth Reddy Machireddy, “The Role of Machine Learning in Data Warehousing: Enhancing Data Integration and Query Optimization”, Journal of Bioinformatics and Artificial Intelligence, vol. 1, no. 1, pp. 82–103, Jun. 2021, Accessed: Mar. 14, 2025. [Online]. Available: https://jbaijournal.org/index.php/jbai/article/view/4