The Function of Generative AI in Data Engineering: GPT-4 and Subsequent Developments

Authors

  • Naresh Dulam, Vice President, Sr Lead Software Engineer, JP Morgan Chase, USA
  • Venkataramana Gosukonda, Senior Software Engineering Manager, Wells Fargo, USA
  • Madhu Ankam, Vice President, Sr Lead Software Engineer, JP Morgan Chase, USA

Keywords:

Generative AI, GPT-4, Data Engineering, Automation, Data Pipelines

Abstract

Generative artificial intelligence, especially models like GPT-4, has transformed many sectors, including data engineering. Data engineering focuses on the design and administration of systems that handle enormous data volumes, and artificial intelligence improves it by automating routine tasks such as data processing, transformation, and cleansing. This automation frees data engineers to focus on important projects and sophisticated problem-solving, which improves overall efficiency and reduces the time spent on manual chores. Moreover, generative artificial intelligence models improve data pipelines by providing insights that support the analysis of large volumes of data and the identification of trends or anomalies that might otherwise be overlooked. Through trend forecasts and suggestions based on data-driven insights, the models help to guide decisions. Automation also reduces human error, which helps preserve data integrity and improve system performance. Integrating generative artificial intelligence into data engineering nevertheless presents specific challenges. Ethical concerns involving data privacy, security, and algorithmic bias must be addressed as artificial intelligence models proliferate. Artificial intelligence has to be used ethically and in line with ethical guidelines to prevent misuse. Furthermore, despite their great potential, these technologies still have significant limitations, including difficulty recognizing context and producing fully accurate results in challenging circumstances. As artificial intelligence technology continues to develop, the future of data engineering seems bright and offers opportunities for closer cooperation between human expertise and AI-driven solutions.
This integration is expected to lead to more streamlined processes in which artificial intelligence technologies help engineers solve challenging data problems and enable the optimization of data systems at scale. Generative artificial intelligence may eventually become indispensable in data engineering, since it helps companies cope with the growing complexity of managing data efficiently and derive fresh insights and financial value from their data. By striking the right balance between automation and human oversight, data engineering will keep flourishing in an increasingly data-centric world.

Published

25-02-2024

How to Cite

[1]
Naresh Dulam, Venkataramana Gosukonda, and Madhu Ankam, “The Function of Generative AI in Data Engineering: GPT-4 and Subsequent Developments”, Journal of Bioinformatics and Artificial Intelligence, vol. 4, no. 1, pp. 227–249, Feb. 2024, Accessed: Apr. 28, 2025. [Online]. Available: https://jbaijournal.org/index.php/jbai/article/view/10