Empirical Evaluation of AWS Cloud Capabilities for Hosting Large-Scale Generative AI Microservice Architectures
Keywords:
AWS, Generative AI, Microservices, Cloud Computing, Scalability, EC2, Lambda, Cost Analysis, Cloud Architecture
Abstract
This paper empirically evaluates the capabilities of Amazon Web Services (AWS) for hosting large-scale generative AI systems built on a microservices architecture. By deploying and benchmarking distributed AI models as containerized microservices on AWS services such as EC2, ECS, Lambda, and S3, the study assesses throughput, scalability, latency, and cost-efficiency. It also examines which architectural patterns best balance GPU utilization and request routing. The findings reveal AWS's flexibility and auto-scaling potential but highlight performance bottlenecks under heterogeneous GPU availability and regional limitations. The results provide a foundation for designing cost-effective and scalable deployment strategies for AI-based microservices in production environments.
License
Copyright (c) 2022 Jane Andros Milin (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
