Foundations of Data Intensive Applications: Large Scale Data Analytics Under the Hood - Paperback
$62.00
by Supun Kamburugamuve (Author), Saliya Ekanayake (Author)
There is an ever-increasing need to store data, process it, and incorporate the resulting knowledge into companies' everyday business operations. Before big data systems, there were high performance computing systems designed to do large calculations. Around the time big data became popular, these high performance computing systems were mature enough to support the scientific community, but they weren't ready for the data analytics needs of the enterprise. Because of this lack of system support, a large number of systems were created to store and process data. They were built on different design principles; some thrived over the years while others did not succeed. Given the diversity of systems and tools available for data analytics, there is a need to understand these systems and their applications from a theoretical perspective. These systems hide the underlying details from users, who often use them without knowing how they work. This works for simple applications, but when developing more complex applications that need to scale, users find themselves without the foundational knowledge required to reason about the issues. This knowledge is currently buried in the systems themselves and in research papers.
The underlying principles behind data processing systems originate from the parallel and distributed computing paradigms. The many systems and APIs for data processing all use the same fundamental ideas under the hood, with slight variations. We can break down data analytics systems according to these principles and study them to understand how applications work internally.
This book defines these foundational components of large-scale, distributed data processing systems and goes into detail independently of specific frameworks. It draws on examples from current systems to explain how these principles are used in practice. Major design decisions around these foundational components determine performance, the types of applications supported, and usability. One of the goals of the book is to explain these differences so that readers can make informed decisions when developing applications. Further, it will help readers acquire in-depth knowledge and recognize problems in their applications, such as performance issues, distributed operation issues, and fault tolerance concerns.
Where appropriate, the book draws on state-of-the-art research to discuss emerging ideas and the future of data analytics tools.
Front Jacket
PEEK "UNDER THE HOOD" OF BIG DATA ANALYTICS
The world of big data analytics grows ever more complex. And while many people can work superficially with specific frameworks, far fewer understand the fundamental principles of large-scale, distributed data processing systems and how they operate. In Foundations of Data Intensive Applications: Large Scale Data Analytics Under the Hood, renowned big-data experts and computer scientists Drs. Supun Kamburugamuve and Saliya Ekanayake deliver a practical guide to applying the principles of big data to software development for optimal performance.
The authors discuss foundational components of large-scale data systems and walk readers through the major software design decisions that define performance, application type, and usability. You'll learn how to recognize problems in your applications that cause performance and distributed operation issues, diagnose them, and effectively eliminate them by relying on the bedrock big data principles explained within.
Moving beyond individual frameworks and APIs for data processing, this book unlocks the theoretical ideas that operate under the hood of every big data processing system.
Ideal for data scientists, data architects, DevOps engineers, and developers, Foundations of Data Intensive Applications: Large Scale Data Analytics Under the Hood shows readers how to:
- Identify the foundations of large-scale, distributed data processing systems
- Make major software design decisions that optimize performance
- Diagnose performance problems and distributed operation issues
- Understand state-of-the-art research in big data
- Explain and use the major big data frameworks and understand what underpins them
- Use big data analytics in the real world to solve practical problems
Author Biography
SUPUN KAMBURUGAMUVE, PhD, is a computer scientist researching and designing large scale data analytics tools. He received his doctorate in Computer Science from Indiana University, Bloomington and architected the data processing systems Twister2 and Cylon.
SALIYA EKANAYAKE, PhD, is a Senior Software Engineer at Microsoft working at the intersection of scaling deep learning systems and parallel computing. He is also a research affiliate at Berkeley Lab. He received his doctorate in Computer Science from Indiana University, Bloomington.