27 Best 「data enginer」 Books of 2025| Books Explorer

No.1

100

Fundamentals of Data Engineering: Plan and Build Robust Data Systems

Reis, Joe

O'Reilly Media

Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you will learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available in the framework of the data engineering lifecycle. Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You will understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, governance, and deployment that are critical in any data environment regardless of the underlying technology. This book will help you: Assess data engineering problems using an end-to-end data framework of best practices Cut through marketing hype when choosing data technologies, architecture, and processes Use the data engineering lifecycle to design and build a robust architecture Incorporate data governance and security across the data engineering lifecycle

Everyone's Review

No reviews yet.

No.2

91

Data Engineering with Python: Work with massive datasets to design data models and automate data pipelines using Python

Crickard, Paul

Packt Publishing

data enginer

See more on Amazon

Build, monitor, and manage real-time data pipelines to create data engineering infrastructure efficiently using open-source Apache projects Key Features Become well-versed in data architectures, data preparation, and data optimization skills with the help of practical examples Design data models and learn how to extract, transform, and load (ETL) data using Python Schedule, automate, and monitor complex data pipelines in production Book Description Data engineering provides the foundation for data science and analytics, and forms an important part of all businesses. This book will help you to explore various tools and methods that are used for understanding the data engineering process using Python. The book will show you how to tackle challenges commonly faced in different aspects of data engineering. You'll start with an introduction to the basics of data engineering, along with the technologies and frameworks required to build data pipelines to work with large datasets. You'll learn how to transform and clean data and perform analytics to get the most out of your data. As you advance, you'll discover how to work with big data of varying complexity and production databases, and build data pipelines. Using real-world examples, you'll build architectures on which you'll learn how to deploy data pipelines. By the end of this Python book, you'll have gained a clear understanding of data modeling techniques, and will be able to confidently build data engineering pipelines for tracking data, running quality checks, and making necessary changes in production. What you will learn Understand how data engineering supports data science workflows Discover how to extract data from files and databases and then clean, transform, and enrich it Configure processors for handling different file formats as well as both relational and NoSQL databases Find out how to implement a data pipeline and dashboard to visualize results Use staging and validation to check data before landing in the warehouse Build real-time pipelines with staging areas that perform validation and handle failures Get to grips with deploying pipelines in the production environment Who this book is for This book is for data analysts, ETL developers, and anyone looking to get started with or transition to the field of data engineering or refresh their knowledge of data engineering using Python. This book will also be useful for students planning to build a career in data engineering or IT professionals preparing for a transition. No previous knowledge of data engineering is required.

Everyone's Review

No reviews yet.

No.3

84

Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema

Corr, Lawrence

Lightning Source Inc

data enginer

See more on Amazon

Agile Data Warehouse Design is a step-by-step guide for capturing data warehousing / business intelligence (DW/BI) requirements and turning them into high performance dimensional models in the most direct way: by modelstorming (data modeling + brainstorming) with BI stakeholders.The book describes BEAM, an agile approach to dimensional modeling, for improving communication between data warehouse designers, BI stakeholders and the whole DW/BI development team. BEAM provides tools and techniques that will encourage DW/BI designers and developers to move away from their keyboards and entity relationship based tools and model interactively with their colleagues. The result is everyone thinks dimensionally from the outset! Developers understand how to efficiently implement dimensional modeling solutions. Business stakeholders feel ownership of the data warehouse they have created, and can already imagine how they will use it to answer their business questions.Within this book, you will learn: Agile dimensional modeling using Business Event Analysis & Modeling (BEAM) Modelstorming: data modeling that is quicker, more inclusive, more productive, and frankly more fun! Telling dimensional data stories using the 7Ws (who, what, when, where, how many, why and how) Modeling by example not abstraction; using data story themes, not crow’s feet, to describe detail Storyboarding the data warehouse to discover conformed dimensions and plan iterative development Visual modeling: sketching timelines, charts and grids to model complex process measurement – simply Agile design documentation: enhancing star schemas with BEAM dimensional shorthand notation Solving difficult DW/BI performance and usability problems with proven dimensional design patterns

Everyone's Review

No reviews yet.

No.4

82

The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling

Kimball, Ralph

Wiley

data enginer

See more on Amazon

Everyone's Review

No reviews yet.

No.5

79

Big Data: Principles and best practices of scalable realtime data systems

Marz, Nathan

Manning Publications

data enginer

See more on Amazon

SummaryBig Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they're built.Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.About the BookWeb-scale applications like social networks, real-time analytics, or e-commerce sites deal with a lot of data, whose volume and velocity exceed the limits of traditional database systems. These applications require architectures built around clusters of machines to store and process data of any size, or speed. Fortunately, scale and simplicity are not mutually exclusive.Big Data teaches you to build big data systems using an architecture designed specifically to capture and analyze web-scale data. This book presents the Lambda Architecture, a scalable, easy-to-understand approach that can be built and run by a small team. You'll explore the theory of big data systems and how to implement them in practice. In addition to discovering a general framework for processing big data, you'll learn specific technologies like Hadoop, Storm, and NoSQL databases.This book requires no previous exposure to large-scale data analysis or NoSQL tools. Familiarity with traditional databases is helpful.What's Inside Introduction to big data systems Real-time processing of web-scale data Tools like Hadoop, Cassandra, and Storm Extensions to traditional database skillsAbout the AuthorsNathan Marz is the creator of Apache Storm and the originator of the Lambda Architecture for big data systems. James Warren is an analytics architect with a background in machine learning and scientific computing.Table of Contents A new paradigm for Big Data PART 1 BATCH LAYER Data model for Big Data Data model for Big Data: Illustration Data storage on the batch layer Data storage on the batch layer: Illustration Batch layer Batch layer: Illustration An example batch layer: Architecture and algorithms An example batch layer: Implementation PART 2 SERVING LAYER Serving layer Serving layer: Illustration PART 3 SPEED LAYER Realtime views Realtime views: Illustration Queuing and stream processing Queuing and stream processing: Illustration Micro-batch stream processing Micro-batch stream processing: Illustration Lambda Architecture in depth

Everyone's Review

No reviews yet.

No.6

75

DW 2.0: The Architecture for the Next Generation of Data Warehousing

Inmon, W. H.

Morgan Kaufmann

data enginer

See more on Amazon

DW 2.0: The Architecture for the Next Generation of Data Warehousing is the first book on the new generation of data warehouse architecture, DW 2.0, by the father of the data warehouse. The book describes the future of data warehousing that is technologically possible today, at both an architectural level and technology level.The perspective of the book is from the top down: looking at the overall architecture and then delving into the issues underlying the components. This allows people who are building or using a data warehouse to see what lies ahead and determine what new technology to buy, how to plan extensions to the data warehouse, what can be salvaged from the current system, and how to justify the expense at the most practical level. This book gives experienced data warehouse professionals everything they need in order to implement the new generation DW 2.0.It is designed for professionals in the IT organization, including data architects, DBAs, systems design and development professionals, as well as data warehouse and knowledge management professionals. First book on the new generation of data warehouse architecture, DW 2.0 Written by the "father of the data warehouse", Bill Inmon, a columnist and newsletter editor of The Bill Inmon Channel on the Business Intelligence Network Long overdue comprehensive coverage of the implementation of technology and tools that enable the new generation of the DW: metadata, temporal data, ETL, unstructured data, and data quality control

Everyone's Review

No reviews yet.

No.7

75

Big Data: Principles and Best Practices of Scalable Real-Time Data Systems

None

data enginer

See more on Amazon

Please Read Notes: Brand New, International Softcover Edition, Printed in black and white pages, minor self wear on the cover or pages, Sale restriction may be printed on the book, but Book name, contents, and author are exactly same as Hardcover Edition. Fast delivery through DHL/FedEx express.

Everyone's Review

No reviews yet.

No.8

69

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Kleppmann, Martin

O'Reilly Media

data enginer

See more on Amazon

Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications. Peer under the hood of the systems you already use, and learn how to use and operate them more effectively Make informed decisions by identifying the strengths and weaknesses of different tools Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity Understand the distributed systems research upon which modern databases are built Peek behind the scenes of major online services, and learn from their architectures

Everyone's Review

★

See more reviews

No.9

69

97 Things Every Data Engineer Should Know: Collective Wisdom from the Experts

Macey, Tobias

O'Reilly Media

data enginer

See more on Amazon

Take advantage of today's sky-high demand for data engineers. With this in-depth book, current and aspiring engineers will learn powerful real-world best practices for managing data big and small. Contributors from notable companies including Twitter, Google, Stitch Fix, Microsoft, Capital One, and LinkedIn share their experiences and lessons learned for overcoming a variety of specific and often nagging challenges.Edited by Tobias Macey, host of the popular Data Engineering Podcast, this book presents 97 concise and useful tips for cleaning, prepping, wrangling, storing, processing, and ingesting data. Data engineers, data architects, data team managers, data scientists, machine learning engineers, and software engineers will greatly benefit from the wisdom and experience of their peers.Topics include: The Importance of Data Lineage - Julien Le Dem Data Security for Data Engineers - Katharine Jarmul The Two Types of Data Engineering and Data Engineers - Jesse Anderson Six Dimensions for Picking an Analytical Data Warehouse - Gleb Mezhanskiy The End of ETL as We Know It - Paul Singman Building a Career as a Data Engineer - Vijay Kiran Modern Metadata for the Modern Data Stack - Prukalpa Sankar Your Data Tests Failed! Now What? - Sam Bail

Everyone's Review

No reviews yet.

No.10

67

Spark: The Definitive Guide: Big Data Processing Made Simple

Chambers, Bill

O'Reilly Media

data enginer

See more on Amazon

Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals.\nYou’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine-learning library.\n\nGet a gentle overview of big data and Spark\nLearn about DataFrames, SQL, and Datasets—Spark’s core APIs—through worked examples\nDive into Spark’s low-level APIs, RDDs, and execution of SQL and DataFrames\nUnderstand how Spark runs on a cluster\nDebug, monitor, and tune Spark clusters and applications\nLearn the power of Structured Streaming, Spark’s stream-processing engine\nLearn how you can apply MLlib to a variety of problems, including classification or recommendation\n

Everyone's Review

No reviews yet.

No.11

67

A Common-sense Guide to Data Structures and Algorithms: Level Up Your Core Programming Skills

Wengrow, Jay

Pragmatic Bookshelf

data enginer

See more on Amazon

If you thought that data structures and algorithms were all just theory, you're missing out on what they can do for your code. Learn to use Big O Notation to make your code run faster by orders of magnitude. Choose from data structures such as hash tables, trees, and graphs to increase your code's efficiency exponentially. With simple language and clear diagrams, this book makes this complex topic accessible, no matter your background. This new edition features practice exercises in every chapter, and new chapters on topics such as dynamic programming and heaps and tries. Get the hands-on info you need to master data structures and algorithms for your day-to-day work. Algorithms and data structures are much more than abstract concepts. Mastering them enables you to write code that runs faster and more efficiently, which is particularly important for today's web and mobile apps. Take a practical approach to data structures and algorithms, with techniques and real-world scenarios that you can use in your daily production code, with examples in JavaScript, Python, and Ruby. This new and revised second edition features new chapters on recursion, dynamic programming, and using Big O in your daily work. Use Big O notation to measure and articulate the efficiency of your code, and modify your algorithm to make it faster. Find out how your choice of arrays, linked lists, and hash tables can dramatically affect the code you write. Use recursion to solve tricky problems and create algorithms that run exponentially faster than the alternatives. Dig into advanced data structures such as binary trees and graphs to help scale specialized applications such as social networks and mapping software. You'll even encounter a single keyword that can give your code a turbo boost. Practice your new skills with exercises in every chapter, along with detailed solutions. Use these techniques today to make your code faster and more scalable.

Everyone's Review

No reviews yet.

No.12

66

Learning Spark: Lightning-Fast Big Data Analysis

Karau, Holden

O'Reilly Media

data enginer

See more on Amazon

Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.Written by the developers of Spark, this book will have data scientists and engineers up and running in no time. You’ll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning. Quickly dive into Spark capabilities such as distributed datasets, in-memory caching, and the interactive shell Leverage Spark’s powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm Learn how to deploy interactive, batch, and streaming applications Connect to data sources including HDFS, Hive, JSON, and S3 Master advanced topics like data partitioning and shared variables

Everyone's Review

No reviews yet.

No.13

66

Data Science for Dummies

Porway, Jake

For Dummies

data enginer

See more on Amazon

Discover how data science can help you gain in-depth insight into your business - the easy way! Jobs in data science abound, but few people have the data science skills needed to fill these increasingly important roles. Data Science For Dummies is the perfect starting point for IT professionals and students who want a quick primer on all areas of the expansive data science space. With a focus on business cases, the book explores topics in big data, data science, and data engineering, and how these three areas are combined to produce tremendous value. If you want to pick-up the skills you need to begin a new career or initiate a new project, reading this book will help you understand what technologies, programming languages, and mathematical methods on which to focus. While this book serves as a wildly fantastic guide through the broad, sometimes intimidating field of big data and data science, it is not an instruction manual for hands-on implementation. Here’s what to expect: Provides a background in big data and data engineering before moving on to data science and how it's applied to generate value Includes coverage of big data frameworks like Hadoop, MapReduce, Spark, MPP platforms, and NoSQL Explains machine learning and many of its algorithms as well as artificial intelligence and the evolution of the Internet of Things Details data visualization techniques that can be used to showcase, summarize, and communicate the data insights you generate It's a big, big data world out there―let Data Science For Dummies help you harness its power and gain a competitive edge for your organization.

Everyone's Review

No reviews yet.

No.14

65

Container Security: Fundamental Technology Concepts That Protect Containerized Applications

Rice, Liz

O'Reilly Media

data enginer

See more on Amazon

Many organizations are running applications in cloud native environments, using containers and orchestration to facilitate scalability and resilience. But how do you know whether your deployment is secure? To fully grasp the security implications of containers and their operation, you need an understanding of what they are and how they work.\nThis practical book dives into the underlying technologies and components that these systems rely on to leave you better equipped to assess the security risks and potential solutions applicable to your environment. Author Liz Rice explores the building blocks and security boundaries commonly used in container-based systems and how they’re constructed in Linux.

Everyone's Review

No reviews yet.

No.15

65

Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale

Shapira, Gwen

O'Reilly Media

data enginer

See more on Amazon

Product Description \\nEvery enterprise application creates data, whether it consists of log messages, metrics, user activity, or outgoing messages. Moving all this data is just as important as the data itself. With this updated edition, application architects, developers, and production engineers new to the Kafka streaming platform will learn how to handle data in motion. Additional chapters cover Kafka's AdminClient API, transactions, new security features, and tooling changes.\nEngineers from Confluent and LinkedIn responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream processing applications with this platform. Through detailed examples, you'll learn Kafka's design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer.\nYou'll examine:\\nBest practices for deploying and configuring Kafka\nKafka producers and consumers for writing and reading messages\nPatterns and use-case requirements to ensure reliable data delivery\nBest practices for building data pipelines and applications with Kafka\nHow to perform monitoring, tuning, and maintenance tasks with Kafka in production\nThe most critical metrics among Kafka's operational measurements\nKafka's delivery capabilities for stream processing systems\\nAbout the Author \nGwen Shapira is a system architect at Confluent helping customers achieve success with their Apache Kafka implementation. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. She currently specializes in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an Oracle Ace Director, an author of "Hadoop Application Architectures", and a frequent presenter at data driven conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.\\nTodd is a Staff Site Reliability Engineer at LinkedIn, tasked with keeping the largest deployment of Apache Kafka, Zookeeper, and Samza fed and watered. He is responsible for architecture, day-to-day operations, and tools development, including the creation of an advanced monitoring and notification system. Todd is the developer of the open source project Burrow, a Kafka consumer monitoring tool, and can be found sharing his experience on Apache Kafka at industry conferences and tech talks. Todd has spent over 20 years in the technology industryrunning infrastructure services, most recently as a Systems Engineer at Verisign, developing service management automation for DNS, networking, and hardware management, as well as managing hardware and software standards across the company.\\nRajini Sivaram is a Software Engineer at Confluent designing and developing security features for Kafka. She is an Apache Kafka Committer and member of the Apache Kafka Program Management Committee. Prior to joining Confluent, she was at Pivotal working on a high-performance reactive API for Kafka based on Project Reactor. Earlier, Rajini was a key developer on IBM Message Hub which provides Kafka-as-a-Service on the IBM Bluemix platform. Her experience ranges from parallel and distributed systems to Java virtual machines and messaging systems.\\nKrit Petty is the Site Reliability Engineering Manager for Kafka at LinkedIn. Before becoming Manager, he worked as an SRE on the team expanding and increasing Kafka to overcome the hurdles associated with scaling Kafka to never before seen heights, including taking the first steps to moving LinkedIn's large-scale Kafka deployments into Microsoft's Azure cloud. Krit has a Master's Degree in Computer Science and previously worked managing Linux systems and as a Software Engineer developing software for high-performance computing projects in the oil and gas industry.

Everyone's Review

No reviews yet.

No.16

65

Cassandra – The Definitive Guide, 3e

Carpenter, Jeff

O′Reilly

data enginer

See more on Amazon

Imagine what you could do if scalability wasn't a problem. With this hands-on guide, youâ??ll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This third editionâ??updated for Cassandra 4.0â??provides the technical details and practical examples you need to put this database to work in a production environment.\nAuthors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandraâ??s nonrelational design, with special attention to data modeling. If youâ??re a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandraâ??s speed and flexibility. Understand Cassandraâ??s distributed and decentralized structure Use the Cassandra Query Language (CQL) and cqlshâ??the CQL shell Create a working data model and compare it with an equivalent relational model Develop sample applications using client drivers for languages including Java, Python, and Node.js Explore cluster topology and learn how nodes exchange data

Everyone's Review

No reviews yet.

No.17

65

The Datapreneurs: The Promise of AI and the Creators Building Our Future

Muglia, Bob

Peakpoint Press

data enginer

See more on Amazon

A leader in the data economy explains how we arrived at AI--and how we can navigate its future In The Datapreneurs, Bob Muglia helps us understand how innovation in data and information technology have led us to AI--and how this technology must shape our future. The long-time Microsoft executive, former CEO of Snowflake, and current tech investor maps the evolution of the modern data stack and how it has helped build today's economy and society. And he explains how humanity must create a new social contract for the artificial general intelligence (AGI)--autonomous machines intelligent as people--that he expects to arrive in less than a decade. Muglia details his personal experience in the foundational years of computing and data analytics, including with Bill Gates and Sam Altman, the CEO of OpenAI, the creator of ChatGPT, and others that are not household names--yet. He builds upon Isaac Asimov's Laws of Robotics to explore the moral, ethical, and legal implications of today's smart machines, and how a combination of human and machine intelligence could create an era of progress and prosperity where all the people on Earth can have what they need and want without destroying our natural environment. The Datapreneurs is a call to action. AGI is surely coming. Muglia believes that tech business leaders, ethicists, policy leaders, and even the general public must collaborate answer the short- and long-term questions raised by its emergence. And he argues that we had better get going, because advances are coming so fast that society risks getting caught flatfooted--with potentially disastrous consequences.  

Everyone's Review

No reviews yet.

No.18

65

Building a Data Warehouse: With Examples in SQL Server (Expert's Voice)

Rainardi, Vincent

Apress

data enginer

See more on Amazon

Building a Data Warehouse: With Examples in SQL Server describes how to build a data warehouse completely from scratch and shows practical examples on how to do it. Author Vincent Rainardi also describes some practical issues he has experienced that developers are likely to encounter in their first data warehousing project, along with solutions and advice. The relational database management system (RDBMS) used in the examples is SQL Server; the version will not be an issue as long as the user has SQL Server 2005 or later.The book is organized as follows. In the beginning of this book (chapters 1 through 6), you learn how to build a data warehouse, for example, defining the architecture, understanding the methodology, gathering the requirements, designing the data models, and creating the databases. Then in chapters 7 through 10, you learn how to populate the data warehouse, for example, extracting from source systems, loading the data stores, maintaining data quality, and utilizing the metadata. After you populate the data warehouse, in chapters 11 through 15, you explore how to present data to users using reports and multidimensional databases and how to use the data in the data warehouse for business intelligence, customer relationship management, and other purposes. Chapters 16 and 17 wrap up the book: After you have built your data warehouse, before it can be released to production, you need to test it thoroughly. After your application is in production, you need to understand how to administer data warehouse operation.

Everyone's Review

No reviews yet.

No.19

64

Data Mesh: Delivering Data-Driven Value at Scale

Dehghani, Zhamak

O'Reilly Media

data enginer

See more on Amazon

We're at an inflection point in data, where our data management solutions no longer match the complexity of organizations, the proliferation of data sources, and the scope of our aspirations to get value from data with AI and analytics. In this practical book, author Zhamak Dehghani introduces data mesh, a decentralized sociotechnical paradigm drawn from modern distributed architecture that provides a new approach to sourcing, sharing, accessing, and managing analytical data at scale.\nDehghani guides practitioners, architects, technical leaders, and decision makers on their journey from traditional big data architecture to a distributed and multidimensional approach to analytical data management. Data mesh treats data as a product, considers domains as a primary concern, applies platform thinking to create self-serve data infrastructure, and introduces a federated computational model of data governance. Get a complete introduction to data mesh principles and its constituents Design a data mesh architecture Guide a data mesh strategy and execution Navigate organizational design to a decentralized data ownership model Move beyond traditional data warehouses and lakes to a distributed data mesh

Everyone's Review

No reviews yet.

No.20

64

Data Engineering: Mining, Information and Intelligence (International Series in Operations Research & Management Science, 132)

Chan, Yupo

Springer

data enginer

See more on Amazon

DATA ENGINEERING: Mining, Information, and Intelligence describes applied research aimed at the task of collecting data and distilling useful information from that data. Most of the work presented emanates from research completed through collaborations between Acxiom Corporation and its academic research partners under the aegis of the Acxiom Laboratory for Applied Research (ALAR). Chapters are roughly ordered to follow the logical sequence of the transformation of data from raw input data streams to refined information. Four discrete sections cover Data Integration and Information Quality; Grid Computing; Data Mining; and Visualization. Additionally, there are exercises at the end of each chapter. The primary audience for this book is the broad base of anyone interested in data engineering, whether from academia, market research firms, or business-intelligence companies. The volume is ideally suited for researchers, practitioners, and postgraduate students alike. With its focus on problems arising from industry rather than a basic research perspective, combined with its intelligent organization, extensive references, and subject and author indices, it can serve the academic, research, and industrial audiences.

Everyone's Review

No reviews yet.

No.21

64

Foundations for Architecting Data Solutions: Managing Successful Data Projects

Malaska, Ted

O'Reilly Media

data enginer

See more on Amazon

While many companies ponder implementation details such as distributed processing engines and algorithms for data analysis, this practical book takes a much wider view of big data development, starting with initial planning and moving diligently toward execution. Authors Ted Malaska and Jonathan Seidman guide you through the major components necessary to start, architect, and develop successful big data projects.\nEveryone from CIOs and COOs to lead architects and developers will explore a variety of big data architectures and applications, from massive data pipelines to web-scale applications. Each chapter addresses a piece of the software development life cycle and identifies patterns to maximize long-term success throughout the life of your project.\n\nStart the planning process by considering the key data project types\nUse guidelines to evaluate and select data management solutions\nReduce risk related to technology, your team, and vague requirements\nExplore system interface design using APIs, REST, and pub/sub systems\nChoose the right distributed storage system for your big data system\nPlan and implement metadata collections for your data architecture\nUse data pipelines to ensure data integrity from source to final storage\nEvaluate the attributes of various engines for processing the data you collect\n

Everyone's Review

No reviews yet.

No.22

64

Big Data Black Book: Covers Hadoop 2 Mapreduce Hive Yarn Pig R And Data Visualization [Paperback] [Jan 01, 2016] Dt Editorial Services

None

data enginer

See more on Amazon

This essentially self-contained, deliberately compact, and user-friendly textbook is designed for a first, one-semester course in statistical signal analysis for a broad audience of students in engineering and the physical sciences. The emphasis throughout is on fundamental concepts and relationships in the statistical theory of stationary random signals, explained in a concise, yet fairly rigorous presentation. Topics and Features: * Fourier series and transforms"fundamentally important in random signal analysis and processing"are developed from scratch, emphasizing the time-domain vs. frequency-domain duality. * Basic concepts of probability theory, laws of large numbers, the stability of fluctuations law (central limit theorem), and statistical parametric inference procedures are presented so that no prior knowledge of probability and statistics is required; the only prerequisite is a basic two"three semester calculus sequence. * Introduction of the fundamental concept of a stationary random signal and its autocorrelation structure. * Power spectra of stationary signals and transmission analysis. * Filter design with optimal signal-to-noise ratio. * Computer simulation algorithms of stationary random signals with a given power spectrum density. * Complementary bibliography for readers who wish to pursue the study of random signals in greater depth. * Many diverse examples as well as end-of-chapter problems and exercises. Developed by the author over the course of several years of classroom use, A First Course in Statistics for Signal Analysis may be used by junior/senior undergraduates or graduate students in electrical, systems, computer, and biomedical engineering, as well as the physical sciences. The work is also an excellent resource of educational and training material for scientists and engineers working in research laboratories.

Everyone's Review

No reviews yet.

No.23

64

Architecting Modern Data Platforms: A Guide to Enterprise Hadoop at Scale

Kunigk, Jan

O'Reilly Media

data enginer

See more on Amazon

There’s a lot of information about big data technologies, but splicing these technologies into an end-to-end enterprise data platform is a daunting task not widely covered. With this practical book, you’ll learn how to build big data infrastructure both on-premises and in the cloud and successfully architect a modern data platform.\nIdeal for enterprise architects, IT managers, application architects, and data engineers, this book shows you how to overcome the many challenges that emerge during Hadoop projects. You’ll explore the vast landscape of tools available in the Hadoop and big data realm in a thorough technical primer before diving into:\n\n\nInfrastructure: Look at all component layers in a modern data platform, from the server to the data center, to establish a solid foundation for data in your enterprise\n\nPlatform: Understand aspects of deployment, operation, security, high availability, and disaster recovery, along with everything you need to know to integrate your platform with the rest of your enterprise IT\n\nTaking Hadoop to the cloud: Learn the important architectural aspects of running a big data platform in the cloud while maintaining enterprise security and high availability\n

Everyone's Review

No reviews yet.

No.24

64

Data Management at Scale: Best Practices for Enterprise Architecture

Strengholt, Piethein

O'Reilly Media

data enginer

See more on Amazon

As data management and integration continue to evolve rapidly, storing all your data in one place, such as a data warehouse, is no longer scalable. In the very near future, data will need to be distributed and available for several technological solutions. With this practical book, you'll learnhow to migrate your enterprise from a complex and tightly coupled data landscape to a more flexible architecture ready for the modern world of data consumption. Executives, data architects, analytics teams, and compliance and governance staff will learn how to build a modern scalable data landscape using the Scaled Architecture, which you can introduce incrementally without a large upfront investment. Author Piethein Strengholt provides blueprints, principles, observations, best practices, and patterns to get you up to speed. Examine data management trends, including technological developments, regulatory requirements, and privacy concerns Go deep into the Scaled Architecture and learn how the pieces fit together Explore data governance and data security, master data management, self-service data marketplaces, and the importance of metadata

Everyone's Review

No reviews yet.

No.25

64

Think Like An Engineer: Inside the Minds that are Changing our Lives

Madhavan, Guru

Oneworld Publications

data enginer

See more on Amazon

Everyone's Review

No reviews yet.

No.26

64

Data Governance: The Definitive Guide: People, Processes, and Tools to Operationalize Data Trustworthiness

Eryurek, Evren

O'Reilly Media

data enginer

See more on Amazon

As you move data to the cloud, you need to consider a comprehensive approach to data governance, along with well-defined and agreed-upon policies to ensure your organization meets compliance requirements. Data governance incorporates the ways people, processes, and technology work together to ensure data is trustworthy and can be used effectively. This practical guide shows you how to effectively implement and scale data governance throughout your organization. Chief information, data, and security officers and their teams will learn strategy and tooling to support democratizing data and unlocking its value while enforcing security, privacy, and other governance standards. Through good data governance, you can inspire customer trust, enable your organization to identify business efficiencies, generate more competitive offerings, and improve customer experience. This book shows you how. You'll learn: Data governance strategies addressing people, processes, and tools Benefits and challenges of a cloud-based data governance approach How data governance is conducted from ingest to preparation and use How to handle the ongoing improvement of data quality Challenges and techniques in governing streaming data Data protection for authentication, security, backup, and monitoring How to build a data culture in your organization

Everyone's Review

No reviews yet.

No.27

63

Team Topologies: Organizing Business and Technology Teams for Fast Flow

Skelton, Matthew

It Revolution Pr

data enginer

See more on Amazon

"This book will help executives and business leaders focus on the key strategies of high performance teams to effectively address the needs of today and the evolving landscape of tomorrow.” ―Barry O'Reilly, author of Unlearn and Lean EnterpriseCompanion book Remote Team Interactions Workbook now available!Effective software teams are essential for any organization to deliver value continuously and sustainably. But how do you build the best team organization for your specific goals, culture, and needs?Team Topologies is a practical, step-by-step, adaptive model for organizational design and team interaction based on four fundamental team types and three team interaction patterns. It is a model that treats teams as the fundamental means of delivery, where team structures and communication pathways are able to evolve with technological and organizational maturity.In Team Topologies, IT consultants Matthew Skelton and Manuel Pais share secrets of successful team patterns and interactions to help readers choose and evolve the right team patterns for their organization, making sure to keep the software healthy and optimize value streams.Team Topologies is a major step forward in organizational design for software, presenting a well-defined way for teams to interact and interrelate that helps make the resulting software architecture clearer and more sustainable, turning inter-team problems into valuable signals for the self-steering organization.

Everyone's Review

No reviews yet.