Learn about the definition and benefits of Data Products in accelerating data analytics, delve into the key components for speeding up the development of Data Products, and uncover the initial steps to build your Data Product strategy.
Description
Most companies today store data and run applications in a hybrid, multi-cloud environment. Analytical systems tend to be centralized and siloed: data warehouses and data marts for BI, Hadoop or cloud storage data lakes for data science, and stand-alone streaming systems for real-time analysis. These centralized systems rely on data engineers and data scientists working within each silo to ingest data from many different sources, then clean and integrate it for use in a specific analytical system or in machine learning models. This centralized, siloed approach causes many problems: multiple tools to prepare and integrate data, data integration pipelines reinvented in each silo, and centralized data engineering teams with a poor understanding of source data that cannot keep pace with business demands for new data. In addition, master data is not well managed.
To address these issues, a new approach has emerged that aims to accelerate the creation of data for use in multiple analytical workloads: Data Products. This is a decentralized, business domain-oriented approach to data ownership and data engineering that creates a mesh of reusable data products which can be built once and shared across multiple analytical systems and workloads. Multiple data architecture options are available for creating data products, including one or more cloud storage accounts, an organized data lake, a Lakehouse, a data cloud, Kafka, or data virtualization. Data products can then be consumed in other pipelines for use in streaming analytics, in Data Warehouses or Lakehouse Gold Tables, and in business intelligence, data science and other analytical workloads.
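To make the idea concrete, a data product is typically accompanied by a machine-readable descriptor recording its owning domain, output port and schema so consumers can discover and trust it. A minimal, hypothetical sketch in Python (the field names here are illustrative assumptions, not part of any standard or of the course materials):

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    # Hypothetical descriptor; fields chosen for illustration only
    name: str
    owner_domain: str          # business domain team accountable for the product
    output_port: str           # where consumers read it, e.g. a table or topic
    schema: dict               # column name -> logical type
    tags: list = field(default_factory=list)

    def is_consumable(self) -> bool:
        # A product is only shareable once it has an accountable owner
        # and a published schema
        return bool(self.owner_domain and self.schema)

# Example: an "orders" product owned by the sales domain,
# published as a Lakehouse Gold Table
orders = DataProduct(
    name="orders",
    owner_domain="sales",
    output_port="lakehouse.gold.orders",
    schema={"order_id": "string", "amount": "decimal", "ts": "timestamp"},
    tags=["master-data"],
)
print(orders.is_consumable())  # True
```

In practice such metadata lives in a data catalog or marketplace rather than in code, but the principle is the same: each domain publishes a described, owned, reusable asset that other teams can consume without rebuilding the pipeline.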
This 2-day course looks at:
- Data Products in detail, examining their strengths and weaknesses
- The strengths and weaknesses of Data Product implementation options
- Which architecture is best to implement Data Products?
- How to coordinate multiple domain-oriented teams
- How to use common data infrastructure software like Data Fabric to create high-quality, compliant, reusable data products
- How to use a data marketplace to govern and share data products
- How to shorten time to value while ensuring that data is correctly governed and engineered in a decentralized environment
- Organizational implications of democratized Data Product development
- How to create shareable data products for master data management and for use in multi-dimensional analysis on a data warehouse, data science, graph analysis and real-time streaming analytics to drive business value
- Technologies like data catalogs, Data Fabric for collaborative development of data integration pipelines to create data products, DataOps to speed up the process, data orchestration automation, data marketplaces and data governance platforms
Why attend
You will learn about:
- The problems caused in existing analytical systems by a hybrid, multi-cloud data landscape
- Strengths and weaknesses of centralized data architectures used in analytics
- What are Data Products and how do they differ from other approaches?
- What benefits do Data Products offer and what are the implementation options?
- What are the principles, requirements, and challenges of implementing Data Products?
- How to organize to create data products in a decentralized environment so you avoid chaos
- The critical importance of a data catalog in understanding what data is available
- How business glossaries can help ensure data products are understood and semantically linked
- A best-practice operating model for coordinating the development of Data Products across different domains
- What software is required to build, operate and govern Data Products for use in a Data Lake, a Data Lakehouse or Data Warehouse?
- What is Data Fabric software, and how does it integrate with data catalogs and connect to data in your data estate?
- An implementation methodology to produce ready-made, trusted, reusable Data Products
- Collaborative domain-oriented development of modular and distributed DataOps pipelines to create Data Products
- How a data catalog, Generative AI and automation software can be used to generate DataOps pipelines to create Data Products
- Managing data quality, privacy, access security, versioning and the lifecycle of Data Products
- Publishing semantically linked Data Products in a data marketplace for others to consume and use
- Federated data architecture and Data Products - the emergence of lakehouse open table formats as a way for multiple analytical workloads to access shared data products
- Persisting master Data Products in an MDM system
- Consuming and assembling Data Products in multiple analytical systems like data warehouses, lakehouses and graph databases to shorten time to value
- How to implement federated data governance
Who should attend
This course is intended for business data analysts, data architects, chief data officers, master data management professionals, data scientists, ETL developers and data governance professionals.
Prerequisites
This course assumes you understand basic data management principles and data architecture plus a reasonable understanding of data cleansing, data integration, data catalogs, data lakes and data governance.
Related Content
What is a Data Mesh and how does it differ from a Data Lake and a Data Lakehouse?