This course offers a hands-on introduction to using Azure Databricks to run large data engineering workloads in the cloud. Participants learn to harness Apache Spark and powerful clusters on the Azure Databricks platform to process and analyze large volumes of data efficiently.

Objective

The goal of the course is to equip participants to use Azure Databricks to run large data engineering workloads in the cloud.

Who should attend

The course is aimed at data engineers, data scientists, and ELT developers who want to learn to harness Apache Spark and powerful clusters on the Azure Databricks platform to run large data engineering workloads in the cloud.

Course content

Perform incremental processing with Spark Structured Streaming

Understand Spark Structured Streaming
Apply techniques to optimize Structured Streaming
Handle late-arriving or out-of-order events
Set up real-time sources for incremental processing
Lab: Real-time ingestion and processing with Delta Live Tables with Azure Databricks

Implement streaming architecture patterns with Delta Live Tables

Use event-driven architectures with Delta Live Tables
Ingest streaming data
Achieve data consistency and reliability
Scale streaming workloads with Delta Live Tables
Lab: End-to-end streaming pipeline with Delta Live Tables

Optimize performance with Spark and Delta Live Tables

Use serverless compute and parallelism with Delta Live Tables
Perform cost-based optimization and tune query performance
Use Change Data Capture (CDC)
Apply enhanced autoscaling capabilities
Implement observability and enhance data quality metrics
Lab: Optimize data pipelines for better performance in Azure Databricks

Implement CI/CD workflows in Azure Databricks

Implement version control and Git integration
Perform unit testing and integration testing
Maintain environment and configuration management
Implement rollback and roll-forward strategies
Lab: Implement CI/CD workflows

Automate workloads with Azure Databricks Jobs

Implement job scheduling and automation
Optimize workflows with parameters
Handle dependency management
Implement error handling and retry mechanisms
Explore best practices and guidelines
Lab: Automate data ingestion and processing

Manage data privacy and governance with Azure Databricks

Implement data encryption techniques
Manage access controls
Implement data masking and anonymization
Use compliance frameworks and secure data sharing
Use data lineage and metadata management
Roll out governance automation
Lab: Practice the implementation of Unity Catalog

Use SQL Warehouses in Azure Databricks

Create and configure SQL Warehouses in Azure Databricks
Create databases and tables
Create queries and dashboards
Lab: Use a SQL Warehouse in Azure Databricks

Run Azure Databricks Notebooks with Azure Data Factory

Describe how Azure Databricks notebooks can be run in a pipeline
Create an Azure Data Factory linked service for Azure Databricks
Use a Notebook activity in a pipeline
Pass parameters to a notebook
Lab: Run an Azure Databricks Notebook with Azure Data Factory

Keywords

Applied Skills, Azure Databricks, Apache Spark, Data engineering, Cloud services, Data processing, Analytics, ELT development

Location	Remote training
Date	13 April 2026

DP-3027 Implement a data engineering solution with Azure Databricks