My Notes -
CREATE EXTERNAL TABLE AS SELECT (CETAS) statement includes a SELECT statement that queries and manipulates data from any valid data source (which could be an existing table or view in a database, or an OPENROWSET function that reads file-based data from the data lake)
Unpivot creates new rows based on names of columns. Pivot creates new columns based on values of rows.
Azure Stream Analytics job = > Job1 is configured to use one Streaming Unit (SU) and can be parallelized for up to three nodes. Each six SUs is one node; therefore three nodes will require a minimum of 18 SUs.
Linked services allow you to connect to your data source. Datasets are the databases that are available via the linked service. Activities contain the transformations or analysis of data factories. Pipelines are groups of activities.
Tumbling window is a valid type of trigger in Data Factory.
Custom event is a valid type of trigger in Data Factory.
You cannot use SharePoint to trigger a Data Factory pipeline directly. You can do it from a logic app.
You cannot use Twitter to trigger a Data Factory pipeline directly. You can do it from a logic app.
Change Tracking vs CDC
Change tracking captures the fact that a row was changed without tracking the data that was changed. Change tracking requires fewer server resources than change data capture.
Great Article Must read before the exam: on Different window functions -
Azure Stream Analytics offers five types of temporal windowing functions to aggregate events within specified time intervals. These are Tumbling, Hopping, Sliding, Session, and Snapshot windows:
Tumbling Windows: Segment a data stream into fixed-size, non-overlapping intervals. Each event belongs to a single window. Example: Aggregating maximum readings per minute using TumblingWindow(minute, 1).
Hopping Windows: Create overlapping windows that jump forward by a fixed period. Events can belong to multiple windows. Example: Maximum readings over 60-second windows, with a 30-second hop, using HoppingWindow(second, 60, 30).
Sliding Windows: Generate events when window content changes, outputting only when an event enters or exits. Example: Maximum readings in one-minute windows when events occur using SlidingWindow(minute, 1).
Session Windows: Cluster events occurring close together, separated by periods of inactivity. The window extends until a specified timeout or maximum duration is reached. Example: Session lengths in clickstream data with a 20-second timeout and 60-second max duration using SessionWindow(second, 20, 60).
Snapshot Windows: Group events by identical timestamps. This is done using System.Timestamp() in the GROUP BY clause without a specific window function. Example: Maximum readings for events occurring at the same time using GROUP BY System.Timestamp().
These windowing functions are utilized in Stream Analytics jobs to facilitate real-time data analysis and aggregation.
Hash optimizes querying only.
Round robin is optimized for loading data.
Replicated tables provides optimized query performance for small tables.
Clustered columnstore indexes work on segments of 1,048,576 rows. As Azure Synapse Analytics has 60 nodes per distribution, the minimum recommended number of rows for a clustered columnstore index is 60,000,000.
https://neilbagchi.notion.site/DP-203-Notes-by-Neil-Bagchi-0a01364f73324e98a4b25b90d37b020e