{"id":4815,"date":"2020-04-19T18:52:32","date_gmt":"2020-04-19T18:52:32","guid":{"rendered":"https:\/\/www.testpreptraining.com\/tutorial\/?page_id=4815"},"modified":"2022-04-05T05:45:47","modified_gmt":"2022-04-05T05:45:47","slug":"streaming-pipeline-google-professional-data-engineer-gcp","status":"publish","type":"page","link":"https:\/\/www.testpreptraining.ai\/tutorial\/streaming-pipeline-google-professional-data-engineer-gcp\/","title":{"rendered":"Streaming Pipeline Google Professional Data Engineer GCP"},"content":{"rendered":"\n<p>In this tutorial, we will learn the concepts of streaming pipelines for the Google Professional Data Engineer GCP exam.<\/p>\n\n\n<h3 class=\"devsite-page-title\"><strong>Streaming pipelines<\/strong><\/h3>\n<p>In streaming pipelines, unbounded PCollections, or unbounded collections, represent data. Data from a continually changing data source, such as Pub\/Sub, is stored in an unbounded collection. In an unbounded collection, you can&#8217;t just use a key to group elements. Because the data source is continually adding new elements, there might be an endless number of elements for a particular key in streaming data. To aggregate elements in unbounded collections, you can use windows, watermarks, and triggers. The concept of windows also applies to bounded PCollections, which represent data in batch pipelines.<\/p>\n<h5><strong>Windows and windowing functions<\/strong><\/h5>\n<p>Windowing functions divide unbounded collections into logical components, or windows. They group the elements of an unbounded collection by the timestamps of the individual elements. 
Each window contains a finite number of elements.<\/p>\n<p>With the Apache Beam SDK or Dataflow SQL streaming extensions, you can set the following windows:<\/p>\n<ul>\n<li><strong>Tumbling windows (called fixed windows in Apache Beam):<\/strong> A tumbling window represents a consistent, disjoint (non-overlapping) time interval in the data stream.<\/li>\n<li><strong>Hopping windows (called sliding windows in Apache Beam): <\/strong>A hopping window represents a consistent time interval in the data stream. Hopping windows can overlap, whereas tumbling windows are disjoint.<\/li>\n<li><strong>Session windows:\u00a0<\/strong>A session window contains elements that fall within a gap duration of another element. The gap duration is an interval between new data in a data stream. If data arrives after the gap duration has elapsed, it is assigned to a new window.<\/li>\n<\/ul>\n<h5><strong>Watermarks<\/strong><\/h5>\n<p>A watermark is a threshold that indicates when Dataflow expects all of the data in a window to have arrived. Data is considered late if it arrives after the watermark has passed the end of the window but carries a timestamp that falls inside the window.<\/p>\n<p>Dataflow tracks watermarks for the following reasons:<\/p>\n<ul>\n<li>Data is not guaranteed to arrive in time order or at predictable intervals.<\/li>\n<li>Data events are not guaranteed to appear in pipelines in the same order in which they were generated.<\/li>\n<\/ul>\n<h5><strong>Triggers<\/strong><\/h5>\n<p>Triggers determine when to emit aggregated results as data arrives. By default, results are emitted when the watermark passes the end of the window.\u00a0You can use the Apache Beam SDK to create or modify triggers for each collection in a streaming pipeline. 
You cannot set triggers with Dataflow SQL.\u00a0With the Apache Beam SDK, you can create triggers that operate on any combination of the following conditions:<\/p>\n<ul>\n<li>Event time, as indicated by the timestamp on each data element.<\/li>\n<li>Processing time, which is the time at which a data element is processed at any given stage in the pipeline.<\/li>\n<li>The number of data elements in a collection.<\/li>\n<\/ul>\n\n\n<p><strong><a href=\"https:\/\/cloud.google.com\/dataflow\/docs\/concepts\/streaming-pipelines\" target=\"_blank\" rel=\"noreferrer noopener\">For more details, check here.<\/a><\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we will learn the concepts of streaming pipelines for the Google Professional Data Engineer GCP exam. Streaming pipelines In streaming pipelines, unbounded PCollections, or unbounded collections, represent data. Data from a continually changing data source, such as Pub\/Sub, is stored in an unbounded collection. In an unbounded collection, you can&#8217;t just use a key&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"footnotes":""},"categories":[617],"tags":[692,619,699,623,622,618,621],"class_list":["post-4815","page","type-page","status-publish","hentry","category-google-gcp","tag-cloud-dataflow","tag-data-engineer","tag-dataflow-streaming-pipeline","tag-gcp","tag-google-certification","tag-google-cloud","tag-professional-data-engineer"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Streaming Pipeline Google Professional Data Engineer GCP<\/title>\n<meta name=\"description\" content=\"Enhance your GCP Data Engineer exam preparation level by learning and understanding the concepts of Streaming pipelines Now!\" \/>\n<meta name=\"robots\" content=\"index, follow, 
max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.testpreptraining.ai\/tutorial\/streaming-pipeline-google-professional-data-engineer-gcp\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Streaming Pipeline Google Professional Data Engineer GCP\" \/>\n<meta property=\"og:description\" content=\"Enhance your GCP Data Engineer exam preparation level by learning and understanding the concepts of Streaming pipelines Now!\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.testpreptraining.ai\/tutorial\/streaming-pipeline-google-professional-data-engineer-gcp\/\" \/>\n<meta property=\"og:site_name\" content=\"Testprep Training Tutorials\" \/>\n<meta property=\"article:modified_time\" content=\"2022-04-05T05:45:47+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/streaming-pipeline-google-professional-data-engineer-gcp\/\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/streaming-pipeline-google-professional-data-engineer-gcp\/\",\"name\":\"Streaming Pipeline Google Professional Data Engineer GCP\",\"isPartOf\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#website\"},\"datePublished\":\"2020-04-19T18:52:32+00:00\",\"dateModified\":\"2022-04-05T05:45:47+00:00\",\"description\":\"Enhance your GCP Data Engineer exam preparation level by learning and understanding the concepts of Streaming pipelines Now!\",\"breadcrumb\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/streaming-pipeline-google-professional-data-engineer-gcp\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.testpreptraining.ai\/tutorial\/streaming-pipeline-google-professional-data-engineer-gcp\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/streaming-pipeline-google-professional-data-engineer-gcp\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Streaming Pipeline Google Professional Data Engineer GCP\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#website\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\",\"name\":\"Testprep Training 
Tutorials\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.testpreptraining.ai\/tutorial\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#organization\",\"name\":\"Testprep Training\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png\",\"contentUrl\":\"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png\",\"width\":583,\"height\":153,\"caption\":\"Testprep Training\"},\"image\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Streaming Pipeline Google Professional Data Engineer GCP","description":"Enhance your GCP Data Engineer exam preparation level by learning and understanding the concepts of Streaming pipelines Now!","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.testpreptraining.ai\/tutorial\/streaming-pipeline-google-professional-data-engineer-gcp\/","og_locale":"en_US","og_type":"article","og_title":"Streaming Pipeline Google Professional Data Engineer GCP","og_description":"Enhance your GCP Data Engineer exam preparation level by learning and understanding the concepts of Streaming pipelines Now!","og_url":"https:\/\/www.testpreptraining.ai\/tutorial\/streaming-pipeline-google-professional-data-engineer-gcp\/","og_site_name":"Testprep Training Tutorials","article_modified_time":"2022-04-05T05:45:47+00:00","twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/streaming-pipeline-google-professional-data-engineer-gcp\/","url":"https:\/\/www.testpreptraining.ai\/tutorial\/streaming-pipeline-google-professional-data-engineer-gcp\/","name":"Streaming Pipeline Google Professional Data Engineer GCP","isPartOf":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#website"},"datePublished":"2020-04-19T18:52:32+00:00","dateModified":"2022-04-05T05:45:47+00:00","description":"Enhance your GCP Data Engineer exam preparation level by learning and understanding the concepts of Streaming pipelines Now!","breadcrumb":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/streaming-pipeline-google-professional-data-engineer-gcp\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.testpreptraining.ai\/tutorial\/streaming-pipeline-google-professional-data-engineer-gcp\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/streaming-pipeline-google-professional-data-engineer-gcp\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.testpreptraining.ai\/tutorial\/"},{"@type":"ListItem","position":2,"name":"Streaming Pipeline Google Professional Data Engineer GCP"}]},{"@type":"WebSite","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#website","url":"https:\/\/www.testpreptraining.ai\/tutorial\/","name":"Testprep Training Tutorials","description":"","publisher":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.testpreptraining.ai\/tutorial\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#organization","name":"Testprep 
Training","url":"https:\/\/www.testpreptraining.ai\/tutorial\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/","url":"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png","contentUrl":"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png","width":583,"height":153,"caption":"Testprep Training"},"image":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/"}}]}},"_links":{"self":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/4815","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/comments?post=4815"}],"version-history":[{"count":2,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/4815\/revisions"}],"predecessor-version":[{"id":54111,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/4815\/revisions\/54111"}],"wp:attachment":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/media?parent=4815"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/categories?post=4815"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/tags?post=4815"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}