{"id":4800,"date":"2020-04-19T18:35:12","date_gmt":"2020-04-19T18:35:12","guid":{"rendered":"https:\/\/www.testpreptraining.com\/tutorial\/?page_id=4800"},"modified":"2020-04-19T18:35:12","modified_gmt":"2020-04-19T18:35:12","slug":"key-concepts-google-professional-data-engineer-gcp","status":"publish","type":"page","link":"https:\/\/www.testpreptraining.ai\/tutorial\/key-concepts-google-professional-data-engineer-gcp\/","title":{"rendered":"Key Concepts Google Professional Data Engineer GCP"},"content":{"rendered":"<p><strong>Pipelines<\/strong><\/p>\n<ul>\n<li>A pipeline encapsulates all the processes involved in reading input data, transforming that data, and writing output data.<\/li>\n<li>The input source and output sink can be of the same type or of different types.<\/li>\n<li>Apache Beam programs start by constructing a Pipeline object,<\/li>\n<li>and then use that object as the basis for creating the pipeline&#8217;s datasets.<\/li>\n<li>Each pipeline represents a single, repeatable job.<\/li>\n<\/ul>\n<p><strong>PCollection<\/strong><\/p>\n<ul>\n<li>Represents a distributed, multi-element dataset.<\/li>\n<li>Acts as the pipeline&#8217;s data.<\/li>\n<li>Apache Beam transforms use PCollection objects as inputs and outputs for each step in the pipeline.<\/li>\n<li>A PCollection can hold a dataset of a fixed size or an unbounded dataset from a continuously updating data source.<\/li>\n<\/ul>\n<p><strong>Transforms<\/strong><\/p>\n<ul>\n<li>Represents a processing operation that transforms data.<\/li>\n<li>Takes one or more PCollections as input, performs a specified operation, and produces one or more PCollections as output.<\/li>\n<li>Can perform many kinds of processing operations.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><strong>ParDo<\/strong><\/p>\n<ul>\n<li>It is the core parallel processing operation in the Apache Beam SDKs.<\/li>\n<li>It invokes a user-specified function on each of the elements of the input PCollection.<\/li>\n<li>ParDo collects the zero or more output elements into an output 
PCollection.<\/li>\n<li>The ParDo transform processes elements independently and possibly in parallel.<\/li>\n<\/ul>\n<p><strong>Pipeline I\/O<\/strong><\/p>\n<ul>\n<li>I\/O connectors let you read data into your pipeline and write output data from your pipeline.<\/li>\n<li>An I\/O connector consists of a source and a sink.<\/li>\n<li>You can also write a custom I\/O connector.<\/li>\n<\/ul>\n<p><strong>Aggregation<\/strong><\/p>\n<ul>\n<li>The process of computing a value from multiple input elements.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><strong>Side input<\/strong><\/p>\n<ul>\n<li>Can be a static value, such as a constant.<\/li>\n<li>Can also be a list or map. If the side input is a PCollection, first convert it to a list or map and pass that as the side input.<\/li>\n<li>Call withSideInputs on the ParDo transform, passing the map or list.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><strong>MapReduce<\/strong><\/p>\n<ul>\n<li>Map \u2013 operates in parallel; Reduce \u2013 aggregates based on key.<\/li>\n<li>ParDo acts on one item at a time, similar to the map operation in MapReduce, and should not keep state or history. It is useful for filtering and mapping.<\/li>\n<li>In Python, mapping is done using Map for 1:1 and FlatMap for non-1:1 transformations. 
In Java, it is done using ParDo.<\/li>\n<\/ul>\n<p><strong>User-defined functions (UDFs)<\/strong><\/p>\n<ul>\n<li>Apache Beam allows executing user-defined code to configure a transform.<\/li>\n<li>For ParDo, user-defined code specifies the operation to apply to every element.<\/li>\n<li>UDFs can be written in a different language than the language of the runner.<\/li>\n<\/ul>\n<p><strong>Runner<\/strong><\/p>\n<ul>\n<li>The software that accepts a pipeline and executes it.<\/li>\n<li>Runners are translators or adapters to massively parallel big-data processing systems.<\/li>\n<\/ul>\n<p><strong>Event time<\/strong><\/p>\n<ul>\n<li>The time at which a data event occurs,<\/li>\n<li>determined by the timestamp on the data element itself.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><strong>Windowing<\/strong><\/p>\n<ul>\n<li>Enables grouping operations over unbounded collections.<\/li>\n<li>Divides the collection into windows, each of which is a finite collection.<\/li>\n<\/ul>\n<p><strong>Watermarks<\/strong><\/p>\n<ul>\n<li>Apache Beam tracks a watermark, which is the system&#8217;s notion of when all data in a certain window can be expected to have arrived in the pipeline.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><strong>Trigger<\/strong><\/p>\n<ul>\n<li>Triggers determine when to emit aggregated results as data arrives.<\/li>\n<li>For bounded data, results are emitted after all of the input has been processed.<\/li>\n<li>For unbounded data, by default, results are emitted when the watermark passes the end of the window.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Pipelines encapsulate all the processes involved in reading input data, transforming that data, and writing output data. The input source and output sink can be of the same type or of different types. Apache Beam programs start by constructing a Pipeline object, then use that object as the basis for creating the pipeline&#8217;s datasets. 
Each pipeline represents a&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"footnotes":""},"categories":[617],"tags":[692,619,694,623,622,618,621],"class_list":["post-4800","page","type-page","status-publish","hentry","category-google-gcp","tag-cloud-dataflow","tag-data-engineer","tag-dataflow-key-concepts","tag-gcp","tag-google-certification","tag-google-cloud","tag-professional-data-engineer"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Key Concepts Google Professional Data Engineer GCP - Testprep Training Tutorials<\/title>\n<meta name=\"description\" content=\"Google Cloud Certified Professional Data Engineer Tutorial, dumps, brief notes on Key Concepts\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.testpreptraining.ai\/tutorial\/key-concepts-google-professional-data-engineer-gcp\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Key Concepts Google Professional Data Engineer GCP - Testprep Training Tutorials\" \/>\n<meta property=\"og:description\" content=\"Google Cloud Certified Professional Data Engineer Tutorial, dumps, brief notes on Key Concepts\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.testpreptraining.ai\/tutorial\/key-concepts-google-professional-data-engineer-gcp\/\" \/>\n<meta property=\"og:site_name\" content=\"Testprep Training Tutorials\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/key-concepts-google-professional-data-engineer-gcp\/\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/key-concepts-google-professional-data-engineer-gcp\/\",\"name\":\"Key Concepts Google Professional Data Engineer GCP - Testprep Training Tutorials\",\"isPartOf\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#website\"},\"datePublished\":\"2020-04-19T18:35:12+00:00\",\"dateModified\":\"2020-04-19T18:35:12+00:00\",\"description\":\"Google Cloud Certified Professional Data Engineer Tutorial, dumps, brief notes on Key Concepts\",\"breadcrumb\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/key-concepts-google-professional-data-engineer-gcp\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.testpreptraining.ai\/tutorial\/key-concepts-google-professional-data-engineer-gcp\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/key-concepts-google-professional-data-engineer-gcp\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Key Concepts Google Professional Data Engineer GCP\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#website\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\",\"name\":\"Testprep Training 
Tutorials\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.testpreptraining.ai\/tutorial\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#organization\",\"name\":\"Testprep Training\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png\",\"contentUrl\":\"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png\",\"width\":583,\"height\":153,\"caption\":\"Testprep Training\"},\"image\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Key Concepts Google Professional Data Engineer GCP - Testprep Training Tutorials","description":"Google Cloud Certified Professional Data Engineer Tutorial, dumps, brief notes on Key Concepts","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.testpreptraining.ai\/tutorial\/key-concepts-google-professional-data-engineer-gcp\/","og_locale":"en_US","og_type":"article","og_title":"Key Concepts Google Professional Data Engineer GCP - Testprep Training Tutorials","og_description":"Google Cloud Certified Professional Data Engineer Tutorial, dumps, brief notes on Key Concepts","og_url":"https:\/\/www.testpreptraining.ai\/tutorial\/key-concepts-google-professional-data-engineer-gcp\/","og_site_name":"Testprep Training Tutorials","twitter_card":"summary_large_image","twitter_misc":{"Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/key-concepts-google-professional-data-engineer-gcp\/","url":"https:\/\/www.testpreptraining.ai\/tutorial\/key-concepts-google-professional-data-engineer-gcp\/","name":"Key Concepts Google Professional Data Engineer GCP - Testprep Training Tutorials","isPartOf":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#website"},"datePublished":"2020-04-19T18:35:12+00:00","dateModified":"2020-04-19T18:35:12+00:00","description":"Google Cloud Certified Professional Data Engineer Tutorial, dumps, brief notes on Key 
Concepts","breadcrumb":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/key-concepts-google-professional-data-engineer-gcp\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.testpreptraining.ai\/tutorial\/key-concepts-google-professional-data-engineer-gcp\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/key-concepts-google-professional-data-engineer-gcp\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.testpreptraining.ai\/tutorial\/"},{"@type":"ListItem","position":2,"name":"Key Concepts Google Professional Data Engineer GCP"}]},{"@type":"WebSite","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#website","url":"https:\/\/www.testpreptraining.ai\/tutorial\/","name":"Testprep Training Tutorials","description":"","publisher":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.testpreptraining.ai\/tutorial\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#organization","name":"Testprep Training","url":"https:\/\/www.testpreptraining.ai\/tutorial\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/","url":"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png","contentUrl":"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png","width":583,"height":153,"caption":"Testprep 
Training"},"image":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/"}}]}},"_links":{"self":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/4800","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/comments?post=4800"}],"version-history":[{"count":1,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/4800\/revisions"}],"predecessor-version":[{"id":4812,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/4800\/revisions\/4812"}],"wp:attachment":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/media?parent=4800"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/categories?post=4800"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/tags?post=4800"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}