{"id":4651,"date":"2020-04-13T17:04:44","date_gmt":"2020-04-13T17:04:44","guid":{"rendered":"https:\/\/www.testpreptraining.com\/tutorial\/?page_id=4651"},"modified":"2020-04-17T19:10:15","modified_gmt":"2020-04-17T19:10:15","slug":"ingest-google-professional-data-engineer-gcp","status":"publish","type":"page","link":"https:\/\/www.testpreptraining.ai\/tutorial\/ingest-google-professional-data-engineer-gcp\/","title":{"rendered":"Ingest Google Professional Data Engineer GCP"},"content":{"rendered":"<ul>\n<li>Capture raw data depending on the data\u2019s size, source, and latency<\/li>\n<li>Various ingest sources\n<ul>\n<li>App: Data from app events, like log files or user events<\/li>\n<li>Streaming: A continuous stream of small, asynchronous messages.<\/li>\n<li>Batch: Large amounts of data in a set of files to transfer to storage in bulk.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Google Cloud services map for app\/streaming and batch workloads:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-4668\" src=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2020\/04\/Professional-Data-Engineer-Google-Cloud-image009-633x400.png\" alt=\"\" width=\"633\" height=\"400\" srcset=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2020\/04\/Professional-Data-Engineer-Google-Cloud-image009-633x400.png 633w, https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2020\/04\/Professional-Data-Engineer-Google-Cloud-image009.png 736w\" sizes=\"auto, (max-width: 633px) 100vw, 633px\" \/><\/p>\n<p>The data transfer model you choose depends on your workload, and each model has different infrastructure requirements.<\/p>\n<h3>Ingesting app data<\/h3>\n<ul>\n<li>Consists of data from apps and services, including\n<ul>\n<li>app event logs<\/li>\n<li>clickstream data<\/li>\n<li>social network interactions<\/li>\n<li>e-commerce transactions<\/li>\n<\/ul>\n<\/li>\n<li>App data helps in showing user trends and gives 
business insights<\/li>\n<li>GCP hosts apps on App Engine (a managed platform) and Google Kubernetes Engine (GKE, for container management).<\/li>\n<li>Use cases of GCP hosted apps\n<ul>\n<li>Writing data to a file: The app outputs batch CSV files to Cloud Storage, the object store; from there, the import function of BigQuery, a data warehouse, loads them for analysis and querying.<\/li>\n<li>Writing data to a database: The app writes data to a GCP database service.<\/li>\n<li>Streaming data as messages: The app streams data to Pub\/Sub, and another app subscribed to the messages can transfer the data to storage or process it immediately in situations such as fraud detection.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Cloud Logging<\/p>\n<ul>\n<li>A centralized log management service<\/li>\n<li>Collects log data from apps running on GCP.<\/li>\n<li>Data collected by Cloud Logging can be exported to Cloud Storage, Pub\/Sub, and BigQuery.<\/li>\n<li>Many GCP services, such as App Engine, automatically record log data to Cloud Logging.<\/li>\n<li>Apps can also write custom logging messages to stdout and stderr.<\/li>\n<li>Displays log data in the Logs Viewer.<\/li>\n<li>Includes a logging agent, based on fluentd, which runs on VM instances.<\/li>\n<li>The agent streams log data to Cloud Logging.<\/li>\n<\/ul>\n<h3>Ingesting streaming data<\/h3>\n<ul>\n<li>Streaming data is\n<ul>\n<li>delivered asynchronously<\/li>\n<li>sent without expecting a reply<\/li>\n<li>small in size<\/li>\n<\/ul>\n<\/li>\n<li>Streaming data can be used to\n<ul>\n<li>fire event triggers<\/li>\n<li>perform complex session analysis<\/li>\n<li>provide input for ML tasks.<\/li>\n<\/ul>\n<\/li>\n<li>Streaming data use cases\n<ul>\n<li>Telemetry data: Data from network-connected Internet of Things (IoT) devices that gather data about the surrounding environment through sensors.<\/li>\n<li>User events and analytics: Mobile apps logging events about app usage, crashes, etc.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Pub\/Sub<\/h3>\n<ul>\n<li>A real-time messaging service\n<ul>\n<li>sends 
and receives messages between apps<\/li>\n<\/ul>\n<\/li>\n<li>One use case is inter-app messaging to ingest streaming event data.<\/li>\n<li>Pub\/Sub automatically manages\n<ul>\n<li>sharding<\/li>\n<li>replication<\/li>\n<li>load-balancing<\/li>\n<li>partitioning of the incoming data streams.<\/li>\n<\/ul>\n<\/li>\n<li>Pub\/Sub has global endpoints that use the GCP load balancer, with minimal latency.<\/li>\n<li>Scales automatically to meet demand, without pre-provisioning system resources.<\/li>\n<li>Message streams are organized as topics.\n<ul>\n<li>Streaming data targets a topic.<\/li>\n<li>Each message has a unique identifier and timestamp.<\/li>\n<\/ul>\n<\/li>\n<li>After data ingestion, apps can retrieve messages by using a topic subscription in a pull or push model.\n<ul>\n<li>In a push subscription, the server sends a request to the subscriber app at a preconfigured URL endpoint.<\/li>\n<li>In the pull model, the subscriber requests messages from the server and acknowledges receipt.<\/li>\n<\/ul>\n<\/li>\n<li>Pub\/Sub guarantees message delivery at least once per subscriber.<\/li>\n<li>By default, there are no guarantees about the order of message delivery.<\/li>\n<li>For strict message ordering with buffering, use Dataflow for real-time processing\n<ul>\n<li>After processing, move the data into Datastore\/BigQuery.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Ingesting bulk data<\/h3>\n<ul>\n<li>Bulk data\n<ul>\n<li>consists of large datasets<\/li>\n<li>needs high aggregate bandwidth between a small number of sources and the target for ingestion.<\/li>\n<\/ul>\n<\/li>\n<li>Data can be stored in\n<ul>\n<li>files (CSV, JSON, Avro, or Parquet)<\/li>\n<li>a relational database<\/li>\n<li>a NoSQL database<\/li>\n<\/ul>\n<\/li>\n<li>Source data can be on-premises or on other cloud platforms.<\/li>\n<li>Use cases\n<ul>\n<li>Scientific workloads<\/li>\n<li>Migrating to the cloud<\/li>\n<li>Backing up data or replication<\/li>\n<li>Importing legacy data<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Storage Transfer 
Service<\/p>\n<ul>\n<li>Managed file transfer to a Cloud Storage bucket<\/li>\n<li>Data source can be\n<ul>\n<li>an Amazon S3 bucket<\/li>\n<li>a web-accessible URL<\/li>\n<li>another Cloud Storage bucket.<\/li>\n<\/ul>\n<\/li>\n<li>Used for bulk transfer<\/li>\n<li>Optimized for data volumes of 1 TB or more.<\/li>\n<li>Commonly used for backing up data to an archive storage bucket<\/li>\n<li>Supports one-time transfers or recurring transfers.<\/li>\n<li>Has advanced filters based on file creation dates, filenames, and times of day<\/li>\n<li>Supports the deletion of the source data after it\u2019s been copied.<\/li>\n<\/ul>\n<p>Transfer Appliance<\/p>\n<ul>\n<li>A shippable, high-capacity storage server<\/li>\n<li>It is leased from Google.<\/li>\n<li>Connect it to your network, load data, and ship it to an upload facility.<\/li>\n<li>The appliance comes in multiple sizes.<\/li>\n<li>Use the appliance when it is feasible in terms of cost and time.<\/li>\n<li>The appliance deduplicates, compresses, and encrypts captured data with strong AES-256 encryption using a password and passphrase given by the user. 
The same password and passphrase are needed when reading the data from Cloud Storage.<\/li>\n<\/ul>\n<p>gsutil<\/p>\n<ul>\n<li>A command-line utility<\/li>\n<li>Moves file-based data from any existing file system into Cloud Storage.<\/li>\n<li>Written in Python and runs on Linux, macOS and Windows.<\/li>\n<li>It can also\n<ul>\n<li>create and manage Cloud Storage buckets<\/li>\n<li>edit access rights of objects<\/li>\n<li>copy objects from Cloud Storage.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Database migration<\/p>\n<ul>\n<li>For RDBMS data, migrate to Cloud SQL or Cloud Spanner.<\/li>\n<li>For data warehouse data, migrate to <\/li>\n<li>For NoSQL databases, migrate to Bigtable (for column-oriented NoSQL) or Datastore (for JSON-oriented NoSQL).<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Capture raw data depending on the data\u2019s size, source, and latency Various ingest sources App: Data from app events, like log files or user events Streaming: A continuous stream of small, asynchronous messages. Batch: Large amounts of data in a set of files to transfer to storage in bulk. 
Google Cloud services map for app\/streaming and&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"footnotes":""},"categories":[617],"tags":[619,644,623,622,618,645,621],"class_list":["post-4651","page","type-page","status-publish","hentry","category-google-gcp","tag-data-engineer","tag-data-lifecycle","tag-gcp","tag-google-certification","tag-google-cloud","tag-ingest-phase","tag-professional-data-engineer"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Ingest Google Professional Data Engineer GCP - Testprep Training Tutorials<\/title>\n<meta name=\"description\" content=\"Google Cloud Certified Professional Data Engineer Tutorial, dumps, brief notes on Ingest\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.testpreptraining.ai\/tutorial\/ingest-google-professional-data-engineer-gcp\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Ingest Google Professional Data Engineer GCP - Testprep Training Tutorials\" \/>\n<meta property=\"og:description\" content=\"Google Cloud Certified Professional Data Engineer Tutorial, dumps, brief notes on Ingest\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.testpreptraining.ai\/tutorial\/ingest-google-professional-data-engineer-gcp\/\" \/>\n<meta property=\"og:site_name\" content=\"Testprep Training Tutorials\" \/>\n<meta property=\"article:modified_time\" content=\"2020-04-17T19:10:15+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2020\/04\/Professional-Data-Engineer-Google-Cloud-image009-633x400.png\" 
\/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/ingest-google-professional-data-engineer-gcp\/\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/ingest-google-professional-data-engineer-gcp\/\",\"name\":\"Ingest Google Professional Data Engineer GCP - Testprep Training Tutorials\",\"isPartOf\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#website\"},\"datePublished\":\"2020-04-13T17:04:44+00:00\",\"dateModified\":\"2020-04-17T19:10:15+00:00\",\"description\":\"Google Cloud Certified Professional Data Engineer Tutorial, dumps, brief notes on Ingest\",\"breadcrumb\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/ingest-google-professional-data-engineer-gcp\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.testpreptraining.ai\/tutorial\/ingest-google-professional-data-engineer-gcp\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/ingest-google-professional-data-engineer-gcp\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Ingest Google Professional Data Engineer GCP\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#website\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\",\"name\":\"Testprep Training 
Tutorials\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.testpreptraining.ai\/tutorial\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#organization\",\"name\":\"Testprep Training\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png\",\"contentUrl\":\"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png\",\"width\":583,\"height\":153,\"caption\":\"Testprep Training\"},\"image\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Ingest Google Professional Data Engineer GCP - Testprep Training Tutorials","description":"Google Cloud Certified Professional Data Engineer Tutorial, dumps, brief notes on Ingest","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.testpreptraining.ai\/tutorial\/ingest-google-professional-data-engineer-gcp\/","og_locale":"en_US","og_type":"article","og_title":"Ingest Google Professional Data Engineer GCP - Testprep Training Tutorials","og_description":"Google Cloud Certified Professional Data Engineer Tutorial, dumps, brief notes on Ingest","og_url":"https:\/\/www.testpreptraining.ai\/tutorial\/ingest-google-professional-data-engineer-gcp\/","og_site_name":"Testprep Training Tutorials","article_modified_time":"2020-04-17T19:10:15+00:00","og_image":[{"url":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2020\/04\/Professional-Data-Engineer-Google-Cloud-image009-633x400.png"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/ingest-google-professional-data-engineer-gcp\/","url":"https:\/\/www.testpreptraining.ai\/tutorial\/ingest-google-professional-data-engineer-gcp\/","name":"Ingest Google Professional Data Engineer GCP - Testprep Training Tutorials","isPartOf":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#website"},"datePublished":"2020-04-13T17:04:44+00:00","dateModified":"2020-04-17T19:10:15+00:00","description":"Google Cloud Certified Professional Data Engineer Tutorial, dumps, brief notes on Ingest","breadcrumb":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/ingest-google-professional-data-engineer-gcp\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.testpreptraining.ai\/tutorial\/ingest-google-professional-data-engineer-gcp\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/ingest-google-professional-data-engineer-gcp\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.testpreptraining.ai\/tutorial\/"},{"@type":"ListItem","position":2,"name":"Ingest Google Professional Data Engineer GCP"}]},{"@type":"WebSite","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#website","url":"https:\/\/www.testpreptraining.ai\/tutorial\/","name":"Testprep Training Tutorials","description":"","publisher":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.testpreptraining.ai\/tutorial\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#organization","name":"Testprep 
Training","url":"https:\/\/www.testpreptraining.ai\/tutorial\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/","url":"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png","contentUrl":"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png","width":583,"height":153,"caption":"Testprep Training"},"image":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/"}}]}},"_links":{"self":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/4651","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/comments?post=4651"}],"version-history":[{"count":3,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/4651\/revisions"}],"predecessor-version":[{"id":4693,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/4651\/revisions\/4693"}],"wp:attachment":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/media?parent=4651"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/categories?post=4651"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/tags?post=4651"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}