{"id":1091,"date":"2019-07-09T11:06:51","date_gmt":"2019-07-09T11:06:51","guid":{"rendered":"https:\/\/www.testpreptraining.com\/tutorial\/?page_id=1091"},"modified":"2020-05-01T10:08:42","modified_gmt":"2020-05-01T10:08:42","slug":"identify-the-appropriate-data-processing-technology-for-a-given-scenario","status":"publish","type":"page","link":"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/identify-the-appropriate-data-processing-technology-for-a-given-scenario\/","title":{"rendered":"Identify the Appropriate Data Processing Technology for a given scenario"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\"><strong>Real-time file processing<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Use S3 to trigger AWS Lambda to process data immediately after an upload. <\/li><li>Real-time processing examples <ul><li>thumbnail images<\/li><\/ul><ul><li>transcode videos<\/li><\/ul><ul><li>index files<\/li><\/ul><ul><li>process logs<\/li><\/ul><ul><li>validate content<\/li><li>aggregate and filter data in real-time. 
<\/li><\/ul><\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"750\" height=\"218\" src=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image018-750x218.png\" alt=\"\" class=\"wp-image-1293\" srcset=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image018-750x218.png 750w, https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image018.png 951w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Real-time stream processing<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Use AWS Lambda and Amazon Kinesis to process real-time streaming data <\/li><li>Real-time streaming data example<ul><li>application activity tracking<\/li><\/ul><ul><li>transaction order processing<\/li><\/ul><ul><li>click stream analysis<\/li><\/ul><ul><li>data cleansing<\/li><\/ul><ul><li>metrics generation<\/li><\/ul><ul><li>log filtering<\/li><\/ul><ul><li>indexing<\/li><\/ul><ul><li>social media analysis<\/li><li>IoT device data telemetry and metering. 
<\/li><\/ul><\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"750\" height=\"218\" src=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image019-750x218.png\" alt=\"\" class=\"wp-image-1295\" srcset=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image019-750x218.png 750w, https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image019.png 951w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Extract, transform, load<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Use Lambda to perform ETL steps for every data change in a DynamoDB table, such as<ul><li>data validation<\/li><\/ul><ul><li>filtering<\/li><\/ul><ul><li>sorting<\/li><\/ul><ul><li>other transformations <\/li><\/ul><\/li><li>Load the transformed data into another data store. <\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"750\" height=\"178\" src=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image020-1-750x178.png\" alt=\"\" class=\"wp-image-1296\" srcset=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image020-1-750x178.png 750w, https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image020-1.png 1163w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>IoT backends<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Build serverless backends with Lambda <\/li><li>Handle requests from <ul><li>Web applications<\/li><\/ul><ul><li>Mobile applications<\/li><\/ul><ul><li>Internet of Things (IoT) devices<\/li><li>3rd party APIs <\/li><\/ul><\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"750\" height=\"218\" 
src=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image021-750x218.png\" alt=\"\" class=\"wp-image-1297\" srcset=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image021-750x218.png 750w, https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image021.png 951w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Mobile backends<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Build Mobile backends to create rich,\npersonalized app experiences <\/li><li>Lambda and Amazon API Gateway to <ul><li>authenticate\nand process API requests<\/li><\/ul><\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"750\" height=\"178\" src=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image022-750x178.png\" alt=\"\" class=\"wp-image-1298\" srcset=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image022-750x178.png 750w, https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image022.png 1163w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Web Applications<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Build powerful web applications with Lambda <\/li><li>Applications can <ul><li>automatically scale up and down <\/li><\/ul><ul><li>run in a highly available configuration <\/li><li>zero administrative effort. 
<\/li><\/ul><\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"750\" height=\"178\" src=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image023-750x178.png\" alt=\"\" class=\"wp-image-1299\" srcset=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image023-750x178.png 750w, https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image023.png 1162w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>AWS Lambda<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>It is a compute service. <\/li><li>Runs code without provisioning or managing\nservers. <\/li><li>Executes code only when needed and scales\nautomatically.<\/li><li>Upload your code, and Lambda takes care of everything required to run and scale it.\n<\/li><li>Set up code to be triggered automatically by other\nAWS services. <\/li><li>Call code directly from any web or mobile app.<\/li><li>Code can run in response to triggers such as changes\nin data, shifts in system state, or actions by users. AWS services that can trigger Lambda directly include S3, DynamoDB, Kinesis, SNS, and CloudWatch.<\/li><li>Functions can be orchestrated into workflows by\nAWS Step Functions. <\/li><li>Can be used to build a variety of real-time serverless data\nprocessing systems.<\/li><li>The customer is responsible only for the code. <\/li><li>Lambda manages the memory, CPU, network, and\nother resources. <\/li><li>You cannot log in to compute instances or\ncustomize the operating system on the runtimes that Lambda provides. <\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>How Lambda Works<\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>Lambda runs functions in a serverless\nenvironment to process events. 
<\/li><li>Each instance of a function runs in an isolated\nexecution context and processes one event at a time.<\/li><li>After the function finishes processing an event, it returns a response and Lambda sends it another event. <\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Lambda Components <\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>Function \u2013 A script or program that runs in AWS\nLambda. Lambda passes invocation events to the function. The function processes an\nevent and returns a response. <\/li><li>Runtimes \u2013 Lambda runtimes allow functions in\ndifferent languages to run in the same base execution environment. You\nconfigure a function to use a runtime that matches its programming language. The\nruntime sits between the Lambda service and the function code, relaying\ninvocation events, context information, and responses between the two. You can\nuse runtimes provided by Lambda, or build your own. <\/li><li>Layers \u2013 Lambda layers are a distribution\nmechanism for libraries, custom runtimes, and other function dependencies.\nLayers let you manage in-development function code independently from the\nunchanging code and resources that it uses. You can configure a function to use\nlayers that you create, layers provided by AWS, or layers from other AWS\ncustomers. <\/li><li>Event source \u2013 An AWS service, such as Amazon\nSNS, or a custom service, that triggers the function and executes its logic. <\/li><li>Downstream resources \u2013 An AWS service, such as\nDynamoDB tables or Amazon S3 buckets, that the Lambda function calls once it is\ntriggered.<\/li><li>Log streams \u2013 Lambda monitors function\ninvocations and reports metrics to CloudWatch. You can also annotate function code with\ncustom logging statements to analyze the execution flow and performance of the\nLambda function and confirm that it&#8217;s working properly.<\/li><li>AWS SAM \u2013 A model to define serverless\napplications. 
AWS SAM is natively supported by AWS CloudFormation and defines\nsimplified syntax for expressing serverless resources. <\/li><\/ul>\n\n\n\n<p>Lambda function configuration, deployment, and execution limits <\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><td><strong>Resource<\/strong><\/td><td><strong>Limit<\/strong><\/td><\/tr><\/thead><tbody><tr><td>Function memory allocation<\/td><td>128 MB to 3,008 MB, in 64 MB increments<\/td><\/tr><tr><td>Function timeout<\/td><td>900 seconds (15 minutes)<\/td><\/tr><tr><td>Function environment variables<\/td><td>4 KB<\/td><\/tr><tr><td>Function resource-based policy<\/td><td>20 KB<\/td><\/tr><tr><td>Function layers<\/td><td>5 layers<\/td><\/tr><tr><td>Invocation frequency (requests per second)<\/td><td>10 x concurrent executions limit (synchronous \u2013 all sources); 10 x concurrent executions limit (asynchronous \u2013 non-AWS sources); unlimited (asynchronous \u2013 AWS service sources)<\/td><\/tr><tr><td>Invocation payload (request and response)<\/td><td>6 MB (synchronous); 256 KB (asynchronous)<\/td><\/tr><tr><td>Deployment package size<\/td><td>50 MB (zipped, for direct upload); 250 MB (unzipped, including layers); 3 MB (console editor)<\/td><\/tr><tr><td>Test events (console editor)<\/td><td>10<\/td><\/tr><tr><td><code>\/tmp<\/code> directory storage<\/td><td>512 MB<\/td><\/tr><tr><td>File descriptors<\/td><td>1,024<\/td><\/tr><tr><td>Execution processes\/threads<\/td><td>1,024<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>AWS Lambda-based application lifecycle<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>authoring code<\/li><li>deploying code to AWS Lambda<\/li><li>monitoring and troubleshooting <\/li><\/ul>\n\n\n\n<p>AWS Lambda supported languages and their authoring tools and options <\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><td><strong>Language<\/strong>    <\/td><td><strong>Tools and Options for Authoring Code<\/strong><\/td><\/tr><\/thead><tbody><tr><td>Node.js   
<\/td><td>AWS Lambda console; Visual Studio, with IDE plug-in; own authoring environment<\/td><\/tr><tr><td>Java   <\/td><td>Eclipse, with AWS Toolkit for Eclipse; IntelliJ, with the AWS Toolkit for IntelliJ; own authoring environment<\/td><\/tr><tr><td>C#   <\/td><td>Visual Studio, with IDE plug-in; .NET Core; own authoring environment<\/td><\/tr><tr><td>Python   <\/td><td>AWS Lambda console; PyCharm, with the AWS Toolkit for PyCharm; own authoring environment<\/td><\/tr><tr><td>Ruby   <\/td><td>AWS Lambda console; own authoring environment<\/td><\/tr><tr><td>Go   <\/td><td>own authoring environment<\/td><\/tr><tr><td>PowerShell   <\/td><td>own authoring environment; PowerShell Core 6.0; .NET Core 2.1 SDK; AWSLambdaPSCore Module<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>AWS Glue<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>It is a fully managed ETL (extract, transform, and load) service. <\/li><li>Makes it simple and cost-effective to <ul><li>categorize data<\/li><\/ul><ul><li>clean it<\/li><\/ul><ul><li>enrich it<\/li><\/ul><ul><li>move it reliably between various data stores. <\/li><\/ul><\/li><li>It consists of three parts: <ul><li>a central metadata repository, the <a href=\"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/\">AWS Glue Data<\/a> Catalog<\/li><\/ul><ul><li>an ETL engine that automatically generates Python or Scala code<\/li><\/ul><ul><li>a flexible scheduler that handles dependency resolution, job monitoring, and retries. <\/li><\/ul><\/li><li>AWS Glue is serverless, so there is no infrastructure to set up or manage.<\/li><li>Use the AWS Glue console to discover data and transform it.<\/li><li>The console also calls the underlying services to orchestrate the work required.<\/li><li>You can also use the AWS Glue API operations to interface with AWS Glue services. 
<\/li><li>Edit, debug, and test Python or Scala Apache Spark ETL code using a familiar development environment. <\/li><\/ul>\n\n\n\n<p>Using AWS Glue to build a data warehouse:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Discovers and catalogs metadata about data\nstores into a central catalog. <\/li><li>Can also process semi-structured data, such as\nclickstream or process logs.<\/li><li>Populates the AWS Glue Data Catalog with table definitions from scheduled\ncrawler programs. <\/li><li>Generates ETL scripts to transform, flatten, and\nenrich data from source to target.<\/li><li>Detects schema changes and adapts based on\npreferences.<\/li><li>Triggers ETL jobs based on a schedule or event. <\/li><li>Triggers can be used to create a dependency flow\nbetween jobs.<\/li><li>Gathers runtime metrics to monitor the\nactivities of the data warehouse.<\/li><li>Handles errors and retries automatically.<\/li><li>Scales resources, as needed, to run jobs.<\/li><li>You define jobs in AWS Glue to accomplish the work of extracting, transforming, and loading data from a data source to a data target. <\/li><\/ul>\n\n\n\n<p>AWS Glue typical actions<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Define a crawler to populate the Glue Data Catalog\nwith metadata table definitions. Point the crawler at a data store, and the crawler\ncreates table definitions in the Data Catalog.<\/li><li>The Glue Data Catalog can also contain other metadata that is required to\ndefine ETL jobs. <\/li><li>AWS Glue can generate a script to transform data. Alternatively,\nprovide your own script in the Glue console or API.<\/li><li>Run a job on demand, or set it to start when a specified\ntrigger occurs. <\/li><li>The trigger can be a time-based schedule or an\nevent.<\/li><li>When a job runs, a script extracts data from the data\nsource, transforms the data, and loads it into the data target. <\/li><li>The script runs in an Apache Spark environment\nin AWS Glue.<\/li><\/ul>\n\n\n\n<p><strong>AWS Glue Components<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>AWS Glue Data Catalog &#8211; The persistent metadata store in AWS Glue. Each AWS account has one AWS Glue Data Catalog per AWS Region. 
It contains table definitions, job definitions, and other control information to manage your AWS Glue environment.<\/li><li>Classifier &#8211; Determines the schema of data. AWS Glue provides classifiers for common file types, such as CSV, JSON, Avro, XML, and others. It also provides classifiers for common relational database management systems using a JDBC connection. You can write your own classifier by using a grok pattern or by specifying a row tag in an XML document.<\/li><li>Connection &#8211; Contains the properties that are required to connect to a data store.<\/li><li>Crawler &#8211; A program that connects to a data store (source or target), progresses through a prioritized list of classifiers to determine the schema for the data, and then creates metadata in the AWS Glue Data Catalog.<\/li><li>Database &#8211; A set of associated table definitions organized into a logical group in AWS Glue.<\/li><li>Data store, data source, data target &#8211; A data store is a repository for persistently storing data. Examples include Amazon S3 buckets and relational databases. A data source is a data store that is used as input to a process or transform. A data target is a data store that a process or transform writes to.<\/li><li>Development endpoint &#8211; An environment that you can use to develop and test AWS Glue scripts.<\/li><li>Job &#8211; The business logic that is required to perform ETL work. It is composed of a transformation script, data sources, and data targets. Job runs are initiated by triggers, which can be scheduled or fired by events.<\/li><li>Notebook server &#8211; A web-based environment that you can use to run PySpark statements.<\/li><li>Script &#8211; Code that extracts data from sources, transforms it, and loads it into targets. AWS Glue generates PySpark or Scala scripts. <\/li><li>Table \u2013 Defines the schema of data. The data may be an Amazon S3 file, an Amazon RDS table, or another set of data. 
It consists of the names of columns, data type definitions, and other metadata about a base dataset. The schema of the data is represented in the AWS Glue table definition. The actual data remains in its original data store, whether it be in a file or a relational database table. AWS Glue catalogs files and relational database tables in the AWS Glue Data Catalog. They are used as sources and targets when you create an ETL job.<\/li><li>Transform &#8211; The code logic that is used to manipulate data into a different format.<\/li><li>Trigger &#8211; Initiates an ETL job. Triggers can be defined based on a scheduled time or an event. <\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Amazon EMR <\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>It is a managed cluster platform. <\/li><li>Simplifies running big data frameworks, such as Apache\nHadoop and Apache Spark, on AWS. <\/li><li>Used to process and analyze vast amounts of data. <\/li><li>Works with related projects, such as Apache Hive and Apache Pig, to process data\nfor analytics and BI. <\/li><li>Can also transform and move large amounts of data\ninto and out of other AWS data stores and databases. <\/li><\/ul>\n\n\n\n<p><strong>EMR Cluster <\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>The central component is the cluster. <\/li><li>A cluster is a collection of Amazon EC2\ninstances. <\/li><li>Each instance in the cluster is called a node.\nEach node has a role within the cluster, referred to as the node type. Amazon\nEMR also installs different software components on each node type, giving each\nnode a role in a distributed application like Apache Hadoop.<\/li><\/ul>\n\n\n\n<p><strong>EMR node types <\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Master node: Manages the cluster by\ncoordinating the distribution of data and tasks among the other nodes for processing.\nIt tracks the status of tasks and monitors the health of the cluster. Every cluster\n
Every cluster\nhas a master node and a single-node cluster has only the master node.<\/li><li>Core node: Run tasks and store data in Hadoop\nDistributed File System (HDFS) on cluster. Multi-node clusters have at least\none core node.<\/li><li>Task node: Only runs tasks and does not store\ndata in HDFS. Task nodes are optional.<\/li><\/ul>\n\n\n\n<p>Diagram of an EMR cluster with one master node and four core nodes.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"254\" height=\"327\" src=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image024.png\" alt=\"Amazon EMR cluster\" class=\"wp-image-1300\"\/><\/figure><\/div>\n\n\n\n<p>Options to complete the tasks can be specified Amazon\nEMR, as<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Provide the entire definition of the work to be\ndone in functions, specified as steps when cluster is created. It is for\nclusters processing a set amount of data and then terminate after process completion.<\/li><li>Create a long-running cluster and use the Amazon\nEMR console, the Amazon EMR API, or the AWS CLI to submit steps, which may\ncontain one or more jobs. <\/li><li>Create a cluster, connect to the master node and\nother nodes as required using SSH, and use the interfaces that the installed\napplications provide to perform tasks and submit queries, either scripted or\ninteractively. <\/li><\/ul>\n\n\n\n<p>Data Processing in EMR<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Select the frameworks and applications to\ninstall during cluster launch <\/li><li>To process data in cluster<\/li><li>submit jobs or queries directly to installed\napplications<\/li><li>or run steps in the cluster. <\/li><li>During processing, input is data stored as files\nin S3 or HDFS. <\/li><li>This data passes from one step to the next in\nthe processing sequence. 
<\/li><li>The final step writes the output data to a\nspecified location, such as an Amazon S3 bucket.<\/li><\/ul>\n\n\n\n<p>Steps are run in the following sequence:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>A request is submitted to begin processing\nsteps.<\/li><li>The state of all steps is set to PENDING.<\/li><li>When the first step in the sequence starts, its\nstate changes to RUNNING. The other steps remain in the PENDING state.<\/li><li>After the first step completes, its state\nchanges to COMPLETED.<\/li><li>The next step in the sequence starts, and its\nstate changes to RUNNING. When it completes, its state changes to COMPLETED.<\/li><li>This pattern repeats for each step until they\nall complete and processing ends.<\/li><li>If a step fails during processing, its state\nchanges to FAILED.<\/li><li>You can determine what happens next for each\nstep. <\/li><li>By default, any remaining steps in the sequence\nare set to CANCELLED and do not run. <\/li><li>You can choose to ignore the failure and allow\nremaining steps to proceed, or to terminate the cluster immediately.<\/li><\/ul>\n\n\n\n<p>The diagram shows the step sequence and the state changes.<\/p>\n\n\n\n<figure class=\"wp-block-gallery columns-1 is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\"><ul class=\"blocks-gallery-grid\"><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" width=\"750\" height=\"95\" src=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image025-750x95.png\" alt=\"sequence and change of state \" data-id=\"1301\" data-link=\"https:\/\/www.testpreptraining.ai\/tutorial\/identify-the-appropriate-data-processing-technology-for-a-given-scenario\/image025\/\" class=\"wp-image-1301\" srcset=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image025-750x95.png 750w, https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image025.png 770w\" 
sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/figure><\/li><\/ul><\/figure>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Real-time file processing Use S3 to trigger AWS Lambda to process data immediately after an upload. Real-time processing examples thumbnail images transcode videos index files process logs validate content aggregate and filter data in real-time. Real-time stream processing Use AWS Lambda and Amazon Kinesis to process real-time streaming data Real-time streaming data example application activity&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":1031,"menu_order":12,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"footnotes":""},"categories":[2],"tags":[],"class_list":["post-1091","page","type-page","status-publish","hentry","category-amazon-aws"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Identify the Appropriate Data Processing Technology for a given scenario - Testprep Training Tutorials<\/title>\n<meta name=\"description\" content=\"Identify the appropriate data processing technology for a given scenario tutorial, notes\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/identify-the-appropriate-data-processing-technology-for-a-given-scenario\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Identify the Appropriate Data Processing Technology for a given scenario - Testprep Training Tutorials\" \/>\n<meta property=\"og:description\" content=\"Identify the appropriate data processing technology for a given scenario tutorial, notes\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/identify-the-appropriate-data-processing-technology-for-a-given-scenario\/\" \/>\n<meta property=\"og:site_name\" content=\"Testprep Training Tutorials\" \/>\n<meta property=\"article:modified_time\" content=\"2020-05-01T10:08:42+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image018-750x218.png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/identify-the-appropriate-data-processing-technology-for-a-given-scenario\/\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/identify-the-appropriate-data-processing-technology-for-a-given-scenario\/\",\"name\":\"Identify the Appropriate Data Processing Technology for a given scenario - Testprep Training Tutorials\",\"isPartOf\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#website\"},\"datePublished\":\"2019-07-09T11:06:51+00:00\",\"dateModified\":\"2020-05-01T10:08:42+00:00\",\"description\":\"Identify the appropriate data processing technology for a given scenario tutorial, 
notes\",\"breadcrumb\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/identify-the-appropriate-data-processing-technology-for-a-given-scenario\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/identify-the-appropriate-data-processing-technology-for-a-given-scenario\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/identify-the-appropriate-data-processing-technology-for-a-given-scenario\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AWS Certified Big Data Specialty\",\"item\":\"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Identify the Appropriate Data Processing Technology for a given scenario\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#website\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\",\"name\":\"Testprep Training Tutorials\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.testpreptraining.ai\/tutorial\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#organization\",\"name\":\"Testprep 
Training\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png\",\"contentUrl\":\"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png\",\"width\":583,\"height\":153,\"caption\":\"Testprep Training\"},\"image\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Identify the Appropriate Data Processing Technology for a given scenario - Testprep Training Tutorials","description":"Identify the appropriate data processing technology for a given scenario tutorial, notes","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/identify-the-appropriate-data-processing-technology-for-a-given-scenario\/","og_locale":"en_US","og_type":"article","og_title":"Identify the Appropriate Data Processing Technology for a given scenario - Testprep Training Tutorials","og_description":"Identify the appropriate data processing technology for a given scenario tutorial, notes","og_url":"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/identify-the-appropriate-data-processing-technology-for-a-given-scenario\/","og_site_name":"Testprep Training Tutorials","article_modified_time":"2020-05-01T10:08:42+00:00","og_image":[{"url":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image018-750x218.png"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/identify-the-appropriate-data-processing-technology-for-a-given-scenario\/","url":"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/identify-the-appropriate-data-processing-technology-for-a-given-scenario\/","name":"Identify the Appropriate Data Processing Technology for a given scenario - Testprep Training Tutorials","isPartOf":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#website"},"datePublished":"2019-07-09T11:06:51+00:00","dateModified":"2020-05-01T10:08:42+00:00","description":"Identify the appropriate data processing technology for a given scenario tutorial, notes","breadcrumb":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/identify-the-appropriate-data-processing-technology-for-a-given-scenario\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/identify-the-appropriate-data-processing-technology-for-a-given-scenario\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/identify-the-appropriate-data-processing-technology-for-a-given-scenario\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.testpreptraining.ai\/tutorial\/"},{"@type":"ListItem","position":2,"name":"AWS Certified Big Data Specialty","item":"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/"},{"@type":"ListItem","position":3,"name":"Identify the Appropriate Data Processing Technology for a given scenario"}]},{"@type":"WebSite","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#website","url":"https:\/\/www.testpreptraining.ai\/tutorial\/","name":"Testprep Training 
Tutorials","description":"","publisher":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.testpreptraining.ai\/tutorial\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#organization","name":"Testprep Training","url":"https:\/\/www.testpreptraining.ai\/tutorial\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/","url":"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png","contentUrl":"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png","width":583,"height":153,"caption":"Testprep Training"},"image":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/"}}]}},"_links":{"self":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/1091","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/comments?post=1091"}],"version-history":[{"count":5,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/1091\/revisions"}],"predecessor-version":[{"id":5073,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/1091\/revisions\/5073"}],"up":[{"embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/1031"}],"wp:attachment":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/w
p-json\/wp\/v2\/media?parent=1091"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/categories?post=1091"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/tags?post=1091"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}