{"id":1099,"date":"2019-07-09T11:22:10","date_gmt":"2019-07-09T11:22:10","guid":{"rendered":"https:\/\/www.testpreptraining.com\/tutorial\/?page_id=1099"},"modified":"2020-05-01T10:12:56","modified_gmt":"2020-05-01T10:12:56","slug":"determine-how-to-design-and-architect-the-analytical-solution","status":"publish","type":"page","link":"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/determine-how-to-design-and-architect-the-analytical-solution\/","title":{"rendered":"Determine How to Design and Architect the Analytical Solution"},"content":{"rendered":"\n<h4 class=\"wp-block-heading\"><strong>Analyze the Business Problem<\/strong><\/h4>\n\n\n\n<p>Look at the business problem objectively:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Identify whether it actually is a problem.<\/li><li>Sheer volume or cost may not be the deciding factor.<\/li><li>Multiple criteria, such as velocity, variety, challenges with the current system and the time taken for processing, should be considered as well.<\/li><\/ul>\n\n\n\n<p>Some Common Use Cases:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Data Archival\/Data Offload \u2013 Storing huge amounts of data spanning several years as active data at a very low cost, rather than archiving it to tape.<\/li><li>Process Offload \u2013 Offloading jobs that consume expensive MIPS cycles or extensive CPU cycles on the current systems.<\/li><li>Data Lake Implementation \u2013 Storing and processing massive amounts of data.<\/li><li>Unstructured Data Processing \u2013 Storing and processing any amount of unstructured data natively.<\/li><li>Data Warehouse Modernization \u2013 Integrating the capabilities of Big Data and the data warehouse to increase operational efficiency.<\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Capacity Planning<\/strong><\/h4>\n\n\n\n<p>Capacity planning plays a pivotal role in hardware and infrastructure sizing. 
Important factors to be considered are:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Data volume for the one-time historical load<\/li><li>Daily data ingestion volume<\/li><li>Retention period of the data<\/li><li>HDFS replication factor, based on the criticality of the data<\/li><li>Time period for which the cluster is sized (typically 6 months to 1 year), after which the cluster is scaled horizontally based on requirements<\/li><li>Multi-datacenter deployment<\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Hindsight, Insight, or Foresight<\/strong><\/h4>\n\n\n\n<p>Hindsight, insight, and foresight correspond to three questions that come to mind when dealing with data: what happened, why it happened, and what will happen. Hindsight is possible with aggregations and applied statistics: you can aggregate data by different groups and compare the results using statistical techniques such as confidence intervals and statistical tests. A key component is data visualization that shows related data in context.<\/p>\n\n\n\n<p>Insight and foresight require machine learning and data mining. This includes finding patterns, modeling current behavior, predicting future outcomes, and detecting anomalies. Refer to data science and machine learning tools (e.g., R, Apache Spark MLlib, WSO2 Machine Learner, GraphLab) for a deeper understanding.<\/p>\n\n\n\n<p>Steps in designing and architecting an analytical solution:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"740\" height=\"123\" src=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image032.png\" alt=\"\" class=\"wp-image-1322\"\/><\/figure>\n\n\n\n<p><strong>Source Profiling<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>The most important step in deciding the architecture. 
<\/li><li>It involves:<ul><li>identifying the different source systems<\/li><li>categorizing them based on their nature and type.<\/li><\/ul><\/li><\/ul>\n\n\n\n<p><strong>Important considerations<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Identify the internal and external source systems<\/li><li>Make a high-level estimate of the amount of data ingested from each source<\/li><li>Identify the mechanism used to get data \u2013 push or pull<\/li><li>Determine the type of data source \u2013 database, file, web service, stream, etc.<\/li><li>Determine the type of data \u2013 structured, semi-structured or unstructured<\/li><\/ul>\n\n\n\n<p><strong>Ingestion Strategy and Acquisition<\/strong><\/p>\n\n\n\n<p><strong>Important considerations<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Determine the frequency at which data will be ingested from each source<\/li><li>Is there a need to change the semantics of the data (append, replace, etc.)?<\/li><li>Is any data validation or transformation required before ingestion (pre-processing)?<\/li><li>Segregate the data sources by mode of ingestion \u2013 batch or real-time<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Storage<\/strong><\/h3>\n\n\n\n<p><strong>Storage requirements<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Able to store large amounts of data<\/li><li>Able to store any type of data<\/li><li>Able to scale as needed<\/li><li>Able to provide the required number of IOPS (input\/output operations per second). 
<\/li><\/ul>\n\n\n\n<p>There are two types of analytical requirements:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Synchronous \u2013 Data is analyzed in real time or near real time, so the storage should be optimized for low latency.<\/li><li>Asynchronous \u2013 Data is captured, recorded and analyzed in batch.<\/li><\/ul>\n\n\n\n<p><strong>Important considerations<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Type of data (historical or incremental)<\/li><li>Format of data (structured, semi-structured or unstructured)<\/li><li>Compression requirements<\/li><li>Frequency of incoming data<\/li><li>Query pattern on the data<\/li><li>Consumers of the data<\/li><\/ul>\n\n\n\n<p><strong>Processing<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Earlier, data could be processed in memory (RAM); due to its volume, it is now stored across multiple disks.<\/li><li>Processing is now moved closer to the data to reduce network I\/O.<\/li><li>The processing methodology is driven by business requirements.<\/li><li>Based on the SLA, it can be categorized into:<ul><li>Batch<\/li><li>Real-time<\/li><li>Hybrid<\/li><\/ul><\/li><\/ul>\n\n\n\n<p><strong>Batch Processing<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Collects the input over a specified interval of time<\/li><li>Runs transformations on it on a schedule<\/li><li>A historical data load is a typical batch operation<\/li><li>Technologies used: MapReduce, Hive, Pig<\/li><\/ul>\n\n\n\n<p><strong>Real-time Processing<\/strong><\/p>\n\n\n\n<p>Involves running transformations as soon as data is acquired.<\/p>\n\n\n\n<p>Technologies used: Impala, Spark, Spark SQL, Tez, Apache Drill (these are low-latency, interactive engines; continuous stream processing typically relies on an engine such as Spark Streaming)<\/p>\n\n\n\n<p><strong>Hybrid Processing<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Combines batch and real-time processing needs.<\/li><li>Example: the Lambda architecture. 
<\/li><\/ul>\n\n\n\n<p><strong>Consuming Data<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Involves consuming the output provided by the processing layer.<\/li><li>Different users consume data in different formats.<\/li><\/ul>\n\n\n\n<p><strong>Data consumption forms<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Export Datasets \u2013 Meets requirements for third-party dataset generation. Datasets are generated using Hive export or directly from HDFS.<\/li><li>Reporting and Visualization \u2013 Reporting and visualization tools can connect to Hadoop or to a database service.<\/li><li>Data Exploration \u2013 Data scientists build models and perform deep exploration in a sandbox environment. The sandbox can be a separate cluster, or a separate schema within the same cluster, that contains a subset of the actual data.<\/li><li>Ad hoc Querying \u2013 Ad hoc or interactive querying can be supported using Hive, Impala or Spark SQL.<\/li><\/ul>\n\n\n\n<p><strong>Important considerations<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Dynamics of the use case: a number of scenarios need to be considered while designing the architecture:<ul><li>Form and frequency of data<\/li><li>Type of data<\/li><li>Type of processing and analytics required<\/li><\/ul><\/li><li>Myriad of technologies: multiple technologies offer similar features, and each claims to be better than the others.<\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Analyze the Business Problem Look at the business problem objectively identify whether it is a problem or not? Sheer volume or cost may not be the deciding factor. Multiple criteria like velocity, variety, challenges with the current system and time taken for processing should be considered as well. 
Some Common Use Cases: Data Archival\/ Data&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":1031,"menu_order":17,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"footnotes":""},"categories":[2],"tags":[171,169,170,168],"class_list":["post-1099","page","type-page","status-publish","hentry","category-amazon-aws","tag-batch-processing","tag-consuming-data","tag-data-consumption-forms","tag-real-time-processing"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Determine How to Design and Architect the Analytical Solution - Testprep Training Tutorials<\/title>\n<meta name=\"description\" content=\"Determine how to design and architect the analytical solution tutorial, notes\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/determine-how-to-design-and-architect-the-analytical-solution\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Determine How to Design and Architect the Analytical Solution - Testprep Training Tutorials\" \/>\n<meta property=\"og:description\" content=\"Determine how to design and architect the analytical solution tutorial, notes\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/determine-how-to-design-and-architect-the-analytical-solution\/\" \/>\n<meta property=\"og:site_name\" content=\"Testprep Training Tutorials\" \/>\n<meta property=\"article:modified_time\" content=\"2020-05-01T10:12:56+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image032.png\" 
\/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/determine-how-to-design-and-architect-the-analytical-solution\/\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/determine-how-to-design-and-architect-the-analytical-solution\/\",\"name\":\"Determine How to Design and Architect the Analytical Solution - Testprep Training Tutorials\",\"isPartOf\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#website\"},\"datePublished\":\"2019-07-09T11:22:10+00:00\",\"dateModified\":\"2020-05-01T10:12:56+00:00\",\"description\":\"Determine how to design and architect the analytical solution tutorial, notes\",\"breadcrumb\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/determine-how-to-design-and-architect-the-analytical-solution\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/determine-how-to-design-and-architect-the-analytical-solution\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/determine-how-to-design-and-architect-the-analytical-solution\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AWS Certified Big Data 
Specialty\",\"item\":\"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Determine How to Design and Architect the Analytical Solution\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#website\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\",\"name\":\"Testprep Training Tutorials\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.testpreptraining.ai\/tutorial\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#organization\",\"name\":\"Testprep Training\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png\",\"contentUrl\":\"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png\",\"width\":583,\"height\":153,\"caption\":\"Testprep Training\"},\"image\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Determine How to Design and Architect the Analytical Solution - Testprep Training Tutorials","description":"Determine how to design and architect the analytical solution tutorial, notes","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/determine-how-to-design-and-architect-the-analytical-solution\/","og_locale":"en_US","og_type":"article","og_title":"Determine How to Design and Architect the Analytical Solution - Testprep Training Tutorials","og_description":"Determine how to design and architect the analytical solution tutorial, notes","og_url":"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/determine-how-to-design-and-architect-the-analytical-solution\/","og_site_name":"Testprep Training Tutorials","article_modified_time":"2020-05-01T10:12:56+00:00","og_image":[{"url":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2019\/07\/image032.png"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/determine-how-to-design-and-architect-the-analytical-solution\/","url":"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/determine-how-to-design-and-architect-the-analytical-solution\/","name":"Determine How to Design and Architect the Analytical Solution - Testprep Training Tutorials","isPartOf":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#website"},"datePublished":"2019-07-09T11:22:10+00:00","dateModified":"2020-05-01T10:12:56+00:00","description":"Determine how to design and architect the analytical solution tutorial, notes","breadcrumb":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/determine-how-to-design-and-architect-the-analytical-solution\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/determine-how-to-design-and-architect-the-analytical-solution\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/determine-how-to-design-and-architect-the-analytical-solution\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.testpreptraining.ai\/tutorial\/"},{"@type":"ListItem","position":2,"name":"AWS Certified Big Data Specialty","item":"https:\/\/www.testpreptraining.ai\/tutorial\/aws-certified-big-data-specialty\/"},{"@type":"ListItem","position":3,"name":"Determine How to Design and Architect the Analytical Solution"}]},{"@type":"WebSite","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#website","url":"https:\/\/www.testpreptraining.ai\/tutorial\/","name":"Testprep Training 
Tutorials","description":"","publisher":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.testpreptraining.ai\/tutorial\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#organization","name":"Testprep Training","url":"https:\/\/www.testpreptraining.ai\/tutorial\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/","url":"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png","contentUrl":"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png","width":583,"height":153,"caption":"Testprep Training"},"image":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/"}}]}},"_links":{"self":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/1099","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/comments?post=1099"}],"version-history":[{"count":4,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/1099\/revisions"}],"predecessor-version":[{"id":5077,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/1099\/revisions\/5077"}],"up":[{"embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/1031"}],"wp:attachment":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/w
p-json\/wp\/v2\/media?parent=1099"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/categories?post=1099"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/tags?post=1099"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}