
Microsoft 70-475 Practice Questions

Designing and Implementing Big Data Analytics Solutions Exam

Last updated: 2020/11/23. 70 questions in total.





Question No : 1
You need to automate the creation of a new Microsoft Azure data factory.
What are three possible technologies that you can use? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.

You can create a pipeline with a copy activity that moves data to or from Azure Table storage by using different tools and APIs. The easiest way to create a pipeline is to use the Copy Wizard. See Tutorial: Create a pipeline using Copy Wizard for a quick walkthrough on creating a pipeline by using the Copy data wizard. You can also use the following tools to create a pipeline: Azure portal, Visual Studio, Azure PowerShell, Azure Resource Manager template, .NET API, and REST API.
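One of the options listed above, the REST API, can be sketched with a plain HTTP request to Azure Resource Manager. This is a minimal sketch, not a definitive implementation: the subscription ID, resource group, factory name, region, and bearer token below are all placeholders, and in practice you would obtain the token from Azure AD.

```python
import json
import urllib.request

# Sketch: create a Data Factory via the Azure Resource Manager REST API.
# All identifiers and the token are illustrative placeholders.
def build_create_factory_request(subscription_id, resource_group,
                                 factory_name, location, token):
    url = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
        "?api-version=2018-06-01"
    )
    body = json.dumps({"location": location}).encode("utf-8")
    # PUT creates (or updates) the factory resource.
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

req = build_create_factory_request(
    "00000000-0000-0000-0000-000000000000",
    "my-rg", "my-factory", "eastus", "<token>")
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would perform the actual deployment; the same PUT is what the portal, PowerShell, and ARM templates ultimately issue on your behalf.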

Question No : 2
A company named Fabrikam, Inc. has a web app. Millions of users visit the app daily.
Fabrikam performs a daily analysis of the previous day's logs by scheduling the following Hive query.
CREATE EXTERNAL TABLE IF NOT EXISTS UserActivity (...)
PARTITIONED BY (LogDate string)
LOCATION ...;
MSCK REPAIR TABLE UserActivity;
SELECT ... FROM UserActivity WHERE LogDate = "{date}";
You need to recommend a solution to gather the log collections from the web app.
What should you recommend?


Question No : 3
Your company has a Microsoft Azure environment that contains an Azure HDInsight Hadoop cluster and an Azure SQL data warehouse. The Hadoop cluster contains text files that are formatted by using UTF-8 character encoding.
You need to implement a solution to ingest the data to the SQL data warehouse from the Hadoop cluster. The solution must provide optimal read performance for the data after ingestion.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.


SQL Data Warehouse supports loading data from HDInsight via PolyBase. The process is the same as loading data from Azure Blob Storage: use PolyBase to connect to HDInsight and load the data.
Use PolyBase and T-SQL. Summary of the loading process:
- Move your data to HDInsight and store it in text files, ORC or Parquet format.
- Configure external objects in SQL Data Warehouse to define the location and format of the data.
- Run a T-SQL command to load the data in parallel into a new database table.
- Create statistics on the newly loaded data.
Azure SQL Data Warehouse does not yet support auto-create or auto-update statistics. To get the best performance from your queries, it is important to create statistics on all columns of all tables after the first load or after any substantial change in the data.
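The loading sequence above can be sketched as the T-SQL it produces. This is a hedged illustration only: the data source name, file format, column list, distribution choice, and HDFS URI are all made-up placeholders, not the exam's expected objects.

```python
# Sketch of the PolyBase load sequence as generated T-SQL statements.
# Every object name and the URI are illustrative assumptions.
def polybase_load_statements(hdfs_uri, table):
    return [
        # 1. External objects define where the HDInsight data lives
        #    and how the text files are formatted.
        f"CREATE EXTERNAL DATA SOURCE HadoopSrc "
        f"WITH (TYPE = HADOOP, LOCATION = '{hdfs_uri}');",
        "CREATE EXTERNAL FILE FORMAT TextFileFormat "
        "WITH (FORMAT_TYPE = DELIMITEDTEXT);",
        f"CREATE EXTERNAL TABLE ext_{table} (Col1 nvarchar(100)) "
        "WITH (LOCATION = '/data/', DATA_SOURCE = HadoopSrc, "
        "FILE_FORMAT = TextFileFormat);",
        # 2. CTAS loads the data in parallel into a new internal table.
        f"CREATE TABLE {table} WITH (DISTRIBUTION = ROUND_ROBIN) "
        f"AS SELECT * FROM ext_{table};",
        # 3. Statistics after load, since auto-create is not available.
        f"CREATE STATISTICS stat_col1 ON {table} (Col1);",
    ]

for stmt in polybase_load_statements("hdfs://headnode:8020", "UserActivity"):
    print(stmt)
```

In a real load these statements would be executed against the data warehouse in order; the point of the sketch is the sequence: external objects first, then CTAS, then statistics.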

Question No : 4
You have an Apache Storm cluster.
The cluster will ingest data from a Microsoft Azure event hub.
The event hub has the characteristics described in the following table.

You are designing the Storm application topology.
You need to ingest data from all of the partitions. The solution must maximize the throughput of the data ingestion.
Which setting should you use?


Question No : 5
You manage a Microsoft Azure HDInsight Hadoop cluster. All of the data for the cluster is stored in Azure Premium Storage.
You need to prevent all users from accessing the data directly. The solution must allow only the HDInsight service to access the data.
Which five actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.


Example: consider a scenario where a customer wishes to explore a move from an existing Hortonworks (HDP) cluster to an Azure HDInsight (HDI) cluster. With the target Storage account in place, I can now configure the HDP cluster to connect to it.
The easiest way I have found to do this is using Ambari:
- Login to the Ambari portal and navigate to the (default) dashboard
- Select HDFS from the left-hand navigation
- On the resulting HDFS page, click on the Configs tab
- On the resulting page, select the Advanced option
- Scroll down and expand the Custom core-site node
- Select Add Property… from the bottom of the expanded node
- Enter fs.azure.account.key.<account name>.blob.core.windows.net, substituting the name of the Storage account for <account name>, as the Name of the property
- Enter the Account key as the Value of the property
- Click the Add button and verify the new property appears under the Custom core-site node
- Locate the notification bar at the top of the current page
- Click the Save button on the notification bar to push changes to the cluster
- Follow any remaining prompts to complete the Save process
- Once the Save process is completed, Ambari will indicate a restart of some services is required. Click the Restart button and Restart All Affected from the resulting drop-down. Follow any remaining prompts and monitor the process until it is successfully completed.
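The Add Property step above corresponds to an entry like the following in the cluster's core-site configuration. ACCOUNT_NAME and the key value are placeholders for the actual Storage account name and access key:

```xml
<!-- Custom core-site property linking the cluster to the Azure Storage
     account. ACCOUNT_NAME and the value are placeholders. -->
<property>
  <name>fs.azure.account.key.ACCOUNT_NAME.blob.core.windows.net</name>
  <value>STORAGE_ACCOUNT_KEY</value>
</property>
```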

Question No : 6
You have a Microsoft Azure subscription that contains an Azure Data Factory pipeline. You have an RSS feed that is published on a public website.
You need to configure the RSS feed as a data source for the pipeline.
Which type of linked service should you use?


Question No : 7
You are designing an Internet of Things (IoT) solution intended to identify trends. The solution requires the real-time analysis of data originating from sensors. The results of the analysis will be stored in a SQL database. You need to recommend a data processing solution that uses the Transact-SQL language.
Which data processing solution should you recommend?

For your Internet of Things (IoT) scenarios that use Event Hubs, Azure Stream Analytics can serve as a possible first step to perform near real-time analytics on telemetry data. Just like Event Hubs, Stream Analytics supports the streaming of millions of events per second. Unlike a standard database, analysis is performed on data in motion. This streaming input data can also be combined with reference data inputs to perform lookups or do correlation to assist in unlocking business insights. It uses a SQL-like language to simplify the analysis of data inputs and to detect anomalies, trigger alerts, or transform the data in order to create valuable outputs.

Question No : 8
Your company has thousands of Internet-connected sensors.
You need to recommend a computing solution to perform a real-time analysis of the data generated by the sensors.
Which computing solution should you include in the recommendation?

HDInsight HBase is offered as a managed cluster that is integrated into the Azure environment. The clusters are configured to store data directly in Azure Storage or Azure Data Lake Store, which provides low latency and increased elasticity in performance and cost choices. This enables customers to build interactive websites that work with large datasets, to build services that store sensor and telemetry data from millions of end points, and to analyze this data with Hadoop jobs. HBase and Hadoop are good starting points for big data projects in Azure; in particular, they can enable real-time applications to work with large datasets.

Question No : 9
You are designing a solution based on the lambda architecture.
You need to recommend which technology to use for the serving layer.
What should you recommend?

The Serving Layer is a bit more complicated in that it needs to be able to answer a single query request against two or more databases, processing platforms, and data storage devices. Apache Druid is an example of a cluster-based tool that can marry the Batch and Speed layers into a single answerable request.

Question No : 10
You have structured data that resides in Microsoft Azure Blob storage.
You need to perform a rapid interactive analysis of the data and to generate visualizations of the data.
What is the best type of Azure HDInsight cluster to use to achieve the goal? More than one answer choice may achieve the goal. Choose the BEST answer.

A Spark cluster provides in-memory processing, interactive queries, and micro-batch stream processing.

Question No : 11
A company named Fabrikam, Inc. has a Microsoft Azure web app. Billions of users visit the app daily.
The web app logs all user activity by using text files in Azure Blob storage. Each day, approximately 200 GB of text files are created. Fabrikam uses the log files from an Apache Hadoop cluster on Azure HDInsight. You need to recommend a solution to optimize the storage of the log files for later Hive use.
What is the best property to recommend adding to the Hive table definition to achieve the goal? More than one answer choice may achieve the goal. Select the BEST answer.

The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data. It was designed to overcome limitations of the other Hive file formats. Using ORC files improves performance when Hive is reading, writing, and processing data.
Compared with RCFile format, for example, ORC file format has many advantages such as:
- a single file as the output of each task, which reduces the NameNode's load
- Hive type support including datetime, decimal, and the complex types (struct, list, map, and union)
- light-weight indexes stored within the file
- skip row groups that don't pass predicate filtering
- seek to a given row
- block-mode compression based on data type
- run-length encoding for integer columns
- dictionary encoding for string columns
- concurrent reads of the same file using separate RecordReaders
- ability to split files without scanning for markers
- bound the amount of memory needed for reading or writing
- metadata stored using Protocol Buffers, which allows addition and removal of fields

Question No : 12
You have the following script.
CREATE TABLE UserVisits (username string, url string, time date)
CREATE TABLE UserVisitsOrc (username string, url string, time date)
Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the script.
NOTE: Each correct selection is worth one point.


A table created without the EXTERNAL clause is called a managed table because Hive manages its data.

Question No : 13
You are designing a solution that will use Apache HBase on Microsoft Azure HDInsight.
You need to design the row keys for the database to ensure that client traffic is directed over all of the nodes in the cluster.
What are two possible techniques that you can use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

There are two strategies that you can use to avoid hotspotting:
* Hashing keys
To spread write and insert activity across the cluster, you can randomize sequentially generated keys by hashing the keys or by inverting the byte order. Note that these strategies come with trade-offs. Hashing keys, for example, makes table scans for key subranges inefficient, since the subrange is spread across the cluster.
* Salting keys
Instead of hashing the key, you can salt the key by prepending a few bytes of the hash of the key to the actual key.
Note: Salted Apache HBase tables with pre-splitting are a proven, effective HBase solution for providing uniform workload distribution across RegionServers and preventing hot spots during bulk writes. In this design, a row key is made of a logical key plus a salt at the beginning. One way of generating the salt is by calculating n (the number of regions) modulo the hash code of the logical row key (date, etc.).
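The two strategies above can be sketched in a few lines of Python. This is an illustrative sketch, not HBase client code: the region count, key format, and choice of MD5 are assumptions made for the example.

```python
import hashlib

# Sketch: two row-key strategies to avoid HBase hotspotting.
NUM_REGIONS = 16  # assumed number of pre-split regions

def hashed_key(logical_key: str) -> str:
    # Hashing: replace the sequential key with its digest so writes
    # spread across the cluster (at the cost of efficient range scans,
    # since a key subrange is no longer stored contiguously).
    return hashlib.md5(logical_key.encode("utf-8")).hexdigest()

def salted_key(logical_key: str) -> str:
    # Salting: prepend a small salt, derived from the key's hash modulo
    # the number of regions, while keeping the logical key readable.
    digest = int(hashlib.md5(logical_key.encode("utf-8")).hexdigest(), 16)
    salt = digest % NUM_REGIONS
    return f"{salt:02d}-{logical_key}"

print(hashed_key("2020-11-23|user42"))
print(salted_key("2020-11-23|user42"))
```

With salting, rows that share a salt still sort together, so a scan for one logical subrange only has to be fanned out across NUM_REGIONS prefixes rather than the whole keyspace.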

Question No : 14
You have data generated by sensors. The data is sent to Microsoft Azure Event Hubs. You need to have an aggregated view of the data in near real time by using five-minute tumbling windows to identify short-term trends. You must also have hourly and daily aggregated views of the data.
Which technology should you use for each task? To answer, drag the appropriate technologies to the correct tasks. Each technology may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point.


Box 1: Azure HDInsight MapReduce
Azure Event Hubs allows you to process massive amounts of data from websites, apps, and devices. The Event Hubs spout makes it easy to use Apache Storm on HDInsight to analyze this data in real time.
Box 2: Azure Event Hub
Box 3: Azure Stream Analytics
Stream Analytics is a new service that enables near real time complex event processing over streaming data. Combining Stream Analytics with Azure Event Hubs enables near real time processing of millions of events per second. This enables you to do things such as augment stream data with reference data and output to storage (or even output to another Azure Event Hub for additional processing).
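The tumbling-window behavior described above can be illustrated outside of Stream Analytics. The sketch below only shows the windowing semantics: each event falls into exactly one non-overlapping five-minute window, identified by its aligned start time. The timestamps and readings are made-up sample data.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 300  # five-minute tumbling window

def window_start(ts: datetime) -> datetime:
    # Align the timestamp down to the start of its tumbling window;
    # windows never overlap, so each event belongs to exactly one.
    epoch = ts.timestamp()
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS,
                                  tz=timezone.utc)

events = [
    (datetime(2020, 11, 23, 8, 1, tzinfo=timezone.utc), 21.0),
    (datetime(2020, 11, 23, 8, 4, tzinfo=timezone.utc), 23.0),
    (datetime(2020, 11, 23, 8, 7, tzinfo=timezone.utc), 30.0),
]

sums, counts = defaultdict(float), defaultdict(int)
for ts, value in events:
    w = window_start(ts)
    sums[w] += value
    counts[w] += 1

for w in sorted(sums):
    print(w.isoformat(), sums[w] / counts[w])  # average per window
```

The first two events land in the 08:00 window and the third in the 08:05 window. In Stream Analytics itself the same grouping would be expressed declaratively with `GROUP BY TumblingWindow(minute, 5)` in the SQL-like query language.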

Question No : 15
You are designing an Apache HBase cluster on Microsoft Azure HDInsight. You need to identify which nodes are required for the cluster.
Which three nodes should you identify? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

An HBase cluster has the following nodes: head node (2), region server (1+), and master/ZooKeeper node (3).

