Question 7
You want to understand more about how users browse your public website. For example, you want to know which pages they visit prior to placing an order. You have a server farm of 200 web servers hosting your website. Which is the most efficient process to gather these web server logs into your Hadoop cluster for analysis?
A. Sample the web server logs from the web servers and copy them into HDFS using curl
B. Ingest the web server logs into HDFS using Flume
C. Channel these clickstreams into Hadoop using Hadoop Streaming
D. Import all user clicks from your OLTP databases into Hadoop using Sqoop
E. Write a MapReduce job with the web servers for mappers and the Hadoop cluster nodes for reducers
Correct answer: B
Explanation:
Apache Flume is a service for streaming logs into Hadoop. It is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming data into the Hadoop Distributed File System (HDFS). It has a simple and flexible architecture based on streaming data flows, and it is robust and fault tolerant with tunable reliability mechanisms for failover and recovery.
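As a minimal sketch of how Flume could be used here, the following single-agent configuration tails a web server access log and delivers the events to HDFS. The agent name (webagent), the log and position-file paths, and the HDFS URI are assumptions chosen for illustration; in a 200-server farm you would more typically run an agent per web server feeding tiered collector agents over Avro sources and sinks before writing to HDFS.

    # Name the components of this agent
    webagent.sources  = access
    webagent.channels = ch1
    webagent.sinks    = hdfs-out

    # Tail the web server access log; TAILDIR tracks its read position across restarts
    webagent.sources.access.type = TAILDIR
    webagent.sources.access.filegroups = f1
    webagent.sources.access.filegroups.f1 = /var/log/httpd/access_log
    webagent.sources.access.positionFile = /var/lib/flume/taildir_position.json
    webagent.sources.access.channels = ch1

    # Buffer events on disk so they survive an agent restart
    webagent.channels.ch1.type = file
    webagent.channels.ch1.checkpointDir = /var/lib/flume/checkpoint
    webagent.channels.ch1.dataDirs = /var/lib/flume/data

    # Write events into date-partitioned directories in HDFS
    webagent.sinks.hdfs-out.type = hdfs
    webagent.sinks.hdfs-out.hdfs.path = hdfs://namenode:8020/weblogs/%Y-%m-%d
    webagent.sinks.hdfs-out.hdfs.fileType = DataStream
    webagent.sinks.hdfs-out.hdfs.rollInterval = 300
    webagent.sinks.hdfs-out.hdfs.useLocalTimeStamp = true
    webagent.sinks.hdfs-out.channel = ch1

The agent would then be started with something like flume-ng agent --name webagent --conf-file weblogs.conf (the file name is assumed). The file channel is chosen over a memory channel here because it gives the tunable durability the explanation mentions, at the cost of some throughput.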