The Big Data and Hadoop training course is designed to provide the knowledge and skills needed to become a successful Hadoop developer. The course covers concepts such as the Hadoop Distributed File System (HDFS), single-node and multi-node Hadoop clusters, Hadoop 2.0, Flume, Sqoop, MapReduce, Pig, Hive, HBase, ZooKeeper and Oozie in depth.
In the previous post, we learnt how easy it is to install Hadoop with Apache Bigtop!
But Hadoop is not the whole story; a number of sub-projects surround it. So, let's look at how to install Hive, HBase and Pig in this post.
When running Pig in a production environment, you'll likely have one or more Pig Latin scripts that run on a recurring basis (daily, weekly, monthly, etc.) and need to locate their input data based on when or where they are run. For example, you may have a Pig job that performs daily log ingestion by geographic region. Manually editing the script to point at the new input location every time log data needs to be ingested would be costly and error-prone. Ideally, you'd pass the date and geographic region to the Pig script as parameters when the script is executed. Fortunately, Pig provides this capability via parameter substitution: before the script runs, each `$name` reference in it is replaced with the value of the parameter `name`. There are four different mechanisms to define parameters that can be referenced in a Pig Latin script:
- Parameters can be defined as command line arguments; each parameter is passed to Pig with its own -param switch when the script is executed
- Parameters can be defined in a parameter file that’s passed to Pig using the -param_file command line argument when the script is executed
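- Parameters can be defined inside Pig Latin scripts using the “%declare” preprocessor statement
- Default values for parameters can be defined inside Pig Latin scripts using the “%default” preprocessor statement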
You can use none, one or any combination of the above options.
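To make the mechanics more concrete, here is a minimal Python sketch of parameter substitution, assuming a simplified model of Pig's preprocessor. The helper names (`parse_cli_params`, `parse_param_file`, `split_directives`, `resolve`, `substitute`) and the files `daily.pig` and `daily.params` are hypothetical, not part of Pig. The sketch collects values from `-param name=value` arguments, from `name = value` lines in a parameter file, and from `%declare`/`%default` lines in the script. It then merges them using an assumed precedence (`%declare` over `-param` over `-param_file` over `%default`; check the Pig documentation for your version) and replaces each `$name` reference in the script body. Quoting rules, parameters that reference other parameters, and command substitution are not modeled.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Simplified, hypothetical model of Pig's parameter preprocessor (not Pig's own code).
# Parameter names start with a letter or underscore, so positional fields like $0 are left alone.
PARAM_REF = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")
# Matches "%declare name value" or "%default name value", with an optional trailing ';'.
DIRECTIVE = re.compile(r"^\s*%(declare|default)\s+([A-Za-z_][A-Za-z0-9_]*)\s+(.*?)\s*;?\s*$")


def parse_cli_params(argv: List[str]) -> Dict[str, str]:
    """Collect values from repeated '-param name=value' argument pairs."""
    params: Dict[str, str] = {}
    for flag, value in zip(argv, argv[1:]):
        if flag == "-param":
            name, _, val = value.partition("=")
            params[name] = val
    return params


def parse_param_file(lines: Iterable[str]) -> Dict[str, str]:
    """Read 'name = value' lines, skipping blank lines and '#' comments."""
    params: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, val = line.partition("=")
        params[name.strip()] = val.strip()
    return params


def split_directives(script: str) -> Tuple[Dict[str, str], Dict[str, str], str]:
    """Separate %declare and %default lines from the rest of the script body."""
    declared: Dict[str, str] = {}
    defaults: Dict[str, str] = {}
    body: List[str] = []
    for line in script.splitlines():
        match = DIRECTIVE.match(line)
        if match is None:
            body.append(line)
        elif match.group(1) == "declare":
            declared[match.group(2)] = match.group(3)
        else:
            defaults[match.group(2)] = match.group(3)
    return declared, defaults, "\n".join(body)


def resolve(cli: Dict[str, str], file_params: Dict[str, str],
            declared: Dict[str, str], defaults: Dict[str, str]) -> Dict[str, str]:
    """Merge all sources; later updates win (assumed precedence, lowest first)."""
    merged: Dict[str, str] = {}
    for source in (defaults, file_params, cli, declared):
        merged.update(source)
    return merged


def substitute(body: str, params: Dict[str, str]) -> str:
    """Replace each $name with its value; an undefined name raises KeyError."""
    return PARAM_REF.sub(lambda m: params[m.group(1)], body)


if __name__ == "__main__":
    script = (
        "%default region us\n"
        "raw = LOAD '/logs/$date/$region' AS (line:chararray);\n"
        "STORE raw INTO '/out/$date/$region';"
    )
    # Mirrors: pig -param date=2012-01-01 -param_file daily.params daily.pig
    cli = parse_cli_params(["-param", "date=2012-01-01", "-param_file", "daily.params", "daily.pig"])
    # Contents of the hypothetical daily.params file.
    file_params = parse_param_file(["# daily.params", "region = emea"])
    declared, defaults, body = split_directives(script)
    print(substitute(body, resolve(cli, file_params, declared, defaults)))
```

In this example, `region` comes from the parameter file and overrides the script's `%default`, while `date` comes from the command line. The same script can therefore be rerun for a new day or region without editing it, which is the point of the mechanisms listed above.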