Custom text delimiters in Hadoop jobs
26 Nov 2015

Hadoop will process your data line by line, splitting it on the \n newline character before sending it out to your mappers. In today's post, I'll demonstrate the usage of the textinputformat.record.delimiter setting so that your Hadoop jobs can process differently structured data.
Configuration
When you’re first setting up your job, you’ll create a Configuration
object. This object holds arbitrary settings that can be applied through its set method. To make a job split records on a delimiter of ---, you'd use the following:
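Since the original snippet isn't shown here, the following is a minimal sketch of that setup. The --- delimiter comes from the text above; the class and job names are illustrative placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CustomDelimiterJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tell TextInputFormat to split records on "---" instead of "\n".
        conf.set("textinputformat.record.delimiter", "---");

        // The Configuration must be set before the Job is created,
        // because Job.getInstance copies the settings it is given.
        Job job = Job.getInstance(conf, "custom-delimiter-job");
        // ... mapper, reducer, and input/output paths go here as usual.
    }
}
```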
From here, there's no change to the rest of your code. Here's a very simple MapReduce module that uses the custom delimiter.
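The original example isn't reproduced here, so as a hedged sketch (the RecordCount class, mapper/reducer names, and path arguments are my own placeholders, not from the post), a simple job that counts ----delimited records might look like:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecordCount {

    // With the custom delimiter set, each call to map() receives one
    // "---"-delimited record, which may itself span several newlines.
    public static class RecordMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private static final Text KEY = new Text("records");

        @Override
        protected void map(LongWritable offset, Text record, Context context)
                throws IOException, InterruptedException {
            context.write(KEY, ONE);
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The only special step: override the record delimiter.
        conf.set("textinputformat.record.delimiter", "---");

        Job job = Job.getInstance(conf, "record count");
        job.setJarByClass(RecordCount.class);
        job.setMapperClass(RecordMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Everything past the conf.set call is standard MapReduce boilerplate; the mapper and reducer need no awareness of the delimiter at all.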