
Custom text delimiters in Hadoop jobs

By default, Hadoop processes your data line by line, splitting it on the \n newline character before sending records to your mappers. In today’s post, I’ll demonstrate the usage of the textinputformat.record.delimiter setting so that your Hadoop jobs can process differently structured data.

Configuration

When you’re first setting up your job, you’ll create a Configuration object. This object holds arbitrary settings that you can apply to your job through the set method. To make a job split records on a delimiter of ---, you’d use the following:

Configuration conf = new Configuration();
conf.set("textinputformat.record.delimiter", "---");

From here, there’s no change to your code. Here’s a very simple MapReduce job that uses the custom delimiter.