A quick _but important_ note: I needed to use the JDK 1.7 to complete the following. Using 1.8 saw errors that suggested that Hive on my distribution of Hadoop was not supported.
Setup your project
Create an sbt-based project, and start off adding the following to your project/assembly.sbt.
What this had added is the sbt-assembly to your project. This allows you to bundle your scala application up as a fat JAR. When we issue the command sbt assemble at the console, we invoke this plugin to construct the fat JAR for us.
Now we fill out the build.sbt. We need to reference an external JAR, called hive-exec. This JAR is available by itself from the maven repository. I took a copy of mine from the hive distribution installed on my server. Anyway, it lands in the project’s lib folder.
Now it’s time to actually start writing some functions. In the following module, we’re just performing some basic string manipulation with trim, toUpperCase and toLowerCase. Each of which is contained in its own class, deriving from the UDF type:
Now that we’ve written all of the code, it’s time to compile and assemble our JAR:
$ sbt assemble
To invoke
Copying across the JAR into an accessible place for hive is the first step here. Once that’s done, we can start up the hive shell and add it to the session:
ADD JAR /path/to/the/jar/my-udfs.jar;
Then, using the CREATE FUNCTION syntax, we can start to reference pieces of our module:
CREATE FUNCTION trim as 'me.tuttlem.udf.TrimString';
CREATE FUNCTION toUpperCase as 'me.tuttlem.udf.UpperCaseString';
CREATE FUNCTION toLowerCase as 'me.tuttlem.udf.LowerCaseString';
We can now use our functions:
hive> CREATE FUNCTION toUpperCase as 'me.tuttlem.udf.UpperCaseString';
OK
Time taken: 0.537 seconds
hive> SELECT toUpperCase('a test string');
OK
A TEST STRING
Time taken: 1.399 seconds, Fetched: 1 row(s)
hive> CREATE FUNCTION toLowerCase as 'me.tuttlem.udf.LowerCaseString';
OK
Time taken: 0.028 seconds
hive> SELECT toLowerCase('DON\'T YELL AT ME!!!');
OK
don't yell at me!!!
Time taken: 0.093 seconds, Fetched: 1 row(s)
Today’s post is going to be a tip on creating a project structure for your Scala projects that is SBT ready. There’s no real magic to it, just a specific structure that you can easily bundle up into a console application.
The shell script
To kick start your project, you can simple use the following shell script:
#!/bin/zshmkdir$1cd$1mkdir-p src/{main,test}/{java,resources,scala}mkdir lib project target
echo'name := "$1"
version := "1.0"
scalaVersion := "2.10.0"'> build.sbt
cd ..
This will give you everything that you need to get up an running. You’ll now have a structure like the following to work with:
Sometimes, it makes sense to have multiple SSH identites. This can certainly be the case if you’re doing work with your own personal accounts, vs. doing work for your job. You’re not going to want to use your work account for your personal stuff.
In today’s post, I’m going to run through the few steps that you need to take in order to manage multiple SSH identities.
Different identities
First up, we generate two different identities:
ssh-keygen -t rsa -C"user@work.com"
When asked, make sure you give the file a unique name:
Enter file in which to save the key (/home/michael/.ssh/id_rsa): ~/.ssh/id_rsa_work
Now, we create the identity for home.
ssh-keygen -t rsa -C"user@home.com"
Again, set the file name so they don’t collide:
Enter file in which to save the key (/home/michael/.ssh/id_rsa): ~/.ssh/id_rsa_home
The wikipedia article for Futures and promises opens up with this paragraph, which I thought is the perfect definition:
In computer science, future, promise, delay, and deferred refer to constructs used for synchronizing program execution in some concurrent programming languages. They describe an object that acts as a proxy for a result that is initially unknown, usually because the computation of its value is yet incomplete.
In today’s article, I’ll walk you through the creation and management of the future and promise construct in the Scala language.
Execution context
Before continuing with the article, we need to make a special note about the ExecutionContext. Futures and promises both use the execution context to perform the execution of their computations.
Any of the operations that you’ll write out to start a computation requires an ExecutionContext as a parameter. These can be passed implicitly, so it’ll be a regular occurrence where you’ll see the following definition:
// define the implicit yourselfimplicitvalec:ExecutionContext=ExecutionContext.global// or - import one already definedimportExecutionContext.Implicits.global
ExecutionContext.global is an ExecutionContext that is backed by a ForkJoinPool.
Futures
We create a Future in the following ways:
/* Create a future that relies on some work being done
and that emits its value */valgetName=Future{// simulate some work hereThread.sleep(100)"John"}/* Create an already resolved future; no need to wait
on the result of this one */valalreadyGotName=Future.successful("James")/* Create an already rejected future */valbadNews=Future.failed(newException("Something went wrong"))
With a future, you set some code in place to handle both the success and fail cases. You use the onComplete function to accomplish this:
Using a for-comprehension or map/flatMap, you can perform functional composition on your Future so that adds something extra through the pipeline. In this case, we’re going to prefix the name with a message should it start with the letter “J”:
If you really need to, you can make your future block.
valblockedForThisName=Future{blocking{"Simon"}}
Promises
The different between a Future and a Promise is that a future can be thought of as a read-only container. A promise is a single-assignment container that is used to complete a future.
Here’s an example.
valgetNameFuture=Future{"Tom"}valgetNamePromise=Promise[String]()getNamePromisecompleteWithgetNameFuturegetNamePromise.future.onComplete{caseSuccess(name)=>println(s"Got the name: $name")caseFailure(e)=>e.printStackTrace()}
getNamePromise has a future that we access through the future member. We treat it as usual with onComplete. It knows that it needs to resolve because of the completeWith call, were we’re telling getNamePromise to finish the getNameFuture future.