Cogs and Levers A blog full of technical stuff

Basic file IO in Perl

One of the most basic, yet most useful operations you can perform in Perl is working with files. In today’s post, I’ll walk you through a few basic patterns to get started with file IO in Perl.

open

The cornerstone of working with a file is the open function. It takes the following forms:

  • open FILEHANDLE,EXPR
  • open FILEHANDLE,MODE,EXPR
  • open FILEHANDLE,MODE,EXPR,LIST
  • open FILEHANDLE,MODE,REFERENCE
  • open FILEHANDLE

FILEHANDLE is the variable that you’ll use to reference the file.

MODE determines the type of access you’re requesting on the file:

Mode Description
< File is opened for reading
> File is opened for writing
>> File is opened for appending
+< File is opened for reading and writing
+> File is opened for reading and writing, but clobbered first
|- EXPR is run as a command whose input we pipe to (we write to the filehandle)
-| EXPR is run as a command whose output we pipe from (we read from the filehandle)
<:encoding(UTF-8) File is opened for reading and interpreted as UTF-8
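As a quick sketch of the piped forms, here we read the output of an external command through a filehandle (the ls command is just an example):

```perl
use strict;
use warnings;

# '-|' runs the command and lets us read its output line by line
open(my $ls, '-|', 'ls', '-l') or die "Can't run ls: $!";

while (my $line = <$ls>) {
  print $line;
}

close $ls;
```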

Throwing on failure

use strict;
use warnings;
 
my $filename = 'data.txt';
open(my $fh, '<:encoding(UTF-8)', $filename)
  or die "Could not open file '$filename' $!";

# TODO: work with the file here

Warning on failure

use strict;
use warnings;
 
my $filename = 'data.txt';
if (open(my $fh, '<:encoding(UTF-8)', $filename)) {
  # TODO: work with the file here
} else {
  warn "Could not open file '$filename' $!";
}

Diamond operator <>

The diamond operator is normally used within a while loop to iterate through a file, line by line:

# File is opened here into $fh

while (my $row = <$fh>) {
  chomp $row;
  print "$row\n";
}
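Putting the pieces together, a complete read sketch (assuming a file named data.txt exists) looks like this:

```perl
use strict;
use warnings;

my $filename = 'data.txt';

# open for reading, dying with the OS error if it fails
open(my $fh, '<:encoding(UTF-8)', $filename)
  or die "Could not open file '$filename' $!";

# read a line at a time with the diamond operator
while (my $row = <$fh>) {
  chomp $row;
  print "$row\n";
}

close $fh or die "Can't close file '$filename': $!";
```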

Writing with print

Writing information to a file is done with print.

# File is opened here into $fh (using >)

print $fh "This is a line of text for the file\n";
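Note there is no comma between the filehandle and the string. A complete write sketch might look like this (the filename out.txt is just an example; > will clobber an existing file):

```perl
use strict;
use warnings;

my $filename = 'out.txt';

# open for writing; '>' truncates any existing file
open(my $fh, '>', $filename)
  or die "Could not open file '$filename' $!";

# note: filehandle, then the list to print -- no comma in between
print $fh "This is a line of text for the file\n";

close $fh or die "Can't close file '$filename': $!";
```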

Finishing up with close

When you’re finished with your files, you’ll use close:

# File is opened here into $fh 
# File work --happens--

close $fh or die "Can't close file: $!"; 

These are just the simple operations for working with files in Perl.

Hashing with OpenSSL

Today’s post will be a quick tip on generating a hash using OpenSSL.

Setup your makefile

We need to reference libssl and libcrypto in our Makefile:

$(CC) $^ -o $(TARGET) -lssl -lcrypto
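For context, a minimal Makefile sketch around that link line might look like the following (the target and source names here are illustrative):

```makefile
CC     = gcc
TARGET = test
OBJS   = main.o

# link against libssl and libcrypto
$(TARGET): $(OBJS)
	$(CC) $^ -o $(TARGET) -lssl -lcrypto
```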

The code

A simple main function that will hash a simple message:

#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

int main(int argc, char* argv[]) {

   SHA256_CTX ctx;
   unsigned char digest[32];
   char *msg = "hello";

   SHA256_Init(&ctx);
   SHA256_Update(&ctx, msg, strlen(msg));
   SHA256_Final(digest, &ctx);

   /* %02x pads with leading zeroes; plain %x would silently drop them */
   for (int i = 0; i < 32; i++) {
      printf("%02x", digest[i]);
   }

   printf("\n");

   return 0;
}

Testing

Running this application generates the hash of the word “hello”:

$ ./test
2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

We can verify our result using sha256sum:

$ echo -n hello | sha256sum 
2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824  -

Actors in Scala

The actor model is a software pattern that has been developed to make concurrent programming easier by promoting a lack of shared state. From the wikipedia article:

The actor model in computer science is a mathematical model of concurrent computation that treats “actors” as the universal primitives of concurrent computation. In response to a message that it receives, an actor can: make local decisions, create more actors, send more messages, and determine how to respond to the next message received. Actors may modify private state, but can only affect each other through messages (avoiding the need for any locks).

In today’s article, I’ll show you a couple of primitive examples demonstrating the Akka framework using Scala.

Basic setup

Before starting, you’ll need to make your application depend on the Akka libraries; my build.sbt looks as follows:

name := "actor-basic"

version := "1.0"

scalaVersion := "2.12.1"

libraryDependencies ++= {

  val akkaVersion = "2.4.17"

  Seq(
    "com.typesafe.akka" %% "akka-actor" % akkaVersion
  )
}

We only needed to add akka-actor. There is a whole host of different sub-libraries, each providing its own piece of extra functionality.

Testing primes

In today’s example, we’re going to make an Actor that tests prime numbers. The code for the isPrime function below has been lifted from here. Seems to do the job nicely.

case class PotentialPrime(n: Int)

class PrimeTester extends Actor {
  def receive = {
    case PotentialPrime(n) => println(s"prime: ${isPrime(n)}")
  }

  def isPrime(n: Int) = (2 until n) forall (n % _ != 0)
}

The first class here, PotentialPrime, is a message class. It’s the class that will hold the information used as input for the Actor to do something. In this case, we’re carrying a number that could be a potential prime. This is then received by the PrimeTester actor in the receive method. You can see that we pattern match on the message type, in this case PotentialPrime, to start prime testing.

Note that this is one-way. No information is sent back to the caller or to the actor system. The information being passed, and state remains within the actor.
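One caveat with the lifted isPrime: for n of 1 (or less), the range (2 until n) is empty, so forall is vacuously true and the number is reported as prime. A guarded sketch:

```scala
// trial division, with a guard so that n <= 1 is never reported as prime
def isPrime(n: Int): Boolean =
  n > 1 && (2 until n).forall(n % _ != 0)
```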

We then set up a small system and an actor, and pass it a message:

object ActorTest extends App {
  val system = ActorSystem("actor-testing")
  val actor1 = system.actorOf(Props[PrimeTester], name="prime-tester-actor")

  actor1 ! PotentialPrime(21)

  system.terminate()
}

We create the ActorSystem and then create an actor within that system using actorOf. The ! means “fire-and-forget”. This will send a message asynchronously and return immediately. This function is also known as tell.

We run this application, and as expected:

prime: false

Finding an actor

In a system, you can also find existing actors using their path. Like a file system where you have a hierarchical system of directories and files, actors also have parent/child relationships. In the example above, we would be able to find actor1 by its path should we use the following:

val path = system / "prime-tester-actor"
val actorRef = system.actorSelection(path)

actorRef ! PotentialPrime(19)

Actors

There are some important pieces to the Actor API that will give you a much finer level of control over your actors.

You can use unhandled to define the behavior of your actor when it receives a message that did not get handled.

override def unhandled(message: Any): Unit = {
  println("Unhandled message encountered")
}

self is an ActorRef that can be used by the actor to send itself messages.

sender is the ActorRef of whoever sent the message currently being processed, and context provides an ActorContext telling you about the current message and current actor.

supervisorStrategy defines the strategy applied when a failure occurs in a child actor. It can be overridden.

preStart, preRestart, postStop and postRestart are all hook functions that you can tap into to add functionality.
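As a sketch, hooking into a couple of these looks as follows (the actor here is illustrative):

```scala
import akka.actor.Actor

class AuditedActor extends Actor {
  // runs once before the actor starts processing messages
  override def preStart(): Unit =
    println(s"${self.path} starting")

  // runs after the actor has fully stopped
  override def postStop(): Unit =
    println(s"${self.path} stopped")

  def receive = {
    case msg => println(s"received: $msg")
  }
}
```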

Feedback

Sending information back to the sender is pretty easy. It’s a matter of bundling the information you need to send, into a message and sending. Adapting the primes example above a little more, the actor code changes just slightly:

class PrimeTester extends Actor {
  def receive = {
    case PotentialPrime(n) => sender ! isPrime(n)
  }

  def isPrime(n: Int) = (2 until n) forall (n % _ != 0)
}

Rather than just printing something out now, we’re sending a message back to sender.

When we ask or ? an actor for some information back, we don’t immediately receive the result. We receive a Future that will give us the result once it’s ready. So, the calling code becomes a trivial Future example:

val system = ActorSystem("actor-testing")
val actor1 = system.actorOf(Props[PrimeTester], name="prime-tester-actor")

implicit val timeout: Timeout = Timeout(Duration.create(5, TimeUnit.SECONDS))
implicit val ec: ExecutionContext = system.dispatcher

val future = actor1 ? PotentialPrime(21)

future onComplete {
  case Success(b) => println(s"Result was ${b}")
  case Failure(e) => e.printStackTrace()
}

system.terminate()
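For reference, the ask example above assumes imports along these lines (the ? operator comes from akka.pattern.ask):

```scala
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.ExecutionContext
import scala.concurrent.duration.Duration
import java.util.concurrent.TimeUnit
import scala.util.{Failure, Success}
```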

You can simplify this further by using Await:

val system = ActorSystem("actor-testing")
val actor1 = system.actorOf(Props[PrimeTester], name="prime-tester-actor")
implicit val timeout = Timeout(Duration.create(5, TimeUnit.SECONDS))

val future = actor1 ? PotentialPrime(21)
val result = Await.result(future, timeout.duration)

println(s"Result is ${result}")

system.terminate()

That’s enough acting for today.

Starting a microservice with Scala

A microservice is an architectural pattern that allows your services to be deployed in an isolated fashion. This isolation allows each service to remain focused on the problem (and only the problem) that it’s trying to solve, as well as simplifying telemetry, instrumentation, and measurement. From Martin Fowler’s site:

The term “Microservice Architecture” has sprung up over the last few years to describe a particular way of designing software applications as suites of independently deployable services. While there is no precise definition of this architectural style, there are certain common characteristics around organization around business capability, automated deployment, intelligence in the endpoints, and decentralized control of languages and data.

If you want to learn more about microservices, seriously, check out google. They’re everywhere!

The purpose of today’s article is to stand a microservice up in Scala, to get up and running quickly.

Getting started

In a previous article, I showed you how you can create a scala project structure with a shell script. We’ll use that right now to create our project microservice-one.

$ new-scala-project microservice-one
$ cd microservice-one
$ tree
.
├── build.sbt
├── lib
├── project
├── src
│   ├── main
│   │   ├── java
│   │   ├── resources
│   │   └── scala
│   └── test
│       ├── java
│       ├── resources
│       └── scala
└── target

12 directories, 1 file

Dependencies

Now we’ll need to sort out our dependencies.

We’ll need scalatest for testing, akka and akka-http to help us make our API concurrent/parallel as well as available over HTTP. Our build.sbt file should look like this:

name := "microservice-one"
organization := "me.tuttlem"
version := "1.0.0"
scalaVersion := "2.12.1"

scalacOptions := Seq("-unchecked", "-deprecation", "-encoding", "utf8")

libraryDependencies ++= {
  
  val akkaV       = "2.4.16"
  val akkaHttpV   = "10.0.1"
  val scalaTestV  = "3.0.1"
  
  Seq(
    "com.typesafe.akka" %% "akka-actor" % akkaV,
    "com.typesafe.akka" %% "akka-stream" % akkaV,
    "com.typesafe.akka" %% "akka-testkit" % akkaV,
    "com.typesafe.akka" %% "akka-http" % akkaHttpV,
    "com.typesafe.akka" %% "akka-http-spray-json" % akkaHttpV,
    "com.typesafe.akka" %% "akka-http-testkit" % akkaHttpV,
    "org.scalatest"     %% "scalatest" % scalaTestV % "test"
  )
  
}

Update our project now:

$ sbt update

The code

We’re going to dump everything into one file today; the main application object. All of the parts are very descriptive though and I’ll go through each one. Our microservice is going to have one route, which is a GET on /greeting. It’ll return us a simple message.

First up, we model how the message will look:

case class Greeting(message: String)

Using this case class, you’d expect messages to be returned that look like this:

{ "message": "Here is the message!" }

We tell the application how to serialize this data over the http channel using Protocols:

trait Protocols extends DefaultJsonProtocol {
  implicit val greetingFormat = jsonFormat1(Greeting.apply)
}

Now, we can put together our actual service implementation. Take a look specifically at the routing DSL that akka-http provides for route definition:

trait Service extends Protocols {
  implicit val system: ActorSystem
  implicit def executor: ExecutionContextExecutor
  implicit val materializer: Materializer

  def config: Config
  val logger: LoggingAdapter

  val routes = {
    logRequestResult("microservice-one") {
      pathPrefix("greeting") {
        get {
          complete(Greeting("Hello to you!"))
        }
      }
    }
  }

}

So, our one route here will constantly just send out “Hello to you!”.

Finally, all of this gets hosted in our main application object:

object MicroserviceOne extends App with Service {
  override implicit val system = ActorSystem()
  override implicit val executor = system.dispatcher
  override implicit val materializer = ActorMaterializer()

  override val config = ConfigFactory.load()
  override val logger = Logging(system, getClass)

  Http().bindAndHandle(routes, config.getString("http.interface"), config.getInt("http.port"))
}
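For completeness, this single-file example assumes imports along these lines (package paths per the akka-http 10.0.x API):

```scala
import akka.actor.ActorSystem
import akka.event.{Logging, LoggingAdapter}
import akka.http.scaladsl.Http
import akka.http.scaladsl.marshallers.sprayjson.SprayJsonSupport._
import akka.http.scaladsl.server.Directives._
import akka.stream.{ActorMaterializer, Materializer}
import com.typesafe.config.{Config, ConfigFactory}
import scala.concurrent.ExecutionContextExecutor
import spray.json.DefaultJsonProtocol
```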

That’s it for the code. In the src/main/resources directory, we’ll put a application.conf file that details a few configurations for us:

akka {
	loglevel = DEBUG
}

http {
	interface = "0.0.0.0"
	port = 3000
}

Running

Let’s give it a run now.

$ sbt run 

Once SBT has finished its dirty work, you’ll be able to request your route at http://localhost:3000/greeting:

$ curl -v http://localhost:3000/greeting
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 3000 (#0)
> GET /greeting HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.47.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: akka-http/10.0.1
< Date: Thu, 16 Feb 2017 22:37:52 GMT
< Content-Type: application/json
< Content-Length: 27
< 
* Connection #0 to host localhost left intact
{"message":"Hello to you!"}

Perfect.

That’s all for today.

Create a UDF for Hive with Scala

In today’s post, I’m going to walk through the basic process of creating a user defined function for Apache Hive using Scala.

A quick but important note: I needed to use JDK 1.7 to complete the following. Using 1.8 produced errors suggesting that Hive on my distribution of Hadoop was not supported.

Setup your project

Create an sbt-based project, and start off by adding the following to your project/assembly.sbt:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

This adds the sbt-assembly plugin to your project, which allows you to bundle your Scala application up as a fat JAR. When we issue the command sbt assembly at the console, we invoke this plugin to construct the fat JAR for us.

Now we fill out the build.sbt. We need to reference an external JAR, called hive-exec. This JAR is available by itself from the maven repository. I took a copy of mine from the hive distribution installed on my server. Anyway, it lands in the project’s lib folder.

name := "hive-udf"
version := "1.0"
scalaVersion := "2.11.1"
unmanagedJars in Compile += file("./lib/hive-exec-2.1.1.jar")

Write your function

Now it’s time to actually start writing some functions. In the following module, we’re just performing some basic string manipulation with trim, toUpperCase and toLowerCase. Each of which is contained in its own class, deriving from the UDF type:

scala/StringFunctions.scala

package me.tuttlem.udf

import org.apache.hadoop.hive.ql.exec.UDF

class TrimString extends UDF {
  def evaluate(str: String): String = {
    str.trim
  }
}

class UpperCaseString extends UDF {
  def evaluate(str: String): String = {
    str.toUpperCase
  }
}

class LowerCaseString extends UDF {
  def evaluate(str: String): String = {
    str.toLowerCase
  }
}

Now that we’ve written all of the code, it’s time to compile and assemble our JAR:

$ sbt assembly

To invoke

Copying the JAR across into a place accessible to Hive is the first step here. Once that’s done, we can start up the hive shell and add it to the session:

ADD JAR /path/to/the/jar/my-udfs.jar;

Then, using the CREATE FUNCTION syntax, we can start to reference pieces of our module:

CREATE FUNCTION trim as 'me.tuttlem.udf.TrimString';
CREATE FUNCTION toUpperCase as 'me.tuttlem.udf.UpperCaseString';
CREATE FUNCTION toLowerCase as 'me.tuttlem.udf.LowerCaseString';

We can now use our functions:

hive> CREATE FUNCTION toUpperCase as 'me.tuttlem.udf.UpperCaseString';
OK
Time taken: 0.537 seconds
hive> SELECT toUpperCase('a test string');
OK
A TEST STRING
Time taken: 1.399 seconds, Fetched: 1 row(s)

hive> CREATE FUNCTION toLowerCase as 'me.tuttlem.udf.LowerCaseString';
OK
Time taken: 0.028 seconds
hive> SELECT toLowerCase('DON\'T YELL AT ME!!!');
OK
don't yell at me!!!
Time taken: 0.093 seconds, Fetched: 1 row(s)

That’s it!