Cogs and Levers A blog full of technical stuff

Create a web service with maven

In today’s post, I’m going to walk through a simple SOAP web service creation using maven, jax-ws for java. The service will be hosted inside of Apache Tomcat once we’re up and running.

Maven

First off, we start the application off with maven.

$ mvn archetype:generate -DgroupId=org.test.ws -DartifaceId=soap-test

This creates our project structure and puts all of the project dependencies in place. The pom.xml that gets generated for us needs a little extra help for a JAX-WS project. We need to:

  • Set the packaging to war
  • Add the jaxws-rt dependencies
  • Supply a final name

Your pom.xml should look something like this:

<project xmlns="http://maven.apache.org/POM/4.0.0" 
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

  <modelVersion>4.0.0</modelVersion>

  <groupId>org.test.ws</groupId>
  <artifactId>soap-test</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>war</packaging>

  <name>soap-test</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>com.sun.xml.ws</groupId>
      <artifactId>jaxws-rt</artifactId>
      <version>2.2</version>
    </dependency>    
    <dependency>
      <groupId>com.sun.istack</groupId>
      <artifactId>istack-commons-runtime</artifactId>
      <version>2.22</version>
    </dependency>    
  </dependencies>

  <build>
    <finalName>HelloService</finalName>
  
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <source>1.6</source>
                <target>1.6</target>
                <encoding>UTF-8</encoding>
            </configuration>
        </plugin>
    </plugins>
    
  </build>

</project>

The two references that I had to make were the following were one of jaxws-rt

<dependency>
  <groupId>com.sun.xml.ws</groupId>
  <artifactId>jaxws-rt</artifactId>
  <version>2.2</version>
</dependency>    

and one for istack-commons-runtime.

<dependency>
  <groupId>com.sun.istack</groupId>
  <artifactId>istack-commons-runtime</artifactId>
  <version>2.22</version>
</dependency>    

Service implementation

We now write our service implementation. For this purposes of this article will be very simple. I took over the pre-generated App.java and renamed it for my purposes to HelloService.java.

package org.test.ws;

import javax.jws.WebMethod;
import javax.jws.WebParam;
import javax.jws.WebService;
import javax.jws.soap.SOAPBinding;

@WebService
@SOAPBinding(style = SOAPBinding.Style.RPC)
public class HelloService 
{
  @WebMethod(operationName = "sayHello")
  public String sayHello(@WebParam(name="guestname") String guestname) {
    if (guestname == null) { return "Hello"; }
    return "Hello " + guestname;
  }
}

Fairly basic, “hello” type service.

Endpoints

We instruct the jaxws framework that we have a service listening at any particular given endpoint by use of the sun-jaxws.xml file. Create this file in src/main/webapp/WEB-INF. It should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<endpoints xmlns="http://java.sun.com/xml/ns/jax-ws/ri/runtime" 
           version="2.0">
  <endpoint name="HelloService" 
            implementation="org.test.ws.HelloService" 
            url-pattern="/helloService" >
  </endpoint>
</endpoints>

To let Tomcat know from a deployment perspective what this application will handle, we also create a web.xml file that will be located in the same directory, src/main/webapp/WEB-INF. It looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
         xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:web="http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" 
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
         id="WebApp_ID" 
         version="2.5">
   
  <display-name>jaxwsExample</display-name>
 
  <listener>
    <listener-class>com.sun.xml.ws.transport.http.servlet.WSServletContextListener</listener-class>
  </listener>
  <servlet>
    <servlet-name>helloService</servlet-name>
    <servlet-class>com.sun.xml.ws.transport.http.servlet.WSServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>helloService</servlet-name>
    <url-pattern>/helloService</url-pattern>
  </servlet-mapping>
  <session-config>
    <session-timeout>120</session-timeout>
  </session-config>
</web-app>

Building and Running

At the console, you can now build this project:

mvn clean install

After you have deployed your war file to Tomcat, you service becomes available at http://localhost:8080/HelloService/helloService, this is if you’ve deployed locally; that is.

You’re offered a WSDL that your clients can use to integrate with your service http://localhost:8080/HelloService/helloService?WSDL.

Testing

Now that the service is up and running, we really want to test it to make sure it’s working. SOAP requests are HTTP POSTS. Sending the following request:

POST http://localhost:8080/HelloService/helloService HTTP/1.1
Accept-Encoding: gzip,deflate
Content-Type: text/xml;charset=UTF-8
SOAPAction: ""
Content-Length: 262
Host: 172.17.42.1:8080
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1.1 (java 1.5)
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" 
                  xmlns:ws="http://ws.test.org/">
  <soapenv:Header/>
    <soapenv:Body>
      <ws:sayHello>
        <guestname>Joe</guestname>
      </ws:sayHello>
    </soapenv:Body>
</soapenv:Envelope>

. . . should get you this response.

<S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/">
  <S:Body>
    <ns2:sayHelloResponse xmlns:ns2="http://ws.test.org/">
      <return>Hello Joe</return>
    </ns2:sayHelloResponse>
  </S:Body>
</S:Envelope>

A closer look at Hive

In a previous post we went through a fairly simple example of how to get up and running quickly with Apache Hive. In today’s post I’ll take a deeper dive a look a little closer at the different aspects of using it.

Everything that I mention in this article can be found in the language manual on the Apache wiki.

For the examples that are listed in this blogpost, I’m using data that can be downloaded from the FAA site.

Databases

Your first job, much the same with any database system is to create a database.

hive> CREATE DATABASE first;
OK
Time taken: 0.793 seconds

hive> USE first;
OK
Time taken: 0.037 seconds

You can also use EXISTS in your creation and destruction statements to ensure something is or isn’t there.

hive> CREATE DATABASE IF NOT EXISTS first;
OK
Time taken: 0.065 seconds

hive> DROP DATABASE IF EXISTS first;
OK
Time taken: 0.26 seconds

Tables

To create a table that’s managed by the hive warehouse, we can use the following.

hive> CREATE TABLE airports (
    > iata STRING, airport STRING, city STRING, 
    > state STRING, country STRING, 
    > lat DECIMAL, long DECIMAL
    > ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
OK
Time taken: 0.324 seconds

This table can then be filled with data that is sourced locally:

hive> LOAD DATA LOCAL INPATH '/srv/airports.csv' 
    > OVERWRITE INTO TABLE airports;
Loading data to table faa.airports
Table faa.airports stats: [numFiles=1, numRows=0, totalSize=244383, rawDataSize=0]
OK
Time taken: 1.56 seconds

You can also create an external table using the following syntax:

hive> CREATE EXTERNAL TABLE carriers ( 
    > code STRING, description STRING
    > ) ROW FORMAT DELIMITED FIELDS TERMINATED BY "," 
    > LOCATION '/user/root/carriers';
OK
Time taken: 0.408 seconds

You can see that this has used a file hosted on HDFS as the data source. The idea is that the existing file (that we’d specified in the LOCATION statement) will now be accessible to hive through this table.

From the wiki:

The EXTERNAL keyword lets you create a table and provide a LOCATION so that Hive does not use a default location for this table. This comes in handy if you already have data generated. When dropping an EXTERNAL table, data in the table is NOT deleted from the file system.

An EXTERNAL table points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property hive.metastore.warehouse.dir.

It’s important to note that when you DROP an external table, the underlying data is NOT deleted.

Views

You can provide a more targeted representation of your data to you users by offering them views. Views allow you to also specify aggregate functions as columns. In the following view, we simple retrieve all of the countries that an airport is located; along with the number of airports located in that country.

hive> CREATE VIEW airports_per_country_vw
    > AS
    > SELECT country, COUNT(*) AS country_count 
    > FROM airports 
    > GROUP BY country;
OK
Time taken: 0.134 seconds

Partitions and Buckets

Because you’ll be working with very large data sets, Hive offers you the ability to partition data on columns that you nominate. These partitions are then broken down even further with into buckets.

From the wiki:

Partitioned tables can be created using the PARTITIONED BY clause. A table can have one or more partition columns and a separate data directory is created for each distinct value combination in the partition columns. Further, tables or partitions can be bucketed using CLUSTERED BY columns, and data can be sorted within that bucket via SORT BY columns. This can improve performance on certain kinds of queries.

So this technique does change the way data is physically structured on disk. It tried to structure it in such a way that it’ll bias towards the performance of the queries that you’re running. Of course, this is up to you as you need to define which fields to partition and cluster by.

Here’s the airports table, partitioned by country.

hive> CREATE EXTERNAL TABLE airport_part_by_country (
    > iata STRING, airport STRING, city STRING, 
    > state STRING, lat DECIMAL, long DECIMAL
    > ) PARTITIONED BY (country STRING) 
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY "," 
    > LOCATION '/user/root/partitioned';
OK
Time taken: 0.128 seconds

When this table gets clustered into buckets, the database developer needs to specify the number of buckets to possible distribute across. From here, hive will make decisions on which bucket to place the data into with the following formula:

target_bucket = hash_value(bucket_column) % bucket_count

We then create and fill the bucketed store like so:

-- create the bucketed store
hive> CREATE EXTERNAL TABLE airports_b (
    > iata string, airport string, city string, 
    > state string, lat decimal, long decimal
    > ) PARTITIONED BY (country string) 
    > CLUSTERED BY (state) INTO 100 BUCKETS;

-- fill the bucketed store
hive> set hive.enforce.bucketing = true;
hive> FROM airports 
    > INSERT OVERWRITE TABLE airports_b 
    > PARTITION (country='USA') 
    > SELECT iata, airport, city, state, lat, long;

nmap Cheatsheet

The following post is a quick guide to getting around the nmap network administration and security tool.

General scanning

Scanning with nmap gives you the insight into what is available to a server (from an external user’s perspective). Information about the techniques that nmap will use can be found here.

# scan a host by ip/name
nmap 192.168.0.1
nmap host.example.com

# scan multiple hosts
nmap 192.168.0.1 192.168.0.2 192.168.0.3
nmap 192.168.0.1,2,3

# range scanning
nmap 192.168.0.1-3
nmap 192.168.0.*

# subnet scanning
nmap 192.168.0.0/24

Utilities

Command Description
nmap -v -A 192.168.0.1 Turn on OS and version detection
nmap -sA 192.168.0.1 Check for a firewall
nmap -PN 192.168.0.1 Scan a firewall protected host
nmap -6 ::1 Scan IPv6 address
nmap -sP 192.168.0.1/24 Check for alive hosts
nmap --reason 192.168.0.1 Document the reason for a service discovery
nmap --open 192.168.0.1 Show open ports
nmap --packet-trace 192.168.0.1 Show packet trace (sent/received)
nmap --iflist Show host interface and routes
nmap -O 192.168.0.1 Detect remote operating system
nmap -sV 192.168.0.1 Detect remote service/daemon version
nmap -sO 192.168.0.1 Scan for IP protocol

Port scans

Command Description
nmap -p 80 192.168.0.1 Scan http
nmap -p T:80 192.168.0.1 Scan tcp/http
nmap -p U:53 192.168.0.1 Scan udp/dns

Firewalls

The following commands scan firewalls for weaknesses

# tcp null scan
nmap -sN 192.168.0.1

# tcp fin scan
nmap -sF 192.168.0.1

# tcp xmas scan
nmap -sX 192.168.0.1

# scan a firewall for packet fragments
nmap -f 192.168.0.1

Spoof

# cloak a scan with decoys
nmap -n -Ddecoy1.example.com,decoy2.example.com 192.168.0.1

# scan with a spoofed mac address
nmap --spoof-mac MAC-ADDRESS-HERE 192.168.0.1

# scan with a random mac address
nmap -v -sT -PN --spoof-mac 0 192.168.0.1

Hadoop job setup

In today’s post, I’m going to refresh the information in some previous articles that I have written to bring them up to date.

Job, Mapper and Reducer

It’s pretty easy to bundle your job, mapper and reducer together. If they’re small enough, it makes sense to do so.

public class MyJobJob 
  extends Configured implements Tool {

  public static class MyMapper 
    extends Mapper<LongWritable, Text, Text, Text> {

    @Override 
    public void setup(Context context) {
      /* setup any configs from the command line */
      this.val = context.getConfiguration().get("some.value");
    }    

    public void map(LongWritable key, Text value, Context context) 
      throws IOException, InterruptedException {      
        /* data selection and source filtering here */
    }
    
  }
  
  public static class MyReducer 
    extends Reducer<Text, Text, NullWritable, Text> {
    
    public void reduce(Text key, Iterable<Text> values, Context context) 
      throws IOException, InterruptedException {
        /* data aggregators here */
    }
    
  }
}

There isn’t much that has changed here. Reading the type annotations can be a little hairy. You can always lookup the documentation for Mapper and for Reducer. The type definitions are uniform:

class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
class Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

ToolRunner

The ToolRunner class simplifies the execution management of a MapReduce job using the interface, Tool. Tool is a very simple interface, only providing implementing classes with a contract to run. The run method looks like this:

public abstract int run(java.lang.String[] args) 
  throws java.lang.Exception;

args are supplied as usual from the main method of the job. A typical implementation of the run method will retrieve configuration information, setup the job and execute.

public int run(String[] allArgs) throws Exception {
  
  Job job = Job.getInstance(getConf());

  job.setJarByClass(MyJob.class);

  // basic I/O shape setup 
  job.setInputFormatClass(TextInputFormat.class);
  job.setOutputFormatClass(TextOutputFormat.class);
  job.setOutputKeyClass(NullWritable.class);
  job.setOutputValueClass(Text.class);

  // map, combine, partition, reduce setup 
  job.setMapperClass(MyMapper.class);
  job.setCombinerClass(MyCombiner.class);
  job.setReducerClass(MyReducer.class);
  job.setNumReduceTasks(1);
  
  // parse options passed to the job      
  String[] args = new GenericOptionsParser(
    getConf(), allArgs
  ).getRemainingArgs();
  
  // set the files (from arguments)
  FileInputFormat.setInputPaths(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  
  // wait for the jobs to finish
  boolean status = job.waitForCompletion(true);
  return status ? 0 : 1;
}

A special part of the magic here is wrapped up in the GenericOptionsParser which takes in the standard set of command line parameters and plumbs them directly into the job’s configuration.

Finishing up

So there are a couple of features that are provided for you with this wrapper around the run function. Your main method ends up very simple:

public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  ToolRunner.run(new MyJob(), args);
}

Your job is then invoked from the command line using the hadoop command:

$HADOOP_PREFIX/bin/hadoop jar my-jobs-0.0.1-SNAPSHOT.jar \
  org.example.MyMapReduceJob \
  -D arbitrary.config.value=xyz \
  /user/root/input-file.csv \
  /user/root/output-dir

Make a symbolic link

As a small reminder to myself on creating symbolic links

ls -s /path/to/the/thing /path/to/the/link