Hadoop Developer – WordCount tutorial using Maven and NetBeans 7.3RC2

I have adapted the WordCount tutorial to Maven-based development, as this is probably the most popular way to develop in companies. I am not going to re-explain how the WordCount tutorial works; the aim here is to get you up and running with Hadoop development pretty quickly.

I used NetBeans 7.3RC2 because of its integration with Maven, but feel free to use an IDE of your choice. I am also using Ubuntu 12.10 64-bit as a development environment. I installed the Hadoop Debian distribution package.

When running your WordCount application, Hadoop might throw an out-of-memory exception because the default setting is -Xmx100m. The Apache website describes how to fix it, but that fix is not relevant if you installed Hadoop using the Debian distribution. Here is a quick solution: open /usr/bin/hadoop (changing /etc/hadoop/hadoop-env.sh has no effect and doesn’t fix the problem) and make these changes:

  1. Set JAVA to the actual path of the JVM that you want to use.
  2. Set JAVA_HEAP_MAX to increase the memory available to applications, e.g. -Xmx3000m.
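
One way to confirm that a heap setting actually reached the JVM is to print the maximum heap from inside a Java program. The sketch below is only an illustration: the class name HeapCheck and the helper toMiB are hypothetical, and it uses nothing but the standard Runtime API.

```java
class HeapCheck {
    /** Converts a byte count to whole mebibytes (rounding down). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reports the most memory the JVM will try to use,
        // which reflects the -Xmx value the JVM was started with.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap available to this JVM: " + toMiB(maxBytes) + " MiB");
    }
}
```

The reported figure is usually a little lower than the -Xmx value, so treat it as an approximate check. It only reflects the Hadoop setting if the class is started by the same launcher script that you edited.
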
Here are the steps to create the WordCount tutorial in NetBeans:
  1. Create a new Maven-based Java project.
    • NetBeans will create an App.java class. You can rename it to WordCount or leave it as is; it doesn’t affect the outcome of the tutorial. I will refer to the main class as App.java.
  2. Add the Hadoop dependencies, which are available in Maven Central. I used hadoop-core 1.1.1 for this tutorial.
  3. Important: Maven doesn’t package dependencies when building an application unless you are working with a “war” project, where it will create a lib folder. To make sure that our libraries are available to our program when packaged, we need to add the maven-assembly-plugin to our pom.xml. We also declare our “Main” class, which will be used to execute the program.
  4. Open App.java (or whatever you have renamed it to) and write the following:
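
As a rough guide to what the job computes, here is a plain-Java sketch of the map and reduce phases of WordCount. It deliberately does not use the Hadoop API, so it is not the App.java listing itself; the class and method names (WordCountSketch, map, reduce, countWords) are illustrative assumptions.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class WordCountSketch {
    /** "Map" phase: emit one (word, 1) pair per whitespace-separated token. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<String, Integer>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    /** "Reduce" phase: sum the counts emitted for each distinct word. */
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<String, Integer>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    /** Runs map over every line, then reduces all emitted pairs together. */
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<Map.Entry<String, Integer>>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hello World Bye World", "Hello Hadoop Goodbye Hadoop");
        // Prints each word and its count, sorted by word.
        countWords(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

In the real job, Hadoop runs the map step on many input splits in parallel and groups the pairs by key before the reduce step; the sketch collapses all of that into a single in-memory pass.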

Create your Hadoop “input” directory, make it available in HDFS, and then execute the following:

$ hadoop dfs -ls input

$ hadoop dfs -cat input/file01 

$ hadoop jar WordCount.jar com.etapix.wordcount.App input output

This assumes that you are running from your project home directory and that you have installed Hadoop using the Debian distribution. Alternatively, you can follow the rest of the tutorial on the Apache website.


Hacking Liferay – Securing against online vulnerabilities

This is a brief post on securing Liferay on Tomcat and MySQL. Liferay CE is a stable enterprise portal, and as more companies start to adopt it, security becomes a very important aspect of the deployment. I am not sure if Liferay has been officially tested by a third-party security firm, but based on my simple security test against the OWASP Top 10 vulnerabilities, I can say that it looks good in that respect. Some of the recommendations are taken from their respective sites, while others are based on our testing. We tested the following on Ubuntu Linux 12.04 LTS.
Here is what I did for a quick test (using the default installation of liferay-portal-tomcat-6.1.1-ce-ga2-20120731132656558):
  1. Download the Zed Attack Proxy (ZAP) from OWASP.
  2. Make sure ZAP is set to run the following attacks:
    • Path traversal
    • Remote file inclusion
    • URL redirector abuse
    • Server-side include
    • Cross-site scripting
    • SQL injection
    • Directory browsing
    • Session ID in URL rewrite
    • Secure page browser cache
    • External redirect
    • CRLF injection
    • Parameter tampering
  3. Run Liferay with default settings.
  4. Now sit back and watch the Liferay logs go “CRAZY”.
Passing the OWASP Top 10 vulnerabilities doesn’t mean that you are out of the woods yet. This test focuses only on browser-based penetration tests.
Here are some steps for an even more secure Liferay deployment on Tomcat.

Make sure Tomcat uses SSL to serve Liferay content

Make sure that you do not run Tomcat as “root”
Tomcat user should not have remote access to the

Disable auto-deployment of web applications

Change the file permissions on the Tomcat folder: all Tomcat files should be owned by the “root” user with the group Tomcat. The owner has read/write privileges, the group has read-only access, and the world has no permissions. The exceptions are the logs, temp and work directories, which are owned by the Tomcat user rather than root. This means that even if an attacker compromises the Tomcat process, they can’t change the Tomcat configuration, deploy new web applications or modify existing web applications. The Tomcat process runs with a umask of 007 to maintain these permissions.

Enable the Tomcat Security Manager, which causes web applications to run in a sandbox

In server.xml, do the following:
    • Disable the shutdown port by setting its port attribute to -1
    • Make sure that Tomcat HTTP connectors listen only on designated IP addresses; by default the connectors listen on all configured IP addresses
    • Configure the “ciphers” attribute used for SSL connections. By default, Tomcat uses the default ciphers for the JVM, which contain weak export-grade ciphers
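
The server.xml points above can be checked mechanically. The following is a minimal sketch, assuming a standard Tomcat server.xml with a Server root element and Connector elements; the class name ServerXmlAudit and the wording of the findings are my own, and the check is far from a complete audit.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ServerXmlAudit {
    /** Returns one finding per setting that does not match the hardening steps. */
    static List<String> audit(String serverXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(serverXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse server.xml content", e);
        }
        List<String> findings = new ArrayList<String>();

        // The shutdown port is the "port" attribute of the root <Server> element.
        Element server = doc.getDocumentElement();
        if (!"-1".equals(server.getAttribute("port"))) {
            findings.add("Shutdown port is enabled (Server port is not -1)");
        }

        NodeList connectors = doc.getElementsByTagName("Connector");
        for (int i = 0; i < connectors.getLength(); i++) {
            Element connector = (Element) connectors.item(i);
            String label = "Connector on port " + connector.getAttribute("port");
            // getAttribute returns an empty string when the attribute is absent.
            if (connector.getAttribute("address").isEmpty()) {
                findings.add(label + " has no address, so it listens on all IP addresses");
            }
            boolean ssl = "true".equalsIgnoreCase(connector.getAttribute("SSLEnabled"));
            if (ssl && connector.getAttribute("ciphers").isEmpty()) {
                findings.add(label + " uses SSL but does not restrict ciphers");
            }
        }
        return findings;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ServerXmlAudit /path/to/conf/server.xml
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String finding : audit(xml)) {
            System.out.println(finding);
        }
    }
}
```

An empty result only means these three items look right; it says nothing about which ciphers are listed or about the rest of the configuration.
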
There are more Tomcat settings, which are available online.
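
Going back to the file-permission scheme described earlier (root-owned files, read-only for the group, nothing for the world, and a umask of 007 for the Tomcat process), the arithmetic can be illustrated with a few lines of Java. This is a sketch under my own naming (TomcatPermissionSketch and its methods are hypothetical); it only parses permission strings and does not touch the file system.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

class TomcatPermissionSketch {
    /** Applies a umask to a requested mode: bits set in the umask are cleared. */
    static int applyUmask(int requestedMode, int umask) {
        return requestedMode & ~umask;
    }

    /** True if the group cannot write and "others" have no permissions at all. */
    static boolean groupReadOnlyNoWorld(String permissionString) {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString(permissionString);
        return !perms.contains(PosixFilePermission.GROUP_WRITE)
                && !perms.contains(PosixFilePermission.OTHERS_READ)
                && !perms.contains(PosixFilePermission.OTHERS_WRITE)
                && !perms.contains(PosixFilePermission.OTHERS_EXECUTE);
    }

    public static void main(String[] args) {
        // With umask 007, a file requested as 0666 is created as 0660
        // and a directory requested as 0777 is created as 0770.
        System.out.println(Integer.toOctalString(applyUmask(0666, 0007)));
        System.out.println(Integer.toOctalString(applyUmask(0777, 0007)));
        // "rw-r-----" is the layout described for the root-owned Tomcat files.
        System.out.println(groupReadOnlyNoWorld("rw-r-----"));
    }
}
```

The point of the 007 umask is that anything the Tomcat process creates in logs, temp or work is never readable by the world, which keeps new files consistent with the rest of the scheme.
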
You also need to make sure that you secure your operating system and network. Now that we have some basic security in place for Tomcat, let’s tackle our database. In this test, we used MySQL 5.
Here are some basic MySQL security steps:

Set a root password for MySQL

Remove all anonymous accounts

Disable non-local root access

Remove all test databases and any access rules related to them

Reload the privilege tables to apply the changes

Enable SSL connections for MySQL; the default connection is unencrypted

Now to conclude, let’s secure our Liferay instance. Liferay is configured through portal.properties, and you should override those settings in portal-ext.properties. Create the file if it doesn’t exist:

Set web.server.host=MY-HOST-NAME so that it is not dynamically set

Set the preferred protocol to web.server.protocol=https

If you want Liferay to be only accessible from certain IP addresses, set

To make Liferay only accessible through HTTPS, set main.servlet.https.required=true

Secure the Axis servlet as follows:

Secure the JSON Tunnel Servlet as follows:

Secure the Liferay Tunnel Servlet as follows:

Secure the Spring Remoting Servlet

Secure the WebDAV Servlet

Make sure you have configured the Admin portlet by overriding all the default values

The IFrame Portlet, when used in a high-security environment, should have the following properties set

JAAS security needs the following property set to stop users from passing in an encrypted password: portal.jaas.strict.password=true

Passwords: choose a strong password encryption algorithm to encrypt passwords by setting the following:
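
As a rough way to double-check the overrides that this post spells out in full (web.server.host, web.server.protocol, main.servlet.https.required and portal.jaas.strict.password), a small standalone program can read portal-ext.properties and report anything that differs. This is a sketch under my own naming: PortalExtCheck and its methods are hypothetical and not part of Liferay, and the properties whose values are not given in this post are deliberately left out.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

class PortalExtCheck {
    /** Overrides with a fixed expected value, as named in the post. */
    static Map<String, String> expectedValues() {
        Map<String, String> expected = new LinkedHashMap<String, String>();
        expected.put("web.server.protocol", "https");
        expected.put("main.servlet.https.required", "true");
        expected.put("portal.jaas.strict.password", "true");
        return expected;
    }

    /** Parses properties text, e.g. the content of portal-ext.properties. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try (Reader reader = new StringReader(text)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns one finding per override that is missing or has another value. */
    static List<String> check(Properties props) {
        List<String> findings = new ArrayList<String>();
        // The host name is site-specific, so only check that it is set at all.
        if (props.getProperty("web.server.host", "").trim().isEmpty()) {
            findings.add("web.server.host is not set");
        }
        for (Map.Entry<String, String> entry : expectedValues().entrySet()) {
            String actual = props.getProperty(entry.getKey(), "").trim();
            if (!entry.getValue().equals(actual)) {
                findings.add(entry.getKey() + " should be " + entry.getValue());
            }
        }
        return findings;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PortalExtCheck /path/to/portal-ext.properties
        Properties props = new Properties();
        try (Reader reader = new FileReader(args[0])) {
            props.load(reader);
        }
        for (String finding : check(props)) {
            System.out.println(finding);
        }
    }
}
```

The check only looks at the override file, so it cannot tell whether the running portal has picked the file up.
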
I am sure that many other security settings are left out, so feel free to share them in the comments. I hope this helps someone secure their Liferay environment.