ANDS Community

[Guide] Harvester

Author:  Lizwoods [ Fri Oct 25, 2013 9:17 am ]
Post subject:  [Guide] Harvester


The Harvester Service provides a service-oriented framework to support the processing and routing of content and metadata from a source Data Provider to a target application. It potentially makes management of distributed harvests simpler by providing a single harvest application which can service many clients wishing to perform harvesting without the overhead of writing their own embedded harvester.

Harvester Service Installation Guide

Download the latest ANDS-Harvester and extract it on your server into a directory, referred to below as $HARVESTER_SOURCE.

The following software must be available in order to install and run the Harvester Service:
    Sun's Java 1.5 SDK
    Apache Ant
    Apache Tomcat 5.x or later

Harvester Service Install process
Having installed a MySQL server, set up a new database and initialise the tables:

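Create a new database called 'dbs_harvester':
mysql> CREATE DATABASE `dbs_harvester`;
Create a database user (or use an existing user and skip this step):
mysql> CREATE USER '<db_username>' IDENTIFIED BY '<db_password>';
A newly created user has no privileges, so grant it access to the new database, for example:
mysql> GRANT ALL PRIVILEGES ON `dbs_harvester`.* TO '<db_username>';
Create the tables and indices (run this from the shell, not from the mysql prompt):
mysql -u root -p dbs_harvester < $HARVESTER_SOURCE/etc/db/mysql/database.sql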
Create a directory $HARVESTER_DIR where the Harvester log files will be written. It should not be web accessible, and the Tomcat user must have read/write access to it.
Ensure servlet-api.jar is on the CLASSPATH. It can be found in one of the common library directories of the Tomcat install.
Build the distribution:
ant -Dinstall_dir=<$HARVESTER_DIR> install
Configure the default Tomcat datasource (connection pooling) by adding the following entry inside the Host element of the server.xml file:

<Context path="/harvester" docBase="harvester" crossContext="true" reloadable="true" debug="1">
   <Resource name="jdbc/mysql" validationQuery="SELECT 1" testOnBorrow="true" auth="Container" type="javax.sql.DataSource" driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://db_host:3306/db_name" username="db_username" password="db_password"/>
   <Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="comma separated IP list" />
</Context>
Ensure the host (or IP) and port number in the url attribute reflect your server setup, and that the harvester database user is correctly configured.
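
A stray curly quote or an unclosed attribute value in this entry is enough to make server.xml unparseable, so it can be worth checking the fragment before restarting Tomcat. The following is only a rough sketch in Python (not part of the Harvester distribution): the function name check_context_fragment and the list of attributes it treats as required are my own assumptions, taken from the attributes shown in the entry above.

```python
import xml.etree.ElementTree as ET

# Attributes carried by the Resource element in the guide's entry (assumed required here).
REQUIRED_RESOURCE_ATTRS = ("name", "type", "driverClassName", "url", "username", "password")


def check_context_fragment(fragment):
    """Return a list of problems found in a <Context> fragment (empty if none)."""
    try:
        context = ET.fromstring(fragment.strip())
    except ET.ParseError as exc:
        # Typical causes: curly quotes, an unclosed attribute value, a missing </Context>.
        return ["not well-formed XML: %s" % exc]

    problems = []
    if context.tag != "Context":
        problems.append("root element is <%s>, expected <Context>" % context.tag)
    resource = context.find("Resource")
    if resource is None:
        problems.append("no <Resource> element found")
    else:
        for attr in REQUIRED_RESOURCE_ATTRS:
            if not resource.get(attr):
                problems.append("Resource is missing the '%s' attribute" % attr)
        if not resource.get("url", "").startswith("jdbc:mysql://"):
            problems.append("Resource url does not start with jdbc:mysql://")
    return problems


# Placeholder values only; substitute your own host, database and credentials.
EXAMPLE = """
<Context path="/harvester" docBase="harvester">
   <Resource name="jdbc/mysql" auth="Container" type="javax.sql.DataSource"
             driverClassName="com.mysql.jdbc.Driver"
             url="jdbc:mysql://db_host:3306/db_name"
             username="db_username" password="db_password"/>
</Context>
"""

print(check_context_fragment(EXAMPLE))  # an empty list means no problems were found
```

This only checks the shape of the entry. It does not confirm that the database is reachable or that the credentials are accepted.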

Download the appropriate MySQL connector jar file and copy it to the Tomcat lib directory, e.g. /usr/share/java/tomcat6/mysql-connector-java-5.1.22-bin.jar
Deploy harvester.war to the Tomcat webapps directory.

Testing the Harvester
NB: The harvester is a web application which could be embedded or integrated with other applications and as such does not provide a security framework or user interface.
Once installed, check that the harvester app is running by accessing http://localhost:8080/harvester/getHarvestStatus. This should produce an error response in XML form, which is the expected result at this stage.
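
If you would rather script this check than use a browser, something along the following lines should do. It is a minimal sketch in Python using only the standard library. The function names are my own, and all it tests is that whatever answers at the URL returns well-formed XML; it makes no assumption about the elements the harvester puts in its error response.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

STATUS_URL = "http://localhost:8080/harvester/getHarvestStatus"  # URL from the guide


def is_well_formed_xml(body):
    """True if the response body (bytes) parses as XML."""
    try:
        ET.fromstring(body)
        return True
    except ET.ParseError:
        return False


def fetch_body(url, timeout=10.0):
    """Fetch the URL and return the body, even when the HTTP status is an error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # The service may answer with an error status; the body is still of interest.
        return err.read()


def harvester_is_responding(url=STATUS_URL):
    """True if something at the URL answers with well-formed XML."""
    try:
        return is_well_formed_xml(fetch_body(url))
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout or DNS failure: Tomcat is probably not reachable.
        return False
```

Calling harvester_is_responding() on the Tomcat host returns True when an XML document comes back and False when nothing answers or the body is not XML. Adjust STATUS_URL if Tomcat listens on a different host or port.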
For a more advanced test, call the requestHarvest service, providing the base URL of a data provider and the address of a test script to which the output will be sent. As an example, if you copy the following code to a file called test.php and deploy it on a web server such as Apache, it will append the output from the harvester to a file (change the file path to reflect your system) each time a harvest is run.
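<?php
// Read the fields posted by the harvester (missing fields become empty strings).
$harvestid = isset($_POST['harvestid']) ? $_POST['harvestid'] : '';
$content = isset($_POST['content']) ? $_POST['content'] : '';
$done = isset($_POST['done']) ? $_POST['done'] : '';
$nextrun = isset($_POST['date']) ? $_POST['date'] : '';
$str = "content=".$content."\nharvest=".$harvestid."\ndate=".$nextrun."\ndone=".$done."\n";
// Append to the output file; change the path to reflect your system.
$fh = fopen("/usr/local/harvest/test.txt", "a");
fwrite($fh, $str);
fclose($fh);
?>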
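
Before involving the harvester at all, it can help to confirm that test.php itself works by posting the same four fields to it by hand. The sketch below (Python, standard library only) is one way to do that. The field names come from the PHP script above; the function names, the sample values and the URL you pass in are assumptions of mine and not real harvester output.

```python
import urllib.parse
import urllib.request


def build_callback_body(harvestid, content, done, date):
    """Form-encode the four fields that test.php reads from $_POST."""
    fields = {"harvestid": harvestid, "content": content, "done": done, "date": date}
    return urllib.parse.urlencode(fields).encode("ascii")


def post_to_test_script(url, body, timeout=10.0):
    """POST the body to the test script and return the HTTP status code."""
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Sample values are invented for illustration; they are not real harvester output.
body = build_callback_body("test-1", "<record>sample</record>", "TRUE", "2013-10-25T09:17:00")
print(body.decode("ascii"))
```

Calling post_to_test_script with the address where you deployed test.php and the body built above should append four lines to the output file. If nothing is written, check that the web server user can write to the file path.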
Then create a harvest request in a browser such as:   
Refer to the javadocs for more information on the services available. They can be generated by running ant javadoc.

All times are UTC + 10 hours [ DST ]