ANDS Community
http://community.ands.org.au/

[Guide] Harvester
http://community.ands.org.au/viewtopic.php?f=211&t=3201

Author:  Lizwoods [ Fri Oct 25, 2013 9:17 am ]
Post subject:  [Guide] Harvester

ANDS-Harvester

The Harvester Service provides a service-oriented framework for processing and routing content and metadata from a source Data Provider to a target application. It can simplify the management of distributed harvests: a single harvest application serves many clients, so each client avoids the overhead of writing its own embedded harvester.

Harvester Service Installation Guide

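Download the latest ANDS-Harvester source and extract it on your server. This guide refers to the extracted directory as $HARVESTER_SOURCE.
Code:
wget https://github.com/au-research/ANDS-Harvester/archive/master.zip
unzip master.zip
# GitHub archives extract into a <repository>-<branch> directory
export HARVESTER_SOURCE="$PWD/ANDS-Harvester-master"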

Prerequisites
The following software must be available to install and run the Harvester Service:

    Sun's Java 1.5 SDK
    MySQL
    Apache Ant
    Apache Tomcat 5.x or later
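
To confirm the command-line tools are on the PATH before starting, a minimal bash sketch like the one below can help. The function name require_cmds is hypothetical, and the command names you pass (for example java, javac, ant and mysql) are assumptions about how these packages install on your system. Tomcat is not checked because it usually runs as a service rather than as a command.

```shell
#!/usr/bin/env bash
# Sketch: report every requested command that is not on the PATH.
# The command names passed in are assumptions; adjust them for your platform.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  # 0 if everything was found, 1 if anything was missing
  return "$missing"
}
```

For example, `require_cmds java javac ant mysql && echo "prerequisites found"` prints a "missing:" line for each tool it cannot find.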

Harvester Service Install Process

Once a MySQL server is installed, create a new database and initialise its tables:

Create a new database called 'dbs_harvester':
Code:
mysql> CREATE DATABASE `dbs_harvester`;
           
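Create a database user and grant it privileges on the new database. If you use an existing user, skip the CREATE USER statement but still run the GRANT:
Code:
mysql> CREATE USER '<db_username>' IDENTIFIED BY '<db_password>';
mysql> GRANT ALL PRIVILEGES ON `dbs_harvester`.* TO '<db_username>';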
          
Create the tables and indices by loading the supplied schema from the shell (not from the mysql> prompt):
Code:
mysql -u root -p dbs_harvester < $HARVESTER_SOURCE/etc/db/mysql/database.sql
    
Create a directory, $HARVESTER_DIR, in which the Harvester will write its log files. It should not be web accessible, and the Tomcat user must have read/write access to it.
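
A minimal bash sketch of this step follows. The helper name prepare_harvester_dir is hypothetical, and the owner you pass (for example tomcat6:tomcat6 on a Debian-style tomcat6 package) is an assumption; use whichever account your Tomcat actually runs as. Changing ownership normally requires root.

```shell
#!/usr/bin/env bash
# Sketch: create the harvester directory and optionally hand it to the Tomcat user.
# The owner argument (e.g. tomcat6:tomcat6) is an assumption about your setup.
prepare_harvester_dir() {
  local dir="$1" owner="${2:-}"
  mkdir -p "$dir" || return 1
  # Owner: read/write/execute; group: read/execute; others: no access
  chmod 750 "$dir" || return 1
  if [ -n "$owner" ]; then
    chown "$owner" "$dir" || return 1
  fi
}
```

As root, something like `prepare_harvester_dir /var/lib/harvester tomcat6:tomcat6` (both values are examples) creates the directory; `sudo -u tomcat6 test -w /var/lib/harvester` then confirms the Tomcat user can write to it.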
            
Ensure servlet-api.jar is on the CLASSPATH. The jar ships with Tomcat and is found in the install's common library directory (common/lib in Tomcat 5.x, lib in Tomcat 6 and later).
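
One way to do this is to search the Tomcat install for the jar and prepend it to CLASSPATH in the shell you will build from. The bash sketch below assumes you know the Tomcat home directory; the function name add_servlet_api_to_classpath is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find servlet-api.jar under a Tomcat install and prepend it to CLASSPATH.
# Searching the whole tree covers both common/lib (5.x) and lib (6+) layouts.
add_servlet_api_to_classpath() {
  local tomcat_home="$1" jar
  # Take the first match only
  jar=$(find "$tomcat_home" -name servlet-api.jar 2>/dev/null | head -n 1)
  if [ -z "$jar" ]; then
    echo "servlet-api.jar not found under $tomcat_home" >&2
    return 1
  fi
  # Prepend, keeping any existing CLASSPATH entries after it
  export CLASSPATH="$jar${CLASSPATH:+:$CLASSPATH}"
}
```

For example, `add_servlet_api_to_classpath /usr/share/tomcat6` (an example path) followed by the ant build in the same shell session.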
            
Build the distribution:
Code:
cd $HARVESTER_SOURCE
ant -Dinstall_dir=$HARVESTER_DIR install

Configure the default Tomcat datasource (connection pooling) by adding the following entry to the Host element in server.xml:

Code:
<Context path="/harvester" docBase="harvester" crossContext="true" reloadable="true" debug="1">
   <Resource name="jdbc/mysql" auth="Container" type="javax.sql.DataSource"
             driverClassName="com.mysql.jdbc.Driver"
             url="jdbc:mysql://db_host:3306/dbs_harvester"
             username="db_username" password="db_password"
             validationQuery="SELECT 1" testOnBorrow="true"/>
   <Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="comma separated IP list"/>
</Context>

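Ensure the host and port number in the url attribute reflect your server setup, and that the username and password attributes match the harvester database user. The RemoteAddrValve rejects requests from any client address not listed in its allow attribute, so include every machine that will call the harvester (for example 127.0.0.1 for local testing).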
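
To check the database user before starting Tomcat, you can run the same validation query the datasource uses (SELECT 1) with the mysql client, ideally from the Tomcat host so the connection comes from the same address. This bash sketch is a hypothetical helper; the host, port and user are whatever you put in the Resource element.

```shell
#!/usr/bin/env bash
# Sketch: connect as the harvester's database user and run SELECT 1,
# mirroring the datasource's validationQuery. Prompts for the password.
check_harvester_db() {
  local host="$1" port="$2" user="$3" db="${4:-dbs_harvester}"
  # A bare -p makes the mysql client prompt for the password
  mysql -h "$host" -P "$port" -u "$user" -p "$db" -e 'SELECT 1' >/dev/null \
    && echo "connected to $db on $host:$port as $user"
}
```

For example, `check_harvester_db db_host 3306 db_username` using the same placeholder values as the Context entry.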

Download MySQL Connector/J (http://dev.mysql.com/downloads/connector/j/) and copy its jar file to the Tomcat lib directory (common/lib in Tomcat 5.x), e.g. /usr/share/java/tomcat6/mysql-connector-java-5.1.22-bin.jar.
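
A quick way to confirm the driver is where Tomcat will load it is a glob check like this bash sketch. The helper name is hypothetical, and the mysql-connector-java-*.jar pattern is an assumption based on the example file name above; other packagings may name the jar differently.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a MySQL Connector/J jar exists in the given lib directory.
has_mysql_driver() {
  local lib="$1" jar
  for jar in "$lib"/mysql-connector-java-*.jar; do
    # With no match the glob stays literal, so -f fails and the loop ends
    if [ -f "$jar" ]; then
      echo "found $jar"
      return 0
    fi
  done
  echo "no mysql-connector-java jar in $lib" >&2
  return 1
}
```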
          
Deploy harvester.war to the Tomcat webapps directory.

Testing the Harvester
        
NB: The harvester is a web application intended to be embedded in or integrated with other applications; as such, it provides neither a security framework nor a user interface.
                
Once it is installed, check that the harvester is running by requesting http://localhost:8080/harvester/getHarvestStatus. Called without parameters, this should return an error response in XML form.
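
For a scripted version of this smoke test, the bash sketch below accepts a response that starts with "<" and is not an HTML page (Tomcat's own error pages, such as a 404 when the webapp is missing, are HTML). It is a rough heuristic with a hypothetical name; it does not validate the XML or check its contents.

```shell
#!/usr/bin/env bash
# Sketch: rough check that stdin looks like an XML response rather than
# an HTML error page or plain text. Not a validator.
looks_like_xml() {
  local body
  body=$(cat)
  # Container error pages are HTML; treat them as failures
  case "$body" in
    *"<html"*|*"<HTML"*) return 1 ;;
  esac
  # Strip leading whitespace, then require the first character to be "<"
  body="${body#"${body%%[![:space:]]*}"}"
  [ "${body:0:1}" = "<" ]
}
```

For example, `curl -s http://localhost:8080/harvester/getHarvestStatus | looks_like_xml && echo "harvester responded with XML"`.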
        
For a more thorough test, call the requestHarvest service with the base URL of a data provider, and set up a test script to receive the output. For example, if you save the following code as test.php and deploy it on a web server such as Apache, it will append the harvester's output to a file each time a harvest runs (change the file path to suit your system; the web server user needs write access to it).
Code:
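<?php
// Receives the harvester's POST callbacks and appends them to a log file.
// Change this path to a file the web server user can write to.
$logfile = "/usr/local/harvest/test.txt";

// Fields posted by the harvester; default to "" if a field is absent
$harvestid = isset($_POST['harvestid']) ? $_POST['harvestid'] : '';
$content   = isset($_POST['content'])   ? $_POST['content']   : '';
$done      = isset($_POST['done'])      ? $_POST['done']      : '';
$nextrun   = isset($_POST['date'])      ? $_POST['date']      : '';

$str = "content=" . $content . "\nharvest=" . $harvestid
     . "\ndate=" . $nextrun . "\ndone=" . $done . "\n";

// Append with an exclusive lock so overlapping callbacks don't interleave
if (file_put_contents($logfile, $str, FILE_APPEND | LOCK_EX) === false) {
    header('HTTP/1.1 500 Internal Server Error');
}
?>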
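
Before pointing the harvester at test.php, you can check the receiver on its own by posting the four fields it reads. In this bash sketch the helper name and all field values are made up; the real harvester's value formats (especially for done and date) may differ.

```shell
#!/usr/bin/env bash
# Sketch: POST the fields test.php reads (harvestid, content, done, date).
# All values are hypothetical; -f makes curl fail on HTTP error statuses.
simulate_harvester_post() {
  local target_url="$1"
  curl -fsS \
    --data-urlencode "harvestid=test" \
    --data-urlencode "content=<record>sample</record>" \
    --data-urlencode "done=TRUE" \
    --data-urlencode "date=" \
    "$target_url"
}
```

After `simulate_harvester_post http://my-apache.edu/test.php`, new lines should appear in the log file the script writes to.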
    
Then request a harvest from a browser, for example:
Code:
http://my-tomcat.edu/harvester/requestHarvest?responsetargeturl=http://my-apache.edu/test.php&harvestid=test&sourceurl=http://any-data-provider.edu/oai-pmh&method=PMH
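
The request above passes URLs as query parameters without encoding. That works for simple values, but if responsetargeturl or sourceurl carries its own query string (a ? or &), it should be percent-encoded, otherwise the harvester will see it split into separate parameters. The bash sketch below builds the request that way; urlencode and harvest_request_url are hypothetical helpers, not part of the harvester, and the encoder is only exercised here with ASCII input.

```shell
#!/usr/bin/env bash
# Sketch: percent-encode every byte outside the RFC 3986 unreserved set.
urlencode() {
  local LC_ALL=C   # make bash index the string byte by byte
  local s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;
    esac
  done
  printf '%s' "$out"
}

# Parameter names follow the example request; base is the harvester's
# context URL, e.g. http://my-tomcat.edu/harvester
harvest_request_url() {
  local base="$1" target="$2" id="$3" source="$4" method="$5"
  printf '%s/requestHarvest?responsetargeturl=%s&harvestid=%s&sourceurl=%s&method=%s\n' \
    "$base" "$(urlencode "$target")" "$(urlencode "$id")" \
    "$(urlencode "$source")" "$(urlencode "$method")"
}
```

For example, `curl "$(harvest_request_url http://my-tomcat.edu/harvester http://my-apache.edu/test.php test http://any-data-provider.edu/oai-pmh PMH)"` sends the same request as the browser URL above, with the two URLs encoded.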
     
Refer to the javadocs for more information on the available services.

Javadocs
The javadocs can be built by running ant javadoc in $HARVESTER_SOURCE.

All times are UTC + 10 hours [ DST ]