[Guide] Harvester 
Author: ANDS Staff
ANDS-Harvester

The Harvester Service provides a service-oriented framework for processing and routing content and metadata from a source Data Provider to a target application. It can simplify the management of distributed harvests by providing a single harvest application that serves many clients, sparing each client the overhead of writing its own embedded harvester.

Harvester Service Installation Guide

Download the latest ANDS-Harvester and extract it on your server. The extracted directory is referred to below as $HARVESTER_SOURCE.
Code:
wget https://github.com/au-research/ANDS-Harvester/archive/master.zip

Pre-requisites  
The following software must be available in order to install and run the Harvester Service:       
                
    Sun's Java 1.5 SDK             
    MySQL              
    Apache Ant              
    Apache Tomcat 5.x or later
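
Before going further, it can help to confirm that the command-line prerequisites are on the PATH. The following is a minimal Python sketch, not part of the Harvester distribution. The executable names (javac, mysql, ant) are assumptions about a typical install, and Tomcat is left out because it usually runs as a service rather than a command.

```python
import shutil

# Executable names are assumptions; adjust them to match your system.
REQUIRED_TOOLS = {
    "Java SDK": "javac",
    "MySQL client": "mysql",
    "Apache Ant": "ant",
}

def missing_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Return the labels of tools whose executable is not found on the PATH."""
    return [label for label, exe in required.items() if which(exe) is None]

missing = missing_tools()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All command-line prerequisites found.")
```

The `which` parameter exists only so the lookup can be swapped out when trying the function without the tools installed.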

Harvester Service Install process     
  
Having installed a MySQL server, set up a new database and initialise its tables:

Create a new database called 'dbs_harvester':
Code:
mysql> CREATE DATABASE `dbs_harvester`;
           
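Create a database user and grant it access to the new database (or use an existing user that already has access and skip this step):
Code:
mysql> CREATE USER '<db_username>' IDENTIFIED BY '<db_password>';
mysql> GRANT ALL PRIVILEGES ON `dbs_harvester`.* TO '<db_username>';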
          
Create the tables and indices by loading the schema file. Run this from the shell, not from the mysql prompt:
Code:
mysql -u root -p dbs_harvester < $HARVESTER_SOURCE/etc/db/mysql/database.sql
    
Create a directory $HARVESTER_DIR where the Harvester log files will be written. It should not be web accessible, and the Tomcat user must have read/write access to it.

Ensure servlet-api.jar is on the CLASSPATH. It can be found in the common library directories of the Tomcat install.
            
Build the distribution:
Code:
cd $HARVESTER_SOURCE
ant -Dinstall_dir="$HARVESTER_DIR" install
            
Configure the default Tomcat datasource (connection pooling) by adding the following entry inside the Host element of the server.xml file:

Code:
<Context path="/harvester" docBase="harvester" crossContext="true" reloadable="true" debug="1">
   <Resource name="jdbc/mysql" validationQuery="SELECT 1" testOnBorrow="true" auth="Container" type="javax.sql.DataSource" driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://db_host:3306/db_name" username="db_username" password="db_password"/>
   <Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="comma separated IP list"/>
</Context>
  
               
Ensure the host, port number and database name in the url attribute reflect your server setup, and that the username and password match the harvester database user.
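
A misplaced or non-ASCII quote in this entry is easy to miss, so it may be worth checking the fragment before restarting Tomcat. The sketch below is an illustrative helper, not something shipped with the Harvester. It parses a Context fragment with Python's standard XML parser and reports whether the Resource element carries the four connection attributes. The function name and the list of required attributes are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Attributes this sketch expects on the <Resource> element.
REQUIRED = ("driverClassName", "url", "username", "password")

def datasource_settings(context_xml):
    """Parse a <Context> fragment and return the Resource connection attributes."""
    root = ET.fromstring(context_xml)  # raises ET.ParseError if not well-formed
    resource = root.find("Resource")
    if resource is None:
        raise ValueError("no <Resource> element inside <Context>")
    settings = {name: resource.get(name) for name in REQUIRED}
    missing = [name for name, value in settings.items() if value is None]
    if missing:
        raise ValueError("missing attributes: " + ", ".join(missing))
    return settings

SAMPLE = """
<Context path="/harvester" docBase="harvester">
   <Resource name="jdbc/mysql" type="javax.sql.DataSource"
             driverClassName="com.mysql.jdbc.Driver"
             url="jdbc:mysql://db_host:3306/db_name"
             username="db_username" password="db_password"/>
</Context>
"""
print(datasource_settings(SAMPLE))
```

A curly quote around an attribute value raises ET.ParseError, and a url value that swallows the username attribute because of a missing closing quote is reported as a missing attribute.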

Download the MySQL Connector/J JDBC driver (https://dev.mysql.com/downloads/connector/j/) and copy the jar file to the Tomcat lib directory, e.g. /usr/share/java/tomcat6/mysql-connector-java-5.1.22-bin.jar
          
Deploy harvester.war to the Tomcat webapps directory.

Testing the Harvester
        
NB: The harvester is a web application that can be embedded in or integrated with other applications; as such, it does not provide a security framework or user interface.
                
Once installed, check that the harvester app is running by accessing http://localhost:8080/harvester/getHarvestStatus. This should produce an error response in XML form.
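
The same check can be scripted. The sketch below is only an illustration in Python: it fetches the status URL given above and tests whether the body is well-formed XML. The helper names are made up for this example, and since the guide does not say which HTTP status code accompanies the error response, the body of an HTTP error is read as well.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

# URL taken from the guide; change host and port to match your Tomcat.
STATUS_URL = "http://localhost:8080/harvester/getHarvestStatus"

def is_well_formed_xml(data):
    """Return True if data (bytes or str) parses as XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

def fetch_body(url, timeout=10):
    """Return the raw response body, including the body of an HTTP error response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        return err.read()
```

With Tomcat running, `is_well_formed_xml(fetch_body(STATUS_URL))` is expected to return True. A connection error from `fetch_body` suggests Tomcat is not reachable on that host and port.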
        
For a more advanced test, call the requestHarvest service with the base URL of a data provider and the URL of a test script that will receive the output. For example, copy the following code to a file called test.php and deploy it on a web server such as Apache. When a harvest is run, the script appends the output from the harvester to a file (change the file path to reflect your system).
Code:
<?php
// Fields POSTed by the harvester to the response target URL.
$harvestid = $_POST['harvestid'];
$content = $_POST['content'];
$done = $_POST['done'];
$nextrun = $_POST['date'];

// Append one record per request to the output file.
$str = "content=".$content."\nharvest=".$harvestid."\ndate=".$nextrun."\ndone=".$done."\n";
$fh = fopen("/usr/local/harvest/test.txt", "a");
fwrite($fh, $str);
fclose($fh);
?>
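
To confirm that test.php is deployed and can write to its file before involving the harvester, you can send it a hand-made POST. The Python sketch below builds such a request with the four field names the script reads. The field values and the target URL are placeholders invented for this example; the guide does not describe what the harvester actually sends in them.

```python
from urllib.parse import urlencode, parse_qs
import urllib.request

def callback_request(target_url, harvestid, content, done, date):
    """Build a form-encoded POST carrying the four fields that test.php reads."""
    fields = {"harvestid": harvestid, "content": content, "done": done, "date": date}
    data = urlencode(fields).encode("ascii")
    return urllib.request.Request(target_url, data=data, method="POST")

# Hypothetical values, for illustration only.
req = callback_request("http://my-apache.edu/test.php", "test", "<record/>", "yes", "2013-10-25")
print(req.get_method(), req.full_url)
print(parse_qs(req.data.decode("ascii")))
# To actually send it: urllib.request.urlopen(req)
```

After sending the request, a new record should appear at the end of the file named in test.php.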
    
Then issue a harvest request, for example from a browser:
Code:
http://my-tomcat.edu/harvester/requestHarvest?responsetargeturl=http://my-apache.edu/test.php&harvestid=test&sourceurl=http://any-data-provider.edu/oai-pmh&method=PMH
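
Because two of the parameters are themselves URLs, it is safer to percent-encode the values when the request is built by a script rather than typed into a browser. The Python sketch below assembles the same request from its four parameters. The function name is made up for this example, and it assumes the harvester decodes query parameters in the usual way for a servlet, which the guide does not state.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def harvest_request_url(harvester_base, response_target_url, harvest_id, source_url, method="PMH"):
    """Return a requestHarvest URL with all query values percent-encoded."""
    query = urlencode({
        "responsetargeturl": response_target_url,
        "harvestid": harvest_id,
        "sourceurl": source_url,
        "method": method,
    })
    return harvester_base.rstrip("/") + "/requestHarvest?" + query

url = harvest_request_url(
    "http://my-tomcat.edu/harvester",
    "http://my-apache.edu/test.php",
    "test",
    "http://any-data-provider.edu/oai-pmh",
)
print(url)
# Decoding the query again recovers the original values.
print(parse_qs(urlsplit(url).query))
```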
     
Refer to the javadocs for more information on the services available.

Javadocs      
The javadocs can be built by running ant javadoc


Fri Oct 25, 2013 9:17 am