Regain 2.1.0-STABLE API

net.sf.regain.crawler.plugin
Class CrawlerPluginManager

java.lang.Object
  extended by net.sf.regain.crawler.plugin.CrawlerPluginManager

public class CrawlerPluginManager
extends Object

Guarantees:
- If one plugin throws an exception, the other plugins are still executed.
- Every argument of a plugin call is non-null.

Singleton pattern: get the only instance by calling getInstance().

Author:
Benjamin
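The two guarantees above can be illustrated with a small, self-contained sketch. It does not use Regain's classes: `Listener`, `FaultTolerantDispatcher` and `dispatch` are hypothetical names. The sketch only shows the pattern: reject null arguments up front, then call every listener in turn, catching each listener's `RuntimeException` so that the remaining listeners still run. Which exception types Regain actually catches is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a plugin interface with a single event method.
interface Listener {
    void onEvent(String url);
}

// Hypothetical dispatcher illustrating both guarantees; not Regain's code.
class FaultTolerantDispatcher {
    private final List<Listener> listeners = new ArrayList<>();

    void register(Listener listener) {
        listeners.add(listener);
    }

    /** Returns the number of listeners that completed without throwing. */
    int dispatch(String url) {
        // Guarantee 2: no listener ever receives a null argument.
        if (url == null) {
            throw new IllegalArgumentException("Event argument must not be null");
        }
        int succeeded = 0;
        for (Listener listener : listeners) {
            try {
                listener.onEvent(url);
                succeeded++;
            } catch (RuntimeException e) {
                // Guarantee 1: one failing listener does not stop the others.
                System.err.println("Listener failed: " + e);
            }
        }
        return succeeded;
    }
}

class DispatcherDemo {
    public static void main(String[] args) {
        FaultTolerantDispatcher d = new FaultTolerantDispatcher();
        List<String> seen = new ArrayList<>();
        d.register(url -> { throw new IllegalStateException("broken listener"); });
        d.register(seen::add);
        // The first listener fails, the second one still runs.
        System.out.println(d.dispatch("http://example.com/") + " " + seen);
    }
}
```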

Field Summary
private  int insertIndex
          Counter incremented for every inserted plugin.
private static CrawlerPluginManager instance
          The single Manager Instance.
private static int MAX_PLUGINS
          Guessed maximum number of plugins.
private static org.apache.log4j.Logger mLog
          Logger instance
private  int nextOrder
          Records the next "order" value, so that a newly registered plugin is inserted at the end of the queue.
private  SortedMap<Integer,CrawlerPlugin> plugins
          Registered plugins, in call order. (Dev note: a PriorityQueue did not work out, because its iterator is not ordered; only poll() is.)
 
Constructor Summary
protected CrawlerPluginManager()
           
 
Method Summary
private  String argTypesToString(Class<?>[] argTypes)
          Convert argument types into a string representation
private  void checkArgsNotNull(Object[] args)
          Check if the array does not contain any null value.
protected  void checkIfEventExists(String methodName, Class<?>[] argTypes)
          Check whether an event with the given name exists in the CrawlerPlugin interface
 void clear()
          Unregister all Plugins
 void eventAcceptURL(String url, CrawlerJob job)
          Trigger Event: onAcceptURL
 void eventAfterPrepare(RawDocument document, WriteablePreparator preparator)
          Trigger Event: onAfterPrepare
 boolean eventAskDynamicBlacklist(String url, String sourceUrl, String sourceLinkText)
          Trigger Event: checkDynamicBlacklist (This is not lazy: all plugins are called even if the first returns true.)
 void eventBeforePrepare(RawDocument document, WriteablePreparator preparator)
          Trigger Event: onBeforePrepare
 void eventCreateIndexEntry(org.apache.lucene.document.Document doc, org.apache.lucene.index.IndexWriter index)
          Trigger Event: onCreateIndexEntry
 void eventDeclineURL(String url)
          Trigger Event: onDeclineURL
 void eventDeleteIndexEntry(org.apache.lucene.document.Document doc, org.apache.lucene.index.IndexReader index)
          Trigger Event: onDeleteIndexEntry
 void eventFinishCrawling(Crawler crawler)
          Trigger Event: onFinishCrawling
 void eventStartCrawling(Crawler crawler)
          Trigger Event: onStartCrawling
static CrawlerPluginManager getInstance()
          Use instead of the constructor: get the singleton instance of the manager, so that only one manager exists at a time.
 void registerPlugin(CrawlerPlugin plugin)
          Register a Plugin at the end of the current queue.
 void registerPlugin(CrawlerPlugin plugin, int order)
          Register a Plugin at a certain position
 String toString()
          Lists contained plugins for debugging purposes
protected  List<Object> triggerEvent(String methodName, Class<?>[] argTypes, Object... args)
          Trigger an event: call the corresponding plugins.
 void unregisterPlugin(CrawlerPlugin plugin)
          Unregister an already registered plugin.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

MAX_PLUGINS

private static final int MAX_PLUGINS
Guessed maximum number of plugins. Note that this is not a hard limit: more plugins can be registered, but beyond this number, plugins registered with the same "order" may no longer be inserted at the end.

See Also:
Constant Field Values
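The note above suggests that the map key combines the requested order with a running insert counter. One plausible scheme, assumed here and not confirmed by this page, is `key = order * MAX_PLUGINS + insertIndex`. The sketch shows why plugins with equal orders keep their registration order, and why the guessed maximum is "not a hard limit". `OrderKeys` is a hypothetical class, and the value 100 for `MAX_PLUGINS` is also an assumption; the real value is listed under Constant Field Values.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical key scheme; Regain's actual formula may differ.
class OrderKeys {
    static final int MAX_PLUGINS = 100; // assumed value, for illustration only

    private final SortedMap<Integer, String> plugins = new TreeMap<>();
    private int insertIndex = 0;

    void register(String name, int order) {
        // Plugins with the same order are separated by the insert counter,
        // so they stay in registration order while insertIndex < MAX_PLUGINS.
        int key = order * MAX_PLUGINS + insertIndex;
        insertIndex++;
        plugins.put(key, name);
    }

    String callOrder() {
        // TreeMap iterates its values in ascending key order.
        return String.join(",", plugins.values());
    }

    public static void main(String[] args) {
        OrderKeys m = new OrderKeys();
        m.register("late", 2);   // key 200
        m.register("firstA", 1); // key 101
        m.register("firstB", 1); // key 102
        System.out.println(m.callOrder()); // firstA,firstB,late
    }
}
```

Once `insertIndex` reaches `MAX_PLUGINS`, keys of adjacent orders start to overlap (in this sketch they can even collide and replace each other), which is the kind of misordering the note warns about.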

plugins

private SortedMap<Integer,CrawlerPlugin> plugins
Registered plugins, in call order. (Dev note: a PriorityQueue did not work out, because its iterator is not ordered; only poll() is.)
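The dev note refers to standard java.util behaviour: the iterator of a `PriorityQueue` makes no ordering guarantee (only `poll()` returns elements in priority order), whereas iterating a `TreeMap` visits its keys in ascending order. `OrderingDemo` is a hypothetical class that demonstrates the difference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.SortedMap;
import java.util.TreeMap;

class OrderingDemo {
    static List<Integer> priorityQueueIteration(int... values) {
        PriorityQueue<Integer> queue = new PriorityQueue<>();
        for (int v : values) queue.add(v);
        // Iteration follows the internal heap layout, not sorted order.
        return new ArrayList<>(queue);
    }

    static List<Integer> priorityQueuePolling(int... values) {
        PriorityQueue<Integer> queue = new PriorityQueue<>();
        for (int v : values) queue.add(v);
        List<Integer> out = new ArrayList<>();
        // poll() always removes the smallest remaining element.
        while (!queue.isEmpty()) out.add(queue.poll());
        return out;
    }

    static List<String> sortedMapIteration() {
        SortedMap<Integer, String> plugins = new TreeMap<>();
        plugins.put(3, "c");
        plugins.put(1, "a");
        plugins.put(2, "b");
        // values() is returned in ascending key order.
        return new ArrayList<>(plugins.values());
    }

    public static void main(String[] args) {
        System.out.println(priorityQueueIteration(3, 1, 2)); // heap order, unspecified
        System.out.println(priorityQueuePolling(3, 1, 2));   // [1, 2, 3]
        System.out.println(sortedMapIteration());            // [a, b, c]
    }
}
```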


instance

private static CrawlerPluginManager instance
The single Manager Instance.


mLog

private static org.apache.log4j.Logger mLog
Logger instance


nextOrder

private int nextOrder
Records the next "order" value, so that a newly registered plugin is inserted at the end of the queue.


insertIndex

private int insertIndex
Counter incremented for every inserted plugin.

Constructor Detail

CrawlerPluginManager

protected CrawlerPluginManager()
Method Detail

getInstance

public static CrawlerPluginManager getInstance()
Use instead of the constructor: get the singleton instance of the manager, so that only one manager exists at a time.

Returns:
The Plugin Manager
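A minimal sketch of the singleton pattern described here, assuming lazy initialization guarded by `synchronized`; this page does not say whether Regain's getInstance() is lazy or thread-safe. `Manager` is a hypothetical stand-in. The protected constructor matches the Constructor Detail above: it keeps unrelated classes in other packages from calling `new` directly, although subclasses and classes in the same package still can.

```java
// Hypothetical stand-in for a singleton manager; not Regain's source.
class Manager {
    private static Manager instance;

    // Protected, as in CrawlerPluginManager: callers outside the package
    // (other than subclasses) must go through getInstance().
    protected Manager() {
    }

    // synchronized so that two threads cannot both create an instance.
    static synchronized Manager getInstance() {
        if (instance == null) {
            instance = new Manager();
        }
        return instance;
    }

    public static void main(String[] args) {
        System.out.println(getInstance() == getInstance()); // true
    }
}
```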

registerPlugin

public void registerPlugin(CrawlerPlugin plugin)
Register a Plugin at the end of the current queue.

Parameters:
plugin - Plugin to register

registerPlugin

public void registerPlugin(CrawlerPlugin plugin,
                           int order)
Register a Plugin at a certain position

Parameters:
plugin - Plugin to register
order - Position at which to insert the plugin (the lower the order, the earlier the plugin is called relative to other plugins)
Throws:
NullPointerException - if plugin is null

unregisterPlugin

public void unregisterPlugin(CrawlerPlugin plugin)
Unregister an already registered plugin. Note: if you plan to unregister a plugin later on, you need to keep a reference to the plugin instance you registered. Alternatively, override your plugin's equals() method so that it returns true whenever the class names are the same.

Parameters:
plugin - Plugin to unregister
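If the original instance is not kept, the note suggests overriding equals() so that any two instances of the same plugin class compare equal. The sketch below uses a hypothetical `MyPlugin` class and a plain `List` in place of the manager; that unregisterPlugin relies on equals() rather than reference comparison is inferred from the note, not confirmed here. hashCode() is overridden consistently, as the `Object` contract requires, so hash-based collections treat equal instances alike.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plugin whose instances are equal whenever their class names match.
class MyPlugin {
    @Override
    public boolean equals(Object other) {
        // Compare by class name only, ignoring any instance state.
        return other != null && other.getClass().getName().equals(getClass().getName());
    }

    @Override
    public int hashCode() {
        // Must agree with equals(): equal objects need equal hash codes.
        return getClass().getName().hashCode();
    }

    public static void main(String[] args) {
        List<Object> registered = new ArrayList<>();
        registered.add(new MyPlugin());
        // A fresh instance removes the registered one, because equals() matches.
        boolean removed = registered.remove(new MyPlugin());
        System.out.println(removed + " " + registered.isEmpty()); // true true
    }
}
```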

clear

public void clear()
Unregister all Plugins


triggerEvent

protected List<Object> triggerEvent(String methodName,
                                    Class<?>[] argTypes,
                                    Object... args)
Trigger an event: call the corresponding method on every registered plugin and collect the return values. (This is done via the Reflection API to avoid code duplication.) TODO: Profiling?

Parameters:
methodName - Name of the event (as in the interface: onEvent)
argTypes - Types of the arguments
args - Arguments of the event (as in the interface)
Returns:
Return values of the called methods; an entry is null if the corresponding method threw an exception.
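Reflection-based dispatch of this kind can be sketched as follows. `Greeter`, `EventDispatcher` and the event name `onGreet` are hypothetical, and this is not Regain's implementation. The sketch resolves the event method once with `Class.getMethod(name, parameterTypes)`, invokes it on each registered object with `Method.invoke`, and stores `null` for a plugin whose method threw, which matches the Returns description above.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical event interface.
interface Greeter {
    String onGreet(String name);
}

class EventDispatcher {
    private final List<Greeter> plugins = new ArrayList<>();

    void register(Greeter plugin) {
        plugins.add(plugin);
    }

    List<Object> triggerEvent(String methodName, Class<?>[] argTypes, Object... args) {
        Method method;
        try {
            // Resolve the event method once, on the interface.
            method = Greeter.class.getMethod(methodName, argTypes);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such event: " + methodName, e);
        }
        List<Object> results = new ArrayList<>();
        for (Greeter plugin : plugins) {
            try {
                results.add(method.invoke(plugin, args));
            } catch (InvocationTargetException | IllegalAccessException e) {
                // The plugin (or the reflective call) failed: record null, carry on.
                results.add(null);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        EventDispatcher d = new EventDispatcher();
        d.register(name -> "Hello, " + name);
        d.register(name -> { throw new IllegalStateException("broken plugin"); });
        List<Object> results = d.triggerEvent("onGreet", new Class<?>[] {String.class}, "Regain");
        System.out.println(results); // [Hello, Regain, null]
    }
}
```

A plugin's own exception reaches the caller wrapped in an `InvocationTargetException`, which is why a single catch clause is enough to isolate each plugin.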

checkArgsNotNull

private void checkArgsNotNull(Object[] args)
Check if the array does not contain any null value.

Parameters:
args - Array of arguments
Throws:
IllegalArgumentException - if a null value is detected

checkIfEventExists

protected void checkIfEventExists(String methodName,
                                  Class<?>[] argTypes)
Check whether an event with the given name exists in the CrawlerPlugin interface

Parameters:
methodName - "on" + eventName
argTypes - Types of the arguments

argTypesToString

private String argTypesToString(Class<?>[] argTypes)
Convert argument types into a string representation

Parameters:
argTypes - Types of the arguments
Returns:
Stringified types (e.g. java.lang.String, java.lang.String)
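The following sketch shows how an event check and its error message could be produced, assuming checkIfEventExists relies on `Class.getMethod` and reports a missing event with an `IllegalArgumentException`; this page documents neither the mechanism nor the exception type. `Events` and `EventChecks` are hypothetical names. Here argTypesToString yields the comma-separated format shown in the Returns example above.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical event interface to check against.
interface Events {
    void onStartCrawling(String crawlerName);
}

class EventChecks {
    // Produces e.g. "java.lang.String, java.lang.String".
    static String argTypesToString(Class<?>[] argTypes) {
        return Arrays.stream(argTypes)
                .map(Class::getName)
                .collect(Collectors.joining(", "));
    }

    static void checkIfEventExists(String methodName, Class<?>[] argTypes) {
        try {
            Events.class.getMethod(methodName, argTypes);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("Event " + methodName + "("
                    + argTypesToString(argTypes) + ") does not exist", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(argTypesToString(new Class<?>[] {String.class, String.class}));
        checkIfEventExists("onStartCrawling", new Class<?>[] {String.class}); // passes
        try {
            checkIfEventExists("onStopCrawling", new Class<?>[] {String.class});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```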

eventStartCrawling

public void eventStartCrawling(Crawler crawler)
Trigger Event: onStartCrawling

Parameters:
crawler - Crawler instance (caller)
See Also:
CrawlerPlugin.onStartCrawling(Crawler)

eventFinishCrawling

public void eventFinishCrawling(Crawler crawler)
Trigger Event: onFinishCrawling

Parameters:
crawler - Crawler instance (caller)
See Also:
CrawlerPlugin.onFinishCrawling(Crawler)

eventBeforePrepare

public void eventBeforePrepare(RawDocument document,
                               WriteablePreparator preparator)
Trigger Event: onBeforePrepare

Parameters:
document - Document to prepare
preparator - Preparator that will prepare the document
See Also:
CrawlerPlugin.onBeforePrepare(RawDocument, WriteablePreparator)

eventAfterPrepare

public void eventAfterPrepare(RawDocument document,
                              WriteablePreparator preparator)
Trigger Event: onAfterPrepare

Parameters:
document - Document that was prepared
preparator - Preparator that prepared the document
See Also:
CrawlerPlugin.onAfterPrepare(RawDocument, WriteablePreparator)

eventCreateIndexEntry

public void eventCreateIndexEntry(org.apache.lucene.document.Document doc,
                                  org.apache.lucene.index.IndexWriter index)
Trigger Event: onCreateIndexEntry

Parameters:
doc - Document to add
index - Index where it will be added
See Also:
CrawlerPlugin.onCreateIndexEntry(Document, IndexWriter)

eventDeleteIndexEntry

public void eventDeleteIndexEntry(org.apache.lucene.document.Document doc,
                                  org.apache.lucene.index.IndexReader index)
Trigger Event: onDeleteIndexEntry

Parameters:
doc - Document to delete
index - Index from which it will be deleted
See Also:
CrawlerPlugin.onDeleteIndexEntry(Document, IndexReader)

eventAcceptURL

public void eventAcceptURL(String url,
                           CrawlerJob job)
Trigger Event: onAcceptURL

Parameters:
url - URL that was accepted
job - Resulting Job
See Also:
CrawlerPlugin.onAcceptURL(String, CrawlerJob)

eventDeclineURL

public void eventDeclineURL(String url)
Trigger Event: onDeclineURL

Parameters:
url - URL that was declined
See Also:
CrawlerPlugin.onDeclineURL(String)

eventAskDynamicBlacklist

public boolean eventAskDynamicBlacklist(String url,
                                        String sourceUrl,
                                        String sourceLinkText)
Trigger Event: checkDynamicBlacklist (This is not lazy: all plugins are called even if the first returns true.)

Parameters:
url - URL to check
sourceUrl - URL of the page on which the link was found
sourceLinkText - Text of the link
Returns:
True if blacklisted by at least one of the plugins.
See Also:
CrawlerPlugin.checkDynamicBlacklist(String, String, String)
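The difference between a lazy and a non-lazy check corresponds to Java's `||` versus `|`. The sketch below uses a hypothetical `BlacklistCheck` interface whose three parameters mirror checkDynamicBlacklist, and a hypothetical `askAll` helper. Because the non-short-circuit `|=` is used, every check is called (and could, for example, log the URL) even after an earlier one has returned true.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical functional interface mirroring checkDynamicBlacklist's parameters.
interface BlacklistCheck {
    boolean isBlacklisted(String url, String sourceUrl, String sourceLinkText);
}

class BlacklistDemo {
    static boolean askAll(List<BlacklistCheck> checks, String url, String sourceUrl, String linkText) {
        boolean blacklisted = false;
        for (BlacklistCheck check : checks) {
            // '|=' always evaluates the right-hand side, unlike '||',
            // so every check runs even once blacklisted is already true.
            blacklisted |= check.isBlacklisted(url, sourceUrl, linkText);
        }
        return blacklisted;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        List<BlacklistCheck> checks = new ArrayList<>();
        checks.add((u, s, t) -> { calls[0]++; return true; });
        checks.add((u, s, t) -> { calls[0]++; return false; });
        boolean result = askAll(checks, "http://example.com/a", "http://example.com/", "a");
        System.out.println(result + " after " + calls[0] + " calls"); // true after 2 calls
    }
}
```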

toString

public String toString()
Lists contained plugins for debugging purposes

Overrides:
toString in class Object
Returns:
Debugging output: contained plugins.


Regain 2.1.0-STABLE, Copyright (C) 2004-2010 Til Schneider, www.murfman.de, Thomas Tesche, www.clustersystems.info