Regain 2.1.0-STABLE API
java.lang.Object
  net.sf.regain.crawler.plugin.AbstractCrawlerPlugin
public abstract class AbstractCrawlerPlugin
Abstract crawler plugin. Contains an empty stub method for each event.
See Also: CrawlerPlugin

| Constructor Summary | |
|---|---|
| AbstractCrawlerPlugin() | |

| Method Summary | |
|---|---|
| boolean | checkDynamicBlacklist(String url, String sourceUrl, String sourceLinkText) - Allows specific URLs to be blacklisted. |
| void | init(PreparatorConfig config) - Initializes the preparator or plugin. |
| void | onAcceptURL(String url, CrawlerJob job) - Called during the crawling process when a new URL is added to the processing queue. |
| void | onAfterPrepare(RawDocument document, WriteablePreparator preparator) - Called after a document has been prepared for addition to the index. |
| void | onBeforePrepare(RawDocument document, WriteablePreparator preparator) - Called before a document is prepared for addition to the index. |
| void | onCreateIndexEntry(org.apache.lucene.document.Document doc, org.apache.lucene.index.IndexWriter index) - Called when a document is added to the index. |
| void | onDeclineURL(String url) - Called during the crawling process when a new URL is declined and not added to the processing queue. |
| void | onDeleteIndexEntry(org.apache.lucene.document.Document doc, org.apache.lucene.index.IndexReader index) - Called when a document is deleted from the index. |
| void | onFinishCrawling(Crawler crawler) - Called after the crawling process has finished or was aborted (because of an exception). |
| void | onStartCrawling(Crawler crawler) - Called before the crawling process starts (Crawler::run()). |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Constructor Detail |
|---|
public AbstractCrawlerPlugin()
| Method Detail |
|---|
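public void onStartCrawling(Crawler crawler)
Description copied from interface: CrawlerPlugin
Called before the crawling process starts (Crawler::run()).
Specified by: onStartCrawling in interface CrawlerPlugin
Parameters:
crawler - The crawler instance that is about to begin crawling
public void onFinishCrawling(Crawler crawler)
Description copied from interface: CrawlerPlugin
Called after the crawling process has finished or was aborted (because of an exception).
Specified by: onFinishCrawling in interface CrawlerPlugin
Parameters:
crawler - The crawler instance that is about to finish crawling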
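Because every event has an empty stub, a subclass only needs to override the events it cares about. The following is a minimal sketch of that idea, using the start and finish events to time a crawl. To stay self-contained it declares simplified stand-ins named `Crawler` and `AbstractCrawlerPlugin`; these are not the real Regain classes, and `CrawlTimerPlugin` and `getElapsedNanos` are hypothetical names. A real plugin would presumably extend `net.sf.regain.crawler.plugin.AbstractCrawlerPlugin` instead.

```java
// Sketch only: Crawler and AbstractCrawlerPlugin are simplified stand-ins
// so the example compiles without Regain on the classpath.
class CrawlTimerSketch {

    /** Hypothetical stand-in for Regain's Crawler class. */
    static class Crawler { }

    /** Stand-in that mirrors two of the empty stubs of the real class. */
    static abstract class AbstractCrawlerPlugin {
        public void onStartCrawling(Crawler crawler) { }
        public void onFinishCrawling(Crawler crawler) { }
    }

    /** Hypothetical plugin that records how long a crawl took. */
    static class CrawlTimerPlugin extends AbstractCrawlerPlugin {
        private boolean started = false;
        private long startNanos;
        private long elapsedNanos = -1;

        @Override
        public void onStartCrawling(Crawler crawler) {
            started = true;
            startNanos = System.nanoTime();
        }

        @Override
        public void onFinishCrawling(Crawler crawler) {
            // The finish event is documented to fire after an aborted crawl too,
            // so only compute a duration if a start was actually recorded.
            if (started) {
                elapsedNanos = System.nanoTime() - startNanos;
                started = false;
            }
        }

        /** Returns the last measured duration in nanoseconds, or -1 if none. */
        public long getElapsedNanos() {
            return elapsedNanos;
        }
    }

    public static void main(String[] args) {
        CrawlTimerPlugin plugin = new CrawlTimerPlugin();
        Crawler crawler = new Crawler();
        plugin.onStartCrawling(crawler);
        plugin.onFinishCrawling(crawler);
        System.out.println("Crawl took " + plugin.getElapsedNanos() + " ns");
    }
}
```

All other events keep their inherited no-op behaviour, which is the point of deriving from the abstract class rather than implementing the interface directly.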
public boolean checkDynamicBlacklist(String url, String sourceUrl, String sourceLinkText)
Description copied from interface: CrawlerPlugin
Allows specific URLs to be blacklisted.
Specified by: checkDynamicBlacklist in interface CrawlerPlugin
Parameters:
url - URL of the crawling job that would normally be added.
sourceUrl - The URL where the URL above was found (a-tag, PDF or similar).
sourceLinkText - The label of the URL in the document where the URL above was found.
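As an illustration of how the three parameters might be used, here is a hedged sketch of a pattern-based blacklist check. It assumes that returning `true` means "blacklist this URL" and `false` means "let it through"; this page does not state the return semantics, so verify them against the `CrawlerPlugin` documentation. `BlacklistCheck` is a stand-in declaring only this one method, and `PatternBlacklistPlugin`, `examplePlugin` and the patterns are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class DynamicBlacklistSketch {

    /** Stand-in for the single CrawlerPlugin method used in this sketch. */
    interface BlacklistCheck {
        boolean checkDynamicBlacklist(String url, String sourceUrl, String sourceLinkText);
    }

    /** Hypothetical plugin that rejects URLs by regex or by link label. */
    static class PatternBlacklistPlugin implements BlacklistCheck {
        private final List<Pattern> blockedUrlPatterns;

        PatternBlacklistPlugin(List<Pattern> blockedUrlPatterns) {
            this.blockedUrlPatterns = new ArrayList<>(blockedUrlPatterns);
        }

        @Override
        public boolean checkDynamicBlacklist(String url, String sourceUrl, String sourceLinkText) {
            // ASSUMPTION: true = blacklist the URL, false = allow it.
            for (Pattern pattern : blockedUrlPatterns) {
                if (pattern.matcher(url).find()) {
                    return true;
                }
            }
            // The link label can be used as well, e.g. to avoid following logout links.
            return sourceLinkText != null
                && sourceLinkText.trim().equalsIgnoreCase("logout");
        }
    }

    /** Builds a plugin with two illustrative patterns. */
    static PatternBlacklistPlugin examplePlugin() {
        return new PatternBlacklistPlugin(Arrays.asList(
            Pattern.compile("[?&]sessionid="),
            Pattern.compile("/calendar/\\d{4}/")));
    }

    public static void main(String[] args) {
        PatternBlacklistPlugin plugin = examplePlugin();
        System.out.println(plugin.checkDynamicBlacklist(
            "http://example.org/page?sessionid=42", "http://example.org/", "Page"));
        System.out.println(plugin.checkDynamicBlacklist(
            "http://example.org/docs/intro.html", "http://example.org/", "Intro"));
    }
}
```

The sketch ignores `sourceUrl`; a real plugin could use it to apply different rules depending on where a link was discovered.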
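public void onAcceptURL(String url, CrawlerJob job)
Description copied from interface: CrawlerPlugin
Called during the crawling process when a new URL is added to the processing queue.
Specified by: onAcceptURL in interface CrawlerPlugin
Parameters:
url - URL that was just accepted
job - CrawlerJob that was created as a consequence
public void onDeclineURL(String url)
Description copied from interface: CrawlerPlugin
Called during the crawling process when a new URL is declined and not added to the processing queue.
Specified by: onDeclineURL in interface CrawlerPlugin
Parameters:
url - URL that was just declined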
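The accept and decline events lend themselves to simple crawl diagnostics. The sketch below counts accepted URLs and remembers declined ones. `CrawlerJob` and `AbstractCrawlerPlugin` are again simplified stand-ins rather than the real Regain classes, and `UrlDecisionLogPlugin` with its getters is a hypothetical name. The sketch also assumes the events arrive on a single thread; nothing on this page confirms that, so add synchronization if the crawler turns out to be multi-threaded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class UrlDecisionLogSketch {

    /** Hypothetical stand-in for Regain's CrawlerJob class. */
    static class CrawlerJob { }

    /** Stand-in that mirrors two of the empty stubs of the real class. */
    static abstract class AbstractCrawlerPlugin {
        public void onAcceptURL(String url, CrawlerJob job) { }
        public void onDeclineURL(String url) { }
    }

    /** Hypothetical plugin that tracks which URLs were accepted or declined. */
    static class UrlDecisionLogPlugin extends AbstractCrawlerPlugin {
        private int acceptedCount = 0;
        private final List<String> declinedUrls = new ArrayList<>();

        @Override
        public void onAcceptURL(String url, CrawlerJob job) {
            // Only the count is kept; accepted URLs end up in the index anyway.
            acceptedCount++;
        }

        @Override
        public void onDeclineURL(String url) {
            // ASSUMPTION: events are delivered on one thread, so no locking here.
            declinedUrls.add(url);
        }

        public int getAcceptedCount() {
            return acceptedCount;
        }

        /** Returns a read-only view of the declined URLs in arrival order. */
        public List<String> getDeclinedUrls() {
            return Collections.unmodifiableList(declinedUrls);
        }
    }

    public static void main(String[] args) {
        UrlDecisionLogPlugin plugin = new UrlDecisionLogPlugin();
        plugin.onAcceptURL("http://example.org/a", new CrawlerJob());
        plugin.onDeclineURL("http://example.org/private");
        System.out.println("accepted=" + plugin.getAcceptedCount()
            + ", declined=" + plugin.getDeclinedUrls());
    }
}
```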
public void onCreateIndexEntry(org.apache.lucene.document.Document doc, org.apache.lucene.index.IndexWriter index)
Description copied from interface: CrawlerPlugin
Called when a document is added to the index.
Specified by: onCreateIndexEntry in interface CrawlerPlugin
Parameters:
doc - Document to write
index - Lucene IndexWriter
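public void onDeleteIndexEntry(org.apache.lucene.document.Document doc, org.apache.lucene.index.IndexReader index)
Description copied from interface: CrawlerPlugin
Called when a document is deleted from the index.
Specified by: onDeleteIndexEntry in interface CrawlerPlugin
Parameters:
doc - Document to read
index - Lucene IndexReader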
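public void onBeforePrepare(RawDocument document, WriteablePreparator preparator)
Description copied from interface: CrawlerPlugin
Called before a document is prepared for addition to the index.
Specified by: onBeforePrepare in interface CrawlerPlugin
Parameters:
document - Regain document that will be analysed
preparator - Preparator that was chosen to analyse this document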
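public void onAfterPrepare(RawDocument document, WriteablePreparator preparator)
Description copied from interface: CrawlerPlugin
Called after a document has been prepared for addition to the index.
Specified by: onAfterPrepare in interface CrawlerPlugin
Parameters:
document - Regain document that was analysed
preparator - Preparator that has analysed this document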
public void init(PreparatorConfig config) throws RegainException
Description copied from interface: Pluggable
Initializes the preparator or plugin.
Specified by: init in interface Pluggable
Parameters:
config - The configuration for this preparator or plugin.
Throws:
RegainException - When the regular expression or the configuration has an error.