Regain 1.7.7-STABLE API

net.sf.regain.crawler.config
Class AuxiliaryField

java.lang.Object
  extended by net.sf.regain.crawler.config.AuxiliaryField

public class AuxiliaryField
extends Object

An auxiliary field is an additional field put into the index.

Example: If you have a directory with a subdirectory for every project, you can create a field holding the project's name.

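The following rule creates a field "project" with the value "otto23" from the URL "file://c:/projects/otto23/docs/Spez.doc", because group 1 of the regex captures the directory name after "projects/":

new AuxiliaryField("project", "^file://c:/projects/([^/]*)", 1)

This call is shorthand: the constructor documented below also takes the value, toLowerCase, store, index and tokenize arguments, and it expects a compiled org.apache.regexp.RE rather than a pattern string.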

URLs that don't match the regex get no "project" field.

Having done this, you can search for "Offer project:otto23" and get only hits from this project directory.
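The extraction performed by the rule above can be tried out with the following sketch. It swaps in the JDK's java.util.regex for org.apache.regexp.RE so that it runs without the Jakarta Regexp library; the class ProjectFieldDemo and its method extractProject are hypothetical names. Group 1 yields the field value, and a URL that does not match yields no value at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class. java.util.regex is used here in place of
// org.apache.regexp.RE so the snippet needs only the JDK.
final class ProjectFieldDemo {

    // Same pattern as in the rule above; group 1 captures the project directory name.
    private static final Pattern PROJECT_REGEX =
            Pattern.compile("^file://c:/projects/([^/]*)");

    /** Returns the value for the "project" field, or null if the URL does not match. */
    static String extractProject(String url) {
        Matcher m = PROJECT_REGEX.matcher(url);
        // find() suffices because the leading ^ anchors the match to the start of the URL;
        // the pattern only needs to match a prefix, not the whole URL.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractProject("file://c:/projects/otto23/docs/Spez.doc")); // otto23
        System.out.println(extractProject("file://c:/other/readme.txt"));              // null
    }
}
```

Group 0 would be the whole matched prefix, "file://c:/projects/otto23", which is why the rule selects group 1.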

Author:
Tilman Schneider, STZ-IDA an der FH Karlsruhe

Field Summary
private  String mFieldName
          The name of the auxiliary field.
private  boolean mIndex
          Specifies whether the field value should be indexed.
private  boolean mStore
          Specifies whether the field value should be stored in the index.
private  boolean mTokenize
          Specifies whether the field value should be tokenized.
private  boolean mToLowerCase
          Specifies whether the (extracted) value should be converted to lower case.
private  org.apache.regexp.RE mUrlRegex
          The regex that extracts the value of the field.
private  int mUrlRegexGroup
          The group of the regex that contains the value.
private  String mValue
          The value of the auxiliary field.
 
Constructor Summary
AuxiliaryField(String fieldName, String value, boolean toLowerCase, org.apache.regexp.RE urlRegex, int urlRegexGroup, boolean store, boolean index, boolean tokenize)
          Creates a new instance of AuxiliaryField.
 
Method Summary
 String getFieldName()
          Gets the name of the auxiliary field.
 boolean getToLowerCase()
          Returns whether the (extracted) value should be converted to lower case.
 org.apache.regexp.RE getUrlRegex()
          Gets the regex that extracts the value of the field.
 int getUrlRegexGroup()
          Gets the group of the regex that contains the value.
 String getValue()
          Returns the value of the auxiliary field.
 boolean isIndexed()
          Returns whether the field value should be indexed.
 boolean isStored()
          Returns whether the field value should be stored in the index.
 boolean isTokenized()
          Returns whether the field value should be tokenized.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

mFieldName

private String mFieldName
The name of the auxiliary field.


mValue

private String mValue
The value of the auxiliary field. If null, the value is extracted from the URL using the group urlRegexGroup of the regex.


mToLowerCase

private boolean mToLowerCase
Specifies whether the (extracted) value should be converted to lower case.


mUrlRegex

private org.apache.regexp.RE mUrlRegex
The regex that extracts the value of the field.


mUrlRegexGroup

private int mUrlRegexGroup
The group of the regex that contains the value.


mStore

private boolean mStore
Specifies whether the field value should be stored in the index.


mIndex

private boolean mIndex
Specifies whether the field value should be indexed.


mTokenize

private boolean mTokenize
Specifies whether the field value should be tokenized.
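The difference the tokenize flag makes can be pictured with the following sketch. It is a simplified illustration under stated assumptions: the class TokenizeDemo is hypothetical, and its whitespace-and-lower-case splitting merely stands in for the real analyzer used when the index is built, which this page does not describe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration: the whitespace split below is a simplified stand-in
// for a real analyzer, which may also handle punctuation, stop words and so on.
final class TokenizeDemo {

    /** Returns the terms a field value would contribute to the index in this sketch. */
    static List<String> terms(String fieldValue, boolean tokenize) {
        if (!tokenize) {
            // Untokenized: the whole value is a single term, matched only as a whole.
            return Collections.singletonList(fieldValue);
        }
        // Tokenized: split on runs of whitespace and lower-case each piece.
        List<String> result = new ArrayList<>();
        for (String part : fieldValue.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                result.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(terms("Project Otto 23", false)); // [Project Otto 23]
        System.out.println(terms("Project Otto 23", true));  // [project, otto, 23]
    }
}
```

In this sketch, an untokenized value can only be found by its exact full text, whereas a tokenized value can be found by any of its words.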

Constructor Detail

AuxiliaryField

public AuxiliaryField(String fieldName,
                      String value,
                      boolean toLowerCase,
                      org.apache.regexp.RE urlRegex,
                      int urlRegexGroup,
                      boolean store,
                      boolean index,
                      boolean tokenize)
               throws RegainException
Creates a new instance of AuxiliaryField.

Parameters:
fieldName - The name of the auxiliary field.
value - The value of the auxiliary field. If null, the value is extracted from the URL using the group urlRegexGroup of the regex.
toLowerCase - Whether the (extracted) value should be converted to lower case.
urlRegex - The regex that extracts the value of the field.
urlRegexGroup - The group of the regex that contains the value.
store - Specifies whether the field value should be stored in the index.
index - Specifies whether the field value should be indexed.
tokenize - Specifies whether the field value should be tokenized.
Throws:
RegainException - If the regex has a syntax error.
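A minimal sketch of how the constructor arguments could combine to produce the field value for one URL follows. It is hypothetical, not Regain's implementation: the class AuxiliaryFieldSketch, its method resolveValue and the "doctype" field in the usage are invented for illustration, java.util.regex.Pattern stands in for org.apache.regexp.RE, the store, index and tokenize flags are left out, and it assumes that a fixed value is only assigned to URLs the regex matches.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Regain's implementation.
// java.util.regex.Pattern stands in for org.apache.regexp.RE, and the
// store/index/tokenize flags are left out because they do not affect the value.
final class AuxiliaryFieldSketch {
    private final String fieldName;
    private final String value;          // fixed value, or null to take it from the URL
    private final boolean toLowerCase;
    private final Pattern urlRegex;
    private final int urlRegexGroup;

    AuxiliaryFieldSketch(String fieldName, String value, boolean toLowerCase,
                         Pattern urlRegex, int urlRegexGroup) {
        this.fieldName = fieldName;
        this.value = value;
        this.toLowerCase = toLowerCase;
        this.urlRegex = urlRegex;
        this.urlRegexGroup = urlRegexGroup;
    }

    String getFieldName() {
        return fieldName;
    }

    /**
     * Returns the value this field would get for the given URL, or null if the
     * URL does not match (assumption: such documents get no field at all).
     */
    String resolveValue(String url) {
        Matcher m = urlRegex.matcher(url);
        if (!m.find()) {
            return null;
        }
        // A fixed value wins; otherwise use the configured capturing group.
        String result = (value != null) ? value : m.group(urlRegexGroup);
        if (result != null && toLowerCase) {
            result = result.toLowerCase(Locale.ROOT);
        }
        return result;
    }

    public static void main(String[] args) {
        AuxiliaryFieldSketch project = new AuxiliaryFieldSketch(
                "project", null, true, Pattern.compile("^file://c:/projects/([^/]*)"), 1);
        System.out.println(project.resolveValue("file://c:/projects/Otto23/docs/Spez.doc")); // otto23

        AuxiliaryFieldSketch doctype = new AuxiliaryFieldSketch(
                "doctype", "Java Source", false, Pattern.compile("\\.java$"), 0);
        System.out.println(doctype.resolveValue("file://c:/src/Main.java"));  // Java Source
        System.out.println(doctype.resolveValue("file://c:/src/readme.txt")); // null
    }
}
```

Keeping the value resolution in one method makes the precedence explicit in this sketch: a non-null value overrides the regex group, and toLowerCase is applied last.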
Method Detail

getFieldName

public String getFieldName()
Gets the name of the auxiliary field.

Returns:
The name of the auxiliary field.

getValue

public String getValue()
Returns the value of the auxiliary field.

If null, the value is extracted from the URL using the group urlRegexGroup of the regex.

Returns:
The value of the auxiliary field.

getToLowerCase

public boolean getToLowerCase()
Returns whether the (extracted) value should be converted to lower case.

Returns:
Whether the (extracted) value should be converted to lower case.

getUrlRegex

public org.apache.regexp.RE getUrlRegex()
Gets the regex that extracts the value of the field.

Returns:
The regex that extracts the value of the field.

getUrlRegexGroup

public int getUrlRegexGroup()
Gets the group of the regex that contains the value.

Returns:
The group of the regex that contains the value.

isStored

public boolean isStored()
Returns whether the field value should be stored in the index.

Returns:
Whether the field value should be stored in the index.

isIndexed

public boolean isIndexed()
Returns whether the field value should be indexed.

Returns:
Whether the field value should be indexed.

isTokenized

public boolean isTokenized()
Returns whether the field value should be tokenized.

Returns:
Whether the field value should be tokenized.


Regain 1.7.7-STABLE, Copyright (C) 2004-2010 Til Schneider, www.murfman.de, Thomas Tesche, www.clustersystems.info