Regain 2.1.0-STABLE API

net.sf.regain.crawler.config
Class AuxiliaryField

java.lang.Object
  extended by net.sf.regain.crawler.config.AuxiliaryField

public class AuxiliaryField
extends Object

An auxiliary field is an additional field that is added to the index.

Example: If you have a directory with a subdirectory for every project, you can create a field that contains the project's name.

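The following rule will create a field "project" with the value "otto23" from the URL "file://c:/projects/otto23/docs/Spez.doc": new AuxiliaryField("project", "^file://c:/projects/([^/]*)", 1). This is shorthand that lists only the target field name, the regex and the regex group; the constructor documented below takes nine arguments.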

URLs that don't match will get no "project" field.

Once this field exists, you can search for "Offer project:otto23" to get only hits from this project directory.
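The "project" rule can be sketched in plain Java to show how the regex group yields the field value. This is an illustration only: it uses java.util.regex instead of org.apache.regexp.RE, and the names ProjectFieldExample and extractProject are hypothetical. Matcher.find() is used on the assumption that, like RE.match, the rule searches for the pattern rather than requiring the whole URL to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the "project" rule, using java.util.regex
 * as a stand-in for org.apache.regexp.RE.
 */
public class ProjectFieldExample {

    // Regex from the example; group 1 captures the project directory name.
    private static final Pattern PROJECT_REGEX =
            Pattern.compile("^file://c:/projects/([^/]*)");
    private static final int REGEX_GROUP = 1;

    /** Returns the "project" value for the URL, or null if the URL does not match. */
    static String extractProject(String url) {
        Matcher m = PROJECT_REGEX.matcher(url);
        // The leading ^ anchors the match at the start of the URL, but the
        // rest of the URL (e.g. "/docs/Spez.doc") may follow the match.
        if (!m.find()) {
            return null; // no match: the document gets no "project" field
        }
        return m.group(REGEX_GROUP);
    }

    public static void main(String[] args) {
        System.out.println(extractProject("file://c:/projects/otto23/docs/Spez.doc")); // otto23
        System.out.println(extractProject("file://c:/other/readme.txt"));             // null
    }
}
```

Group 0 would return the whole match ("file://c:/projects/otto23"), which is why the example selects group 1.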

Author:
Tilman Schneider, www.murfman.de

Nested Class Summary
static class AuxiliaryField.SourceField
          The source field types
 
Field Summary
private  boolean mIndex
          Specifies whether the field value should be indexed.
private  org.apache.regexp.RE mRegex
          The regex that extracts the value of the field.
private  int mRegexGroup
          The group of the regex that contains the value.
private  AuxiliaryField.SourceField mSourceField
          The source field on which to apply the regex.
private  boolean mStore
          Specifies whether the field value should be stored in the index.
private  String mTargetFieldName
          The name of the auxiliary field to create.
private  boolean mTokenize
          Specifies whether the field value should be tokenized.
private  boolean mToLowerCase
          Specifies whether the (extracted) value should be converted to lower case.
private  String mValue
          The value of the auxiliary field.
 
Constructor Summary
AuxiliaryField(AuxiliaryField.SourceField sourceField, String targetFieldName, String value, boolean toLowerCase, org.apache.regexp.RE regex, int regexGroup, boolean store, boolean index, boolean tokenize)
          Creates a new instance of AuxiliaryField.
 
Method Summary
 org.apache.regexp.RE getRegex()
          Gets the regex that extracts the value of the field.
 int getRegexGroup()
          Gets the group of the regex that contains the value.
 AuxiliaryField.SourceField getSourceField()
          Returns the source field on which to apply the regex.
 String getTargetFieldName()
          Gets the name of the auxiliary field to create.
 boolean getToLowerCase()
          Returns whether the (extracted) value should be converted to lower case.
 String getValue()
          Returns the value of the auxiliary field.
 boolean isIndexed()
          Returns whether the field value should be indexed.
 boolean isStored()
          Returns whether the field value should be stored in the index.
 boolean isTokenized()
          Returns whether the field value should be tokenized.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

mSourceField

private AuxiliaryField.SourceField mSourceField
The source field on which to apply the regex.


mTargetFieldName

private String mTargetFieldName
The name of the auxiliary field to create.


mValue

private String mValue
The value of the auxiliary field. If null, the value will be extracted from the regex using the regex group (mRegexGroup).


mToLowerCase

private boolean mToLowerCase
Specifies whether the (extracted) value should be converted to lower case.
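The interplay of mValue, mRegex, mRegexGroup and mToLowerCase can be sketched as a single resolution step. This is a hypothetical illustration, not Regain's code. It makes three assumptions. First, java.util.regex stands in for org.apache.regexp.RE. Second, the regex still decides whether the field is created even when a fixed value is set. Third, lower-casing applies only to the extracted value, following the wording "(extracted) value".

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of resolving an auxiliary field value. */
public class AuxValueResolver {

    /**
     * @param fixedValue  corresponds to mValue; if non-null it is used as-is
     * @param regex       corresponds to mRegex
     * @param regexGroup  corresponds to mRegexGroup
     * @param toLowerCase corresponds to mToLowerCase
     * @param source      text of the source field (e.g. the document URL)
     * @return the field value, or null if no field should be created
     */
    static String resolve(String fixedValue, Pattern regex, int regexGroup,
                          boolean toLowerCase, String source) {
        Matcher m = regex.matcher(source);
        if (!m.find()) {
            return null; // assumption: no match means no auxiliary field
        }
        if (fixedValue != null) {
            return fixedValue; // a fixed value takes precedence over the group
        }
        String extracted = m.group(regexGroup); // may be null if the group did not participate
        // Assumption: lower-casing is applied only to the extracted value.
        if (toLowerCase && extracted != null) {
            extracted = extracted.toLowerCase(Locale.ROOT);
        }
        return extracted;
    }
}
```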


mRegex

private org.apache.regexp.RE mRegex
The regex that extracts the value of the field.


mRegexGroup

private int mRegexGroup
The group of the regex that contains the value.


mStore

private boolean mStore
Specifies whether the field value should be stored in the index.


mIndex

private boolean mIndex
Specifies whether the field value should be indexed.


mTokenize

private boolean mTokenize
Specifies whether the field value should be tokenized.

Constructor Detail

AuxiliaryField

public AuxiliaryField(AuxiliaryField.SourceField sourceField,
                      String targetFieldName,
                      String value,
                      boolean toLowerCase,
                      org.apache.regexp.RE regex,
                      int regexGroup,
                      boolean store,
                      boolean index,
                      boolean tokenize)
               throws RegainException
Creates a new instance of AuxiliaryField.

Parameters:
sourceField - The source field on which to apply the regex.
targetFieldName - The name of the auxiliary field.
value - The value of the auxiliary field. If null, the value will be extracted from the regex using regexGroup.
toLowerCase - Whether the (extracted) value should be converted to lower case.
regex - The regex that extracts the value of the field.
regexGroup - The group of the regex that contains the value.
store - Specifies whether the field value should be stored in the index.
index - Specifies whether the field value should be indexed.
tokenize - Specifies whether the field value should be tokenized.
Throws:
RegainException - If the regex has a syntax error.
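The constructor's contract, nine settings plus an exception for a malformed regex, can be mirrored in a simplified class. This sketch is hypothetical. AuxFieldConfig, its ConfigException (standing in for RegainException) and the plain String sourceField (standing in for AuxiliaryField.SourceField) are all assumptions. The regex is also accepted as text and compiled with java.util.regex so that a syntax error surfaces at construction time.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical, simplified mirror of the AuxiliaryField constructor. */
public class AuxFieldConfig {

    /** Hypothetical checked exception standing in for RegainException. */
    public static class ConfigException extends Exception {
        public ConfigException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private final String sourceField; // assumption: plain String instead of SourceField
    private final String targetFieldName;
    private final String value;
    private final boolean toLowerCase;
    private final Pattern regex;
    private final int regexGroup;
    private final boolean store;
    private final boolean index;
    private final boolean tokenize;

    public AuxFieldConfig(String sourceField, String targetFieldName, String value,
                          boolean toLowerCase, String regex, int regexGroup,
                          boolean store, boolean index, boolean tokenize)
            throws ConfigException {
        this.sourceField = sourceField;
        this.targetFieldName = targetFieldName;
        this.value = value;
        this.toLowerCase = toLowerCase;
        this.regex = compile(regex, targetFieldName);
        this.regexGroup = regexGroup;
        this.store = store;
        this.index = index;
        this.tokenize = tokenize;
    }

    /** Compiles the regex, converting a syntax error into a checked exception. */
    private static Pattern compile(String regex, String targetFieldName)
            throws ConfigException {
        try {
            return Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            throw new ConfigException("Invalid regex for auxiliary field '"
                    + targetFieldName + "': " + regex, e);
        }
    }

    public String getSourceField() { return sourceField; }
    public String getTargetFieldName() { return targetFieldName; }
    public String getValue() { return value; }
    public boolean getToLowerCase() { return toLowerCase; }
    public Pattern getRegex() { return regex; }
    public int getRegexGroup() { return regexGroup; }
    public boolean isStored() { return store; }
    public boolean isIndexed() { return index; }
    public boolean isTokenized() { return tokenize; }
}
```

Failing fast in the constructor means that a configuration with a broken regex is rejected before crawling starts, rather than failing later for every document.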
Method Detail

getSourceField

public AuxiliaryField.SourceField getSourceField()
Returns the source field on which to apply the regex.

Returns:
The source field on which to apply the regex.

getTargetFieldName

public String getTargetFieldName()
Gets the name of the auxiliary field to create.

Returns:
The name of the auxiliary field to create.

getValue

public String getValue()
Returns the value of the auxiliary field.

If null, the value will be extracted from the regex using the regex group (see getRegexGroup()).

Returns:
The value of the auxiliary field.

getToLowerCase

public boolean getToLowerCase()
Returns whether the (extracted) value should be converted to lower case.

Returns:
Whether the (extracted) value should be converted to lower case.

getRegex

public org.apache.regexp.RE getRegex()
Gets the regex that extracts the value of the field.

Returns:
The regex that extracts the value of the field.

getRegexGroup

public int getRegexGroup()
Gets the group of the regex that contains the value.

Returns:
The group of the regex that contains the value.

isStored

public boolean isStored()
Returns whether the field value should be stored in the index.

Returns:
Whether the field value should be stored in the index.

isIndexed

public boolean isIndexed()
Returns whether the field value should be indexed.

Returns:
Whether the field value should be indexed.

isTokenized

public boolean isTokenized()
Returns whether the field value should be tokenized.

Returns:
Whether the field value should be tokenized.
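A toy model can make the three flags easier to picture: store (the value is returned with a hit), index (the value is searchable) and tokenize (the value is split into terms). This sketch is not Lucene's API. Real tokenization is done by the configured analyzer. Splitting on non-alphanumeric characters and lower-casing are assumptions made for illustration, and the name FieldFlagsDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Toy model (not Lucene) of the store, index and tokenize flags. */
public class FieldFlagsDemo {

    /** Terms a search could match for this value under the given flags. */
    static List<String> indexedTerms(String value, boolean index, boolean tokenize) {
        if (!index) {
            return Collections.emptyList(); // not indexed: not searchable at all
        }
        if (!tokenize) {
            return Collections.singletonList(value); // whole value is a single term
        }
        // Assumption: a simple analyzer that lower-cases and splits on
        // non-alphanumeric characters.
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Value returned with a search hit, or null if the value was not stored. */
    static String storedValue(String value, boolean store) {
        return store ? value : null;
    }
}
```

Under this model, an untokenized "project" field keeps "otto23" as one term, which suits an exact query such as project:otto23.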


Regain 2.1.0-STABLE, Copyright (C) 2004-2010 Til Schneider, www.murfman.de, Thomas Tesche, www.clustersystems.info