Regain 2.1.0-STABLE API
java.lang.Object
  net.sf.regain.crawler.document.AbstractPreparator
    net.sf.regain.crawler.preparator.MessagePreparator
public class MessagePreparator
This class prepares messages (MIME, rfc822), specifically spoof email messages.
The document contains the message text and the file names of the attachments.
| Field Summary | |
|---|---|
| private static org.apache.log4j.Logger | mLog: The logger for this class. |
| private static java.util.regex.Pattern | mURLPattern: Compiled regex that matches URLs in the message body. |
| Fields inherited from interface net.sf.regain.crawler.document.Preparator | 
|---|
DEFAULT_BUFFER_SIZE | 
| Constructor Summary | |
|---|---|
| MessagePreparator() | Creates a new instance of MessagePreparator. |
| Method Summary | |
|---|---|
| private Collection<String> | extractURLs(String text): Extracts URLs from the given text. |
| private javax.mail.Address[] | fixAddress(javax.mail.internet.AddressException ae, javax.mail.internet.MimeMessage message, String headerName): Some address headers use semicolons rather than commas as separators, which causes an "Illegal semicolon, not in group" AddressException. |
| static String | inputStreamAsString(InputStream stream): Gets the content of an InputStream as a String. |
| void | prepare(RawDocument rawDocument): Prepares the document for indexing. |
| private String | stripNoneWordChars(String uncleanString): Removes unwanted characters from a given string. |
| Methods inherited from class net.sf.regain.crawler.document.AbstractPreparator | 
|---|
accepts, addAdditionalField, cleanUp, close, concatenateStringParts, getAdditionalFields, getCleanedContent, getCleanedMetaData, getHeadlines, getPath, getPriority, getSummary, getTitle, init, setCleanedContent, setCleanedMetaData, setHeadlines, setPath, setPriority, setSummary, setTitle, setUrlRegex | 
| Methods inherited from class java.lang.Object | 
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait | 
| Field Detail | 
|---|
private static org.apache.log4j.Logger mLog
private static java.util.regex.Pattern mURLPattern
| Constructor Detail | 
|---|
public MessagePreparator()
                  throws RegainException
Throws:
RegainException - If creating the preparator failed.
| Method Detail | 
|---|
public void prepare(RawDocument rawDocument)
             throws RegainException
Parameters:
rawDocument - The document to prepare.
Throws:
RegainException - If the preparation fails.
private javax.mail.Address[] fixAddress(javax.mail.internet.AddressException ae,
                                        javax.mail.internet.MimeMessage message,
                                        String headerName)
Parameters:
ae - Address Exception object
message - MIME Message object
headerName - Name of header, e.g. To, From, Reply-To
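The semicolon problem that fixAddress addresses can be illustrated with a minimal sketch. The class name `AddressHeaderFix` and the method `normalizeSeparators` are hypothetical and are not part of the Regain API; the sketch assumes the repair amounts to replacing semicolons outside quoted display names with commas before the header is parsed again. It works on the raw header string only and does not use javax.mail.

```java
// Hypothetical helper, not the Regain implementation.
class AddressHeaderFix {

    /**
     * Replaces semicolons that appear outside double-quoted strings with
     * commas, so that a header such as "a@x.org; b@y.org" becomes a
     * comma-separated address list. Backslash-escaped quotes inside a
     * quoted string are not handled in this sketch.
     */
    static String normalizeSeparators(String rawHeader) {
        StringBuilder out = new StringBuilder(rawHeader.length());
        boolean inQuotes = false;
        for (int i = 0; i < rawHeader.length(); i++) {
            char c = rawHeader.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes; // toggle on every double quote
            }
            out.append(c == ';' && !inQuotes ? ',' : c);
        }
        return out.toString();
    }
}
```

The normalized string could then be handed back to an address parser. Note that a semicolon is legal in RFC 822 group syntax, so a blanket replacement like this one is only appropriate once parsing has already failed.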
private Collection<String> extractURLs(String text)
Parameters:
text - input string of text or HTML
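As a rough illustration of regex-based URL extraction, the following sketch collects every match of a precompiled pattern. The actual expression held in mURLPattern is not shown in this documentation, so the pattern below is an assumption, as are the names `UrlExtractorSketch` and `extractUrls`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the Regain implementation.
class UrlExtractorSketch {

    // Assumed pattern: http or https URLs, terminated by whitespace,
    // quotes, or angle brackets so that HTML attributes are handled.
    private static final Pattern URL_PATTERN =
            Pattern.compile("https?://[^\\s\"'<>]+");

    /** Returns all URLs found in the text, in order of appearance. */
    static Collection<String> extractUrls(String text) {
        Collection<String> urls = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(text);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }
}
```

Compiling the pattern once in a static field, as the mURLPattern field suggests, avoids recompiling the expression for every message.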
private String stripNoneWordChars(String uncleanString)
uncleanString - 
public static String inputStreamAsString(InputStream stream)
                                  throws IOException
Parameters:
stream - the InputStream
Throws:
IOException
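One common way to read an InputStream into a String is sketched below. The documentation does not say which character encoding the real method uses, so UTF-8 is an assumption here, and `StreamReadSketch` is a hypothetical class name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the Regain implementation.
class StreamReadSketch {

    /** Reads the whole stream as text, decoding it as UTF-8 (assumed). */
    static String inputStreamAsString(InputStream stream) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, read);
            }
        }
        return sb.toString();
    }
}
```

Reading into a char buffer rather than line by line preserves the original line terminators.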