Regain 2.1.0-STABLE API
java.lang.Object
  net.sf.regain.crawler.document.AbstractPreparator
    net.sf.regain.crawler.preparator.MessagePreparator

public class MessagePreparator
extends net.sf.regain.crawler.document.AbstractPreparator
Prepares messages (MIME, RFC 822) for indexing, in particular spoof email messages.
The resulting document contains the message text and the file names of the attachments.
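The description above says the prepared document combines the message text with the names of the attached files. The following is a minimal sketch of that combination, assuming the body has already been extracted as plain text and the attachment names are already known. MessageContentSketch and buildContent are hypothetical names used only for illustration, not part of the Regain API, and the one-entry-per-line layout is an assumption.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of the Regain API.
final class MessageContentSketch {

    private MessageContentSketch() {
    }

    // Joins the message body and the attachment file names, one entry per line
    // (the layout is an assumption). Null or empty attachment names are skipped.
    static String buildContent(String bodyText, List<String> attachmentNames) {
        StringBuilder content = new StringBuilder();
        if (bodyText != null) {
            content.append(bodyText.trim());
        }
        for (String name : attachmentNames) {
            if (name == null || name.isEmpty()) {
                continue; // unnamed MIME parts contribute nothing
            }
            if (content.length() > 0) {
                content.append('\n');
            }
            content.append(name);
        }
        return content.toString();
    }
}
```

Indexing the file names alongside the body lets a search for an attachment name find the message that carried it, even when the attachment content itself is not indexed.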
| Field Summary | |
|---|---|
| private static org.apache.log4j.Logger | mLog: The logger for this class. |
| private static java.util.regex.Pattern | mURLPattern: Compiled regular expression that matches URLs in the message body. |

| Fields inherited from interface net.sf.regain.crawler.document.Preparator |
|---|
| DEFAULT_BUFFER_SIZE |
| Constructor Summary | |
|---|---|
| MessagePreparator() | Creates a new instance of MessagePreparator. |
| Method Summary | |
|---|---|
| private Collection<String> | extractURLs(String text): Extracts URLs from a text source. |
| private javax.mail.Address[] | fixAddress(javax.mail.internet.AddressException ae, javax.mail.internet.MimeMessage message, String headerName): Works around address headers that occasionally use semicolons rather than commas as separators, which cause an "Illegal semicolon, not in group" AddressException. |
| static String | inputStreamAsString(InputStream stream): Gets the content of an InputStream as a String. |
| void | prepare(RawDocument rawDocument): Prepares the document for indexing. |
| private String | stripNoneWordChars(String uncleanString): Removes unwanted characters from a given string. |
| Methods inherited from class net.sf.regain.crawler.document.AbstractPreparator |
|---|
accepts, addAdditionalField, cleanUp, close, concatenateStringParts, getAdditionalFields, getCleanedContent, getCleanedMetaData, getHeadlines, getPath, getPriority, getSummary, getTitle, init, setCleanedContent, setCleanedMetaData, setHeadlines, setPath, setPriority, setSummary, setTitle, setUrlRegex |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
private static org.apache.log4j.Logger mLog
private static java.util.regex.Pattern mURLPattern
| Constructor Detail |
|---|
public MessagePreparator()
                  throws RegainException
Throws:
RegainException - If creating the preparator failed.

| Method Detail |
|---|
public void prepare(RawDocument rawDocument)
throws RegainException
Parameters:
rawDocument - The document to prepare.
Throws:
RegainException - If the preparation fails.
private javax.mail.Address[] fixAddress(javax.mail.internet.AddressException ae,
javax.mail.internet.MimeMessage message,
String headerName)
Parameters:
ae - The AddressException object
message - The MIME message object
headerName - The name of the header, e.g. To, From, Reply-To
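The problem fixAddress works around can be illustrated without JavaMail. The minimal sketch below assumes the repair amounts to treating a semicolon as an ordinary list separator. splitAddressHeader is a hypothetical helper, and this is not Regain's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the semicolon work-around; not Regain's code.
final class AddressFixSketch {

    private AddressFixSketch() {
    }

    // Rewrites a raw header value such as "a@example.org; b@example.org"
    // into comma-separated form, then splits it into trimmed, non-empty entries.
    static List<String> splitAddressHeader(String rawHeader) {
        List<String> result = new ArrayList<>();
        if (rawHeader == null) {
            return result;
        }
        String normalized = rawHeader.replace(';', ',');
        for (String part : normalized.split(",")) {
            String address = part.trim();
            if (!address.isEmpty()) {
                result.add(address);
            }
        }
        return result;
    }
}
```

The plain split is deliberately naive. It would break a quoted display name that itself contains a comma or semicolon, such as "Doe, John". A JavaMail-based version would more likely pass the normalized string to javax.mail.internet.InternetAddress.parse(String), which understands RFC 822 quoting.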
private Collection<String> extractURLs(String text)
Parameters:
text - Input string of text or HTML
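extractURLs applies the precompiled mURLPattern to the message text. The actual expression behind mURLPattern is not documented on this page, so the pattern below is an assumption. It matches http, https and ftp URLs up to the next whitespace, quote or angle bracket. UrlExtractionSketch is a hypothetical name.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the real mURLPattern expression is not shown in this API page.
final class UrlExtractionSketch {

    // Assumed pattern: scheme, "://", then everything up to whitespace, a quote or an angle bracket.
    private static final Pattern URL_PATTERN =
            Pattern.compile("(?i)\\b(?:https?|ftp)://[^\\s<>\"']+");

    private UrlExtractionSketch() {
    }

    // Collects every match once, keeping first-seen order.
    static Collection<String> extractUrls(String text) {
        Collection<String> urls = new LinkedHashSet<>();
        if (text == null) {
            return urls;
        }
        Matcher matcher = URL_PATTERN.matcher(text);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }
}
```

Because quotes and angle brackets end a match, the same pattern works on plain text and on HTML attribute values such as href="...". Trailing sentence punctuation (a period or comma right after a URL) would still be captured, which a production pattern might need to trim.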
private String stripNoneWordChars(String uncleanString)
Parameters:
uncleanString - The string to clean
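The page does not say which characters stripNoneWordChars considers unwanted. The minimal sketch below assumes "unwanted" means anything other than Unicode letters, digits and whitespace. Each such run is replaced by a space and the whitespace is then collapsed. StripSketch and stripNonWordChars are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; the set of "unwanted" characters is an assumption.
final class StripSketch {

    // Runs of characters that are neither letters, digits nor whitespace.
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{N}\\s]+");
    // Any run of whitespace, collapsed to one space afterwards.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private StripSketch() {
    }

    static String stripNonWordChars(String uncleanString) {
        if (uncleanString == null) {
            return "";
        }
        String replaced = NON_WORD.matcher(uncleanString).replaceAll(" ");
        return WHITESPACE.matcher(replaced).replaceAll(" ").trim();
    }
}
```

Using \p{L} rather than [A-Za-z] keeps accented letters such as umlauts, which matters for mail written in languages other than English.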
public static String inputStreamAsString(InputStream stream)
throws IOException
Parameters:
stream - The InputStream
Throws:
IOException - If an I/O error occurs while reading the stream.
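The sketch below shows one way to read a stream fully into a String using only the JDK. It makes two assumptions. First, the content is decoded as UTF-8, because the charset used by the real method is not documented. Second, it uses a local buffer of 8192 bytes, because the value of Preparator.DEFAULT_BUFFER_SIZE is not given here. StreamSketch is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; charset and buffer size are assumptions.
final class StreamSketch {

    // Stand-in for Preparator.DEFAULT_BUFFER_SIZE, whose value is not documented here.
    private static final int BUFFER_SIZE = 8192;

    private StreamSketch() {
    }

    // Reads the stream to its end and decodes all bytes at once as UTF-8.
    // The caller remains responsible for closing the stream.
    static String inputStreamAsString(InputStream stream) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[BUFFER_SIZE];
        int read;
        while ((read = stream.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Collecting the raw bytes before decoding avoids splitting a multi-byte character across two buffer reads, which could happen if each chunk were decoded separately.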