Package com.hankcs.algorithm
Class AhoCorasickDoubleArrayTrie<V>
java.lang.Object
com.hankcs.algorithm.AhoCorasickDoubleArrayTrie<V>
- All Implemented Interfaces:
Serializable
An implementation of Aho Corasick algorithm based on Double Array Trie
- See Also:
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprivate classA builder to build the AhoCorasickDoubleArrayTriestatic classA result outputstatic interfaceProcessor handles the output when hit a keywordstatic interfaceCallback that allows to cancel the search process.static interfaceProcessor handles the output when hit a keyword, with more detail -
Field Summary
FieldsModifier and TypeFieldDescriptionprotected int[]base array of the Double Array Trie structureprotected int[]check array of the Double Array Trie structureprotected int[]fail table of the Aho Corasick automataprotected int[]the length of every keyprotected int[][]output table of the Aho Corasick automataprotected intthe size of base and check arrayprotected V[]outer value array -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionvoidBuild a AhoCorasickDoubleArrayTrie from a mapprivate intexactMatchSearch(char[] keyChars, int pos, int len, int nodePos) match exactly by a keyintexactMatchSearch(String key) match exactly by a keyprivate intexactMatchSearch(String key, int pos, int len, int nodePos) match exactly by a keySearch first match in stringget(int index) Pick the value by index in value array
Notice that to be more efficiently, this method DO NOT check the parameterGet value by a String key, just like a map.get() methodprivate intgetState(int currentState, char character) transmit state, supports failure functionvoidLoadbooleanChecks that string contains at least one substringvoidparseText(char[] text, AhoCorasickDoubleArrayTrie.IHit<V> processor) Parse textvoidparseText(char[] text, AhoCorasickDoubleArrayTrie.IHitFull<V> processor) Parse textparseText(CharSequence text) Parse textvoidparseText(CharSequence text, AhoCorasickDoubleArrayTrie.IHit<V> processor) Parse textvoidparseText(CharSequence text, AhoCorasickDoubleArrayTrie.IHitCancellable<V> processor) Parse textvoidsave(ObjectOutputStream out) SavebooleanUpdate a value corresponding to a keyintsize()Get the size of the keywordsprivate voidstoreEmits(int position, int currentState, List<AhoCorasickDoubleArrayTrie.Hit<V>> collectedEmits) store outputprotected inttransition(int current, char c) transition of a stateprotected inttransitionWithRoot(int nodePos, char c) transition of a state, if the state is root and it failed, then returns the root
-
Field Details
-
check
protected int[] checkcheck array of the Double Array Trie structure -
base
protected int[] basebase array of the Double Array Trie structure -
fail
protected int[] failfail table of the Aho Corasick automata -
output
protected int[][] outputoutput table of the Aho Corasick automata -
v
outer value array -
l
protected int[] lthe length of every key -
size
protected int sizethe size of base and check array
-
-
Constructor Details
-
AhoCorasickDoubleArrayTrie
public AhoCorasickDoubleArrayTrie()
-
-
Method Details
-
parseText
Parse text- Parameters:
text- The text- Returns:
- a list of outputs
-
parseText
Parse text- Parameters:
text- The textprocessor- A processor which handles the output
-
parseText
Parse text- Parameters:
text- The textprocessor- A processor which handles the output
-
parseText
Parse text- Parameters:
text- The textprocessor- A processor which handles the output
-
parseText
Parse text- Parameters:
text- The textprocessor- A processor which handles the output
-
matches
Checks that string contains at least one substring- Parameters:
text- source text to check- Returns:
trueif string contains at least one substring
-
findFirst
Search first match in string- Parameters:
text- source text to check- Returns:
- first match or
nullif there are no matches
-
save
Save- Parameters:
out- An ObjectOutputStream object- Throws:
IOException- Some IOException
-
load
Load- Parameters:
in- An ObjectInputStream object- Throws:
IOExceptionClassNotFoundException
-
get
Get value by a String key, just like a map.get() method- Parameters:
key- The key- Returns:
-
set
Update a value corresponding to a key- Parameters:
key- the keyvalue- the value- Returns:
- successful or not(failure if there is no key)
-
get
Pick the value by index in value array
Notice that to be more efficiently, this method DO NOT check the parameter- Parameters:
index- The index- Returns:
- The value
-
getState
private int getState(int currentState, char character) transmit state, supports failure function- Parameters:
currentState-character-- Returns:
-
storeEmits
private void storeEmits(int position, int currentState, List<AhoCorasickDoubleArrayTrie.Hit<V>> collectedEmits) store output- Parameters:
position-currentState-collectedEmits-
-
transition
protected int transition(int current, char c) transition of a state- Parameters:
current-c-- Returns:
-
transitionWithRoot
protected int transitionWithRoot(int nodePos, char c) transition of a state, if the state is root and it failed, then returns the root- Parameters:
nodePos-c-- Returns:
-
build
Build a AhoCorasickDoubleArrayTrie from a map- Parameters:
map- a map containing key-value pairs
-
exactMatchSearch
match exactly by a key- Parameters:
key- the key- Returns:
- the index of the key, you can use it as a perfect hash function
-
exactMatchSearch
match exactly by a key- Parameters:
key-pos-len-nodePos-- Returns:
-
exactMatchSearch
private int exactMatchSearch(char[] keyChars, int pos, int len, int nodePos) match exactly by a key- Parameters:
keyChars- the char array of the keypos- the begin index of char arraylen- the length of the keynodePos- the starting position of the node for searching- Returns:
- the value index of the key, minus indicates null
-
size
public int size()Get the size of the keywords- Returns:
-