PHP adapter for use with Stanford CoreNLP
- Connect to Stanford University CoreNLP API online
- Connect to Stanford CoreNLP 3.7.0 server
- Annotators available: tokenize, ssplit, pos, parse, depparse, ner, regexner, lemma, mention, natlog, coref, openie, kbp
- The package creates part-of-speech trees in which every node has a depth, a parent ID, and its own ID
- PHP 5.5 or higher: it also works on PHP 7
- Windows or Linux, 64-bit, with 8 GB of memory or more recommended
- Either Guzzle HTTP Client (installed by default) or only cURL.
- Composer for PHP
https://getcomposer.org/
PHP 7 type hinting was removed because it was causing issues for some users.
Fixed an issue with PHP 7.1 and upwards.
- Install Stanford CoreNLP Server. See the installation walkthrough below.
- Download and unpack the files from this package.
- Copy the files to your webserver directory. Usually "htdocs" or "var/www".
- Run a Composer update
- Insert the following line into the "require" of your "composer.json" file.
{
    "require": {
        "dennis-de-swart/php-stanford-corenlp-adapter": "*"
    }
}
- Run a composer update
The adapter by default uses Stanford's online API service. This should work right after the composer update. Note that the online API is a public service. If you want to analyze large volumes of text or sensitive data, please install the Java server version.
OpenIE creates "subject-relation-object" tuples. This is similar to (but not the same as) the "Subject-Verb-Object" concept of the English language.
Notes:
- OpenIE is only available with the offline Java server, not in "online" mode. See the installation walkthrough below.
- OpenIE data is not always available. Sometimes the result array is empty; this is not an error.
http://nlp.stanford.edu/software/openie.html
https://en.wikipedia.org/wiki/Subject-verb-object
https://java.com/en/download/help/index_installing.xml?os=All+Platforms&j=8&n=20
http://stanfordnlp.github.io/CoreNLP/index.html#download
Default port for the Java server is port 9000. If port 9000 is not available you can change the port in the "bootstrap.php" file. Example:
define('CURLURL' , 'http://localhost:9000/');
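As an illustration of an actual port change, the fragment below points the adapter at port 9001. The port number is only an assumption for the example; any free port should work, as long as the server is started with the same `-port` value.

```php
<?php
// Assumption: port 9001 is an arbitrary example of a free port.
// The server would then be started with a matching port, for example:
//   java -mx8g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9001
define('CURLURL', 'http://localhost:9001/');
```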
Go to the download directory, then enter the following command:
java -mx8g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
Important note: the Stanford manual says "-mx4g", but I found that this can lead to a Java OutOfMemory error. It is also important to use a 64-bit operating system with enough memory (8 GB or more recommended).
http://localhost:9000/
When you surf to this URL, you should see the CoreNLP GUI. If you have problems with installation you can check the manual:
http://stanfordnlp.github.io/CoreNLP/corenlp-server.html
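If the GUI loads but you want to confirm that the server also answers API calls, you can query it directly with cURL, without going through the adapter. The sketch below follows the request format from the CoreNLP server manual (text in the POST body, a JSON `properties` parameter in the URL). The function names `buildCoreNlpUrl` and `queryCoreNlp` are hypothetical helpers, not part of this package, and the `text/plain` content type is an assumption that may need adjusting for your server version.

```php
<?php
// Hypothetical helper: builds the request URL for a CoreNLP server.
function buildCoreNlpUrl($baseUrl, $annotators)
{
    $properties = json_encode(array(
        'annotators'   => $annotators,
        'outputFormat' => 'json',
    ));

    return rtrim($baseUrl, '/') . '/?properties=' . urlencode($properties);
}

// Hypothetical helper: POSTs the raw text to the server.
// Returns the decoded JSON as an array, or null if the request failed.
function queryCoreNlp($baseUrl, $text, $annotators)
{
    $ch = curl_init(buildCoreNlpUrl($baseUrl, $annotators));
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $text);
    // Assumption: send the text as plain UTF-8 rather than form data.
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: text/plain; charset=utf-8'));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // The first call can take 20 to 30 seconds while the models load.
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    $body = curl_exec($ch);
    curl_close($ch);

    if ($body === false) {
        return null; // server not reachable or request timed out
    }

    $decoded = json_decode($body, true);

    return is_array($decoded) ? $decoded : null;
}

// Usage (requires a running server):
// $result = queryCoreNlp('http://localhost:9000/', 'I will meet Mary.', 'tokenize,ssplit,pos');
// var_dump($result !== null);
```

A `null` result suggests that the server is not running, is listening on a different port, or ran out of time while loading its models.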
In "bootstrap.php" set define('ONLINE_API' , FALSE). This tells the adapter to use the local Java server instead of the online API.
$coreNLP = new CorenlpAdapter();
$text = 'The Golden Gate Bridge was designed by Joseph Strauss.';
$coreNLP->getOutput($text);
Note that the first time that you process a text, the server takes about 20 to 30 seconds extra to load definitions. All other calls to the server after that will be much faster. Small texts are usually processed within seconds.
If successful the following properties will be available:
$coreNLP->serverMemory; // contains all of the server output
$coreNLP->trees; // contains the processed flat trees. Each node of a tree is assigned an ID key
$coreNLP->getWordValues($coreNLP->trees[1]); // get just the words from a tree
Array
(
[1] => Array
(
[parent] =>
[pennTreebankTag] => ROOT
[depth] => 0
)
[2] => Array
(
[parent] => 1
[pennTreebankTag] => S
[depth] => 2
)
[3] => Array
(
[parent] => 2
[pennTreebankTag] => NP
[depth] => 4
)
[4] => Array
(
[parent] => 3
[pennTreebankTag] => PRP
[depth] => 6
[word] => I
[index] => 1
[originalText] => I
[lemma] => I
[characterOffsetBegin] => 0
[characterOffsetEnd] => 1
[pos] => PRP
[ner] => O
[before] =>
[after] =>
[openIE] => Array
(
[0] => subject
[1] => subject
[2] => subject
)
)
[5] => Array
(
[parent] => 2
[pennTreebankTag] => VP
[depth] => 4
)
[6] => Array
(
[parent] => 5
[pennTreebankTag] => MD
[depth] => 6
[word] => will
[index] => 2
[originalText] => will
[lemma] => will
[characterOffsetBegin] => 2
[characterOffsetEnd] => 6
[pos] => MD
[ner] => O
[before] =>
[after] =>
[openIE] => Array
(
[0] => subject
[1] => subject
[2] => relation
)
)
[7] => Array
(
[parent] => 5
[pennTreebankTag] => VP
[depth] => 6
)
[8] => Array
(
[parent] => 7
[pennTreebankTag] => VB
[depth] => 8
[word] => meet
[index] => 3
[originalText] => meet
[lemma] => meet
[characterOffsetBegin] => 7
[characterOffsetEnd] => 11
[pos] => VB
[ner] => O
[before] =>
[after] =>
[openIE] => Array
(
[0] => subject
[1] => subject
[2] => relation
)
)
[9] => Array
(
[parent] => 7
[pennTreebankTag] => NP
[depth] => 8
)
[10] => Array
(
[parent] => 9
[pennTreebankTag] => NP
[depth] => 10
)
[11] => Array
(
[parent] => 10
[pennTreebankTag] => NNP
[depth] => 12
[word] => Mary
[index] => 4
[originalText] => Mary
[lemma] => Mary
[characterOffsetBegin] => 12
[characterOffsetEnd] => 16
[pos] => NNP
[ner] => PERSON
[before] =>
[after] =>
[openIE] => Array
(
[1] => subject
[2] => object
[3] => subject
[0] => subject
)
)
[12] => Array
(
[parent] => 9
[pennTreebankTag] => PP
[depth] => 10
)
[13] => Array
(
[parent] => 12
[pennTreebankTag] => IN
[depth] => 12
[word] => in
[index] => 5
[originalText] => in
[lemma] => in
[characterOffsetBegin] => 17
[characterOffsetEnd] => 19
[pos] => IN
[ner] => O
[before] =>
[after] =>
[openIE] => Array
(
[1] => relation
[3] => relation
[0] => relation
)
)
[14] => Array
(
[parent] => 12
[pennTreebankTag] => NP
[depth] => 12
)
[15] => Array
(
[parent] => 14
[pennTreebankTag] => NNP
[depth] => 14
[word] => New
[index] => 6
[originalText] => New
[lemma] => New
[characterOffsetBegin] => 20
[characterOffsetEnd] => 23
[pos] => NNP
[ner] => LOCATION
[before] =>
[after] =>
[openIE] => Array
(
[1] => relation
[3] => object
[0] => object
)
)
[16] => Array
(
[parent] => 14
[pennTreebankTag] => NNP
[depth] => 14
[word] => York
[index] => 7
[originalText] => York
[lemma] => York
[characterOffsetBegin] => 24
[characterOffsetEnd] => 28
[pos] => NNP
[ner] => LOCATION
[before] =>
[after] =>
[openIE] => Array
(
[1] => object
[3] => object
)
)
[17] => Array
(
[parent] => 7
[pennTreebankTag] => PP
[depth] => 8
)
[18] => Array
(
[parent] => 17
[pennTreebankTag] => IN
[depth] => 10
[word] => at
[index] => 8
[originalText] => at
[lemma] => at
[characterOffsetBegin] => 29
[characterOffsetEnd] => 31
[pos] => IN
[ner] => O
[before] =>
[after] =>
[openIE] => Array
(
[1] => object
)
)
[19] => Array
(
[parent] => 17
[pennTreebankTag] => NP
[depth] => 10
)
[20] => Array
(
[parent] => 19
[pennTreebankTag] => CD
[depth] => 12
[word] => 10pm
[index] => 9
[originalText] => 10pm
[lemma] => 10pm
[characterOffsetBegin] => 32
[characterOffsetEnd] => 36
[pos] => CD
[ner] => TIME
[normalizedNER] => T22:00
[before] =>
[after] =>
[timex] => Array
(
[tid] => t1
[type] => TIME
[value] => T22:00
)
[openIE] => Array
(
[0] => object
[1] => object
)
)
)
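Because each node in the flat tree only stores its parent ID, finding the children of a node takes one pass over the array. The following is a minimal sketch of that lookup, assuming the tree has the shape shown in the dump above (ID keys, a `parent` entry that is empty for the root). `childrenByParent` is a hypothetical helper, not a method of the adapter.

```php
<?php
// Hypothetical helper: maps each parent ID to the IDs of its direct children.
function childrenByParent($tree)
{
    $children = array();
    foreach ($tree as $id => $node) {
        // The root node has an empty parent, so it is skipped here.
        if (isset($node['parent']) && $node['parent'] !== '') {
            $children[$node['parent']][] = $id;
        }
    }

    return $children;
}

// A shortened copy of the first four nodes from the dump above.
$tree = array(
    1 => array('parent' => null, 'pennTreebankTag' => 'ROOT', 'depth' => 0),
    2 => array('parent' => 1, 'pennTreebankTag' => 'S', 'depth' => 2),
    3 => array('parent' => 2, 'pennTreebankTag' => 'NP', 'depth' => 4),
    4 => array('parent' => 3, 'pennTreebankTag' => 'PRP', 'depth' => 6, 'word' => 'I'),
);

print_r(childrenByParent($tree));
```

With a real result you would pass one of the trees from `$coreNLP->trees` instead of the hand-written sample.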
Array
(
[0] => Array
(
[sentences] => Array
(
[0] => Array
(
[index] => 0
[parse] => (ROOT
(S
(NP (PRP I))
(VP (MD will)
(VP (VB meet)
(NP
(NP (NNP Mary))
(PP (IN in)
(NP (NNP New) (NNP York))))
(PP (IN at)
(NP (CD 10pm)))))))
[basic-dependencies] => Array
(
[0] => Array
(
[dep] => ROOT
[governor] => 0
[governorGloss] => ROOT
[dependent] => 3
[dependentGloss] => meet
)
[1] => Array
(
[dep] => nsubj
[governor] => 3
[governorGloss] => meet
[dependent] => 1
[dependentGloss] => I
)
[2] => Array
(
[dep] => aux
[governor] => 3
[governorGloss] => meet
[dependent] => 2
[dependentGloss] => will
)
[3] => Array
(
[dep] => dobj
[governor] => 3
[governorGloss] => meet
[dependent] => 4
[dependentGloss] => Mary
)
[4] => Array
(
[dep] => case
[governor] => 7
[governorGloss] => York
[dependent] => 5
[dependentGloss] => in
)
[5] => Array
(
[dep] => compound
[governor] => 7
[governorGloss] => York
[dependent] => 6
[dependentGloss] => New
)
[6] => Array
(
[dep] => nmod
[governor] => 4
[governorGloss] => Mary
[dependent] => 7
[dependentGloss] => York
)
[7] => Array
(
[dep] => case
[governor] => 9
[governorGloss] => 10pm
[dependent] => 8
[dependentGloss] => at
)
[8] => Array
(
[dep] => nmod
[governor] => 3
[governorGloss] => meet
[dependent] => 9
[dependentGloss] => 10pm
)
)
[collapsed-dependencies] => Array
(
[0] => Array
(
[dep] => ROOT
[governor] => 0
[governorGloss] => ROOT
[dependent] => 3
[dependentGloss] => meet
)
[1] => Array
(
[dep] => nsubj
[governor] => 3
[governorGloss] => meet
[dependent] => 1
[dependentGloss] => I
)
[2] => Array
(
[dep] => aux
[governor] => 3
[governorGloss] => meet
[dependent] => 2
[dependentGloss] => will
)
[3] => Array
(
[dep] => dobj
[governor] => 3
[governorGloss] => meet
[dependent] => 4
[dependentGloss] => Mary
)
[4] => Array
(
[dep] => case
[governor] => 7
[governorGloss] => York
[dependent] => 5
[dependentGloss] => in
)
[5] => Array
(
[dep] => compound
[governor] => 7
[governorGloss] => York
[dependent] => 6
[dependentGloss] => New
)
[6] => Array
(
[dep] => nmod:in
[governor] => 4
[governorGloss] => Mary
[dependent] => 7
[dependentGloss] => York
)
[7] => Array
(
[dep] => case
[governor] => 9
[governorGloss] => 10pm
[dependent] => 8
[dependentGloss] => at
)
[8] => Array
(
[dep] => nmod:at
[governor] => 3
[governorGloss] => meet
[dependent] => 9
[dependentGloss] => 10pm
)
)
[collapsed-ccprocessed-dependencies] => Array
(
[0] => Array
(
[dep] => ROOT
[governor] => 0
[governorGloss] => ROOT
[dependent] => 3
[dependentGloss] => meet
)
[1] => Array
(
[dep] => nsubj
[governor] => 3
[governorGloss] => meet
[dependent] => 1
[dependentGloss] => I
)
[2] => Array
(
[dep] => aux
[governor] => 3
[governorGloss] => meet
[dependent] => 2
[dependentGloss] => will
)
[3] => Array
(
[dep] => dobj
[governor] => 3
[governorGloss] => meet
[dependent] => 4
[dependentGloss] => Mary
)
[4] => Array
(
[dep] => case
[governor] => 7
[governorGloss] => York
[dependent] => 5
[dependentGloss] => in
)
[5] => Array
(
[dep] => compound
[governor] => 7
[governorGloss] => York
[dependent] => 6
[dependentGloss] => New
)
[6] => Array
(
[dep] => nmod:in
[governor] => 4
[governorGloss] => Mary
[dependent] => 7
[dependentGloss] => York
)
[7] => Array
(
[dep] => case
[governor] => 9
[governorGloss] => 10pm
[dependent] => 8
[dependentGloss] => at
)
[8] => Array
(
[dep] => nmod:at
[governor] => 3
[governorGloss] => meet
[dependent] => 9
[dependentGloss] => 10pm
)
)
[openie] => Array
(
[0] => Array
(
[subject] => I
[subjectSpan] => Array
(
[0] => 0
[1] => 1
)
[relation] => will meet Mary at
[relationSpan] => Array
(
[0] => 1
[1] => 3
)
[object] => 10pm
[objectSpan] => Array
(
[0] => 8
[1] => 9
)
)
[1] => Array
(
[subject] => I
[subjectSpan] => Array
(
[0] => 0
[1] => 1
)
[relation] => will meet
[relationSpan] => Array
(
[0] => 1
[1] => 3
)
[object] => Mary in New York
[objectSpan] => Array
(
[0] => 3
[1] => 7
)
)
[2] => Array
(
[subject] => I
[subjectSpan] => Array
(
[0] => 0
[1] => 1
)
[relation] => will meet
[relationSpan] => Array
(
[0] => 1
[1] => 3
)
[object] => Mary
[objectSpan] => Array
(
[0] => 3
[1] => 4
)
)
[3] => Array
(
[subject] => Mary
[subjectSpan] => Array
(
[0] => 3
[1] => 4
)
[relation] => is in
[relationSpan] => Array
(
[0] => 4
[1] => 5
)
[object] => New York
[objectSpan] => Array
(
[0] => 5
[1] => 7
)
)
)
[tokens] => Array
(
[0] => Array
(
[index] => 1
[word] => I
[originalText] => I
[lemma] => I
[characterOffsetBegin] => 0
[characterOffsetEnd] => 1
[pos] => PRP
[ner] => O
[before] =>
[after] =>
)
[1] => Array
(
[index] => 2
[word] => will
[originalText] => will
[lemma] => will
[characterOffsetBegin] => 2
[characterOffsetEnd] => 6
[pos] => MD
[ner] => O
[before] =>
[after] =>
)
[2] => Array
(
[index] => 3
[word] => meet
[originalText] => meet
[lemma] => meet
[characterOffsetBegin] => 7
[characterOffsetEnd] => 11
[pos] => VB
[ner] => O
[before] =>
[after] =>
)
[3] => Array
(
[index] => 4
[word] => Mary
[originalText] => Mary
[lemma] => Mary
[characterOffsetBegin] => 12
[characterOffsetEnd] => 16
[pos] => NNP
[ner] => PERSON
[before] =>
[after] =>
)
[4] => Array
(
[index] => 5
[word] => in
[originalText] => in
[lemma] => in
[characterOffsetBegin] => 17
[characterOffsetEnd] => 19
[pos] => IN
[ner] => O
[before] =>
[after] =>
)
[5] => Array
(
[index] => 6
[word] => New
[originalText] => New
[lemma] => New
[characterOffsetBegin] => 20
[characterOffsetEnd] => 23
[pos] => NNP
[ner] => LOCATION
[before] =>
[after] =>
)
[6] => Array
(
[index] => 7
[word] => York
[originalText] => York
[lemma] => York
[characterOffsetBegin] => 24
[characterOffsetEnd] => 28
[pos] => NNP
[ner] => LOCATION
[before] =>
[after] =>
)
[7] => Array
(
[index] => 8
[word] => at
[originalText] => at
[lemma] => at
[characterOffsetBegin] => 29
[characterOffsetEnd] => 31
[pos] => IN
[ner] => O
[before] =>
[after] =>
)
[8] => Array
(
[index] => 9
[word] => 10pm
[originalText] => 10pm
[lemma] => 10pm
[characterOffsetBegin] => 32
[characterOffsetEnd] => 36
[pos] => CD
[ner] => TIME
[normalizedNER] => T22:00
[before] =>
[after] =>
[timex] => Array
(
[tid] => t1
[type] => TIME
[value] => T22:00
)
)
)
)
)
)
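The OpenIE tuples can be pulled out of this structure with a few nested loops. The sketch below assumes the array above is the content of `$coreNLP->serverMemory` and that each entry holds a `sentences` list with an optional `openie` list, as in the dump. `openIeTriples` is a hypothetical helper, not part of the adapter.

```php
<?php
// Hypothetical helper: collects (subject, relation, object) triples
// from an array shaped like the server output shown above.
function openIeTriples($serverMemory)
{
    $triples = array();
    foreach ($serverMemory as $document) {
        if (!isset($document['sentences'])) {
            continue;
        }
        foreach ($document['sentences'] as $sentence) {
            // OpenIE data can be missing or empty; that is not an error.
            if (empty($sentence['openie'])) {
                continue;
            }
            foreach ($sentence['openie'] as $tuple) {
                $triples[] = array($tuple['subject'], $tuple['relation'], $tuple['object']);
            }
        }
    }

    return $triples;
}

// A shortened copy of one tuple from the dump above.
$serverMemory = array(
    array('sentences' => array(
        array('index' => 0, 'openie' => array(
            array('subject' => 'Mary', 'relation' => 'is in', 'object' => 'New York'),
        )),
    )),
);

foreach (openIeTriples($serverMemory) as $triple) {
    echo implode(' | ', $triple), "\n";
}
```

Sentences without OpenIE data are skipped, so an empty return value simply means that no tuples were found.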
If you have questions or run into problems, please let me know.
Some functions are forked from this "Stanford parser" package:
https://github.com/agentile/PHP-Stanford-NLP