Concept Tagger Noun Web Service

The Concept Tagger Noun Web service uses UMBEL reference concepts to tag an input text or a Web document. It uses the OBIE (Ontology-Based Information Extraction) method, driven by the UMBEL reference concept ontology. "Noun" means that tagging only occurs on the words (tokens) that are considered singular or plural nouns in the sentence(s) of the input text. The nouns are matched to either the preferred labels or the alternative labels of the reference concepts, with the match basis denoted by color. This simple tagger merely makes string matches between the nouns and the possible UMBEL reference concepts.

A user interface is available for experimenting with this tagger.

This tagger matches the plain labels of the reference concepts against the nouns of the input text. No manipulations are performed on the reference concept labels or on the input text unless you specify the use of the stemmer. Also, the tagger performs NO disambiguation if multiple concepts are tagged for a given keyword.

Usage

This Web service is intended for those who want to focus on UMBEL and do not care about more complicated matches. The output of the tagger can be used as-is, but it is intended to be the initial input to more sophisticated reference concept matching and disambiguation methods. No permissions are required to query this Web service endpoint. To access it, simply send an HTTP POST query to the proper endpoint URL per the specifications below.
 
HTTP Method:
  • POST
Possible Accept: HTTP header field values:
  • */*: return tags in JSON
  • application/*: return tags in JSON
  • application/json: return tags in JSON
  • application/clojure: return tags in Clojure code
  • application/edn: return tags in Extensible Data Notation (EDN)
URLs:
  • http://umbel.org/ws/tag/concept/noun
  • http://umbel.org/ws/tag/concept/noun/stemming

If the URL ending in /stemming is used, stemming is applied during the tagging process.

Body:
  • The body of the POST query can be the text you want to be processed
  • The body of the POST query can be a URL to an accessible Web document that will be downloaded by the Web service and then tagged.
    • Supported Web documents are:
      • HTML
      • XML and derived formats
      • Microsoft Office documents
      • OpenDocument Format (ODF)
      • PDF
      • EPub
      • plain text
      • RSS feeds
      • Atom feeds
      • Image metadata
      • Video metadata
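
The request described above can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper name build_tag_request and its parameters are hypothetical, while the URLs and Accept values are the ones listed above. The function only builds the request object; nothing is sent until it is passed to urllib.request.urlopen.

```python
import urllib.request

# Endpoint URLs as listed in this document.
NOUN_TAGGER_URL = "http://umbel.org/ws/tag/concept/noun"
NOUN_TAGGER_STEMMING_URL = "http://umbel.org/ws/tag/concept/noun/stemming"


def build_tag_request(body, accept="application/json", stemming=False):
    """Build (but do not send) a POST request for the noun tagger.

    `body` is either the text to tag or the URL of a Web document.
    The helper name and its parameters are illustrative assumptions.
    """
    url = NOUN_TAGGER_STEMMING_URL if stemming else NOUN_TAGGER_URL
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),  # request body: raw text or a document URL
        headers={"Accept": accept},  # selects the serialization of the tags
        method="POST",
    )


request = build_tag_request("This tagger uses the plain labels.", stemming=True)
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before any network call is made.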

Serializations

The Concept Tagger Noun endpoint currently supports three serialization formats:
  • JSON
  • Clojure code
  • Extensible Data Notation (EDN)

JSON Serialization
The JSON serialization has the following structure:
  • pref-labels object: this is the list of all concepts that have their preferred label matched with a word in the input text
    • concept label: this is the preferred label of the concept that matched
      • concepts: an array with the URIs for all of the UMBEL reference concepts that share the same preferred label
      • indices: an array of [start, end] arrays giving the zero-based character offsets (end inclusive) of each word matched by the tagger in the normalized-text
  • alt-labels object: this is the list of all the concepts that have an alternative label matched with a word in the input text
    • concept label: this is the alternative label of the concept that matched
      • concepts: an array with the URIs for all of the UMBEL reference concepts that share the same alternative label
      • indices: an array of [start, end] arrays giving the zero-based character offsets (end inclusive) of each word matched by the tagger in the normalized-text
  • normalized-text: This is the normalized text that was used as input to the tagging process. Words have been converted to lowercase and punctuation has been removed.
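
To make the normalization concrete, here is a small Python sketch that reproduces the normalized-text of the example sentence used in this document. It is only an approximation: the service's actual normalization rules are not specified here, and the function name normalize_text and the choice to replace each punctuation character with a space (inferred from the trailing space in the example's normalized-text) are assumptions.

```python
import re


def normalize_text(text):
    """Lowercase the text and replace each punctuation character with a space.

    Assumption: punctuation becomes a space rather than being deleted, so the
    normalized text keeps the same length as the input.
    """
    return re.sub(r"[^\w\s]", " ", text.lower())


sentence = ("This tagger uses the plain labels of the reference concepts "
            "as matches against the input text.")
print(repr(normalize_text(sentence)))
```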

Here is an example of a JSON resultset:


  {
      "pref-labels": {
          "plain": {
              "concepts": [
                  "http://umbel.org/umbel/rc/Plain_Topographical"
              ],
              "indices": [
                  [
                      21,
                      25
                  ]
              ]
          },
          "text": {
              "concepts": [
                  "http://umbel.org/umbel/rc/PropositionalCWText",
                  "http://umbel.org/umbel/rc/TextString"
              ],
              "indices": [
                  [
                      89,
                      92
                  ]
              ]
          }
      },
      "alt-labels": {
          "labels": {
              "concepts": [
                  "http://umbel.org/umbel/rc/Label_IBO"
              ],
              "indices": [
                  [
                      27,
                      32
                  ]
              ]
          },
          "reference": {
              "concepts": [
                  "http://umbel.org/umbel/rc/ReferenceWork"
              ],
              "indices": [
                  [
                      41,
                      49
                  ]
              ]
          },
          "concepts": {
              "concepts": [
                  "http://umbel.org/umbel/rc/Concept"
              ],
              "indices": [
                  [
                      51,
                      58
                  ]
              ]
          },
          "matches": {
              "concepts": [
                  "http://umbel.org/umbel/rc/Match_FireStarter",
                  "http://umbel.org/umbel/rc/MatchSportsEvent"
              ],
              "indices": [
                  [
                      63,
                      69
                  ]
              ]
          },
          "text": {
              "concepts": [
                  "http://umbel.org/umbel/rc/TextMarking",
                  "http://umbel.org/umbel/rc/TextualMaterial",
                  "http://umbel.org/umbel/rc/Textbook"
              ],
              "indices": [
                  [
                      89,
                      92
                  ]
              ]
          }
      },
      "normalized-text": "this tagger uses the plain labels of the reference concepts as matches against the input text "
  }          
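
The indices can be read back against normalized-text as follows. This Python sketch uses a trimmed copy of the resultset above; the function name matched_words is hypothetical, and treating each pair as zero-based offsets with an inclusive end is an inference from the example values (for instance, [21, 25] covers the five characters of "plain").

```python
import json

# Trimmed copy of the example resultset shown in this document.
RESULT_JSON = """
{
  "pref-labels": {
    "plain": {"concepts": ["http://umbel.org/umbel/rc/Plain_Topographical"],
              "indices": [[21, 25]]}
  },
  "alt-labels": {
    "matches": {"concepts": ["http://umbel.org/umbel/rc/Match_FireStarter",
                             "http://umbel.org/umbel/rc/MatchSportsEvent"],
                "indices": [[63, 69]]}
  },
  "normalized-text": "this tagger uses the plain labels of the reference concepts as matches against the input text "
}
"""


def matched_words(result):
    """Return (label kind, label, matched substring, concept URIs) tuples."""
    text = result["normalized-text"]
    rows = []
    for kind in ("pref-labels", "alt-labels"):
        for label, entry in result.get(kind, {}).items():
            for start, end in entry["indices"]:
                # Assumption: offsets are zero-based and `end` is inclusive.
                rows.append((kind, label, text[start:end + 1], entry["concepts"]))
    return rows


rows = matched_words(json.loads(RESULT_JSON))
for kind, label, word, concepts in rows:
    print(f"{kind}: {label!r} matched {word!r} -> {len(concepts)} candidate concept(s)")
```

Because the tagger performs no disambiguation, a label such as "matches" comes back with every candidate concept, and choosing among them is left to the caller.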
          

EDN Serialization
The EDN serialization has the following structure:
  • pref-labels key: this is the list of all the concepts that have their preferred label matched with a word in the input text
    • concept label: this is the preferred label of the concept that matched
      • concepts key: a vector with the URIs for all of the UMBEL reference concepts that share the same preferred label
      • indices key: a vector of [start end] vectors giving the zero-based character offsets (end inclusive) of each word matched by the tagger in the normalized-text
  • alt-labels key: this is the list of all of the concepts that have their alternative label matched with a word in the input text
    • concept label: this is the alternative label of the concept that matched
      • concepts key: a vector with the URIs of all of the UMBEL reference concepts that share the same alternative label
      • indices key: a vector of [start end] vectors giving the zero-based character offsets (end inclusive) of each word matched by the tagger in the normalized-text
  • normalized-text key: this is the normalized text that was used as input to the tagging process. Words have been converted to lowercase and punctuation has been removed.

This serialization should be used by EDN-compliant parsers. It is also the serialization that ClojureScript applications should use.

Here is an example of an EDN resultset:


  {:alt-labels
   {"concepts"
    {:concepts ["http://umbel.org/umbel/rc/Concept"],
     :indices [[51 58]]},
    "labels"
    {:concepts ["http://umbel.org/umbel/rc/Label_IBO"],
     :indices [[27 32]]},
    "text"
    {:concepts
     ["http://umbel.org/umbel/rc/TextMarking"
      "http://umbel.org/umbel/rc/TextualMaterial"
      "http://umbel.org/umbel/rc/Textbook"],
     :indices [[89 92]]},
    "reference"
    {:concepts ["http://umbel.org/umbel/rc/ReferenceWork"],
     :indices [[41 49]]},
    "matches"
    {:concepts
     ["http://umbel.org/umbel/rc/Match_FireStarter"
      "http://umbel.org/umbel/rc/MatchSportsEvent"],
     :indices [[63 69]]}},
   :pref-labels
   {"text"
    {:concepts
     ["http://umbel.org/umbel/rc/PropositionalCWText"
      "http://umbel.org/umbel/rc/TextString"],
     :indices [[89 92]]},
    "plain"
    {:concepts ["http://umbel.org/umbel/rc/Plain_Topographical"],
     :indices [[21 25]]}},
   :normalized-text
   "this tagger uses the plain labels of the reference concepts as matches against the input text "}          
          

Clojure Code Serialization
The Clojure code serialization is the same as the EDN serialization except that all of the types are specified. This serialization should be used by Clojure-compliant parsers. It should not be used by ClojureScript applications.

Here is an example of a Clojure code resultset:


          
  #=(clojure.lang.PersistentArrayMap/create {:alt-labels #=(clojure.lang.PersistentArrayMap/create {"concepts" #=(clojure.lang.PersistentArrayMap/create {:concepts ["http://umbel.org/umbel/rc/Concept"], :indices [[51 58]]}), "labels" #=(clojure.lang.PersistentArrayMap/create {:concepts ["http://umbel.org/umbel/rc/Label_IBO"], :indices [[27 32]]}), "text" #=(clojure.lang.PersistentArrayMap/create {:concepts ["http://umbel.org/umbel/rc/TextMarking" "http://umbel.org/umbel/rc/TextualMaterial" "http://umbel.org/umbel/rc/Textbook"], :indices [[89 92]]}), "reference" #=(clojure.lang.PersistentArrayMap/create {:concepts ["http://umbel.org/umbel/rc/ReferenceWork"], :indices [[41 49]]}), "matches" #=(clojure.lang.PersistentArrayMap/create {:concepts ["http://umbel.org/umbel/rc/Match_FireStarter" "http://umbel.org/umbel/rc/MatchSportsEvent"], :indices [[63 69]]})}), :pref-labels #=(clojure.lang.PersistentArrayMap/create {"text" #=(clojure.lang.PersistentArrayMap/create {:concepts ["http://umbel.org/umbel/rc/PropositionalCWText" "http://umbel.org/umbel/rc/TextString"], :indices [[89 92]]}), "plain" #=(clojure.lang.PersistentArrayMap/create {:concepts ["http://umbel.org/umbel/rc/Plain_Topographical"], :indices [[21 25]]})}), :normalized-text "this tagger uses the plain labels of the reference concepts as matches against the input text "})
          

Examples

Here is a series of examples that use cURL to query the UMBEL Concept Tagger Noun Web service endpoint.
 
Get the tags for the sentence "This tagger uses the plain labels of the reference concepts as matches against the input text." in JSON:
  curl -H "Accept: application/json" "http://umbel.org/ws/tag/concept/noun" -d "This tagger uses the plain labels of the reference concepts as matches against the input text."
 
Get the tags for the sentence "This tagger uses the plain labels of the reference concepts as matches against the input text." in EDN:
  curl -H "Accept: application/edn" "http://umbel.org/ws/tag/concept/noun" -d "This tagger uses the plain labels of the reference concepts as matches against the input text."
 
Get the tags for the sentence "This tagger uses the plain labels of the reference concepts as matches against the input text." in Clojure code:
  curl -H "Accept: application/clojure" "http://umbel.org/ws/tag/concept/noun" -d "This tagger uses the plain labels of the reference concepts as matches against the input text."
 
Get the tags for the sentence "This tagger uses the plain labels of the reference concepts as matches against the input text." in Clojure code, with stemming:
  curl -H "Accept: application/clojure" "http://umbel.org/ws/tag/concept/noun/stemming" -d "This tagger uses the plain labels of the reference concepts as matches against the input text."
 
Get the tags for the Web document http://umbel.org in JSON:
  curl -H "Accept: application/json" "http://umbel.org/ws/tag/concept/noun" -d "http://umbel.org"
 

Errors

Here is the list of possible HTTP errors that can be returned by this endpoint:
 
HTTP Error | Message | Description
406 | Unsupported mime requested | The Accept: HTTP header doesn't contain any supported mime type. If this happens, make sure that you are requesting a supported mime type.
500 | (empty) | An internal UMBEL Web service error occurred. If this happens, please contact us with the query that caused the error.
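
A client might translate these statuses into actionable messages. The sketch below is one possible approach in Python; the names ERROR_HINTS, explain_http_error, and fetch_tags are hypothetical, only the two status codes documented above are covered, and decoding the response as UTF-8 is an assumption.

```python
import urllib.error
import urllib.request

# Hints paraphrased from the error descriptions in this document.
ERROR_HINTS = {
    406: "The Accept header names no supported MIME type; request "
         "application/json, application/edn or application/clojure.",
    500: "Internal UMBEL Web service error; report the query that caused it.",
}


def explain_http_error(status):
    """Map an HTTP status code to a hint, with a fallback for other codes."""
    return ERROR_HINTS.get(status, f"Unexpected HTTP status {status}.")


def fetch_tags(request, opener=urllib.request.urlopen):
    """Send a prepared request and return the response body as text.

    `request` can be a URL string or a urllib.request.Request object.
    HTTP errors are re-raised as RuntimeError carrying the matching hint.
    """
    try:
        with opener(request) as response:
            # Assumption: the service responds with UTF-8 encoded text.
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        raise RuntimeError(explain_http_error(err.code)) from err


print(explain_http_error(406))
```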
 

Copyright © 2008-2014. Structured Dynamics LLC. All content available via Creative Commons Attribution 3.0