Magic of NSLinguisticTagger

This evening one of my friends sent me a link to iA Writer’s video featuring the latest update to their app.

I was blown away by the syntax highlighter in edit mode, which highlights words based on the parts of speech you select.

At first, I thought they were doing some sophisticated natural language processing. I’m no expert, but supporting multiple languages sounds like a tedious, if not impossible, job to do in a year, unless there is an easy way.

Apple provides the NSLinguisticTagger class as part of the iOS 5 SDK. It lets us split up natural language text and tag it with information. According to the documentation, it can identify languages, scripts and stem forms of words! NSLinguisticTagger is quite easy to use.

We start by creating an instance of NSLinguisticTagger with the tag schemes available for a language and a set of NSLinguisticTaggerOptions. The options tell the tagger what to skip, such as words, whitespace or punctuation.

var options = NSLinguisticTaggerOptions.JoinNames 
              | NSLinguisticTaggerOptions.OmitWhitespace;

var tagger = new NSLinguisticTagger (
    NSLinguisticTagger.GetAvailableTagSchemesForLanguage ("en"), 
    options);

Once we have the instance, we assign it a string to analyse:

var statement = "A quick movement of the enemy will jeopardise six gunboats";
tagger.AnalysisString = statement;

And finally, we fetch the tags with the EnumerateTagsInRange method and print them out:

tagger.EnumerateTagsInRange (new NSRange (0, statement.Length), NSLinguisticTag.SchemeLexicalClass, options, TaggerEnumerator);

...

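// Called once per token; tokenRange locates the token inside the analysed string.
void TaggerEnumerator (NSString tag, NSRange tokenRange, NSRange sentenceRange, ref bool stop)
{
    var word = statement.Substring (tokenRange.Location,
      tokenRange.Length);
    // Collect the (tag, word) pair so it can be printed later.
    words.Add (new Tuple<string, string> (tag, word));
}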
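
The callback above only collects (tag, word) pairs, so some grouping step is needed before printing one line per tag. A minimal sketch of one way to do that with LINQ is below. The TagGrouping class, its Format method and the sample pairs are hypothetical names and data made up for illustration; they are not taken from the gist, and the real words list would be filled by TaggerEnumerator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TagGrouping
{
    // Groups (tag, word) pairs by tag and renders one "Tag: word, word" line per tag.
    // GroupBy keeps tags in first-seen order and words in their original order.
    public static List<string> Format (IEnumerable<Tuple<string, string>> words)
    {
        return words
            .GroupBy (pair => pair.Item1)
            .Select (g => g.Key + ": " + string.Join (", ", g.Select (pair => pair.Item2)))
            .ToList ();
    }

    static void Main ()
    {
        // Hypothetical sample pairs standing in for what TaggerEnumerator collected.
        var words = new List<Tuple<string, string>> {
            Tuple.Create ("Adjective", "quick"),
            Tuple.Create ("Noun", "movement"),
            Tuple.Create ("Preposition", "of"),
            Tuple.Create ("Noun", "enemy"),
        };

        foreach (var line in Format (words))
            Console.WriteLine (line);
    }
}
```

Because the tag arrives as an NSString in the callback, storing it as a plain string key (as the Tuple<string, string> in the callback already does) keeps the grouping independent of any iOS types.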

The result after processing "A quick movement of the enemy will jeopardise six gunboats" looks something like this:

 Adjective: quick
 Noun: movement, enemy, gunboats
 Preposition: of
 Verb: will, jeopardise
 Number: six

It even identifies the number, which is pretty amazing!

PS: You can find the complete code here – https://gist.github.com/prashantvc/8039121
