Magic of NSLinguisticTagger

This evening a friend sent me a link to iA Writer’s video featuring the latest update to the app.

I was blown away by the syntax highlighter in edit mode, which highlights words according to the parts of speech you select.

At first, I thought they were doing some sophisticated natural language processing. I’m no expert, but supporting multiple languages sounds like a tedious, if not impossible, job to do in a year, unless there is an easy way.

Apple provides the NSLinguisticTagger class as part of the iOS 5 SDK. It lets us split up natural language text and tag it with information. According to the documentation, it can identify languages, scripts and stem forms of words! NSLinguisticTagger is quite easy to use.

We start by creating an instance of NSLinguisticTagger with the tag schemes for a language and a set of NSLinguisticTaggerOptions. The options tell the tagger which tokens to ignore, such as words, whitespace or punctuation.

var options = NSLinguisticTaggerOptions.JoinNames 
              | NSLinguisticTaggerOptions.OmitWhitespace;

var tagger = new NSLinguisticTagger (
    NSLinguisticTagger.GetAvailableTagSchemesForLanguage ("en"),
    options);

Once we have the instance, we assign it a string to analyse:

// statement is a string field; the enumerator callback reads it as well
statement = "A quick movement of the enemy will jeopardise six gunboats";
tagger.AnalysisString = statement;

Finally, we fetch the tags with the EnumerateTagsInRange method and collect them so we can print them out.

tagger.EnumerateTagsInRange (new NSRange (0, statement.Length), NSLinguisticTag.SchemeLexicalClass, options, TaggerEnumerator);


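void TaggerEnumerator (NSString tag, NSRange tokenRange, NSRange sentenceRange, ref bool stop)
{
    // Cut the token out of the analysed string; the casts cover NSRange fields that are not plain int
    var word = statement.Substring ((int) tokenRange.Location, (int) tokenRange.Length);
    words.Add (new Tuple<string, string> (tag, word));
}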

The result after processing “A quick movement of the enemy will jeopardise six gunboats” looks something like this:

 Adjective: quick
 Noun: movement, enemy, gunboats
 Preposition: of
 Verb: will, jeopardise
 Number: six

It even identifies the number. That’s pretty amazing!
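
The snippets above only collect (tag, word) pairs, so the grouped listing needs one more step. Here is a minimal sketch of how that grouping could be done with LINQ. It assumes words is a List<Tuple<string, string>> filled by the enumerator; TagReport and Format are hypothetical names used for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of NSLinguisticTagger or the original sample.
static class TagReport
{
    // Groups (tag, word) pairs by tag and renders one "Tag: word, word" line per tag.
    // GroupBy keeps tags in first-seen order and words in their original order.
    public static IEnumerable<string> Format (IEnumerable<Tuple<string, string>> words)
    {
        return words
            .GroupBy (w => w.Item1)
            .Select (g => g.Key + ": " + string.Join (", ", g.Select (w => w.Item2)));
    }
}
```

Printing is then a simple loop: foreach (var line in TagReport.Format (words)) Console.WriteLine (line);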

PS: You can find the complete code here –