Easiest Twitter App Ever (plus NSLinguisticTagger)

My very first idea for an iPhone app was something that would let me navigate the language used in tweets. I wanted an app that pulled Twitter data based on a search phrase, parsed the words in the results, ranked those words by frequency or importance, and presented that visually to the user. Then the user could pick from the resulting words and start the whole process again.

Well, I eventually created that application. It’s called TweetViz and it’s available on the store. It just uses a word frequency count to display the words most commonly associated with the search string. However, creating that app was kind of a pain. I had to learn how to use MGTwitterEngine, which is a great library but still a bit difficult: there are a bunch of dependencies, and not all the functions work together because some are JSON only and others are XML. It usually takes me a good several hours to remember how to use the library and set up the dependencies again.

Well, all that’s over with the new Twitter integration. It’s a more elegant interface to the Twitter API, and the Accounts framework takes care of authentication (although here we will use the search API, which doesn’t require authentication).

In this post I’ll put together, in less than 100 lines of code (plus libraries and boilerplate stuff), a project equal to the one that took me about a week to complete previously. We’ll query Twitter, organize the results by frequency count, eliminate the words that don’t carry meaning (am, the, a, etc.) and display it all in a treemap. I’ll use the NSLinguisticTagger class to group words that share a stem (reading, read, reads) together.

Here’s what it will look like:

First, we’re gonna use the treemap visualization library (TweetViz uses a commercial library, so I can’t post it here). You can get the original here, but I’ve made a few tweaks, so in order to follow along, download this version.

The first thing we need to do is add the Twitter framework to the project.

Once you’ve done that, open up the ViewController file, import <Twitter/Twitter.h> at the top, and add the following method (also declare it in the header):

-(void)searchTwitterWithString:(NSString *)str andNumResults:(int)num {
    NSString *urlString = @"http://search.twitter.com/search.json";
    // Query parameters; TWRequest turns these into the query string for us.
    NSDictionary *param = [NSDictionary dictionaryWithObjectsAndKeys:
                           str, @"q", 
                           [NSString stringWithFormat:@"%d", num], @"rpp", 
                           @"en", @"lang", nil];
    TWRequest *tr = [[TWRequest alloc] initWithURL:[NSURL URLWithString:urlString] 
                                        parameters:param 
                                     requestMethod:TWRequestMethodGET];
    [tr performRequestWithHandler:^(NSData *responseData, NSHTTPURLResponse *urlResponse, NSError *error) {
        // Convert the returned JSON into a dictionary.
        NSDictionary *dict = [NSJSONSerialization JSONObjectWithData:responseData options:0 error:&error];
        if (error) {
            NSLog(@"%@", error.localizedDescription);
        } else {
            NSLog(@"%@", dict);
        }
    }];
}

The TWRequest object is structured to match the Twitter API. Each request has three parts: 1) the URL, 2) the parameters, and 3) the request method.

We just use the URL that Twitter gives us in its documentation. We don’t have to add the parameter string ourselves; TWRequest will do that for us based on the NSDictionary supplied to the parameters argument. In cases where a parameter takes multiple values, like the users/lookup API call, just convert those values into a comma-delimited string.
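
To make the comma-delimited trick concrete, here is a minimal sketch. The helper name LookupParametersForScreenNames is made up for this example (it is not part of the Twitter framework), and it assumes the screen_name parameter of users/lookup is the one you want to fill:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the Twitter framework): builds a parameter
// dictionary in which one parameter carries several values as a single
// comma-delimited string, as users/lookup expects for screen_name.
static NSDictionary *LookupParametersForScreenNames(NSArray *screenNames) {
    // Join the individual values with commas: "alice,bob,carol"
    NSString *joined = [screenNames componentsJoinedByString:@","];
    return [NSDictionary dictionaryWithObjectsAndKeys:joined, @"screen_name", nil];
}
```

The resulting dictionary could then be handed to the parameters argument of a TWRequest in the same way as the param dictionary above.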

In the block we use the new NSJSONSerialization class to convert the returned JSON to an NSDictionary. It’s so much easier . . .
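
If you want to play with the parsing step on its own, something like the sketch below should do. TweetTextsFromJSONData is a hypothetical helper, and the shape it expects (a top-level object with a "results" array whose entries have a "text" key) is only what this post’s code relies on, not a full description of the search response:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: parses a search-style JSON payload and returns the
// "text" value of every entry under "results". Returns nil if the data is
// missing or the top-level JSON value is not an object.
static NSArray *TweetTextsFromJSONData(NSData *data) {
    if (data == nil) {
        return nil; // JSONObjectWithData: raises an exception on nil data
    }
    NSError *jsonError = nil;
    id parsed = [NSJSONSerialization JSONObjectWithData:data options:0 error:&jsonError];
    if (![parsed isKindOfClass:[NSDictionary class]]) {
        return nil; // parse failure (parsed is nil) or unexpected top-level type
    }
    NSMutableArray *texts = [NSMutableArray array];
    for (NSDictionary *result in [parsed objectForKey:@"results"]) {
        NSString *text = [result objectForKey:@"text"];
        if (text != nil) {
            [texts addObject:text];
        }
    }
    return texts;
}
```

Checking for nil data first matters because a failed network request hands the handler no data at all.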

Now call our method in the viewDidLoad method:

[self searchTwitterWithString:@"perry" andNumResults:200];

We’re using perry as our search string; depending on your interests, you may think I’m looking for Rick Perry or Katy Perry. The results we get indicate that most people would rather talk about Katy than Rick . . . and I guess I don’t blame them. You should get a log output that looks like this:

Next we want to combine all the statuses into one giant status. We also want to strip out some of the stuff we aren’t going to want to look at. Let’s remove all the @mentions, the hashtags, the links, and all the stopwords.

Change the NSLog(@"%@", dict) to the following:

            // Concatenate the text of every tweet into one long string.
            NSString *n = @"";
            for (NSDictionary *d in [dict objectForKey:@"results"]) {
                n = [n stringByAppendingFormat:@" %@", [d objectForKey:@"text"]];
            }
            // Patterns for the pieces we want to strip out.
            NSDictionary *regexStrings = [NSDictionary dictionaryWithObjectsAndKeys:
                                     @"http.+?[ \n\"]", @"links", 
                                     @"@.+?[ \n]", @"mentions", 
                                     @"#.+?[ \n]", @"hashtags", nil];
            for (NSString *key in [regexStrings allKeys]) {
                NSRegularExpression *regex = [[NSRegularExpression alloc] initWithPattern:[regexStrings objectForKey:key] options:0 error:nil];
                n = [regex stringByReplacingMatchesInString:n options:0 range:NSMakeRange(0, [n length]) withTemplate:@""];
            }
            NSLog(@"%@", n);
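
To see what those three patterns actually do, here is the same idea pulled out into a standalone function. StripTweetNoise is a name I made up for this sketch, and it applies the patterns in a fixed order (links, mentions, hashtags) rather than in dictionary order:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes links, @mentions and #hashtags from a string
// using the same three patterns as the code above.
static NSString *StripTweetNoise(NSString *text) {
    NSArray *patterns = [NSArray arrayWithObjects:
                         @"http.+?[ \n\"]",  // links, up to the next space, newline or quote
                         @"@.+?[ \n]",       // mentions, up to the next space or newline
                         @"#.+?[ \n]",       // hashtags, up to the next space or newline
                         nil];
    NSString *cleaned = text;
    for (NSString *pattern in patterns) {
        NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                               options:0
                                                                                 error:NULL];
        cleaned = [regex stringByReplacingMatchesInString:cleaned
                                                  options:0
                                                    range:NSMakeRange(0, [cleaned length])
                                             withTemplate:@""];
    }
    return cleaned;
}
```

One thing worth knowing: each pattern needs a trailing space or newline to match, so a hashtag, mention, or link that sits at the very end of the string is left in place. For a rough demo like this one that seems acceptable, since only the last token of the combined string is affected.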

Now if you run it, you’ll get one long string with a bunch of the links and other things removed. Like this:

Now it’s time to break out the NSLinguisticTagger class. We’ll be using this to color code our common words according to whether they are verbs, nouns, or names of people. We could also use this class to determine whether a phrase is a Named Entity, but we won’t do that here.

Continuing in that same method, add this code (replacing that last NSLog call):

            // Create a tagger that supports both lexical class and lemma tagging.
            NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] 
                initWithTagSchemes:[NSArray arrayWithObjects:NSLinguisticTagSchemeNameTypeOrLexicalClass, NSLinguisticTagSchemeLemma, nil]
                           options:(NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation)];
            [tagger setString:n];
            tags = [NSMutableDictionary dictionary];
            // Walk the whole string, asking for the lemma of each token.
            [tagger enumerateTagsInRange:NSMakeRange(0, [n length]) 
                                  scheme:NSLinguisticTagSchemeLemma 
                                 options:(NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation) 
                              usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
                NSLog(@"%@, %@", [n substringWithRange:tokenRange], tag);
            }];

Sorry about how hard that last block is to read. What we’re doing here is first creating a tagger. Our tagger can handle both lexical class tagging (verb, noun, particle, etc.) and lemma tagging (word stemming: reading, read, and reads all share the same stem). We pass in an array to tell the tagger what kinds of things it should tag.

We’re also giving it some other options. I only half understand them, so I’m not going to try to explain them; read the Class Reference for more info. I only want it to look at the words of the text, so those two bitmask options (omit whitespace, omit punctuation) are what I wanted. It works.

Next, we set the string the tagger will be working with. This is essential; if you get an NSRangeException error, you may have skipped this part.

We are also creating a dictionary that we’ll use in a minute; add an NSMutableDictionary instance variable called tags to the header file now.

Then we tell it to start tagging. We pass in the string range, the scheme for this tagging pass, and some more options.

Finally we give it a block to deal with the tagging process. Inside the block we get the tag for the current token, the range of that token in the original string, the range of the enclosing sentence, and a pointer to a Boolean. If we want the tagging to stop at some point, we set that BOOL to YES in our block.
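
The stop pointer is easy to overlook, so here is a small sketch that uses it. FirstTokens is a hypothetical helper, it uses the token type scheme instead of the lemma scheme just to keep the example simple, and it assumes ARC like the rest of the code in this post:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns at most `limit` word tokens from `text`,
// ending the enumeration early through the BOOL pointer the block receives.
static NSArray *FirstTokens(NSString *text, NSUInteger limit) {
    NSMutableArray *tokens = [NSMutableArray array];
    if (limit == 0) {
        return tokens;
    }
    NSLinguisticTaggerOptions opts = NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation;
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
        initWithTagSchemes:[NSArray arrayWithObject:NSLinguisticTagSchemeTokenType]
                   options:opts];
    [tagger setString:text]; // must be set before enumerating
    [tagger enumerateTagsInRange:NSMakeRange(0, [text length])
                          scheme:NSLinguisticTagSchemeTokenType
                         options:opts
                      usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
        [tokens addObject:[text substringWithRange:tokenRange]];
        if ([tokens count] >= limit) {
            *stop = YES; // tells the tagger not to call the block again
        }
    }];
    return tokens;
}
```

Calling FirstTokens(n, 10) on the combined tweet string would be a quick way to peek at the first few tokens without logging all of them.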

If you run it now, you’ll see we got tags:

We want our tagger to do two things: first, stem the word so we can count the token (a fancy word that means . . . word), and second, tell us the lexical class of that token. We’ll be constructing a dictionary (that’s what the tags dictionary is for) that we’ll then use to populate our treemap. So now we’ll add some more code that does that double-duty tagging and also reorganizes the data so it’s easier to use.

            NSArray *stopwords = [NSArray arrayWithContentsOfFile:[[NSBundle mainBundle] pathForResource:@"stopwords" ofType:@"plist" ]];
            [tagger enumerateTagsInRange:NSMakeRange(0, [n length]) 
                                  scheme:NSLinguisticTagSchemeLemma 
                                 options:(NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation) 
                              usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
                                  // tag is the lemma; tag2 is the lexical class of the same token.
                                  NSString *tag2 = [tagger tagAtIndex:tokenRange.location scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass tokenRange:NULL sentenceRange:NULL];
                                  if (tag != nil && ![stopwords containsObject:tag]) {
                                      if ([tags objectForKey:tag] == nil) {
                                          // First time we see this lemma: store its class and a count of one.
                                          NSMutableDictionary *tagDict = [NSMutableDictionary dictionaryWithObjectsAndKeys:tag2, @"tense", [NSNumber numberWithInt:1], @"count", nil];
                                          [tags setObject:tagDict forKey:tag];
                                      } else {
                                          // Seen before: bump the count.
                                          NSMutableDictionary *tagDict = [tags objectForKey:tag];
                                          [tagDict setObject:[NSNumber numberWithInt:([[tagDict objectForKey:@"count"] intValue] + 1)] forKey:@"count"];
                                      }
                                  }
                              }];
            NSLog(@"%@", tags);

The first thing we’re doing is reading in a plist of ‘stop words.’ These are words like be, have, a, to, like, etc.: words that are functional in the English language but don’t help us when we are doing analysis, so we take them out. You can download that file here or get it at the end of the post with the finished code. This list also includes a few things that are common to Twitter text, like RT and &quot;, which are also not helpful if we’re looking to understand associations between our search phrase and the words people use when talking about that topic.

This stopwords list is from my other project, and it contains every tense of each verb. That isn’t necessary with the NSLinguisticTagger class, so the list could be reduced to just the basic forms of those verbs, but it’s good enough for now.

Next we invoke the same tagger method. The difference is when we get into the block. The first thing we do is get our second tag, using the tagAtIndex:scheme:tokenRange:sentenceRange: method. We can call this method and give it an index number (representing the position in our main string) and it will give us back the tag with the specified scheme. In this way we can get our second tag scheme inside the block that’s providing the first one. There may be another way to do this, but this is what I came up with so far.

So now we have tag, which tells us the lemma or stem, and tag2, which is the lexical class.

Next we’re going to construct our dictionary. We first check whether the tag is in our stopwords array, which means it’s a word that is probably common but not meaningful. If it is, we do nothing with it. We also need to make sure the tag isn’t nil, because we’ll be using this value as the key to our dictionary and we cannot use a nil key.

The next if statement just checks whether the key has already been added. If not, we create the dictionary that contains the lexical class and a count of one. If the key does already exist, we increment the count by one.
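
Stripped of the tagger, the counting logic boils down to something like the following sketch. CountWords is a hypothetical helper that works on a plain array of words, and it only keeps the count (not the lexical class) to stay short:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts how often each word occurs in `words`,
// skipping anything listed in `stopwords`.
static NSDictionary *CountWords(NSArray *words, NSArray *stopwords) {
    NSSet *stopSet = [NSSet setWithArray:stopwords]; // faster lookups than an array
    NSMutableDictionary *counts = [NSMutableDictionary dictionary];
    for (NSString *word in words) {
        if ([stopSet containsObject:word]) {
            continue; // common but not meaningful
        }
        // objectForKey: returns nil for a new word, and [nil intValue] is 0,
        // so the first occurrence ends up with a count of 1.
        NSNumber *current = [counts objectForKey:word];
        [counts setObject:[NSNumber numberWithInt:[current intValue] + 1] forKey:word];
    }
    return counts;
}
```

Looking the key up directly avoids building the allKeys array on every token, which starts to matter once a couple hundred tweets are in play.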

That’s it. We now have a dictionary that contains a frequency count of all the tagged tokens in our twitter results.

Next we need to integrate this data into our treemap UI. We’ll restructure our data and fit it into the existing sortedKeys and data instance variables so we can leave most of the treemap code untouched.

Add this to the bottom of the searchTwitter method after the close of the [tagger enumerateTagsInRange method call:

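            // Keep only the tags seen more than once that aren't the search term itself.
            NSMutableDictionary *mut = [NSMutableDictionary dictionary];
            for (NSString *k in tags) {
                if ([[[tags objectForKey:k] objectForKey:@"count"] intValue] > 1 && ![k isEqualToString:str]) {
                    [mut setObject:[[tags objectForKey:k] objectForKey:@"count"] forKey:k];
                }
            }
            // The request handler is not guaranteed to run on the main thread,
            // so hop to the main queue before touching the treemap view.
            dispatch_async(dispatch_get_main_queue(), ^{
                data = mut;
                sortedKeys = [[data allKeys] sortedArrayUsingSelector:@selector(caseInsensitiveCompare:)];
                [(TreemapView *)self.view createNodes];
            });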

The createNodes method is private on the TreemapView object; add the method to the header so that we can call it. Also, comment out everything in viewDidLoad except for our call to searchTwitter.

In this method we are iterating through our tags object and including all the tags that have a count greater than one and aren’t our search string. We then put the data into the two instance variables. This just makes it easier, so we aren’t changing a bunch of code we don’t need to. In a proper implementation I’d probably go through and change all the treemap methods to deal with my own data structure.
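
As a standalone illustration of that filter-then-sort step, here is a sketch that works on a plain word-to-count dictionary. SortedFrequentKeys is a hypothetical helper, not something in the treemap library:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the keys of `counts` whose count is greater
// than one and that differ from `searchTerm`, sorted case-insensitively.
static NSArray *SortedFrequentKeys(NSDictionary *counts, NSString *searchTerm) {
    NSMutableArray *keys = [NSMutableArray array];
    for (NSString *key in counts) {
        BOOL frequent = [[counts objectForKey:key] intValue] > 1;
        if (frequent && ![key isEqualToString:searchTerm]) {
            [keys addObject:key];
        }
    }
    return [keys sortedArrayUsingSelector:@selector(caseInsensitiveCompare:)];
}
```

Note that the comparison against the search term is case-sensitive, just like in the code above, so a search for "Perry" would not filter out a "perry" key.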

Then we call createNodes on the treemap view. We call this instead of reloadData, because this treemap class calls resizeNodes on the reloadData call. If the number of new nodes differs from the number of old nodes, it will throw an error.

Go ahead and run it now. You should have a treemap populated with your counts and tags from twitter:

Finally, let’s change the updateCell method to change the cell’s color based on what kind of tag it is (noun, verb, etc.) and let’s change the tapped-cell method so that it calls the searchTwitter method. These are delegate methods of the Treemap class and already exist in the file.

First the updateCell method:

- (void)updateCell:(TreemapViewCell *)cell forIndex:(NSInteger)index {
    NSString *key = [self.sortedKeys objectAtIndex:index];
    NSNumber *val = [self.data valueForKey:key];
    cell.textLabel.text = key;
    cell.valueLabel.text = [val stringValue];
    // The "tense" entry actually holds the lexical class (Noun, Verb, ...).
    NSString *lexiClass = [[tags objectForKey:key] objectForKey:@"tense"];
    UIColor *cellBack;
    if ([lexiClass isEqualToString:@"Noun"]) {
        cellBack = [UIColor colorWithRed:0.5 green:0.0 blue:(float)index/(self.sortedKeys.count / 2) alpha:1.0];
    } else if ([lexiClass isEqualToString:@"Verb"]) {
        cellBack = [UIColor colorWithRed:0.5 green:0.5 blue:(float)index/(self.sortedKeys.count / .5) alpha:1.0];
    } else if ([lexiClass isEqualToString:@"PersonalName"]) {
        cellBack = [UIColor colorWithRed:(float)index/(self.sortedKeys.count / 2) green:0.5 blue:0.5 alpha:1.0];
    } else {
        cellBack = [UIColor colorWithRed:0.3 green:(float)index/(self.sortedKeys.count / .5) blue:0.3 alpha:1.0];
    }
    cell.backgroundColor = cellBack;
}

First we get the “tense” (really the lexical class) for each tag from our tags dictionary, then we use that to create a different color algorithm for each kind. Finally, we set the cell background color to our calculation.

Now the treemapView:tapped: method:

#pragma mark -
#pragma mark TreemapView delegate

- (void)treemapView:(TreemapView *)treemapView tapped:(NSInteger)index {
    // Look up the tapped cell and search again using its label as the query.
    TreemapViewCell *cell = (TreemapViewCell *)[self.view.subviews objectAtIndex:index];
    NSString *searchS = cell.textLabel.text;
    [self searchTwitterWithString:searchS andNumResults:200];
    [(TreemapView *)self.view reloadData];
}

We use the index to get the cell. From the cell we can get the tag name for that cell. We pass that into our searchTwitter method and we’ll reload a new data set and display it. It takes a little while to reload the data, so it’s a little clunky, but works for a demo!

This is what it should look like:

You can download the completed code here. If you’re interested in a deeper treatment of either the NSLinguisticTagger or Twitter integration, both are covered in iOS 5 By Tutorials.
