Twitter seeks to do better at inferring its users’ consumer and political preferences, gender, age, and more.
By David Talbot.
Twitter began selling promoted tweets in 2010, but it has always faced challenges in knowing which of those ads should be delivered to which Twitter accounts. Most Twitter users don’t give up their locations, and many don’t reveal their identities in their profiles. And mining tweets themselves for insights is hard because the language is not only short but filled with slang and abbreviations.
Now, as Twitter plans to sell shares to the public, its success will depend in part on how much better it can get at deciphering tweets. Solving that technological puzzle would help Twitter get better at selling the right promoted messages at the right times, and it could possibly lead to new revenue-producing services.
Twitter hasn’t done badly so far; the analyst firm eMarketer predicts ad revenue will double this year, to $583 million. But the company is still trying to get smarter about analyzing tweets. It has bought startups such as Bluefin Labs, which can tell which TV show—and even which precise airing of a TV advertisement—people have tweeted about (see “A Social-Media Decoder”). It has also invested in companies such as Trendly, a Web analytics provider that reveals how promoted tweets are being read and shared. And just last week, Twitter blogged that it is continually running experiments on how to do better at tasks such as suggesting relevant content.
For its next steps, Twitter might consider tapping the latest academic research. Here are some areas it could concentrate on.
Location
Fewer than 1 percent of tweets are “geotagged,” or voluntarily labeled by users with location coördinates. Much of the time, Twitter can use your computer’s IP address and get a good approximation. But that’s not the same as knowing where you are. In mobile computing, IP addresses are reassigned frequently—and some people take steps to obscure their true IP address.
But recent research has shown that the locations of friends—defined as people you follow on Twitter who are also following you—can be used to infer your location to within 10 kilometers half the time. It turns out that many Twitter friends live near one another, says David Jurgens, a computer scientist at Sapienza University of Rome, who did this research while at HRL Laboratories in Malibu, California. If some of your friends have made geotagged tweets or revealed their location in a Twitter profile, Jurgens says, that may be enough to show where you probably are.
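To make the idea concrete, here is a minimal sketch in Python of guessing a user's location from mutual friends. It is an illustration under stated assumptions, not necessarily the method used in the research: the helper names (`mutual_friends`, `infer_location`, `haversine_km`) are hypothetical, and the rule of picking the friend location with the smallest total distance to all the others is a simple stand-in chosen for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def mutual_friends(following, followers):
    """Friends as the article defines them: accounts you follow that also follow you."""
    return set(following) & set(followers)

def infer_location(friend_locations):
    """Guess a location from friends' known (lat, lon) points.

    Assumption for this sketch: return the friend location whose summed
    distance to all other known friend locations is smallest. Friends with
    no known location are passed as None and ignored.
    """
    points = [p for p in friend_locations if p is not None]
    if not points:
        return None
    return min(points, key=lambda c: sum(haversine_km(c, p) for p in points))

# Hypothetical data: three friends around Los Angeles, one in Rome, one unknown.
friends = [(34.03, -118.78), (34.02, -118.49), (34.05, -118.24), (41.90, 12.50), None]
guess = infer_location(friends)
```

Choosing one of the friends' own locations, rather than averaging coordinates, keeps a single distant friend (the one in Rome here) from dragging the guess into the middle of an ocean.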
Demographics
Natural-language processing gets better all the time. Hundreds of markers—word choices, abbreviations, slang terms, and letter and punctuation combinations—signify ever-finer strata of demographic groups and their interests.
Some things, like political leanings, are often not hard to figure out from the right hashtags or from sentiments associated with terms like “Obamacare,” says Dan Weld, a computer scientist at the University of Washington.
Meanwhile, Derek Ruths, a computer scientist who explores natural-language processing at McGill University, has recently shown that linguistic cues can identify U.S. Twitter users’ political orientation with 70 to 90 percent accuracy and can even identify their age (within five years) with 80 percent accuracy. For example, words that most strongly suggest someone is between the ages of 25 and 30 include “for,” “on,” “photo,” “I’m,” and “just,” he says. Generally, these users have a somewhat stronger allegiance to grammar than younger, slang-loving users, he says. And as with location, the profiles of the people they follow provide clues to their demographics.
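As a rough illustration of what a linguistic cue looks like in practice, the Python sketch below measures how often the five marker words quoted above appear in a user's tweets. It is only a toy under stated assumptions: the function names are hypothetical, the marker list is limited to the words mentioned here, and a real classifier would combine many more signals rather than rely on a single rate.

```python
import re

# The five example words quoted in the article for ages 25 to 30.
MARKERS_25_TO_30 = {"for", "on", "photo", "i'm", "just"}

def tokenize(text):
    """Lowercase the text, normalize curly apostrophes, and split into word tokens."""
    text = text.lower().replace("\u2019", "'")
    return re.findall(r"[a-z']+", text)

def marker_rate(tweets, markers=MARKERS_25_TO_30):
    """Fraction of all tokens in the tweets that are marker words (0.0 if no tokens)."""
    tokens = [tok for tweet in tweets for tok in tokenize(tweet)]
    if not tokens:
        return 0.0
    return sum(tok in markers for tok in tokens) / len(tokens)

# Hypothetical usage: a higher rate is one weak hint, not a verdict.
rate = marker_rate(["I'm just on it"])
```

A rate like this would be one feature among hundreds; on its own it says little about any individual user.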
But even if Twitter can make pretty good guesses about 90 percent of its users, “even missing 10 percent means you miss a lot of people,” says Ruths. “If I were Twitter, I’d want to close that 10 percent gap. And you’d want to find out real details like who someone’s mother is. If it’s Mom’s birthday, you want to tell those people how to order flowers. Twitter can’t do that—yet.”
Making Sense of Breaking News
One of the major uses of Twitter is to report on breaking news events (see “Can Twitter Make Money?”). With so many people tweeting little nuggets of news and other current information, tools have even been built to tease out play-by-play sports action (see “Researchers Turn Twitter into Real-Time Sports Commentator”).
But in major emergencies, such as a terrorist attack or an earthquake, so many tweets are generated that making sense of them in real time is tricky. Highlighting the most meaningful ones could cement Twitter as a must-visit service, but how would it pick them?
A group at the University of Colorado, Boulder, is using natural-language processing to highlight the most relevant tweets in a disaster. Recent research shows significant progress in differentiating tweets about personal reflections, emotional expressions, or prayers from ones containing hard information about where a fire is burning or whether medical supplies are needed.
In one project, the group was able to identify valuable, news-containing tweets with 80 percent accuracy; these tend to contain language that is formal, objective, and lacking in personal pronouns. Last year they extended that work to classify the important tweets by categories such as damage reports, requests for aid, and advice. “We are trying to figure out which tweets have the most useful information to the people on the ground,” says Martha Palmer, a professor of linguistics and computer science at Boulder.
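One of the cues mentioned above, the absence of personal pronouns, is simple enough to sketch in Python. The example below is a hedged illustration rather than the Boulder group's classifier: the pronoun list and the function names are assumptions made for this sketch, and it ranks tweets by a single signal where the research describes several.

```python
import re

# A small, hand-picked list of personal pronouns (an assumption for this sketch).
PERSONAL_PRONOUNS = {
    "i", "me", "my", "mine", "myself",
    "we", "us", "our", "ours",
    "you", "your", "yours",
}

def pronoun_rate(tweet):
    """Fraction of alphabetic tokens that are personal pronouns (0.0 if no tokens)."""
    # Splitting on non-letters turns "I'm" into "i" and "m", so the pronoun still counts.
    tokens = re.findall(r"[a-z]+", tweet.lower())
    if not tokens:
        return 0.0
    return sum(tok in PERSONAL_PRONOUNS for tok in tokens) / len(tokens)

def rank_by_objectivity(tweets):
    """Order tweets so those with the fewest personal pronouns come first."""
    return sorted(tweets, key=pronoun_rate)

# Hypothetical tweets written for this example.
tweets = [
    "Praying for my family, I hope we are safe",
    "Fire burning near Highway 36, medical supplies needed at shelter",
]
ranked = rank_by_objectivity(tweets)
```

Here the second tweet, which reports where a fire is burning and what is needed, sorts ahead of the personal reflection.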