November 10, 2008

Reading, Life Indexing, and Self-Driving Cars

After the last entry, in which I wrote about an article on self-driving cars, I was looking forward to the next installment of the series of stories it was based on. I recommend reading that article if you're interested in the topic, but it didn't really provide enough grist for another entry on applied AI.

But while reading, I had an interesting thought. I noticed the author making many factual claims, which I believed, but I was somewhat surprised that they were not justified or explained. The standard blogging technique is simply to link to background reading or supporting arguments when making claims that require support but whose justification doesn't fit within the scope of the entry. This is not always easy, though, and I can empathize with the author. He, like most technology professionals, probably does a lot of reading, both for pleasure and for work, and doesn't always realize while reading which points will be important to remember and reference later.

This is why I would really enjoy an application or web browser add-on that could index every webpage I read, all day, every day, on every computer. Such an application would take advantage of the rapidly decreasing price of storage, which makes indexing even the entirety of the text that voracious readers consume a trivial task. The closest thing I have seen to this is Google's "Web History" widget, which keeps track of your Google searches. If you wanted to build something like this to be immediately useful, you could use whatever web history you already have as a starting point, even if it doesn't cover the entirety of your reading list.

Now, applications. First, the one that inspired this post: the supporting link that eludes the voracious reader. The dream application would work like this: while writing a blog post (or journal article, or whatever), you realize a claim you are making would be more convincing with a direct citation. You simply copy a sequence of words (perhaps the "anchor text" you would use if you were hyperlinking) and paste it into the indexing engine.

In its simplest form, the results can come from a Google-style bag-of-words string-matching algorithm, ranked by relevance. More advanced approaches could use something like the topic models we've discussed, returning documents that match not just the keywords but the topics represented by the search query. The most advanced step (and the subject of much of my research) is to use natural language understanding techniques to index the relations between referents in the documents and in the query, and to find documents that make similar claims about those relations as your query phrase does. Of course, there are many incremental steps between current research and this last technology, but there is no reason why the data collection cannot begin right now, even without any simplification or compression of the text data.
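To make the simplest version concrete, here's a minimal Python sketch of a bag-of-words reading index: an inverted index over saved page text, queried with a pasted phrase and ranked by crude term overlap. The function names, example URLs, and in-memory storage are all illustrative assumptions on my part, not an existing tool.

```python
import re
from collections import defaultdict, Counter

# Toy personal reading index: a bag-of-words inverted index over saved pages.
# A real add-on would hook the browser's page loads and persist this to disk.
index = defaultdict(Counter)   # term -> {url: term frequency}
pages = {}                     # url -> raw text

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def save_page(url, text):
    """Called whenever a page is read; adds its words to the index."""
    pages[url] = text
    for term in tokenize(text):
        index[term][url] += 1

def search(query, k=5):
    """Rank saved pages by how strongly they match the query terms."""
    scores = Counter()
    for term in set(tokenize(query)):
        for url, tf in index[term].items():
            scores[url] += tf
    return scores.most_common(k)

# Usage: index a couple of pages, then paste in your would-be anchor text.
save_page("http://example.com/self-driving", "self driving cars could reshape cities ...")
save_page("http://example.com/topic-models", "topic models cluster words into themes ...")
print(search("self driving cars reshape cities"))
```

Even this toy version shows how little machinery the data-collection and lookup steps actually require; the hard parts are the more advanced ranking approaches described above.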

October 28, 2008

Confusing Sarah Palin (chatbot)

In what I hope to make a recurring series, today I present and analyze the results from a chat I had with a chatbot. You can try this one for yourself over at Chatbotgame.com. I tried conversing with the Sarah Palin bot, with the following results:

Here's how my train of thought went: I assumed it would work the way many advanced chatbots do, very strategically. It starts by trying to grab control of the conversation with a very pointed statement, as many do. Then, what usually happens is that it picks out keywords from the human's messages and matches them to canned responses, or response templates. In addition, it will usually do something like keeping track of a "topic," probably defined as something like the main noun in the subject of the previous sentence. This is useful when trying to resolve pronouns in the next sentence.
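To illustrate the kind of machinery I was expecting, here's a toy Python sketch of keyword-to-canned-response matching with a crude topic memory. The rules, responses, and the pronoun heuristic are all invented for illustration; this is not the actual bot's code.

```python
import re

# Toy sketch of a keyword-driven chatbot with a crude "topic" memory.
# The rules, responses, and pronoun heuristic are invented for illustration.
RULES = {
    "economy": "The fundamentals of our economy are strong.",
    "energy": "We need an all-of-the-above energy plan.",
    "taxes": "I will not raise your taxes.",
}
DEFAULT = "You betcha!"

topic = None  # rough notion of what we were just talking about

def respond(user_line):
    global topic
    words = re.findall(r"[a-z]+", user_line.lower())
    for keyword, canned in RULES.items():
        if keyword in words:
            topic = keyword            # remember the topic for pronoun resolution
            return canned
    if topic and ("it" in words or "that" in words):
        return RULES[topic]            # resolve the pronoun against the last topic
    return DEFAULT

print(respond("What do you think about the economy?"))
print(respond("And how would you fix it?"))   # "it" resolves to "economy"
```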

However, it quickly became clear that this was not an advanced chatbot (obviously, the word 'game' in the site's title should have suggested that to me). Once I noticed that, I looked around the site for some explanation, and found something kind of cool. This is a Web 2.0 chatbot. In other words, like other Web 2.0 sites, it relies on user input for its value. This is a cool idea, and one I think could be very useful for some natural language processing tasks, but it doesn't look like they have the right framework here.

The system they use requires users to enter very rigid chatbot response rules, based on keywords used by the human user and by the chatbot in the previous exchange. This kind of setup gives you very little flexibility, and frankly, even as a game I don't find it very interesting or fun. One problem is that with this setup, there is no memory. If you want to talk about abortion two sentences in a row, you have to use the word 'abortion' in both sentences. In addition, as the image above shows, I tricked the Palin chatbot into supporting abortion by inspecting its rule set and constructing a question that used the term 'abortion' but with negation. Now, it is possible for this system to avoid that problem, if enough users give negative ratings to responses that don't make sense. However, I would speculate that this framework doesn't have the flexibility to implement any real intelligence, and further speculate that the best possible result is one where only the vaguest answers have positive scores, since they are least likely to be obviously wrong and earn negative ratings from users.

Is there any other way to prevent a chatbot from being tricked in this manner? Well, one way is to make the rule set so complex and non-deterministic that no one could predict what response would occur. But then, you most likely wouldn't have enough people using the system to get meaningful statistics on the usefulness of the different responses, and your quality of response would still be quite low.

October 23, 2008

Human Language Technology on Cellphones

There has been a lot of buzz lately about the new Google Android software, and the recently released G1 mobile phone (Warning: Flash) running that software. I thought this would be a good time to do some brainstorming about what kinds of AI, specifically natural language related, would be useful on a mobile device like a cellphone.

The first place I thought of is the text messaging interface. Have you ever been texting someone and wanted to use the word 'me', only to have the word 'of' come up? In all likelihood this is due to basic word distribution probabilities in the English language ('of' is used more often than 'me'). But in certain contexts, 'me' is much more likely. If I start a sentence with 'Pick', I'm more likely to follow with 'me' (as in "Pick me up") than with 'of' (as in, uh, "Pick of the litter"). This is a simple engineering trick, and it is used heavily in things like speech recognizers to predict the next word. When a single word of history is used to predict the next word, it is called a 'bigram model'. More generally it is called an "N-gram model," which uses N-1 words of history to predict the Nth word in a sequence.
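Here's a minimal Python sketch of that idea: count bigrams from a tiny stand-in corpus and rank the handset's candidate words by the previous word. The toy corpus is obviously invented; the point is only the counting and lookup.

```python
from collections import defaultdict, Counter

# Bigram model sketch: predict the next word from one word of history.
# The tiny "corpus" below stands in for a real text-message dataset.
corpus = [
    "pick me up at eight",
    "pick me up after class",
    "best pick of the litter",
]

bigrams = defaultdict(Counter)  # previous word -> counts of next words
for message in corpus:
    words = message.lower().split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

def predict(prev_word, candidates):
    """Rank candidate next words by P(candidate | prev_word)."""
    counts = bigrams[prev_word.lower()]
    total = sum(counts.values()) or 1
    return sorted(candidates, key=lambda w: counts[w] / total, reverse=True)

print(predict("pick", ["of", "me"]))  # -> ['me', 'of'] given this corpus
```

The whole model is just a table of counts, which is why it fits comfortably on a phone and can even be updated as the user types.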

What would it take to do this? Well, first, I think to work best it would take a large dataset of text messages. While there is plenty of data for learning these statistics in newspaper and speech datasets, text messages have their own syntactic style and specialized vocabulary that should be learned. Since most of the phone companies are sending your private data to the government anyway, we might as well use it for learning statistics before they pass it along! In addition, storing these statistics (bigram probabilities) takes some amount of memory, but not prohibitively much (going beyond bigrams, however, may be prohibitive on a mobile phone). And since texting is much slower than speaking, the CPU power needed to keep up is probably not much of a barrier either.
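For a rough sense of scale, here's a back-of-envelope calculation; every number in it is an assumption I'm making for illustration, not a measurement.

```python
# Back-of-envelope memory estimate for an on-phone bigram table.
# All numbers below are illustrative assumptions, not measurements.
vocab_size = 10_000         # distinct words worth tracking for texting
successors_per_word = 50    # next-word entries actually stored per history word
bytes_per_entry = 8         # compact word id plus a count or probability

bigram_bytes = vocab_size * successors_per_word * bytes_per_entry
print(f"bigram table: ~{bigram_bytes / 1e6:.0f} MB")   # ~4 MB

# Each additional word of history multiplies the table by roughly another
# factor of successors_per_word, which is where a handset runs out of room.
print(f"trigram table: ~{bigram_bytes * successors_per_word / 1e6:.0f} MB")  # ~200 MB
```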

Another place for AI might be in learning usage patterns for making phone calls. For instance, a very simple change, which probably shouldn't even be considered AI, would be to sort your contacts list by frequency of calling rather than alphabetically. On my phone, I spell my most-called contacts' names starting with a space so they appear at the top, but it would be trivial, I think, to build this into the phone. It could quickly get more complex by using a learning algorithm to figure out who I'm likely to be calling depending on time, location (specifically using GPS or triangulation from cell-phone towers, or more coarsely from my manually entered home address), data in my calendar or planner, or other environmental factors. For example, I call my parents every Sunday night, I tend to call my friends and family on their birthdays, and I am likely to call my friends in Milwaukee if I am in Milwaukee. These are just a few default "features" that probably apply to everyone, but a good learning algorithm could learn its own features, like noticing that I call my significant other in Minneapolis more often when I'm not in Minneapolis.
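As a sketch of how the simple version and the learned version might fit together, here's a toy Python ranking function: call frequency as the base score, nudged by a couple of hand-picked context features. The contacts, features, and weights are all invented; a real system would learn the weights from the call log.

```python
from collections import Counter
from datetime import datetime

# Toy sketch: rank contacts by how likely you are to call them right now.
# Plain call frequency is the base score; a couple of hand-picked context
# features nudge it. Weights are invented for illustration only.
call_log = ["Mom", "Alice", "Mom", "Bob", "Alice", "Mom"]
contact_city = {"Mom": "Milwaukee", "Alice": "Minneapolis", "Bob": "Milwaukee"}

def rank_contacts(now=None, my_city=None):
    now = now or datetime.now()
    freq = Counter(call_log)
    scores = {}
    for name, count in freq.items():
        score = float(count)
        if now.strftime("%A") == "Sunday" and name == "Mom":
            score += 2.0                      # the weekly Sunday call
        if my_city and contact_city.get(name) == my_city:
            score += 1.0                      # friends in the city I'm visiting
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)

print(rank_contacts(my_city="Milwaukee"))
```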

So what would this feature take? Well, it would require a user to have lots of information in their phone, so that the learning algorithm can use whatever is relevant. Fortunately, with developer platforms like the iPhone developer's kit and the Android developer's kit (from Google), and improving access to data and the web, it should be easier for phones to get at this data, simply by accessing your Google calendar or Facebook friends (for example).

So, where else might AI be able to improve the user experience in mobile phones? What would it take to actually implement?

October 20, 2008

Follow-up on topic models

After talking with some co-workers in the lab, I decided to re-run some of the experiments from below with different numbers of topics. The rationale was that five topics seemed like a number that's kind of in no-man's land. That is, you could say that there are fewer topics (foreign policy, domestic policy, and campaign b.s. was one factorization we came up with), or you could say that there are more topics (Iraq, Iran, Afghanistan, health care, energy policy, terrorism, tax policy, etc.), but choosing exactly five topics requires a very contrived and odd factorization of the discussion topics. There is probably not enough data to meaningfully try to figure out more than five topics, so I went smaller. Here are the results for three topics:

Obama's Topic Models

health people care give sen tax american ve plan provide cut insurance policies system lot working policy businesses companies economic
mccain make important president years don energy work point things percent making economy means understand bush true america fact put
ve senator billion john tax time spending deal country oil iran afghanistan problem year troops world states nuclear united iraq

McCain's Topic Models:

ve people ll don time american make tax americans back care friends record nuclear business billion home senate world things
senator obama united states spending lot president iraq point understand strategy work government cut thing great war washington issue sit
obama america sen health taxes country joe voted years money campaign government plan give reform increase fact made dollars history

This is a little bit sneaky, since I'm not really doing a task or evaluation here per se. Rather, I'm just running the training process and looking at the resulting models. As such, it's tempting to just eyeball the training results and pick the topic size based on whichever models look the best (i.e., most interpretable to a human mind). If I had the data readily available, I would probably try every topic size I could, both below five and beyond it. As it is, I decided to be a little bit intellectually honest and only post the number we came up with in discussions in the lab last week: three. I don't think the results are extremely illuminating here, but that's life!

One thing we can take away from this is the importance of finding the correct representation of the training data. If I had more time, I would tinker with the training data a bit more. Right now every line in the data counts as one data point, and the line breaks are chosen essentially by the transcriber(s). This should roughly correspond to one topic per line, since a candidate's turn usually consists of talking about the topic raised by the moderator. But this is not foolproof, since the candidates often start by answering the question, then pivot to talk about something else they want to make sure gets mentioned. Finally, the transcription is not perfect, and there are some places where the line breaks might be arbitrary. If I were to do this analysis more carefully, I would use an automatic sentence segmenter and train the models sentence by sentence.
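That change would look something like the following sketch, which swaps the transcriber's line breaks for automatic sentence boundaries. I'm using NLTK's off-the-shelf sentence tokenizer here as an example; it's my choice for the sketch, not something the original analysis used, and the file name is a placeholder.

```python
# Sketch: segment a debate transcript into sentences so that each training
# instance for the topic model is one sentence rather than one transcript line.
# Requires NLTK and a one-time nltk.download('punkt').
import nltk

def sentences_from_transcript(path):
    with open(path) as f:
        text = " ".join(line.strip() for line in f if line.strip())
    return nltk.sent_tokenize(text)

# Each returned sentence would then be written out as its own data point
# for the topic-model training step.
for sentence in sentences_from_transcript("obama.txt"):   # placeholder filename
    print(sentence)
```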

One more thing: what is this good for? Well, the analysis here and below might not be good for much, for the reasons mentioned in the last paragraph (it's hard to tell). But assuming there were more data and the models were clear-cut topics, this could be very useful. One thing you might do is use these models to classify new texts. If there were a fourth debate, you could take its transcription and, in combination with the models, cluster each sentence in the new debate and assign it to the topics. The way this usually works in text clustering is what's sometimes called "soft clustering," in which a sentence isn't assigned to a single cluster, but is assigned to each cluster with a certain weight. So if your three learned topic models roughly correspond to domestic policy, foreign policy, and campaign b.s., a new sentence about Iraq might classify as 70% foreign policy, 25% domestic policy, and 5% campaign b.s., since Iraq policy also has an impact on domestic policy, despite being by definition foreign.
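Here's a toy sketch of what that soft assignment looks like. Real LDA inference computes a proper posterior over topics; this stand-in just scores a sentence against each topic's top words and normalizes, and the topic labels and word lists are invented for the example.

```python
# Toy soft clustering: score a new sentence against each learned topic's top
# words and normalize, so the sentence gets a weight per topic rather than a
# single label. Real topic-model inference would compute a proper posterior.
topics = {
    "domestic policy": {"health", "care", "tax", "insurance", "economy", "jobs"},
    "foreign policy":  {"iraq", "iran", "afghanistan", "troops", "nuclear", "war"},
    "campaign b.s.":   {"record", "voted", "campaign", "friends", "fact"},
}

def soft_assign(sentence):
    words = set(sentence.lower().split())
    raw = {name: len(words & vocab) for name, vocab in topics.items()}
    total = sum(raw.values()) or 1
    return {name: count / total for name, count in raw.items()}

print(soft_assign("we must bring our troops home from iraq and fix the economy"))
```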

So I think that result alone is cool. But if you're not an AI researcher, and you are actually concerned with practicality, you might ask: what does that get you? Well, I'm sure there are lots of practical uses, but I'll just list a couple I can think of off the top of my head. You could use a system like this to automatically build a database of a candidate's stances on various issues. You could build a system that took questions from interested voters, assigned them to one of the topic categories, and then retrieved similar statements made by the candidate to try to answer the question. Any other ideas?

October 16, 2008

Topic Models in Presidential Debates

The 2008 Presidential debates are now behind us, which is kind of a relief from a political perspective, but from the point of view of statistical natural language processing it means one source of data is now unfortunately gone. With that perspective in mind, I wanted to see if I could do anything interesting with the data.

I've been itching to try Andrew McCallum's Mallet software for language-related machine learning tasks to see how easy it is, and so I thought this would be a good opportunity to give it a test drive. Bottom line: really easy, at least for what I'm about to do.

The tool I used is the topic modeling component, which uses a Latent Dirichlet Allocation (LDA) model to build a model of the topics used in the data set. Functionally, you give it the number of topics you want and a text data set, and it gives you back the set of words that best represents each topic. So where do the topics come from? Well, the algorithm figures them out! Often, but not always, one can simply look at the representative words for each topic and give it a simple category label. How it works exactly is a difficult topic, and even a layman's explanation would take too much time and space for today, but I'll keep that in mind for a future post. For now, let's get to the results.

The results below show three "experiments": one for each candidate's words alone, and then another that combines both candidates' words as well as the moderator's. For each experiment, there are five lines for the five topics, each topic represented by the words most likely to be generated given that topic. You'll notice that some words appear in more than one topic -- this is ok! I will refrain from doing any analysis here; if anyone wants to, they can read the topic lists, try to figure out their own topic labels, and post them in the comments, and I will do the same. Without further ado, here we go!

McCain Topic Models (Trained on only John McCain's responses)

obama people ll don american taxes country world voted give years long oil state energy war washington fundamental increase countries
america tax americans back friends record nuclear home things economy tough security working power sit national times tom jobs fine
senator united states spending lot iraq strategy thing general control job georgia troops important defense defeat afghanistan russians person ukraine
sen america joe money business campaign reform fact made small dollars jobs children party businesses wealth pay spread spending choice
ve time government health make president care point billion senate understand issue work cut great fought plan today young programs

Obama's Topic Models (Trained on only Obama's responses)

senator billion john deal iran afghanistan troops states problem nuclear united iraq back part pakistan world war military al place
sen policy point economic pay campaign education college ll additional afford future tough trade young taxes joe free based behalf
ve mccain important don work making economy means cut understand lot america fact put end companies middle change government credit
tax president years energy time spending country things oil bush year families working issues talk absolutely made good korea cuts
make people health care give percent money american plan provide system policies true issue crisis doesn insurance support businesses agree

All topic models (Trained on Obama, McCain, and moderator utterances)

senator obama time question afghanistan troops country security tonight war strategy russia pakistan tom ph georgia lead defense general street
sen obama ll give campaign america joe country pay reform time education plan billion voted jobs small trade free fine
spending united states america government back iraq understand nuclear americans lot don record taxes senate home business great friends washington
mccain make important president billion things point year john deal iran crisis problem means bush issues plan policies talk making
ve people health care tax years american energy work world economy oil money issue cut don percent fact working insurance

To reproduce this you need the following (a rough command sketch follows the list):
  1. Debate transcripts: Debate 1, Debate 2, Debate 3 (Click on "Print" and then copy/paste the text into a file.)
  2. Perl script to extract text by candidate (Download and change the extension from .txt to .pl)
  3. Mallet package and instructions on topic modeling
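For step 3, the commands I mean look roughly like the sketch below, wrapped in Python for convenience. The flags follow Mallet's topic-modeling instructions as I remember them, so check them against the Mallet documentation for your version; the Mallet path and file names are placeholders, and mccain.txt stands for the output of the Perl script in step 2.

```python
# Rough sketch of the Mallet steps, wrapped in Python. Flags are from the
# Mallet topic-modeling instructions and may differ by version; paths and
# file names are placeholders.
import subprocess

MALLET = "mallet/bin/mallet"   # wherever the Mallet launcher script lives

# 1. Import the extracted text (one response per line: "id label text ...").
subprocess.check_call([
    MALLET, "import-file",
    "--input", "mccain.txt",
    "--output", "mccain.mallet",
    "--keep-sequence", "--remove-stopwords",
])

# 2. Train the topic model and dump the top words per topic.
subprocess.check_call([
    MALLET, "train-topics",
    "--input", "mccain.mallet",
    "--num-topics", "5",
    "--output-topic-keys", "mccain_topic_keys.txt",
])
```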

October 15, 2008

Conversing with Computers

This year's Loebner Prize competition for computer chat programs has come and gone again. This competition is meant to be an approximation to the Turing Test for computational intelligence. Turing's original paper suggested that a computer could be considered intelligent if a judge conversing with one human and one computer could not reliably tell them apart.

There has been plenty of commentary on both the Turing Test and the Loebner prize competition over the years, to which I don't really have anything to add, but I did want to just make a few comments on this article by one of this year's Loebner prize judges.

First, the author/judge asks both chat partners whether they are human or a computer. Both reply ambiguously. This is just silly. In Turing's conception of the game, the human is meant to try to help the judge, while the computer is meant to try to trick the judge. So what is the human doing here? Well, one explanation is that the rules for this version of the test are not the same as the version Turing envisioned, having been relaxed a bit to make some success possible [1]. This is a reasonable approach to stimulate research and interest, since an impossible task might just drive away potential entrants. But if this is the case, I wish the news reports would be more upfront about that fact, because to do otherwise is to mislead the non-technical reading public.

The second possible explanation for the odd human response is a lack of understanding of the game. In this case, there is a simple solution -- offer incentives to the human chat partner for properly helping the judge discriminate the machine from the human. Of course, this changes the entire complexion of the game. Now the test becomes not just to chat in a passably human-like manner, but also to be able to deceive people convincingly!

While the science and technology of deception are very interesting from a research perspective, from a practical perspective, I can understand why simply being able to chat in a human-like way is of more immediate interest.


[1] As of this writing, the rules for this year's competition have not been posted.

October 14, 2008

Self-Driving Cars - Lessons from Total Recall

Update: From the comments, I want to note that the Ars Technica article linked below is based on a series of articles at Brad Templeton's site on self-driving cars.

I saw this article on the potential social impact of self-driving cars at Ars Technica. There are lots of good ideas therein, and I recommend reading the whole article.

However, I would like to take a different tack, and examine the role of intelligent agents in these cars. The Ars Technica article doesn't mention anything about intelligent agents, most likely because it is an aspect of the technology not thought to have an effect on society. So I'll just consider it here in the interest of being a geek.

Matthew Yglesias posts this clip of Arnold in a robot cab in the movie Total Recall:

In the AT article, the author (Tim Lee) references handhelds, suggesting that driverless taxis will be summoned in advance, with the number of passengers and the destination entered via some sort of Google-Maps-like interface. But what about the Total Recall approach of having a speech interface once you're inside the car? There are clear benefits here: you don't need to fiddle with a device simply to catch a ride down the street, and you can speak to the car just as you speak to a driver today. Of course, these types of interactions may decrease the efficiency of the taxi network that Lee envisions as a system optimized through communication and planning. However, I can see this being just a continuation of the way cabs work today -- if you call a cab from home, you need to specify a destination, and the dispatcher may tell you no ("We don't go to Duluth"). But if you are in the kind of neighborhood where you can just stand on the curb and hail a cab, you can probably get in and tell the driver where to go without much trouble.

The second thing I wanted to mention is the dummy from the video. It's of such low quality that it seems to be a net negative to the driverless-taxi experience. But the idea is worth examining. Does representing a human as the driver provide any benefit? I speculate that it does. Having a human-like or otherwise intelligent-looking thing to look at while speaking makes conversation more natural. And it has one subtle technological benefit too: by giving the person a representation to look at while speaking, the AI is also able to get a fix on the speaker's face with visual sensors. This can be useful in many ways -- foremost being that speech recognition is more accurate when using visual information (lip-reading) in addition to acoustic information. Visual information can also be used to extract emotional cues from the user. The dummy in the video probably wishes it had this information, so it could hide inside the seat when Arnold became enraged at its incompetence. I think the same benefits could be had much more cheaply with a virtual human rendered on a video screen, but either way the addition of a virtual intelligent presence would be useful.