Topic Models in Presidential Debates
I've been itching to try Andrew McCallum's Mallet software for language-related machine learning tasks, and this seemed like a good opportunity to give it a test drive. Bottom line: what I'm about to do turned out to be really easy.
The tool I used is Mallet's topic modeling, which uses Latent Dirichlet Allocation (LDA) to build a model of the topics used in a data set. Functionally, you give it the number of topics you want and a text data set, and it gives you back the set of words that best represents each topic. So where do the topics come from? Well, the algorithm figures them out! Often, but not always, one can simply look at the representative words for each topic and give it a simple category label. How it works exactly is a difficult subject, and even a layman's explanation would take too much time and space for today, but I'll keep that in mind for a future post. For now, let's get to the results.
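One quick aside before the results, for anyone who wants to see the "give it text and a number, get back words" contract in action. The sketch below is a toy collapsed Gibbs sampler for LDA in plain Python. It is not Mallet's code and it is not what produced the results in this post. The function names (train_lda, top_words), the hyperparameter values, and the four tiny documents are all made up for illustration.

```python
import random
from collections import Counter


def train_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not Mallet).

    docs is a list of token lists. Returns one Counter per topic that maps
    each word to the number of its tokens currently assigned to that topic.
    alpha and beta are assumed smoothing values, not tuned settings.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    topic_word = [Counter() for _ in range(num_topics)]  # n(topic, word)
    topic_total = [0] * num_topics                       # n(topic)
    doc_topic = [[0] * num_topics for _ in docs]         # n(doc, topic)
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            topic_word[z][w] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take this token's current assignment out of the counts.
                z = assignments[d][i]
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1

                # Weight per topic: how much this document uses the topic,
                # times how much the topic favors this word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = z
                topic_word[z][w] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1

    return topic_word


def top_words(topic_word, n=5):
    """For each topic, up to n words with the highest (nonzero) counts."""
    return [
        [w for w, c in counts.most_common(n) if c > 0]
        for counts in topic_word
    ]


# Hypothetical toy documents, NOT the debate transcripts.
docs = [
    "tax spending budget tax cut".split(),
    "troops iraq war afghanistan troops".split(),
    "budget tax spending economy".split(),
    "war iraq strategy troops".split(),
]

for words in top_words(train_lda(docs, num_topics=2), n=3):
    print(" ".join(words))
```

Each printed line is one topic, listed by its highest-count words, which is the same shape as the topic lines further down. With this little data the split between topics can be arbitrary, so treat it as a picture of the inputs and outputs and nothing more.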
The results below show 3 "experiments": one for each candidate's words alone, and a third that combines both candidates' words with the moderator's. For each experiment there are 5 lines for the 5 topics, each topic represented by the words that are most likely to be generated given that topic. You'll notice that some words appear in more than one topic -- this is ok! I will refrain from doing any analysis here, so if anyone wants to, they can read the topic lists, try to work out their own labels, and post them in the comments, and I will do the same! Without further ado, here we go!
McCain Topic Models (Trained on only John McCain's responses)
|obama people ll don american taxes country world voted give years long oil state energy war washington fundamental increase countries|
|america tax americans back friends record nuclear home things economy tough security working power sit national times tom jobs fine|
|senator united states spending lot iraq strategy thing general control job georgia troops important defense defeat afghanistan russians person ukraine|
|sen america joe money business campaign reform fact made small dollars jobs children party businesses wealth pay spread spending choice|
|ve time government health make president care point billion senate understand issue work cut great fought plan today young programs|
Obama Topic Models (Trained on only Barack Obama's responses)
|senator billion john deal iran afghanistan troops states problem nuclear united iraq back part pakistan world war military al place|
|sen policy point economic pay campaign education college ll additional afford future tough trade young taxes joe free based behalf|
|ve mccain important don work making economy means cut understand lot america fact put end companies middle change government credit|
|tax president years energy time spending country things oil bush year families working issues talk absolutely made good korea cuts|
|make people health care give percent money american plan provide system policies true issue crisis doesn insurance support businesses agree|
All Topic Models (Trained on Obama, McCain, and moderator utterances)
|senator obama time question afghanistan troops country security tonight war strategy russia pakistan tom ph georgia lead defense general street|
|sen obama ll give campaign america joe country pay reform time education plan billion voted jobs small trade free fine|
|spending united states america government back iraq understand nuclear americans lot don record taxes senate home business great friends washington|
|mccain make important president billion things point year john deal iran crisis problem means bush issues plan policies talk making|
|ve people health care tax years american energy work world economy oil money issue cut don percent fact working insurance|
To reproduce this you need the following: