<?xml version="1.0" encoding="utf-8"?>
<feed version="0.3" xmlns="http://purl.org/atom/ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xml:lang="en">
  <title>Project: Jetson&apos;s Car</title>
  <link rel="alternate" type="text/html" href="http://blog.lib.umn.edu/mill1991/pimpmyride/" />
  <modified>2005-11-28T18:58:01Z</modified>
  <tagline>Tracking the failure of my latest pie in the sky project</tagline>
  <id>tag:blog.lib.umn.edu,2008:/mill1991/pimpmyride//1174</id>
  <generator url="http://www.movabletype.org/" version="3.33.uthink">Movable Type</generator>
  <copyright>Copyright (c) 2005, mill1991</copyright>
  <entry>
    <title>Stereo beginnings</title>
    <link rel="alternate" type="text/html" href="http://blog.lib.umn.edu/mill1991/pimpmyride/013422.html" />
    <modified>2005-11-28T18:58:01Z</modified>
    <issued>2005-01-05T12:09:27-06:00</issued>
    <id>tag:blog.lib.umn.edu,2005:/mill1991/pimpmyride//1174.13422</id>
    <created>2005-01-05T18:09:27Z</created>
    <summary type="text/plain">The hardest part of this project for me is going to be the electronics interface. I&apos;ve taken one EE course in digital electronics, and I&apos;ve toyed around with some simple circuits before using do-it-yourself type books, but my theoretical knowledge...</summary>
    <author>
      <name>mill1991</name>
      
      <email>mill1991@umn.edu</email>
    </author>
    <dc:subject>Stereo control</dc:subject>
    <content type="text/html" mode="escaped" xml:lang="en" xml:base="http://blog.lib.umn.edu/mill1991/pimpmyride/">
      <![CDATA[The hardest part of this project for me is going to be the electronics interface.  I've taken one EE course in digital electronics, and I've toyed around with some simple circuits before using do-it-yourself type books, but my theoretical knowledge is close to zero, so I'm not very good at coming up with circuit designs on my own.  I do have the ability to learn new things, though, so I'm hoping I can pick up whatever I need.
<p>]]>
      <![CDATA[As I mentioned yesterday, now that I have some rudimentary speech recognition, I would like to try controlling my stereo from a computer connection.  One thing that may help me is the fact that my car stereo has a removable face - this means that there is a very well-defined interface point between the controls (the face) and the brains (the non-face part).
<p>
<img src="http://blog.lib.umn.edu/mill1991/pimpmyride/pinouts.jpg">
<p>
As you can see, there is a 15-pin connector that joins the face to the back matter (I really wish I had a better name for that).  The next sub-goal of this project is to hook up wires running from the back matter to the face.  Using an oscilloscope (no, I don't have one of these, and yes, I know they are expensive), I can hopefully eavesdrop on the signals that are sent between the two parts.  By experimenting and pressing buttons on the face I should be able to capture signals, and then later try to replicate them on the ports of my laptop.  The stuff I wrote yesterday about speech recognition was trivial for me, but this stuff, which is probably trivial to any EE undergrad, leaves me utterly afraid.  Okay, Tim, take a deep breath, relax, and begin to subdivide the task.  Thank you, other voice in my head, I feel much better now.
<p>
The first sub-sub-goal is now to come up with the wire connections between the face and the back matter.  If I can't get this right, then there is no point in even worrying about an oscilloscope.  There are some pictures below of the sides of the back matter/face connection.
<p>
<img alt="left_latch.jpg" src="http://blog.lib.umn.edu/mill1991/pimpmyride/left_latch.jpg" width="640" height="480" border="0" />
<p>
<img alt="right_latch.jpg" src="http://blog.lib.umn.edu/mill1991/pimpmyride/right_latch.jpg" width="640" height="480" border="0" />
<p>
As you can see, there are little pins sticking out on each side that grab onto holes on the side of the face.  One possibility is that I can make use of those to make a wire connection that is able to hold itself in place using natural methods.  Honestly, I'll probably just tape some wires onto a piece of wood, which itself will be taped to the car dash.]]>
    </content>
  </entry>
  <entry>
    <title>Early progress in speech recognition</title>
    <link rel="alternate" type="text/html" href="http://blog.lib.umn.edu/mill1991/pimpmyride/013371.html" />
    <modified>2005-11-28T18:57:55Z</modified>
    <issued>2005-01-04T13:44:08-06:00</issued>
    <id>tag:blog.lib.umn.edu,2005:/mill1991/pimpmyride//1174.13371</id>
    <created>2005-01-04T19:44:08Z</created>
    <summary type="text/plain">Since my area of research here at the U is natural language processing (NLP), that is the part of the project I am best equipped for, and thus expect the most early progress on. In fact, I have already made...</summary>
    <author>
      <name>mill1991</name>
      
      <email>mill1991@umn.edu</email>
    </author>
    <dc:subject>Speech recognition</dc:subject>
    <content type="text/html" mode="escaped" xml:lang="en" xml:base="http://blog.lib.umn.edu/mill1991/pimpmyride/">
      <![CDATA[Since my area of research here at the U is natural language processing (NLP), that is the part of the project I am best equipped for, and thus expect the most early progress on.  In fact, I have already made some preliminary progress.  I'm going to be using the 
<a href="http://cmusphinx.sourceforge.net/html/cmusphinx.php">
Sphinx system</a> developed at Carnegie Mellon.  Its biggest drawback, as far as I can tell, is that it uses a trigram language model (on top of its HMM acoustic models) as opposed to a structural language model.  However, since the speech commands I intend to use are probably not even grammatical sentences (e.g. "Window up", "Volume down"), and are generally fairly short (the longest is probably something like "Rear left window down"), a trigram model probably captures most of the information needed.
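<p>
To make the trigram point concrete, here is a toy sketch in Python.  It is not how Sphinx or the CMU tool builds its model (no probabilities, no smoothing); it only counts word triples in a tiny stand-in corpus made of the example commands above.  The names COMMANDS, trigrams, and covered, and the START/END padding markers, are made up for illustration.
<pre>
```python
from collections import Counter

# Hypothetical stand-in corpus: just the example commands from this post.
COMMANDS = ["window up", "volume down", "rear left window down"]


def trigrams(sentence):
    """Return the word trigrams of a sentence, padded with start/end markers."""
    words = ["START", "START"] + sentence.split() + ["END"]
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


# Count every trigram that occurs in the corpus.
counts = Counter(t for command in COMMANDS for t in trigrams(command))


def covered(sentence):
    """True if every trigram of the sentence was seen in the corpus."""
    return all(counts[t] > 0 for t in trigrams(sentence))


print(trigrams("window up"))
print(covered("rear left window down"))
print(covered("window down"))
```
</pre>
A two-word command is described by only three padded trigrams, and even the four-word command needs just five, so very little context falls outside the three-word window.  The last call returns False because "window down" never starts a command in this toy corpus, which is the same reason an unseen phrase does poorly in a real trigram model.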
<p>
]]>
      <![CDATA[The example commands given above are fairly simple, and this is clearly the best way to start.  If I try to get the system to understand "Computer, could you be a dear and roll down the rear starboard window halfway, please," I would probably not meet with early success.  While the accuracy rate of each word in Sphinx is fairly high, over a complete utterance the accuracy is roughly (word_acc^N), where word_acc is the per-word accuracy and N is the number of words in the utterance, assuming the word errors are independent.  Therefore, for a per-word accuracy of e.g. 90% and a mere 5 words, accuracy of utterance recognition dips to 59%, and after 10 words it drops to 35%.  By limiting the vocabulary, though, I can increase the single-word recognition accuracy, and by keeping the commands to short "stock" phrases I can increase the total utterance recognition accuracy.
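<p>
The arithmetic above can be checked with a few lines of Python.  This is only a sketch of the word_acc^N estimate, which assumes every word is recognized independently of the others; the function names are made up for illustration.
<pre>
```python
def utterance_accuracy(word_acc, n_words):
    """Chance that all n_words are right, assuming independent word errors."""
    return word_acc ** n_words


def required_word_accuracy(target, n_words):
    """Per-word accuracy needed to reach a target utterance accuracy."""
    return target ** (1.0 / n_words)


# Utterance accuracy at 90% per-word accuracy for 1, 5, and 10 words.
for n in (1, 5, 10):
    print(n, round(utterance_accuracy(0.9, n), 2))

# Per-word accuracy needed for a 5-word command to succeed 90% of the time.
print(round(required_word_accuracy(0.9, 5), 3))
```
</pre>
The loop prints 0.9, 0.59, and 0.35 for 1, 5, and 10 words.  The last line shows the flip side: to get a 5-word command right 90% of the time, each word has to be recognized about 98% of the time, which is why a small vocabulary and short commands matter so much.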
<p>
I have started in this direction a little bit already.  I used 
<a href="http://blog.lib.umn.edu/mill1991/pimpmyride/car.corpus">
this</a> corpus as a starting point for commands to be recognized.  Using the <a href="http://www.speech.cs.cmu.edu/tools/lmtool.html">CMU online language modeling tool</a>, I uploaded the above-mentioned corpus file to obtain <a href="http://blog.lib.umn.edu/mill1991/pimpmyride/tar5737.tar.gz">
this</a> language model file.
<p>
If you would like to try this on your own computer, download the files above.  Then install 
<a href="http://cmusphinx.sourceforge.net/html/download.php#sphinx2">
sphinx-2</a> on your computer.  There will be a sample language model set up in "/usr/local/share/sphinx2/lm/turtle/".  I'm not sure of the exact directory; I'm doing this from memory.  The turtle directory contains a sample language model that is suitable for an environment to control a turtle graphics system.  Since my environment is quite different, I created a new directory called "/usr/local/share/sphinx2/lm/car/" and unpacked the above tarball there.  Now, there is a program installed with sphinx2 called sphinx2-test, located in "/usr/local/bin".  This is a script which loads the turtle language model and calls sphinx2-continuous, which is a program that just waits for speech input and outputs the most likely utterance.  Now make a copy of sphinx2-test called sphinx2-car and edit the script to point to your car language model instead of the turtle model - it's only like two lines.  Whew!  That sounds like a lot of work, but if you're familiar with *nix at all it should be fairly trivial.  
<p>
I did all this on my computer, and ran the sphinx2-car program.  So far, it hasn't missed a beat, as long as I "stick to the script."  I tried it the morning after I had set it up, and forgot that the language model doesn't know the word "radio" or "stereo," just the words "volume", "up", and "down," so I obviously wasn't met with much success.  The next thing I tried was playing loud music near the microphone to see if that bothered it.  Obviously if I'm using this system to control my stereo there will be background music, and I hope that isn't a deal-breaker.  The first experiment was The Shins - this worked fine - I think the singing is too high-pitched to be interpreted as a voice, since the system was probably trained mostly on men's voices.  The next experiment was Johnny Cash.  This is a much tougher test, because not only does he have a nice deep voice, but much of what constitutes "singing" for the Man in Black is indistinguishable from talking.  The system did interpret some of this music as speech, but it didn't assign it any words that it knew.  So, as long as there aren't any Johnny Cash songs where he yells out "Roll the window up!" or "Smash into the car in front of you!" I should be safe. 
<p>
So, the simplest part of the project is effectively completed.  The remaining work to be done in speech recognition is the following:
<ul>
<li>Port it from my desktop into my car<br>
There are a couple of different approaches I could take here.  The first is just using my laptop with a microphone.  This is the cheapest idea, since I already own both.  Another idea is to buy one of those <a href="http://www.walmart.com/catalog/product.gsp?product_id=3504708&cat=179113&type=19&dept=3944">
$500 laptops</a> running Linux from Walmart.  I hate Walmart, but that is a damn good price, and then I could have a dedicated car computer which I could possibly extend with other software.  The final option is a custom computer made from something like mini PCI.  While this is the cleanest solution, there is a good chance it would cost at least $500 and a lot more in labor as I attempt to build the system.  For now the best choice is to run it on my laptop, and if that works I'll look into getting a dedicated system from Walmart.</li>
<li>Integrate environmental data<br>
This is the current research focus in the NLP lab, where we are working with a mobile robot.  For instance, the commands "Start turning right" and "Stop turning right" are easily confused because "start" is phonetically somewhat close to "stop."  However, if the robot is already turning right, it's highly unlikely that you would command it to start turning right again, so the command is therefore disambiguated to "stop turning right."  This sort of system could be useful in a car as well, especially considering the background noise from the engine and the music.  So, environmental data might include current stereo volume, current stereo input source, fade and balance, car temperature, current window status, direction of audio input, etc.  These sorts of readings act like proprioceptive sensors, just like the muscle spindles in your biceps that tell you where your arms are even if your eyes are closed.</li>
<li>More complex language model<br>
As I mentioned above, the first priority is getting stock commands to work.  After all, this is not part of some far-flung research program, but a practical system I want to actually benefit from ASAP.  With that said, some of my current interest is in word-learning systems.  A nifty feature would be a system that could understand a general grammar and then extend meaning to novel commands like "Crank it up, Rosie!"</li>
</ul>]]>
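      <![CDATA[<p>
The "start" versus "stop" example from the environmental data item can be sketched in a few lines of Python.  This is a hypothetical illustration, not the lab's actual system: the hypothesis list, the scores, and the disambiguate function are all made up, and a real recognizer would supply its own candidates and scores.
<pre>
```python
def disambiguate(hypotheses, turning_right):
    """Pick the best-scoring hypothesis that fits the current state."""

    def plausible(command):
        # Starting a turn only makes sense if we are not already turning,
        # and stopping one only makes sense if we are.
        if command == "start turning right":
            return not turning_right
        if command == "stop turning right":
            return turning_right
        return True

    candidates = [(score, cmd) for cmd, score in hypotheses if plausible(cmd)]
    if not candidates:
        return None
    return max(candidates)[1]


# Made-up recognizer output: "start" narrowly beats "stop" on sound alone.
hypotheses = [("start turning right", 0.52), ("stop turning right", 0.48)]

print(disambiguate(hypotheses, turning_right=True))
print(disambiguate(hypotheses, turning_right=False))
```
</pre>
With the robot already turning right, the sketch picks "stop turning right" even though "start" scored slightly higher; when it is not turning, it picks "start turning right."  The same idea would carry over to the car, with window position or stereo volume taking the place of the turning state.]]>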
    </content>
  </entry>
  <entry>
    <title>Project Description</title>
    <link rel="alternate" type="text/html" href="http://blog.lib.umn.edu/mill1991/pimpmyride/013365.html" />
    <modified>2005-11-28T18:57:55Z</modified>
    <issued>2005-01-03T12:44:25-06:00</issued>
    <id>tag:blog.lib.umn.edu,2005:/mill1991/pimpmyride//1174.13365</id>
    <created>2005-01-03T18:44:25Z</created>
    <summary type="text/plain">The goal of this project is to &quot;hack&quot; my car to include a speech recognition system. I think my first goal should be to be able to control my stereo via voice control. If this should work out, my next...</summary>
    <author>
      <name>mill1991</name>
      
      <email>mill1991@umn.edu</email>
    </author>
    <dc:subject>General information</dc:subject>
    <content type="text/html" mode="escaped" xml:lang="en" xml:base="http://blog.lib.umn.edu/mill1991/pimpmyride/">
      <![CDATA[The goal of this project is to "hack" my car to include a speech recognition system.   I think my first goal should be to be able to control my stereo via voice control.  If this should work out, my next task would be to control more encapsulated functions like the heating/cooling system and windows.  The purpose of this blog is to track my progress on this project.  I don't expect to update <i>extremely</i> frequently, as school will be starting soon and I have other research.  Depending on how fast the early stuff can be done, I may have to delay the bulk of this project until summer, due to actual research and classes (I know, they're such a drag).  But insofar as I do make progress, I will update this site with it.  So, if you're actually interested, subscribe to the 
<a href="http://blog.lib.umn.edu/mill1991/pimpmyride/index.rdf">RSS feed</a> (<a href="http://blog.lib.umn.edu/000898.html">howto</a>) and don't bother checking until I update. 
<p>
The tagline of this site is "Tracking the failure of my latest pie in the sky project."  Sometimes I start projects that are a little too ambitious for my limited time, finances, and intelligence.  One reason for keeping a project log (plog) is that with my potential failure or success out in public, I may be more motivated to actually complete the project.  Even in the worst-case scenario, I'll have a record of my project, and I might be able to tell where I went wrong or right.]]>
      
    </content>
  </entry>

</feed>