November 18, 2005
Why Math in Synthetic Bio?
In the past 3 years, I've developed some very advanced methods for computing the stochastic dynamical behavior of "small" physical or chemical systems, which especially includes biological systems. The methods are an improvement over existing ones, decreasing the amount of computational time by 1000x or more for some systems, but still retaining accuracy. Now, I'm working on more methods to better predict the long time behavior of biological systems.
But why is using math important in synthetic biology?
The idea is to develop an accurate model of a biological system (usually focusing on a small subsystem of a single cell) and then predict what the dynamics will be over time. If you can predict what the system will do before you build it, you save yourself both time and money. The model should be a "first principles" one based on the molecular interactions of each DNA site, protein, RNA, and other molecule in the system. That way, if you know the interactions of a DNA site in one model, then you should be able to put the same DNA site in a different model and still predict what will happen. (No lumped interactions!) Of course, we're still constrained by the limited amount of information we have on molecular "parts". That's ok for now, because (one day) we should have that information. Until then, we will need to be good engineers and make guesses (yes, guesses) on what those interactions might be and how they affect the system dynamics.
You would be surprised as to how much guessing goes into making 70 story buildings, cars that move at 120mph, and lots of other contraptions that will easily kill you if built incorrectly. The "engineer guess" is making sure that if you're 500% wrong that nothing bad will ever happen. The technical term is robustness. But, in practice, you assume the unknown quantity can take values within a very large range and then you make sure that nothing breaks for any value in that range. Of course, you have to pick which quantities to make your design robust to. That's where you get this tradeoff between "robustness" and "fragility".
But we should be rigorous about our "guessing". We should be able to identify _all_ possible behaviors that exist when varying the value of a specific quantity. What might the quantity be, as related to synthetic biology? It could be the binding affinity of a protein to a DNA site. It could be the enzymatic Kcat of a phosphorylation reaction. It could be the influx of a regulatory protein from the extracellular space. It is any parameter in our model that is not entirely known.
The math behind computing _all_ possible behaviors of a system while varying one or more parameters is called bifurcation analysis. The subject has always interested me and it's actually extremely useful in real life. Computing that your reactor has a subcritical Hopf bifurcation at a critical parameter value tells you that if your parameter is past this point, your reactor will suddenly blow up and kill lots of people. Whoever said math wasn't useful? In practice, they make sure the parameter never goes near that critical value..not even remotely near it. So reactors generally don't blow up. Whew, that's good to know.
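To make "computing all possible behaviors while varying a parameter" concrete, here is a minimal sketch on a toy model. It is an illustrative assumption, not a system from my own work: a single self-activating gene with dx/dt = s + x^2/(1 + x^2) - r*x, where the basal synthesis rate s is the uncertain parameter and r is fixed at 0.4. The function names and the grid settings are made up for the example. Note that this toy shows a saddle-node bifurcation (steady states appearing or vanishing in pairs), which is simpler than the Hopf bifurcation mentioned above.

```python
# Toy bifurcation scan (illustrative assumption, not a model from this post).
# Deterministic rate law: dx/dt = s + x^2 / (1 + x^2) - r * x


def rate(x, s, r):
    """Net production rate: basal synthesis + self-activation - decay."""
    return s + x * x / (1.0 + x * x) - r * x


def steady_states(s, r, x_max=10.0, n_grid=2000):
    """Find roots of rate() on [0, x_max] by bracketing sign changes on a grid
    and refining each bracket with bisection. Roots that land exactly on a
    grid point are not detected by this simple scan."""
    roots = []
    step = x_max / n_grid
    for i in range(n_grid):
        lo, hi = i * step, (i + 1) * step
        if rate(lo, s, r) * rate(hi, s, r) < 0.0:
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if rate(lo, s, r) * rate(mid, s, r) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


def is_stable(x, s, r, h=1e-6):
    """A steady state is stable when the rate decreases as x passes through it."""
    return rate(x + h, s, r) < rate(x - h, s, r)


if __name__ == "__main__":
    r = 0.4  # assumed decay rate
    for k in range(1, 9):
        s = 0.01 * k  # sweep the uncertain parameter
        labels = ["stable" if is_stable(x, s, r) else "unstable"
                  for x in steady_states(s, r)]
        print(f"s = {s:.2f}: {len(labels)} steady state(s) {labels}")
```

For these assumed values the scan finds three steady states at small s and a single one at larger s, with the change happening between s = 0.04 and s = 0.05. A robust design would keep s well away from that window, which is the same logic as keeping the reactor parameter far from its critical value.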
How is bifurcation analysis related to synthetic biology? Say you wanted to create a gene therapy system that consisted of a biosensor + regulated production of a therapeutic protein. There might be 120 parameters in your system. You might have good information about 60 of those, so-so information about 30, and the rest...who knows. But you want the gene therapy to work no matter what. Even if you incorrectly measured the interaction between molecule A and B. Even if you get a mutation that changes an interaction between molecule C and D. It just has to work. If it doesn't, someone can die. So can you determine the behavior of the model over all unknown parameters and make sure that the system will never break? If the number of unknown parameters is 30, well ... that's a 30 dimensional space to worry about. Mathematically, you could do it...but it would take a while. Are there ways to speed up the process? Absolutely. (I won't go into details here.) But my main point is that bifurcation analysis is extremely important for designing synthetic biological systems.
But, wait, I did say "stochastic dynamics" and the bifurcation analysis of stochastic systems is ... not well developed yet. That's because when you're working with probability distributions, the idea of a "qualitative change" in the solution gets harder to define. And working with these types of systems is harder in general. So that is what I am currently doing. And it is going well. :)
The math can get very heady and I worry that people who lack the background will become turned off by it. But the final product is very useful: You can find all possible behaviors of a "small" physical or chemical system (such as a biological system) using a combination of existing and new simulation techniques. The words "all possible" are extremely important. As it turns out, if your system has two stable states, a "good" one and a "bad" one, then, because of the random nature of the interactions, the dynamics might first go to the "good" state, but later go to the "bad" one. Not good. But you can minimize the "escape" from the good to bad state if you design the system well enough. This type of "escape" doesn't happen if you describe the system using deterministic dynamics, but it happens in real life. (One more reason why stochastic descriptions are important.)
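The two-state "escape" picture can be illustrated with a small sketch. It assumes a toy one-species birth-death model of a self-activating gene (birth rate OMEGA*(s + x^2/(1 + x^2)) with x = n/OMEGA, death rate r*n); the model, the parameter values, and the function names are hypothetical and chosen only for illustration. For a one-dimensional birth-death chain the stationary distribution follows exactly from detailed balance, so no simulation is needed here.

```python
def stationary_distribution(birth, death, n_max):
    """Exact stationary distribution of a 1-D birth-death chain on 0..n_max.

    Detailed balance gives p[n + 1] * death(n + 1) = p[n] * birth(n).
    """
    weights = [1.0]
    for n in range(n_max):
        weights.append(weights[-1] * birth(n) / death(n + 1))
    total = sum(weights)
    return [w / total for w in weights]


def local_maxima(p):
    """Indices where p is strictly larger than both neighbours."""
    modes = []
    for n, value in enumerate(p):
        left = p[n - 1] if n > 0 else -1.0
        right = p[n + 1] if n + 1 < len(p) else -1.0
        if value > left and value > right:
            modes.append(n)
    return modes


# Hypothetical parameters: system size, basal synthesis rate, decay rate.
OMEGA = 10.0
S, R = 0.02, 0.4


def birth(n):
    """Propensity of making one more molecule when n are present."""
    x = n / OMEGA
    return OMEGA * (S + x * x / (1.0 + x * x))


def death(n):
    """Propensity of losing one molecule when n are present."""
    return R * n


p = stationary_distribution(birth, death, n_max=100)
modes = local_maxima(p)
print("modes of the stationary distribution at n =", modes)
print("probability of n < 10:", sum(p[:10]))
```

With these assumed values the distribution has two peaks, one at n = 0 and one near n = 20. The deterministic version of the same model would sit forever in whichever state it started near; the stochastic version puts real probability on both, which means that given enough time it hops between them.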
If you've read this far, then my guess is that you're somewhat interested in mathematics. Study it! (Especially non-linear dynamics, bifurcation analysis, and stochastic processes.) They are useful for real life applications, especially in new fields such as synthetic biology. If you're looking for a book on non-linear dynamics I would suggest the one by Steven Strogatz. The material is at the intermediate-advanced undergraduate level with lots of pictures.
So to answer my initial question: "Why Math in Synthetic Biology?". Math is needed because there's no way to build large, complex dynamical systems without first understanding (and then predicting) what those dynamics will be. Math then gives us the tools to guarantee certain types of behaviors even if we don't exactly know all of the parameters of a model of the system. Also, using math generates testable and precisely quantitative predictions about the biological system of interest.
(This post is very hodgepodge and not very technical. For technical details on the bifurcation analysis of stochastic systems, you'll just have to wait for the paper. In the meantime, there are two papers on the stochastic numerical methods that are 1000x+ faster than the original one. My group also published a paper on the design principles behind an oscillating gene network. If you search PubMed for 'Salis H', they should come up.)
September 27, 2005
Ray Kurzweil & the "Singularity"
Ray Kurzweil has a hypothesis. He thinks that technological innovation is increasing so fast that, in a relatively short time, it will change humanity so completely that it is impossible to predict what human existence will be like in 20-50 years (or so). The convergence of information science, computing, biotech, and nanotech will cause a fundamental shift in the pace of technological expansion..so huge a shift that he has given it a name: the "Singularity".
Ray wrote a book on it. Or three. His latest one is currently #22 in book sales at amazon. The idea is popular. Very popular. Why do I mention it here? People who believe in this idea (they're called "Singularitarians" or something) really love the idea of synthetic biology because they see it as the next stepping stone towards bionic ... everything.
I'm not going to deny the appeal of the idea or try to dissuade people from believing in it. It may very well come true, eventually. However, Ray describes technological innovation as an exponential process. (As time increases, the technological state of the art increases like exp(t) ). I also think that technological innovation is exponentially increasing.
But there's a funny thing about the exponential process, exp(t) ... or exp(r*t) if you want to include a rate of growth. If you start from an initial time and watch the growth of technology, it looks grand and awe-inspiring. But if every few time periods (years? months?) you get used to the state of technological innovation, then the rate of technology growth no longer feels exponentially increasing. Instead, if you look back in time you will see more or less linear growth and if you look ahead in time you should expect linear growth. Why?
Well, if you split up exp(r*t) into the growth you got used to at time to, which is exp(r*to), and the growth you should expect in the future, which is exp(r*(t - to)), then you get exp(r*t) = exp(r*to) * exp(r*(t - to)).
If you Taylor expand the second factor to first order and consider (t - to) to be small (i.e. the near future) then you will get: exp(r*t) ≈ exp(r*to) * (1 + r*(t - to)), which is linear in (t - to).
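A quick numerical check of the expansion, as a minimal sketch: the first-order Taylor term of exp(r*(t - to)) is 1 + r*(t - to), so the linearized growth is exp(r*to) * (1 + r*(t - to)). The growth rate r = 0.5 and the reference time (to above, `t_ref` in the code) are arbitrary illustrative values, and the function names are made up for the example.

```python
import math


def exact_growth(r, t):
    """Exponential growth exp(r * t)."""
    return math.exp(r * t)


def linearized_growth(r, t, t_ref):
    """First-order Taylor expansion of exp(r * t) around the time t_ref."""
    return math.exp(r * t_ref) * (1.0 + r * (t - t_ref))


def relative_error(r, t, t_ref):
    """Relative gap between the exact and the linearized growth at time t."""
    exact = exact_growth(r, t)
    return abs(exact - linearized_growth(r, t, t_ref)) / exact


r, t_ref = 0.5, 10.0  # arbitrary illustrative values
for horizon in (0.1, 0.5, 1.0, 4.0):
    err = relative_error(r, t_ref + horizon, t_ref)
    print(f"looking {horizon:>4} ahead: relative error {err:.4f}")
```

The linear picture is very accurate when r*(t - to) is small and falls apart once r*(t - to) is of order one, so "near future" here really means "short compared to 1/r".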
What does that mean? If you already got used to computers, GM foods, iPods that can fit in your jeans pocket, and just the idea of a Space Elevator, then you should NOT expect miraculous technological innovation in the near future. And if you continuously get used to the innovation that regularly occurs in the marketplace and in academia then you will never be completely surprised by what's coming around the corner.
So, the people of the future will never experience a Singularity. If the people of the past were to somehow catapult themselves into the future, then they would experience the Singularity. But that holds true whether they are cavemen experiencing Medieval Europe or Renaissance Europeans experiencing modern tech.
I think the future will still hold amazing feats of technological innovation. But, by the time they occur, we will be more or less ready for them. At least, we will be more ready for them than the word 'Singularity' implies. I think anyone that suggests future tech will somehow supplant humanity's human-ness needs to read more Shakespeare.
August 6, 2005
Microfluidics & Laser Manipulation of Cells: The Road to Automated Molecular Biology?
Is automated DNA synthesis, transformation, selection, PCR, ligations, and imaging that far away? How would molecular biology change if a single PhD student could design and synthesize an entire plasmid or section of chromosome in a week? Besides exponentially speeding up the productivity of researchers (and encouraging PhD advisors to get even more paper greedy ;) ), it would have tremendous effects on the whole idea of synthetic biology. Design it. Make it. It works. Time To Completion? A month? A week? If you ask the typical biologist about the prospects of automating the process of genetic engineering, they'll say it's crazy talk. But two recent presentations at two separate conferences outlined the first steps towards the beginning of a new era in biology.
The first presentation was on a new cell-manipulation system that uses lasers to selectively toast, lyse, or heat individual cells. The conference was Biochemical Engineering XIV, which took place in Harrison Hot Springs, British Columbia (near Vancouver). The presenter is a founder of a company named Cyntellect and he demonstrated how a high powered laser, driven by the precise movement of mirrors, can target a single yeast or mammalian cell for treatment. The laser has three modes of operation: it can completely break open and evaporate the fluid of a cell, killing it instantly; it can hit the cell with a burst of mechanical force that lyses the cell open, but leaves the intracellular space generally unharmed (including the DNA!); and it can gently heat up a single cell to a nice 42 deg Celsius. Using those three operational modes, the device can automate the transformation and selection process _on the single cell level_. Cool, huh? There's a built in optical microscope to image either staining or fluorescence and the laser system can target groups of cells that have a particular fluorescence (which looked similar to the gating function of a cytometer).
So you can heat shock, transform, image, select, lyse, and ... uh oh..that's all it can do for now. Either way, I was very excited by the time the guy finished his presentation (I mean, hey, he's using fricken laser beams). The real reason, though, is that the field of microfluidics has steadily advanced to the point where you can start pumping in and out reagents of all types into a chamber and control the reactions. Cyntellect only supports 96-well plates for now, but there's no reason why someone couldn't put a nice microfluidic device in there with a good sized viewing window for the laser. The laser system's chamber is temperature controlled (with a good response time!) so we're talking about _everything_ one needs to continue the process of genetic engineering: heat shock, transform, image, select, lyse, ... PCR, purify, and repeat!
The second conference was on "Molecular Recognition & Biosensors" in Santa Barbara, CA. The conference was sponsored by the Army's AHPCRC. In my mind, I knew microfluidics had been making steady strides, but seeing the recent accomplishments really surprised me. A professor at Caltech named Yu-Chong Tai presented his work on _integrated_ microfluidic devices. The components of the integrated microfluidic device, such as the pumps, valves, pipes, and other chambers, are molded together in a single chip using micrometer-resolution plastic deposition. To strut his stuff, he showed off his mini-HPLC (High Performance/Pressure Liquid Chromatography). The mini-HPLC is 3 cm in length, uses ~100 nl of sample, and the results match the big commercial ones. Crazy, huh? How far away is he from making a tiny electrophoresis chamber? Probably not too far.
Using an integrated microfluidic device combined with laser manipulation of individual cells, one can perform all of the necessary tasks that are needed to genetically engineer cells. How long will it be until someone decides to put these two innovations together and optimize the automation of the entire process? I give it 4-6 years, tops. Genetic engineering (cloning) is such a tedious, mind numbing, and repetitive task that any efficient (and cheap) automation would have a huge, earth-shattering effect on the entire field.
Honestly, I can't wait. I don't do half the cloning that some of my other friends do, but I still think it's the most repetitive and least satisfying part of molecular biology. If it works, you're happy...but only because you don't have to repeat it. The real satisfying accomplishment is doing the experiment and gaining knowledge (and something to write a paper about, heh). And if it doesn't work...ugh, wasted time!
July 22, 2005
A Quick Update
I apologize for the huge delays in posting, but I promise that it hasn't been in vain.
For the past five months, I've been going back and forth between a wet and dry lab, continuing to develop new stochastic numerical methods while helping to build a gene network in E. coli. I started my graduate studies in the computational area of systems / synthetic biology and moving into the experimental side has certainly been a change of pace. Some of the other bio people are pretty shocked to find me pipetting. When they ask why, I usually say something glib like "What use is designing something if you never build it?" Which is more or less true. However, in between experiments, I've been designing new networks and developing new stochastic methods. The work is going well. I (and my advisor, of course) have two papers in the review process and two more being written. I'll wait until they're published online before blogging about them (probably safer that way, heh).
I still believe that synthetic biology will never mature into an engineering science unless we use computational tools to quantitatively predict the behavior of a biological system. We need to create the same sort of tools, like CAD, that the auto industry uses to design cars. They spend 95% of their time on the computer, testing new hypothetical designs, and they only build prototypes when they see something work well. The same can be said for other industries, including building chemical or pharma plants and computer chips. The same process must develop for synthetic bio. We need more experiments to determine how simple biological systems work, but then we must incorporate that information into realistic models, powered by sophisticated mathematics. Without the combination of both, we'll be flailing in the dark, using the trial and error system that is rarely productive, often boring, and never generalizable.
I also have some thoughts on the differences between wet and dry lab experiments. I'll save that for a future post, I think.
I recently received a bunch of questions about stochastic methods from Dr. Herbert Sauro, who (like me) is a member of the sbml-discuss mailing list. They were very good questions and I think I did a decent job of answering them, so I'll link them here.
Ok, that's all for now (I so suck at blogging).
February 3, 2005
And the Fun Begins
I apologize for the long delay in posting. Right now, I'm finishing up a manuscript on the design of biological AND gates that activate gene expression if and only if two or more transcription factors are present. I won't mention any details until the paper is accepted, but these facts should pique your interest:
-- They're made of readily available molecular parts
-- They require no special cooperativity or allosteric changes (i.e. no intensive engineering)
-- They can be made for two or three inputs (and probably more, but I didn't want to push it)
After the paper's out the door, it's time for the real fun: experiments!
So far, most of the work has been computational: developing new algorithms and software to quickly design and analyze synthetic biological systems. The methods have been shown to be accurate, but there's always a lingering doubt that the in silico results won't match the in vivo results. Now, it's time to go back to the (wet) lab and really see if it all works. The Fun Begins.
To test out our methodology, we thawed out some strains of E. coli producing GFP under control of the lac operon and sent them through a flow cytometer (courtesy of Dr. Friedrich Srienc, thank you very much). The pictures it produces are simply fantastic.
It's amazing how stochastic simulations and flow cytometry complement one another. A stochastic simulation will produce the probability distribution of the solution at whatever time points you want, but requires numerous independent trials (usually 10,000, each representing a single cell). A flow cytometer, on the other hand, can count hundreds of thousands of cells and measure the fluorescence in each individual cell, producing a very smooth distribution, but only for relatively few time points. Using the simulations, you can gain a high-resolution (in time) picture of the dynamics. Using the flow cytometer, you can validate the results at specific time points and easily produce very smooth distributions. Very nice.
If you're interested in computational design of biological systems, check out http://hysss.sourceforge.net. HySSS, or Hybrid Stochastic Simulation for Supercomputers, is a software package that creates, simulates, and analyzes biological or chemical systems. It uses the hybrid homogeneous stochastic simulation previously mentioned. That paper actually was published on February 1st (not Jan 15th, woops) in J Chem Phys.
November 20, 2004
Paper #2, titled 'Accurate hybrid stochastic simulation of a system of coupled chemical or biochemical reactions', is slated to be in the January 15th issue of the Journal of Chemical Physics.
In short, it's a novel method improving upon the stochastic simulation algorithm of Gillespie as well as previous hybrid methods. It approximates fast reactions as a continuous Markov process (governed by a system of stochastic differential equations) while still representing the slow reactions as a jump Markov process. Partitioning of the system into fast & slow reactions is dynamic and it introduces a new way to quickly monitor when slow reactions occur while still retaining all of their time-dependence on the fast reactions.
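The fast/slow idea can be sketched in a few lines. To be clear, this is not the algorithm from the paper: it is a simplified illustration that (1) splits reactions into fast and slow using two assumed thresholds and (2) advances only the fast reactions with one Euler-Maruyama step of the chemical Langevin equation. The threshold names `lam` and `eps`, their default values, the function names, and the three-reaction example are all hypothetical, and the handling of the slow (jump) reactions is left out entirely.

```python
import math
import random


def partition(propensities, min_reactant_counts, dt, lam=10.0, eps=100):
    """Label a reaction fast when it fires many times per step (a * dt >= lam)
    and all of its reactants are abundant (count >= eps); otherwise slow.
    Called at every step, this makes the partition dynamic."""
    fast, slow = [], []
    for j, (a, n_min) in enumerate(zip(propensities, min_reactant_counts)):
        if a * dt >= lam and n_min >= eps:
            fast.append(j)
        else:
            slow.append(j)
    return fast, slow


def langevin_step(x, fast, propensity_fns, stoich, dt, rng):
    """One Euler-Maruyama step of the chemical Langevin equation, applied to
    the fast reactions only. Each fast reaction advances by a Gaussian extent
    with mean a * dt and variance a * dt."""
    new_x = list(x)
    for j in fast:
        a = propensity_fns[j](x)
        extent = a * dt + math.sqrt(a * dt) * rng.gauss(0.0, 1.0)
        for i, nu in enumerate(stoich[j]):
            new_x[i] += nu * extent
    return new_x


# Hypothetical example: A <-> B interconvert quickly, B is degraded slowly.
K_FWD, K_REV, K_DEG = 1.0, 1.0, 1e-4
STOICH = [(-1, 1), (1, -1), (0, -1)]  # species order: [A, B]
PROPENSITY_FNS = [
    lambda x: K_FWD * x[0],
    lambda x: K_REV * x[1],
    lambda x: K_DEG * x[1],
]

x = [5000.0, 5000.0]
dt = 0.01
a_now = [fn(x) for fn in PROPENSITY_FNS]
fast, slow = partition(a_now, [x[0], x[1], x[1]], dt)
new_x = langevin_step(x, fast, PROPENSITY_FNS, STOICH, dt, random.Random(0))
print("fast reactions:", fast, "slow reactions:", slow)
print("state after one Langevin step:", new_x)
```

In this example the two interconversion reactions are classified as fast and the degradation as slow. A full hybrid method also has to decide when each slow reaction fires while the fast ones keep changing its propensity, which is the part this sketch omits.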
On a related note, I saw Dr. Linda Petzold give a presentation today on a partial stochastic equilibrium and its usage to speed up the stochastic simulation algorithm. She only got through maybe half of it because the moderator kept asking boring questions in the middle of her talk (she obviously thought her questions were very insightful and fruitful, but they weren't). Of course, then the moderator proclaims the five minute warning with only 2/3 of the presentation left. Dr. Petzold was much too nice with her. I would have asked to have the conversation after the 15 minute time limit.
For those of you who aren't modelers, the stochastic simulation algorithm is a computational method that simulates the dynamics of a system of bio/chemical reactions. It's especially useful when the numbers of participating molecules are few because alternative simulation methods (like reaction rate equations / ODEs) fail in the 'small' regime. One can use these types of methods to simulate gene expression or signal transduction or the cell cycle. If you can break it down into a system of reactions, then the SSA can simulate it very nicely. The only problem is that it can be very slow in certain circumstances. The challenge now is to identify the circumstances which cause it to slow down and speed it up somehow. Hybrid methods or the partial stochastic equilibrium assumption are two approximations that seek to make the simulation go faster without losing too much accuracy.
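For readers who want to see what the stochastic simulation algorithm looks like, here is a minimal sketch of Gillespie's direct method in plain Python. The two-species gene expression model and its rate constants are made-up illustrative values, and the function and variable names are mine; this is the basic textbook SSA, not the hybrid method and not the HySSS implementation.

```python
import random


def gillespie(x0, stoich, propensity_fns, t_end, rng):
    """Gillespie's direct method: returns event times and the state after each."""
    t, x = 0.0, list(x0)
    times, states = [t], [tuple(x)]
    while True:
        a = [fn(x) for fn in propensity_fns]
        a_total = sum(a)
        if a_total <= 0.0:
            break  # nothing can fire any more
        t += rng.expovariate(a_total)  # exponential waiting time to next event
        if t > t_end:
            break
        # Pick reaction j with probability a[j] / a_total.
        threshold = rng.random() * a_total
        cumulative = 0.0
        for j, a_j in enumerate(a):
            cumulative += a_j
            if threshold < cumulative:
                break
        for i, nu in enumerate(stoich[j]):
            x[i] += nu
        times.append(t)
        states.append(tuple(x))
    return times, states


# Species order: [mRNA, protein]. Rate constants are made-up illustrative values.
K_TX, D_M, K_TL, D_P = 0.5, 0.1, 2.0, 0.05
STOICH = [(1, 0), (-1, 0), (0, 1), (0, -1)]
PROPENSITIES = [
    lambda x: K_TX,         # transcription:    0 -> mRNA
    lambda x: D_M * x[0],   # mRNA decay:       mRNA -> 0
    lambda x: K_TL * x[0],  # translation:      mRNA -> mRNA + protein
    lambda x: D_P * x[1],   # protein decay:    protein -> 0
]

times, states = gillespie([0, 0], STOICH, PROPENSITIES, t_end=50.0,
                          rng=random.Random(1))
print("events simulated:", len(times) - 1)
print("final (mRNA, protein):", states[-1])
```

Each run is one "cell". Repeating the run with different seeds and collecting the protein count at a chosen time gives the probability distribution. The slowness is also visible here: every single reaction event costs one pass through the loop, so fast reactions with large propensities dominate the run time.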
October 22, 2004
Updated Links/Books Section
I've updated the Links and Books sections to be more relevant.
Be sure to check out MIT's Registry of Biological Parts. It's a great idea.
I've also linked the research site of Yiannis Kaznessis, my advisor. In addition to synthetic biology, my group also performs docking calculations for protein-protein pairs and protein-DNA pairs as well as bioinformatics techniques to analyze microarray data. One obstacle to designing synthetic bio systems is the lack of the necessary 'parts'. Part of our integrative approach is to use computational design of proteins/DNA sequences to predict which modifications should be made to provide the needed parts for a particular design.
My (soon to be) published papers are in the list of publications. The first one is available in PDF form and should be published soon (we've corrected the galleys already, but they've been dragging their feet on this 'special' issue of theirs. It's taken a year after acceptance for us to get back the galleys!).