## November 18, 2005

### Why Math in Synthetic Bio?

In the past 3 years, I've developed some very advanced methods for computing the stochastic dynamical behavior of "small" physical or chemical systems, which especially includes biological systems. The methods are an improvement over existing ones, decreasing the amount of computational time by 1000x or more for some systems, but still retaining accuracy. Now, I'm working on more methods to better predict the long time behavior of biological systems.

But why is using math important in synthetic biology?

The idea is to develop an accurate model of a biological system (usually focusing on a small subsystem of a single cell) and then predict what the dynamics will be over time. If you can predict what the system will do before you build it, you save yourself both time and money. The model should be a "first principles" one based on the molecular interactions of each DNA site, protein, RNA, or other molecule in the system. That way, if you know the interactions of a DNA site in one model, then you should be able to put the same DNA site in a different model and still predict what will happen. (No lumped interactions!) Of course, we're still constrained by the limited amount of information we have on molecular "parts". That's ok for now, because (one day) we should have that information. Until then, we will need to be good engineers and make guesses (yes, guesses) on what those interactions might be and how they affect the system dynamics.

You would be surprised as to how much guessing goes into making 70 story buildings, cars that move at 120mph, and lots of other contraptions that will easily kill you if built incorrectly. The "engineer guess" is making sure that if you're 500% wrong that nothing bad will ever happen. The technical term is robustness. But, in practice, you assume the unknown quantity can take values within a very large range and then you make sure that nothing breaks for any value in that range. Of course, you have to pick which quantities to make your design robust to. That's where you get this tradeoff between "robustness" and "fragility".

But we should be rigorous about our "guessing". We should be able to identify _all_ possible behaviors that exist when varying the value of a specific quantity. What might the quantity be, as related to synthetic biology? It could be the binding affinity of a protein to a DNA site. It could be the enzymatic Kcat of a phosphorylation reaction. It could be the influx of a regulatory protein from the extracellular space. It is any parameter in our model that is not entirely known.

The math behind computing _all_ possible behaviors of a system while varying one or more parameters is called bifurcation analysis. The subject has always interested me and it's actually extremely useful in real life. Computing that your reactor has a subcritical Hopf bifurcation at a critical parameter value tells you that if your parameter is past this point, your reactor will suddenly blow up and kill lots of people. Whoever said math wasn't useful? In practice, they make sure the parameter never goes near that critical value..not even remotely near it. So reactors generally don't blow up. Whew, that's good to know.

How is bifurcation analysis related to synthetic biology? Say you wanted to create a gene therapy system that consisted of a biosensor + regulated production of a therapeutic protein. There might be 120 parameters in your system. You might have good information about 60 of those, so-so information about 30, and the rest...who knows. But you want the gene therapy to work no matter what. Even if you incorrectly measured the interaction between molecule A and B. Even if you get a mutation that changes an interaction between molecule C and D. It just has to work. If it doesn't, someone can die. So can you determine the behavior of the model over all unknown parameters and make sure that the system will never break? If the number of unknown parameters is 30, well ... that's a 30 dimensional space to worry about. Mathematically, you could do it...but it would take a while. Are there ways to speed up the process? Absolutely. (I won't go into details here.) But my main point is that bifurcation analysis is extremely important for designing synthetic biological systems.
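To make the idea concrete, here is a minimal one-parameter sketch in Python. The model (a dimensionless self-activating gene with basal rate `b` and maximal activated rate `a`), the function names, and the parameter values are all illustrative assumptions, not a system from this post. It scans `a`, finds the deterministic steady states by a grid search plus bisection, and labels each one stable or unstable, which is roughly the information a bifurcation diagram records.

```python
def net_rate(x, a, b=0.05):
    """dx/dt for a toy self-activating gene (dimensionless, assumed model)."""
    return b + a * x * x / (1.0 + x * x) - x


def steady_states(a, b=0.05, n_grid=2000):
    """Return [(x_star, is_stable), ...] found by grid scan + bisection."""
    x_max = a + b + 1.0  # net_rate < 0 for x >= a + b, so every root lies below
    found = []
    for i in range(n_grid):
        lo = x_max * i / n_grid
        hi = x_max * (i + 1) / n_grid
        f_lo = net_rate(lo, a, b)
        f_hi = net_rate(hi, a, b)
        if f_lo * f_hi >= 0.0:
            continue  # no sign change inside this grid cell
        # Rate goes from + to - across the root: small perturbations decay.
        stable = f_lo > 0.0
        for _ in range(60):  # plain bisection to refine the root
            mid = 0.5 * (lo + hi)
            if net_rate(mid, a, b) * f_lo > 0.0:
                lo = mid
            else:
                hi = mid
        found.append((0.5 * (lo + hi), stable))
    return found


def scan(a_values, b=0.05):
    """Map each value of the uncertain parameter to its number of stable states."""
    return {a: sum(1 for _, s in steady_states(a, b) if s) for a in a_values}


if __name__ == "__main__":
    for a, n_stable in scan([1.0, 3.0, 10.0]).items():
        print(f"a = {a}: {n_stable} stable steady state(s)")
```

In this toy model the middle value of `a` has two stable states and the other two have one, so the design is only safe from bistability outside that window. A brute-force scan like this only works for one or two parameters; the 30-dimensional case needs smarter methods.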

But, wait, I did say "stochastic dynamics" and the bifurcation analysis of stochastic systems is ... not well developed yet. That's because when you're working with probability distributions, the idea of a "qualitative change" in the solution gets harder to define. And working with these types of systems is harder in general. So that is what I am currently doing. And it is going well. :)

The math can get very heady and I worry that people who lack the background will become turned off by it. But the final product is very useful: You can find all possible behaviors of a "small" physical or chemical system (such as a biological system) using a combination of existing and new simulation techniques. The words "all possible" are extremely important. As it turns out, if your system has two stable states, a "good" one and a "bad" one, then, because of the random nature of the interactions, the dynamics might first go to the "good" state, but later go to the "bad" one. Not good. But you can minimize the "escape" from the good to bad state if you design the system well enough. This type of "escape" doesn't happen if you describe the system using deterministic dynamics, but it happens in real life. (One more reason why stochastic descriptions are important.)

If you've read this far, then my guess is that you're somewhat interested in mathematics. Study it! (Especially non-linear dynamics, bifurcation analysis, and stochastic processes.) They are useful for real life applications, especially in new fields such as synthetic biology. If you're looking for a book on non-linear dynamics I would suggest the one by Steven Strogatz. The material is at the intermediate-advanced undergraduate level with lots of pictures.

So to answer my initial question: "Why Math in Synthetic Biology?". Math is needed because there's no way to build large, complex dynamical systems without first understanding (and then predicting) what those dynamics will be. Math then gives us the tools to guarantee certain types of behaviors even if we don't exactly know all of the parameters of a model of the system. Also, using math generates testable and precisely quantitative predictions about the biological system of interest.

(This post is very hodgepodge and not very technical. For technical details on the bifurcation analysis of stochastic systems, you'll just have to wait for the paper. In the meantime, there are two papers on the stochastic numerical methods that are 1000x+ faster than the original one. My group also published a paper on the design principles behind an oscillating gene network. If you search PubMed for 'Salis H', they should come up.)

Posted by at 11:59 AM

## August 6, 2005

### Microfluidics & Laser Manipulation of Cells: The Road to Automated Molecular Biology?

Is automated DNA synthesis, transformation, selection, PCR, ligations, and imaging that far away? How would molecular biology change if a single PhD student could design and synthesize an entire plasmid or section of chromosome in a week? Besides exponentially speeding up the productivity of researchers (and encouraging PhD advisors to get even more paper greedy ;) ), it would have tremendous effects on the whole idea of synthetic biology. Design it. Make it. It works. Time To Completion? A month? A week? If you ask the typical biologist about the prospects of automating the process of genetic engineering, they'll say it's crazy talk. But two recent presentations at two separate conferences outlined the first steps towards the beginning of a new era in biology.

The first presentation was on a new cell-manipulation system that uses lasers to selectively toast, lyse, or heat individual cells. The conference was Biochemical Engineering XIV and took place in Harrison Hot Springs, British Columbia (near Vancouver). The presenter is a founder of a company named Cyntellect and he demonstrated how a high powered laser, driven by the precise movement of mirrors, can target a single yeast or mammalian cell for treatment. The laser has three modes of operation: it can completely break open and evaporate the fluid of a cell, killing it instantly; it can hit the cell with a burst of mechanical force that lyses the cell open, but leaves the intracellular space generally unharmed (including the DNA!); and it can gently heat up a single cell to a nice 42 deg Celsius. Using those three operational modes, the device can automate the transformation and selection process _on the single cell level_. Cool, huh? There's a built in optical microscope to image either staining or fluorescence and the laser system can target groups of cells that have a particular fluorescence (which looked similar to the gating function of a cytometer).

So you can heat shock, transform, image, select, lyse, and ... uh oh..that's all it can do for now. Either way, I was very excited by the time the guy finished his presentation (I mean, hey, he's using fricken laser beams). The real reason, though, is that the field of microfluidics has steadily advanced to the point where you can start pumping in and out reagents of all types into a chamber and control the reactions. Cyntellect only supports 96-well plates for now, but there's no reason why someone couldn't put a nice microfluidic device in there with a good sized viewing window for the laser. The laser system's chamber is temperature controlled (with a good response time!) so we're talking about _everything_ one needs to continue the process of genetic engineering: heat shock, transform, image, select, lyse, ... PCR, purify, and repeat!

The second conference was on "Molecular Recognition & Biosensors" in Santa Barbara, CA. The conference was sponsored by the Army's AHPCRC. In my mind, I knew microfluidics had been making steady strides, but seeing the recent accomplishments really surprised me. A professor at Caltech named Yu-Chong Tai presented his work on _integrated_ microfluidic devices. The components of the integrated microfluidic device, such as the pumps, valves, pipes, and other chambers, are molded together in a single chip using micrometer-resolution plastic deposition. To strut his stuff, he showed off his mini-HPLC (High Performance/Pressure Liquid Chromatography). The mini-HPLC is 3 cm in length, uses ~100 nl of sample, and the results match the big commercial ones. Crazy, huh? How far away is he from making a tiny electrophoresis chamber? Probably not too far.

Using an integrated microfluidic device combined with laser manipulation of individual cells, one can perform all of the necessary tasks that are needed to genetically engineer cells. How long will it be until someone decides to put these two innovations together and optimize the automation of the entire process? I give it 4-6 years, tops. Genetic engineering (cloning) is such a tedious, mind numbing, and repetitive task that any efficient (and cheap) automation would have a huge, earth-shattering effect on the entire field.

Honestly, I can't wait. I don't do half the cloning that some of my other friends do, but I still think it's the most repetitive and least satisfying part of molecular biology. If it works, you're happy...but only because you don't have to repeat it. The real satisfying accomplishment is doing the experiment and gaining knowledge (and something to write a paper about, heh). And if it doesn't work...ugh, wasted time!

-Howard

Posted by at 12:40 AM

## July 22, 2005

### A Quick Update

Hello all,

I apologize for the huge delays in posting, but I promise that it hasn't been in vain.

For the past five months, I've been going back and forth between a wet and dry lab, continuing to develop new stochastic numerical methods while helping to build a gene network in E. coli. I started my graduate studies in the computational area of systems / synthetic biology and moving into the experimental side has certainly been a change of pace. Some of the other bio people are pretty shocked to find me pipetting. When they ask why, I usually say something glib like "What use is designing something if you never build it?" Which is more or less true. However, in between experiments, I've been designing new networks and developing new stochastic methods. The work is going well. I (and my advisor, of course) have two papers in the review process and two more being written. I'll wait until they're published online before blogging about them (probably safer that way, heh).

I still believe that synthetic biology will never mature into an engineering science unless we use computational tools to quantitatively predict the behavior of a biological system. We need to create the same sort of tools, like CAD, that the auto industry uses to design cars. They spend 95% of their time on the computer, testing new hypothetical designs, and they only build prototypes when they see something work well. The same can be said for other industries, including building chemical or pharma plants and computer chips. The same process must develop for synthetic bio. We need more experiments to determine how simple biological systems work, but then we must incorporate that information into realistic models, powered by sophisticated mathematics. Without the combination of both, we'll be flailing in the dark, using the trial and error system that is rarely productive, often boring, and never generalizable.

I also have some thoughts on the differences between wet and dry lab experiments. I'll save that for a future post, I think.

I recently received a bunch of questions about stochastic methods from Dr. Herbert Sauro, who (like me) is a member of the sbml-discuss mailing list. They were very good questions and I think I did a decent job of answering them so I think I'll link them here.

Ok, that's all for now (I so suck at blogging).

Posted by at 8:55 PM

## February 3, 2005

### And the Fun Begins

I apologize for the long delay in posting. Right now, I'm finishing up a manuscript on the design of biological AND gates that activate gene expression if and only if two or more transcription factors are present. I won't mention any details until the paper is accepted, but these facts should pique your interest:
-- They require no special cooperativity or allosteric changes (i.e., no intensive engineering)
-- They can be made for two or three inputs (and probably more, but I didn't want to push it)

After the paper's out the door, it's time for the real fun: experiments!

So far, most of the work has been computational: developing new algorithms and software to quickly design and analyze synthetic biological systems. The methods have been shown to be accurate, but there's always a lingering doubt that the in silico results won't match the in vivo results. Now, it's time to go back to the (wet) lab and really see if it all works. The Fun Begins.

To test out our methodology, we thawed out some strains of E. coli producing GFP under control of the lac operon and sent them through a flow cytometer (courtesy of Dr. Friedrich Srienc, thank you very much). The pictures it produces are simply fantastic.

It's amazing how stochastic simulations and flow cytometry complement one another. A stochastic simulation will produce the probability distribution of the solution at whatever time points you want, but requires numerous independent trials (usually 10,000, each representing a single cell). A flow cytometer, on the other hand, can count hundreds of thousands of cells and measure the fluorescence in each individual cell, producing a very smooth distribution, but only for relatively few time points. Using the simulations, you can gain a high-resolution (in time) picture of the dynamics. Using the flow cytometer, you can validate the results at specific time points and easily produce very smooth distributions. Very nice.
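As a rough illustration of the "many independent trials" idea, the Python sketch below treats each stochastic trajectory as one simulated cell and builds the empirical distribution of protein counts at a few chosen time points, which is the kind of histogram a cytometer reading could be compared against. The constant-production, first-order-decay model and all names and rate values are assumptions made for this example; this is not HySSS and not the hybrid method.

```python
import random
from collections import Counter


def simulate_cell(k_make, k_decay, sample_times, rng):
    """One exact stochastic trajectory of 0 -> P and P -> 0, read out at sample_times.

    Assumes k_make > 0 and sample_times sorted in increasing order.
    """
    t, n = 0.0, 0
    samples = []
    for t_sample in sample_times:
        while True:
            total = k_make + k_decay * n
            dt = rng.expovariate(total)
            if t + dt > t_sample:
                # Next event falls after the readout. Waiting times are
                # memoryless, so we can jump to t_sample and redraw later.
                t = t_sample
                break
            t += dt
            if rng.random() * total < k_make:
                n += 1  # production event
            else:
                n -= 1  # decay event (only possible when n > 0)
        samples.append(n)
    return samples


def ensemble_distributions(n_cells, k_make, k_decay, sample_times, seed=0):
    """Return one {count: probability} dict per sample time."""
    rng = random.Random(seed)
    tallies = [Counter() for _ in sample_times]
    for _ in range(n_cells):
        readout = simulate_cell(k_make, k_decay, sample_times, rng)
        for tally, n in zip(tallies, readout):
            tally[n] += 1
    return [
        {n: c / n_cells for n, c in sorted(tally.items())} for tally in tallies
    ]


if __name__ == "__main__":
    times = [1.0, 5.0, 10.0]
    for t, dist in zip(times, ensemble_distributions(2000, 20.0, 1.0, times)):
        mean = sum(n * p for n, p in dist.items())
        print(f"t = {t}: mean copy number ~ {mean:.1f}")
```

Adding more entries to `sample_times` costs almost nothing, which is the simulation's advantage; getting a smooth histogram requires many more simulated cells, which is where the cytometer wins.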

If you're interested in computational design of biological systems, check out http://hysss.sourceforge.net. HySSS, or Hybrid Stochastic Simulation for Supercomputers, is a software package that creates, simulates, and analyzes biological or chemical systems. It uses the hybrid homogeneous stochastic simulation previously mentioned. That paper actually was published on February 1st (not Jan 15th, woops) in J Chem Phys.

Posted by at 9:58 PM

## November 20, 2004

### Good News!

Paper #2 titled 'Accurate hybrid stochastic simulation of a system of coupled chemical or biochemical reactions' is slated to be in the January 15th issue of the Journal of Chemical Physics.

In short, it's a novel method improving upon the stochastic simulation algorithm of Gillespie as well as previous hybrid methods. It approximates fast reactions as a continuous Markov process (governed by a system of stochastic differential equations) while still representing the slow reactions as a jump Markov process. Partitioning of the system into fast & slow reactions is dynamic and it introduces a new way to quickly monitor when slow reactions occur while still retaining all of their time-dependence on the fast reactions.
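The dynamic partitioning step can be pictured with a small Python sketch. The rule used here (a reaction counts as "fast" only if it is expected to fire many times per time step and every species it changes is abundant) and the threshold names `min_events` and `min_copies` are assumptions chosen for illustration; the exact criteria and the rest of the method are in the paper, not in this snippet.

```python
def partition_reactions(propensities, stoichiometry, counts, dt,
                        min_events=10.0, min_copies=100.0):
    """Split reaction indices into (fast, slow) for the current state.

    propensities:  list of current reaction propensities a_j
    stoichiometry: list of {species: net change} dicts, one per reaction
    counts:        {species: current copy number}
    dt:            time step over which fast reactions would be integrated
    """
    fast, slow = [], []
    for j, a_j in enumerate(propensities):
        # Many firings per step, so a continuous approximation is reasonable.
        many_events = a_j * dt >= min_events
        # Each firing changes the affected species by a small relative amount.
        plenty_of_molecules = all(
            counts[species] >= min_copies * abs(change)
            for species, change in stoichiometry[j].items()
        )
        if many_events and plenty_of_molecules:
            fast.append(j)
        else:
            slow.append(j)
    return fast, slow


if __name__ == "__main__":
    counts = {"A": 5000, "B": 4000, "mRNA": 3}
    stoichiometry = [
        {"A": -1, "B": +1},  # abundant interconversion
        {"mRNA": +1},        # rare transcription event
        {"mRNA": -1},        # frequent, but acts on a scarce species
    ]
    propensities = [500.0, 0.5, 200.0]
    print(partition_reactions(propensities, stoichiometry, counts, dt=0.1))
```

Because propensities and copy numbers change as the simulation runs, a function like this would be called again at every step, so a reaction can move between the fast and slow sets over time.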

On a related note, I saw Dr. Linda Petzold give a presentation today on a partial stochastic equilibrium and its usage to speed up the stochastic simulation algorithm. She only got through maybe half of it because the moderator kept asking boring questions in the middle of her talk (the moderator obviously thought her own questions were very insightful and fruitful, but they weren't). Of course, then the moderator proclaims the five minute warning with 2/3 of the presentation still to go. Dr. Petzold was much too nice with her. I would have asked to have the conversation after the 15 minute time limit.

For those of you who aren't modelers, the stochastic simulation algorithm is a computational method that simulates the dynamics of a system of bio/chemical reactions. It's especially useful when the numbers of participating molecules are few because alternative simulation methods (like reaction rate equations / ODEs) fail in the 'small' regime. One can use these types of methods to simulate gene expression or signal transduction or the cell cycle. If you can break it down into a system of reactions, then the SSA can simulate it very nicely. The only problem is that it can be very slow in certain circumstances. The challenge now is to identify the circumstances which cause it to slow down and speed it up somehow. Hybrid methods or the partial stochastic equilibrium assumption are two approximations that seek to make the simulation go faster without losing too much accuracy.
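For the curious, here is a compact Python sketch of Gillespie's direct method, the basic form of the stochastic simulation algorithm. The two-species gene expression model and its rate constants are hypothetical values picked for the example. At each step it draws an exponentially distributed waiting time from the total propensity, then picks which reaction fires with probability proportional to its propensity.

```python
import random

# Hypothetical model: state x = [mRNA, protein]; rate constants are made up.
STOICH = [(1, 0), (-1, 0), (0, 1), (0, -1)]
PROPENSITIES = [
    lambda x: 2.0,          # transcription:  0 -> mRNA
    lambda x: 0.2 * x[0],   # mRNA decay:     mRNA -> 0
    lambda x: 5.0 * x[0],   # translation:    mRNA -> mRNA + protein
    lambda x: 0.05 * x[1],  # protein decay:  protein -> 0
]


def ssa(x0, stoich, propensity_fns, t_end, rng):
    """Direct-method SSA. Returns [(time, state), ...] up to t_end."""
    t = 0.0
    x = list(x0)
    trajectory = [(t, tuple(x))]
    while True:
        a = [fn(x) for fn in propensity_fns]
        a_total = sum(a)
        if a_total <= 0.0:
            break  # nothing can fire any more
        t += rng.expovariate(a_total)  # waiting time to the next reaction
        if t > t_end:
            break
        # Choose reaction j with probability a[j] / a_total.
        threshold = rng.random() * a_total
        j, cumulative = 0, a[0]
        while cumulative <= threshold and j < len(a) - 1:
            j += 1
            cumulative += a[j]
        x = [xi + change for xi, change in zip(x, stoich[j])]
        trajectory.append((t, tuple(x)))
    return trajectory


if __name__ == "__main__":
    traj = ssa((0, 0), STOICH, PROPENSITIES, 20.0, random.Random(0))
    t_last, (mrna, protein) = traj[-1]
    print(f"{len(traj) - 1} reaction events; at t = {t_last:.2f}: "
          f"mRNA = {mrna}, protein = {protein}")
```

The slowness mentioned above shows up directly in this loop: every single reaction event costs one iteration, so a system with a few very frequent reactions spends nearly all of its time simulating them one firing at a time.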

Posted by at 11:56 PM

## October 22, 2004

I've updated the Links and Books sections to be more relevant.

Be sure to check out MIT's Registry of Biological Parts. It's a great idea.

I've also linked the research site of Yiannis Kaznessis, my advisor. In addition to synthetic biology, my group also performs docking calculations for protein-protein pairs and protein-DNA pairs as well as bioinformatics techniques to analyze microarray data. One obstacle to designing synthetic bio systems is the lack of the necessary 'parts'. Part of our integrative approach is to use computational design of proteins/DNA sequences to predict which modifications should be made to provide the needed parts for a particular design.

My (soon to be) published papers are in the list of publications. The first one is available in PDF form and should be published soon (we've corrected the galleys already, but they've been dragging their feet on this 'special' issue of theirs. It's taken a year after acceptance for us to get back the galleys!).

Posted by at 8:03 PM

## October 19, 2004

### Before there were Answers, there were Questions

Before I begin reviewing the field of Synthetic Biology and discussing topics of interest, I just wanted to start by listing some of the questions that drive the research. Before you can find useful answers, you must always ask good questions.

What is Synthetic Biology and what do we want to gain by it? How is it useful? To what extent is it 'molecular biology' under a different name and why is it called 'synthetic'? What are we creating and how do we do it? What sort of methods are used?

I'll start answering those questions, but since the field is so new, feel free to pipe up and throw in your own two cents. I won't go into too many specifics or else I'll be writing all night. ;)

Synthetic biology is the study or design of biological molecules whose function did not previously (knowingly) exist in nature. This includes proteins that have catalytic activity (enzymes) or new structural binding properties, such as DNA binding regulators. This also includes engineered mRNA or DNA that has a specific, designed function. The most common example of an engineered biological system is a 'gene network' or 'gene circuit', which is a system of one or more genes whose function has been engineered to perform a specific task.

Why is synthetic biology useful? First, if we can design a biological system and then build it, then we know how it all (mostly) works. By predicting what will happen before you build it and then building it, you not only state the hypothesis that a) if built according to the design, it will work, but also that b) the biological system exists as you represented it in the design. So if the system doesn't behave as one might think, then something unknown must exist. Like all good scientific efforts, we have a hypothesis. But as engineers, if something doesn't work, we can investigate the problem and determine the solution.

Secondly, biology naturally interfaces with other biology. The most effective treatments of disease will naturally be biological molecules whose purpose has either evolved to treat the disease or which has been designed (by us) to do so. Not only can we engineer molecules to activate/inhibit/bind/etc, treating a disease, but we may also engineer the production, degradation, and localization of that molecule to control its effects and prevent the cure from becoming worse than the disease. We may also construct biological devices that detect the presence of other biological or chemical molecules (a biosensor), which would have tremendous use in medical diagnostics or defense.

So we can use synthetic biology to study biology while we build new and useful biological devices.

Which brings me to another aspect of synthetic biology: There's really a lot of engineers doing it. I'm also an engineer so I'm happy about that, but I've become accustomed to entering a seminar and being the only engineer there. Why does synthetic biology attract engineers, then? Well, I've been using the word 'design' over and over so that should be a clue.

How is synthetic biology different from molecular biology? Well, ... synthetic biology IS molecular biology, except more quantitative and precise. If you're reading a journal catering to molecular biologists, you'll typically see a model as a slightly cartoonish diagram depicting the interactions between a collection of proteins/etc and arrows showing the order of events. Basic questions are left unanswered by such models: How strong are the interactions? For every protein/etc in the diagram, are there additional interactions that will affect the model? Even though the interactions are listed, what are the dynamics that result? These answers may be counter-intuitive. One part of synthetic biology is to create a more defined, quantitative (and predictive) model of biological systems. Depending upon the level of detail, one could include the kinetic constants of all interactions (reaction/binding events), all unique chemical species, diffusion of all species, and membraned compartments. One outstanding question is what amount of detail is necessary to get predictive results. Conversely, what approximations may we make without sacrificing accuracy?

Finally, what are some of the methods that we use? To build these designs, common genetic engineering techniques are employed to cut and paste DNA into vectors, transform vectors (plasmids) into an organism, and (possibly) integrate the vector into the genome of the organism. These techniques have been used for the past 50 years and, while there are difficulties in extreme cases, it is relatively easy to construct something interesting. (By easy, I mean, it won't take one person their _entire_ PhD program...maybe only a year or two. ;) )

The real obstacle is not the experimental construction, but the design of the DNA sequences. In order to quickly design the system and avoid excessive experimental trial and error, mathematical tools must be used to analyze a design and ascertain whether it will function as expected. Relying on experimental construction alone would result in years of wasted effort. When using mathematical tools, there are two main questions: What quantitative, mechanistic model best predicts the dynamic behavior of the particular biological system of interest? What mathematical representation (and simulation) best reflects the process that occurs? These are two separate questions because one may take a good model and generate faulty equations with it, where the assumptions used in forming those equations are wrong. Solving those equations perfectly would yield an incorrect answer, even though the model is perfectly valid. Conversely, forming and solving the most accurate and complete equations with the least number of assumptions would be futile if the model itself was not accurate and predictive.

This is where my research starts. As you could tell, I haven't gone into details. Over the next few weeks, I plan to review specific topics within the synthetic biology area, including experimental construction of different gene networks, the mathematical theory behind the most advanced simulators, and the guiding principles behind the optimal design of gene networks.

The format will be informal. I'll reference where it's necessary and include pictures when I can find them. For the math, there's no LaTeX and so I will probably just upload PDFs.

This blog is not a substitute for my published papers. I spend a lot more time on them than I do on this (wisely, as you may agree). Feel free to post comments below.

Posted by at 8:39 PM

## September 25, 2004

### Welcome!

Somehow you have stumbled onto this blog. That's good, I guess. First off, let me introduce myself: My name is Howard Salis and I'm a graduate student within the Department of Chemical Engineering and Materials Science here at the U of Minnesota.

My research is on the study and design of systems of genes, also called 'gene networks' or 'regulatory networks' or just plain gene expression. Some people have recently branded this area of research as "Synthetic Biology" because a lot of knowledge may be gained by building systems from scratch. Well, no one has built a biological system from scratch yet...so it mostly means that we're building systems that have never existed before in nature, but using parts that nature has graciously given us. I can just about hear the red whirling lights of fear dancing in your head. Every time I mention my research to *cough* lay people I elicit two widely different responses: It's either 'Wow, that's so cool!' or 'MY GOD, you'll kill us all!'. I'm sure people working on the first microwave got the same response. I plan to use this blog to both document my own thoughts on the science behind synthetic biology and, if I get any readers, to discuss the ethical consequences of now having the capability to design biological systems from a rational perspective.

But, and you should understand this, there's a lot of mathematics behind the 'rational design' part of synthetic biology. It's not easy to predict the de novo structure of a protein, the kinetic constants of molecular interactions, or the dynamics of gene expression. There's some crazy math involved and I may delve into the vortex of technicality and jargon. O well, you're reading this far so you can't be completely bored. I may also use this blog to document my thoughts on papers I've recently read. I have about 300 papers in my desk cabinet and I've honestly come to the point where it's very difficult to pick up one that I've read and say "I remember reading this." That's bad. Notes are good. Blogs are good. Why not do both at the same time?

Sounds good.

Posted by at 1:39 AM