The road to Iteration 1 was a somewhat rocky one for my partner and me. We were not able to produce a fully working version of the scanner code until March 7th, six days after the deadline. Of course, one lesson to take away from our experience would be to start the project earlier. However, since that same lesson is learned by students nationwide and then cast aside in favor of the American pastime of procrastination, I shall not speak of it further.
Instead, I'll start by describing how things went so wrong. It all came down to two simple mistakes:
- We were using arrays to store data for the regular expressions. Since built-in C/C++ arrays cannot be resized like the allocatable arrays I know and love from FORTRAN, we chose to give the arrays a static size. While the number of elements in the array was originally set correctly, it was decreased by one while debugging and never put back.
One would expect a badly sized array to be easy to identify as the problem. At least, that was my impression. However, this bug produced no compilation or runtime errors or warnings, not even a segmentation fault (which is what I would have expected). Writing past the end of an array is undefined behavior in C++, so nothing is guaranteed to fail visibly. In our case, the result was memory corruption of the first two elements of the array. This directed my attention away from the real problem at the end of the array.
- We originally declared our tokens without using "new". This meant that they were stored on the stack rather than on the heap. When the matching function returns, its stack frame (along with the token) is destroyed. For some tests, we got lucky and the token's data hadn't yet been overwritten, but many were not so fortunate.
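The disappearing-token problem can be sketched in a few lines. This is only an illustration, not our actual scanner code: the `Token` struct and the `makeToken` function are hypothetical names made up for the example. The broken version is left as a comment because running it would be undefined behavior.

```cpp
#include <string>

// Hypothetical token type; the real scanner's token class is not shown here.
struct Token {
    std::string lexeme;
    int kind;
};

// Broken pattern (kept as a comment so the sketch stays well-defined):
//   Token* makeTokenBroken(const std::string& text, int kind) {
//       Token t{text, kind};  // lives in this function's stack frame
//       return &t;            // dangling as soon as the function returns
//   }

// Fixed pattern: the token is allocated on the heap with "new",
// so it outlives the call. The caller must eventually delete it.
Token* makeToken(const std::string& text, int kind) {
    return new Token{text, kind};
}
```

A caller would write something like `Token* t = makeToken("while", 1);` and later `delete t;`. Returning the token by value or through a `std::unique_ptr<Token>` would avoid the manual `delete`, but that is a different design from the one we used.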
The biggest takeaway from those failures is that going to office hours is a good thing. The TAs can help you debug your program and also explain what you've done incorrectly. This helps you avoid repeating history by making the same mistakes again. Fixing our disappearing token problem took only a couple of minutes. Discovering the issue with the length of the arrays took...well, more than a couple of minutes, but since neither my partner nor I knew what was wrong, it was still much faster than we could have solved it on our own.
One part of the process that did work well was the tests. We were able to use the regex tests to verify that our regular expressions themselves were not the source of our problems with the regex arrays. The tests for the regex matching and token creating method were the most useful during development. For example, a bug that should have caused several tokens to return lexical errors proved that our lexical error code was broken. So a little bit of work allowed our program to output the wrong token (yay!?) and complete the tests. This brings up what I considered to be a development priority: keep the tests running, even if they fail. It's much easier to measure progress by how many more tests your program is passing than by saying "Huh, it's still segfaulting."
While the tests did point us in the right direction during debugging and helped to solve many of the smaller problems, they were not a substitute for cout statements. In fact, the inability to simply comment out tests was an impediment to progress. Rather than using a block comment to temporarily remove large numbers of tests, one would need to cut and paste them into another file. I once accidentally committed a version of my scanner tests where all but one had been removed.
Another thing that I found worked very well was using gdb. I ran it every time I needed to track down a segmentation fault. Being told which line of your code it occurs on certainly beats guess-and-check debugging.
Working with a partner was generally a positive experience. I found working from code and tests to be quite challenging compared to projects where concrete specifications are given. Having someone to discuss possible approaches with was invaluable. I believe it greatly improved the structure of our program. Since CSCI 3081 is more about the development process than the final product, this fact is particularly significant. It was not a problem for us to find time to meet and work together on the code, and it wasn't an unpleasant experience. By that, I mean at the end of the night we would be frustrated with programming problems rather than each other. That fits my definition of a successful partnership.
In the future, we will continue to make heavy use of the tests. Since there is already functional code, it should be possible to write and run tests earlier in the development process for Iteration 2 than we did for Iteration 1. While the major bugs we encountered in Iteration 1 did not cause any additional issues by being solved later in the development process, it would be a mistake to assume that to be the norm. By running tests early, we should be able to save ourselves debugging headaches near the deadline.
The initial design of our program had the regular expressions defined as separate variables with individual expressions to match each. After receiving advice that this was not the intended design, we made the (somewhat painful) switch to an array. This required rewriting a significant amount of the code we had completed at that point, and it was eventually the source of one of our major bugs, but it greatly increases the flexibility and maintainability of our code now that it is working. For future iterations, we will attempt to think of maintainability before ease of coding. While this moderately increases the difficulty encountered upfront, it decreases the chances of requiring a major rewrite at a point where it would affect even more code.
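To make the array-based design concrete, here is a minimal sketch under several assumptions. The `TokenKind` values, the `Rule` struct, the `kRules` table, and the `classify` function are all hypothetical names, and `std::regex` stands in for whatever regex facility the project actually provides. The point of the sketch is that the array has no hand-written size: the compiler counts the initializers, so the element count cannot drift out of step with the contents the way ours did.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical token kinds; the real scanner defines its own set.
enum TokenKind { kInt, kIdent, kPlus, kLexicalError };

// One table entry: which kind of token a pattern produces.
struct Rule {
    TokenKind kind;
    std::regex pattern;
};

// No explicit size here: the compiler derives it from the initializer list.
static const Rule kRules[] = {
    {kInt,   std::regex("[0-9]+")},
    {kIdent, std::regex("[A-Za-z_][A-Za-z0-9_]*")},
    {kPlus,  std::regex("\\+")},
};
static const std::size_t kRuleCount = sizeof(kRules) / sizeof(kRules[0]);

// Returns the kind of the first rule whose pattern matches the whole
// string, or kLexicalError if no rule matches.
TokenKind classify(const std::string& text) {
    for (std::size_t i = 0; i < kRuleCount; ++i) {
        if (std::regex_match(text, kRules[i].pattern)) {
            return kRules[i].kind;
        }
    }
    return kLexicalError;
}
```

Adding a new token type then means adding one line to the table, with no separate size constant to remember and no new matching code.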
One objective for Iteration 2, of course, will be to finish it on time, unlike Iteration 1. We did complete what we thought should have been a working program by the deadline for Iteration 1, so it is a matter of leaving more time for debugging. For the last iteration, we did not anticipate the number of problems we encountered, but for future ones, we will likely be better prepared.