Saturday, April 10, 2010

STL Annoyance

If you are using the std::vector or std::deque containers and want to obtain an iterator to a location in your container, don't even think about doing this:



Or this:



So, you can't obtain a valid iterator if you try to obtain it before your container has any items. You also can't obtain a valid iterator if you obtain it while your container has an item, but then you add more items to it and try to use the original iterator afterward.

What gives?

Calls to insert() (or push_back()) when size() == capacity() will force the container to reallocate its memory (they double in size). A new block of memory is allocated and then the old block is copied to the new block, and the old block deleted. This is bad news if you've obtained an iterator to an item in the old block of memory because it still points there, but the data is no longer there.

It should be obvious why calls to reserve() have the same effect.

So, what can you do?
  • Don't use an iterator obtained prior to a call to insert() or reserve()  on your std::vector or std::deque
  • Keep the iterator updated
Keeping the iterator updated:



Note how the address of our iterator never changes, but the address of the item it points to does change. Therein lies the heart of the matter: When these containers reallocate memory, the data gets moved, but nobody tells the iterators to those pieces of data.

It should also be noted that calls to erase() invalidate any iterators that point to data positioned after the element(s) being erased. The reason should be obvious, as the data is no longer in the same location.

Moral of the story: Don't use an iterator that was obtained prior to calls on your std::vector or std::deque that will cause them to allocate or deallocate memory. Doing so will invalidate your iterators, and using them will yield undefined behavior, which is a nice way of saying "Garbage data at best, An explosion at worst."

For a full explanation of when your iterators will be invalidated (there's more situations than I've listed), consult chapter 23 of the Official C++ Standard (search for all occurrences of the word "valid"):

Monday, March 29, 2010

Reversing A String - The worse way, The better way

A common programming interview question is to reverse a string. A common first approach is to copy the contents of the string into a temporary string in reverse order, and then copy the temporary string over the original string. Do this in an interview, and you'll be informed that while your solution is correct, it can be much better. Doing it in this fashion is very inefficient, but it is still nonetheless a good place to start. As always, solve the problem first, then optimize. The naive solution would look something like this:



At this point, the interviewer is going to ask you to perform the reversing "in-place" meaning "do it all in one pass of the string, no double-pass copying shenanigans!"

So, how do we accomplish this? A basic swap should immediately come to mind. You need to swap pairs of characters in the string as you iterate over it until no more pairs exist. But, how to know when you've reached that point? Sure, you could keep track of indices of the string that have been swapped, but at that point you're diverging toward the inefficiency of the first solution because you'd have to do a lookup in the index table each time you attempted a swap, and that gets slower as your string becomes larger. So, storing indices is out. What else can we do? Well, let's see if there is an algorithm behind this.

Given the string "abc", what can we deduct about which pairs should be swapped?

'a' and 'c' get swapped, leaving us with 'b'. Clearly, we see there is nothing else to swap with, but how would we know this internally, from a source code level? The key lies in the positions of the characters in relation to the end of the string. We never want to swap 2 characters unless the location of the second character is > the location of the first character. Using our string "abc", 'a' is located at position 0 and 'c' at position 2. 'b' is located at position 1, but the character at position 2 has already been swapped.

To compute this mathematically, you do this:

firstIndex = 0
secondIndex = length of the string - 1
while secondIndex > firstIndex:
swap
increment firstIndex
decrement secondIndex
loop

Putting this into C, we have the following, efficient reverse algorithm:



And here is the output, using both versions consecutively on the same string (reverses it, then reverses it again):

$ ./reverse
original: abc
reverse: cba
reverse2: abc

Hopefully this has been helpful to you. Remember, the most obvious solution is most likely not the most efficient.

Saturday, March 27, 2010

When Errors Make Absolutely No Sense...

...you're probably doing something stupid.


I'm working on a new project (C++), and the setup is fairly simple:

I've got 5 implementation files, a utilities header file, plus my main file. Of those implementation files, 2 of them consist only of utility functions I've previously written.

In my utilities header, I've got 3 basic structures, plus an enum. 2 of the structures contain 1 very basic function defined within them.

Now that you have the details, let's get on to the issue...

I needed to add a function to determine the intersection (if any) between 2 lines in 2d space. Naturally, already having this utils.h file, I decided to throw it in there. I probably shouldn't have, as it is relatively long and complicated enough that the compiler wasn't going to be able to inline it, but at the time I just threw it in there and tried to do a quick compile:

/tmp/ccnsMHTQ.o: In function `checkIntersection(Point2d&, Point2d&, Point2d&, Point2&)':
VerticalPattern.cpp:(.text+0x2e1): multiple definition of `checkIntersection(Point2d&, Point2d&, Point2d&, Point2d&)'
/tmp/ccqHjPfS.o:HorizontalPattern.cpp:(.text+0x20b): first defined here

/tmp/cc1AN3z7.o: In function `checkIntersection(Point2d&, Point2d&, Point2d&, Point2d&)':
Enemy.cpp:(.text+0x8e): multiple definition of `checkIntersection(Point2d&, Point2d&, Point2d&, Point2d&)'
/tmp/ccqHjPfS.o:HorizontalPattern.cpp:(.text+0x20b): first defined here

/tmp/cc6 _16 _11ouOwL.o: In function `checkIntersection(Point2d&, Point2d&, Point2d&, Point2d&)':
main.cpp:(.text+0x1ba): multiple definition of `checkIntersection(Point2d&, Point2d&, Point2d&, Point2d&)'
/tmp/ccqHjPfS.o:HorizontalPattern.cpp:(.text+0x20b): first defined here

collect2: ld returned 1 exit status

Huh? That function certainly isn't defined anywhere other than utils.h. So, what gives? My first thought was that I screwed up a header guard somewhere and that coupled with me throwing the definition of the function into the .h file may be causing some issues. So, I checked...

All header guards were fine. Then I started to wonder if it could be a circular dependency of sorts. But, why then is the newly added function the only one causing the linker error? Wouldn't everything else in utils.h also be causing problems? I decide to conduct a test to find out:

I added a very simple function definition to utils.h and placed it above the definition of checkIntersection():

void foo() { int x = 2; }

A quick re-compile, and...

I got the same error as before, except now also with respect to foo(). Looking back over utils.h, it becomes clear that the compiler is inlining the functions declared within my structures. This makes sense, because member functions defined inside of a class definition are implicitly inlined. Inlining means that the functions are replaced by inserting the function bodies at the locations in which they are called. This means the linker never sees them, thus has no reason to complain about them in my situation.

At this point, I have enough information to construct a simple test case to illustrate the issue and better debug it. I need the following setup:

* A utils.h header file that contains a structure which is composed of a trivial data member as well as a function.

* Also in that utils.h file, I need to define a free standing function.

* Lastly, I need to have 2 additional source files, both of which include utils.h, one of which can be the file that holds my main method.

Once constructed, I'm left with the following:



O.K., time to try to build it and see what we get:

/tmp/ccwuqYve.o: In function `utilsFoo()':
foo.cpp:(.text+0x0): multiple definition of `utilsFoo()'

/tmp/ccCIDNQe.o:main.cpp:(.text+0x0): first defined here
collect2: ld returned 1 exit status

Good. The test case is valid. main.cpp includes both foo.h and utils.h, and foo.h also includes utils.h, thus multiple definitions. Evidently, I'm making a false assumption about what I'm doing, what the header guards are doing, or both... and you know what they say about assuming. Time to stop assuming and starting knowing:

Header guards only prevent multiple definitions within a single translation unit. Take this situation for example:

foo.h - includes utils.h, bar.h
bar.h - includes utils.h

Header guards prevent the contents of utils.h from being included more than once within foo.h (a single translation unit): Once from directly including utils.h, and again when it includes bar.h which in turn includes utils.h.

So, isn't that what I was doing in my example? No, not at all. I was trying to rely on header guards to prevent multiple translation units from defining the same function. After compilation, main.o and foo.o both contain code for utilsFoo() because both main.cpp and foo.cpp include utils.h, which is where utilsFoo() is defined. It isn't being defined more than once per-file, but rather multiple times across all files (translation units). Thus, the linker has multiple object files that all have code for utilsFoo().

So, why does sticking the definition of utilsFoo() into an implementation file as opposed to a header file fix the issue? Because in that case, that implementation file is the only object file that has code for utilsFoo(). When other translation units include utils.h, they are simply making the declaration of utilsFoo() visible to themselves. That is to say, they are making themselves aware that the function exists and can be called, but they aren't pulling in the code for the function.

I feel much better now. Sure, I knew how to fix this from the outset, but it was more important to me to know why I needed to fix it.

Monday, July 20, 2009

Getting Over The Initial Excitement Hump

So, you get a new idea for a project. It could be a game, a helpful desktop application, the next big iPhone application, etc... Your brain is working in overdrive with great ideas and you can't wait to start coding. This is normal, and it's great. No better feeling than fleshing out a brand new piece of software that you've come up with.

Chances are, you'll hop right into your IDE and start fleshing up some quick and dirty code to get the thing off the ground. You're probably too excited to bother with design documents and all of that boring stuff. This is fine to an extent, but before you get too deep into your quick proof of concept, you are going to have to make some formal outlines for your project if it is to be of any modest size.

Now that the boring stuff is done, (and unless you are working for a company or making a very large application, you shouldn't have had to spend too much time doing that boring stuff) you can get back to the exciting stuff: programming! At first, things are great. The excitement level is just as high, if not higher than it was at the outset of the idea. But sooner or later, something terrible happens. You lose that excitement and in turn you lose your motivation to work on the project anymore. So, what happened?

It is hard to say for certain, but it seems to be human nature, more specifically our tendency to prefer instant gratification, and who can blame us? We just came up with a new killer app, and we want to take the vision in our head and make it come to life. Such is the attraction to this field to begin with: making dreams and ideas into reality using our skills. However, this comes at a price.

Once you start getting deeper and deeper into your application, that buzz subsides and you are brought back down to Earth a bit when you have to go through all of the mundane tasks to convert your dream into snippets of source code. Of course, the scope of your project as well as the tools and methods you chose to use to create it will have a huge effect on these mundane tasks (converting between data types, proper design patterns, reusable code, GUI, etc...), but at the heart of the matter it is still the same issue: How do you get back that initial excitement after you drop off to the other side of the hump?

I'm not sure I have a good answer for that as I suffer from this situation myself in nearly every project that I start. I find that unless I complete something within a few days, this is almost certain to happen. One issue that plagues me is that I tend to have 3 or 4 of these "killer" ideas floating around at any single time, so once the initial buzz fades during the development of one idea, I go craving my excitement "fix" by starting one of the other ideas, and so the cycle continues. It is an easy trap to fall into for sure.

Income could certainly mitigate these effects. Professional developers encounter these same issues, and maybe more so because they are most likely developing someone else's idea, so that initial rush of excitement may never be there. However, it is their job, so they have no choice but to keep plugging away. On the other hand, if you are working on your own personal project in your free time and you have no financial stake in the application's completion, it can be much harder to get out of the rut. Sometimes just wanting that dream realized isn't enough, as I'm sure we've all come to find out all too many times.

I think the best solution I've found is balance. Although, there is a very fine line to walk. Initial surges are great, but we must take advantage of them and get as much done in those first few days or weeks as possible. Lay the ground work while your motivation is at its highest. Document your code extremely well, otherwise you'll lose time later trying to figure out what the hell you were doing when you revisit the project. That's where the balance comes in. Take breaks. Work on other projects for a few days, but always keep your main project fresh in your head. Ideally, your side projects should be very small pieces of software. Maybe proofs of concepts or something along those lines. It shouldn't be another full-fledged application. During these breaks though, keep your main project fresh in your head. Look over your source code once a day, revisit your design documents, add TODOs to the code if you happen to notice something, but don't add any actual code.

I believe these breaks will allow your mind to rest a bit and will prevent you from getting tired of a project. The side projects will keep your skills sharp as well as give you that initial buzz of excitement from working on something new, and hopefully this excitement can carry back over to your main project.

If anyone has something that works for them, please share. These are just some things I've found that help me.

Cheers.

Curious Observation About Engineering

For a while, I've wanted to develop a small application that I can use to sync my mp3 player with my computer, or vice versa. It tends to be a fairly common thing, as I'm always adding/removing songs from one or the other. It also comes up a lot if I do a new install of an operating system and I'd like to have some music on there temporarily; it's just easier to transfer it from my mp3 player rather than another file system on a hard drive somewhere.

Yesterday, I started thinking about this application again, and today I began fleshing out some things on paper. I was going over things such as how you determine if two files are identical (despite file name, file size, etc... which don't really tell you anything about uniqueness), what sort of sync behavior should be used (mutual-add, mutual-add/delete, etc...), and how to handle the more real-life situation where you've got a number of folders with sub-directories as opposed to just a list of files. I had all of these great ideas of how to solve these problems, and my brain was working overtime for the problems I didn't yet have an answer to. I was looking at source code for some common core UNIX file utilities like diff, cmp, cp, etc... looking for pointers...

And then it happened. I realized that I didn't need to make any sort of application. I realized my "problem" was non-existent. Why, you ask?

With the use of the built-in UNIX 'cp' command (file copy), and a few command line switches, I could achieve my sync behavior.

cp --recursive --update

Done.

--recursive handles the issue with multiple levels of directories, and --update handles the issue with determining if files are identical or not, as well as handling the mutual-add syncing.

The curious part comes in with my reaction to this discovery: I was totally bummed! As an engineer, you tend to want to have these problems to solve. Then, you are excited once you solve them. However, realizing there was never a problem to solve in the first place doesn't bring that same excitement, it in fact has the opposite effect.

It does illustrate a few important concepts of engineering though.

1. Don't re-invent the wheel.
2. Keep it simple, stupid!

Some engineers have a tendency to want to use complex or advanced solutions to problems which don't need them. I myself was bummed that such a simple solution would solve this problem, as I had envisioned a much more "sexy" method. This only leads to trouble down the road, and if at all possible, it's better to realize there's a better way to do something before you've finished doing it the wrong way.

Wednesday, July 9, 2008

Update: Daily Exercises

So, yesterday was the end of my first full week trying out my new routine of math, programming, and guitar each day. I felt it was a good introductory week to help me get used to the effort and time required to do such a thing, and I learned a lot about what changes I'll need to make in order to be successful with this process.

I stuck to my plan about 4 days out of the 7. It was a bit of a bad week to start this, as it fell over a holiday weekend, and 2 of the days in the 7 I had softball games (one of which I was injured in). Nonetheless, it was simply an introduction into what it would be like, and successful in that regard.

My guitar playing has improved dramatically. Not to say that I was bad, but I'm able to play some things now after only 7 days than I really ever imagined I'd be able to. As for programming, I found myself solving some complex problems of which recently I found rather stubborn. The Math was probably the hardest thing to sit down and do, and I failed in this department more than any other. However, when I did sit down and work some problems, each day they became easier.

I guess the main things I learned are:
  • It's hard work to do a routine like this with 3 different things. Usually people have a focus on one thing, and that's easy.
  • Tracking progress is essential. I didn't do any sort of that this past week and I don't feel like I can measure what I've accomplished in much of a concrete way. With guitar it's easier because I can set a goal to learn a song or a part of a song, and then do it. The other areas will require some structure.
  • It works.
I'll update this again at some point, at the longest I'll wait until a full month, but I may update it before then. I encourage you to try this out, whether it's with 1 thing, 3 things, or as many as you think you can handle, although I wouldn't push it much past 3 or the point of diminishing returns will surely kick in unless you don't have any other daily responsibilities. I work 2 jobs, so it's a time management exercise as well to fit all this stuff in!

Good luck.

Tuesday, July 1, 2008

Something has to change

This encompasses more than just programming, but it's the best place for it to go and I didn't feel like creating a new blog section...

I have been pretty busy lately, or that is to say "I've always got a lot to do, but usually don't accomplish what I should due to lack of structure and being worn out."

So, in that light, I came up with a plan to improve my skills in the areas of which I require them: Programming, Math, and Music.

Every day I will do the following:
  • Write code - I tend to do this anyways, but it's easy to do a lot of work for a 3 days and then take a day off. Nope, not anymore.
  • Do Math - I'm not great at Math, but my shortcoming in it is only due to lack of general effort toward the subject. Each day I will work some Math problems, as I need to be ready for a placement test anyways, I also need to sharpen my skills and keep them sharp for my future career.
  • Practice guitar - Practicing guitar is one of the most horrible things ever. Note I said "practice" and not "play". It's hard to force yourself to do exercises, but unless you do, you get into a rut and you never progress.
After one week of doing this, I'll post an update and let you know how it's worked out. All 3 of these areas are very closely related in terms of brain function, so while only doing one of them each day would greatly improve that area, doing all 3 should have a greater effect. The idea behind this is that you can't simply work on these things for 8 hours in one day, and then not do anything for the next 3. It's about the constant repetition of the activity day after day that will bring results. It improves memory, coordination, muscle memory, etc... I'll also make it a point to sit down and read something (other than e-mail...) each day. I don't particularly need help in this area, but reading often improves coordination (line skipping and such), concentration, vocabulary, and all sorts of other things worthy of improving.

I think not only will this routine improve the areas of which I'm targeting, but that it will also improve my overall productivity in life as I'll be on a more driven and focused path throughout my day, and I'll be more inclined to get something done rather than just chill out.

Until next time...