Many software engineers, myself included, are driven to build the best software possible, pouring their creative juices into developing “the perfect” solution. This intention is admirable, and it is this quality that can lead to extremely innovative applications. However, I believe this same perfectionist quality can sometimes cause us to lose sight of the bigger picture – our highest priorities. System optimization is one of those “quagmire” areas where we can easily lose sight of reality, as well as our priorities. We can fall into the habit of believing that every line of code we write must perform at blazing speed or use the absolute minimum of system resources.

I’ve found myself, on occasion, after researching optimization patterns or coding and recoding a particular algorithm for hours or days, suddenly thinking, “What is the problem with my original implementation? Why do I think this code won’t perform well enough? What performance tests did I run, against what performance requirement, that pointed me to this code being an issue?” My job, my responsibility, my goal, is to deliver value, and value can only be derived from a real need. When I allow myself to go off on “optimization tangents” (or any sort of effort that could be challenged as “over-engineering”, for that matter), I lose valuable development cycles – cycles that could have been used to deliver something that actually provides a tangible benefit.

The tendency toward over-engineering is natural – I’ll guarantee that we all do it. So we must accept that, become aware of it, and not beat ourselves up when we catch ourselves doing it. One way we can maintain this awareness is to form a habit of frequently asking ourselves this question: “What is the real value in doing the task I’m doing at this moment? Is it real value, or is it imagined?”

Let’s focus specifically on software optimization. What are some of the consequences of taking the “optimize it, optimize it all!” approach?

  • In the time I spent optimizing code (that may already be quite fast enough), I might have implemented and deployed a new feature that could really help the end user.
  • Optimization very often results in code that is more difficult to understand and maintain.

How can we approach optimization in a way that minimizes the risk that we’ll waste time and energy trying to optimize parts of our system that may never be an issue? Here are a few practices I’d like to recommend:

  • Make clean design your first priority – after correct system functionality, of course.
  • Take a step back periodically during development to consider possible performance “hot spots”.
  • Implement performance and scalability tests for hot spots and automate these tests. Make the tests as realistic as possible, implementing a mix of use cases and running parallel threads that mimic anticipated runtime behavior (yes, you cannot be 100% certain you’ll get it right, but 90% of the time you’ll be close enough to head off most problems) – sorry, but deciding what and how to design performance tests is more a judgment call than it is a science. There are tools, such as JMeter, that are perfect for this, and most of these tools even incorporate “record and playback” features to ease test development and execution of regression testing. A lighter-weight, code-level sketch follows this list.
  • Evaluate performance objectively, based on test results. If the use case or non-functional requirements don’t indicate a need for a higher level of performance, spend your time elsewhere.
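
Dedicated tools like JMeter are usually the right choice for serious load testing, but even a plain JUnit test can serve as an automated, repeatable performance check for a hot spot. Below is a minimal sketch of that idea – an illustration, not a definitive implementation. It assumes JUnit is on the classpath, and the HotSpotPerformanceTest name, the exerciseUseCaseMix() method, the thread count and the two-second target are all hypothetical placeholders for your own use case mix and success criteria:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.junit.Assert;
import org.junit.Test;

public class HotSpotPerformanceTest {
	// Hypothetical load profile and success criterion - substitute your own
	private static final int CONCURRENT_USERS = 20;
	private static final long MAX_AVG_MILLIS = 2000;

	@Test
	public void useCaseMixStaysWithinTargetUnderLoad() throws Exception {
		ExecutorService pool = Executors.newFixedThreadPool(CONCURRENT_USERS);
		List<Callable<Long>> users = new ArrayList<Callable<Long>>();
		for (int i = 0; i < CONCURRENT_USERS; i++) {
			users.add(new Callable<Long>() {
				public Long call() throws Exception {
					long start = System.currentTimeMillis();
					exerciseUseCaseMix();
					return System.currentTimeMillis() - start;
				}
			});
		}

		// Run all simulated users in parallel and average their response times
		long totalMillis = 0;
		for (Future<Long> result : pool.invokeAll(users)) {
			totalMillis += result.get();
		}
		pool.shutdown();

		long avgMillis = totalMillis / CONCURRENT_USERS;
		Assert.assertTrue("Average " + avgMillis + "ms exceeded the " + MAX_AVG_MILLIS + "ms target",
				avgMillis <= MAX_AVG_MILLIS);
	}

	private void exerciseUseCaseMix() {
		// Hypothetical: invoke a realistic mix of the hot spot's use cases here
	}
}

The point is not the specific numbers – it’s that the check runs automatically, every build, against explicit criteria.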

For any hot spot or set of functionality you decide requires performance testing, here are the steps I typically follow:

  1. First, determine the explicit mix of use cases or user/system actions to be performance tested, along with the success criteria for each. Once the tests run, these criteria tell me which use case(s) represent real performance issues. It’s all about 1) identifying use cases, or mixes of use case scenarios under load, that exhibit unacceptable performance, 2) narrowing my focus as quickly as possible, and 3) prioritizing what I’ll spend time and effort on. “If it ain’t broke, don’t fix it” has serious applicability here.
  2. Design the performance tests, preferably leveraging any “record/playback” features provided by the test tool to speed up this process.
  3. Execute performance tests and evaluate the results.
  4. Outside of the performance tests, execute any use cases that did not meet the performance criteria I defined in the first step above, monitoring execution using a profiling tool. The idea is to identify threads or resources that may be causing bottlenecks. Note that I am not running the system under load at this stage – I’m looking for methods or sections of code that may be consuming the most resources. If you’re developing Java applications, VisualVM is a great tool for this and comes bundled in the JDK since Java 1.6, update 7. There are many profiling tools on the market for most programming languages. In my experience, these tools are highly under-utilized.
  5. Based on the information I’ve uncovered at this stage, I usually find it helpful to step through the code with a debugger in order to get a more detailed view of exactly what’s going on. My own experience tells me that the debugger is also one of those tools in the developer’s toolbox that isn’t leveraged nearly enough. I still encounter otherwise very solid developers who don’t even know how to configure and use their debugger. The debugger really is your friend!
  6. The combination of these last two steps will usually point me to a section of code or a data structure that’s causing a bottleneck or consuming significantly more system resources than the rest – these are my targets. I review this code, refactor it, and repeat step 3 above. If the tests still fail to meet my performance criteria, I move through steps 4-6 again.

There are many approaches to performance testing. The above is simply intended to get you thinking about the process if you haven’t seriously considered this or don’t have a framework for approaching performance testing on your projects. Whatever you do, if performance or scalability is critical to the success of your project, address this early in the project, while also keeping in mind that not every aspect of any system requires optimization. Prove to yourself and your team where the system is not meeting performance needs and then systematically move through the process of identifying and remedying these issues.

Even though the concepts of “dependency injection” and “loose coupling” have been advocated and written about for at least the past decade, I’m still routinely finding examples that make me think, on the whole, we developers may be underestimating the power of these simple ideas and their related practices. In this post, I’ll describe a real-world, simple scenario I recently encountered and point out how a few very slight changes in thinking could have significantly improved the design and avoided a lot of challenges down the road. I’ll provide some sample code that illustrates why ignoring these principles can get us into trouble, as well as sample code that illustrates how we can clean up our design and make our lives so much easier.

I recently ran into this issue on a project… I found a need to update a Java class representing a scheduled “job” that retrieved a very large file (containing hundreds of millions of lines/records) from a remote URL and processed that file. In a nutshell, here are the primary responsibilities of this class:

  1. Download file from a URL to the local filesystem.
  2. Parse each line contained in the downloaded file, checking to see whether the line represents an entry the application is interested in.
  3. For each line of interest, update the application’s database – depending on the line’s content, that line would either be ignored, or would flag the class to delete existing records or add new records to several tables in the application’s database.

It was these last two functions that caused us to update the class’ algorithm: when processing files containing such a huge number of records, the job would typically run for at least a couple of days. That’s the background, but the real point of this post lies with function number one above – download the file from a remote URL. With all the changes I’d made to optimize the algorithm, I needed to verify the class’ behavior – there was no unit test, and I wanted to write one. The idea of unit testing is to isolate the unit (almost always a single class – the “class under test”). However, I ran into some issues:

  • The class under test called a private method to download the file. Why is this a problem? With this implementation, I cannot control what an external web site will return to me at any given time, so how can I possibly write a test that verifies my class correctly processes the returned file? Besides, it’s really a bad practice to have unit tests interact with any external system, especially if it’s a production system. And, if that system is down or broken, my unit tests will break, even though my code could be just fine.
  • The class under test directly instantiated the Data Access Objects (DAOs) it used to update the application’s database. Why is this a problem? While I do care that the data parsed from the downloaded file gets updated correctly in the application’s database, that is not the purpose of the unit test – I may write a more comprehensive integration test later to test a more complete “flow”, but that is not the scope of my unit test. I do not want to really update my database – only verify that my class under test calls the DAOs when and with the parameters I expect. By instantiating the DAOs directly, the class under test has interfered with my ability to control this.

The code snippet below is an oversimplification intended to demonstrate the original, problematic approach:

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.net.URL;

public class BadSyncDBWithExternalSiteJob {
	private static final String REMOTE_SITE_URL = "http://mysite.com/update-file.txt";

	private static final String DOWNLOADED_FILE_LOCATION = "./downloadedFileToParse";

	// The concrete DAO is instantiated directly - it can never be swapped out
	private SiteSyncDAOImpl siteSyncDAO = new SiteSyncDAOImpl();

	public void execute() throws Exception {
		File downloadedFileToParse = retrieveRemoteFile();
		// Parse file, update local database via the siteSyncDAO
	}

	// The download is private and hard-wired to the remote URL
	private File retrieveRemoteFile() throws Exception {
		File downloadedFile = null;
		BufferedInputStream in = null;
		FileOutputStream fout = null;
		// Yes, there are better ways to do this - e.g. Apache's FileUtils
		try {
			in = new BufferedInputStream(new URL(REMOTE_SITE_URL).openStream());
			fout = new FileOutputStream(DOWNLOADED_FILE_LOCATION);

			byte[] data = new byte[1024];
			int count;
			while ((count = in.read(data, 0, 1024)) != -1) {
				fout.write(data, 0, count);
			}

			downloadedFile = new File(DOWNLOADED_FILE_LOCATION);
		} finally {
			// Close both streams, even if closing the input stream throws
			try {
				if (in != null)
					in.close();
			} finally {
				if (fout != null)
					fout.close();
			}
		}
		return downloadedFile;
	}

}

As you can see, the class that parses the file and updates the database also decides which URL to retrieve the file from and actually performs the download. It is as tightly bound to these functions as one can get. So, in order to test this class, my tests will always interact with and depend upon the external URL, which means they depend on both the current state of that site and the contents of the retrieved file at any given time. This is totally unacceptable. You can also see that it instantiates its own DAO. This interferes with my ability to unit test the class, since it will always use this DAO and update the database – which I do not want.

So, what could the developer have done to avoid these issues? Below are some modified code snippets that show an implementation of the above class with the issues eliminated, by decoupling it from the implementation details it should not care about and allowing dependency injection:

import java.io.File;

public class GoodSyncDBWithExternalSiteJob {
	private ExternalFileDownloader externalSiteFileDownloader = null;

	private SiteSyncDAO siteSyncDAO = null;

	public void setSiteSyncDAO(SiteSyncDAO siteSyncDAO) {
		this.siteSyncDAO = siteSyncDAO;
	}

	public void setExternalSiteFileDownloader(ExternalFileDownloader externalSiteFileDownloader) {
		this.externalSiteFileDownloader = externalSiteFileDownloader;
	}

	public void execute() throws Exception {
		File downloadedFileToParse = externalSiteFileDownloader.retrieveRemoteFile();
		// Parse file, update local database via the siteSyncDAO
	}
}

import java.io.File;

public interface ExternalFileDownloader {
	public void setRemoteSiteURL(String remoteSiteURL);

	public void setDownloadedFileLocation(String downloadedFileLocation);

	public File retrieveRemoteFile() throws Exception;
}

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.net.URL;

public class ExternalURLFileDownloader implements ExternalFileDownloader {
	private String remoteSiteURL = null;

	private String downloadedFileLocation = null;

	public void setRemoteSiteURL(String remoteSiteURL) {
		this.remoteSiteURL = remoteSiteURL;
	}

	public void setDownloadedFileLocation(String downloadedFileLocation) {
		this.downloadedFileLocation = downloadedFileLocation;
	}

	public File retrieveRemoteFile() throws Exception {
		File downloadedFile = null;
		BufferedInputStream in = null;
		FileOutputStream fout = null;
		// Yes, there are better ways to do this - e.g. Apache's FileUtils
		try {
			in = new BufferedInputStream(new URL(remoteSiteURL).openStream());
			fout = new FileOutputStream(downloadedFileLocation);

			byte[] data = new byte[1024];
			int count;
			while ((count = in.read(data, 0, 1024)) != -1) {
				fout.write(data, 0, count);
			}

			downloadedFile = new File(downloadedFileLocation);
		} finally {
			// Close both streams, even if closing the input stream throws
			try {
				if (in != null)
					in.close();
			} finally {
				if (fout != null)
					fout.close();
			}
		}
		return downloadedFile;
	}

}

Note the following about the GoodSyncDBWithExternalSiteJob class:

  • It now requires injection of an ExternalFileDownloader interface instance to handle all the details of the file download, including the where and how. In fact, GoodSyncDBWithExternalSiteJob no longer knows anything about the file, not even whether it was downloaded from a remote location. Our unit tests can inject a “mock” version of ExternalFileDownloader – one that returns a “canned” file containing the same entries each time we execute our unit tests (see the test sketch after this list).
  • It also requires injection of the DAO. I won’t focus on the DAO here – the same concepts discussed for the “file downloader” apply to it. The job class should not decide which implementation of the DAO to use and instantiate it – we should be able to “wire” in the implementation of the DAO we want it to use at runtime.
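
To make that concrete, here’s a minimal sketch of what such a unit test might look like. This is an illustration, not the definitive test: it assumes JUnit and EasyMock are on the test classpath, the canned file path is a hypothetical placeholder, and the actual DAO expectations (elided below) would depend on the canned file’s contents:

import java.io.File;

import org.easymock.EasyMock;
import org.junit.Test;

public class GoodSyncDBWithExternalSiteJobTest {
	// Hand-rolled stub: returns a canned test file instead of touching the network
	private static class CannedFileDownloader implements ExternalFileDownloader {
		public void setRemoteSiteURL(String remoteSiteURL) {
			// Not needed - nothing is actually downloaded
		}

		public void setDownloadedFileLocation(String downloadedFileLocation) {
			// Not needed - nothing is actually downloaded
		}

		public File retrieveRemoteFile() {
			// Hypothetical path to a checked-in test fixture
			return new File("src/test/resources/canned-update-file.txt");
		}
	}

	@Test
	public void executeProcessesCannedFileAndUpdatesViaDAO() throws Exception {
		// Mock the DAO so no real database is touched
		SiteSyncDAO mockDAO = EasyMock.createMock(SiteSyncDAO.class);
		// ...record the DAO calls the canned file's entries should trigger...
		EasyMock.replay(mockDAO);

		GoodSyncDBWithExternalSiteJob job = new GoodSyncDBWithExternalSiteJob();
		job.setExternalSiteFileDownloader(new CannedFileDownloader());
		job.setSiteSyncDAO(mockDAO);

		job.execute();

		// Fails the test if the expected DAO calls did not occur
		EasyMock.verify(mockDAO);
	}
}

Every run now parses exactly the same file and verifies exactly the same DAO interactions – no network, no database, no surprises.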

NOTE: In order to keep the example code brief, I’ve ignored good exception handling practices. We all know it’s not a good practice to have our methods throw generic “Exception” instances. We do, don’t we? Please say “yes” :-)

Aside from enabling testability, these simple changes provide other powerful benefits:

  • Loose coupling: The class that parses and decides what needs to be updated in the database is now completely isolated from the details of where the file is retrieved from and how it’s retrieved. Now I have control of the class’ “collaborators”. I can inject mock instances – my own mock objects or objects created via a “mock” framework, such as EasyMock – that behave according to my testing needs and return objects to my class under test in a predictable way every time I run my tests.
  • Cohesion: Classes now have well-defined, narrow responsibilities. As applications grow larger, the value of this attribute cannot be overstated for purposes of maintainability and comprehensibility.
  • Cleaner, simpler design: Related to the above, this is just an easier design to understand, modify and maintain (a hand-wired usage sketch follows this list).
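
One concrete payoff of this looser design is that the same job class can be assembled differently in different contexts. Below is a hypothetical hand-wired bootstrap, assuming the classes above live in the same package and that SiteSyncDAOImpl implements the SiteSyncDAO interface; in a real application, a DI container such as Spring would typically perform this wiring from configuration instead:

public class SyncJobBootstrap {
	public static void main(String[] args) throws Exception {
		// Configure the production downloader
		ExternalURLFileDownloader downloader = new ExternalURLFileDownloader();
		downloader.setRemoteSiteURL("http://mysite.com/update-file.txt");
		downloader.setDownloadedFileLocation("./downloadedFileToParse");

		// Inject the collaborators - the job never knows which implementations it received
		GoodSyncDBWithExternalSiteJob job = new GoodSyncDBWithExternalSiteJob();
		job.setExternalSiteFileDownloader(downloader);
		job.setSiteSyncDAO(new SiteSyncDAOImpl()); // the real DAO in production
		job.execute();
	}
}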

This is an example of where following a Test-Driven Development (TDD) philosophy can really pay off. Had the developer written his/her unit tests first, or even in tandem with the production code, such radical refactoring would have been avoided and this class would have been unit tested from the start. You might argue that a developer can always refactor later to achieve this goal. However, in my experience, the problems with this perspective are:

  • Developers are rarely given the time, once a given feature has been implemented, to refactor existing code – much less to write unit tests for code that is perceived to be “working OK right now”.
  • It is generally much more expensive to figure out how to refactor and unit test existing code than to take some simple steps to make the design clean and write the tests from the beginning.
  • In many cases, the class under test will be used by several application components, which could mean that the developer now has to consider all the ways the class is being used. This can result in the developer getting into the “this change works for this use case and breaks another use case” scenario. I’ve personally experienced cases in which trying to refactor an existing class that seemed very simple required days of effort.

I’ll address some of these topics – loose coupling, TDD, mock frameworks, etc. – in future blog posts. For now, it’s enough for us to start simply by remembering to design our classes in such a way that their responsibilities are fairly narrow and that their collaborators/dependencies can be injected at runtime and are defined via interfaces instead of concrete implementations. Once we’re doing that as part of our standard practice, our designs will be cleaner and more flexible and our unit testing will be easier and more effective.

For the experienced software developers out there, I have a question: based on your experience, what are the main things that most influence the success of a software project? I should provide some context. I’m not talking about a trivial project or a throw-away prototype, although I’m sure we all recognize that there are some principles and practices that apply to all software projects – to all projects of any type, for that matter. For this discussion, consider projects whose goal is to produce a production-quality, enterprise-scale application. In a previous post – Wise Project Management Always Starts With “Why?” – I said that I believe we can only be effective in executing a project when we first understand why we need to do certain things, and everyone understands the “why”. This helps us keep our focus on value – if we can’t articulate the “why” behind what we’re planning to spend significant time and effort on, that’s a red flag that maybe we shouldn’t be doing it, or should at most spend minimal effort on it.

In this post, I’d like to consider this idea from a slightly more concrete perspective. Consider some of the core beliefs you hold about what can make or break a software project. I hope issues such as “which line the curly braces should be placed on” are not on this list for you. I’m talking about issues that really impact team morale and the ability to deliver on agreed quality and schedule – to keep the project moving down the tracks in a positive direction, with clear intention.

Do you ever have so many ideas running through your head – flip-flopping, churning, bumping into each other – that you feel your head will explode? Do you often get stuck while trying to think through the aspects of a new idea, the steps of a process, or the plan for a new project? Or maybe you’ve been taking notes during meetings and find that, unless you write down every word that’s said, you just cannot recall the main ideas and action items from the meeting.

We’ve been trained to capture the content of our rapidly moving brains in mainly linear ways. Think of how most of us take notes in a meeting. We take a lined note pad, try to write down ideas as they’re stated, and we may try to somehow categorize them after the fact. It would be nice if our brains would follow a single line of thinking to its conclusion before being distracted by a totally unrelated thought. I’d personally love meetings in which all participants could diligently follow the established agenda, never veering off into wild tangents or jumping back and forth between topics. I say that, but I can imagine how boring that would likely be. The fact is that humans are not computers, and our brains will never be that disciplined. Our unruly brains seem to have no shame. They fire thoughts in a seemingly random fashion – one second thinking about the plan for our next project, the next about our upcoming Hawaiian vacation.

In my previous post “Adding the Power of Search to Your Hibernate App – The Easy Way”, I talked a little about when you may want to consider integrating a search capability into your application using Hibernate Search, as well as a bit about Hibernate Search and how it relates to Hibernate Core, Lucene and Solr. In this post, we’re going to take a quick look at a sample application (really, it’s a JUnit test case) that uses Hibernate Core, with Java Persistence API (JPA) annotations, to persist a simple entity to a relational database and Hibernate Search to run some searches against the Lucene indexes created/updated as the Hibernate-managed entities are updated in the database.

I’m currently working on a software project whose data layer is built using Hibernate – an Object-Relational Mapping (ORM) framework that takes a lot of the tedious work out of tying a Java-based application to a relational database. We recently found that the system needed to expose a full-text search mechanism for a subset of the data being stored in its backend Oracle database. Our first instinct was to deploy a standalone instance of the Solr search platform, which is built on top of the very popular and mature Lucene search engine library. We’d then have our core application call out to the Solr engine remotely and have Solr return search results in JSON format. If you’re an experienced Java developer, I have no doubt you’ve at least heard of all these technologies.

I’ve never personally had a need to implement full-text search. However, I was familiar with basic full-text search concepts, and I did know that Solr was one of the more popular search options. I was casually chatting with one of my colleagues – Eddie – about his project, and he mentioned that his team had selected Hibernate Search as their search solution.

As I wander from project to project in my career as a software developer, I notice that, despite the remarkable progress we’ve made in technologies and technical approaches over the past decade, one area still seems to give us fits. We love our technology, we love writing code… but the idea of living by a “process” gives us the “bad sushi” feeling. We’ve lived under the tyranny of “Process Overlords” whose wisdom we clearly did not understand – especially since the overlords’ grand plans seldom seem to have resulted in something we could bring ourselves to call “a success”. So, we’ve been hurt before. I think we could use a hug now.
