Creating a Quality CSE - Tips and Tricks
by Eric Enge
This article will outline some ideas for how you can create a custom search engine that provides better results than Google's. As you may know, Google's Custom Search Engine announcement of October 24, 2006 is creating a bit of a stir. Perhaps it has caused you to think about creating your own custom search engine. But, like many people, you may be wondering what is involved in creating a CSE that provides better search results than those from Google. It's an excellent question. Create a noticeably better search engine, and you have a good chance of getting people to use it. Create something that is not much better, and you are wasting your time, and that of your users.
Here are the major steps:
- Pick a topic area
- Pick a set of search terms (10 to 20) that you will use to test your CSE
- Assemble a list of the best sites in your chosen topic area
- Build a trial CSE using the Google form
- Test your CSE against Google's core search
- Tweak and repeat the test as needed, until done
- Get third party feedback
Let's look at each of these steps in a bit more detail.
1. Pick a Topic Area. The first thing to realize is that you are not going to redo the search engine for the whole web. If you could do that, you could become the next Google. Attractive result, but essentially impossible to achieve. So forget about it. Similarly, the broader the topic area you choose for your CSE, the harder your task becomes. So be prepared to focus on a narrower area in which you already have substantial expertise.
Next, building a better search engine assumes that the original search engine is flawed. So a great way to start is to find a flawed set of search results. The most powerful way to improve the search results is to narrow your search and provide a specific context.
For example, if you search on a popular topic, such as arthritis, Google has no way of knowing whether you are a patient looking for a doctor, a student doing research, a doctor, or a pharmaceutical company employee doing research. By building a vertically oriented CSE, you can solve this problem. You could, for example, build four CSEs, as follows:
- Arthritis Information Search Engine for Doctors
- Arthritis Information Search Engine for Patients
- Arthritis Information Search Engine for Pharmaceutical Company Employees
- Arthritis Information Search Engine for Students
By doing this, you could potentially provide much more focused search results than Google, leading to an improved search engine for your customers. You can easily refine this further than I did in the above example (for example, are the patients looking to buy drugs, find a doctor, or just do research?). There are many ways to build vertical search engines focused on a specific context.
2. Pick a set of search terms (10 to 20) that you will use to test your CSE. You will need to do some research to find the best sites, and you will need to test your CSE when you are finished. Both of these steps require assembling a set of search terms. You want to pick a set of search terms that broadly covers the topic area of your search engine.
Breadth is important here, because you want your search engine to provide better results across a wide array of searches. For example, if you build a CSE focused on arthritis information for doctors, you want it to work for more searches than just "arthritis". If you searched on a term such as "nutritional supplements and arthritis", does your search engine for doctors still provide better results? There are likely to be smaller vertically oriented sites that provide good information on some of these narrower topic areas.
3. Assemble a list of the best sites in your chosen topic area. Next, use your search terms to dig up the best sites. Start with Google to see what search results you get. Pick out the best sites that meet your purpose. If your focus is really research related, screen out those sites that have too much advertising on them, and look for authoritative sources of information.
This "Google filtering" step is critical. If you are working on the arthritis search engine for doctors, you want to eliminate patient-focused pages. And, of course, you want to eliminate any spammy looking sites you may find. Starting with Google is critical, as you don't want your results to be worse than the default results, and by filtering the sites to be included based on your context, you should already be able to offer improved results.
Once you have built up a set of sites using Google, use the other search engines to look for additional sites. Try Yahoo, MSN, and Ask. Look for quality sites that did not come up in the Google results. This should provide some additional enhancements to your CSE.
Remember to use a lot of different search terms in building your CSE. You want to make sure that you provide improved results across the array of terms that your users may search on when they use your CSE.
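If you are gathering sites from several engines by hand, a small script can help keep track of them. The following is only a minimal sketch, assuming you have pasted the result URLs for your test terms into a dictionary keyed by engine name. The URLs and the `merge_candidates` helper are hypothetical, made up for illustration. It groups results by domain and lists the domains that the other engines surfaced but Google did not.

```python
from urllib.parse import urlparse


def domain(url: str) -> str:
    """Return the host of a URL, lowercased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def merge_candidates(results_by_engine: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each domain to the set of engines whose results included it."""
    seen: dict[str, set[str]] = {}
    for engine, urls in results_by_engine.items():
        for url in urls:
            seen.setdefault(domain(url), set()).add(engine)
    return seen


# Hypothetical, hand-collected results for one test search term.
results = {
    "google": [
        "https://www.example-clinic.org/arthritis",
        "https://journal.example.com/ra-study",
    ],
    "yahoo": [
        "https://journal.example.com/ra-study",
        "https://rheum.example.net/guidelines",
    ],
}

candidates = merge_candidates(results)

# Domains found by another engine but not by Google: worth a manual review.
extra = sorted(d for d, engines in candidates.items() if "google" not in engines)
print(extra)
```

The script only organizes the list. Deciding whether each site is authoritative and fits your context is still an editorial call that you have to make yourself.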
4. Build a trial CSE using the Google form. Now that you have a set of sites, build an initial CSE. You will need to make certain basic decisions in the process. You can read my article on the basics of creating Google Custom Search Engines for more information on this process.
Just bear in mind that this trial CSE is being built so you can test it. Don't get too hung up on the decisions at this point. Be prepared to do it at least a few times before you are done.
5. Test your CSE against Google's core search. Now that you have built it, test it. The best way to do this is to set up side-by-side windows, one with your CSE, and one with the standard Google search engine. Try all of your test search terms. Compare the results and make detailed notes of areas where your CSE is better, and of things that are missing from your CSE that you want to include in it.
This step is a lengthy one, but it's the one where the quality gets baked into your CSE. Hand-editing improvements into a search engine is not easy work. And to make it noticeably different (and better) for your users, there need to be lots of improvements in the results. This means that you are going to need to make lots of editorial decisions, and the review process is where you test the decisions you made earlier on, and refine and improve them.
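One way to keep the notes from this comparison organized is sketched below. It assumes you record, by hand, the top result URLs from each window for every test term. The sample data and the `compare_results` helper are hypothetical. For each term it lists what only your CSE returned, what Google returned that your CSE is missing, and what both share.

```python
def compare_results(cse_urls: list[str], google_urls: list[str]) -> dict[str, list[str]]:
    """Split two result lists into CSE-only, Google-only, and shared URLs."""
    cse, google = set(cse_urls), set(google_urls)
    return {
        "only_in_cse": sorted(cse - google),
        "missing_from_cse": sorted(google - cse),
        "shared": sorted(cse & google),
    }


# Hypothetical results recorded by hand for one test search term.
recorded = {
    "nutritional supplements and arthritis": {
        "cse": [
            "https://journal.example.com/supplements-trial",
            "https://rheum.example.net/guidelines",
        ],
        "google": [
            "https://shop.example.com/joint-pills",
            "https://journal.example.com/supplements-trial",
        ],
    },
}

notes = {
    term: compare_results(lists["cse"], lists["google"])
    for term, lists in recorded.items()
}

for term, diff in notes.items():
    print(term)
    for bucket, urls in diff.items():
        print(f"  {bucket}: {urls}")
```

A URL in the "missing_from_cse" bucket is not automatically a gap. In this made-up example it is a shop page that a research-oriented CSE would leave out on purpose, so each entry still needs your judgment.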
6. Tweak and repeat the test as needed, until done. Once you have come up with some improvements as a result of your testing, tweak the list of URLs, and repeat steps 4 and 5 over and over again, until you are done.
7. Get third party feedback. Once you have done the best you can, or become incredibly tired of the process, get third party feedback. There are many ways to do this, but the most basic are to get other people you know to test it before release, or release it and actively solicit end user feedback. You certainly can do both.
Expect this to be an iterative process. Using the idea of a focused context that we outlined above, you can pretty quickly create CSEs that are naturally more focused than Google's core search. However, providing consistent quality and a complete set of results across all aspects of your topic area is going to require ongoing effort. Over time, you will learn more and more about how to improve your CSE.
Advanced Topics
Google Custom Search Engines allow you to provide weighting factors for the sites you pick. These weights are used to provide ranking increases or decreases to your selected sites. As you spend more time tweaking your CSE, you will begin to want to use these to tweak your search results.
Using these weights requires building an XML file, and is beyond the scope of this article. The file is not that complicated, but does require very specific formatting which will feel like programming to non-technical people (but it's not as complicated as programming, really!). We will cover more advanced topics at another time.
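Purely to give a feel for what such a file involves, here is a rough sketch that writes per-site weights out as XML. Treat every detail as an assumption: the element and attribute names (`Annotations`, `Annotation`, `about`, `score`, `Label`), the -1 to 1 score range, and the label value `_cse_example` are placeholders that should be checked against Google's own documentation before use. The `build_annotations` helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_annotations(site_scores: dict[str, float], label: str) -> str:
    """Serialize site patterns and weights as an XML string.

    The element and attribute names below are assumptions for illustration,
    not a confirmed copy of Google's format.
    """
    root = ET.Element("Annotations")
    for pattern, score in site_scores.items():
        # Assumed valid range: -1 (demote) to 1 (promote).
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"score for {pattern} must be between -1 and 1")
        annotation = ET.SubElement(root, "Annotation", about=pattern, score=str(score))
        ET.SubElement(annotation, "Label", name=label)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sites and weights; "_cse_example" is a placeholder label.
xml_text = build_annotations(
    {"journal.example.com/*": 1.0, "shop.example.com/*": -0.5},
    label="_cse_example",
)
print(xml_text)
```

Generating the file from a simple table of sites and weights avoids hand-editing tags, which is where the very specific formatting tends to go wrong.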
About the Author
Eric Enge is a founder of Moving Traffic Incorporated, the publisher of CustomSearchGuide.com, a directory of Google Custom Search Engines, and CityTownInfo, a site that provides information on 20,000 US cities and towns. Eric is also the President of Stone Temple Consulting, an SEO services firm.