Search for Knowledge

Scientists just confirmed a key prediction of Einstein's general theory of relativity by observing gravitational waves. Awesome! Perhaps the biggest scientific discovery of the century. A slightly less monumental event, but perhaps one with more impact on the human race, has also just occurred. I think we've hit the sweet spot in the Search for Knowledge.

Obviously Google has become the dominant search engine on the Internet. You can even leverage its expertise in your own enterprise by deploying the Google Search Appliance (we have). But Google isn't always the best or correct answer to your search needs. Thankfully there are still many options: some address privacy better, such as DuckDuckGo (search that doesn't track you), and others are powerful engines you can run yourself, such as Apache Solr (full source code and zero license fees).

Still, all of these options fall short in some way. Google's complex algorithms can never be perfect. There is a "signal to noise ratio" problem that becomes intractable as more and more content makes its way onto the global Internet. While Google no doubt has multiple algorithms (potentially one for each source of data), at the end of the day it is just one organization, and its search platform is monolithic.

For pure research or education, there have always been specific (even curated) databases, such as IMDb for movies, ERIC for education, LexisNexis for legal and corporate information, PubMed for biomedical and life sciences journal literature, and Zillow for real estate. The problem with these information silos is that no matter how good they are, and even if you could access them all individually (which is impossible on a practical level), you can't search them comprehensively. The dollar cost of these independent platforms is often prohibitive even for interested users. (The "free" ones are either subsidized or paid for by advertising.)

A couple of new developments are very exciting and foretell a much brighter future for Search. On the software side, the Wikimedia Foundation (WMF) is working to enhance the search experience for Internet users. At the same time, the cost of computers is shrinking as fast as their size. Companies such as Endless are making computers for as little as $79 that can access this global knowledge (even offline, where the Internet is unreliable). We're focused on the software side in this article, but the hardware side is no less dramatic or impactful.

WMF Discovery will democratize the discovery of media, news and information. It will make the Internet's most relevant information more accessible and openly curated, and it will create an open data engine that's completely free of commercial interests. Today, commercial search engines dominate Internet search, and they use proprietary technologies to consolidate the channels of access to the Internet's knowledge and information. Their algorithms obscure how that information is collected and displayed. WMF Discovery upends this commercial structure by emphasizing six key areas:

  1. Public curation mechanisms for quality
  2. Transparency, telling users exactly how the information originated
  3. Open data access to metadata, giving users the exact date and source of the information
  4. Protected user privacy, with searches shielded by strict privacy controls
  5. No advertising, which assures the free flow of information and a complete separation from commercial interests
  6. Internationalization, which emphasizes community building and the sharing of information instead of a top-down approach

WMF Discovery will be the Internet's first transparent search engine.

And it will be federated instead of monolithic.  

One example of federation in use today is the way OpenStreetMap data is used in Wikivoyage. Another is the freephile wiki, where you can search Wikipedia by using w: as a search prefix.
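The prefix idea can be sketched as a tiny router: if a query starts with a known prefix, it goes to that source's search endpoint; otherwise it stays on the local wiki. This is a minimal, hypothetical Python sketch, not how MediaWiki actually implements interwiki search. The `PREFIXES` table, the `route_query` function, and the `example.org` local URL are illustrative assumptions, and the Wikipedia entry assumes English Wikipedia's standard `index.php?search=` URL form.

```python
from urllib.parse import urlencode

# Hypothetical prefix table: maps a search prefix to a search endpoint.
# Assumption: English Wikipedia accepts index.php?search=<terms>.
PREFIXES = {
    "w": "https://en.wikipedia.org/w/index.php",
}

# Hypothetical endpoint for the local wiki's own search.
LOCAL_SEARCH = "https://example.org/w/index.php"


def route_query(query: str) -> str:
    """Return the search URL for a query, routed by its optional prefix."""
    prefix, sep, rest = query.partition(":")
    key = prefix.strip().lower()
    if sep and key in PREFIXES:
        # Known prefix: strip it and send the remaining terms to that source.
        base, terms = PREFIXES[key], rest.strip()
    else:
        # No (known) prefix: search the whole query locally.
        base, terms = LOCAL_SEARCH, query.strip()
    return f"{base}?{urlencode({'search': terms})}"


print(route_query("w:gravitational waves"))
```

Adding another source is just another entry in the table, which is the appeal of a federated design: each source keeps its own engine, and the front end only decides where a query goes.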

You can read more about WMF Discovery at Wikimedia Discovery (presentation, PDF).