simMachines: Similarity Search and Pattern Recognition
January 8, 2013 • Arch Grant News
Data is the new oil. To extract value from it, data scientists employ a wide array of tools. A core technique is similarity search, also called nearest neighbor search. It is a kind of database query that finds “fuzzy” matches rather than only exact ones. For example, the app Shazam searches for similar music patterns in order to identify a song. My company, simMachines, developed a state-of-the-art similarity engine that is not only easy to use but also provides the highest performance. It allows the creation of next-generation apps such as Shazam, and it can be used as a pattern recognizer: for example, you can detect whether a tumor is present in an MRI scan.
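To make the idea of a “fuzzy” query concrete, here is a minimal sketch of nearest neighbor search in Python. It is not the simMachines engine: it is a plain brute-force scan that compares the query against every stored item, which is exactly the expensive approach that specialized indexes try to avoid. The catalog, the three-number "fingerprints", and the function names are all made up for illustration, and Euclidean distance is just one possible choice of similarity measure.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query, items, k=1):
    """Return the k (label, distance) pairs closest to the query.

    Brute force: every stored vector is compared with the query,
    so the cost grows linearly with the size of the collection.
    """
    scored = [(label, euclidean(query, vec)) for label, vec in items.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Hypothetical catalog: each song is reduced to a tiny feature vector.
catalog = {
    "song_a": [0.90, 0.10, 0.30],
    "song_b": [0.20, 0.80, 0.50],
    "song_c": [0.85, 0.15, 0.35],
}

# A noisy recording never matches a stored vector exactly,
# so an exact-match lookup would find nothing.
noisy_clip = [0.88, 0.12, 0.33]

matches = nearest_neighbors(noisy_clip, catalog, k=2)
for label, distance in matches:
    print(label, round(distance, 4))
```

With this toy data the closest match is `song_a`, followed by `song_c`, even though the clip is not identical to either one. That is the difference between a similarity query and an ordinary equality lookup.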
I started working on similarity search back in 2004 in Fukuoka Prefecture, Japan. The Japanese government gave me a scholarship to do research at the laboratory of Sensei Takeshi Shinohara, which specializes in similarity search databases. When I arrived at his lab, my Sensei told me that I was expected to choose a research topic for my thesis. He also mentioned that I could choose any subject I wanted! At the time I did not understand why he gave me so much freedom. Later, I understood that it was because similarity search applies to many real-life problems. My Sensei knew that no matter which problem I chose, similarity search could be applied to it.
Even though similarity search is very useful, there have been two major roadblocks to its widespread adoption. First, it is a complex operation that requires a significant amount of computing power. Second, most algorithms come with complex parameters that are difficult for end users to understand. I spent 8 years of my life trying to solve these two problems. After many, many hours of work, I finally arrived at a solution that was not only fast, but also easy to configure. At this point I decided to quit my career in science (I was in Germany at the Max Planck Institute for Molecular Biomedicine) to start simMachines.
We decided to apply to Arch Grants because St. Louis is full of big data. In particular, the biomedical sector is large in the region and our technology is a natural fit. The grant has been instrumental to our success, from providing support that allowed us to create a presence in the United States (we were based in Costa Rica before) to bringing introductions to many important companies from the region. The environment at T-Rex, the start-up office space in Downtown St. Louis, is very vibrant. simMachines is already collaborating with 3 start-ups from the building.
If I could give one tip to fellow entrepreneurs, it is this: follow your passion and be patient. It took 9 years to create simMachines, but I have enjoyed every single bit of it. The end goal is not as important as the process.