Metric Index: An Efficient and Scalable Solution for Similarity Search
Authors | |
---|---|
Year of publication | 2009 |
Type | Article in Proceedings |
Conference | Proceedings of the 2009 Second International Workshop on Similarity Search and Applications |
MU Faculty or unit | |
Citation | |
Web | ACM Portal Link |
Field | Informatics |
Keywords | metric space; similarity search; data structure; approximation; scalability |
Description | Metric space, as a universal and versatile model of similarity, can be applied in various areas of non-text information retrieval. However, a general, efficient and scalable solution for metric data management remains an open research challenge. We introduce a novel indexing and searching mechanism called Metric Index (M-Index), which employs practically all known principles of metric space partitioning, pruning and filtering. The heart of the M-Index is a general mapping mechanism that allows the data to be stored in well-established structures such as the B+-tree, or even in distributed storage. We have implemented the M-Index with a B+-tree and performed experiments on a combination of five MPEG-7 descriptors in a database of hundreds of thousands of digital images. The experiments test several M-Index variants and compare them with two orthogonal approaches: the PM-Tree and iDistance. The results show that the M-Index outperforms both in terms of search-space pruning efficiency, I/O costs, and response times for precise similarity queries. Furthermore, the M-Index demonstrates an excellent ability to keep similar data close together in the index, which makes its approximation algorithm very efficient: it maintains practically constant response times while preserving very high recall as the dataset grows. |
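
The abstract describes a mapping that turns metric objects into keys for an ordinary B+-tree, but it does not spell out that mapping. The sketch below illustrates the general idea with a simplified, one-level, iDistance-style key (nearest-pivot index times a constant, plus the distance to that pivot). It is not the M-Index itself, which combines more partitioning, pruning and filtering principles than are shown here. The class name `SimpleMetricIndex`, the `cell_width` parameter, the choice of pivots and the Euclidean example metric are all illustrative assumptions, and a sorted Python list stands in for the B+-tree.

```python
import bisect
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Example metric (an assumption): Euclidean L2 distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class SimpleMetricIndex:
    """Hypothetical one-level, iDistance-style metric index (not the M-Index).

    Each object o is assigned to its nearest pivot p_i and mapped to the
    one-dimensional key  i * cell_width + d(p_i, o).  A sorted Python list
    stands in for the B+-tree.  Assumes every distance is < cell_width.
    """

    def __init__(self, pivots: List[Point], cell_width: float,
                 dist: Metric = euclidean) -> None:
        self.pivots = pivots
        self.cell_width = cell_width
        self.dist = dist
        self.keys: List[float] = []      # sorted keys (stand-in for the B+-tree)
        self.objects: List[Point] = []   # objects aligned with self.keys

    def key(self, o: Point) -> float:
        """Map an object to its 1-D key via its nearest pivot."""
        dists = [self.dist(p, o) for p in self.pivots]
        i = min(range(len(dists)), key=dists.__getitem__)
        return i * self.cell_width + dists[i]

    def insert(self, o: Point) -> None:
        """Insert o at the position that keeps the keys sorted."""
        k = self.key(o)
        pos = bisect.bisect_right(self.keys, k)
        self.keys.insert(pos, k)
        self.objects.insert(pos, o)

    def range_query(self, q: Point, r: float) -> List[Point]:
        """Return all stored objects o with d(q, o) <= r."""
        dq = [self.dist(p, q) for p in self.pivots]
        nearest = min(dq)
        result: List[Point] = []
        for i, dqi in enumerate(dq):
            # Cell pruning: o lies in cell i only if d(o, p_i) <= d(o, p_j),
            # so by the triangle inequality no answer can be in cell i
            # when d(q, p_i) - d(q, p_j) > 2r for the pivot p_j nearest to q.
            if dqi - nearest > 2 * r:
                continue
            # Key-interval pruning: |d(p_i, o) - d(p_i, q)| <= d(q, o) <= r,
            # and keys of cell i lie in [base, base + cell_width).
            base = i * self.cell_width
            start = bisect.bisect_left(self.keys, base + max(0.0, dqi - r))
            end = min(bisect.bisect_right(self.keys, base + dqi + r),
                      bisect.bisect_left(self.keys, base + self.cell_width))
            # Refinement: compute the real distance for the remaining candidates.
            for o in self.objects[start:end]:
                if self.dist(q, o) <= r:
                    result.append(o)
        return result


if __name__ == "__main__":
    index = SimpleMetricIndex(pivots=[(0.0, 0.0), (10.0, 10.0)], cell_width=100.0)
    for point in [(1.0, 0.0), (0.0, 2.0), (5.0, 5.0), (9.0, 10.0), (10.0, 12.0)]:
        index.insert(point)
    print(index.range_query((0.0, 0.0), 2.5))  # prints [(1.0, 0.0), (0.0, 2.0)]
```

Because similar objects receive nearby keys, a range query becomes a handful of contiguous key-interval scans, which is what lets a standard B+-tree serve as the storage layer. In this sketch, the second pivot's cell is skipped entirely by the cell-pruning rule before any key lookup is made.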