Title: Clusterpoint  
Author: World Heritage Encyclopedia
Language: English
Subject: NoSQL, Distributed database, XML databases, XML, Inverted index
Publisher: World Heritage Encyclopedia


Developer(s): Clusterpoint Ltd.
Initial release: 2006
Stable release: 2.2 / February 15, 2013
Development status: Active
Written in: C++
Operating system: Cross-platform
Available in: English
Type: NoSQL document-oriented database
License: Proprietary (available for freeware trial license or shareware commercial license)
Website: www.clusterpoint.com

Clusterpoint is commercial enterprise NoSQL software for the design, search and secure operational management of distributed document-oriented (XML/JSON) data stores.[1][2][3] The software makes it possible to create a high-performance computing cluster for database operations in which each cluster node stores part of the database content, while the database as a whole uses the combined local computing and storage resources of all cluster nodes. A cluster database can be replicated into multiple local copies for search performance scalability, or across multiple datacenters for high availability. The software runs on commercial off-the-shelf (COTS) hardware.[4]

Clusterpoint DBMS delivers scalable, high-performance Internet-search-like and SQL-like querying across all database content within a single API.[5] Sub-second response latency for search queries is achieved using a pre-sorted indexing method.[6]

The technology is based on a customizable ranking index that can be tuned to match the natural language terms in queries to the most relevant data content in a customer database. When a distributed cluster database is queried with free-format natural language keywords or phrases, the ranking index sorts the most relevant data up front, thus cutting the amount of data that has to be read for every query in larger deployments.

As a result, fast and relevant full text search[7] is the preferred information access method in Clusterpoint databases, while the capability to flexibly query the database structure is maintained. Both search methods can be combined in a single query for unified data access.

Use cases

The technology addresses the information overflow and latency problems of interactive web and mobile GUI-based database applications, where limited-size screens and bandwidth restrictions prevent users from requesting and processing large query responses.

The scalable ranking index sorts relevant data and returns information page by page in decreasing relevance. This computing model delivers predictable sub-second search latency in very large database systems, including those with billions of data objects, without flooding search results with irrelevant data. Customers can search distributed cluster databases without experiencing the performance degradation characteristic of SQL databases as their data volume grows.

Clusterpoint works as a NoSQL data hub platform that integrates data from different sources and provides web and mobile interfaces with fast, relevance-sorted database search and navigation functionality. Customers can unify hybrid data types from other databases into an open, vendor-neutral, cross-platform XML data format, combining in a single database structured data (date, numeric, character), unstructured data (textual) and semi-structured data such as metadata extracted from blobs, images, voice and video files and stored alongside the original data files. All database content is indexed with a single ranking index for unified access, search and analysis.[8]

Distinctive technology

Programmable ranking index rules are established in the Policy file,[9] an XML configuration file accompanying each Clusterpoint database. By adjusting ranking rules, customers can configure various grouping, ordering and positioning algorithms for their search results through the ranking index, so that it delivers the best end-user search experience. A set of ranking configuration rules, once established for a particular database, is then applied and maintained automatically by Clusterpoint Server whenever customer data is loaded or updated through Clusterpoint database CRUD API commands[10] or when the database is reindexed.[11]

In the Clusterpoint architecture, only the database ranking index must be modified to implement changes in database search behavior. No changes are required to the original customer data objects, which are stored as plaintext XML in a Clusterpoint database. Customer application software design and code can be simplified by configuring indexing and search sorting details in the Document policy. The policy configuration determines the final organization of the ranking index at the physical storage level by presorting the actual index data for custom algorithms. Customers can avoid SQL programming for data sorting and grouping in their code; instead, the database ranking index delivers this functionality.

Presorted indexing is the preferred computing model in the Clusterpoint architecture whenever raw search performance and predictable sub-second latency for database querying services are top customer priorities. Its drawback is that customer application code has less flexible options for algorithmic data sorting.

Commercial deployments

Clusterpoint DBMS is used to build and operate scalable backend application solutions for web GUIs and mobile devices that need to manage increasing volumes of variable data objects in real time using a distributed database architecture. The software is used in Internet search services,[12] government databases, SIEM solutions for machine-generated data management, data acquisition solutions,[13] content aggregator systems, and business directory services and systems.

Clusterpoint DBMS has been deployed in commercial customer accounts to operate public 24h/7d web and mobile application Internet services since 2008.[14]

Platform components

Technically, Clusterpoint DBMS software comprises a scalable XML/JSON database server, distributed cluster data storage and an enterprise search engine engineered into a single platform: Clusterpoint Server. Clusterpoint Server software is installed on each cluster node and is managed across all clusters with the Clusterpoint Manager application, which provides centralized administration and control for all databases through a single web GUI. The Clusterpoint Server software is developed in the C++ programming language and supports multi-threading and multi-core CPUs. The primary method of access to the platform is an XML/JSON-based web API.[15]

A Clusterpoint database has a multi-master, shared-nothing document store architecture and supports fault-tolerant infrastructure with no single point of failure, including multi-datacenter replication for a distributed database. Internally, Clusterpoint data is stored in schema-free XML defined by the customer.

In the Clusterpoint architecture there is a single index per database, providing all search features and customizable relevance sorting for structured (dates, numeric, chars) and unstructured (full-text) data. New content is indexed in real time, and index data can be read and searched immediately after each document has been inserted, updated or deleted.

To query a database customers use a single unified Internet-search-like and SQL-like query API.[16][17]

Database model

Schema-free XML/JSON document database with an arbitrary customer-defined data structure, containing machine-readable and searchable data.

Documents are identified by a document ID, an identification string that is unique across the entire cluster database.[18] A document ID works similarly to an Internet URL and can be any free-format XML tag value or customer-defined string assigned as the unique document identifier (examples: customer email address, Internet domain name, product code, social security number, car registration number, geographic address, checksum of a blob object etc.).

When an SQL database with multiple tables is migrated to the Clusterpoint database model, denormalization must be performed. All externally linked tables have to be embedded into the XML parent-child tag hierarchy used in the Clusterpoint database model. The technical encoding introduced by relational database normalization can then be safely removed. Most primary key indexes, foreign key indexes and software codes for categorized textual values can be replaced with their natural language equivalents. The Clusterpoint database model facilitates the use of natural language text values in data items, so that data can be ranked for meaningful relevance within the surrounding context and users can search it using Internet-search-like free-format keywords or phrases.
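
As a rough illustration of the denormalization step described above, the following Python sketch embeds rows from two relational tables into one XML document per customer. The table layout, the tag names and the `build_customer_document` helper are hypothetical and chosen only for this example; they are not taken from Clusterpoint documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical relational rows: a parent table and a child table
# linked by a foreign key (customer_id).
customers = [
    {"id": 1, "email": "john.smith@example.com", "city_code": "LON"},
]
orders = [
    {"id": 10, "customer_id": 1, "product": "Laptop", "price": 950},
    {"id": 11, "customer_id": 1, "product": "Mouse", "price": 20},
]
# Hypothetical lookup table: coded values become natural language text.
city_names = {"LON": "London"}


def build_customer_document(customer, all_orders):
    """Embed the child rows of one customer into a parent-child XML hierarchy."""
    doc = ET.Element("customer")
    # A natural value (the email address) is used as the document ID
    # instead of the surrogate primary key.
    ET.SubElement(doc, "id").text = customer["email"]
    ET.SubElement(doc, "city").text = city_names[customer["city_code"]]
    orders_el = ET.SubElement(doc, "orders")
    for row in all_orders:
        if row["customer_id"] == customer["id"]:
            order_el = ET.SubElement(orders_el, "order")
            ET.SubElement(order_el, "product").text = row["product"]
            ET.SubElement(order_el, "price").text = str(row["price"])
    return doc


document = build_customer_document(customers[0], orders)
xml_text = ET.tostring(document, encoding="unicode")
print(xml_text)
```

The foreign key column disappears from the output because the relationship is expressed by nesting, and the city code is replaced by searchable text.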

Query syntax

Clusterpoint API query syntax supports natural language keyword and phrase queries, wildcards in search terms, per-character based template queries and structured SQL-like field queries. The following examples illustrate key principles of Clusterpoint database XML-centric query syntax.

Search across all database content for documents matching all keywords:

Example 1:  php developer london 

Search across all database content for the exact phrase (all terms in sequence):

Example 2:  "john smith" 

Free keyword search using wildcards matching "john", "johny", "smith", "smitley" etc.:

Example 3:  joh* smit* 

Phrase search query with wildcards:

Example 4:  "john smit*" 

Search for terms by pattern matching using character positioning templates:

Example 5:  jo?n sm[iy]th 
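
The wildcard and template forms in Examples 3 and 5 resemble shell-style glob patterns. The Python sketch below assumes that `*` matches any run of characters, `?` matches exactly one character and `[iy]` matches one character from the set; this is an assumption about the semantics, and the code uses Python's standard `fnmatch` module rather than Clusterpoint itself. The term list and the `matching_terms` helper are made up for the illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical index terms to test the patterns against.
terms = ["john", "johny", "joan", "jon", "smith", "smyth", "smitley", "smooth"]


def matching_terms(pattern, candidates):
    """Return the candidate terms accepted by a glob-style pattern."""
    return [term for term in candidates if fnmatchcase(term, pattern)]


print(matching_terms("joh*", terms))      # ['john', 'johny']
print(matching_terms("smit*", terms))     # ['smith', 'smitley']
print(matching_terms("jo?n", terms))      # ['john', 'joan']
print(matching_terms("sm[iy]th", terms))  # ['smith', 'smyth']
```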

Search with a combined Internet-search-like and SQL-like query. The following example illustrates combined full-text search, numeric range search and data structure field-search that filters out only subset of data similarly to SQL SELECT ... WHERE ... statements:

Example 6:  php developer 
3500..4500 London 

Multiple combined query rules can be constructed per single database query, using nesting brackets ((())) as Boolean AND, {} as OR, ~ as NOT logic operators. Full text keyword and phrase terms can be used as illustrated in the following example querying for PHP or Java developers, who are not in an expert position and only in 3 particular cities:

Example 7:  {php java} developer ~expert 
{london "new york" beijing} 

To search precise data structure fields in multi-level XML document hierarchy, the XPATH-style nested XML syntax is used to search only in the tag as the child tag of the parent


Example 8:  


Development of Clusterpoint DBMS was begun in August 2006 by Clusterpoint Ltd., a privately held European technology company run by a co-founding team of software engineers led by Gints Ernestsons.[19][20][21][22][23]

The first public Clusterpoint DBMS software release was in January 2008.

The current Clusterpoint Server production version is 2.2.

The next Clusterpoint Server version, 2.3, is under development by the vendor as of June 2013.


General features

  • Data is managed in an open, cross-platform, industry-standard XML format, used internally at the data store level and in the XML API[24]
  • Data structure agnostic and type-rich database: handles variable data structure XML documents in a single database. Supports unstructured textual data, dates, numbers, meta-data (all XML types)
  • Cross-platform support: binaries are available for Linux, FreeBSD and Mac OS X. Clusterpoint Server software can be compiled on other operating systems.
  • Multi-master cluster software architecture: no single point of failure, any cluster node can serve as a master and run the management application
  • Horizontal database scalability: scales out from a single server to hundreds of servers networked into a cluster infrastructure

Access features

  • REST API is used for JSON document format compatibility[25]
  • Consistent UTF-8 encoding. Non-UTF-8 data can be saved, queried, and retrieved with a special binary data type.
  • XML objects for API queries and responses: enable direct integration in other programming languages supporting XML parsing, no specific client software required

Search/query features

  • Keyword search, result snippeting, highlighting, term proximity search
  • Internet-search-like free-format ad hoc queries across all database structure, using natural language keywords and phrases
  • Querying with term stemming, term wildcards and character position patterns delivering self merge-joins[26] for inflected words and plural word forms
  • SQL-like XML-structure (fielded) queries like in SQL SELECT ... WHERE ... statements
  • Cluster-wide analytics aggregation with MIN(), MAX(), COUNT(), AVG() like in SQL SELECT ... GROUP BY ..., ORDER BY ... statements
  • Sorting of results in alphabetic, numeric, date order or according to result relevance
  • Autocomplete (instant search as you type) using the actual index data
  • Spell-check of query terms with alternative spelling suggestions for "Did you mean that?" functionality
  • Boosting of search query terms at query time, in order to increase, decrease or overwrite through the API relevancy weights or sorting rules built into the ranking index
  • Dynamic data classification per query by multi-level customer defined facets with exact hit counting (examples: categories, themes, product catalogs, geographic locations etc.)
  • Text-analytics driven similar content search across the entire
  • XML data structure relevance ranking by tag weighting and document relevance ranking by document rating
  • Textual relevance ranking for matching search query terms to context, taking into account frequency and density of natural language terms
  • Predictive calculation of expected number of results based on the actual index statistics in large size databases to optimize performance

Administration/production use features

  • Granular security partitioning: API users and their access rights are based on groups and permissions assigned per specific databases and API commands
  • Transaction journaling, access logs, error logs and audit logs enabled by default
  • Document versioning enabled by default (preserving previous document versions for a certain time period)
  • Reindexing in background with automatic switchover provides availability during reindexation
  • Online, offline and incremental database backup
  • Automatic or manual synchronization of database replicas
  • Multiple administrator accounts for secure multi-tenancy of different customer databases on the same hardware
  • Centralized web GUI based database administration, including one-click configuration of clustered and replicated databases across all

Automatic full database content indexing

Clusterpoint software automatically builds and maintains a document-type XML database content index when data is loaded, updated or deleted. A single ranking index is maintained to support these types of querying:

  • natural language based full text search, including language-specific stemming and collation rules
  • XML data structure queries (with full-text, exact match and binary match options)
  • virtual data structure search created from multiple real tag values to speed up Boolean OR queries
  • ad hoc search across all database content irrespective of the database structure
  • numeric and date range search
  • geospatial search by range, distance or polygon coordinates and ordering by distance from a certain point
  • multi-level faceted search with automatic results classification by XML tags assigned as containing facets[28]
  • combination of any of the above database search criteria into complex nested multi-part query expressions using Boolean AND, OR, NOT logic
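
To make the combination of keyword search, numeric range search and faceted classification from the list above more concrete, here is a minimal Python sketch. It scans a small in-memory list instead of using an index, and the sample documents, field names and the `search` and `facet_counts` helpers are hypothetical; it shows only what such a query is expected to return, not how Clusterpoint computes it.

```python
from collections import Counter

# Hypothetical documents; "city" and "category" play the role of facet tags.
documents = [
    {"title": "PHP developer", "city": "London", "category": "IT", "salary": 4000},
    {"title": "Java developer", "city": "London", "category": "IT", "salary": 5200},
    {"title": "PHP developer", "city": "Beijing", "category": "IT", "salary": 3600},
    {"title": "Sales manager", "city": "London", "category": "Sales", "salary": 3900},
]


def search(docs, keyword, salary_range):
    """Combine a keyword match on the title with a numeric range filter."""
    low, high = salary_range
    return [
        doc for doc in docs
        if keyword.lower() in doc["title"].lower().split()
        and low <= doc["salary"] <= high
    ]


def facet_counts(hits, facet):
    """Count exact hits per facet value for the current result set."""
    return Counter(doc[facet] for doc in hits)


hits = search(documents, "developer", (3500, 4500))
print(len(hits))                   # 2
print(facet_counts(hits, "city"))  # one hit each for London and Beijing
```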

Ranking index

A scalable ranking index presorts Clusterpoint database content access references for fast database search, including . It sorts data access pointers by customizable weighting attributes that the customer can configure at the database configuration level. The ranking index differs from the traditional SQL-type B-tree or R-tree indexes.

It has an inverted index design, engineered to deliver linear scale-out ability in a rack-and-stack COTS hardware cluster architecture, so that it can support millisecond-latency textual search across many billions of data objects per distributed database.[29]

The ranking index eliminates the repetitive data sorting characteristic of SQL database servers. SQL databases often consume excessive computing resources for data sorting in large databases, in particular when sorting and ordering information from multiple tables with SQL SELECT ... JOIN ... WHERE ... GROUP BY ... ORDER BY statements.

Data grouping, sorting and positioning for relevance

The Clusterpoint database supports grouping and ordering functionality that is similar to SQL's GROUP BY and ORDER BY statements. However, data sorting features are implemented differently.

The sorting rules are "hard-wired" and built into the physical data files of the ranking index. The ranking index organizes database access rules at the physical disk level using sequential I/O access methods. This results in high-performance disk reads during database search and navigation, so that query results can be delivered to customer applications with minimal latency. Clusterpoint Server does not need to sort data at query time: it just follows the ranking index rules and delivers data to users in portions, ordered from most relevant to least relevant.
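
The idea in the paragraph above, sorting once at write time so that a query only reads a slice, can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions, not the Clusterpoint implementation: the `RankingIndexSketch` class, its methods and the single numeric `rank` per document are invented for the illustration, whereas the real ranking rules are described as configurable through the Policy file.

```python
import bisect
from collections import defaultdict


class RankingIndexSketch:
    """Toy inverted index whose posting lists are kept sorted by rank."""

    def __init__(self):
        # term -> list of (negated rank, document ID), best rank first
        self._postings = defaultdict(list)

    def add(self, doc_id, text, rank):
        """Index a document; the sorting cost is paid here, at write time."""
        for term in set(text.lower().split()):
            bisect.insort(self._postings[term], (-rank, doc_id))

    def page(self, term, page_number, page_size=2):
        """Return one page of document IDs without sorting at query time."""
        postings = self._postings.get(term.lower(), [])
        start = page_number * page_size
        return [doc_id for _, doc_id in postings[start:start + page_size]]


index = RankingIndexSketch()
index.add("doc-a", "php developer london", rank=5)
index.add("doc-b", "java developer london", rank=9)
index.add("doc-c", "php developer beijing", rank=7)

print(index.page("developer", 0))  # ['doc-b', 'doc-c']
print(index.page("developer", 1))  # ['doc-a']
```

Because each posting list is already in rank order, returning the first page touches only the first `page_size` entries, regardless of how many documents contain the term.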

Database ranking rules need to be established by the database architect at the database configuration level using the Policy file. The Policy is an XML configuration file containing all database indexing, search grouping and sorting rules, reflecting customer business logic or the actual search needs of the application.[30]

Customers can flexibly override the default ranking index configuration rules from their application code when using the Clusterpoint API, boosting or decreasing the relevance of individual query terms.

Database administration

Clusterpoint Server can be controlled centrally through the Clusterpoint Manager application. Using a web GUI dashboard, administrators control all their database services enterprise-wide, including cluster database administration, configuration of indexing and ranking policy, secure user account management, audit and log file viewing, database backup/restore, and database sharding and replication.

Each database is started and stopped as a separate database server process on each cluster node, for controlled management of CPU resources, RAM and disk storage. All cluster databases share a single networked computing and storage infrastructure and must be managed accordingly.

Clusterpoint Manager is used to manage the underlying hardware resources in order to operate different cluster databases in parallel. For maximum performance, cluster nodes should have free RAM and disk storage capacity and a dedicated network switching fabric among them.

Process and storage architecture

Technically, each named Clusterpoint database is a safely isolated process that runs in its own RAM address space. It can access only its own local file system storage folder of the same name, which contains the XML documents, index, configuration and log files of that particular database stored on that local cluster node (shard). This architecture delivers elastic horizontal scale-out ability and cluster-wide control over resource consumption for a particular customer database. It makes it possible to customize storage allocation for each database cluster-wide by sharding and  so that a database performance and data growth patterns would not negatively affect other databases run on a given

A cluster-wide database is created "virtually" by the Clusterpoint Server software from all local databases of the same name. The locally stored ranking index on each cluster node is engineered with Lego-like modularity and is presented to the Clusterpoint Server software as one large "virtual index". Administrators can start or stop cluster databases with one-click controls, safely enabling or preventing online access to database storage.
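
One way to picture the "virtual index" described above is as a merge of per-node result lists that are each already sorted by relevance. The Python sketch below assumes three shards that return `(score, document ID)` pairs in descending score order; the node names, scores and the `merged_page` helper are hypothetical, and the source does not describe how Clusterpoint Server actually combines shard results.

```python
import heapq
from itertools import islice

# Hypothetical per-node results, each presorted by descending relevance score.
shard_results = {
    "node-1": [(0.95, "doc-17"), (0.60, "doc-03")],
    "node-2": [(0.88, "doc-42"), (0.75, "doc-08"), (0.10, "doc-99")],
    "node-3": [(0.91, "doc-21")],
}


def merged_page(per_shard, page_size):
    """Lazily merge presorted shard lists and keep only the first page."""
    merged = heapq.merge(
        *per_shard.values(),
        key=lambda hit: hit[0],
        reverse=True,  # inputs are sorted from highest to lowest score
    )
    return [doc_id for _, doc_id in islice(merged, page_size)]


print(merged_page(shard_results, 3))  # ['doc-17', 'doc-21', 'doc-42']
```

`heapq.merge` consumes its inputs lazily, so only as many entries as the requested page needs are read from each shard's list.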

Multi-tenancy and virtualization

Multi-tenant database services using Clusterpoint DBMS can securely partition their runtime computing environment among named RAM processes and named file folder storage resources on local nodes, while running multiple databases in parallel on the same equipment. This method makes good use of modern multi-core CPU hardware and is preferred over operating system level virtualization for high-performance multi-tenant database computing with Clusterpoint software. OS-level virtualization may significantly decrease the available network bandwidth among a large number of cluster nodes running a large Clusterpoint database and could result in increased application latencies. Virtualization can still be used for smaller-scale installations, prototyping and development, where operational performance guarantees and low latency are not the first priority.

Multi-copy database replication

Scalability of search and data access performance, as well as fault tolerance, is delivered through multi-copy replication of a cluster database. Clusterpoint Server software can be configured to work with multiple working database copies, each additional replica running on its own hardware cluster and being synchronized using  method. Database replicas can be located in multiple datacenters and managed through the same single management interface. All replicas are equal in the Clusterpoint architecture and are used for automatic load balancing of database search queries through the Clusterpoint API.

In multi-datacenter use, network bandwidth among locations may become a critical issue for the Clusterpoint architecture because of increased latencies for database updates and synchronization delays among replicas, in particular if encrypted VPN networking over Internet links is used. Dedicated bandwidth is the preferred option for high-performance database replication.

Extendable server-side scripting with Lua

Lua extends Clusterpoint Server functionality with custom server-side scripts. Lua scripts can implement customer-specific functions such as data aggregation, ETL tasks, meta-data , call-back to external programming languages using web services for extra functionality, real-time alerting or asynchronous triggers. Scripts can be executed before, during or after Clusterpoint API transactions of interest. Built-in configurable server-side hooks activate Lua scripts at different stages of each Clusterpoint transaction execution process.

Custom Lua scripts can be stored in Clusterpoint Server to work as "stored procedures".[31]

Programming language support

Clusterpoint DBMS uses REST principles and HTTP/HTTPS messaging for client-server communications between customer applications and Clusterpoint Server. Any client programming language or development environment that supports HTTP POST/GET messaging can connect to Clusterpoint Server directly and read, write, update, delete and search XML documents.[32]

An optional REST API interface for the JSON data format transforms customer data between JSON and XML, while only XML is used for internal server-side data storage and processing by Clusterpoint Server.[33]
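
The JSON-to-XML transformation mentioned above can be pictured with a small Python sketch. The mapping rules used here (one XML tag per JSON key, nested objects as nested tags, scalars as text) and the `dict_to_xml` helper are assumptions for illustration only; the source does not specify the conventions the Clusterpoint REST interface actually applies.

```python
import json
import xml.etree.ElementTree as ET


def dict_to_xml(tag, value):
    """Convert a parsed JSON value into an XML element (assumed mapping)."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        # Each JSON key becomes a child tag; keys must be valid XML names.
        for key, child in value.items():
            element.append(dict_to_xml(key, child))
    else:
        # Scalars (strings, numbers) are stored as element text.
        element.text = str(value)
    return element


json_text = '{"title": "PHP developer", "salary": 4000, "location": {"city": "London"}}'
# "document" is a hypothetical root tag name.
document = dict_to_xml("document", json.loads(json_text))
xml_text = ET.tostring(document, encoding="unicode")
print(xml_text)
```

A production mapping would also have to decide how to represent JSON arrays, nulls and type information, which this sketch leaves out.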

Clusterpoint Server has native client API libraries, which use a faster TCP/IP transport protocol, for the following popular programming environments:

  • PHP[34]
  • .NET[35]
  • Python[36]
  • Java[37]

Licensing and support

Commercial. Perpetual, OEM and subscription licenses. A free trial license is available upon request.[38]

The vendor provides standard software maintenance and technical support on a subscription basis, delivered by email, Skype or phone. Premium technical support for customers using the software in 24h/7d production environments includes remote problem diagnostics and resolution based on a service-level agreement.

The vendor optionally provides installation support, customer training in administering and configuring Clusterpoint databases, and customer training in using the database ranking technology.


3rd party tools and applications

  • Clusterpark Log Data Server - organizes enterprise logs in a single instantly searchable database[40]
  • Crosslink Enterprise - integrates multiple applications
  • Crosslink Clusterpoint Adapter - downloadable Clusterpoint connector sample code[42]
  • DigiBrowser - transforms SQL to Clusterpoint NoSQL database without programming[43]
  • Network Traffic Surveillance System - works as a "video-recording system" for lawful intercept and analysis of computer communications[44]

References

1. "". Retrieved June 14, 2013.
2. "Big data startups / document stores". Retrieved June 14, 2013.
3. "The NoSQL movement: document databases". Dataversity. Retrieved June 14, 2013.
4. "Clusterpoint Product / Architecture". Retrieved June 14, 2013.
5. "Clusterpoint Product / Searching". Retrieved June 14, 2013.
6. "Clusterpoint Product / Indexing". Retrieved June 14, 2013.
7. "Fulltext search engines". Retrieved June 14, 2013.
8. "Clusterpoint Solutions / Data-unification". Retrieved June 14, 2013.
9. "Document Policy / Relevance Ranking". Retrieved June 14, 2013.
10. "Clusterpoint Developer's Guide". Retrieved June 14, 2013.
11. "Reindexing Clusterpoint database". Retrieved June 14, 2013.
12. "Clusterpoint Product / Data Collectors". Retrieved June 14, 2013.
13. "Clusterpoint Network Traffic Security System". Retrieved June 14, 2013.
14. "Clusterpoint Solutions". Retrieved June 14, 2013.
15. "Clusterpoint website". Retrieved June 14, 2013.
16. "Clusterpoint Search Query Syntax". Retrieved June 14, 2013.
17. "Clusterpoint Architecture". Retrieved June 14, 2013.
18. "Clusterpoint Document Policy". Retrieved June 14, 2013.
19. "Clusterpoint Team". Retrieved June 14, 2013.
20. "Crunchbase Profile". Retrieved June 14, 2013.
21. "BusinessWeek Company Profile". Businessweek. Retrieved June 14, 2013.
22. "Clusterpoint Raises EUR1 Million From BaltCap". Privateequitywire. Retrieved June 14, 2013.
23. "Clusterpoint Receives €1 Million From BaltCap". Retrieved June 14, 2013.
24. "Documentation - XML API Overview". Retrieved June 14, 2013.
25. "Documentation - REST / JSON API Overview". Retrieved June 14, 2013.
26. "Making you app searchable using self merge-joins". Google. Retrieved June 14, 2013.
27. "Product Features". Retrieved June 14, 2013.
28. "Clusterpoint Data Loading". Retrieved June 14, 2013.
29. "Clusterpoint Ranking Index". Retrieved June 14, 2013.
30. "Result ordering and grouping". Retrieved June 14, 2013.
31. "User Scripting". Retrieved June 14, 2013.
32. "Clusterpoint XML API". Retrieved June 14, 2013.
33. "Clusterpoint REST API". Retrieved June 14, 2013.
34. "PHP API Library". Retrieved June 14, 2013.
35. "NET API Library". Retrieved June 14, 2013.
36. "Python API Library". Retrieved June 14, 2013.
37. "Java API Library". Retrieved June 14, 2013.
38. "Clusterpoint Free Trial License". Retrieved June 14, 2013.
39. "Clusterpoint Customer Support Contacts". Retrieved June 14, 2013.
40. "Clusterpark Log Data Server". Clusterpark Ltd. Retrieved June 17, 2013.
41. "DigiBrowser". Datorikas Instituts DIVI. Retrieved June 14, 2013.
42. "US.LV Network Traffic Surveillance System". US.LV. Retrieved June 14, 2013.

External links

  • Official Clusterpoint Website
  • Clusterpoint Developers WiKi Resource