  • Scalable, Parallel Natural Language Processing (NLP)
  • Public and Private Cloud Computing
  • Query-driven, In-database Machine Learning (ML)

We provide a scalable, in-cloud NLP system. We have applied our technology to applications such as sentiment analysis, news analytics, electronic medical records, enterprise search, and eDiscovery.


We have developed scalable in-database NLP functionality:

  • Named Entity Extraction
  • Part-of-Speech (POS) Tagging
  • Tokenization
  • Phrase Chunking
  • Dictionary Construction
  • Classification
  • Clustering
  • Inverted Index
  • Collocation
  • Frequency and Probability Distributions
  • Stemming
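
To give a feel for a few of the functions listed above, here is a minimal Python sketch of tokenization, an inverted index, and a frequency and probability distribution. It is an illustration only, not our in-database implementation: the function names, the regular-expression tokenizer, and the sample documents are all assumptions made for this example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each token to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def frequency_distribution(docs):
    """Return {token: (count, relative frequency)} over the whole collection."""
    counts = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    total = sum(counts.values())
    return {token: (count, count / total) for token, count in counts.items()}


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "The patient reports chest pain.",
    2: "Chest pain resolved; the patient is stable.",
}

index = build_inverted_index(docs)
freqs = frequency_distribution(docs)
print(index["chest"])   # ids of the documents that mention "chest"
print(freqs["pain"])    # (count, relative frequency) for "pain"
```

In a database deployment the same logic would typically be expressed as user-defined functions and tables (for example, one row per token and document id) rather than in-memory dictionaries.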

Our NLP system can be deployed on PostgreSQL, Greenplum or HadoopDB:

  • PostgreSQL is an enterprise-level relational database with one of the most advanced SQL processing engines in the database industry.
  • Greenplum is a popular parallel database based on PostgreSQL.
  • HadoopDB is a hybrid parallel database that combines the parallelism and fault tolerance of Hadoop with the efficiency and flexibility of a relational database. HadoopDB can execute either efficient parallel SQL or MapReduce computations across a cluster of relational databases.

Parbash ETL:

  • Scalable transformation and cleaning of text data from anywhere using flexible and familiar Unix text processing tools over Hadoop.
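
The idea of cleaning text with Unix-style filters and then aggregating it in a map and reduce step can be sketched as follows. This is a hedged illustration in the style of line-oriented streaming jobs, not the Parbash interface: `clean_line`, `mapper`, and `reducer` are hypothetical names, and the sort between the two steps stands in for the shuffle that a Hadoop cluster would perform.

```python
import re
from itertools import groupby


def clean_line(line):
    """Strip tag-like markup, collapse whitespace, lowercase (roughly sed + tr)."""
    no_tags = re.sub(r"<[^>]+>", " ", line)
    return re.sub(r"\s+", " ", no_tags).strip().lower()


def mapper(lines):
    """Emit one tab-separated 'word<TAB>1' record per word in the cleaned input."""
    for line in lines:
        for word in clean_line(line).split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive records sharing a key (like uniq -c)."""
    parsed = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local simulation: sorted() plays the role of the shuffle/sort phase.
raw = ["<p>Hello  World</p>", "hello again"]
for record in reducer(sorted(mapper(raw))):
    print(record)
```

In a streaming deployment each function would read from standard input and write to standard output, so the same steps could equally be written with tools such as `sed`, `tr`, `sort`, and `uniq`.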