Big data consists of data sets so large and complex that they become difficult to process with relational databases or traditional data processing applications, and this is especially true of healthcare data. Big data requires massively parallel software running on tens, hundreds, or even thousands of servers, together with different database management systems and desktop statistics and visualization packages. Facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. The challenges and opportunities arise along three dimensions: increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). According to Gartner, big data requires new forms of processing to enable enhanced decision making, insight discovery, and process optimization. First of all, a new philosophy of data storage and management has emerged, namely NoSQL. Compared with relational databases, NoSQL databases are better suited to handling big data. They offer properties such as schema-less storage, full distribution, automatic scaling, and map/reduce, which make them a better fit for big data and cloud computing applications. The data are sharded into small blocks based on shard keys so that queries can be answered faster in a cloud computing environment, that is, in a parallel or distributed manner.
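The sharding idea described above can be illustrated with a minimal sketch. It assumes hash-based sharding, which is only one possible scheme (range-based sharding is another), and the project description does not say which is used. The function names, the shard count, and the `patient_id` and `ward` fields are hypothetical and chosen for illustration only. Python lists stand in for separate servers.

```python
import hashlib
from collections import Counter, defaultdict

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def build_shards(records, key_field, num_shards=NUM_SHARDS):
    """Split schema-less records (plain dicts) into shards by shard key."""
    shards = defaultdict(list)
    for record in records:
        shards[shard_for(record[key_field], num_shards)].append(record)
    return shards


def find_by_key(shards, key_field, key, num_shards=NUM_SHARDS):
    """Answer a point query by scanning only the shard that owns the key."""
    owner = shards.get(shard_for(key, num_shards), [])
    return [r for r in owner if r[key_field] == key]


def count_by(shards, field):
    """Map/reduce-style aggregate over all shards.

    Map: count values of `field` within each shard independently
    (each shard could be processed on a different server).
    Reduce: merge the per-shard counts into one result.
    """
    partial_counts = [
        Counter(r[field] for r in shard if field in r)
        for shard in shards.values()
    ]
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total
```

In this sketch a lookup by shard key touches a single shard instead of the whole data set, while an aggregate such as `count_by` does independent work per shard and then merges the partial results. Records do not need to share the same fields, which mirrors the schema-less property mentioned above.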
|Effective start/end date||8/1/14 → 7/31/15|