Big Data Glossary by Pete Warden


To help you navigate the large number of new data tools available, this guide describes 60 of the most recent innovations, from NoSQL databases and MapReduce approaches to machine learning and visualization tools. Descriptions are based on first-hand experience with these tools in a production environment.

This handy glossary also includes a chapter of key terms that help define many of these tool categories:

  • NoSQL Databases—Document-oriented databases using a key/value interface rather than SQL
  • MapReduce—Tools that support distributed computing on large datasets (see the sketch after this list)
  • Storage—Technologies for storing data in a distributed way
  • Servers—Ways to rent computing power on remote machines
  • Processing—Tools for extracting useful information from large datasets
  • Natural Language Processing—Methods for extracting information from human-created text
  • Machine Learning—Tools that automatically perform data analyses, based on the results of a one-off analysis
  • Visualization—Applications that present meaningful data graphically
  • Acquisition—Techniques for cleaning up messy public data sources
  • Serialization—Methods to convert data structures or object state into a storable format
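
To make the MapReduce category above more concrete, here is a minimal, single-machine word-count sketch in Python. It only illustrates the map, shuffle, and reduce phases under the assumption that a real framework such as Hadoop would run them in parallel across many machines; the function names (map_phase, shuffle, reduce_phase) are hypothetical and not part of any particular tool's API.

    from collections import defaultdict
    from itertools import chain

    def map_phase(document):
        # Map: emit a (word, 1) pair for every word in one document.
        return [(word.lower(), 1) for word in document.split()]

    def shuffle(pairs):
        # Shuffle: group the emitted counts by word; a real framework does
        # this while moving intermediate data between machines.
        groups = defaultdict(list)
        for word, count in pairs:
            groups[word].append(count)
        return groups

    def reduce_phase(word, counts):
        # Reduce: sum the counts collected for a single word.
        return word, sum(counts)

    if __name__ == "__main__":
        documents = ["big data tools", "data tools for big datasets"]
        mapped = chain.from_iterable(map_phase(doc) for doc in documents)
        totals = dict(reduce_phase(w, c) for w, c in shuffle(mapped).items())
        print(totals)  # {'big': 2, 'data': 2, 'tools': 2, 'for': 1, 'datasets': 1}

In a real deployment the documents would be split across a cluster, each node would run the map phase on its local share of the data, and the framework would route the intermediate pairs so that each reducer sees all the counts for its words.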



Similar data modeling & design books

XML for Data Architects: Designing for Reuse and Integration

XML is a major enabler for platform-agnostic data and metadata exchanges. However, there are no clear processes and methodologies specifically focused on the engineering of XML structures to support reuse and integration simplicity, which are of particular importance in the age of application integration and web services.

Four-Dimensional Model Assimilation of Data: A Strategy for the Earth System Sciences

Panel on Model-Assimilated Data Sets for Atmospheric and Oceanic Research, Commission on Geosciences, Environment and Resources, Division on Earth and Life Studies, National Research Council

This volume explores and evaluates the development, numerous applications, and usefulness of four-dimensional (space and time) model assimilations of data in the atmospheric and oceanographic sciences, and projects their applicability to the earth sciences as a whole. Using the predictive power of geophysical laws incorporated in the general circulation model to produce a background field for comparison with incoming raw observations, the model assimilation process synthesizes diverse, temporarily inconsistent, and spatially incomplete observations from worldwide land, sea, and space data acquisition systems into a coherent representation of an evolving earth system. The book concludes that this subdiscipline is fundamental to the geophysical sciences and presents a basic strategy to extend its application to the earth sciences as a whole.

View Updating and Relational Theory: Solving the View Update Problem

Views are virtual tables. That means they should be updatable, just as "real" or base tables are. In fact, view updatability isn't just desirable, it's crucial, for practical reasons as well as theoretical ones. But view updating has always been a controversial topic. Ever since the relational model first appeared, there has been widespread skepticism as to whether (in general) view updating is even possible.

Python Data Science Handbook: Essential Tools for Working with Data

The Python Data Science Handbook provides a reference to the breadth of computational and statistical methods that are central to data-intensive science, research, and discovery. People with a programming background who want to use Python effectively for data science tasks will learn how to tackle a wide range of problems.


Example text

Processing

Initially best known as a graphics programming language that was accessible to designers, Processing has become a popular general-purpose tool for creating interactive web visualizations. It has accumulated a rich ecosystem of libraries, examples, and documentation, so you may well be able to find an existing template for the kind of information display you need for your data.

Protovis

Protovis is a JavaScript framework packed full of ready-to-use visualization components like bar and line graphs, force-directed layouts of networks, and other common building blocks.

R

The R project is both a specialized language and a toolkit of modules aimed at anyone working with statistics. It covers everything from loading your data to running sophisticated analyses on it and then either exporting or visualizing the results. The interactive shell makes it easy to experiment with your data, since you can try out a lot of different approaches very quickly. The biggest downside from a data processing perspective is that it’s designed to work with datasets that fit within a single machine’s memory.

R makes a great prototyping platform for designing solutions that need to run on massive amounts of data, though, or for making sense of the smaller-scale results of your processing.

Yahoo! Pipes

It’s been several years since Yahoo! released the Pipes environment, but it’s still an unsurpassed tool for building simple data pipelines. It has a graphical interface where you drag and drop components, linking them together into flows of processing operations. Yahoo!’s interesting APIs are exposed as building blocks, as well as components for importing web pages and RSS feeds and outputting the results as dynamic feeds.

