As engineers, when we think about how to solve big data problems, evaluating technologies becomes a choice between scalable and not scalable. Ideally we choose the technologies that can scale to a variety of business problems without hitting a ceiling down the road.
Database technologies have evolved to be able to store big data, but are largely inflexible. The data models require tedious transformations and shuffling around of data. This is a complex process that is compounded in its complexity by combining a variety of inflexible solutions and platforms.
Fast and scalable analysis of big data has become a critical competitive advantage for companies. There are open source tools like Apache Hadoop and Apache Spark that are providing opportunities for companies to solve these big data problems in a scalable way. Platforms like these have become the foundation of the big data analysis movement.
Still, where does all that data come from? Where does it go when the analysis is done?