At the heart of Apache Spark is the concept of the Resilient Distributed Dataset (RDD), a programming abstraction that represents an immutable collection of objects that can be split across a ...
If you haven’t heard of Flink until now, get ready for the deluge. As one of a stream of Apache incubator-to-top-level projects turned commercial effort, the data processing engine’s promise is to ...
Data contracts are foundational to properly designed and well behaved data pipelines. Kafka and Flink provide the key ...
Jinsong Yu shares deep architectural insights ...
A monthly overview of things you need to know as an architect or aspiring architect.
The data engineering trends clearly show a move toward maturity. The emphasis is on building reliable, repeatable, and ...
Hortonworks is publishing a series of blog posts on its website that explain the basics and finer details of Apache Hadoop YARN. Those who are curious about YARN or want to understand its significance ...
Most enterprise IT operations rely heavily on batch processing operations. The reliance doesn't go away when you move to a service-oriented architecture (SOA), yet SOA just means online transaction ...
During its 18 months of operation in the early 1860s, the Pony Express was a shining example of the inextricable link between data delivery and data processing. While an innovation for its time, the ...
In batch processing, if costs are not isolated, high-volume customers and products tend to subsidize lower-volume ones. This article reviews different types of batch activities and how they would be ...