Who who? Apache Cassandra, who?

Hmm… yet another distributed database… will it ever end? Probably not. It’s hard to keep up with them all, even the old ones. That brings me to Apache Cassandra. Of all the popular big data distributed databases, Cassandra seems to be that student who always sits in the back row and never says anything… you forget they are there… until someone says their name… Apache Cassandra. I honestly didn’t even know what space Cassandra fit in before trying to install and use it… so this should be fun. What Is Cassandra? Distributed NoSQL.

Read more
Apache Beam for Data Engineers.

What is this thing? What’s it good for? Who’s using it and why? That’s pretty much what I ask myself once a month when I see the name Apache Beam pop up in some feed I’m scrolling through. I figured it has to be legit to be Apache incubated, but I’ve never run across anyone in the wild using it yet. On the surface it appears to be semi-pointless since it runs on top of other distributed systems like Spark, but I’m sure there is more to it. Today, I’m going to run through an overview of Apache Beam and then try installing it and running some data through it, to kick the tires as it were, and see if my mind changes about the pointless bit.

Read more
Streams…. the Apache Kafka one….

Streams, streams, streams… when will it ever end? It’s hard to keep up with all the messaging systems these days. GCP Pub/Sub, AWS SQS, RabbitMQ, blah blah. Of course there is Kafka, hard to miss that name floating around on the interwebs. Since pretty much every system designed these days is a conglomerate of services… it’s probably a good idea to poke at things under the covers. Of course Apache Kafka is probably at the top of the list of those open source streaming services. Today I’m going to attempt to install a Kafka cluster and push some messages around.

Read more