
Learning to do HTTP in Scala. – Part Trois.

Man. Every time I open IntelliJ to write/learn some more Scala I have to take a deep breath. Yes, it’s been fun and good for making my brain feel like it’s in a Doctor Strange movie, but it’s also challenging and frustrating at times. One of the things I find myself doing a lot as a Data Engineer is HTTP stuff, mostly pulling files or data from APIs. Doing this work in Python is enjoyable and easy, so I’ve been curious to see how Scala handles HTTP stuffy stuff.

Turns out, not too much differently. When I first started Googling for HTTP in Scala it became clear there are a few obvious, easy options that most people must use. I also found a few libraries that seem not so easy. Those were the async versions… similar in complexity to aiohttp in Python and working with the concept of futures.

My main concern here was just to learn basic HTTP GET requests, since that’s what we all use 90% of the time. Also, I’m very curious how synchronous and asynchronous calls and libraries compare to Python for performance when pulling data over the wire.

scalaj

import scalaj.http._


object httpScala {
  def main(args: Array[String]): Unit = {
    val request: HttpRequest = Http("http://www.confessionsofadataguy.com/")
    1 to 50 foreach { _ => make_call(request) }
  }

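  // Explicit ": Unit =" replaces the deprecated procedure syntax.
  def make_call(request: HttpRequest): Unit = {
    val response = request.asString // blocking GET; the body is read as a String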
    println("finished call.")
  }
}

Well that was easy. I think I’m going to like scalaj the best. It provides an immutable HttpRequest that can be reused for different calls with the line val response = request.asString, which is super convenient. I ran this code 3 times to compare how it fares against Python. I think this Scala is similar to the Python below.

import requests
from timeit import default_timer as timer


def main():
    t1 = timer()
    url_list = ['https://www.confessionsofadataguy.com'] * 50
    s = requests.Session()
    for url in url_list:
        r = s.get(url)
        print("finished call")
    t2 = timer()
    secs = t2 - t1
    print(secs)


if __name__ == '__main__':
    main()

I’m used to Scala always being orders of magnitude faster than Python. Sure, in this case both Python and Scala are pretty much dependent on the server response time. But I was really trying to understand if Scala does a single HTTP request significantly faster than Python. Both Scala and Python can be made concurrent, so it’s really a question of whether there is any real advantage to switching to Scala for HTTP GETs.
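Since I mentioned that both languages can be made concurrent, here is a rough sketch of what that could look like on the Python side using a thread pool from the standard library. This is not something I benchmarked for this post. The names fetch_all and status_with_requests are made up for illustration, and the timeout and worker count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from timeit import default_timer as timer


def fetch_all(fetch, urls, max_workers=10):
    """Call fetch(url) for every URL on a thread pool; return (results, seconds)."""
    start = timer()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input urls
        results = list(pool.map(fetch, urls))
    return results, timer() - start


def status_with_requests(url):
    """Hypothetical fetch function built on requests; returns the HTTP status code."""
    import requests  # imported lazily so fetch_all itself only needs the stdlib
    # A plain requests.get per call avoids sharing one Session across threads.
    return requests.get(url, timeout=10).status_code
```

Calling fetch_all(status_with_requests, ['https://www.confessionsofadataguy.com'] * 50) would give a number to set beside the sequential loop above. Because the work is mostly waiting on the server, threads are usually enough to overlap the requests in Python.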

requests-scala

This should be interesting. I find it highly entertaining that there is a Scala package made to emulate the popular Python package requests. Who knows, maybe half the people that use it are the same ones that say Python sucks because it’s an interpreted language. The fact that someone cloned the Python requests package for Scala should tell you something.

object httpScala {
  def main(args: Array[String]): Unit = {
    val t0 = System.nanoTime()
    val s = requests.Session()
    1 to 50 foreach { _ => make_call(s) }
    val t1 = System.nanoTime()
    // nanoTime is in nanoseconds; convert to seconds to match the Python timer
    println("Elapsed time: " + (t1 - t0) / 1e9 + " seconds")
  }

  def make_call(s: requests.Session): Unit = {
    val r = s.get("http://www.confessionsofadataguy.com")
    println("Finished call.")
  }
}

Although holy crap this thing is slow. Not sure if I wrote it wrong or if it really is a hunk-a-junka.
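One way to sanity check a surprising number like this is to repeat the whole run a few times and look at the fastest and median times, since the first run often pays extra for DNS lookups and connection setup (plus JVM warm-up on the Scala side). Here is a minimal sketch in Python, to match the earlier script. The helper names time_runs and summarize are my own inventions, not part of any library.

```python
from statistics import median
from timeit import default_timer as timer


def time_runs(task, repeats=3):
    """Run task() `repeats` times and return the elapsed seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = timer()
        task()
        timings.append(timer() - start)
    return timings


def summarize(timings):
    """Return (fastest, median) so one slow outlier does not dominate."""
    return min(timings), median(timings)
```

Passing the body of main() from the Python script as task, and doing the equivalent by hand for each Scala version, would make the comparison a little less dependent on one unlucky run.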

akka-http

Not going to lie, I’m a little worried about writing this one. The suite of akka libraries in Scala seems awesome and powerful, but the writing and understanding of that code usually leaves me frustrated and feeling like an idiot. Still, akka is the most interesting as it claims to be designed for concurrent and distributed systems. And all my smart Scala friends say it’s “the thing.”

yeah.... maybe next time.

My one complaint about akka for a newbie…. yikes… Even “simple” akka http stuffy stuff is a little esoteric for me at this point. I think I will have to play with Scala a little more before I dig into akka.

Overall my experience with Scala HTTP wasn’t too bad. Although I was a little disappointed that the akka approach to making async calls was a little over my head. But I guess that just gives me fodder for another blog post.