Data Reply prides itself on the sophisticated modeling and analysis performed by our Data Scientists and, equally importantly, on our ability to integrate these with our clients' other business systems. This means working alongside our Data Scientists to implement production-grade, hardened applications. It also means understanding and applying best practice when plumbing the various inputs and outputs together at scale, for example:

  • Designing and building ingestion interfaces into tools such as Kafka, Flume, Sqoop, and classic ETL (see the sketch below);
  • Implementing queuing/messaging/distributed storage clusters such as Kafka, RabbitMQ, Cassandra, or Hadoop in a way that meets the completeness and/or uniqueness criteria you have set;
  • Working with multiple applications across cloud and on-premises clusters;
  • Appreciating the challenges common across, and also specific to, batch processing vs streaming, e.g. fault tolerance, exactly-once semantics, or flow control;
  • Building outputs that fit the client's need or use case, for example exposing an API vs dumping batch files to disk.

And all of the above must be built to perform under load and to tolerate faults along the pipeline.
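To give a flavour of the ingestion side, here is a minimal Scala sketch of a Kafka producer, assuming a broker on localhost and a hypothetical "events" topic. The idempotence and acknowledgement settings shown are the standard Kafka producer building blocks for exactly-once delivery within a partition and for tolerating transient broker faults; this is an illustration under those assumptions, not a production recipe.

    import java.util.Properties

    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
    import org.apache.kafka.common.serialization.StringSerializer

    object IngestionSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        // Assumption: a broker reachable on localhost; point this at your cluster.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
        // Idempotent producer: the broker de-duplicates retried sends, which is
        // the per-partition building block for exactly-once semantics.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true")
        // Wait for all in-sync replicas before acknowledging, so an acked record
        // survives the loss of a single broker.
        props.put(ProducerConfig.ACKS_CONFIG, "all")

        val producer = new KafkaProducer[String, String](props)
        try {
          // Keying by an entity id keeps related events on one partition, preserving order.
          val record = new ProducerRecord[String, String]("events", "order-42", """{"status":"shipped"}""")
          producer.send(record).get() // block for the ack; use the callback variant at scale
        } finally {
          producer.close()
        }
      }
    }

In recent Kafka client versions, enabling idempotence also raises the retry count and caps in-flight requests, so transient broker failures are retried without introducing duplicates or reordering.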

You will be working with the latest tools and technologies, but above all else what matters is how you think. We are looking for candidates with an engineering background. We are particularly interested in speaking with you if you have personally implemented a distributed computing infrastructure and the related software stack.

We have openings at different seniority levels. If you meet most but not all of the requirements, your application could still be of interest.

We are particularly interested in applications from Scala developers with solid data experience.
