The Big Data Engineer at Draup is responsible for building scalable techniques and processes for data storage, transformation, and analysis. The role includes designing and implementing optimal, generic, and reusable data platforms and pipelines. You will work with a proficient, inquisitive, enthusiastic, and experienced team of developers, engineers, and product specialists, and you will have the ability to shape the data products.
What You Will Do
- Build scalable architectures for data storage, transformation, and analysis.
- Work on data engineering projects to ensure pipelines are reliable, efficient, testable, and maintainable.
- Design and develop solutions that are scalable, generic, and reusable.
- Build and execute data warehousing, data mining, and data modelling activities using agile development techniques.
- Lead and take ownership of big data projects from scratch to production.
- Bring a problem-solving attitude with a focus on understanding the fundamentals and implementing use cases of organizational impact.
- Collaborate with various teams, including data science, backend, data harvesting, and product.
What You’ll Need
- Proficient understanding of big data and distributed systems principles.
- Must have good programming experience in Python.
- Proficiency in Apache Spark (PySpark) is a must.
- Experience developing scalable ETL and ELT solutions from multiple data sources.
- Understanding of technologies such as relational and NoSQL datastores.
- Working and conceptual knowledge of MapReduce, HDFS, and Amazon S3.
- Ability to code and think in the functional programming paradigm.
- Enthusiasm for optimizing code performance and system resource usage.
- Ability to communicate complex technical concepts to both technical and non-technical audiences.
- Take ownership of all technical aspects of software development for assigned projects.
What Will Give You an Advantage
- Expertise in big data infrastructure, distributed systems, data modelling, query processing, and relational databases.
- Involvement in the design of big data solutions with Spark, HDFS, MapReduce, or Flink.
- Experience with different file-storage formats such as Parquet, ORC, Avro, and SequenceFile.
- Experience with and understanding of cluster managers such as YARN, Spark Standalone, Mesos, or Kubernetes.
- Strong knowledge of data structures and algorithms.
- Understanding of how to apply technologies to solve big data problems and develop innovative solutions.
- A problem-solving mindset with a good grasp of design and architectural patterns is preferred.
- Experience in working with AWS tools like EMR, Lambda, Glue or equivalent tools on other cloud systems.
- Knowledge of workflow orchestration tools like Airflow, Jenkins.
Who You Are
- B.E / B.Tech / M.E / M.Tech / M.S in Computer Science or Software Engineering.
- 2-6 years of experience working with big data technologies.
- Open to embracing the challenge of dealing with terabytes and petabytes of data on a daily basis. If you can think outside the box and have good code discipline, you will fit right in.