PySpark / Java Developer (Data Engineer) - 100% Remote - W2 Only
- Key Responsibilities
- Design, develop, and maintain scalable ETL pipelines and data processing applications
- Build and optimize data workflows using PySpark, Java, and Hadoop ecosystem tools
- Analyze business and technical requirements to produce detailed implementation designs
- Perform unit testing, integration testing, and debugging of applications
- Troubleshoot and resolve performance issues related to high-volume data processing
- Develop and maintain SQL queries, stored procedures, and database objects
- Work with structured and unstructured datasets for healthcare analytics
- Generate statistical reports and support data validation processes
- Collaborate with cross-functional teams to ensure end-to-end data pipeline efficiency
- Follow software engineering best practices and maintain code quality standards
- Required Skills & Experience
- Strong experience in ETL development, data processing, and database technologies
- 5+ years of experience with Microsoft SQL Server and relational databases
- Expertise in SQL performance tuning, indexing strategies, and query optimization
- 2+ years of experience with Hadoop ecosystem tools (HDFS, Hive, Impala, Spark, Kafka, Oozie, YARN, Sqoop, Hue)
- Hands-on experience with PySpark, Python, and/or Java
- Experience working with large-scale data processing frameworks
- Strong understanding of data transformation and data movement technologies
- Ability to handle high-volume structured and unstructured datasets
- Good understanding of end-to-end application/data pipeline lifecycle