PySpark / Java Developer (Data Engineer) - 100% Remote - W2 Only

Remote, USA • Full-time • Posted 2026-05-31

Design, develop, and maintain scalable ETL pipelines and data processing applications
Build and optimize data workflows using PySpark, Java, and Hadoop ecosystem tools
Analyze business and technical requirements to produce detailed implementation designs
Perform unit testing, integration testing, and debugging of applications
Troubleshoot and resolve performance issues related to high-volume data processing
Develop and maintain SQL queries, stored procedures, and database objects
Work with structured and unstructured datasets for healthcare analytics
Generate statistical reports and support data validation processes
Collaborate with cross-functional teams to ensure end-to-end data pipeline efficiency
Follow software engineering best practices and maintain code quality standards

Strong experience in ETL development, data processing, and database technologies
5+ years of experience with Microsoft SQL Server and relational databases
Expertise in SQL performance tuning, indexing strategies, and query optimization
2+ years of experience with Hadoop ecosystem tools (HDFS, Hive, Impala, Spark, Kafka, Oozie, YARN, Sqoop, Hue)
Hands-on experience with PySpark, Python, and/or Java
Experience working with large-scale data processing frameworks
Strong understanding of data transformation and data movement technologies
Ability to handle high-volume structured and unstructured datasets
Good understanding of the end-to-end application and data pipeline lifecycle