Data Engineer
Silverchair
Software Engineering, Data Science
United States
USD 115k-135k / year
About Silverchair
Silverchair is the premier independent platform partner for scholarly and professional publishers, dedicated to expanding the reach of the world’s most valuable knowledge. By connecting creators, publishers, and users, we amplify the impact of scholarship and enhance the accessibility of critical information. Our global teams develop, build, and host websites, online products, and digital libraries for prestigious publishers, including the American Medical Association, MIT Press, and Oxford University Press.
DEI Statement
At Silverchair, we celebrate and embrace diversity in all its forms. We are committed to fostering an inclusive environment from the moment you consider joining our team. We actively encourage candidates from diverse backgrounds to apply, believing that a variety of perspectives and experiences enriches our community, drives innovation, and strengthens our impact.
Equity and inclusion are at the core of our hiring practices, and we strive to build a team that reflects a broad spectrum of cultures, experiences, and viewpoints. We are particularly committed to increasing representation from groups historically underrepresented in technology careers. Your unique experiences and perspectives are not just welcomed but are integral to our collective success. Join us in our mission to create a culture that unites and brings out the best in all of us.
Learn more about our commitment to diversity, equity, and inclusion at Silverchair.
We're looking for a Data Engineer to build and maintain the data pipelines that turn scholarly publishing activity into insights for our clients — some of the most recognized names in academic research. Our platform runs on Azure (Synapse, Data Factory) with Confluent-managed Kafka for streaming, and we're actively evaluating a move to a modern lakehouse architecture. You'll bring solid experience in data engineering fundamentals, contribute to the current platform from day one, and learn the systems deeply.
You'll be joining a small, senior analytics team at Silverchair, a company with a long-established presence in scholarly publishing and the agility of a small software organization. The team operates with high autonomy, strong support from leadership, and real ownership of the platform you work on. You'll work alongside a Senior Data Engineer, a Senior Quality Engineer, and a Business Analyst.
Our Tech Stack
- Streaming ingestion: Confluent (Kafka)
- Pipelines and orchestration: Azure Data Factory
- Transformation: Spark / PySpark and SQL stored procedures
- Data warehouse: Azure Synapse Analytics (Dedicated SQL Pool)
- Future direction: Actively evaluating modern lakehouse platforms (Databricks, Microsoft Fabric)
Essential Functions
- Data Pipeline Development: Design, build, and maintain data pipelines that ensure reliable data flow from source systems through transformation layers to reporting. Integrate data quality checks and validation into the pipeline workflow. Implement error handling, logging, and retry capabilities to keep pipelines robust and recoverable.
- Data Transformation & Modeling: Develop SQL and Python-based transformations that cleanse, enrich, and structure data for analytical use. Design and implement dimensional models including fact tables and dimension tables.
- Performance & Optimization: Monitor and tune pipeline and query performance. Use execution plans and profiling tools to identify bottlenecks and improve throughput and efficiency.
- Production Support: Troubleshoot and resolve production data issues using logs, monitoring tools, and systematic debugging. Ensure pipelines run reliably and data is delivered on schedule.
- Collaboration & Documentation: Work closely with your scrum team and cross-functional partners across analytics, product, and engineering. Document pipeline designs, data lineage, and business rules. Participate in code reviews and contribute to team knowledge sharing.
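To give a flavor of the error handling, logging, and retry capabilities described above, here is a minimal Python sketch. The function names and backoff parameters are illustrative, not our actual implementation:

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")

def with_retries(max_attempts=3, base_delay=1.0):
    """Retry a pipeline step with exponential backoff, logging each failure."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    logger.warning("step %s failed (attempt %d/%d): %s",
                                   func.__name__, attempt, max_attempts, exc)
                    if attempt == max_attempts:
                        raise  # exhausted: surface the error to the orchestrator
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator

@with_retries(max_attempts=3, base_delay=0.01)
def load_batch(records):
    """Hypothetical load step; a real version would write to the sink."""
    if not records:
        raise ValueError("empty batch")
    return len(records)
```

In production this pattern sits alongside the orchestrator's own retry policies (e.g., Azure Data Factory activity retries); the in-code layer handles transient failures that are cheaper to absorb close to the work.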
Required Skills
- SQL Proficiency: Strong SQL skills including complex joins, CTEs, window functions, aggregations, views, functions, and stored procedures. Awareness of execution plans and indexing strategies for writing performant queries.
- Python Development: Ability to write clean, modular Python using functions and classes. Experience with data engineering libraries such as PySpark for data transformation and processing.
- Data Modeling & Warehousing: Experience designing dimensional models (star schema, fact/dimension tables). Understanding of data warehouse architecture concepts and layered data organization patterns.
- ETL/ELT Pipeline Development: Hands-on experience building data pipelines with orchestration tools. Familiarity with incremental/delta loading patterns, error handling, and idempotent pipeline design.
- Azure Data Platform (3-5 years): Production experience with Azure Data Factory and Azure Synapse Analytics (Dedicated SQL Pool, Serverless, Spark) is required. Exposure to Microsoft Fabric, Databricks, or dbt is a plus as we evaluate platform evolution.
- Distributed Computing Fundamentals: Understanding of data partitioning, shuffling, and distribution strategies. Experience working with distributed compute frameworks (e.g., Spark) and columnar file formats (e.g., Parquet, Delta).
- Version Control & Development Practices: Proficient with Git for branching, merging, and pull request workflows. Comfortable working in an Agile/Scrum environment with CI/CD practices (we use Azure DevOps).
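As a sketch of the incremental/delta loading and idempotent pipeline design called out above — in plain Python with hypothetical column names; a production version would be a SQL MERGE or a Spark job:

```python
def extract_increment(source, watermark, ts_col="updated_at"):
    """Pull only rows changed since the last high-water mark."""
    return [row for row in source if row[ts_col] > watermark]

def merge_increment(target, increment, key="id"):
    """Idempotently upsert an increment into a target table (list of dicts).

    Re-running the same increment leaves the target unchanged, which is
    what makes a failed-and-restarted load safe.
    """
    by_key = {row[key]: row for row in target}
    for row in increment:
        by_key[row[key]] = row  # insert new keys, overwrite existing ones
    return sorted(by_key.values(), key=lambda r: r[key])
```

The key property is idempotence: applying the same increment twice yields the same target state, so a retry after a partial failure never duplicates rows.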
Desired Experience
- Modern Data Platforms: Hands-on experience with modern lakehouse or unified analytics platforms (e.g., Databricks, Microsoft Fabric, Snowflake). Familiarity with how these platforms organize compute, storage, and metadata is valuable as we evolve our own.
- Streaming & Event Processing: Familiarity with Kafka-based event streaming (we use Confluent). Ability to implement simple event consumers and producers.
- Data Processing Patterns: Experience with Change Data Capture (CDC), incremental ingestion strategies, and preservation of historical data.
- BI & Reporting: Familiarity with BI tools such as Power BI, including an understanding of how dimensional models support semantic models and reporting.
- AI-Assisted Development: Comfortable using AI coding tools as part of your workflow (we use Claude Code). Curiosity about where AI tools can accelerate data engineering work and willingness to share what works.
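To illustrate the history-preservation pattern mentioned above, here is a minimal slowly-changing-dimension (Type 2) sketch in plain Python. The column names (`valid_from`, `valid_to`, `is_current`) are hypothetical conventions, not our schema:

```python
def apply_scd2(dimension, changes, key="id", as_of=None):
    """Close out changed current rows and append new current versions.

    Each row carries valid_from / valid_to / is_current flags so history
    is preserved rather than overwritten (SCD Type 2).
    """
    out = [dict(row) for row in dimension]  # work on copies
    current = {row[key]: row for row in out if row["is_current"]}
    for change in changes:
        old = current.get(change[key])
        if old is not None:
            if all(old.get(k) == v for k, v in change.items() if k != key):
                continue  # no attribute actually changed; keep the old row
            old["is_current"] = False
            old["valid_to"] = as_of
        out.append({**change, "valid_from": as_of, "valid_to": None,
                    "is_current": True})
    return out
```

The same idea underpins CDC-driven loads: each captured change either opens a new version of a dimension member or is recognized as a no-op.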
Qualifications
- 3-5 years of professional experience in data engineering or a closely related role.
- Bachelor's degree in Computer Science, Data Science, Information Systems, or a related field, or equivalent practical experience.
- Microsoft DP-700 (Fabric Data Engineer Associate) or Databricks Data Engineer Associate certification is a nice-to-have.
- Ability to work within Eastern Time Zone hours (8am-5pm ET).
Silverchair is an Equal Opportunity Employer. We do not discriminate against any employee or applicant for employment based on race, color, religion, sex, sexual orientation, gender identity or expression, national origin, age, disability, genetic information, veteran status, or any other characteristic protected by applicable laws. We are dedicated to ensuring a fair and inclusive hiring process for all candidates.
We encourage applications from individuals of all backgrounds and experiences and are committed to providing reasonable accommodation for qualified individuals with disabilities in the application and hiring process.
Disclaimer: At this time, we cannot sponsor a new applicant for employment authorization for this position.
No agencies please.
Salary Range: $115,000-135,000 annually