Data Architect

Posted: November 08, 2013

About the New York Genome Center

The New York Genome Center (NYGC) is an independent, non-profit organization that leverages the collaborative resources of leading academic medical centers, research universities, and commercial organizations. Our vision is to transform medical research and clinical care in New York and beyond through the creation of one of the largest genomics facilities in North America, integrating sequencing, bioinformatics, and data management, as well as performing cutting-edge genomics research.

Position Description

Put your design and engineering skills in high-performance computing, databases, object management, and data mobility to use in designing and building a multi-petabyte-scale repository for genomic and clinical data. The system will support large-scale, high-performance analysis by hundreds of researchers and enable sophisticated data-sharing and domain-specific search capabilities.

This is an opportunity to learn bioinformatics and participate in cutting-edge research to apply genomic discoveries to the clinical care of patients with diseases such as cancer, ALS, Alzheimer’s, and inflammatory diseases. The ideal candidate will be a pragmatic team player with experience in high-performance computing and management of large-scale databases and data stores.

Multiple positions are available, including Data Architect, SW Engineer Data Repository, Data Security Engineer, and DBA.

Job duties will include:

  • Design database infrastructure to manage petabyte-scale, high-velocity, high-variability genomic data;
  • Collaborate with bioinformatics scientists, users, and stakeholders to define requirements and specifications for accessing and managing scientific data;
  • Prototype, develop, deploy, document, and support production-quality, fault-tolerant software;
  • Evaluate options for data management infrastructure and tools, design comparison tests, prototype as needed, advise on build/buy decisions, and make recommendations for appropriate tools and databases;
  • Design and build data catalog and distributed search tools for complex federated scientific data stores;
  • Work with Research Computing and IT groups to manage storage costs, and propose solutions to storage management needs;
  • Mentor and lead more junior engineers on the data management team.

Candidate Profile

  • BS in computer science, computer engineering, or a related field plus 8 years of related experience; a master’s degree plus 5 years of related experience; or an equivalent combination of education and experience;
  • Demonstrated ability to design and build large-scale software systems, and to produce readable, documented code;
  • Ability to work independently to deliver quality code under tight deadlines;
  • Proficiency in Java, Scala, or similar languages;
  • Experience with large databases: relational (e.g., MySQL, PostgreSQL), column-oriented (e.g., Vertica), array (e.g., SciDB), or NoSQL (e.g., Accumulo, Couchbase). Proven knowledge of multiple kinds of databases is required;
  • Experience comparing and contrasting different databases for different contexts and applications highly preferred;
  • Ability to design high-performance data architectures that integrate directly with massively parallel workflow engines is essential;
  • Experience with ETL tools preferred;
  • Experience with version control and source code management systems (e.g., Git);
  • Experience with agile programming techniques;
  • Experience with scientific applications, and genomics in particular, is not required, but interest in learning about genetics and biomedicine is essential;
  • Excellent communication skills and proven ability to work directly with customers to understand their needs and translate those needs into actionable software requirements;
  • Ability to make decisions with incomplete information and produce results in a timely fashion;
  • Must be able to work in a fast-paced, start-up-like environment.

