Job Description
Empowering Precision Medicine via Large-Scale Bulk RNA-Seq Foundation Models
The Future Talent Program features Cooperative (Co-op) education that lasts up to 6 months and will include one or more projects. These opportunities in our Research and Development Division can provide you with great development and a chance to see if we are the right company for your long-term goals.
Companion diagnostic (CDx) assays are essential tools in precision medicine, enabling clinicians to identify patients most likely to benefit from specific therapeutic interventions. Large foundation models trained on collections of disease-relevant datasets offer the potential to stratify patients and predict treatment responses across multiple related conditions, such as various cancer types, by leveraging patterns learned from diverse biological contexts rather than being constrained to single-disease paradigms. The extensive availability of transcriptomics datasets through public repositories and biobanks makes RNA-seq data particularly well-suited for training these foundation models.
Transcriptomics-based foundation models can be trained on either single-cell or bulk RNA-seq datasets, each offering distinct advantages. While single-cell models offer high-resolution insights into tissue heterogeneity and rare cell types, bulk transcriptome models excel in sample-level tasks like patient stratification and biomarker identification due to greater coverage and more stable representation of sample biology [1,2]. A recent foundation model trained on ~10,000 bulk transcriptomics samples from The Cancer Genome Atlas (TCGA) database showed promising results in cancer subtyping and patient stratification [3]. Such models could significantly benefit CDx discovery and patient stratification in autoimmune diseases with high heterogeneity in symptoms and prognosis, such as inflammatory bowel disease (IBD) and rheumatoid arthritis (RA).
Unlike cancer research with the well-established TCGA database, autoimmune disease research lacks standardized databases containing sufficient samples (~10,000) for large-scale model development. To address this limitation, our group is systematically cataloging and collecting disease-relevant bulk RNA-seq datasets from public repositories (GEO, ArrayExpress) to create the database for robust autoimmune disease models. Our approach consists of three key phases:
- Data Collection and Curation: Systematically gather and curate autoimmune disease bulk RNA-seq datasets from public repositories.
- Data Processing and Model Development: Process raw FASTQ files using a standardized pipeline to minimize technical variability, followed by foundation model construction.
- Model Validation: Fine-tune models using bulk RNA-seq data from clinical trials post-treatment and assess predictive accuracy for treatment response.
The recruited Co-op student will engage in all project phases, gaining hands-on experience in large-scale omics analysis, AI/ML model development, companion diagnostic discovery, and precision medicine for autoimmune diseases. This role offers a unique opportunity to work at the intersection of computational biology, machine learning, and translational medicine.
Required Education And Skills
- Candidate must be currently enrolled in a graduate program (MSc or PhD) in Biomedical Engineering, Computer Science, Biological Sciences, or a related field. PhD candidates are especially encouraged to apply.
- Candidate must have availability to work full-time on-site for a 6-month period in 2026.
Preferred Experience And Skills
- Candidate should have proficiency in R or Python programming, with a solid foundation in biostatistics and experience analyzing bulk or single-cell RNA-seq datasets.
- Candidate should have experience or strong familiarity with developing and applying machine learning models to biological data.
- Candidate should have a background or keen interest in immunology, with a focus on bioinformatics applications.
- Candidate should have an excellent academic record and strong analytical skills.
- Candidate should have outstanding communication and interpersonal abilities, with a proven capacity to thrive in a collaborative team environment.
Please note that this position may be closed before the posted end date or may remain open longer, at the discretion of the company.
Under New York City, Colorado State, Washington State, and California State law, the Company is required to provide a reasonable estimate of the salary range for this job. Final determinations with respect to salary will take into account a number of factors, which may include, but are not limited to, the primary work location and the chosen candidate’s relevant skills, experience, and education.
Salary Range
The salary range for this role is $39,600.00-$105,500.00 USD
US And Puerto Rico Residents Only
Our company is committed to inclusion, ensuring that candidates can engage in a hiring process that exhibits their true capabilities. Please click here if you need an accommodation during the application or hiring process.
Requirements
As an Equal Employment Opportunity Employer, we provide equal opportunities to all employees and applicants for employment and prohibit discrimination on the basis of race, color, age, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, disability status, or other applicable legally protected characteristics. As a federal contractor, we comply with all affirmative action requirements for protected veterans and individuals with disabilities. For more information about personal rights under the U.S. Equal Opportunity Employment laws, visit:
EEOC Know Your Rights
EEOC GINA Supplement
We are proud to be a company that embraces the value of bringing together talented and committed people with diverse experiences, perspectives, skills, and backgrounds. The fastest way to breakthrough innovation is when people with diverse ideas, broad experiences, backgrounds, and skills come together in an inclusive environment. We encourage our colleagues to respectfully challenge one another’s thinking and approach problems collectively.
Learn more about your rights, including under California, Colorado and other US State Acts
U.S. Hybrid Work Model
Effective September 5, 2023, employees in office-based positions in the U.S. will be working a Hybrid work model consisting of three total days on-site per week, Monday - Thursday, although the specific days may vary by site or organization, with Friday designated as a remote-working day, unless business-critical tasks require an on-site presence. This Hybrid work model does not apply to, and daily in-person attendance is required for, field-based positions; facility-based, manufacturing-based, or research-based positions where the work to be performed is located at a Company site; positions covered by a collective-bargaining agreement (unless the agreement provides for hybrid work); or any other position for which the Company has determined the job requirements cannot be reasonably met working remotely. Please note, this Hybrid work model guidance also does not apply to roles that have been designated as “remote”.
San Francisco Residents Only: We will consider qualified applicants with arrest and conviction records for employment in compliance with the San Francisco Fair Chance Ordinance.
Los Angeles Residents Only: We will consider for employment all qualified applicants, including those with criminal histories, in a manner consistent with the requirements of applicable state and local laws, including the City of Los Angeles’ Fair Chance Initiative for Hiring Ordinance.
Search Firm Representatives Please Read Carefully
Merck & Co., Inc., Rahway, NJ, USA, also known as Merck Sharp & Dohme LLC, Rahway, NJ, USA, does not accept unsolicited assistance from search firms for employment opportunities. All CVs / resumes submitted by search firms to any employee at our company without a valid written search agreement in place for this position will be deemed the sole property of our company. No fee will be paid in the event a candidate is hired by our company as a result of an agency referral where no pre-existing agreement is in place. Where agency agreements are in place, introductions are position specific. Please, no phone calls or emails.
Employee Status
Intern/Co-op (Fixed Term)
Relocation:
No relocation
VISA Sponsorship
No
Travel Requirements
No Travel Required
Flexible Work Arrangements
Not Applicable
Shift
Not Indicated
Valid Driving License
No
Hazardous Material(s)
n/a
Required Skills
Clinical Research, Cloud Data Catalog, Data Analysis, Database Management, Data Science, Data Security, Data Visualization, Data Wrangling, Detail-Oriented, Key Performance Indicators (KPI), Machine Learning, Physics, Project Management, Python (Programming Language), Software Proficiency, Vendor Relationship Management
Preferred Skills
Job Posting End Date:
11/3/2025
- A job posting is effective until 11:59:59 PM on the day BEFORE the listed job posting end date. Please ensure you apply to a job posting no later than the day BEFORE the job posting end date.
Requisition ID: R362970