Browse 85 exciting jobs hiring in Evaluation now. Check out companies that are hiring, such as Jobgether, Acelero, Inc., and Parallel, in Tampa, Madison, and Little Rock.
Senior Machine Learning Engineer needed to build and deploy scalable, production ML systems that improve healthcare outcomes and operational efficiency.
Lead and shape the Acelero Charitable Foundation as its Founding Executive Director, driving strategy, fundraising, grantmaking, and partnerships to expand high-quality early childhood opportunities across the U.S.
Parallel is hiring a remote School Psychologist to perform psycho-educational evaluations and deliver therapeutic and consultative services to students nationwide while supporting IEP development and multidisciplinary care.
Parallel is hiring remote, licensed School Psychologists in Indiana to deliver psycho-educational evaluations, IEP development, and MTSS-aligned psychological services to support student success.
Provide remote psycho-educational evaluations and school psychology services for students with IEPs as a licensed school psychologist in Ohio with Parallel's Provider Network.
Anduril's Thunderforge team is hiring a Prompt Engineering Intern to develop prompts, agent graph architectures, and test/evaluation tooling for AI-enabled wargaming.
Lead experimentation, trace analysis, and metric design to measure and improve Replit's AI agent, converting agent traces into product-changing insights for engineering and leadership.
YouGov seeks a hands-on Data Scientist/AI Engineer to build and deploy LLM-based applications and advanced analytics for market research using survey, census, and behavioral datasets.
Lead the design and delivery of Zapier’s unified AI platform as a Staff Applied AI Engineer, shaping runtime, orchestration, and evaluation systems that power the company’s AI products.
Jump seeks a US-based QA Engineer to own AI evaluation, labeling campaigns, and QA processes that improve generative AI outputs for its meeting assistant product.
Passion for Life, a nonprofit helping under-resourced youth build career pathways, seeks a part-time Research & Evaluation Intern to support program measurement, data collection, and impact reporting.
Lead Gartner’s CLM research and advisory efforts by producing market-leading insights, advising General Counsel and legal operations leaders, and evaluating CLM technology and vendor strategies.
AECOM is hiring a Business Analyst in Sacramento to define business and technical requirements, design workflows, and lead validation for an enterprise document control system.
Northwestern Medicine is hiring a Support Specialist I to coordinate youth programming, community outreach, and partnership activities across its service area.
Lead product strategy and execution for context, memory, and retrieval systems that power MagicSchool’s AI agents to deliver reliable, educator-focused assistance at scale.
Mursion is hiring a Prompt Engineer to craft production-grade LLM prompts, manage RAG/JSON workflows, and translate learning objectives into reliable AI-driven simulation behavior.
HealthCorps seeks a motivated Regional Program Manager in San Diego to lead school-based wellness initiatives, supervise near-peer mentors, and grow community partnerships to improve teen health outcomes.
Sony AI's Research Ethics team is hiring a remote Engineering Intern (AI Ethics) to help build agentic AI infrastructure, run LLM evaluations, and develop tools for responsible AI in a research-driven environment.
Sankofa Montessori seeks a Georgia-certified School Psychologist for an evaluation-only contract role conducting psychoeducational assessments, producing legally compliant reports, and advising teams on special education eligibility.
Build and productionize multi-step AI agents and the backend infrastructure that powers PermitFlow’s pre-construction platform in a fast-moving, hybrid NYC startup.
Lead development of Arcade’s conversational AI product creation agent as the company’s first dedicated Product Manager for AI, reporting directly to the CEO.
Atrix is seeking a New York–based Forward Deployed Engineer to embed with enterprise pharma customers and deliver accurate, trusted AI workflows that drive medical and commercial decision-making.
America's Promise Alliance seeks a seasoned nonprofit leader to direct collective action, member engagement, program design, and fundraising for its Aligning K12 Education and Youth Development issue area.
The Department of Social Services is hiring a City Research Scientist I (VPS Data Analyst) to manage VPS program data, produce analytic reports, and support evaluation and linkage-to-care efforts for people experiencing homelessness.
Lead the architecture and long-term evolution of Decagon’s agent orchestration engine to enable reliable, high-performance AI agent behavior at scale.
Experienced nonprofit leader needed to oversee Arkansas operations, lead state policy and advocacy, and build cross-sector partnerships to advance youth health equity.
Elsevier is hiring a Senior Data Analyst to lead analytics and evaluation frameworks for generative AI models used in healthcare, ensuring accuracy, safety, and clinical relevance.
GLIDE seeks a Senior Project Manager to lead pilot programs and cross-functional projects that advance its mission to alleviate suffering and break cycles of poverty and marginalization.
Vetcove seeks an AI-focused BAML Engineer to design, implement, and maintain BAML-driven LLM workflows and evaluation tooling for its veterinary software platform.
The City and County of San Francisco seeks a Senior Community Development Specialist I to manage funding, monitor compliance, and evaluate community development projects across city departments.
Help developers adopt Judgment Labs' SDK and evaluation tools by building docs, demos, and sample agent setups as a Developer Relations Engineer in San Francisco.
Be part of a San Francisco-based venture-backed team as a Technical Writer crafting deep technical content on agent evaluation, monitoring, and reward modeling for a technical audience.
Elsevier seeks a Clinical AI Evaluation Specialist (RN, MSN) to lead evaluation cycles for generative AI in nursing education, ensuring data integrity and educational alignment to improve clinical outcomes.
Lead AOEU's new Center for the Advancement of Art Education to drive research, partnerships, and practice that elevate arts education at a national scale.
Freelance evaluators assess luxury retail and online experiences for top brands, completing short missions and submitting feedback via CXG's mobile platform.
Oumi seeks a Research Scientist to advance open-source LLM and VLM research by developing models, datasets, benchmarks, and publishing results with the community.
Moog SDG seeks a Senior Project Engineer in Buffalo, NY to lead technical execution, cross-functional teams, and customer-facing aspects of development programs within the Mission Enabling Services Group.
Lead the strategy and delivery of distributed inference, LLM integrations, and on-device ML features at webAI to enable privacy-first, enterprise-grade AI on the edge.
Experienced ML/AI engineer needed to lead development and productionization of LLM- and embedding-based features for Watershed's enterprise sustainability platform.
WGU seeks a meticulous Transfer Evaluation Assistant to evaluate transcripts, maintain student documentation, and ensure policy compliance in a remote role supporting prospective students.
Achieving the Dream seeks a seasoned research leader to direct applied research, evaluation, and analytics initiatives that drive institutional change and improve student outcomes across its network.
Lead Kiddom’s strategic alliance efforts to identify, evaluate, and operationalize high-impact partnerships that accelerate the company’s growth in K–12 education.
Lead design and implementation of scalable AI infrastructure and developer tooling to accelerate Vanta’s AI-powered product initiatives.
Lead applied AI product work at Vanta by designing, shipping, and scaling LLM-powered features that accelerate customer compliance and trust.
Handshake AI is hiring a contract Red Teaming Domain Expert to craft adversarial prompts and stress-test LLMs for safety and robustness across real-world edge cases.
Figma is hiring a seasoned Technical Program Manager to drive AI platform programs that scale annotation, evaluation, and model delivery across engineering, research, and product teams.
An AI engineering role focused on building and improving voice-first and omnichannel credit-servicing agents using Python and integrated language models at an early-stage fintech startup.
Oxford College at Emory University is hiring a Program Coordinator to plan and manage student engagement and leadership programs, including events, budgets, and cross-departmental collaboration.
WGU is hiring a Senior Compensation Analyst to design and manage global compensation programs and deliver strategic analysis that supports competitive pay and organizational goals.
Join Commure's Ambient Scribe team as a Senior Backend Engineer to build and scale eval and AI infrastructure that powers next-generation clinical AI products.
Below 50k*: 0 | 50k-100k*: 5 | Over 100k*: 16