Description
About the company:
Location: Palo Alto, California
Team Strength: 500+ people
About the Company: Turing is the world’s first AI-powered tech services company. It has reimagined tech services from the ground up with AI by offering AI-vetted and matched talent, AI-accelerated development, and access to AI transformation experts who have built many of the most iconic Silicon Valley companies. Founded in 2018, the company has experienced tremendous growth, with three million global developers on its Talent Cloud and 900+ clients. Turing has received numerous awards, including being named one of America’s Best Startup Employers by Forbes in 2022, being ranked #1 in The Information’s 2021 Annual List of Most Promising B2B Companies, and being included in Fast Company’s “Annual List of the World’s Most Innovative Companies.” The company’s leadership team comprises AI technologists from leading organizations including Meta, Google, Microsoft, Apple, Amazon, Twitter, Stanford, Caltech, and MIT, as well as tech consulting veterans from Accenture, Cognizant, Capgemini, McKinsey, Bain, and more.
About the position:
Designation: Python Data Scientist
Experience required: 3-5 years
Reporting To: Manager
Vacancy: 2
Interview Process: 2 technical interviews (duration: 90 minutes; the interview process will include a coding assessment) + 1 HR round
About the role:
What is the role of an LLM specialist?
As LLM specialists, we help foundational LLM companies improve their Large Language Models.
One way to help these companies improve their models is to provide them with high-quality proprietary data, which they can use to (i) fine-tune their models and/or (ii) benchmark the performance of their own models and competitor models as an evaluation set.
Our job as LLM specialists is to provide them with this high-quality dataset.
We can generate this proprietary data through two approaches: SFT (supervised fine-tuning) or RLHF (reinforcement learning from human feedback). Which of these we work on depends entirely on the customer's needs and preferences. RLHF typically involves interacting with the LLM and giving feedback on, or comparing, its outputs (and providing rewrites when necessary). In SFT, we generate numerous golden prompt-response pairs without any model interaction.
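As a rough illustration of how the two kinds of data differ, the sketch below models one record of each type. The class names, field names, and JSON Lines output are assumptions made for this example only; the actual schema and tooling depend on the customer and project.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class SFTExample:
    """A golden prompt-response pair, written without model interaction.

    Hypothetical schema, for illustration only.
    """
    prompt: str
    response: str


@dataclass
class RLHFComparison:
    """Feedback on two model outputs for the same prompt.

    Hypothetical schema, for illustration only.
    """
    prompt: str
    response_a: str
    response_b: str
    preferred: str                  # "a" or "b"
    rewrite: Optional[str] = None   # improved answer, supplied only when needed

    def __post_init__(self):
        # Reject anything other than the two allowed preference labels.
        if self.preferred not in ("a", "b"):
            raise ValueError("preferred must be 'a' or 'b'")


def to_jsonl(records):
    """Serialize records to JSON Lines, one JSON object per line."""
    return "\n".join(json.dumps(asdict(record)) for record in records)
```

An SFT record carries only the prompt and the ideal response, while an RLHF record captures a judgment about model outputs plus an optional rewrite.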
Summary of the Python Data Scientist role requirements:
We require data analysts and data scientists who have used Python for data analysis in real-world projects. You would need to write natural-language questions about data analysis (or any other area of interest to the customer) and write the corresponding Python code to answer them. You would also be expected to provide feedback on the model's responses to natural-language questions.
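For a sense of what such a question-and-code pair might look like, here is a minimal sketch. The dataset, the question, and the function name are all made up for illustration, and the example uses only the standard library to stay self-contained; a real task would follow the customer's dataset and guidelines.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data, invented for this example.
orders = [
    {"region": "North", "amount": 120.0},
    {"region": "South", "amount": 80.0},
    {"region": "North", "amount": 60.0},
    {"region": "South", "amount": 70.0},
    {"region": "East", "amount": 50.0},
]

# The natural-language question the code is meant to answer.
question = "What is the average order amount per region, highest first?"


def average_amount_by_region(rows):
    """Return (region, average amount) pairs sorted by average, descending."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["region"]].append(row["amount"])
    averages = {region: mean(amounts) for region, amounts in grouped.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


answer = average_amount_by_region(orders)
for region, average in answer:
    print(f"{region}: {average:.2f}")
```

The question and the code together would form one prompt-response pair.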
What does a typical day look like? This varies from project to project and thus customer to customer.
The day starts with a meeting between the project lead and their team. The goal is to define the day's goals and objectives for each individual contributor (IC). Examples include generating 5 Colab notebooks for SFT, completing 10 interactions on the RLHF tool, or reviewing notebooks and RLHF outputs generated by the team.
ICs work on their tasks throughout the day to achieve the defined goals.
ICs and leads hold sync-up calls as needed to unblock ICs, make sure they are on track to achieve their goals, and share feedback on the quality of the work.
At the end of the day, ICs and the project lead sync to share progress updates.
How is this role different from other roles, i.e., what will you not be doing that you typically see in other roles?
As mentioned above, the role requires us to provide customers with a high-quality dataset that they can use either as an evaluation dataset or for fine-tuning their models. Our job is to generate and provide this high-quality data. We won't be doing any model training or fine-tuning as part of this workstream. That is something the customer's internal team handles.
Must Haves for this role:
Bachelor’s/Master’s degree in Engineering, Computer Science (or equivalent experience)
At least 2 years of relevant experience in Data Science or Data Analysis
At least 2 years of experience working with Python programming
This is a Full time role with Turing.
Notice Period: 0-2 weeks
Working hours: Monday through Friday, 8 hours each day. 4 hours of each day will be worked from 8:30 pm IST to 12:30 am IST so that there are 4 hours of overlap with the PST time zone.
