Extracting skills from content to fuel the LinkedIn Skills Graph


Co-authors: Sofus Macskassy, Lu Sun, Di Zhou, Rui Kou and Zhuliu Li

Skills are at the heart of every professional’s qualifications for a role or new opportunity. At LinkedIn, we see a future where the world of work is centered on a skills-first economy. Adopting a skills-first approach will be especially critical as the requirements for roles, businesses, and industries are rapidly changing amid the current generative AI (GAI) boom. 

That’s why at LinkedIn, we want to help our members and customers embrace a skills-first mindset to create a more efficient and equitable world of work. What this looks like practically is an organization putting a candidate’s skills at the center of their hiring decisions and creating learning opportunities for employees to develop skills to advance their careers. This will unlock outsized opportunities for both professional and business growth. While having a skills-first mindset won’t happen overnight, we are focused on building tools to help our customers and members put skills first as they hire, learn, seek and share knowledge on LinkedIn.

At the heart of how we help companies do this is the LinkedIn Skills Graph. It’s our foundational technology that underpins how we help members find new job opportunities, learn new skills, and determine which skills might be helpful when evaluating opportunities, so the skill repository we create must be comprehensive, consistent, and accurate. In previous blog posts, we shared how we built the skills taxonomy behind our Skills Graph from the more than 41,000 skills across our platform. 

In order to make skills-based experiences as robust as possible, it’s important that we map all of the skills on the platform to our Skills Graph. Doing so is straightforward for skills explicitly listed on member profiles or in job descriptions. It can be far more complex for skills embedded in content, such as LinkedIn Learning courses, resumés, and feed posts.

In this blog, we’ll examine how we use AI to extract skills from various content sources across LinkedIn and map these skills to our Skills Graph. This work helps us build a more powerful Skills Graph that provides better matching and relevance across jobs, courses, feed posts, and more for members and customers. It also helps ensure that we’re using the latest skills insights across LinkedIn, which is essential as the world of work and the skills powering it are rapidly changing.

Throughout LinkedIn’s ecosystem, there are places where it may be difficult to extract relevant skills. For example, many LinkedIn members add skills to their profiles, but not always within the dedicated skills section. Sometimes they’ll include skills in their Summary and Experience sections, or in resumes. We don’t want to miss out on these skills, so extracting them from the text is essential to generate dependable skill insights. 

In addition to skills on profiles, job postings on LinkedIn can sometimes lack a comprehensive list of required skills, especially if they are sourced from online job postings outside of LinkedIn. There are also LinkedIn Learning courses where only a subset of skills are tagged directly, with a number of relevant skills being mentioned solely in the description or in the course itself (i.e., in the dialogue that we can get from the transcription).

Extracting these skills from text is a multi-step process. Where and how a skill is mentioned can provide a significant signal on how relevant the skill is and how we should interpret the mention of the skill. Skills can be mentioned explicitly – “expected skills for this job includes programming in Java” – or indirectly – “you are expected to know how to apply different techniques to extract information from data and communicate insights through meaningful visualizations.” 

AI is integral to this process. By utilizing machine learning models, we can extract and map skills from diverse content sources and collect feedback for continuous model improvement and member value. To do this, we first need to segment out large pieces of text (such as job descriptions and resumes) into meaningful parts. We can then remove the mentions of skills from each piece of the text. Once extracted, we normalize them into canonical/single representations (i.e., “data analytics” and “data analysis” are the same type of skill), which we represent in our skills taxonomy. We also need to pay attention to where a skill sits in a piece of content and what type of content it is. Skills are often represented differently in resumes, member profiles, or job descriptions, so we fine-tuned our models to learn the specifics of those types of content.

To address these nuances, we built an architecture and platform that addresses the challenges of extracting skills and mapping them onto the LinkedIn Skills Graph. The following diagram illustrates the AI model workflow that extracts and maps skills from raw text such as job postings.



Source link