
Team Name:

Dizzie

Team Members:

David Lei

Evidence of Work

#nationaljobs career youtheducation youthemployment

Discoverie

Project Info


Team Members

2 members with unpublished profiles.

Project Description

Discover(ie) jobs related to you! We key in on your personal interests (based on your YouTube media consumption) and your personality type, and use machine learning to suggest jobs you might be interested in!


Data Story

Problem

Finding a job is hard; finding a job you enjoy is harder. Most traditional approaches match skill sets to potential careers, but this overlooks the individual's potential enjoyment, satisfaction and interest.

Furthermore, current resources such as the Australian Skills Classification website are large and difficult to navigate. They have a high barrier to entry for the general public (due to the number of clicks required) and for young people (due to fairly complex language).

With the abundance of data out there, we aimed to help narrow those choices down.

Our solution

We have created a platform, Discoverie, which does this by keying into two additional factors:
1. Your personal interests
2. Your personality type

We do this in a novel way. We:
1. Query your YouTube subscriptions, using the channels you regularly watch as a measure of your interests.
2. Scrape Myers-Briggs career suggestions from 4 data sources (Indeed, Glassdoor, Workopolis, NovoResume).
3. Parse the Australian Skills Classification (XLS) data: https://www.nationalskillscommission.gov.au/our-work/australian-skills-classification#resources
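The scraping step can be sketched with Python's standard-library `html.parser`. The four career-guide sites each have their own layout, which isn't described here, so this sketch assumes a hypothetical layout where each personality type appears in an `<h2>` heading followed by `<li>` items naming careers. A real scraper would need to be matched to each site's actual markup.

```python
from html.parser import HTMLParser


class CareerGuideParser(HTMLParser):
    """Collects careers per Myers-Briggs type from a page ASSUMED to use
    <h2>TYPE ...</h2> followed by <li>career</li> items (hypothetical layout)."""

    def __init__(self):
        super().__init__()
        self.careers = {}          # type code -> list of career strings
        self._current_type = None  # type code from the most recent <h2>
        self._in_tag = None        # "h2" or "li" while inside one of those tags
        self._buffer = []          # text collected inside the current tag

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "li"):
            self._in_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._in_tag:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._in_tag:
            return  # ignore closing tags we are not tracking (e.g. </a>, </ul>)
        text = "".join(self._buffer).strip()
        if tag == "h2" and text:
            # Take the first word as the type code, e.g. "INTJ" from "INTJ careers"
            self._current_type = text.split()[0].upper()
            self.careers.setdefault(self._current_type, [])
        elif tag == "li" and self._current_type and text:
            self.careers[self._current_type].append(text)
        self._in_tag = None


def parse_career_guide(html):
    """Return {type code: [careers]} for one page of HTML."""
    parser = CareerGuideParser()
    parser.feed(html)
    parser.close()
    return parser.careers


sample_html = (
    "<h2>INTJ careers</h2><ul><li>Software developer</li><li>Scientist</li></ul>"
    "<h2>ENFP careers</h2><ul><li>Journalist</li></ul>"
)
print(parse_career_guide(sample_html))
```

The sample HTML is made up; list items that appear before any type heading are skipped.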

Note: Interests are represented as Wikipedia entities, e.g. a YouTube channel that discusses physical fitness has the topic https://en.wikipedia.org/wiki/Physical_fitness (I believe this is powered by the Knowledge Graph), which is made available via the YouTube API.
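In the YouTube Data API v3, `subscriptions.list` with `mine=true` returns the logged-in user's subscriptions (each channel ID is under `snippet.resourceId.channelId`), and `channels.list` with `part=topicDetails` returns `topicDetails.topicCategories`, a list of Wikipedia URLs. A minimal sketch of turning those URLs into readable interest labels, assuming the channel resource has already been fetched; the sample dict is hand-written in that shape, not real API output:

```python
from urllib.parse import unquote, urlparse


def interest_labels(channel):
    """Map a channel resource's Wikipedia topic URLs to readable labels."""
    urls = channel.get("topicDetails", {}).get("topicCategories", [])
    labels = []
    for url in urls:
        # "https://en.wikipedia.org/wiki/Physical_fitness" -> "Physical_fitness"
        slug = urlparse(url).path.rsplit("/", 1)[-1]
        # Decode percent-escapes and turn underscores back into spaces
        label = unquote(slug).replace("_", " ")
        if label and label not in labels:
            labels.append(label)
    return labels


# Hand-written example in the shape of a channels resource (not real API output)
sample_channel = {
    "id": "hypothetical-channel-id",
    "topicDetails": {
        "topicCategories": [
            "https://en.wikipedia.org/wiki/Physical_fitness",
            "https://en.wikipedia.org/wiki/Health",
        ]
    },
}

print(interest_labels(sample_channel))
```

Channels without `topicDetails` simply contribute no labels.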

We use these 3 data sources to build a corpus of text documents, which is then used to train a gensim Doc2Vec (machine learning) model.
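The corpus-building and labelling step can be sketched as follows. gensim trains Doc2Vec on `TaggedDocument(words, tags)` objects; so the sketch runs without gensim installed, it defines a stand-in namedtuple with the same two fields. The input shapes and the `SOURCE:key` tag scheme are assumptions for illustration, not the project's actual labels.

```python
import re
from collections import namedtuple

# Stand-in with the same fields as gensim's TaggedDocument(words, tags)
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])


def tokenize(text):
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_corpus(occupations, mbti_careers, interests):
    """Build tagged documents from the three sources.

    occupations:  {occupation title: description text}   (assumed shape)
    mbti_careers: {type code: [career strings]}            (assumed shape)
    interests:    {interest label: text describing it}     (assumed shape)
    The "SOURCE:key" tags are a hypothetical labelling scheme.
    """
    corpus = []
    for title, description in occupations.items():
        corpus.append(TaggedDocument(tokenize(description), [f"ASC:{title}"]))
    for mbti, careers in mbti_careers.items():
        corpus.append(TaggedDocument(tokenize(" ".join(careers)), [f"MBTI:{mbti}"]))
    for label, text in interests.items():
        corpus.append(TaggedDocument(tokenize(text), [f"INTEREST:{label}"]))
    return corpus


corpus = build_corpus(
    {"Fitness Instructor": "Leads exercise classes."},
    {"ENFP": ["Journalist", "Actor"]},
    {"Physical fitness": "Physical fitness"},
)
for doc in corpus:
    print(doc.tags, doc.words)
```

With gensim installed, swap in `from gensim.models.doc2vec import TaggedDocument` and pass the list to `Doc2Vec(corpus, vector_size=..., epochs=...)`; `dm=0` selects the distributed bag of words (PV-DBOW) mode, and the hyperparameters would need tuning.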

The Doc2Vec model learns vector representations of text (a sentence or a paragraph) and builds its own understanding of it (using a neural network and the distributed bag of words approach). This allows us to query for similarity between job descriptions and other data points (e.g. your interests or your personality type).
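A trained gensim model offers `model.infer_vector(tokens)` to embed new text and, in gensim 4.x, `model.dv.most_similar(...)` to rank document vectors. To show what that similarity query computes without depending on a trained model, here is a pure-Python cosine-similarity ranking over made-up document vectors. It stands in for `most_similar`; it is not Doc2Vec itself, and the `ASC:` tag prefix is a hypothetical labelling scheme.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, doc_vectors, topn=3, prefix="ASC:"):
    """Rank document vectors by cosine similarity to the query vector,
    keeping only tags that start with `prefix` (occupations here)."""
    scored = [
        (tag, cosine(query, vec))
        for tag, vec in doc_vectors.items()
        if tag.startswith(prefix)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Made-up 3-dimensional vectors, purely for illustration
doc_vectors = {
    "ASC:Fitness Instructor": [0.9, 0.1, 0.0],
    "ASC:Accountant": [0.0, 0.2, 0.9],
    "ASC:Physiotherapist": [0.7, 0.3, 0.1],
    "INTEREST:Physical fitness": [0.8, 0.2, 0.0],
}
query = doc_vectors["INTEREST:Physical fitness"]
for tag, score in most_similar(query, doc_vectors, topn=2):
    print(tag, round(score, 3))
```

Filtering by tag prefix keeps the query from returning other interest or personality documents when only occupations are wanted.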

To demonstrate this, we created a Flask backend to host our model and a React frontend where users can:
* Enter their personality type
* Select their YouTube interests once logged in

The frontend then asks the backend for similar jobs based on Doc2Vec model similarity and renders the results, narrowing down the occupation choices. The user can then discover more about each occupation on the Australian Skills Classification website.
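The request from the frontend to the backend might be handled along the lines of the sketch below. The payload field names `personality_type` and `interests` are hypothetical. The handler is plain Python so it runs standalone; in Flask it would sit inside a route function that reads the body with `request.get_json()` and returns `jsonify(...)`, and the resulting tokens could be passed to the model's `infer_vector` before ranking.

```python
import re

# The 16 Myers-Briggs codes: one letter from each of E/I, S/N, T/F, J/P
MBTI_PATTERN = re.compile(r"[EI][SN][TF][JP]")


def build_query(payload):
    """Validate a hypothetical request payload and return query tokens.

    Assumed shape:
        {"personality_type": "INTJ", "interests": ["Physical fitness", ...]}
    Raises ValueError if the personality type is not a valid code.
    """
    mbti = str(payload.get("personality_type", "")).strip().upper()
    if not MBTI_PATTERN.fullmatch(mbti):
        raise ValueError(f"not a Myers-Briggs type: {mbti!r}")
    tokens = [mbti.lower()]
    for interest in payload.get("interests", []):
        # Same lowercase alphanumeric tokenization for every interest label
        tokens.extend(re.findall(r"[a-z0-9]+", interest.lower()))
    return tokens


print(build_query({"personality_type": "intj", "interests": ["Physical fitness"]}))
```

Validating the personality type up front lets the backend reject a malformed request before touching the model.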

We also use the ABS API to show job vacancies for each occupation.

This is a novel, low-barrier-to-entry approach that uses machine learning to capture jobs you might be interested in based on your personality and media consumption! It helps narrow down the entry point to the Australian Skills Classification website, and by using the abundance of valuable YouTube data (especially from Millennials and Gen Z) to characterize the user, it makes the website more appealing to the general public.

This seeks to simplify the powerful, but potentially overwhelming, Australian Skills Classification website (https://www.nationalskillscommission.gov.au/our-work/australian-skills-classification) by tailoring results to you, the individual.

Technology used

Backend: Python, Flask, gensim (Doc2Vec model + corpus labelling), YouTube API
Frontend: React JS, Material UI

Team DataSets

australian-skills-classification resources XLS

Description of Use Parsed out Occupation Descriptions-Table 1.csv, Specialist tasks-Table 1.csv and Core_competencies-Table 1.csv to create a training corpus for our Doc2Vec model.
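Reading the exported tables and joining their text per occupation can be sketched with the standard-library `csv` module. The column names used below (`Occupation`, `Description`, `Task`) are placeholders, since the real headers aren't given here and should be checked against the files; the rows are made up.

```python
import csv
import io
from collections import defaultdict


def load_occupation_text(tables):
    """Join text per occupation across several exported ASC tables.

    tables: iterable of (file_obj, title_col, text_col). The column names
    passed in below are PLACEHOLDERS, not the real CSV headers.
    """
    texts = defaultdict(list)
    for file_obj, title_col, text_col in tables:
        for row in csv.DictReader(file_obj):
            title = (row.get(title_col) or "").strip()
            text = (row.get(text_col) or "").strip()
            if title and text:
                texts[title].append(text)
    return {title: " ".join(parts) for title, parts in texts.items()}


# In-memory stand-ins for the exported CSVs (made-up headers and rows)
descriptions = io.StringIO("Occupation,Description\nFitness Instructor,Leads exercise classes.\n")
tasks = io.StringIO("Occupation,Task\nFitness Instructor,Designs training programs\n")
occupations = load_occupation_text([
    (descriptions, "Occupation", "Description"),
    (tasks, "Occupation", "Task"),
])
print(occupations)
```

For the real files, each could be opened with `open(path, newline="", encoding="utf-8")`; the `newline=""` argument is what the `csv` documentation recommends, while the UTF-8 encoding is an assumption about the export.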


ABS Job vacancies API

Description of Use Used to show job vacancies for each occupation.


NovoResume Myers-Briggs career guide

Description of Use Scraped & parsed the website to create a training corpus for our Doc2Vec model.


Indeed Myers-Briggs career guide

Description of Use Scraped & parsed the website to create a training corpus for our Doc2Vec model.


Workopolis Myers-Briggs career guide

Description of Use Scraped & parsed the website to create a training corpus for our Doc2Vec model.


Glassdoor Myers-Briggs career guide

Description of Use Scraped & parsed the website to create a training corpus for our Doc2Vec model.


YouTube Data API

Description of Use Used to query and categorize a user's interests based on their YouTube subscriptions, and to create a training corpus for our Doc2Vec model. These interests are represented as Wikipedia entities, e.g. a YouTube channel that discusses physical fitness has the topic https://en.wikipedia.org/wiki/Physical_fitness (I believe this is powered by the Knowledge Graph). Note: we also used https://pypi.org/project/python-youtube/, but https://developers.google.com/youtube/v3/docs is the underlying resource.
