DirectlyApply launches near instant XML job feed indexing

By Dylan Buckley

At DirectlyApply we built our feed ingester on day one back in 2018. Its headline responsibility is to take jobs from employers and put them on our platform for job seekers. Sounds simple, right?

Since that first version, our feed ingester has grown into a complex machine. It not only reads and imports large XML feeds from hundreds of sources, containing millions of jobs, but also runs a comprehensive set of feature extractions, anomaly detection and correction, predictive analysis and indexing. In total, every job we import goes through nearly 100 processes before going live on our platform for job seekers to apply to. This rigorous process is a major contributor to DirectlyApply performing up to 10x higher than other job sites when comparing completed application rates. We believe job seekers should have all the data they need at their fingertips, so they can easily make an informed decision about whether to apply.

Most job sites pull feeds a couple of times a day and simply copy and paste the content into their databases. This frustrates employers, who see a delay in getting applicants to their roles, and job seekers, who click on jobs only to find they have expired. Copying and pasting content without enhancing it also leaves a lot to be desired in terms of job seeker experience and engagement.

DirectlyApply has run hourly imports of all feeds for several years. However, running roughly 100 processes on every job meant an import could sometimes take up to an hour, so there was a possible delay of almost 2 hours between an employer updating a job in their feed and the change appearing on our platform for job seekers. This delay is one of the shortest in the industry - but we wanted to be better.

DirectlyApply launches instant XML indexing & improved job processing pipeline

We have rebuilt how we import jobs, making two core updates. The result is near instant import and processing from the moment changes are made in job feeds:

  1. We no longer schedule feeds to be imported on a timer. Instead, every 60 seconds we ping each feed with a low bandwidth HEAD request to get the headers of the file, and use its modified date and file size to determine whether it has been updated.

  2. We run job processes in memory rather than on disk, with intelligent new processes to determine which parts of the job data have changed (to ensure efficient processing) and which jobs are in demand from job seekers (to determine ingestion priority).
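To illustrate the first update, here is a minimal Python sketch of HEAD-based change detection. It is not our production code: the names FeedFingerprint, fetch_fingerprint and feed_changed are hypothetical, and it assumes the feed server returns the standard Last-Modified and Content-Length headers.

```python
import urllib.request
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FeedFingerprint:
    """Header values used to guess whether a feed file has changed (hypothetical type)."""
    last_modified: Optional[str]
    content_length: Optional[str]


def fingerprint_from_headers(headers: Mapping[str, str]) -> FeedFingerprint:
    # Assumption: the server exposes Last-Modified and Content-Length.
    return FeedFingerprint(
        last_modified=headers.get("Last-Modified"),
        content_length=headers.get("Content-Length"),
    )


def fetch_fingerprint(url: str, timeout: float = 10.0) -> FeedFingerprint:
    # A HEAD request returns only the headers, so the feed body is never downloaded.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return fingerprint_from_headers(response.headers)


def feed_changed(previous: Optional[FeedFingerprint], current: FeedFingerprint) -> bool:
    if previous is None:
        return True  # first time we see this feed
    if current.last_modified is None and current.content_length is None:
        return True  # no signal from the server, so re-import to be safe
    return previous != current
```

A scheduler would call fetch_fingerprint for each feed every 60 seconds, keep the last fingerprint in memory, and queue a full import only when feed_changed returns True.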

The result is a maximum delay of 60 seconds from when a job is inserted, updated or deleted in a feed to when our feed ingester starts processing it - and our new in-memory processor can import 100,000 jobs in as little as 4 minutes.

This 30x speed increase means we provide the most up to date inventory to our job seekers, and employers start getting applicants sooner.
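For the second update, the Python sketch below shows one way to detect which parts of a job have changed and to order jobs by demand. It is an illustration under assumptions, not a description of our actual processor: per-field SHA-256 hashes, the demand score (for example, recent job seeker views) and the function names are all hypothetical.

```python
import hashlib
import heapq
from typing import Dict, List, Set, Tuple

Job = Dict[str, str]  # field name -> raw value from the feed


def field_hashes(job: Job) -> Dict[str, str]:
    # Hash each field separately so changes can be detected per field.
    return {
        name: hashlib.sha256(value.encode("utf-8")).hexdigest()
        for name, value in job.items()
    }


def changed_fields(old_hashes: Dict[str, str], job: Job) -> Set[str]:
    # A field counts as changed if it was added, removed, or its hash differs.
    new_hashes = field_hashes(job)
    all_names = old_hashes.keys() | new_hashes.keys()
    return {n for n in all_names if old_hashes.get(n) != new_hashes.get(n)}


def prioritise(jobs: List[Tuple[str, int]]) -> List[str]:
    # jobs holds (job_id, demand_score) pairs; demand_score is an assumed metric.
    # Negating the score turns Python's min-heap into a max-heap.
    heap = [(-demand, job_id) for job_id, demand in jobs]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

With hashes held in memory, only the processes that depend on a changed field need to run again, and the jobs job seekers are looking at most are ingested first.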

Further reading:

We run several processes relating to geography, check out our post on solving complex geography problems in job search here: https://directlyapply.com/blog/solving-complex-geography-problems-in-job-search

Our Summer 2023 release is powered by the rich data extracted and enhanced by the feed ingester. Find out about this release and what's new here: https://directlyapply.com/releases/summer-2023

If you want more information about DirectlyApply or our features, feel free to reach out to Dylan or Will on LinkedIn or email: [email protected]