Six Sigma – A Short Intro
Data science has touched every aspect of business today, enabling greater speed, accuracy and quality in business decision making. As an inherent part of core business operations, the people function has not been immune to this development.
The best data science practices combine management tools with statistical and machine learning insights. This combination has immensely helped strategic decision making, directly improving the productivity and profitability of a large number of businesses worldwide.
Six Sigma is perhaps the most established and well-documented approach along these very lines. Fundamentally, Six Sigma is a data-driven methodology for improving the quality of a process (i.e. any repetitive business function) by reducing the variation around the mean of the process. In other words, it ensures that the process falls within the acceptable tolerance range as far as possible. This is referred to as the process entitlement level in Six Sigma.
In theory, a perfect sigma score is 6 (i.e. 99.99966% of all data points fall within the tolerance range, or about 3.4 defects per million opportunities); in practice, however, a good sigma score depends on the dynamics of the particular process in question.
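The link between a sigma level and its defect rate can be sketched numerically. The snippet below is a minimal illustration, assuming the conventional 1.5-sigma long-term shift, and uses the normal distribution to convert a sigma level into defects per million opportunities (DPMO):

```python
from scipy.stats import norm

def dpmo(sigma_level, shift=1.5):
    """Defects per million opportunities for a given sigma level,
    applying the conventional 1.5-sigma long-term shift."""
    return norm.sf(sigma_level - shift) * 1_000_000

print(round(dpmo(6), 1))  # about 3.4 DPMO at six sigma
print(round(dpmo(3)))     # about 66,807 DPMO at three sigma
```

At a sigma level of 6, only about 3.4 data points per million fall outside the tolerance range, which is where the famous "3.4 defects per million" figure comes from.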
Six Sigma projects typically follow the well-documented DMAIC framework (Define, Measure, Analyze, Improve & Control):
A. Identifying the business process to be improved and its dependent sub-processes
B. Gathering data relevant to the process/ sub-processes,
C. Forming hypothesis through a host of brainstorming tools,
D. Measuring the sigma level and testing the validity of the formed hypotheses through a host of statistical calculations on the accumulated data
E. Improving the sigma level by implementing the insights gained and the hypotheses validated through the use of statistical tools
F. Long-term process monitoring in order to ensure continuity of improved process
Part A ( Define )
Six Sigma Applied to People Function – The Project
In this particular case study, we applied Six Sigma to solve an important HR business problem, i.e. “improving the efficiency of the HR recruitment function”.
Our client for this project is a prominent international tech headhunting firm. The client estimates that its ROI (return on investment) on job portals and social networking platforms is not up to the best industry benchmarks and can be improved.
This is negatively impacting our client through
- Overall service standards vis-à-vis its competitors
- Inefficient use of scarce funds
- Poor-quality, untimely and insufficient candidate profiles
- Weakened branding and market positioning
Defining the CTQ ( Critical To Quality ) via QFD (Quality Function Deployment)
To get the project started we needed to define the CTQ (critical to quality) aspect of the client's business problem, i.e. “improving the efficiency of the HR recruitment function”.
The tool of preference for high-level pre-analysis in Six Sigma projects is the QFD (quality function deployment), usually employed as the first component of the Measure phase. This tool maps the sub-processes of the CTQ against their functional components (i.e. engineering parameters).
QFD applies an explicit, quantitative correlation method to the functional components required for the sub-processes (of the core process) and then deploys weighting functions to prioritize the parameters of those functional components.
This aids in the selection and customization of functional components in order to improve the quality of the process. Important functional components can be handled as individual six sigma projects.
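The QFD weighting logic can be sketched as a weighted correlation matrix. The snippet below is a minimal, hypothetical illustration; the requirement names, importance weights and 9/3/1 correlation scores are invented for demonstration and are not the project's actual figures:

```python
# Hypothetical QFD matrix: rows are customer requirements (with importance
# weights), columns are functional components; cells use the 9/3/1 scale.
requirements = {            # requirement -> importance weight
    "quality of profiles": 5,
    "turnaround time": 4,
    "discreetness": 3,
}
components = ["portal engagement", "resume indexing", "hiring methodology"]
correlation = {             # requirement -> scores per component (9/3/1)
    "quality of profiles": [9, 3, 3],
    "turnaround time": [3, 9, 1],
    "discreetness": [1, 1, 9],
}

# Priority of each component = sum over requirements of weight * correlation.
priority = {
    comp: sum(requirements[req] * correlation[req][j] for req in requirements)
    for j, comp in enumerate(components)
}
print(priority)  # highest-scoring component is the prioritized sub-project
```

In this toy example, "portal engagement" scores highest, which mirrors how the project's CTQ ended up centering on portal engagement methodologies.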
In the context of our project, the sub-processes discovered were: quality of profiles, turnaround time, overall processing time, effective authentication of candidate profiles, and the ability to maintain discreetness about hires and vacancies from competitors and the market. These were then plotted against quality characteristics such as the methodology deployed for hiring, effective management and indexing of the resume database, and professional engagement methodologies with major job boards.
On the basis of the QFD we identified our CTQ for this six sigma project as the “Optimum valuation of professional engagement methodologies with major job portals and online networking platforms for maximization of ROI”
No reliable data-recording system existed pertaining to the “optimum valuation of professional engagement methodologies with major job portals and online networking platforms for maximization of ROI”.
However, through detailed and innovative analysis, including the extraction of historical email records exchanged over the past three-plus years, it was estimated that our client's ROI for “the professional services of job portals and online networking platforms” was around 230%.
As a point of interest, and based on business intelligence estimates, the market leaders (Korn Ferry, Manpower, etc.) enjoy an explicit ROI of 400% or above for this particular business process.
The Goal of this Project
Based on the business problem, QFD analysis and CTQ identification, the goal for this project was set as “ROI to be increased from 230% to 300% or above (compounded monthly)”.
Three principal components (factors) were identified (based on business process expertise) which directly impact the efficiency of the recruitment function vis-à-vis engagement with job portals. Hypotheses will be tested for:
- The optimal monetary investment in professional association with Job portals & Professional networking portals
- How the resume-collection time of the recruiters should be most efficiently and effectively distributed among the 4 job portals/databases
- Relative strong/ weak areas among the Job Portals, Professional Networking Sites
- Should there be a distinct methodology and approach in dealing/ negotiating with the different Job Portals/ Professional Networking Sites
Summing up this section: In this section, we understood the core business problem of our customer and delved deeper into it via the application of QFD. This helped us identify the key sub-processes involved, the CTQ of the project and, eventually, the project goal.
Our project goal is “ROI to be increased from 230% to 300% or above (compounded monthly)”.
Part B (Measure )
In tune with the project goal, the ROI was defined as “percentage of revenue per week over the expenditure/investment in job portals and social networking sites”. The revenue per week wasn't always correlated with the expenditure/investment in job portals and social networking sites for that particular week, as benefits were often realized much later; however, in order to maintain computational uniformity and practicality, it was assumed to be so.
Data Sampling for Process Capability and Other Statistical Analysis
No formal, reliable records of the data required for our project existed; hence data was meticulously extracted from informal records and from the email records of four recruiters via selective IMAP access.
After preliminary preprocessing and evaluation of the extracted data, it was decided to
1. Use a combination of stratified and random sampling. Data was stratified on a 3-month cycle: 10 strata were created, each containing 3 months of data divided into units of one week [12 weekly units, or 3 months, in one stratum]. Based on a power curve for a one-sample t-test (graph below), a random sample was then drawn in equal proportion from the strata (6 units from each of the 10 strata, each comprising 12 units) for the optimum sample size of 60 units.
2. Use the power curve for a one-sample t-test to calculate the optimum sample size under the given circumstances.
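A sample-size estimate of this kind can be approximated with the standard normal-approximation formula for a one-sample t-test. The effect size of 0.5, alpha of 0.05 and power of 0.8 below are illustrative assumptions, not the project's actual inputs:

```python
import math
from scipy.stats import norm

def sample_size(effect_size, alpha=0.05, power=0.8):
    """Normal-approximation sample size for a two-sided one-sample t-test."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = norm.ppf(power)           # z value for the desired power
    return math.ceil(((z_alpha + z_beta) / effect_size) ** 2)

print(sample_size(0.5))  # medium effect size -> about 32 units
```

Smaller effect sizes require more units, which is why the power curve, rather than a rule of thumb, was used to justify the 60-unit sample.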
Summing up this section: In this section, we looked at the process map, collected and preprocessed data for the analysis, and determined the optimal sample size. Calculating an optimal sample size is important for any statistical analysis to be reliable.
Based on the sample data collected via the above step, a Process Capability Analysis for continuous data was drawn up.
Process Capability Analysis for continuous data ( based on sample data )
Sigma level (adjusted) calculated from Cpk = (3 × Cpk + 1.5) = 2.1
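The adjusted sigma level follows directly from Cpk. A minimal sketch; the mean, standard deviation and spec limits below are hypothetical numbers chosen only to reproduce a 2.1 score:

```python
def cpk(mean, std, lsl, usl):
    """Process capability index: distance to the nearer spec limit
    divided by three standard deviations."""
    return min(usl - mean, mean - lsl) / (3 * std)

def sigma_level(cpk_value, shift=1.5):
    """Adjusted sigma level: 3 * Cpk plus the conventional 1.5 shift."""
    return 3 * cpk_value + shift

# Hypothetical figures: weekly ROI mean 230%, std 50, spec limits 200-320.
c = cpk(mean=230, std=50, lsl=200, usl=320)
print(sigma_level(c))  # -> 2.1
```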
A run chart for the continuous sample data was drawn (in date order) in order to get a better visualization of the trends over time (clustering, oscillation).
Observations and points to note here are:
Our focus for this Six Sigma improvement project is for the process to hover between the target and the USL, improving the process capability score
- Process capability has historically been very low in the fast-moving and intensely competitive tech recruitment industry. Based on business intelligence estimates, the market leaders (Korn Ferry, Manpower, etc.) operate at about a 3.2 sigma level (for this particular CTQ)
- The target level of 320 (percent) has been kept ambitious on purpose so that improvement areas can be vigorously identified
- The run chart indicated a cyclic trend, with an interesting spike in performance between the 25th and the 35th week. Data will be intensively mined in this period so as to discover the root cause of the spike in this period. The other main parameters of the run chart look to be in control
- The control chart indicated that the process is already under overall control, though the variation can still be improved upon.
Summing up this section: We calculated the sigma score of our process ( as 2.1 ). We were also able to identify trends in the process through SPC ( statistical process control ) charts and note the spikes. This information is a starting point for further analysis.
Part C (Analyze )
Based on the observations of the Process Capability Analysis, the core competencies of the Job DataSources/social networking services were statistically analyzed using an individual value plot and analysis of variance (ANOVA).
Records were extracted for the total number of resumes procured per week (excluding duplicates) from the four Job DataSources/social networking services [A, B, C, D] for all 60 weeks (units) by all four recruiters. Please note that this data only gave information about the total number of matching resumes extracted, irrespective of skill set and irrespective of the conversion ratio.
Analysis of this data gives us a broad picture of the overall resource strength of all four data sources under evaluation. This analysis can help us prioritize/rank our resources at a preliminary level.
Data was collected for the 60 weeks, matching the number and type of resources received from each specific source. Numeric identities [1, 2, 3, 4] were given to the Job DataSources/technical networking data sources, and numeric identities [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] were given to the specific technical requirements.
The individual value plot and the one-way (unstacked) ANOVA indicate that DataSource 1 was the marginal leader as far as the total number of resources is concerned; however, the variation here was unacceptably high.
DataSource 3 was a very close second with much lower variation, and should thereby be the preferred source as per preliminary visual inspection. DataSource 2's performance was slightly below the two.
DataSource 4's performance on all parameters was unsatisfactory (as a medium for sourcing quality resumes). A policy decision will be taken during the Improve phase based on these analyses.
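The one-way ANOVA comparison of the four sources can be sketched with SciPy. The weekly resume counts below are invented for illustration only and merely mimic the pattern described above:

```python
from scipy.stats import f_oneway

# Hypothetical weekly resume counts for the four data sources.
source_1 = [52, 60, 48, 70, 44, 66]   # highest mean, high variation
source_2 = [48, 50, 46, 52, 47, 49]   # slightly below the top two
source_3 = [55, 57, 54, 58, 56, 55]   # close second, low variation
source_4 = [20, 25, 18, 22, 24, 19]   # clearly the weakest

f_stat, p_value = f_oneway(source_1, source_2, source_3, source_4)
print(p_value < 0.05)  # True: at least one source mean differs significantly
```

A significant p-value only tells us the sources differ somewhere; the individual value plot is what localizes which source leads and how much it varies.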
Of stronger relevance to our requirements would be information on which source among [DataSource 1, DataSource 2, DataSource 3, DataSource 4] should be the preferred source for each specific tech requirement.
Summing up this section: We evaluated all four data sources in terms of the total number of matching profiles (for all skill sets) obtained over 60 weeks. This information will be a critical component, as it will help in prioritizing engagement and investment with our data sources.
Multi-vari charts enable us to know the competencies of the Job DataSources/technical networking sites with regard to specific technical requirements. An interesting observation is that while DataSource 4's overall performance is the lowest, it still has core competencies in “software testing and QA”; likewise, DataSource 2 has core competencies in “storage tools”. The graphical analysis also indicates inadequate competencies in some areas; for instance, the operating system/mainframe requirement has low resource output from all the channels, indicating that our client may need to depend upon direct headhunting or other channels to source this need.
Summing up this section: We evaluated all four data sources in terms of the total number of matching profiles (for individual skill sets) obtained over 60 weeks. This information was a critical component, as it helped tailor our approach vis-à-vis the specific technical requirement to be fulfilled by our client.
There are two principal models for resume collection from the Job DataSources/social networking sites
- Direct database access to the data sources
- Advertising in the respective data sources and receiving resumes in response
When a client raises a requisition for a particular requirement, both the time and the budget available to come up with the required resources are limited. A statistical comparison of these two approaches would help in prioritizing our approach here.
Records were extracted with respect to the total number of resumes received by each principal method, for each of the 60 sample weeks.
A box plot was created to get a strong overview of the data strength and distribution for both the principal methods.
The direct database access method was clearly the more productive method, with all parameters [mean, median, quartiles] higher compared to the advertising method. Based on this analysis, a policy decision was subsequently taken during the Improve phase.
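The box-plot comparison reduces to quartile summaries of the two methods. A minimal NumPy sketch; the weekly counts for the two collection models are hypothetical:

```python
import numpy as np

# Hypothetical weekly resume counts for the two collection models.
direct_db = [40, 44, 38, 50, 46, 42, 48, 45]
advertising = [22, 30, 25, 28, 24, 27, 26, 23]

# The quartiles are the same summary statistics a box plot draws.
for name, data in [("direct db", direct_db), ("advertising", advertising)]:
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    print(name, q1, median, q3)

# Direct database access dominates on every quartile, as in the box plot.
assert np.median(direct_db) > np.median(advertising)
```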
Summing up this section: We scrutinized all four data sources and the mechanisms of resource collection via statistical tools like ANOVA, individual value plots, multi-vari charts and box plots. This analysis helped us gain insights into:
- Overall strength of the four data sources and also their specialist skill strengths
- The efficiency of direct database access vis-à-vis advertising
Policy decisions regarding the effective use of these findings were taken during the Improve phase based on these analyses.
Part D – Controlled Design Of Experiment (Advanced Analyze )
During the QFD and the brainstorm sessions, it was suspected that one of the factors significantly affecting our goal of “improving the percentage of revenue per week over the expenditure/ investment in job portals and social networking sites “, was the effective and efficient utilization of recruiters engagement time in sourcing profiles from these four DataSources.
It so happens that on a few working days the recruiters do not participate in active headhunting calls and correspondence. On those working days, the task of the recruiter is to open-surf the DataSources, collect profiles vis-à-vis anticipated future requirements, and tag and index them in the database.
As per standard practice, the recruiter is given the freedom to choose which DataSources he/she wishes to surf, collecting and tagging resumes in the database. The recruiter surfs these sites randomly, or divides the time equally among the four DataSources; it all depends upon the individual recruiter's choice.
In order to mathematically evaluate the optimum utilization of time to be invested in surfing these 4 data sources, a Controlled Design of Experiment was conducted.
What is a DOE (Design of Experiments)?
In a design of experiments, the values of x are experimentally controlled, unlike in observational studies, where they are merely observed. The purpose of DOE is to understand the y = f(x) relationship to the maximum extent possible, and thereby tune the process to the best performance possible.
The key areas for understanding in Design Of Experiments are
- A. The x’s that have the maximum effect on Y ( X in our case is recruiter time invested in the respective DataSources and Y is the Revenue )
- B. The exact ( or closest ) mathematical relationship between significant x’s and Y
- C. Statistically confirming that an improvement has been made or difference exists with respect to different values of X
- D. Discovering where to set the values/levels of the significant x's so as to have the maximum positive effect on y
The methodology of DOE for this project: Since interaction effects were not considered a factor, and in order to minimize the time and cost of the experiment by reducing the number of runs, the Plackett-Burman DOE methodology was used.
Defining the architecture of the design: The four DataSources 1, 2, 3 and 4 were the factors, and the levels were +1 and -1. There were eighteen experimental runs in all.
The task in relation to the architecture of the design: The experiment was conducted over a period of 18 consecutive days by one recruiter.
The task assigned to the recruiter was to surf for 2 hours (120 minutes) in all among the 4 data sources and to collect, tag and index up to 24 resumes (2 resumes for each of 12 different skill sets, randomly ordered)
+1, +1, +1, +1 in a particular run would translate into the recruiter investing 30 minutes each in the four DataSources.
+1, +1, -1, +1 in a particular run would translate into the recruiter investing 40 minutes each in the three DataSources with a +1 sign, leaving the DataSource with the -1 sign out.
+1, +1, -1, -1 in a particular run would translate into the recruiter investing 60 minutes each in the two DataSources with a +1 sign, leaving the two DataSources with the -1 sign out.
+1, -1, -1, -1 in a particular run would translate into the recruiter investing all 120 minutes in the DataSource with the +1 sign, leaving the three DataSources with the -1 sign out.
and so on for a total of 18 runs
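The translation of a design run into a time allocation, as described above, can be sketched directly. The 120-minute budget comes from the text; the generic source names are placeholders:

```python
def allocate_minutes(run, total_minutes=120):
    """Split the surfing budget equally among the data sources marked +1."""
    active = [i for i, sign in enumerate(run) if sign == +1]
    return {f"DataSource {i + 1}": total_minutes / len(active) for i in active}

print(allocate_minutes((+1, +1, +1, +1)))  # 30 minutes each
print(allocate_minutes((+1, +1, -1, -1)))  # 60 minutes for sources 1 and 2
print(allocate_minutes((+1, -1, -1, -1)))  # all 120 minutes on source 1
```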
The experiment was conducted, and the total number of resumes collected, tagged and indexed was duly recorded. The results of the experiment were analyzed, with the following observations.
Analysis and Observations of the experiment
- In tune with common understanding, DataSource 1 emerged as the strongest positive factor and DataSource 4 as the strongest negative factor in the experiment, as evident from the regression, ANOVA, main effects plot and standardized effects plot
- DataSources 2 and 3 did not emerge as statistically significant factors (as evident from the p-values of the ANOVA and regression). However, they still have some business significance for specialist mandates
- The factor effects of DataSource 2 and DataSource 3 were found to be close to each other. However, a casual visual analysis revealed that DataSource 3 performed better in terms of quality of resumes vis-à-vis DataSource 2, while DataSource 2 performed better in terms of database size. A mathematical evaluation of this is beyond the scope of this Six Sigma project and will be looked into in subsequent projects.
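The main effect of each factor in such a two-level design is simply the mean response at +1 minus the mean response at -1. A minimal sketch with an invented design fragment and response values (not the project's actual 18-run data):

```python
# Hypothetical fragment of a two-level design: rows are runs, columns are
# the four DataSource factors; y is the number of resumes per run.
design = [
    (+1, +1, +1, +1), (+1, -1, +1, -1), (-1, +1, -1, +1),
    (-1, -1, -1, -1), (+1, +1, -1, -1), (-1, -1, +1, +1),
]
y = [24, 20, 14, 8, 22, 12]

def main_effect(factor):
    """Mean response at +1 minus mean response at -1 for one factor."""
    hi = [resp for run, resp in zip(design, y) if run[factor] == +1]
    lo = [resp for run, resp in zip(design, y) if run[factor] == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

effects = [main_effect(f) for f in range(4)]
print(effects)  # the largest magnitude flags the most influential source
```

In this toy data, factor 1 has the largest effect, mirroring how DataSource 1 emerged as the strongest factor in the actual experiment.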
Summing up this section: A controlled design of experiments gave us deeper insights than are available through casual observational studies alone. Through the DOE we were able to confirm the difference in productivity vis-à-vis the time invested in the four data sources. This will help us optimize recruiter time investment among the four data sources in the Improve phase.
Part E – Improve
Based on the overall conclusions of the Define, Measure and Analyze phases, the following strategic steps were initiated and implemented.
- The professional services of DataSource 4 were not renewed.
- The budget allocation was reformulated. 60% of the budget was allocated to data source 1, whereas DataSource 2 and 3 were allocated 20% each
- The Multi-var chart would be a guiding tool for a sequential use of the respective job data sources based on the specific requirement given by the client
- 75% of funding for resource collection with respect to a specific project would be used for “direct database access” and 20% funds for “advertisements”. 5% would be for reserve
- “software testing and QA”, “storage tool”, “mail servers” and “EJB design patterns” would be marketed as core competency skills of the consultancy
- A policy was created for those business days when the recruiter surfs the data sources, collects resumes vis-à-vis anticipated future requirements, and tags and indexes them in our database: 60% of the time would be invested in DataSource 1, with the remaining 40% divided equally between DataSources 2 and 3
- Quarterly reviews, with fresh data accumulation and a review of the graphical and analytical tools used in this project; if there is any change in the status quo, results are to be analyzed and the policies suitably updated.
A 30-day window was given for implementation of the recommendations. As in the first project, suitable improvements in performance were observed, even by casual observation, within a month of starting the improvement program. Due to time constraints, data for eight weeks was collected post-implementation and process capability was evaluated.
The sigma level (adjusted) calculated from Cpk (3 × Cpk + 1.5) is 3, a substantial improvement from 2.1, and already equal to the best industry standards.
There has been a 32% increase in revenue over the past two weeks. However, as multiple Six Sigma projects were conducted side by side, it is difficult to calculate the exact monetary gains of each project individually at this point in time due to confounding.
The sigma level is expected to increase over the next 6 months as the full benefits of the improvement initiatives are realized.
Part F – Control
The control chart shows the process well under control [ as it was for this CTQ ] even before this particular six sigma initiative.
Summing up key long-term implications of applying Six Sigma framework to HR
This Six Sigma applied-to-HR project has several implications, some of which are listed here:
- Such statistical analysis, based on data collected by improved HR systems, will ensure that investments in HR are more data-driven and thus help HR become more strategic in nature
- The analysis can also assist companies in building their own unique algorithms, optimizing process flow and even supporting their robotic process automation (RPA) efforts, further improving the efficiency and effectiveness of their staff (as in this case, of their recruiters)
- Service providers, namely job portals and social networking sites, will increasingly have to showcase such evidence-based data to secure business, as many portals today sell the same database to multiple players without offering any such differentiating insights into the actual breakup and effectiveness of their database for closing positions
We believe that as HR continues to digitize its operations and collect more data about its processes, it can integrate strategic, evidence-based approaches like Six Sigma. We firmly believe that the age of analytics in general, and HR analytics in particular, is already upon us, and that we can all work together to improve our business processes and deliver ROI.