Continuous Learning: An Argument

Continuous learning is critical for data scientists, and organizations should actively support and encourage it.

The Data Scientist’s Dilemma

In a data science role, you are often given a problem that must be solved within a short time frame (say, a week), but you don’t know how to solve it. At that point you have two options:

Option 1. You can use the tools and methods you already know and combine them to invent a solution.

Pros:

  • You will progress from nothing to somewhere closer to a solution in a relatively short amount of time
  • You will have immediate work to share with stakeholders

Cons:

  • You may end up with a complicated solution involving many techniques pieced together
  • You may end up with a weak solution that stakeholders are unhappy with, or one that is far from optimal

Option 2. You can start researching (online, by asking colleagues, etc.) to find the tool/method that best fits the problem.

Pros:

  • You may find the perfect tool/method for the problem
  • If you find the perfect tool/method for the problem, you can feel confident you have one of the best solutions

Cons:

  • You may not find a solution, leaving you back where you started (though at least you can be more confident that no elegant solution is readily available)
  • If you don’t find a solution, it looks to stakeholders as if you haven’t done any work, and you can’t answer the “What did you try?” question to defend the lack of progress.
  • It takes a lot of time (probably more than was provided). You must evaluate whether the method fits the problem well enough to employ it and then get to a point where you feel confident in the implementation and interpretation of the output.

I have come across this dilemma multiple times, and it is especially difficult when a quick turnaround is required. If I try a lot of things already in my tool belt, it looks from the outside like I’ve made a lot of progress, even though I may never find an optimal or satisfactory solution. If I do research and find nothing applicable, all that research time is, in the mind of a stakeholder, wasted: I have nothing to show for those hours. On the other hand, if I find the perfect method for the problem, I can walk away looking like a hero. Even then, applying a newly discovered method isn’t as easy as it sounds. It takes time to understand its ins and outs well enough to be confident that you are applying it and interpreting its output correctly.

Where does continuous learning fit in?

Continuous learning reduces how often the dilemma occurs. When tasked with a problem, a data scientist starts looping through the techniques they know and tries to match one to the problem.

If the list looks like:

  • Linear Regression
  • K-means Clustering

Then you are going to run into this dilemma A LOT.

Whereas if your list looks like:

  • Linear Regression
  • Logistic Regression
  • Linear Discriminant Analysis
  • Hypothesis Testing
  • Support Vector Machines
  • Kernel Density Estimation
  • K-means Clustering
  • Hierarchical Clustering
  • Density-Based Clustering
  • Principal Component Analysis
  • Factor Analysis
  • ROC Analysis
  • Association Analysis
  • CART
  • Linear Programming
  • Monte Carlo Simulation
  • Time Series Modeling
  • Survival Analysis
  • etc.

You are much more likely to arrive at a strong solution, and to arrive at it faster.

Continuous learning and problem identification

The more your data scientists learn (and therefore know), the more they will be able to identify problems that had previously gone unnoticed. For example, say an insurance company receives thousands of inquiries and has a team of people sorting them to the appropriate departments. If no one recognizes this as a problem that can be solved, it will never be solved. If, on the other hand, your data science team is familiar with natural language processing techniques, they can quickly identify it as solvable and build an automated classification and forwarding system. Problem identification is a critical reason why continuous learning should not be restricted to obvious business applications.

Why should organizations support and encourage it?

As stated above:

  1. Your data scientists will end up with better solutions
  2. Tasks will be accomplished in a shorter amount of time
  3. Your data scientists can identify and solve problems that were never before seen as problems to solve

Not stated above:

  1. Data scientists are naturally curious and will get bored of solving the same problems in the same way

How should organizations support and encourage it?

  1. Time - I would recommend one hour a day be given to data science professionals

A $10K-a-year learning budget helps SO MUCH LESS than giving people time to learn. Meaningful learning occurs through hours of study, practice, application, teaching, and thinking; it does not occur at a 2-day, $3,000 conference. (Conferences are not useless: they are great for sparking ideas, forming relationships, or showing you something you didn’t know about before, but they are not meant for meaningful learning.)

How can organizations create a successful learning culture?

In the previous section I said an hour a day feels like about the right amount of time for continuous learning. This is a lot of time for an organization to commit, and the idea may be hard to digest. It immediately raises questions like:

  • How do I know employees aren’t abusing the time?
  • Are they learning anything useful?
  • Is there any actual benefit to the business?

While I will continue to argue for continuous learning, a passionate learning culture is not easy to achieve, and my argument above makes it seem more black and white than it is. I do not think this mindset will work at every organization, and the arguments for it may seem more qualitative than quantitative.

That being said, I think there are a few items that are critical for a successful continuous learning culture at an organization.

  1. You need a high-performing team: a team that is excited and motivated to learn, pushes for better solutions, calls out new business problems, and tries new methods on new problems. If your people aren’t on board with a learning culture, the extra time will be wasted.

  2. Learning should be a part of performance evaluations. If individuals are only rewarded for tasks completed, there is little incentive to learn new things, even if “time is provided”. For example, say you are assigned a project with a fairly tight deadline. If the only thing the organization cares about is getting that project completed, that pressure is passed on to the employee. What is the incentive to keep learning if the project outcome is the only thing they will be judged on? Eventually learning will be pushed aside and the project will consume the employee’s time.

  3. Learning should be actively discussed with your manager. In a culture where continuous learning is highly valued, a typical conversation with your manager may include questions like:

    • What are you currently learning?
    • Do you feel like the resources you have are helpful?
    • Do you feel like you have a good grasp on the technique/method?
    • Do you think you’ll be able to use this often?
    • Could this have broader appeal outside of the team?
    • Do you see any business problems this could solve?
    • Is there anything worth sharing with your co-workers?

Will Burton
Analytics at Credit Karma
San Francisco Bay Area