Getting Started with Pandas


Pandas is a popular data analysis library built on top of the Python programming language, and getting started with it is easy. It assists with common manipulations for data cleaning, joining, sorting, filtering, deduping, and more. First released in 2009, pandas now sits at the epicenter of Python’s vast data science ecosystem and is an essential tool in the modern data analyst’s toolbox.

Pandas represents a fantastic step forward for graphical spreadsheet users who’d like to handle larger amounts of data, perform more complex operations, and automate the steps of their analysis routines. I like to introduce the tool as “Excel on steroids.”

Here’s some good news: you don’t need to be a software engineer to work effectively with the library. In fact, pandas offers Excel users a great bridge to get started with Python and programming in general. If you’ve never written a line of code before, you’ll be pleasantly surprised by how many spreadsheet operations already require you to think like a developer.

Let’s explore a sample dataset to see some of the library’s powerful features. If you’d like a deeper dive into the syntax and mechanics of pandas, tune in to my upcoming ODSC workshop this October, “Getting Started with Pandas for Data Analysis.”

We’ll start by importing pandas and assigning it the conventional alias.

import pandas as pd

Our dataset is a CSV file of titles available on the online streaming service Netflix. Each row includes the title’s name, type, release year, duration, and the categories it’s listed in.

netflix = pd.read_csv("netflix_titles.csv")
netflix.head()

Let’s say we’re in the mood for a 90s comedy film. We can find the subset of rows that fit our criteria by applying filtering conditions to the type, release_year and listed_in columns. First up, let’s extract the rows with a value of “Movie” in the type column.

movies = netflix["type"] == "Movie"
netflix[movies].head()

Next up, let’s find our comedies. We’ll need to be a bit clever here. There are four categories in the listed_in column that we should include: “Comedies”, “Stand-Up Comedy”, “TV Comedies”, and “Stand-Up Comedy & Talk Shows”. These categories can also appear alongside other, unrelated categories. We can use a regular expression to identify the titles whose listed_in text includes the substring “Comed” followed by any characters.

comedies = netflix["listed_in"].str.contains(r'Comed.*')
netflix[comedies].head()

To target titles released in the 90s, we can filter the values in the release_year column to those that fall within the range 1990 to 1999.

made_in_nineties = netflix["release_year"].between(1990, 1999)
netflix[made_in_nineties].head()

We’ve now declared three individual conditions to filter the dataset. The final step is to apply all three conditions together. In the next example, we ask pandas for all titles that are movies and comedies and released between 1990 and 1999.

netflix[movies & comedies & made_in_nineties].head()

For the cherry on top, let’s sort the movies by year of release. We’ll sort the numeric values in the release_year column in ascending order.

netflix[movies & comedies & made_in_nineties].sort_values("release_year").head()

All that’s left to do now is select a title and start binging. With only a couple of lines of code, we’ve narrowed down a 6,000+ row dataset to 60 titles that fit our interests.
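
If you’d like to verify that count, a quick check with the same boolean masks does the trick (the exact number depends on which version of the dataset you download):

len(netflix[movies & comedies & made_in_nineties])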

For a more in-depth overview of getting started with Pandas, check out my upcoming ODSC talk this October, “Getting Started with Pandas for Data Analysis.” We’ll explore several real-world datasets and walk through many of the features of this powerful data analysis tool.


About the author/ODSC West 2020 speaker: Boris Paskhaver is a full-stack web developer based in New York City with experience building apps in React / Redux and Ruby on Rails. His favorite part of programming is the never-ending sense that there’s always something new to master — a secret language feature, a popular design pattern, an emerging library or — most importantly — a different way of looking at a problem.

LinkedIn: https://www.linkedin.com/in/boris-paskhaver/
Udemy: https://www.udemy.com/user/borispaskhaver/
Twitter: https://twitter.com/borispaskhaver


Automatic Differentiation in PyTorch


Autograd is PyTorch’s automatic differentiation package. Thanks to it, we don’t need to worry about partial derivatives, chain rule, or anything like it.

To illustrate how it works, let’s say we’re trying to fit a simple linear regression with a single feature x, using Mean Squared Error (MSE) as our loss:
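
yhat = b + w * x
loss = ((y - yhat) ** 2).mean()   # Mean Squared Error, exactly as computed in the code below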

We need to create two tensors, one for each parameter our model needs to learn: b and w.

Without PyTorch, we would have to start with our loss, and work the partial derivatives out to compute the gradients manually. Sure, it would be easy enough to do it for this toy problem, but we need something that can scale.
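
Just to make that concrete, here is what the manual route would look like for this toy problem: a minimal Numpy sketch with made-up synthetic data (x_train and y_train are hypothetical, not from the original post):

import numpy as np

# Hypothetical synthetic data, just to make the sketch runnable
np.random.seed(42)
x_train = np.random.rand(100)
y_train = 1 + 2 * x_train + 0.1 * np.random.randn(100)

b, w = np.random.randn(2)

# Working the partial derivatives of the MSE loss out by hand:
# loss = mean((y - (b + w * x)) ** 2)
yhat = b + w * x_train
error = y_train - yhat
b_grad = -2 * error.mean()               # d(loss)/db
w_grad = -2 * (x_train * error).mean()   # d(loss)/dw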

So, how do we do it? PyTorch provides some really handy methods we can use to easily compute the gradients. Let’s check them out!

requires_grad

What distinguishes a tensor used for training data (or validation, or test) from a tensor used as a (trainable) parameter/weight?

The latter requires the computation of its gradients, so we can update their values (the parameters’ values, that is). That’s what the requires_grad=True argument is good for. It tells PyTorch to compute gradients for us.

Remember: a tensor for a learnable parameter requires a gradient!

In code, creating tensors for our two parameters looks like this:

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Step 0 - Initializes parameters "b" and "w" randomly
torch.manual_seed(42)
b = torch.randn(1, requires_grad=True, dtype=torch.float, device=device)
w = torch.randn(1, requires_grad=True, dtype=torch.float, device=device)
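
The snippets that follow also use x_train_tensor and y_train_tensor, which the post doesn’t show being created. A minimal, hypothetical setup (synthetic data sent to the same device) could look like this:

import numpy as np
import torch

# Hypothetical training data; any set of (x, y) pairs would do for this example
np.random.seed(42)
x_train = np.random.rand(100, 1)
y_train = 1 + 2 * x_train + 0.1 * np.random.randn(100, 1)

# Data tensors do NOT require gradients; we only need their values
x_train_tensor = torch.as_tensor(x_train).float().to(device)
y_train_tensor = torch.as_tensor(y_train).float().to(device)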

backward

So, how do we tell PyTorch to do its thing and compute all gradients? That’s the role of the backward() method. It will compute gradients for all (requiring gradient) tensors involved in the computation of a given variable.

Do you remember the starting point for computing the gradients? It is the loss, which we would use to compute its partial derivatives with respect to our parameters.

Hence, we need to invoke the backward() method from the corresponding Python variable: loss.backward().

The code below illustrates it well; we make the predictions and compute the loss just as we would in plain Numpy, but now using our tensors:

# Step 1 - Computes our model's predicted output - forward pass
yhat = b + w * x_train_tensor

# Step 2 - Computes the loss
# We are using ALL data points, so this is BATCH gradient descent.
# How wrong is our model? That's the error!
error = (y_train_tensor - yhat)
# It is a regression, so it computes mean squared error (MSE)
loss = (error ** 2).mean()

# Step 3 - Computes gradients for both "b" and "w" parameters
# No more manual computation of gradients!
loss.backward() 

Which tensors are going to be handled by the backward() method applied to the loss?

  • b
  • w
  • yhat
  • error

We have set requires_grad=True for both b and w, so they are obviously included in the list. We use them both to compute yhat, so it will also make it to the list. Then we use yhat to compute the error, which is also added to the list.

Do you see the pattern here? If a tensor in the list is used to compute another tensor, the latter will also be included in the list. Tracking these dependencies is exactly what the dynamic computation graph is doing, as we’ll see shortly.

What about x_train_tensor and y_train_tensor? They are involved in the computation too… but they contain data, and thus they are not created as gradient-requiring tensors. So, backward() does not care about them.

grad

What about the actual values of the gradients? We can inspect them by looking at the grad attribute of each tensor.

b.grad, w.grad

OK, we got gradients, but there is one more thing to pay attention to: by default, PyTorch accumulates the gradients. How do we handle that?

zero_

Every time we use the gradients to update the parameters, we need to zero the gradients afterward. And that’s what zero_() is good for.

# This code will be placed after Step 4 (updating the parameters)
b.grad.zero_(), w.grad.zero_()

So, we can definitely ditch the manual computation of gradients and use both backward() and zero_() methods instead.

That’s it? Well, pretty much… but there is always a catch, and this time it has to do with the update of the parameters…

Updating Parameters

To update a parameter, we multiply its gradient by a learning rate, flip the sign, and add it to the parameter’s former value.  So, let’s first set our learning rate:

lr = 0.1

And then use it to perform the updates:

# Attempt at Step 4
b -= lr * b.grad
w -= lr * w.grad

But it turns out we cannot simply perform an update like this! Why not?! It is a case of “too much of a good thing”. The culprit is PyTorch’s ability to build a dynamic computation graph from every Python operation that involves any gradient-computing tensor or its dependencies.

no_grad

So, how do we tell PyTorch to “back off” and let us update our parameters without messing with its fancy dynamic computation graph? That’s what torch.no_grad() is good for. It allows us to perform regular Python operations on tensors without affecting PyTorch’s computation graph.

This time, the update will work as expected:

# Step 4, for real
with torch.no_grad():
    b -= lr * b.grad
    w -= lr * w.grad

Mission accomplished! We updated our parameters b and w using PyTorch’s automatic differentiation package, autograd.

I mean, we updated them once. To actually train a model, we need to place this code inside a loop. Putting it all together, and adding a loop to it, the code should look like this:

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Step 0 - Initializes parameters "b" and "w" randomly
torch.manual_seed(42)
b = torch.randn(1, requires_grad=True, dtype=torch.float, device=device)
w = torch.randn(1, requires_grad=True, dtype=torch.float, device=device)

lr = 0.1

for epoch in range(200):
    # Step 1 - Computes our model's predicted output - forward pass
    yhat = b + w * x_train_tensor

    # Step 2 - Computes the loss
    # We are using ALL data points, so this is BATCH gradient descent.
    # How wrong is our model? That's the error!
    error = (y_train_tensor - yhat)
    # It is a regression, so it computes mean squared error (MSE)
    loss = (error ** 2).mean()

    # Step 3 - Computes gradients for both "b" and "w" parameters
    # No more manual computation of gradients!
    loss.backward() 

    # Step 4, for real
    with torch.no_grad():
        b -= lr * b.grad
        w -= lr * w.grad

    # Zeroes the gradients after Step 4 (updating the parameters)
    b.grad.zero_(), w.grad.zero_()
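
After the loop finishes, a quick sanity check shows what we ended up with (the exact values depend on the training data, which isn’t shown in this post):

print(b.item(), w.item(), loss.item())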

That was autograd in action! Now it is time to take a peek at the…

Dynamic Computation Graph

“Unfortunately, no one can be told what the dynamic computation graph is. You have to see it for yourself.”

– Morpheus

I want you to see the graph for yourself too!

The PyTorchViz package and its make_dot(variable) method allow us to easily visualize a graph associated with a given Python variable involved in the gradient computation.
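
If you want to run it yourself, make_dot typically comes from the torchviz package (installable with pip install torchviz, assuming the Graphviz binaries are available on your system):

from torchviz import make_dot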

So, let’s stick with the bare minimum: two (gradient computing) tensors for our parameters (b and w) and the predictions (yhat) – these are Steps 0 and 1.

make_dot(yhat)

Running the code above will show us the graph below:

Let’s take a closer look at its components:

  • blue boxes ((1)s): these boxes correspond to the tensors we use as parameters, the ones we’re asking PyTorch to compute gradients for
  • gray box (MulBackward0): a Python operation that involves a gradient-computing tensor or its dependencies
  • green box (AddBackward0): the same as the gray box, except that it is the starting point for the computation of gradients (assuming the backward() method is called from the variable used to visualize the graph); they are computed from the bottom up in the graph

Now, take a closer look at the green box at the bottom of the graph: two arrows are pointing to it since it is adding up two variables, b and w*x. Seems obvious, right?

Then, look at the gray box (MulBackward0) of the same graph: it is performing a multiplication, namely, w*x. But there is only one arrow pointing to it! The arrow comes from the blue box that corresponds to our parameter w.

“Why don’t we have a box for our data (x)?”

The answer is: we do not compute gradients for it!

So, even though there are more tensors involved in the operations performed by the computation graph, it only shows gradient-computing tensors and their dependencies.

What would happen to the computation graph if we set requires_grad to False for our parameter b?

# New Step 0
b_nograd = torch.randn(1, requires_grad=False, dtype=torch.float, device=device)
w = torch.randn(1, requires_grad=True, dtype=torch.float, device=device)

# New Step 1
yhat = b_nograd + w * x_train_tensor
make_dot(yhat)

[Figure: the resulting computation graph]

Unsurprisingly, the blue box corresponding to the parameter b is no more!

Simple enough: no gradients, no graph!

The best thing about the dynamic computation graph is the fact that you can make it as complex as you want. You can even use control flow statements (e.g., if statements) to control the flow of the gradients.

The figure below shows an example of this. And yes, I do know that the computation itself is complete nonsense…

b = torch.randn(1, requires_grad=True, dtype=torch.float, device=device)
w = torch.randn(1, requires_grad=True, dtype=torch.float, device=device)

yhat = b + w * x_train_tensor
error = y_train_tensor - yhat
loss = (error ** 2).mean()

# this makes no sense!!
if loss > 0:
    yhat2 = w * x_train_tensor
    error2 = y_train_tensor - yhat2

# neither does this :-)
loss += error2.mean()

make_dot(loss)

[Figure: the computation graph for loss, including the branch created by the if statement]

Even though the computation is nonsensical, you can clearly see the effect of adding a control flow statement like if loss > 0: it branches the computation graph into two parts. The right branch performs the computation inside the if statement, which gets added to the result of the left branch in the end. Cool, right?

To be continued…

Autograd is just the beginning! Interested in learning more about training a model using PyTorch in a structured and incremental way?

Don’t miss my talk at ODSC Europe 2020: “PyTorch 101: building a model step-by-step.”

The content of this post was adapted from my book “Deep Learning with PyTorch Step-by-Step: A Beginner’s Guide”.  Learn more about it at http://leanpub.com/pytorch.


About the author/speaker:

Daniel is a data scientist, developer, and author of “Deep Learning with PyTorch Step-by-Step: A Beginner’s Guide”.

He has been teaching machine learning and distributed computing technologies at Data Science Retreat, the longest-running Berlin-based bootcamp, for more than three years, helping more than 150 students advance their careers.

His professional background includes 20 years of experience working for companies in several industries: banking, government, fintech, retail and mobility.
SITE: https://leanpub.com/pytorch/


ODSC West 2019 Preview: Python for Data Acquisition


Editor’s Note: See Phil present his talk “Python for Data Acquisition”  at ODSC West 2019.

What does it take, on the technical side, to get a project started? After you have an idea and find something you want to study or look into, you need to get some data. Where do you get data? Primary sources? Websites? Databases? There are many different sources and possibilities. Which ones should you choose? How can you trust that the data will remain available and support reproducibility? Will it be easy to update once new data becomes available? These are just the beginning of the issues involved in acquiring data for your project, but you can use Python for data acquisition to make it easier.

Data Source

There are so many sources of freely available public data. The US Federal Government runs Data.gov for its public data. The topics covered on this site include everything the government runs, such as agriculture, climate, education, transportation, and energy. Individual divisions of the federal government, like NASA, may also have their own open data. Most states and cities also run websites with a lot of data. ODSC West 2019 is in San Francisco, and the city has its own website of local government data, DataSF.


Other governments and NGOs offer the same kinds of open data sites:

[Related Article: 25 Excellent Machine Learning Open Datasets]

  • European Union
  • Russia
  • UNICEF
  • Data World

Google’s Public Data Directory, Amazon’s AWS Open Data, Microsoft, and IBM Cloud Data Services all have open datasets for public use. GitHub keeps track of many more sites, like Awesome Public Datasets. With a little looking around, there is a dataset for almost anything you want to study!

Gathering Data

This almost infinite supply of options doesn’t mean that the data is ready to go for your application or model. You still need to actually download the data and parse it into a usable format. The data on these sites is stored in a variety of formats, ranging from GIS, CSV, XML, JSON, text, and HTML to various binary types. It is quite possible for your project to need data from multiple sources and in multiple formats, which can create a variety of issues for any project getting started or continuing on.

Once we have this data, how do we hang on to it? For each application or model built on it, do we want to download it again? What happens if the website goes away, changes its policies, or changes its format? Storing all of this data in your own database can ease these issues. Once you have downloaded, cleaned up, and prepared your data, store it in a local database. From there, all future applications and models only need to access the database, without worrying about the other issues of getting the data.

This is where Python comes in! It can handle all of these tasks with the right libraries and some coding. Python has libraries to cover all of these topics and then some. Using the Requests library, downloading web pages and other files is very simple. With the correct credentials, it can also log into a server for non-public or restricted data. If the files are compressed, Python has archiving libraries for that. For the various formats, there are libraries such as csv, json, and re (regular expressions). From there, storing data in a database can be done by wrapping SQL in Python via psycopg2 or by creating ORMs with SQLAlchemy.
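
As a rough sketch of how those pieces fit together (the URL and table name below are hypothetical, and sqlite3 stands in for whichever database you actually use):

import csv
import io
import sqlite3

import requests

# Hypothetical open-data endpoint; substitute a real CSV URL
URL = "https://example.com/open-data/transportation.csv"

# Download the raw file with the Requests library
response = requests.get(URL)
response.raise_for_status()

# Parse the CSV text into a header and data rows
rows = list(csv.reader(io.StringIO(response.text)))
header, records = rows[0], rows[1:]

# Store the parsed data in a local database for later reuse
conn = sqlite3.connect("project_data.db")
columns = ", ".join(f'"{name}" TEXT' for name in header)
placeholders = ", ".join("?" for _ in header)
conn.execute(f"CREATE TABLE IF NOT EXISTS acquired_data ({columns})")
conn.executemany(f"INSERT INTO acquired_data VALUES ({placeholders})", records)
conn.commit()
conn.close()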

[Related Article: Jupyter Notebook: Python or R—Or Both?]

This Course on Python for Data Acquisition

The goal of this course, which I’ll be presenting at ODSC West 2019, is to expose all of the students to this process and give them a few labs where they will get to do this. The students will learn to parse various data file formats, download data and interact with a database for storing and retrieving data.

Originally posted on OpenDataScience.com.



Optuna: An Automatic Hyperparameter Optimization Framework



Preferred Networks has released a beta version of an open-source, automatic hyperparameter optimization framework called Optuna. In this blog, we will introduce the motivation behind the development of Optuna as well as its features.

[Related Article: How Developers are Driving Innovation Through Open Source Economy]

 


 

  • Website
  • Documents
  • Tutorials
  • GitHub

What is a hyperparameter?

A hyperparameter is a parameter that controls how a machine learning algorithm behaves. In deep learning, the learning rate, batch size, and number of training iterations are hyperparameters. Hyperparameters also include the number of neural network layers and channels. They are not, however, just numerical values. Things like whether to use Momentum SGD or Adam in training are also regarded as hyperparameters.

It is almost impossible to make a machine learning algorithm do the job without tuning hyperparameters. The number of hyperparameters tends to be high, especially in deep learning, and it is believed that performance largely depends on how we tune them. Most researchers and engineers that use deep learning technology manually tune these hyperparameters and spend a significant amount of their time doing so.

What is Optuna?

Optuna is a software framework for automating the optimization process of these hyperparameters. It automatically searches for optimal hyperparameter values by trial and error to achieve excellent performance. Currently, the software can be used in Python.

Optuna uses a history record of trials to determine which hyperparameter values to try next. Using this data, it estimates a promising area and tries values in that area. Optuna then estimates an even more promising region based on the new result. It repeats this process using the history data of trials completed thus far. Specifically, it employs a Bayesian optimization algorithm called the Tree-structured Parzen Estimator (TPE).

What is its relationship with Machine Learning frameworks?

Optuna is framework agnostic and can work with most Python-based frameworks, including Chainer, PyTorch, TensorFlow, scikit-learn, XGBoost, and LightGBM. In fact, Optuna can cover a broad range of use cases beyond machine learning, such as acceleration or database tuning.

Why did PFN develop Optuna?

Why did we develop Optuna even though there were already established automatic hyperparameter optimization frameworks like Hyperopt, Spearmint, and SMAC?

When we tried the existing alternatives, we found that they did not work or were unstable in some of our environments, and that their algorithms had lagged behind recent advances in hyperparameter optimization. We also wanted a way to specify which hyperparameters should be tuned within the Python code itself, instead of having to write separate code for the optimizer.

Key Features

Define-by-Run style API

Optuna provides a novel Define-by-Run style API that enables the user to optimize hyperparameters, even if the user code is complex, while maintaining higher modularity than other frameworks. It can also optimize hyperparameters in complex spaces that no other framework could express before.

There are two paradigms in deep learning frameworks: Define-and-Run and Define-by-Run. In the early days, Caffe and other Define-and-Run frameworks were dominant players. Then, PFN-developed Chainer appeared as the first advocate of the Define-by-Run paradigm, followed by the release of PyTorch, and later, eager mode becoming the default in TensorFlow 2.0. Now the Define-by-Run paradigm is well recognized and appears to be gaining momentum to become the standard.

Is the Define-by-Run paradigm useful only in the domain of deep learning frameworks? We came to understand that we could apply a similar approach to automatic hyperparameter optimization frameworks as well. Under this approach, all existing automatic hyperparameter optimization frameworks are classified as Define-and-Run. Optuna, on the other hand, is based on the Define-by-Run concept and provides users with a new style of API that is very different from other frameworks. This has made it possible to give high modularity to a user program and access to complex hyperparameter spaces, among other things.
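
To give a feel for what that looks like in practice, here is a minimal sketch of the Define-by-Run API with a toy objective (not taken from the original post):

import optuna

def objective(trial):
    # The hyperparameter is suggested inside the objective itself: Define-by-Run
    x = trial.suggest_uniform("x", -10, 10)
    return (x - 2) ** 2

study = optuna.create_study()
study.optimize(objective, n_trials=100)
print(study.best_params)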

Pruning of trials using learning curves

When iterative algorithms like deep learning and gradient boosting are used, a rough prediction of the end result of training can be made from the learning curve. Using these predictions, Optuna can halt unpromising trials before the training is over. This is the pruning feature of Optuna.

Existing frameworks such as Hyperopt, Spearmint, and SMAC do not have this functionality. Recent studies show that the pruning technique using learning curves is highly effective.  The following graph indicates its effectiveness in performing a sample deep learning task. While the optimization engines of both Optuna and Hyperopt utilize the same TPE, thanks to pruning, the optimization performed by Optuna is more efficient.

[Figure: pruning of unpromising trials]
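
In recent versions of Optuna, the pruning API looks roughly like this; the objective below is a toy sketch with a synthetic “learning curve”, not real training code:

import optuna

def objective(trial):
    x = trial.suggest_uniform("x", -10, 10)
    value = (x - 2) ** 2
    for step in range(100):
        # Report an intermediate value so the pruner can inspect the learning curve
        trial.report(value / (step + 1), step)
        # Halt unpromising trials before training is over
        if trial.should_prune():
            raise optuna.TrialPruned()
    return value

study = optuna.create_study(pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=50)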

Parallel distributed optimization

Deep learning is computationally intensive, and each training process is very time-consuming. Therefore, for automatic hyperparameter optimization in practical use cases, it is essential that the user can easily use parallel distributed optimization that is efficient and stable. Optuna supports asynchronous distributed optimization, which simultaneously performs multiple trials using multiple nodes. Parallelization can make the optimization process even faster, as shown in the following figure. In the example below, we changed the number of workers from 1 to 2, 4, and 8, confirming that parallelization accelerated the optimization.

Optuna also has a functionality to work with ChainerMN, allowing the user to optimize training that requires distributed processing without difficulty. By making use of a combination of these functionalities, the user can execute objective functions that include distributed processing in a parallel, distributed manner.

[Figure: parallel distributed optimization with 1, 2, 4, and 8 workers]
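
From the user’s side, distributed optimization boils down to pointing every worker at the same study and shared storage. A minimal sketch (the study name and sqlite URL below are placeholders; real deployments typically use a shared MySQL or PostgreSQL database):

import optuna

def objective(trial):
    x = trial.suggest_uniform("x", -10, 10)
    return (x - 2) ** 2

# Every worker process creates or loads the same study backed by shared storage
study = optuna.create_study(
    study_name="distributed-example",
    storage="sqlite:///optuna_example.db",
    load_if_exists=True,
)
# Run this same script from several workers to parallelize the search
study.optimize(objective, n_trials=25)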

Visualized optimization on dashboard (under development)

Optuna has a dashboard that provides a visualized display of the optimization process. With this, the user can obtain useful information from experimental results. The dashboard can be accessed by connecting via a web browser to an HTTP server, which can be started with one command. Optuna also has functionality to export optimization results as a pandas DataFrame for systematic analysis.
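
The pandas export is a one-liner (assuming a study object like the ones sketched above):

df = study.trials_dataframe()   # one row per trial, ready for analysis in pandas
print(df.head())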


Conclusions

[Related Article: Optimizing Hyperparameters for Random Forest Algorithms in scikit-learn]

Optuna is already in use by several projects at PFN. Among them is the project to compete in the Open Images Challenge 2018, in which we finished in second place. We will continue to aggressively develop Optuna to improve its integrity as well as to prototype and implement advanced functionalities. We believe Optuna is ready for use, so we would love to receive your candid feedback.

  • Website
  • Documents
  • Tutorials
  • GitHub

Our objective is to speed up deep learning related R&D activities as much as possible. Our effort into automatic hyperparameter optimization is an important step toward this end. Additionally, we have begun working on other important technologies such as neural architecture search and automatic feature extraction. PFN is looking for potential full-time members and interns who are enthusiastic about working with us in these fields and activities.

Be sure to check out an upcoming talk at ODSC West 2019 this October 29 – Nov 1 by Takuya’s colleague, Crissman Loomis, titled “Machine Learning in Chainer Python.”

More on Crissman: Since his mathematics degree, Crissman has devoted himself to the study of languages, including Spanish, JavaScript, German, Python, and Japanese. Previously, Crissman worked on open source projects for the automation of game playing systems, including MMORPGs, web-based games, and Pokemon. After finding the limits of rule-based systems, he worked on deep learning programs at Preferred Networks, the company that created the AI Python framework Chainer.

Originally Posted Here.

