Oct 09

Deep Learning-Driven Text Summarization & Explainability with Reuters News Data

  • Posted by: odscadmin
  • 0 comments
  • Under: Deep Learning, NLP

Image credit: REUTERS/Dominic Ebenbichler

Editor’s note: At ODSC West 2020, Nadja Herger, Nina Hristozova, and Viktoriia Samatova will hold a workshop on text summarization in which you will automatically generate news headlines powered by Reuters News and learn about the power of transfer learning and explainable AI.

Natural Language Processing (NLP) is one of the fastest-moving fields within AI and it encompasses a wide range of tasks, such as text classification, question-answering, translation, topic modeling, sentiment analysis, and summarization. Here, we focus on text summarization, which is a powerful and challenging application of NLP.

Summarization & Transfer Learning

When discussing summarization, an important distinction to make is between extractive and abstractive summarization. Extractive summarization refers to the process of extracting words and phrases from the text itself to create a summary. Abstractive summarization more closely resembles the way humans write summaries [link]. The key information of the original text is maintained using semantically consistent words and phrases. Due to its complexity, it relies on advances in Deep Learning to be successful [source].

Here, we investigate the automatic generation of headlines from English news articles across all content categories based on the Reuters News Archive, which is professionally produced by journalists and strictly follows rules of integrity, independence, and freedom from bias [source]. The headlines themselves are considered fairly abstractive, with over 70% of bigrams and over 90% of trigrams being novel.
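As a rough sketch of how such novelty can be quantified (our own illustrative helper, not the exact pipeline behind the statistics above), one can count the fraction of headline n-grams that never appear in the article body:

def ngram_set(tokens, n):
    """Return the set of n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novel_ngram_ratio(article, headline, n=2):
    """Fraction of headline n-grams that never occur in the article text."""
    article_ngrams = ngram_set(article.lower().split(), n)
    headline_ngrams = ngram_set(headline.lower().split(), n)
    if not headline_ngrams:
        return 0.0
    return len(headline_ngrams - article_ngrams) / len(headline_ngrams)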

We see a trend towards pre-training Deep Learning models on a large text corpus and fine-tuning them for a specific downstream task (also known as transfer learning) [source]. This has the advantage of reduced training time, as well as needing less training data to achieve satisfactory results. Due to the democratization of AI, we observe a leveling of the playing field where everyone can get hold of these models and adapt them for their use cases. We fine-tuned a state-of-the-art summarization model on Reuters news data, which significantly outperformed the base model itself. An example of a tokenized, unformatted article text and its machine-generated headline is shown below. The original article text was published by Reuters in October 2019 [link].

[Figure: tokenized, unformatted article text with its machine-generated headline]
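As a sketch of what the generation step itself looks like with public tooling, here is the same pattern using the transformers summarization pipeline. The off-the-shelf facebook/bart-large-cnn checkpoint stands in for our Reuters-tuned model, and the article text is an invented stand-in, not the Reuters example:

from transformers import pipeline

# Off-the-shelf summarization checkpoint, standing in for our fine-tuned model.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Illustrative stand-in text, not the actual Reuters article.
article = ("The company said on Monday it would extend the production halt "
           "at its plant until early next year, citing weak demand.")

# Short length limits push the summary towards headline-like output.
headline = summarizer(article, max_length=16, min_length=4)[0]["summary_text"]
print(headline)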

Explainability

Do you trust this automatically generated news headline? Researchers commonly rely on the ROUGE score to evaluate the model’s performance for such a task [source]. In its most basic form, it essentially measures the overlap of n-grams between the machine-generated and human-written summaries. If I told you that the model has a ROUGE score of around 45 on the hold-out set, is that sufficient for you to trust the prediction on a previously unseen article text?
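To make that number concrete, here is a from-scratch sketch of that most basic form of ROUGE-N, as recall (the official implementation adds stemming, F-measures, and other refinements):

from collections import Counter

def rouge_n_recall(reference, candidate, n=2):
    """Basic ROUGE-N recall: clipped n-gram overlap divided by the reference n-gram count."""
    def ngram_counts(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngram_counts(reference), ngram_counts(candidate)
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

# Toy example: 3 of the 5 reference bigrams appear in the candidate -> 0.6
print(rouge_n_recall("oil prices fall on demand fears",
                     "oil prices fall as demand fears grow"))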

How can we increase trust in what the model generated? The move towards more complex models for NLP tasks makes the need for explainable AI more apparent. Explainable AI is an umbrella term for a range of techniques, algorithms, and methods that accompany outputs from AI systems with explanations [source]. As such, it addresses the often undesired black-box nature of many AI systems and subsequently allows users to understand, trust, and manage AI solutions. The desired level of explainability depends on the end user [source]. Here, we are interested in making the model output explainable to a potential reviewer rather than, for example, an AI system builder, who would have different expectations in terms of technical detail.

Let us take a look at how adding an explainability feature can support us in verifying whether the machine-generated headline is factually accurate. In addition to generating the headline, we can gain insights into the most relevant parts of the news article. The illustration below builds upon the example shared earlier by adding highlights to the article text.
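Before turning to the illustration: the exact attribution method behind our highlights is not detailed here, but a simple stand-in (clearly an assumption, not our production approach) is occlusion, which removes one word at a time and measures how much a hypothetical headline-scoring function drops:

def occlusion_importance(score_fn, tokens):
    """Score each word by how much the headline score drops when it is removed.

    `score_fn` is a hypothetical callable mapping an article string to the
    model's score (e.g. log-likelihood) for the generated headline.
    """
    base = score_fn(" ".join(tokens))
    importances = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]
        importances.append(base - score_fn(" ".join(reduced)))
    return importances  # larger drop = more important word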

[Figure: article text with word-level highlights showing each word’s importance for the generated headline]

The darker the highlight, the more important a given word is for the resulting headline text. This makes it significantly easier to verify the headline itself. The first sentence in particular seems to have the largest impact on the generated headline. Interestingly, it refers to “until early next year” rather than “until early 2019”; the year “2019” never occurs in the article text. For this particular example, we actually have access to the human-written headline; see the screenshot of the original article from the Reuters site below.

[Screenshot: the original article with its human-written headline on the Reuters site]

Knowing that the article was published in 2019, it is evident that “early next year” refers to 2020 rather than 2019, which renders the machine-generated headline partially inaccurate. We believe that verifying machine-generated headlines with an extra layer of explainability leads to increased trust and easier detection of biases or mistakes.

For more details on text summarization and Reuters news data, the power of transfer learning, as well as adding explainability for increased trust, join us for our hands-on workshop at ODSC West in October. You will walk away with an interactive notebook to get a head start in applying these concepts to your own challenges!


Nadja Herger is a Data Scientist at Thomson Reuters Labs, based in Switzerland. She primarily focuses on Deep Learning PoCs within the Labs, working on applied NLP projects in the legal and news domains and applying her skills to text classification, metadata extraction, and summarization tasks.

Nina Hristozova is a Data Scientist at Thomson Reuters (TR) Labs. She has a BSc in Computer Science from the University of Glasgow, Scotland. As part of her role at TR, she has worked on a wide range of projects applying ML and DL to a variety of NLP problems. Her current focus is on applied summarization of legal text.

Viktoriia Samatova is Head of the Applied Innovation team of Data Scientists within the Reuters Technology division, focused on discovering and applying new technologies to enhance Reuters products and on improving the efficiency of news content production and discoverability.


Oct 02

The Fashion Industry is Impactful – Let’s Make it Positive With AI

  • Posted by: odscadmin
  • 0 comments
  • Under: NLP

Pollution, poor working conditions, and animal welfare are topics that are often pushed aside in favor of profit. This certainly holds for the modern ‘fast-fashion’ industry, which produces most of our clothes. Fortunately, some brands and initiatives are making an effort, but how do you know which ones, as a consumer?

It is difficult to make a well-informed choice when buying sustainable clothes …

Searching for information is very time-consuming, as sustainability information is scattered across several sources. Websites, blogs, and books provide valuable guidelines about clothing brands that try to make a difference, and clothing brands themselves publish more and more sustainability information on their websites. Unfortunately, it takes a lot of manual work to gather and maintain this information, so what is available is often outdated.

… but Artificial Intelligence can do the job! 

We believe that the (fashion) industry can move towards sustainability by automating the identification of sustainable brands. We therefore use scraping, artificial intelligence, natural language processing, and explainability to provide sustainability information on clothing faster than current approaches.

In data science, you often start with a dataset that needs a lot of cleaning. In this case, we had to build from scratch, first creating a database of all the clothing brands and then adding each brand’s homepage. From there, the scraping fun could start. However, we did not have infinite resources, so we had to be smart about scraping exactly the right amount of data.
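A minimal sketch of that per-brand scraping step, assuming the requests and BeautifulSoup libraries and a hypothetical brand URL:

import requests
from bs4 import BeautifulSoup

def fetch_homepage_text(url):
    """Download a brand homepage and return its visible text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Drop script/style tags so only human-readable text remains.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)

text = fetch_homepage_text("https://example-brand.com")  # hypothetical URL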

While investigating the data for the first time, after some preprocessing, we got the feeling that the model could give valuable insights. The two word clouds below contain the most frequent words per class and already show a clear difference. Interestingly, we mainly see brand names in the ‘not-sustainable’ word cloud. Fortunately, the model does not train on the names of specific brands, which we know through explainability.

[Figure: word clouds of the most frequent words for the ‘sustainable’ and ‘not-sustainable’ classes]

Explainability is crucial for people to trust the outcomes. In our case, it also improved the quality of our preprocessing. As a result, we now have several models with over 80 percent accuracy. For outcomes that are still too uncertain, we assign the class ‘unknown’. In this way, people can trust the output of the model.
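A minimal sketch of that fallback logic (the threshold and names here are illustrative, not our exact settings):

def classify_with_fallback(probabilities, labels, threshold=0.8):
    """Return the predicted label, or 'unknown' when the model is too uncertain."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best] < threshold:
        return "unknown"
    return labels[best]

# A confident prediction vs. an uncertain one.
print(classify_with_fallback([0.92, 0.08], ["sustainable", "not-sustainable"]))  # sustainable
print(classify_with_fallback([0.55, 0.45], ["sustainable", "not-sustainable"]))  # unknown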

Information at your fingertips

We were quite proud and happy at this point; however, a model by itself does not accomplish anything. In our opinion, this model could make a change by making its output publicly available. We therefore built goodbase.ai, a website that provides sustainability information on every clothing brand we know of. You can also add brands yourself, and they will be assessed automatically.

When you think about the online clothes-buying process, though, it is still a hassle to constantly check another website while shopping. You want the information at your fingertips without switching websites. We therefore built a plugin together with Project Cece, which we will launch in the second week of September.

Next steps

Although we are proud of the great things we have accomplished, we know a lot more impact can be made, both on the technical side and in terms of usage.

This is how we see the future, how do you see it?

  • Use goodbase.ai and the plugin, and provide us with feedback or add brands
  • Spread the word
  • Help improve the code/build new features: https://gitlab.com/thehup/goodbase.ai/duurzame-benchmark

Our database will provide a piece of the sustainability puzzle. Do you want to know more about how we do this and how you can contribute?

Be part of the change!

If you want to know more about why we built the model, read https://amsterdamdatacollective.com/2019/12/04/blog-could-artificial-intelligence-help-you-buy-sustainable-clothing/ or get in touch.


About the Author/ODSC Europe 2020 speaker: Joanneke Meijer is a Manager at Amsterdam Data Collective and the initiator of the sustainable benchmark. She is an experienced data science consultant focusing on forecasting, pricing, operational research, and text mining. By coaching teams towards delivering actionable insights from data, she continuously manages to create a sustainable impact.


Oct 02

State-of-the-Art Text Classification Made Easy

  • Posted by: odscadmin
  • 0 comments
  • Under: NLP

In Natural Language Processing (NLP), language models such as ULMFiT, BERT, and GPT have become the foundation of many solutions for common NLP tasks. The benefit of language models is their ability to be pre-trained with a general understanding of language, such that users can fine-tune models on significantly less data and achieve better performance than when starting from scratch. Prior to language models, NLP models required enough data to simultaneously learn a language and a task, such as classification.

At Novetta, we achieved amazing performance with language models, but it was difficult for developers and new data scientists to train and deploy their own models. To address this, we decided to streamline the implementation of state-of-the-art models for different NLP tasks. We built an open-source framework, AdaptNLP, that lowers the barrier to entry for practitioners to use advanced NLP capabilities. AdaptNLP is built atop two open-source libraries: Transformers (from Hugging Face) and Flair (from Zalando Research). AdaptNLP enables users to fine-tune language models for text classification, question answering, entity extraction, and part-of-speech tagging.

Example: Text Classification

To demonstrate how AdaptNLP can be used for language model fine-tuning and training, we will fine-tune a pre-trained language model from Transformers for sequence classification, also known as text classification.

Using AdaptNLP starts with a Python pip install.

pip install adaptnlp

First, we import EasySequenceClassifier, which abstracts the sequence classification task to its most basic components such as data preprocessing, inference, and training. We can then instantiate the EasySequenceClassifier class object to start training our own custom sequence classification model.

from adaptnlp import EasySequenceClassifier

classifier = EasySequenceClassifier()

To train a sequence classification model with our `classifier`, we need to prepare our data and our training hyperparameters.

AdaptNLP is tightly integrated with Hugging Face’s datasets library (formerly nlp), so we will import it and load the “ag_news” dataset. The AG News dataset is a collection of news articles labeled as one of four classes: world, sports, business, or sci/tech. This makes it a perfect multi-class dataset for us to train our classifier on. If you’d like, explore the dataset in Hugging Face’s dataset viewer and try out the datasets library in general.

Note: The classifier can be trained with CSV data file path inputs as well as datasets.Dataset inputs.

from datasets import load_dataset

train_dataset, eval_dataset = load_dataset('ag_news', split=['train[:10%]', 'test'])


Now that we have our train and evaluation/test datasets, we can create the training arguments object from the transformers library, TrainingArguments. This lets us specify parameters and hyperparameters for training the classifier, such as output paths, epochs, batch size, and weight decay. Wonderful, extensive documentation on TrainingArguments can be found on Hugging Face’s documentation site.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./models',
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    evaluate_during_training=True,
    logging_dir='./logs',
    save_steps=100
)

We can then start training by running the classifier’s built-in `train()` method, which takes in the train and eval datasets and the `training_args` variable we created. Along with the text and label column names, you will also specify the pre-trained language model to fine-tune.

classifier.train(
    training_args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    model_name_or_path="bert-base-cased",
    text_col_nm="text",
    label_col_nm="label",
)

Important Note: In this example, we use the “bert-base-cased” pre-trained language model with a sequence classification head. However, you can use nearly any pre-trained language model. Try out a pre-trained DistilBERT or ELECTRA model, a custom fine-tuned model, or any model in Hugging Face’s model repository.

After training is completed, all artifacts and metadata such as checkpoints, model files, configs, and logs will be located in the directory paths specified in your `training_args` for `output_dir` and `logging_dir`. In this example, they are in “./models” and “./logs”.

You will then run a final evaluation with the built-in `evaluate()` method to see how well your model performs by calculating metrics on the eval/test dataset.

classifier.evaluate()

Outputs:

{'epoch': 1.0,
 'eval_accuracy': 0.9019736842105263,
 'eval_f1': array([0.90401969, 0.9692994 , 0.85683646, 0.87806097]),
 'eval_loss': 0.295024262882377,
 'eval_precision': array([0.9408082 , 0.96650968, 0.87322404, 0.8358706 ]),
 'eval_recall': array([0.87      , 0.97210526, 0.84105263, 0.92473684])}

Great! You’ve successfully fine-tuned and trained your own sequence classifier on the AG News dataset. Now let’s explore the data and model objects in more detail.

The EasySequenceClassifier object can dynamically load and run mini-batch inference on nearly any Transformers model, including the one you just trained. Load the model and run inference with the built-in `tag_text()` method.

text = [
    "The batter up went for the run and scored a touch down.",
    "The engineer designed rocket fuel that can take us to mars.",
    "The president of the United States and the prime minister of Britain talked.",
    "The stock market went down as the economy took a hit from stuff."
]

 

results = classifier.tag_text(
    text=text,
    model_name_or_path="./models",
    mini_batch_size=2
)

print(results)

Outputs:

[Sentence: "The batter up went for the run and scored a touch down ."   [- Tokens: 13  - Sentence-Labels: {'sc': [World (0.0421), Sports (0.9516), Business (0.003), Sci/Tech (0.0033)]}],
 Sentence: "The engineer designed rocket fuel that can take us to mars ."   [- Tokens: 12  - Sentence-Labels: {'sc': [World (0.1011), Sports (0.0295), Business (0.0767), Sci/Tech (0.7928)]}],
 Sentence: "The president of the United States and the prime minister of Britain talked ."   [- Tokens: 14  - Sentence-Labels: {'sc': [World (0.9544), Sports (0.003), Business (0.0335), Sci/Tech (0.0091)]}],
 Sentence: "The stock market went down as the economy took a hit from stuff ."   [- Tokens: 14  - Sentence-Labels: {'sc': [World (0.0243), Sports (0.0013), Business (0.9655), Sci/Tech (0.0089)]}]]

While you’re at it, you can try running `tag_text()` with a different model fine-tuned on AG News from Hugging Face’s model repository to see how your custom-trained model fares.

Fine-Tuning Language Models

To go beyond only fine-tuning a classifier from general-domain language models, you can use AdaptNLP’s `LMFineTuner` to fine-tune a language model on your target task data. Data from your target task will typically have a different distribution or topic domain from a general-domain language model, so fine-tuning a language model on your target task data can help it “adapt” to your data.

For more information on these techniques, and AdaptNLP in general, visit our documentation site for tutorials, guides, class reference documentation, and more.

A fine-tuned language model can be trained and easily integrated into user-built systems, providing state-of-the-art text-based classifications. By standardizing the input and output data and function calls, developers can easily use NLP algorithms regardless of which model is used in the backend. Before AdaptNLP, we integrated each newly released model and its pre-trained weights by hand and then rebuilt our NLP task pipelines around it. AdaptNLP streamlined this process, helping us leverage new models in existing workflows without having to overhaul code.

Using the latest transformer embeddings, AdaptNLP makes it easy to fine-tune and train state-of-the-art token classification (NER, POS, Chunk, Frame Tagging), sentiment classification, and question-answering models. We will be giving a hands-on workshop on using AdaptNLP with state-of-the-art models at ODSC Europe 2020.

Follow us at @AdaptNLP and give us a star at www.github.com/novetta/adaptnlp!


About the author/ODSC Europe speakers:

Brian Sacash is a Machine Learning Engineer in Novetta’s Machine Learning Center of Excellence. He helps various organizations discover the best ways to extract value from data. His interests are in the areas of Natural Language Processing, Machine Learning, Big Data, and Statistical Methods. Brian holds a Master of Science in Quantitative Analysis from the University of Cincinnati and a Bachelor of Science in Physics from Ohio Northern University.

Andrew Chang is an Applied Machine Learning Researcher in Novetta’s Machine Learning (ML) Center of Excellence. Andrew is a graduate of Carnegie Mellon University with a focus on researching state-of-the-art machine learning models and rapidly prototyping ML technologies and solutions across the scope of customer problems. He has an interest in open source projects and research in natural language processing, geometric deep learning, reinforcement learning, and computer vision. Andrew is the author and creator of NovettaNLP.


Aug 03

Gauging the State of the Economy with News Narrative and Sentiment

  • Posted by: odscadmin
  • 0 comments
  • Under: NLP

Advances in natural language processing have made it possible to quantify the intuitive yet elusive notion of sentiment expressed in text and to test its predictive power in relation to changes in social systems.

Studies in the cognitive sciences as well as economics have found that unsettling narratives preceded events such as the Great Depression and the 2008 Global Financial Crisis, suggesting that news sentiment is a means of forecasting the economy (1, 2). ECB President Mario Draghi’s “whatever it takes” speech from July 2012 is a great example of a narrative that impacted markets and the economy: these three words marked the turnaround of the euro crisis, achieved purely by Draghi’s verbal intervention.

Newspapers are a proven means for both individuals and institutions to share and distribute information. Most publications have an online presence and generate large amounts of data. This data includes information in the form of sentiment and opinions about the economy that is not yet reflected in macro-economic indicators.

The Global Database of Events, Language, and Tone (GDELT) (3) is a research collaboration that monitors the world’s newspapers from a multitude of perspectives, extracting items such as themes, emotions, events, and mentions of organizations, persons, and locations for every news article analyzed, almost in real time.

News sentiment and the economic recovery following COVID-19

As the coronavirus spread around the world, governments in many countries were forced to impose strict lockdowns and temporarily close businesses. As a result, a lot of companies are struggling to survive. Many had to lay off employees, leading to a spike in unemployment.

The chart shows the average tone, financial uncertainty, and confidence indices based on emotions from GDELT, with news items filtered thematically for “economic growth.”

Net sentiment (i.e., positive minus negative tone) from global newspapers has improved notably since bottoming out in April this year.

Levels of financial uncertainty peaked at the height of the outbreak in early spring. Financial uncertainty has since been declining but remains at elevated levels.

Confidence increased until February, perhaps reflecting confidence in local governments’ ability to contain the virus. The index plunged in March as lockdowns were imposed in countries around the world. Confidence moderately recovered in April but moved mostly sideways in May and June as the longer-term economic repercussions from the COVID-19 outbreak became apparent.
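As a note on mechanics: deriving a net sentiment index like the one above is straightforward once tone is quantified per article. Here is a minimal pandas sketch, with hypothetical column names and values rather than GDELT’s actual schema:

import pandas as pd

# Hypothetical daily GDELT-style records for "economic growth" articles.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-01", "2020-04-01"]),
    "positive_tone": [2.1, 1.8, 2.6],
    "negative_tone": [3.4, 2.9, 2.2],
})

# Net sentiment is positive minus negative tone, averaged per period.
df["net_sentiment"] = df["positive_tone"] - df["negative_tone"]
monthly_index = df.set_index("date")["net_sentiment"].resample("M").mean()
print(monthly_index)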

According to net sentiment, the recovery is well underway. However, the two other indices tell a somewhat different story. The coronavirus is likely to have a powerful impact on confidence, as consumers are staying at home for a prolonged period, possibly feeling pessimistic about the future. With heightened levels of financial uncertainty and worsening financial conditions, household consumption typically falls as savings go up, weighing on economic growth.

Thoughts and conclusions

Net sentiment lacks the insights that more specific emotions can convey about the economic recovery, as it merely indicates whether positive outweighs negative sentiment. More specific emotions from news narratives can help form a clearer view of the present state of the economy. Both confidence and financial uncertainty suggest that the global economy is only at the beginning of a recovery from this year’s COVID-19 outbreak. The shape and speed of the recovery depend on factors such as the economic impact of physical distancing, the effectiveness of government support packages, the pace of the easing of lockdown restrictions, and the possible occurrence of a second wave.

Traditional GDP forecasts are unreliable even during normal market conditions, and forecasting becomes more challenging still at present, when the virus’s trajectory is unknown. It is difficult to capture the impact of COVID-19 on consumer and business behavior in a timely manner, and thus to estimate a likely recovery path.

To find out more about forecasting the economy with news, narrative, and emotions, join me for my talk at ODSC Europe, “Forecasting the Economy with Fifty Shades of Emotions.”

(1) Robert J. Shiller, “Narrative Economics,” January 2017.
(2) David Tuckett et al., “Bringing Social Psychological Variables into Economic Modeling: Uncertainty, Animal Spirits and the Recovery from the Great Recession,” IEA World Conference, Jordan, 2014.
(3) gdeltproject.org


More on the author/speaker:

Sonja Tilly is a PhD candidate at UCL. Her research focuses on forecasting macro-economic variables and stock market movements using narrative and emotions from global newspapers.

Sonja has over a decade of experience working in asset management, most recently at Quoniam Asset Management, where she contributed to the development of trading strategies using media sentiment. Prior to that, she was a Quantitative Analyst at Hiscox. Sonja is a CFA Charterholder.


Jun 04

Accelerate Your NLP Pipelines Using Hugging Face Transformers and ONNX Runtime

  • Posted by: odscadmin
  • 0 comments
  • Under: NLP

This post was written by Morgan Funtowicz from Hugging Face and Tianlei Wu from Microsoft.

Transformer models have taken the world of natural language processing (NLP) by storm. They went from beating all the research benchmarks to getting adopted for production by a growing number of companies in a record number of months. Some of the applications of these models include text classification, information extraction, text generation, machine translation, and summarization.

However, given the complexity of the underlying architecture, these transformer models are still hard to train and deploy at scale. Training can take days, and the process of fine-tuning critical parameters is involved and complex. Transformer models also need highly scalable and available environments for inference and deployment.

Today we are sharing how the ONNX Runtime team and Hugging Face are working together to address and reduce these challenges in training and deployment of Transformer models. The result is a solution that simplifies training and reduces costs for inferencing.

Making NLP more Accessible

Hugging Face is a company creating open-source libraries for powerful yet easy-to-use NLP, such as Tokenizers and Transformers. The Hugging Face Transformers library provides general-purpose architectures, like BERT, GPT-2, RoBERTa, XLM, DistilBERT, XLNet, and T5, for Natural Language Understanding (NLU) and Natural Language Generation (NLG). It currently includes thousands of pretrained models in 100+ languages. These models are easy to use, powerful, and performant for many NLP tasks. Model training, evaluation, and sharing can be achieved through a few lines of code. The library also enables deep interoperability between PyTorch and TensorFlow, giving you the flexibility to select the right framework for training, evaluation, and deployment.

ONNX Runtime helps accelerate PyTorch and TensorFlow models in production, on CPU or GPU. As an open source library built for performance and broad platform support, ONNX Runtime is used in products and services handling over 20 billion inferences each day. ONNX Runtime has optimizations for transformer models with up to 17x speedup. These improvements in latency, throughput, and costs make deploying transformer models more practical.

You can now use ONNX Runtime and Hugging Face Transformers together to improve the experience of training and deploying NLP models. Hugging Face has made it easy to run inference on Transformer models with ONNX Runtime via the new convert_graph_to_onnx.py script, which generates a model that can be loaded by ONNX Runtime.

Higher performance NLP inference

Inference performance depends on the hardware you run on, the batch size (number of inputs to process at once), and the sequence length (size of the input). If you have access to a GPU, inferencing will be faster than on a CPU. While larger batch sizes are useful during training and offline processing, we typically use a batch size of 1 for online inferencing. Sequence lengths vary based on the scenario: shorter lengths are used for processing queries, while Q&A and summarization scenarios use longer sequence lengths.

We measured the latency of three Hugging Face Transformer models using several batch sizes and sequence lengths on the same CPU and GPU configurations. CPU performance measurement was done on a desktop machine with an Intel® Xeon® E5-2620 v2 processor containing 12 logical cores. For GPU, we used one NVIDIA V100-PCIE-16GB GPU on an Azure Standard_NC12s_v3 VM and tested both FP32 and FP16. We used an updated version of the Hugging Face benchmarking script to run the tests. For PyTorch, we used PyTorch 1.5 with TorchScript. For PyTorch + ONNX Runtime, we exported Hugging Face PyTorch models and inferenced with ONNX Runtime 1.3.

On a GPU in FP16 configuration, compared with PyTorch, PyTorch + ONNX Runtime showed performance gains up to 5.0x for BERT, up to 4.7x for RoBERTa, and up to 4.4x for GPT-2. We saw smaller, but still significant, speedups for GPU/FP32 and CPU configurations.

Smaller sequence lengths generally showed more gains than larger sequence lengths on GPU. Our detailed data is shared at the end of this post.

Get started

We’d like to show how you can incorporate inferencing of Hugging Face Transformer models with ONNX Runtime into your projects. You can also do benchmarking on your own hardware and models.

The steps are:

  1. Export your Hugging Face Transformer model to ONNX

Run the conversion script located at transformers/convert_graph_to_onnx.py. This script takes a few arguments such as the model to be exported and the framework you want to export from (PyTorch or TensorFlow).

python convert_graph_to_onnx.py --framework pt --model bert-base-cased bert-base-cased.onnx

2. Apply latest ONNX Runtime Optimizations

ONNX Runtime automatically applies most optimizations while loading the model. Some of the latest optimizations that are not yet integrated into ONNX Runtime are available as a script that tunes models for the best performance.

You can access the optimization script and run it on your model with these commands:

pip install onnxruntime-tools 
python -m onnxruntime_tools.optimizer_cli --input bert-base-cased.onnx --output bert-base-cased.onnx --model_type bert

The --model_type parameter triggers specific optimization strategies. The script also provides a --float16 flag to leverage mixed-precision performance gains on newer GPUs. You should also use this script if you are using the TensorFlow version of the models. The options and usage are further described in the ONNX Runtime repository.

3. Inference with ONNX Runtime

ONNX Runtime is written in C++ for performance and provides APIs/bindings for Python, C, C++, C#, and Java. It’s a lightweight library that lets you integrate inference into applications written in a variety of languages.

Below is what the code looks like in Python. Here we use the tokenizer from the Hugging Face library; in other languages, you may need to implement your own tokenizer to process the string input and turn it into the tensors the model expects as inputs.
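A minimal sketch of that flow, assuming the bert-base-cased.onnx model exported in step 1 (the full notebook linked below has the complete version):

import onnxruntime as ort
from transformers import AutoTokenizer

# Turn the input string into the tensors the exported model expects.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
inputs = tokenizer("ONNX Runtime makes transformer inference fast!",
                   return_tensors="np")

# Load the exported ONNX model and run inference.
session = ort.InferenceSession("bert-base-cased.onnx")
outputs = session.run(None, dict(inputs))
print(outputs[0].shape)  # last hidden state: (batch_size, sequence_length, hidden_size)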

You can find a notebook showing all the steps in the Hugging Face GitHub repo in the link below.

GitHub: huggingface/transformers

Resources

We hope this has inspired you to try out Hugging Face Transformer models with ONNX Runtime. We’d love to hear about your experiences in the comments. The ONNX Runtime team is continually improving performance, so keep an eye out for even more improvements on more models. You can also participate in our GitHub repos (Hugging Face Transformers library and ONNX Runtime).

In future blogs we’ll discuss more optimizations, including how you can use quantization to reduce the size of your model and improve performance in newer hardware. Stay tuned!


Performance Results

Latencies below are measured in milliseconds. PyTorch refers to PyTorch 1.5 with TorchScript. PyTorch + ONNX Runtime refers to PyTorch versions of Hugging Face models exported and inferenced with ONNX Runtime 1.3.

[Tables: measured latencies for BERT, RoBERTa, and GPT-2]

For the GPT-2 test, we disabled the past state input/output. Enabling past state can help reduce computation by reusing intermediate results. Past state optimizations are being added to ONNX Runtime, which will further improve performance when using large sequence sizes.

