Technology Toolkit 2021 is a technical white paper describing core technologies that
are being researched and developed by Samsung SDS R&D Center. We would like to introduce in this paper a total of
seven technologies concerning AI, Blockchain, Cloud, and Security with details on their technical definition, key
features, differentiating points, and use cases to give our readers some insights into our work.
As AI technology advances, the demand for AI services is increasing. Building an AI service involves a lot of work,
such as creating correct answers for training and finding a model that trains efficiently. The performance of the
resulting AI will be compromised if any activity in the process, however small, is performed negligently. As a result,
many researchers and developers still carry out the work manually. Of the numerous tasks involved in the process, the
most basic and the most labor-intensive is labeling, which involves creating a correct answer for each piece of data.
Since AI models tend to perform better when more data is used for training, a lot of labeled data is required.
Labeling hundreds of thousands or millions of data items manually, however, takes a lot of labor and time. Moreover,
given that an incorrect label adversely affects the AI model's performance, the labels must also be reviewed for
correctness. Because of this labeling burden, there is growing market demand for technologies that can reduce manual
work. Our goal is to minimize the cumbersome manual labeling work that occupies many engineers by using automatic
labeling technology, a technology that has been adopted by multiple global companies including Amazon, IBM, and
Microsoft, as well as a slew of start-ups.
Labeling is the task of producing a correct answer for given data, and this correct answer is called a label. For a
deep learning model to be trained with supervised learning, the data needs to be properly labeled. Accurate labeling
is critical, because supervised learning based on incorrect labels will degrade the model's performance.
There are many deep learning technologies that require labeling, such as image processing and natural language
processing. Among them, we introduce labeling for Image Classification, Object Detection, Image Segmentation, and Text
Analysis, and briefly touch upon the application of Active Learning technology. Let's take a look.
① Computer Vision (CV)
• Image Classification
The objective of image classification is to identify the class an image falls under when there are multiple classes
to choose from. For example, let’s say you are given hundreds of thousands of dog and cat images and need to sort
each image as either dog or cat. Here dog and cat are the given classes, and the act of classifying a dog image as
“dog” and a cat image as “cat” is image classification.
The labeling required for image classification is creating a label by identifying which class of the given classes
each image falls under. Usually, a single label is assigned to a single image, but sometimes multiple labels are
assigned to a single image depending on the task.
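As a rough illustration of the two labeling styles described above, the following sketch stores single-label and multi-label annotations and converts them to a multi-hot vector. The file names, the `CLASSES` list, and the `to_multi_hot` helper are hypothetical and are not taken from any Samsung SDS tool.

```python
# Hypothetical annotation records; file names and class names are illustrative only.
CLASSES = ["dog", "cat"]

# Single-label task: exactly one class per image.
single_label = {"img_0001.jpg": "dog", "img_0002.jpg": "cat"}

# Multi-label task: an image may carry several classes at once.
multi_label = {"img_0003.jpg": ["dog", "cat"], "img_0004.jpg": ["cat"]}


def to_multi_hot(labels, classes=CLASSES):
    """Encode a list of class names as a 0/1 vector ordered like `classes`."""
    unknown = set(labels) - set(classes)
    if unknown:
        raise ValueError(f"labels outside the given classes: {sorted(unknown)}")
    return [1 if c in labels else 0 for c in classes]


# A single label is just the special case of a one-element list.
encoded_single = {name: to_multi_hot([label]) for name, label in single_label.items()}
encoded_multi = {name: to_multi_hot(labels) for name, labels in multi_label.items()}
```

Rejecting labels outside the given classes at encoding time is one simple way to catch typos before they reach training.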
• Object Detection
The purpose of object detection is to find all objects in an image that are associated with the given classes. Let’s
say you are given an image of dogs and cats and you need to detect the dogs and cats in the image. Here the given
classes are dogs and cats, and the task of identifying and marking all dogs and cats in the image is called object
detection. The labeling required for object detection is assigning the correct class and location to every object in
the image that is associated with a given class.
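A label for object detection therefore pairs a class with a location. The sketch below assumes the location is stored as an axis-aligned bounding box in pixel coordinates, which is a common convention but is not stated in this paper; the `BoxLabel` type, the `is_valid` helper, and the coordinates are hypothetical.

```python
from dataclasses import dataclass

CLASSES = {"dog", "cat"}  # the given classes from the example above


@dataclass(frozen=True)
class BoxLabel:
    """One labeled object: its class plus a pixel-coordinate bounding box."""
    class_name: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def is_valid(box, image_width, image_height):
    """Check that the class is allowed and the box lies inside the image."""
    return (
        box.class_name in CLASSES
        and 0 <= box.x_min < box.x_max <= image_width
        and 0 <= box.y_min < box.y_max <= image_height
    )


# Hypothetical labels for one 640x480 image containing two dogs and one cat.
image_labels = [
    BoxLabel("dog", 30, 120, 210, 400),
    BoxLabel("dog", 250, 140, 420, 410),
    BoxLabel("cat", 450, 200, 600, 390),
]
all_valid = all(is_valid(box, 640, 480) for box in image_labels)
```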
• Image Segmentation
Image segmentation can be largely divided into semantic segmentation and instance segmentation. The purpose of
semantic segmentation is to identify which of the given classes each pixel in an image falls under, whereas the
purpose of instance segmentation is to identify each object in the image that is associated with a given class and
mark the pixels corresponding to that object. For example, suppose there is a picture of three dogs overlapping side
by side on a field. Here semantic segmentation marks each pixel as “dog” without distinguishing the three dogs,
whereas instance segmentation separates the three dogs and marks each pixel as “dog-1”, “dog-2”, or “dog-3”.
The labeling required in semantic segmentation is identifying which class each pixel of the image falls under, whereas
the labeling required in instance segmentation is assigning the appropriate class to the pixels of each individual
object belonging to a given class.
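The difference between the two label types can be shown on a toy image. In this sketch, which assumes masks are stored as nested lists of integers, the instance mask keeps the three dogs apart, and the semantic mask is derived from it by dropping the instance identity. The mask layout and the `to_semantic` helper are illustrative assumptions.

```python
# Toy 2x6 image. 0 = background; ids 1, 2 and 3 are three separate dogs.
instance_mask = [
    [0, 1, 1, 2, 2, 3],
    [0, 1, 2, 2, 3, 3],
]
# Every instance id maps to the class it belongs to.
instance_class = {1: "dog", 2: "dog", 3: "dog"}


def to_semantic(mask, id_to_class, background="background"):
    """Replace each instance id with its class name, discarding instance identity."""
    return [[id_to_class.get(pixel, background) for pixel in row] for row in mask]


semantic_mask = to_semantic(instance_mask, instance_class)

# Instance labels still distinguish the dogs; semantic labels do not.
instance_count = len({pixel for row in instance_mask for pixel in row if pixel != 0})
semantic_classes = {pixel for row in semantic_mask for pixel in row}
```

Going from instance labels to semantic labels is a lossless lookup, but the reverse is not possible, which is one reason instance labels take more effort to produce.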
② Natural Language Processing (NLP)
• Named Entity Recognition
The objective of named entity recognition is to extract predefined entities from a sentence. In other words, it is the
task of identifying whether or not a particular word in the sentence belongs to one of the predefined entities. Let’s
take a look at the following example: “Charles goes to school”. Here, named entity recognition is the task of
classifying “Charles” as “a name of a person” and “school” as a “place”. The labeling required in named entity
recognition is assigning the right entity name to each word within the sentence.
• Intent Classification
The objective of intent classification is to classify the intent of a sentence. For example, with the sentence "Please
give me a cup of Americano," intent classification is classifying the sentence as having the intention of "buying."
The labeling required in intent classification is assigning the right intent to each sentence.
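The two text labeling tasks above differ in granularity: named entity recognition labels individual words, while intent classification labels the whole sentence. A minimal sketch, assuming whitespace tokens and an "O" tag for words outside any entity (a common convention that this paper does not specify):

```python
# Token-level labels for named entity recognition; "O" marks tokens outside any entity.
ner_example = {
    "tokens": ["Charles", "goes", "to", "school"],
    "labels": ["PERSON", "O", "O", "PLACE"],
}

# Sentence-level label for intent classification.
intent_example = {
    "text": "Please give me a cup of Americano",
    "intent": "buying",
}


def extract_entities(example):
    """Return (token, entity) pairs for every token that carries an entity label."""
    if len(example["tokens"]) != len(example["labels"]):
        raise ValueError("each token needs exactly one label")
    return [
        (token, label)
        for token, label in zip(example["tokens"], example["labels"])
        if label != "O"
    ]


entities = extract_entities(ner_example)
```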
• Active Learning
Active learning was developed to build a high-performance deep learning model when the dataset is unlabeled. The
methodology does not wait for researchers and developers to label the entire dataset. Instead, the current deep
learning model makes a judgment on the given dataset and presents some of the data that is most difficult to assess to
researchers and developers. They manually label that data with priority, and the model is then retrained with the
newly labeled data included. The retrained model repeats the earlier step of making a judgment on the given dataset
and again presents researchers and developers with some of the data that is most difficult to assess. This way, you
can obtain a high-performance deep learning model faster, because the model is refined with priority given to the most
complex data.
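The loop described above can be sketched as follows. This is a minimal illustration and not the Brightics implementation: it assumes that "difficult to assess" is measured by the entropy of the predicted class probabilities, and the toy `train`, `predict_proba`, and `ask_human` functions are hypothetical stand-ins for a real model and a human annotator.

```python
import math


def entropy(probs):
    """Shannon entropy; higher means the model is less sure."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def active_learning(train, predict_proba, ask_human, labeled, unlabeled, batch_size, rounds):
    """Repeatedly train, query the most uncertain samples, and add their labels."""
    labeled, unlabeled = dict(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        model = train(labeled)
        ranked = sorted(unlabeled, key=lambda x: entropy(predict_proba(model, x)), reverse=True)
        for x in ranked[:batch_size]:
            labeled[x] = ask_human(x)  # manual labeling step
            unlabeled.remove(x)
    return train(labeled), labeled


# Toy stand-ins (assumptions): 1-D points whose true class is 1 when x >= 5.
def train(labeled):
    zeros = [x for x, y in labeled.items() if y == 0]
    ones = [x for x, y in labeled.items() if y == 1]
    # The "model" is a decision threshold halfway between the two class means.
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def predict_proba(threshold, x):
    p_one = 1 / (1 + math.exp(-(x - threshold)))
    return [1 - p_one, p_one]


def ask_human(x):
    return 1 if x >= 5 else 0


model, labels = active_learning(
    train, predict_proba, ask_human,
    labeled={0.0: 0, 10.0: 1},
    unlabeled=[1.0, 4.0, 4.8, 5.2, 6.0, 9.0],
    batch_size=2, rounds=1,
)
```

In this toy run the two points nearest the decision threshold (4.8 and 5.2) are sent for manual labeling first, while the easy points far from the threshold are left for later rounds.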
Auto-labeling is a business solution that allows you to quickly generate labels for your training data through an
intelligent process of training your model by selectively labeling key data. A small amount of unlabeled data is
labeled manually, and the model learns from these labels together with other labeled data to quickly label the rest of
the data. The result is an automated labeling process in which data whose label is difficult to determine is labeled
manually and data whose label is confirmed is labeled automatically.
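One simple way to picture the split between automatically and manually labeled data is a confidence threshold on the model's predictions. The sketch below is an assumption for illustration only: the paper does not state how the solution decides that a label is difficult to determine, and the `split_by_confidence` helper and the 0.9 threshold are hypothetical.

```python
def split_by_confidence(predictions, threshold=0.9):
    """Split model predictions into auto-labeled samples and a manual-review queue.

    predictions: dict mapping sample id -> list of class probabilities.
    threshold:   assumed cut-off; the source does not state how confidence is judged.
    """
    auto_labels, manual_queue = {}, []
    for sample_id, probs in predictions.items():
        best_class = max(range(len(probs)), key=probs.__getitem__)
        if probs[best_class] >= threshold:
            auto_labels[sample_id] = best_class  # confident: accept the predicted label
        else:
            manual_queue.append(sample_id)  # uncertain: send to a human
    return auto_labels, manual_queue


predictions = {
    "img_1": [0.97, 0.03],
    "img_2": [0.55, 0.45],
    "img_3": [0.08, 0.92],
}
auto_labels, manual_queue = split_by_confidence(predictions)
```

Raising the threshold sends more data to manual labeling and lowers the automation rate, so the value is a trade-off between label quality and saved effort.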
Manual labeling involves researchers and developers labeling the data themselves. Labels created through this process
are generally treated as confirmed labels. Automatic labeling technology provides a dedicated labeling tool that makes
manual labeling easier and faster.
Automatic labeling refers to predicting labels with a deep learning model. Labels created through this process are
not confirmed labels. Predicted labels are used for training methods that can make use of them, such as
semi-supervised learning, or to make manual labeling easier. Auto-labeling automatically labels unlabeled data using a
small amount of labeled data.
Automatic review analyzes data that has already been labeled. It is used to improve data quality and learning
performance by splitting or merging existing labels.
If multiple people collaborate to create large numbers of labels, Label Manager can help you create labels
efficiently. Administrators group the workers who produce labels and create label generation jobs. When a job is
created, Label Manager distributes the data to the workers in the group according to the job's settings. The
administrator can check the progress and the labels of each worker through Label Manager.
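The distribution and progress-tracking steps can be pictured with a small sketch. It assumes round-robin assignment, which is only one possible job setting; the `distribute` and `progress` helpers and the worker names are hypothetical and do not reflect Label Manager's actual API.

```python
from itertools import cycle


def distribute(items, workers):
    """Assign items to workers in round-robin order."""
    assignment = {worker: [] for worker in workers}
    for item, worker in zip(items, cycle(workers)):
        assignment[worker].append(item)
    return assignment


def progress(assignment, finished):
    """Fraction of assigned items each worker has labeled so far."""
    return {
        worker: (sum(item in finished for item in items) / len(items)) if items else 0.0
        for worker, items in assignment.items()
    }


job = distribute([f"img_{i}" for i in range(5)], ["worker_a", "worker_b"])
status = progress(job, finished={"img_0", "img_1", "img_2"})
```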
Automatic labeling can begin sampling images when the dataset is completely unlabeled. Sampling extracts features from
the unlabeled images in a dataset and uses a proprietary algorithm to select the desired number of images.
a. Has its own data sampling technology using Deep Features.
∙ Sampling technology that works from an initial state with no labels.
∙ Improves performance by about 6% vs. random sampling.
b. Has its own training model to improve performance.
∙ Recommends images for manual labeling through a Curriculum Learning method.
∙ 4~10% difference in average accuracy between Curriculum Learning and random selection.
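The sampling algorithm itself is proprietary and is not described in this paper. To give a feel for how label-free sampling on deep features can work, the sketch below uses a well-known substitute, greedy farthest-point (k-center) selection, which picks images whose feature vectors are spread far apart. The `farthest_point_sample` function and the 2-D feature values are illustrative assumptions, not the technique used in the product.

```python
import math


def farthest_point_sample(features, k):
    """Pick up to k indices so that the chosen feature vectors are spread out."""
    if k <= 0 or not features:
        return []
    chosen = [0]  # arbitrary starting point
    # Distance from every point to its nearest chosen point.
    nearest = [math.dist(f, features[0]) for f in features]
    while len(chosen) < min(k, len(features)):
        candidates = [i for i in range(len(features)) if i not in chosen]
        nxt = max(candidates, key=nearest.__getitem__)
        chosen.append(nxt)
        nearest = [min(d, math.dist(f, features[nxt])) for d, f in zip(nearest, features)]
    return chosen


# Hypothetical 2-D "deep features": two tight clusters plus one outlier.
features = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.0, 0.1), (5.0, 5.0)]
selected = farthest_point_sample(features, k=3)
```

With these toy features the selection covers both clusters and the outlier instead of drawing several near-duplicates from one cluster, which is the intuition behind sampling on features rather than at random.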
The technology provides the industry's highest level of labeling performance, with a labeling accuracy of 98.1%
compared to 80% for public data, and an automation rate up to 1.8 times higher than that of Company A.
Automatic labeling can be used for inspecting cell phone exteriors and semiconductor wafers for defects. You can save
time and cost by automatically labeling images for the classification of appearance defects. Automatic labeling can
also be used to quickly generate labeled datasets for inspecting paint appearance defects that occur during the
automotive manufacturing process.
Data for Multilayer Ceramic Capacitors (MLCCs), which are thinner than a human hair, requires more time for labeling.
Auto-labeling can save labeling time by manually labeling only part of the data (20%) and automatically labeling the
remaining 80%. In addition, a dedicated labeling tool makes it easier to label large amounts of data when manual
labeling is needed.
Currently, automatic labeling is being trialed and tested in a growing number of public institutions and companies.
So far, we have looked at the technology and application examples of automatic labeling. We started researching
automatic labeling while looking for a way to avoid manually labeling the data required every time a deep learning
model is trained. Since 2019, the technology has been applied mainly at manufacturing sites, and its application is
now expanding to various fields such as finance and medical care. As we apply automatic labeling to more industries,
we recognize the need to secure more field data and to understand it better. We will continue to apply automatic
labeling at industrial sites to improve its performance, thereby securing world-class technological competitiveness
and market differentiation.
※ Samsung SDS research centers in both Korea and America are conducting research and development on Auto Labeling.
# Reference
https://www.samsungsds.com/kr/ai-dl/brightics-deep-learning.html
▶ The content is protected by law and the copyright belongs to the author.
▶ Copying or quoting the content without the author's permission is prohibited.
ML Research Team at Samsung SDS R&D Center
Seongwon Park has participated in the research and development of automatic labeling and distributed learning of the Brightics DL solution, and he is currently participating in the development of various platforms and technology research for AI.
If you have any inquiries, comments, or ideas for improvement concerning technologies introduced in Technology Toolkit 2021, please contact us at techtoolkit@samsung.com.