https://medium.com/@aniruddha.choudhury94/part-2-bert-fine-tuning-tutorial-with-pytorch-for-text-classification-on-the-corpus-of-linguistic-18057ce330e1

https://web.stanford.edu/class/cs224n/reports/custom/report43.pdf

https://www.allsides.com/media-bias/media-bias-ratings

import tensorflow as tf
# Get the GPU device name.
device_name = tf.test.gpu_device_name()
# The device name should look like the following:
if device_name == '/device:GPU:0':
    print('Found GPU at: {}'.format(device_name))
else:
    raise SystemError('GPU device not found')

Found GPU at: /device:GPU:0

Install the transformers package from Hugging Face. It provides a PyTorch interface for working with BERT.

!pip install transformers

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/81/89/f07e7a884072ad37b1b6b1578637ab36152e0251d74abb950d967a59904e/transformers-4.3.1-py3-none-any.whl (1.8MB)
     |████████████████████████████████| 1.8MB 8.1MB/s 
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.6/dist-packages (from transformers) (1.19.5)
Requirement already satisfied: packaging in /usr/local/lib/python3.6/dist-packages (from transformers) (20.9)
Requirement already satisfied: dataclasses; python_version < "3.7" in /usr/local/lib/python3.6/dist-packages (from transformers) (0.8)
Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from transformers) (2.23.0)
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB)
     |████████████████████████████████| 890kB 35.7MB/s 
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.6/dist-packages (from transformers) (2019.12.20)
Requirement already satisfied: filelock in /usr/local/lib/python3.6/dist-packages (from transformers) (3.0.12)
Collecting tokenizers<0.11,>=0.10.1
  Downloading https://files.pythonhosted.org/packages/fd/5b/44baae602e0a30bcc53fbdbc60bd940c15e143d252d658dfdefce736ece5/tokenizers-0.10.1-cp36-cp36m-manylinux2010_x86_64.whl (3.2MB)
     |████████████████████████████████| 3.2MB 50.4MB/s 
Requirement already satisfied: importlib-metadata; python_version < "3.8" in /usr/local/lib/python3.6/dist-packages (from transformers) (3.4.0)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.6/dist-packages (from transformers) (4.41.1)
Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.6/dist-packages (from packaging->transformers) (2.4.7)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (2.10)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (2020.12.5)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (1.24.3)
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (1.15.0)
Requirement already satisfied: click in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (7.1.2)
Requirement already satisfied: joblib in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (1.0.0)
Requirement already satisfied: typing-extensions>=3.6.4; python_version < "3.8" in /usr/local/lib/python3.6/dist-packages (from importlib-metadata; python_version < "3.8"->transformers) (3.7.4.3)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.6/dist-packages (from importlib-metadata; python_version < "3.8"->transformers) (3.4.0)
Building wheels for collected packages: sacremoses
  Building wheel for sacremoses (setup.py) ... done
  Created wheel for sacremoses: filename=sacremoses-0.0.43-cp36-none-any.whl size=893261 sha256=3aa67db83a382f762ad0378543630bf82a3a600163fea86b18de5c255a4652f5
  Stored in directory: /root/.cache/pip/wheels/29/3c/fd/7ce5c3f0666dab31a50123635e6fb5e19ceb42ce38d4e58f45
Successfully built sacremoses
Installing collected packages: sacremoses, tokenizers, transformers
Successfully installed sacremoses-0.0.43 tokenizers-0.10.1 transformers-4.3.1

import re
import sys
import random
import pandas as pd
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torchtext import data, datasets
from transformers import BertTokenizer, BertModel
import tensorflow as tf

Load the IBC dataset, which has two columns: the sentence text and its label.

train_df = pd.read_csv('/content/drive/Shareddrives/DeepTextLab/train_data.csv')
test_df = pd.read_csv('/content/drive/Shareddrives/DeepTextLab/test_data.csv')

train_df.label.unique()

array([10,  0,  5])

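# Drop the rows labeled 5, then remap label 10 to 1 so that only labels 0 and 1 remain.
# .copy() makes each filtered result an independent DataFrame, which avoids a
# possible SettingWithCopyWarning in the assignments below.
train_df = train_df.loc[train_df.label != 5].copy()
test_df = test_df.loc[test_df.label != 5].copy()

train_df.loc[train_df['label'] == 10, 'label'] = 1
test_df.loc[test_df['label'] == 10, 'label'] = 1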

print('Training set size :', len(train_df))
print('Test set size :', len(test_df))

Training set size : 2983
Test set size : 743

print(test_df.dtypes)

text     object
label     int64
dtype: object

train_df.head(3)

train_df.rename(columns={'text':'sentence'}, inplace=True)
test_df.rename(columns={'text':'sentence'}, inplace=True)

train_df.head(3)

sentences = train_df.sentence.values
labels = train_df.label.values

sentences

array(['During the 1920s , Great Britain , terrified of having to face depreciation of the pound as a result of running the printing presses , pressured the United States and other powers to inflate in concert with the Bank of England , leading to another global inflationary cycle and economic bubble that burst in 1929 with the great stock market crash .',
       'Those who would save the Amazon from disastrous helter-skelter deforestation are also working out the details of schemes to certify soy , cattle , and other Amazonian products as having been sustainably produced ; already there is in place in Brazil a two-year moratorium on forest destruction to plant soy .',
       'However , the report noted that , `` Deaths were not classified as being due to coalition forces if households had any uncertainty about the responsible party ; consequently , the number of deaths and the proportion of violent deaths attributable to coalition forces could be conservative estimates .',
       ...,
       'The key to successfully using nullification is to expose the federal government as the aggressive , unconstitutional usurper , and states would be wise to not directly confront them .',
       'Finally , there is the chronic flaw in all those pie chart-type analyses purporting to show how terrible the distribution of wealth is in the United States -- the failure to relate age to wealth , income , and productivity .',
       'A free people must never accept this principle , which can only lead to domination by the artificial entity of the corporation , which is driven by the singular yardstick of profit .'],
      dtype=object)

labels

array([1, 0, 0, ..., 1, 1, 0])

Tokenizing with the BERT tokenizer

from transformers import BertTokenizer
# Load the BERT tokenizer.
print('Loading BERT tokenizer...')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

Loading BERT tokenizer...

input_ids = []
# For every sentence...
for sent in sentences:
    # `encode` will:
    #   (1) Tokenize the sentence.
    #   (2) Prepend the `[CLS]` token to the start.
    #   (3) Append the `[SEP]` token to the end.
    #   (4) Map tokens to their IDs.
    encoded_sent = tokenizer.encode(
                        sent,                      # Sentence to encode.
                        add_special_tokens = True, # Add '[CLS]' and '[SEP]'
                        # This function also supports truncation and conversion
                        # to pytorch tensors, but we need to do padding, so we
                        # can't use these features :( .
                        # max_length = 128,          # Truncate all sentences.
                        # return_tensors = 'pt',     # Return pytorch tensors.
                   )
    
    # Add the encoded sentence to the list.
    input_ids.append(encoded_sent)
# Print sentence 1 (the second sentence), now as a list of IDs.
print('Original: ', sentences[1])
print('Token IDs:', input_ids[1])

# Token ID 100 = [UNK]
# Token ID 101 = [CLS]
# Token ID 102 = [SEP]
# Token ID 0 = [PAD]

Original:  Those who would save the Amazon from disastrous helter-skelter deforestation are also working out the details of schemes to certify soy , cattle , and other Amazonian products as having been sustainably produced ; already there is in place in Brazil a two-year moratorium on forest destruction to plant soy .
Token IDs: [101, 2216, 2040, 2052, 3828, 1996, 9733, 2013, 16775, 2002, 21928, 1011, 15315, 20042, 2121, 13366, 25794, 2024, 2036, 2551, 2041, 1996, 4751, 1997, 11683, 2000, 8292, 28228, 12031, 25176, 1010, 7125, 1010, 1998, 2060, 9733, 2937, 3688, 2004, 2383, 2042, 15770, 8231, 2550, 1025, 2525, 2045, 2003, 1999, 2173, 1999, 4380, 1037, 2048, 1011, 2095, 26821, 24390, 2006, 3224, 6215, 2000, 3269, 25176, 1012, 102]
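
The four steps that `encode` performs can be sketched in plain Python. This is only an illustration under explicit assumptions: it splits on whitespace instead of running BERT's WordPiece tokenizer, and `TOY_VOCAB` and `toy_encode` are hypothetical names. The six word IDs are read off the output above, and the special-token IDs are the ones listed in the cell's comments.

```python
# Special-token IDs as listed in the notebook comments.
PAD_ID, UNK_ID, CLS_ID, SEP_ID = 0, 100, 101, 102

# Hypothetical toy vocabulary; the IDs are copied from the printed output above.
TOY_VOCAB = {"those": 2216, "who": 2040, "would": 2052,
             "save": 3828, "the": 1996, "amazon": 9733}


def toy_encode(sentence, vocab=TOY_VOCAB):
    """Simplified stand-in for tokenizer.encode (whitespace split, no WordPiece)."""
    tokens = sentence.lower().split()                  # (1) tokenize (simplified)
    ids = [vocab.get(tok, UNK_ID) for tok in tokens]   # (4) map tokens to IDs
    return [CLS_ID] + ids + [SEP_ID]                   # (2) and (3) add [CLS] and [SEP]


encoded = toy_encode("Those who would save the Amazon")
print(encoded)  # [101, 2216, 2040, 2052, 3828, 1996, 9733, 102]
```

Words missing from the toy vocabulary map to the `[UNK]` ID 100; the real tokenizer would instead try to break them into known subword pieces first.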

Padding & Truncating

print('Max sentence length: ', max([len(sen) for sen in input_ids]))

Max sentence length:  107

The longest sentence has 107 tokens, so we set MAX_LEN to 110, slightly larger than that.

import keras

from keras.preprocessing.sequence import pad_sequences
# Set the maximum sequence length.
# I've chosen 110 somewhat arbitrarily. It's slightly larger than the
# maximum training sentence length of 107...
MAX_LEN = 110
print('\nPadding/truncating all sentences to %d values...' % MAX_LEN)
print('\nPadding token: "{:}", ID: {:}'.format(tokenizer.pad_token, tokenizer.pad_token_id))
# Pad our input tokens with value 0.
# "post" indicates that we want to pad and truncate at the end of the sequence,
# as opposed to the beginning.
input_ids = pad_sequences(input_ids, maxlen=MAX_LEN, dtype="long",
                          value=0, truncating="post", padding="post")      # ==> Padding uses token ID 0.
print('\nDone.')

Padding/truncating all sentences to 110 values...

Padding token: "[PAD]", ID: 0

Done.
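
To make the effect of `padding="post"` and `truncating="post"` concrete, here is a minimal plain-Python sketch of the same behavior for a single sequence. The helper name `pad_or_truncate` is hypothetical and this is not how Keras implements `pad_sequences`; it only mirrors the result for one list of IDs.

```python
def pad_or_truncate(ids, max_len, pad_id=0):
    """Return exactly max_len IDs: cut extra IDs from the end, then pad at the end."""
    clipped = ids[:max_len]                                 # truncating="post"
    return clipped + [pad_id] * (max_len - len(clipped))    # padding="post"


short = pad_or_truncate([101, 2216, 102], max_len=5)
long_ = pad_or_truncate([101, 1, 2, 3, 4, 5, 102], max_len=5)
print(short)  # [101, 2216, 102, 0, 0]
print(long_)  # [101, 1, 2, 3, 4]
```

Truncating at the end also cuts off the trailing `[SEP]` token, as the second example shows. With MAX_LEN set to 110 and a longest training sentence of 107 tokens, no training sentence is actually truncated here.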

Attention Masks

attention_masks = []
# For each sentence...
for sent in input_ids:
    
    # Create the attention mask.
    #   - If a token ID is 0, then it's padding, set the mask to 0.
    #   - If a token ID is > 0, then it's a real token, set the mask to 1.
    att_mask = [int(token_id > 0) for token_id in sent]
    
    # Store the attention mask for this sentence.
    attention_masks.append(att_mask)
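
The same rule can be written as a small helper that compares against the padding ID explicitly instead of relying on `> 0`. This is a sketch with a hypothetical function name, assuming the padding ID is 0 as printed by the tokenizer above.

```python
def build_attention_mask(padded_ids, pad_id=0):
    """Return 1 for every real token and 0 for every padding position."""
    return [int(token_id != pad_id) for token_id in padded_ids]


mask = build_attention_mask([101, 2216, 102, 0, 0])
print(mask)       # [1, 1, 1, 0, 0]
print(sum(mask))  # 3 real tokens, including [CLS] and [SEP]
```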

Training & Validation Split

# training
from sklearn.model_selection import train_test_split
# Use 90% for training and 10% for validation.
train_inputs, validation_inputs, train_labels, validation_labels = train_test_split(input_ids, labels, 
                                                            random_state=2021, test_size=0.1)
# Do the same for the masks.
train_masks, validation_masks, _, _ = train_test_split(attention_masks, labels,
                                             random_state=2021, test_size=0.1)
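
The two `train_test_split` calls stay aligned only because both receive inputs of the same length and the same `random_state`, so the same shuffled indices are used for the token IDs and for the masks. The plain-Python sketch below illustrates that invariant by applying one shuffled index list to every sequence. `aligned_split` is a hypothetical helper, not scikit-learn's implementation, and its shuffle order differs from scikit-learn's.

```python
import math
import random


def aligned_split(*sequences, test_size=0.1, seed=2021):
    """Split several equal-length sequences with one shared shuffled index list."""
    n = len(sequences[0])
    assert all(len(seq) == n for seq in sequences), "inputs must have equal length"
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_val = math.ceil(n * test_size)          # validation size, rounded up
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    result = []
    for seq in sequences:
        result.append([seq[i] for i in train_idx])
        result.append([seq[i] for i in val_idx])
    return result


# Toy data: row i of ids and label i share the same value, so alignment is checkable.
ids = [[i, i] for i in range(10)]
masks = [[1, 1] for _ in range(10)]
labels = list(range(10))
tr_ids, va_ids, tr_masks, va_masks, tr_labels, va_labels = aligned_split(ids, masks, labels)
print(len(tr_ids), len(va_ids))  # 9 1
```

`train_test_split` itself accepts any number of arrays in one call, so passing the token IDs, masks, and labels together is an alternative that does not depend on repeating the seed.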

Converting to Pytorch Data Types

# Convert all inputs, labels, and masks into torch tensors, the datatype required by our model.
train_inputs = torch.tensor(train_inputs)
validation_inputs = torch.tensor(validation_inputs)
train_labels = torch.tensor(train_labels)
validation_labels = torch.tensor(validation_labels)
train_masks = torch.tensor(train_masks)
validation_masks = torch.tensor(validation_masks)

print(len(train_inputs))
print(len(validation_inputs))

2684
299

Create an iterator for our dataset using the torch DataLoader class

from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler


# In PyTorch, data.Dataset defines a custom dataset and data.DataLoader loads data from it.
# ==> DataLoader is an iterator that batches and shuffles the data and can load it in parallel with multiprocessing.
# This notebook passes three arguments to DataLoader:
# 1. The dataset to load from (train_data below)
# 2. batch_size
# 3. sampler
# Let's set them one at a time.



# 1. ==> Bundle train_inputs (the sentences converted to token IDs), train_masks (the attention masks), and train_labels into train_data.
train_data = TensorDataset(train_inputs, train_masks, train_labels) 

# 2. ==> The DataLoader needs to know our batch size for training, so we specify it here.
# For fine-tuning BERT on a specific task, the authors recommend a batch size of
# 16 or 32.
batch_size = 16

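# 3. ==> A sampler controls the order in which indices are drawn. RandomSampler draws indices at random and lets you choose whether to sample with replacement and how many samples to draw.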
train_sampler = RandomSampler(train_data)

# Now build train_dataloader from items 1 to 3.
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

# Create the DataLoader for our validation set, following the same steps as for the training set.
validation_data = TensorDataset(validation_inputs, validation_masks, validation_labels)
validation_sampler = SequentialSampler(validation_data) 
validation_dataloader = DataLoader(validation_data, sampler=validation_sampler, batch_size=batch_size)
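
As a rough picture of what the two samplers do, the plain-Python sketch below produces batches of indices in random order (like RandomSampler) or in the original order (like SequentialSampler). `iter_batches` is a hypothetical helper rather than PyTorch code; the dataset sizes 2684 and 299 are the ones printed earlier.

```python
import random


def iter_batches(n_items, batch_size, shuffle, seed=0):
    """Yield lists of dataset indices, one list per batch."""
    indices = list(range(n_items))
    if shuffle:
        random.Random(seed).shuffle(indices)   # random order, as with RandomSampler
    for start in range(0, n_items, batch_size):
        yield indices[start:start + batch_size]


train_batches = list(iter_batches(2684, 16, shuffle=True))
val_batches = list(iter_batches(299, 16, shuffle=False))
print(len(train_batches), len(train_batches[-1]))  # 168 12
print(len(val_batches), len(val_batches[-1]))      # 19 11
```

With a batch size of 16, one epoch therefore has 168 training batches and 19 validation batches, and the last batch of each is smaller than 16.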

train_data[0]

(tensor([  101, 13587,  3514,  7597,  2031,  2815,  2012,  2030,  2682,  2023,
          2504,  2144,  2432,  1010,  2437,  5211,  1999, 14931,  2140,  6786,
          8702,  1998, 12996,  4892,  5211,  1012,   102,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0]),
 tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
 tensor(1))

The input data is ready! Now let's fine-tune the BERT model.

Loading the BERT classification model

from transformers import BertForSequenceClassification, AdamW, BertConfig

# Load BertForSequenceClassification, the pretrained BERT model with a single 
# linear classification layer on top. 

###################### model #######################
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", # Use the 12-layer BERT model, with an uncased vocab.
    num_labels = 2, # The number of output labels--2 for binary classification.
                    # You can increase this for multi-class tasks.   
    output_attentions = False, # Whether the model returns attentions weights.
    output_hidden_states = False, # Whether the model returns all hidden-states.
)
# Tell pytorch to run this model on the GPU.
model.cuda()

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
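
The warning says that only `classifier.weight` and `classifier.bias` are newly initialized. To get a feel for how small that head is next to the pretrained weights, the sketch below counts parameters from the layer shapes reported in the model summary printed for this cell. The classifier shape (768 inputs, 2 outputs) is an assumption based on the hidden size and `num_labels = 2`, and the helper names are hypothetical.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a Linear layer."""
    return in_features * out_features + out_features


def layer_norm_params(size):
    """Scale and shift vectors of a LayerNorm layer."""
    return 2 * size


hidden, intermediate, num_labels = 768, 3072, 2

# Word, position, and token-type embeddings plus their LayerNorm.
embeddings = 30522 * hidden + 512 * hidden + 2 * hidden + layer_norm_params(hidden)

# One BertLayer: query, key, value, attention output, then the feed-forward part.
encoder_layer = (
    4 * linear_params(hidden, hidden)
    + layer_norm_params(hidden)
    + linear_params(hidden, intermediate)
    + linear_params(intermediate, hidden)
    + layer_norm_params(hidden)
)

# Assumed shape of the newly initialized classification head.
classifier_head = linear_params(hidden, num_labels)

print(embeddings)       # 23837184
print(encoder_layer)    # 7087872
print(classifier_head)  # 1538
```

Under these assumptions the new head has 1,538 parameters, a tiny fraction of a single encoder layer, which is why fine-tuning mostly adjusts weights that were already pretrained.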

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (1): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (2): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (3): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (4): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (5): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (6): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (7): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (8): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (9): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (10): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (11): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
    (pooler): BertPooler(
      (dense): Linear(in_features=768, out_features=768, bias=True)
      (activation): Tanh()
    )
  )
  (dropout): Dropout(p=0.1, inplace=False)
  (classifier): Linear(in_features=768, out_features=2, bias=True)
)

params = list(model.named_parameters())
print('The BERT model has {:} different named parameters.\n'.format(len(params)))
print('==== Embedding Layer ====\n')
for p in params[0:5]:
    print("{:<55} {:>12}".format(p[0], str(tuple(p[1].size()))))
print('\n==== First Transformer ====\n')
for p in params[5:21]:
    print("{:<55} {:>12}".format(p[0], str(tuple(p[1].size()))))
print('\n==== Output Layer ====\n')
for p in params[-4:]:
    print("{:<55} {:>12}".format(p[0], str(tuple(p[1].size()))))

The BERT model has 201 different named parameters.

==== Embedding Layer ====

bert.embeddings.word_embeddings.weight                  (30522, 768)
bert.embeddings.position_embeddings.weight                (512, 768)
bert.embeddings.token_type_embeddings.weight                (2, 768)
bert.embeddings.LayerNorm.weight                              (768,)
bert.embeddings.LayerNorm.bias                                (768,)

==== First Transformer ====

bert.encoder.layer.0.attention.self.query.weight          (768, 768)
bert.encoder.layer.0.attention.self.query.bias                (768,)
bert.encoder.layer.0.attention.self.key.weight            (768, 768)
bert.encoder.layer.0.attention.self.key.bias                  (768,)
bert.encoder.layer.0.attention.self.value.weight          (768, 768)
bert.encoder.layer.0.attention.self.value.bias                (768,)
bert.encoder.layer.0.attention.output.dense.weight        (768, 768)
bert.encoder.layer.0.attention.output.dense.bias              (768,)
bert.encoder.layer.0.attention.output.LayerNorm.weight        (768,)
bert.encoder.layer.0.attention.output.LayerNorm.bias          (768,)
bert.encoder.layer.0.intermediate.dense.weight           (3072, 768)
bert.encoder.layer.0.intermediate.dense.bias                 (3072,)
bert.encoder.layer.0.output.dense.weight                 (768, 3072)
bert.encoder.layer.0.output.dense.bias                        (768,)
bert.encoder.layer.0.output.LayerNorm.weight                  (768,)
bert.encoder.layer.0.output.LayerNorm.bias                    (768,)

==== Output Layer ====

bert.pooler.dense.weight                                  (768, 768)
bert.pooler.dense.bias                                        (768,)
classifier.weight                                           (2, 768)
classifier.bias                                                 (2,)

Optimizer & Learning Rate Scheduler

# I believe the 'W' stands for "Weight Decay fix".
# ==> Write-up on the AdamW optimizer (in Korean): https://hiddenbeginner.github.io/deeplearning/paperreview/2019/12/29/paper_review_AdamW.html

optimizer = AdamW(model.parameters(),
                  lr = 2e-5, # args.learning_rate - default is 5e-5, our notebook had 2e-5
                  eps = 1e-8 # args.adam_epsilon  - default is 1e-8.
                )

from transformers import get_linear_schedule_with_warmup
# Number of training epochs (authors recommend between 2 and 4)
epochs = 4
# Total number of training steps is number of batches * number of epochs.
total_steps = len(train_dataloader) * epochs
# Create the learning rate scheduler.
scheduler = get_linear_schedule_with_warmup(optimizer, 
                                            num_warmup_steps = 0, # Default value in run_glue.py
                                            num_training_steps = total_steps)
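
To make the schedule concrete, here is a minimal pure-Python sketch of the multiplier a linear warmup/decay schedule applies to the base learning rate. The helper name `linear_lr_factor` is made up for illustration and is not part of transformers; the 168 batches per epoch is taken from the training log in this notebook.

```python
def linear_lr_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # Ramp up linearly from 0 to 1 during warmup.
        return step / max(1, num_warmup_steps)
    # Afterwards, decay linearly from 1 down to 0 at the last step.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


base_lr = 2e-5
total_steps = 168 * 4  # 168 batches per epoch (from the training log) * 4 epochs

# Learning rate at the start of each epoch and at the very end.
for step in (0, 168, 336, 504, 672):
    print(step, base_lr * linear_lr_factor(step, 0, total_steps))
```

With `num_warmup_steps = 0` the learning rate starts at its full value and shrinks by the same amount after every batch, reaching zero at the final step.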

Training Loop

What happens during the training loop

Unpack the input data and labels
Load the unpacked data onto the GPU
Clear the gradients computed in the previous pass. (By default gradients keep accumulating, which we don't want, so we reset them.)
Forward pass (feed input data through the network)
Backward pass (backpropagation)
Tell the network to update its parameters by calling optimizer.step()
Track variables for monitoring progress

What happens during the validation loop

Unpack the input data and labels
Load the unpacked data onto the GPU
Forward pass (feed input data through the network)
Compute the loss on the validation data and track variables for monitoring progress

import numpy as np

# Function to calculate the accuracy of our predictions vs labels
def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=1).flatten() # ==> flatten() is the NumPy method that collapses a multi-dimensional array into a 1-D array
    labels_flat = labels.flatten() 
    return np.sum(pred_flat == labels_flat) / len(labels_flat) 
    # ==> number of predicted labels that match the true labels / total number of examples
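
A quick sanity check of `flat_accuracy` on a tiny batch. The logits and labels below are made-up values for illustration only, not model output.

```python
import numpy as np


def flat_accuracy(preds, labels):
    # Same function as above: argmax over the class axis, then compare.
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)


# Hypothetical logits for a batch of 4 sentences with 2 classes each.
logits = np.array([[1.2, -0.3],
                   [-0.5, 0.9],
                   [0.1, 0.4],
                   [2.0, -1.0]])
labels = np.array([0, 1, 0, 0])

# The argmax predictions are [0, 1, 1, 0], so 3 of the 4 labels match.
acc = flat_accuracy(logits, labels)
print(acc)
```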

Helper for formatting elapsed time

import time
import datetime
def format_time(elapsed):
    '''
    Takes a time in seconds and returns a string hh:mm:ss
    '''
    # Round to the nearest second.
    elapsed_rounded = int(round((elapsed)))
    
    # Format as hh:mm:ss
    return str(datetime.timedelta(seconds=elapsed_rounded))
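
A small usage sketch of the helper, using only the standard library. Note that `str(timedelta)` does not zero-pad the hour, and it switches to a "N day, h:mm:ss" form once the duration reaches 24 hours; the input values are arbitrary examples.

```python
import datetime


def format_time(elapsed):
    # Same helper as above: round to whole seconds, then let timedelta format it.
    elapsed_rounded = int(round(elapsed))
    return str(datetime.timedelta(seconds=elapsed_rounded))


short = format_time(56.4)       # '0:00:56'
longer = format_time(3725.0)    # '1:02:05'
over_a_day = format_time(90000) # '1 day, 1:00:00'
print(short, longer, over_a_day)
```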

Now we're ready to train!!

'''
Basic BERT model setup.
'''


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

import random
# This training code is based on the `run_glue.py` script here:
# https://github.com/huggingface/transformers/blob/5bfcd0485ece086ebcbed2d008813037968a9e58/examples/run_glue.py#L128
# Set the seed value all over the place to make this reproducible.

seed_val = 42
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

# Store the average loss after each epoch so we can plot them.
loss_values = []

# For each epoch...
for epoch_i in range(0, epochs):
    
    # ========================================
    #               Training
    # ========================================
    
    # Perform one full pass over the training set.
    print("")
    print('======== Epoch {:} / {:} ========'.format(epoch_i + 1, epochs))
    print('Training...')
    # Measure how long the training epoch takes.
    t0 = time.time()
    # Reset the total loss for this epoch.
    total_loss = 0
    # Put the model into training mode. Don't be misled--the call to 
    # `train` just changes the *mode*, it doesn't *perform* the training.
    # `dropout` and `batchnorm` layers behave differently during training
    # vs. test (source: https://stackoverflow.com/questions/51433378/what-does-model-train-do-in-pytorch)
    model.train()
    # For each batch of training data...
    for step, batch in enumerate(train_dataloader):
        # Progress update every 40 batches.
        if step % 40 == 0 and not step == 0:
            # Format the elapsed time as h:mm:ss.
            elapsed = format_time(time.time() - t0)
            
            # Report progress.
            print('  Batch {:>5,}  of  {:>5,}.    Elapsed: {:}.'.format(step, len(train_dataloader), elapsed))
        # Unpack this training batch from our dataloader. 
        #
        # As we unpack the batch, we'll also copy each tensor to the GPU using the 
        # `to` method.
        #
        # `batch` contains three pytorch tensors:
        #   [0]: input ids 
        #   [1]: attention masks
        #   [2]: labels 
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)
        # Always clear any previously calculated gradients before performing a
        # backward pass. PyTorch doesn't do this automatically because 
        # accumulating the gradients is "convenient while training RNNs". 
        # (source: https://stackoverflow.com/questions/48001598/why-do-we-need-to-call-zero-grad-in-pytorch)
        model.zero_grad()        
        # Perform a forward pass (evaluate the model on this training batch).
        # This will return the loss (rather than the model output) because we
        # have provided the `labels`.
        # The documentation for this `model` function is here: 
        # https://huggingface.co/transformers/v2.2.0/model_doc/bert.html#transformers.BertForSequenceClassification
        outputs = model(b_input_ids, 
                    token_type_ids=None, 
                    attention_mask=b_input_mask, 
                    labels=b_labels)
        
        # With transformers 4.x the call returns a ModelOutput object (older
        # versions returned a plain tuple); either way, index 0 holds the loss
        # when `labels` are provided.
        loss = outputs[0]
        # Accumulate the training loss over all of the batches so that we can
        # calculate the average loss at the end. `loss` is a Tensor containing a
        # single value; the `.item()` function just returns the Python value 
        # from the tensor.
        total_loss += loss.item()
        # Perform a backward pass to calculate the gradients.
        loss.backward()
        # Clip the norm of the gradients to 1.0.
        # This is to help prevent the "exploding gradients" problem.
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        # Update parameters and take a step using the computed gradient.
        # The optimizer dictates the "update rule"--how the parameters are
        # modified based on their gradients, the learning rate, etc.
        optimizer.step()
        # Update the learning rate.
        scheduler.step()
    # Calculate the average loss over the training data.
    avg_train_loss = total_loss / len(train_dataloader)            
    
    # Store the loss value for plotting the learning curve.
    loss_values.append(avg_train_loss)
    print("")
    print("  Average training loss: {0:.2f}".format(avg_train_loss))
    print("  Training epcoh took: {:}".format(format_time(time.time() - t0)))
        
    # ========================================
    #               Validation
    # ========================================
    # After the completion of each training epoch, measure our performance on
    # our validation set.
    print("")
    print("Running Validation...")
    t0 = time.time()
    # Put the model in evaluation mode--the dropout layers behave differently
    # during evaluation.
    model.eval()
    # Tracking variables 
    eval_loss, eval_accuracy = 0, 0
    nb_eval_steps, nb_eval_examples = 0, 0
    # Evaluate data for one epoch
    for batch in validation_dataloader:
        
        # Add batch to GPU
        batch = tuple(t.to(device) for t in batch)
        
        # Unpack the inputs from our dataloader
        b_input_ids, b_input_mask, b_labels = batch
        
        # Telling the model not to compute or store gradients, saving memory and
        # speeding up validation
        with torch.no_grad():        
            # Forward pass, calculate logit predictions.
            # This will return the logits rather than the loss because we have
            # not provided labels.
            # token_type_ids is the same as the "segment ids", which 
            # differentiates sentence 1 and 2 in 2-sentence tasks.
            # The documentation for this `model` function is here: 
            # https://huggingface.co/transformers/v2.2.0/model_doc/bert.html#transformers.BertForSequenceClassification
            outputs = model(b_input_ids, 
                            token_type_ids=None, 
                            attention_mask=b_input_mask)
        
        # Get the "logits" output by the model. The "logits" are the output
        # values prior to applying an activation function like the softmax.
        logits = outputs[0]
        # Move logits and labels to CPU
        logits = logits.detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()
        
        # Calculate the accuracy for this batch of validation sentences.
        tmp_eval_accuracy = flat_accuracy(logits, label_ids)
        
        # Accumulate the total accuracy.
        eval_accuracy += tmp_eval_accuracy
        # Track the number of batches
        nb_eval_steps += 1
    # Report the final accuracy for this validation run.
    print("  Accuracy: {0:.2f}".format(eval_accuracy/nb_eval_steps))
    print("  Validation took: {:}".format(format_time(time.time() - t0)))
print("")
print("Training complete!")

======== Epoch 1 / 4 ========
Training...
  Batch    40  of    168.    Elapsed: 0:00:13.
  Batch    80  of    168.    Elapsed: 0:00:26.
  Batch   120  of    168.    Elapsed: 0:00:39.
  Batch   160  of    168.    Elapsed: 0:00:53.

  Average training loss: 0.69
  Training epcoh took: 0:00:56

Running Validation...
  Accuracy: 0.59
  Validation took: 0:00:02

======== Epoch 2 / 4 ========
Training...
  Batch    40  of    168.    Elapsed: 0:00:14.
  Batch    80  of    168.    Elapsed: 0:00:27.
  Batch   120  of    168.    Elapsed: 0:00:40.
  Batch   160  of    168.    Elapsed: 0:00:54.

  Average training loss: 0.60
  Training epcoh took: 0:00:56

Running Validation...
  Accuracy: 0.64
  Validation took: 0:00:02

======== Epoch 3 / 4 ========
Training...
  Batch    40  of    168.    Elapsed: 0:00:13.
  Batch    80  of    168.    Elapsed: 0:00:27.
  Batch   120  of    168.    Elapsed: 0:00:40.
  Batch   160  of    168.    Elapsed: 0:00:54.

  Average training loss: 0.46
  Training epcoh took: 0:00:56

Running Validation...
  Accuracy: 0.64
  Validation took: 0:00:02

======== Epoch 4 / 4 ========
Training...
  Batch    40  of    168.    Elapsed: 0:00:13.
  Batch    80  of    168.    Elapsed: 0:00:27.
  Batch   120  of    168.    Elapsed: 0:00:40.
  Batch   160  of    168.    Elapsed: 0:00:54.

  Average training loss: 0.34
  Training epcoh took: 0:00:56

Running Validation...
  Accuracy: 0.66
  Validation took: 0:00:02

Training complete!

import plotly.express as px
f = pd.DataFrame(loss_values)
f.columns=['Loss']
fig = px.line(f, x=f.index, y=f.Loss)
fig.update_layout(title='Training loss of the Model',
                   xaxis_title='Epoch',
                   yaxis_title='Loss')
fig.show()

Performance On Test Set

Data Preparation

sentences = test_df.sentence.values
labels = test_df.label.values

input_ids = []
# For every sentence...
for sent in sentences:
    # `encode` will:
    #   (1) Tokenize the sentence.
    #   (2) Prepend the `[CLS]` token to the start.
    #   (3) Append the `[SEP]` token to the end.
    #   (4) Map tokens to their IDs.
    encoded_sent = tokenizer.encode(
                        sent,                      # Sentence to encode.
                        add_special_tokens = True, # Add '[CLS]' and '[SEP]'
                   )
    
    input_ids.append(encoded_sent)
# Pad our input tokens
input_ids = pad_sequences(input_ids, maxlen=MAX_LEN, 
                          dtype="long", truncating="post", padding="post")
# Create attention masks
attention_masks = []
# Create a mask of 1s for each token followed by 0s for padding
for seq in input_ids:
  seq_mask = [float(i>0) for i in seq]
  attention_masks.append(seq_mask) 
# Convert to tensors.
prediction_inputs = torch.tensor(input_ids)
prediction_masks = torch.tensor(attention_masks)
prediction_labels = torch.tensor(labels)
# Set the batch size.  
batch_size = 16  
# Create the DataLoader.
prediction_data = TensorDataset(prediction_inputs, prediction_masks, prediction_labels)
prediction_sampler = SequentialSampler(prediction_data)
prediction_dataloader = DataLoader(prediction_data, sampler=prediction_sampler, batch_size=batch_size)

Evaluate on Test Set

print('Predicting labels for {:,} test sentences...'.format(len(prediction_inputs)))
# Put model in evaluation mode
model.eval()
# Tracking variables 
predictions , true_labels = [], []
# Predict 
for batch in prediction_dataloader:
  # Add batch to GPU
  batch = tuple(t.to(device) for t in batch)
  
  # Unpack the inputs from our dataloader
  b_input_ids, b_input_mask, b_labels = batch
  
  # Telling the model not to compute or store gradients, saving memory and 
  # speeding up prediction
  with torch.no_grad():
      # Forward pass, calculate logit predictions
      outputs = model(b_input_ids, token_type_ids=None, 
                      attention_mask=b_input_mask)
  logits = outputs[0]
  # Move logits and labels to CPU
  logits = logits.detach().cpu().numpy()
  label_ids = b_labels.to('cpu').numpy()
  
  # Store predictions and true labels
  predictions.append(logits)
  true_labels.append(label_ids)
print('DONE.')

Predicting labels for 743 test sentences...
DONE.

print('Positive samples: %d of %d (%.2f%%)' % (test_df.label.sum(), len(test_df.label), (test_df.label.sum() / len(test_df.label) * 100.0)))

Positive samples: 325 of 743 (43.74%)

true_labels[0]

array([1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1])

sum_list = []
for i in range(0, len(predictions)):
  sum_list.append(flat_accuracy(predictions[i], true_labels[i]))

avg_acc = np.mean(sum_list)
print('Accuracy on the test set :', avg_acc)

Accuracy on the test set : 0.6065729483282675
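
One caveat about the number above: averaging per-batch accuracies gives the last, smaller batch the same weight as the full ones (743 sentences at batch size 16 leaves a final batch of 7). A minimal sketch of the per-example alternative follows; the function names and the toy batches are hypothetical, chosen to exaggerate the gap.

```python
import numpy as np


def overall_accuracy(logit_batches, label_batches):
    """Accuracy over all examples, independent of how they were batched."""
    logits = np.concatenate(logit_batches, axis=0)
    labels = np.concatenate(label_batches, axis=0)
    return float(np.mean(np.argmax(logits, axis=1) == labels))


def mean_of_batch_accuracies(logit_batches, label_batches):
    """What the loop above computes: the average of per-batch accuracies."""
    accs = [np.mean(np.argmax(lg, axis=1) == lb)
            for lg, lb in zip(logit_batches, label_batches)]
    return float(np.mean(accs))


# Toy data: a batch of 4 (all correct) followed by a batch of 1 (wrong).
toy_logits = [np.array([[2.0, 0.0], [0.0, 2.0], [2.0, 0.0], [0.0, 2.0]]),
              np.array([[2.0, 0.0]])]
toy_labels = [np.array([0, 1, 0, 1]), np.array([1])]

per_example = overall_accuracy(toy_logits, toy_labels)        # 4 of 5 correct
per_batch = mean_of_batch_accuracies(toy_logits, toy_labels)  # (1.0 + 0.0) / 2
print(per_example, per_batch)
```

In the notebook, `overall_accuracy(predictions, true_labels)` would give the per-example figure; with only one short batch out of 47 the difference should be small.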

pred_flat = np.argmax(predictions[0], axis=1).flatten()

pred_flat

array([0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1])

true_labels[0]

array([1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1])

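PART 2! The MIND dataset

Now we use the political-leaning classifier trained on the IBC dataset to detect the political leaning of the politics-category articles in the MIND dataset.

1. Preparing the dataset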

train_poli = pd.read_csv('/content/train_politic.csv')
test_poli = pd.read_csv('/content/test_politic.csv')

train_poli.head(3)

train_poli.columns

Index(['News ID', 'Category', 'Subcategory', 'Title', 'Abstract', 'URL',
       'Title Entities', 'Abstract Entities'],
      dtype='object')

print('Training set size : ', len(train_poli))
print('Test set size : ', len(test_poli))

Training set size :  2831
Test set size :  2402

train_poli.isnull().sum()

News ID               0
Category              0
Subcategory           0
Title                 0
Abstract             66
URL                   0
Title Entities        0
Abstract Entities     0
dtype: int64

test_poli.isnull().sum()

News ID               0
Category              0
Subcategory           0
Title                 0
Abstract             63
URL                   0
Title Entities        0
Abstract Entities     0
dtype: int64

train_poli = train_poli.dropna(axis=0)
test_poli = test_poli.dropna(axis=0)

print('Training set size : ', len(train_poli))
print('Test set size : ', len(test_poli))

Training set size :  2765
Test set size :  2339

Add a 'TitleAbstract' column that concatenates the Title and Abstract columns. ==> This column is what we later use to judge political leaning.

train_poli['TitleAbstract'] = train_poli['Title'] + ' ' + train_poli['Abstract']

test_poli['TitleAbstract'] = test_poli['Title'] + ' ' + test_poli['Abstract']

train_poli.head(3)

train_poli.TitleAbstract[0]

'Elijah Cummings to lie in state at US Capitol Thursday Cummings, a Democrat whose district included sections of Baltimore, died last week at age 68 from complications related to longstanding health issues.'

sentences = train_poli.TitleAbstract.values

sentences[-1]

'High-stakes televised impeachment hearings set for next week William Taylor, the top diplomat in Ukraine, will testify, and Republicans hope to hear from Hunter Biden and the anonymous whistleblower.'

2. Data preprocessing (tokenizing, padding, attention masks, loading the dataset)

The longest sentence is 445 tokens long... so we pick 300 as a reasonable MAX_LEN.

input_ids = []
# For every sentence...
for sent in sentences:
    # `encode` will:
    #   (1) Tokenize the sentence.
    #   (2) Prepend the `[CLS]` token to the start.
    #   (3) Append the `[SEP]` token to the end.
    #   (4) Map tokens to their IDs.
    encoded_sent = tokenizer.encode(
                        sent,                      # Sentence to encode.
                        add_special_tokens = True, # Add '[CLS]' and '[SEP]'
                   )
    
    input_ids.append(encoded_sent)
# Pad our input tokens
input_ids = pad_sequences(input_ids, maxlen=300, 
                          dtype="long", truncating="post", padding="post")

print('Original: ', sentences[1])
print('Token IDs:', input_ids[1])

Original:  Here are the lawmakers who are not seeking reelection to Congress in 2020 The battle for control of Congress is more than a year away but some lawmakers are already deciding not to run for reelection, setting up a few potentially interesting campaigns in the 2020 election.
Token IDs: [  101  2182  2024  1996  2375 12088  2040  2024  2025  6224 17648  2000
  3519  1999 12609  1996  2645  2005  2491  1997  3519  2003  2062  2084
  1037  2095  2185  2021  2070  2375 12088  2024  2525 10561  2025  2000
  2448  2005 17648  1010  4292  2039  1037  2261  9280  5875  8008  1999
  1996 12609  2602  1012   102     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0]

attention_masks = []
# Create a mask of 1s for each token followed by 0s for padding
for seq in input_ids:
  seq_mask = [float(i>0) for i in seq]
  attention_masks.append(seq_mask) 
# Convert to tensors.
prediction_inputs = torch.tensor(input_ids)
prediction_masks = torch.tensor(attention_masks)
# Set the batch size.  
batch_size = 16  
# Create the DataLoader.
prediction_data = TensorDataset(prediction_inputs, prediction_masks)
prediction_sampler = SequentialSampler(prediction_data)
prediction_dataloader = DataLoader(prediction_data, sampler=prediction_sampler, batch_size=batch_size)
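
The padding and masking steps above can be sketched in plain Python for a single sequence. `pad_and_mask` is a hypothetical helper written for illustration (the notebook itself uses Keras `pad_sequences`); it assumes the padding id is 0, which is what the `i>0` mask test above relies on.

```python
def pad_and_mask(token_ids, max_len, pad_id=0):
    """Truncate or right-pad one sequence and build its attention mask."""
    ids = list(token_ids)[:max_len]   # truncating="post": drop tokens past max_len
    mask = [1] * len(ids)             # 1 marks a real token
    n_pad = max_len - len(ids)
    ids += [pad_id] * n_pad           # padding="post": pad on the right
    mask += [0] * n_pad               # 0 marks padding the model should ignore
    return ids, mask


ids, mask = pad_and_mask([101, 2182, 2024, 102], max_len=6)
print(ids)   # [101, 2182, 2024, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]

# A sequence longer than max_len is simply cut off at the end.
long_ids, long_mask = pad_and_mask(range(1, 11), max_len=4)
```

One consequence worth knowing: because truncation happens after encoding, any sentence longer than the limit loses its closing `[SEP]` token (id 102) along with the overflow.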

Detect Ideology of News Text

print('Detecting ideology labels for {:,} news sentences...'.format(len(prediction_inputs)))
# Put model in evaluation mode
model.eval()
# Tracking variables 
predictions  = []
# Predict 
for batch in prediction_dataloader:
  # Add batch to GPU
  batch = tuple(t.to(device) for t in batch)
  
  # Unpack the inputs from our dataloader
  b_input_ids, b_input_mask = batch
  
  # Telling the model not to compute or store gradients, saving memory and 
  # speeding up prediction
  with torch.no_grad():
      # Forward pass, calculate logit predictions
      outputs = model(b_input_ids, token_type_ids=None, 
                      attention_mask=b_input_mask)
  logits = outputs[0]
  # Move logits and labels to CPU
  logits = logits.detach().cpu().numpy()
  #label_ids = b_labels.to('cpu').numpy()
  
  # Store predictions and true labels
  predictions.append(logits)
  #true_labels.append(label_ids)
print('DONE.')

Detecting ideology labels for 2,765 news sentences...
DONE.

len(predictions)

173

ideology_list = []

for i in range(len(predictions)):
  pred_flat = np.argmax(predictions[i], axis=1).flatten()
  pred_flat = pred_flat.tolist()
  ideology_list += pred_flat
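
Taking the argmax throws away how confident the model was. If that matters, the logits can be turned into probabilities with a softmax first. A minimal sketch, assuming each batch of logits is a `(batch, 2)` NumPy array as above; the example values are made up.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical logits for two news items.
batch_logits = np.array([[0.2, 0.1],
                         [-1.5, 2.5]])

probs = softmax(batch_logits)
labels = probs.argmax(axis=1)    # same labels as argmax on the raw logits
confidence = probs.max(axis=1)   # probability assigned to the chosen label
print(labels, confidence)
```

Here the first item is a near coin flip while the second is a confident call, even though both simply become a 0 or a 1 in `ideology_list`.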

len(ideology_list) # 2765. We assigned an ideology label to each of the 2,765 news items in the training set.

2765

type(ideology_list)

list

To make it easy to see each news item next to its ideology label, add the labels to the DataFrame as an 'Ideology' column.

train_poli['Ideology'] = ideology_list

train_poli.head(3)

2.4 Model performance evaluation

2.4.1 Evaluation on the IBC data

그림6.png

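As shown above, the model reached a maximum accuracy of 0.66 on the validation set. On the test set, accuracy was about 0.61.

2.4.2 Evaluation on separate news data

As a second way of examining model performance, we used the media political-leaning ratings from AllSides Media Bias Ratings, provided by the AllSides site.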

그림7.jpeg

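AllSides Media Bias Ratings has online users label the political leaning of US online media outlets as one of five levels (liberal-biased - liberal - center - conservative - conservative-biased), and reports community feedback on each label on a six-level scale (absolutely disagree - somewhat disagree - disagree - somewhat agree - agree - absolutely agree).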

그림8.png

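Using this service, we picked four outlets whose ratings received "absolutely agree" community feedback: two liberal-biased outlets (AlterNet, The New Yorker) and two conservative-biased outlets (National Review, The Federalist). We then took the first sentence of recent political articles from each outlet, ran our model on them, and checked whether the detected leaning matched the leaning arrived at by collective intelligence.

Is the political-leaning classifier we built by fine-tuning BERT really... trustworthy?

Let's do a qualitative evaluation: https://www.allsides.com/media-bias/media-bias-ratings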

test_news = pd.read_excel('/content/news_Lib_and_Con.xlsx')
test_news.to_csv(r'/content/news_Lib_and_Con.csv', index=None)

test_news

sentences = test_news.Sentence.values

Convert the data into the form the model expects

input_ids = []
# For every sentence...
for sent in sentences:
    # `encode` will:
    #   (1) Tokenize the sentence.
    #   (2) Prepend the `[CLS]` token to the start.
    #   (3) Append the `[SEP]` token to the end.
    #   (4) Map tokens to their IDs.
    encoded_sent = tokenizer.encode(
                        sent,                      # Sentence to encode.
                        add_special_tokens = True, # Add '[CLS]' and '[SEP]'
                   )
    
    input_ids.append(encoded_sent)
# Pad our input tokens
input_ids = pad_sequences(input_ids, maxlen=300, 
                          dtype="long", truncating="post", padding="post")

# Create attention masks
attention_masks = []
# Create a mask of 1s for each token followed by 0s for padding
for seq in input_ids:
  seq_mask = [float(i>0) for i in seq]
  attention_masks.append(seq_mask) 
# Convert to tensors.
prediction_inputs = torch.tensor(input_ids)
prediction_masks = torch.tensor(attention_masks)
# Set the batch size.  
batch_size = 16  
# Create the DataLoader.
prediction_data = TensorDataset(prediction_inputs, prediction_masks)
prediction_sampler = SequentialSampler(prediction_data)
prediction_dataloader = DataLoader(prediction_data, sampler=prediction_sampler, batch_size=batch_size)

Detecting political leaning with the model

print('Detecting ideology labels for {:,} news sentences...'.format(len(prediction_inputs)))
# Put model in evaluation mode
model.eval()
# Tracking variables 
predictions  = []
# Predict 
for batch in prediction_dataloader:
  # Add batch to GPU
  batch = tuple(t.to(device) for t in batch)
  
  # Unpack the inputs from our dataloader
  b_input_ids, b_input_mask = batch
  
  # Telling the model not to compute or store gradients, saving memory and 
  # speeding up prediction
  with torch.no_grad():
      # Forward pass, calculate logit predictions
      outputs = model(b_input_ids, token_type_ids=None, 
                      attention_mask=b_input_mask)
  logits = outputs[0]
  # Move logits and labels to CPU
  logits = logits.detach().cpu().numpy()
  #label_ids = b_labels.to('cpu').numpy()
  
  # Store predictions and true labels
  predictions.append(logits)
  #true_labels.append(label_ids)
print('DONE.')

Detecting ideology labels for 12 news sentences...
DONE.

ideology_list = []

for i in range(len(predictions)):
  pred_flat = np.argmax(predictions[i], axis=1).flatten()
  pred_flat = pred_flat.tolist()
  ideology_list += pred_flat

# Use bracket assignment: attribute-style assignment only updates a column that already exists.
test_news['Ideology_label_by_our_model'] = ideology_list

test_news

9 of 12 detected correctly

As a result, the model correctly detected the political leaning of 9 of the 12 sentences, an accuracy of 75%.

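3. A news recommendation system that bursts the filter bubble

We now propose a news recommendation system that combines the text political-leaning classifier built above with a content recommendation algorithm.

To that end, we train DKN (Deep Knowledge-Aware Network for News Recommendation), one of the available news recommendation algorithms, in the conventional way, that is, to recommend the articles a user is most likely to enjoy. We then combine it with the political-leaning detection model developed earlier to propose a recommendation model that can mitigate the filter bubble.

3.1 Training data - MIND

We used the MIND (A Large Scale Dataset for News Recommendation) dataset to train the recommendation algorithm. MIND is a dataset released by Microsoft for research on news recommendation algorithms, and consists of information on 1 million users and 160,000 articles collected from Microsoft News. The collection period runs from 2019-10-12 to 2019-11-22, six weeks in total. The user information includes each user's anonymized ID and the log of articles they clicked; the article information includes the article ID, category, title, and abstract. In this project we used the MIND-small dataset, which samples 50,000 users from the full data. For training the recommendation algorithm we used only the title from the article information. At the final recommendation stage we kept only the politics articles and fed their title and abstract text to the political-leaning detection model.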

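3.2 Recommendation algorithm - DKN

3.2.1 Why this algorithm

DKN (Deep Knowledge-Aware Network for News Recommendation) is a content-based recommendation algorithm that exploits the knowledge-graph structure of the entities in news text. We chose DKN among the many news recommendation algorithms for two reasons. First, news articles have a much shorter lifespan than other content such as music or movies, so content-based filtering is more effective than collaborative filtering. In the MIND dataset as well, the click-through rate of most articles dropped to zero within two days. Second, we judged that a knowledge-graph structure could be effective for capturing an article's 'tone', because tone is determined by the structure in which words are connected.

3.2.2 Algorithm description

[Figure 10]

DKN works in two stages: article embedding through a KCNN (Knowledge-aware Convolutional Neural Network), and an attention network that uses each user's click log. It takes one candidate article and a user's click log as input, predicts the probability that the user will read the candidate article, and is trained by comparing that prediction with whether the user actually read it.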

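First, KCNN represents a text as three channels (word embedding, entity embedding, and context embedding) and extracts features from them with a CNN. The three channels are built as follows. Take the article title "Donald Trump to deliver State of the Union address next week". At the word level this text is encoded as something like [10, 34, 45, 334, 23, 12, 987, 3456, 111, 456, 432]. The first and second words, 'Donald Trump', are extracted as a single entity, so at the entity level the text is encoded as something like [45, 45, 0, 0, 0, 0, 0, 0, 0, 0, 0]. From these word-level and entity-level encodings we obtain the word embedding, entity embedding, and context embedding. The word embedding embeds words with a method such as word2vec or GloVe; in this project we initialized the word embeddings from pre-trained GloVe vectors. The entity embedding and context embedding use a knowledge graph. After entity linking, which finds the entities in a sentence and matches them to the knowledge graph, the entities in the corpus that lie within one hop of each other are connected to obtain a sub-graph for the corpus. Network embedding with TransE is then applied to obtain each entity embedding. In addition, for each entity, the embeddings of the entities one hop away are averaged to obtain its context embedding. Together, the entity embedding and context embedding capture not only the information in the words but also how the entities in the text relate to one another. Finally, the word embedding, entity embedding, and context embedding are stacked as three channels, and a 3-channel CNN, like the one applied to color images, is applied to the text to extract features, yielding a one-dimensional embedding vector per article.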
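The three-channel idea can be sketched in a few lines of NumPy. This is a simplified illustration, not the DKN implementation: the lookup tables are random stand-ins for GloVe and TransE vectors, the `neighbours` dictionary is a hypothetical one-hop neighbourhood, the embedding size and filter count are arbitrary, and any projection between the entity and word embedding spaces is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # embedding size (arbitrary choice for this sketch)

# Encodings from the example title (0 in entity_ids means "no entity")
word_ids = [10, 34, 45, 334, 23, 12, 987, 3456, 111, 456, 432]
entity_ids = [45, 45, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Random stand-ins for pre-trained GloVe and TransE lookup tables
word_table = rng.normal(size=(4000, dim))
entity_table = rng.normal(size=(100, dim))
entity_table[0] = 0.0  # padding row for "no entity"

# HYPOTHETICAL one-hop neighbours of entity 45 in the knowledge graph
neighbours = {45: [7, 19, 63]}

def context_embedding(entity_id):
    """Average the embeddings of the entity's one-hop neighbours."""
    ids = neighbours.get(entity_id, [])
    if not ids:
        return np.zeros(dim)
    return entity_table[ids].mean(axis=0)

# Build the three channels, each of shape (n_words, dim)
word_ch = word_table[word_ids]
entity_ch = entity_table[entity_ids]
context_ch = np.stack([context_embedding(e) for e in entity_ids])
x = np.stack([word_ch, entity_ch, context_ch])  # (3, n_words, dim)

def kcnn(x, filters):
    """Slide each filter over word positions, apply ReLU, then max-pool over time."""
    n_filters, _, window, _ = filters.shape
    n_pos = x.shape[1] - window + 1
    feats = np.empty((n_filters, n_pos))
    for p in range(n_pos):
        patch = x[:, p:p + window, :]  # (3, window, dim)
        feats[:, p] = np.maximum((filters * patch).sum(axis=(1, 2, 3)), 0.0)
    return feats.max(axis=1)  # one value per filter

filters = rng.normal(size=(16, 3, 2, dim))  # 16 untrained filters, window of 2 words
article_vec = kcnn(x, filters)
print(x.shape, article_vec.shape)
```

Each filter spans all three channels at once, which is what lets a single feature respond to a word and the entity it refers to together.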

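Once the per-article embeddings are available, an attention network that takes the user's history into account predicts whether the user reads the candidate article. Attention is applied between the embedding vectors of the articles in the user's click log and the embedding vector of the candidate article, giving each logged article an attention weight with respect to the candidate. The embeddings of the logged articles are then combined in a weighted average to obtain the user's embedding vector. Finally, the candidate article's embedding and the user embedding are concatenated to predict the probability that the user reads the candidate article.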
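The attention step described above can be sketched as follows. This is an illustration under simplifying assumptions: the scoring and output layers are single untrained linear maps (`w_att` and `w_out` are hypothetical names), and the article embeddings are random placeholders rather than KCNN outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16

history = rng.normal(size=(5, dim))  # embeddings of 5 clicked articles (placeholders)
candidate = rng.normal(size=dim)     # embedding of the candidate article (placeholder)

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Untrained stand-in parameters; in a trained model these are learned
w_att = rng.normal(size=2 * dim)
w_out = rng.normal(size=2 * dim)

# 1) Score each logged article against the candidate, normalise to weights
pairs = np.concatenate([history, np.tile(candidate, (len(history), 1))], axis=1)
weights = softmax(pairs @ w_att)

# 2) User embedding = attention-weighted average of the logged articles
user_vec = weights @ history

# 3) Concatenate user and candidate embeddings and predict a click probability
click_prob = sigmoid(np.concatenate([user_vec, candidate]) @ w_out)
print(weights.round(3), float(click_prob))
```

Because the weights depend on the candidate, the same user gets a different embedding for each candidate article, which lets the model emphasise the parts of the history that are relevant to it.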

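3.3 Development process and results of the recommendation system for popping the filter bubble

DKN supports two styles of recommendation. It can take a user's entire history into account and serve a bundled list of recommended articles, or it can use the per-article embeddings extracted during DKN training to recommend articles related to the one the user is reading right now (the item2item approach). This project uses the latter. The recommendation model was trained on all of the MIND-small data, but the recommendation step used only political articles, where differences in tone are most pronounced. We ran the filter-bubble-popping experiment on the 3,393 articles that belong to the politics category and have both a title and an abstract.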
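The filtering step can be sketched with pandas. The rows below are toy data, not MIND records, and identifying political articles by `Subcategory == 'newspolitics'` is an assumption based on the column values visible in this post's article tables; the `TitleAbstract` column name follows those tables as well.

```python
import pandas as pd

# Toy stand-in for the MIND news table (NOT real MIND rows)
news = pd.DataFrame({
    'News ID': ['N1', 'N2', 'N3', 'N4'],
    'Subcategory': ['newspolitics', 'newspolitics', 'othertopic', 'newspolitics'],
    'Title': ['Title A', 'Title B', 'Title C', 'Title D'],
    'Abstract': ['Abstract A', None, 'Abstract C', ''],
})

# Keep rows where both title and abstract are present and non-empty
has_text = (news['Title'].fillna('').str.strip().ne('')
            & news['Abstract'].fillna('').str.strip().ne(''))

# Assumption: political articles are those with Subcategory 'newspolitics'
politics = news[(news['Subcategory'] == 'newspolitics') & has_text].copy()

# Text fed to the political-leaning classifier: title followed by abstract
politics['TitleAbstract'] = politics['Title'] + ' ' + politics['Abstract']
print(politics[['News ID', 'TitleAbstract']])
```

Treating empty strings the same as missing values matters here, since an article with a blank abstract would otherwise pass a plain null check.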

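First, we use DKN to obtain a KCNN embedding for each article, which encodes its word and entity structure. Second, we compute cosine similarity between KCNN embeddings to find articles similar to a given article. Inspecting the articles with high cosine similarity confirmed that they do cover similar topics. Third, we use the political-leaning classifier to recommend articles that are highly similar to the query article but were predicted to have a different political leaning. Applying the classifier labeled 1,745 of the 3,393 articles as liberal and 1,648 as conservative. Through this process we aimed to recommend political articles that cover a similar topic but take a different tone.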
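The three steps can be sketched as a single function. This is a minimal illustration with toy two-dimensional embeddings; `recommend_opposing` is a hypothetical helper name, and the labels follow this post's convention of 0 for liberal and 1 for conservative.

```python
import numpy as np

def recommend_opposing(query_idx, embeddings, ideology, top_k=3):
    """Return (index, similarity) pairs for the articles most similar to the
    query whose predicted ideology differs from the query's."""
    # Cosine similarity = dot product of L2-normalised vectors
    # (assumes no all-zero embedding rows)
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[query_idx]
    order = np.argsort(-sims)  # most similar first
    picks = [int(i) for i in order
             if i != query_idx and ideology[i] != ideology[query_idx]]
    return [(i, float(sims[i])) for i in picks[:top_k]]

# Toy article embeddings (placeholders for KCNN embeddings)
embeddings = np.array([
    [1.0, 0.0],  # 0: query article, conservative
    [0.9, 0.1],  # 1: same topic, conservative (filtered out)
    [0.8, 0.3],  # 2: same topic, liberal
    [0.0, 1.0],  # 3: different topic, liberal
])
ideology = np.array([1, 1, 0, 0])

result = recommend_opposing(0, embeddings, ideology, top_k=2)
print(result)
```

Article 1 is the closest match overall but shares the query's label, so article 2 is returned first. In practice a minimum similarity threshold would also be worth adding, so that an unrelated article such as article 3 is not recommended just because it carries the opposite label.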

We present two cases as examples of the results. The first is the article “The Many Ways That Joe Biden Trips Over His Own Tongue”. The political-leaning classifier labeled it ‘conservative’, and it criticizes Joe Biden for changing his position on immigration policy. The article with the second-highest cosine similarity computed from KCNN embeddings, ‘Latino Iowans are playing a bigger role in the caucuses and Democrats are paying attention’, was labeled ‘liberal’. Like the query article, it deals with Biden's immigration policy, but its tone is positive: Biden's immigration policy is winning over Latino voters in Iowa. Because the topic is similar but the tone differs, this article can be recommended alongside the first.

The second case is the article “Warren's $52T 'Medicare-for-all' plan revealed: Campaign still claims no middle-class tax hikes needed”. The classifier labeled it ‘liberal’, and it reports that the ‘Medicare-for-all’ plan of Warren, a Democratic presidential candidate, does not require a middle-class tax increase. Meanwhile, ‘Warren's health care plan pledges no middle-class tax increase’, which has the second-highest cosine similarity but was labeled ‘conservative’, covers the same topic while arguing that carrying out the plan without a tax increase is in fact unrealistic. This article can therefore be recommended alongside the first.

import torch

def train(model, train_loader, loss_func, optimizer, step, print_step=200):
    """Run one training epoch over train_loader."""
    model.train()
    for i, batch in enumerate(train_loader):
        inputs, targets = batch.sentence, batch.label.long()
        # Reset gradients
        optimizer.zero_grad()
        # Forward pass
        outputs = model(inputs)
        # Compute the loss
        loss = loss_func(outputs, targets)
        # Backward pass
        loss.backward()
        # Update parameters
        optimizer.step()

        # Log progress every print_step batches instead of on every batch
        if i % print_step == 0:
            print('Train Step: {} ({:05.2f}%)  \tLoss: {:.4f}'.format(
                    step, 100.*(i*train_loader.batch_size)/len(train_loader.dataset),
                    loss.item()))


def test(model, test_loader, loss_func):
    """Evaluate the model; loss_func must sum (not average) over each batch."""
    # Switch the model to evaluation mode
    model.eval()
    test_loss = 0
    correct = 0

    with torch.no_grad():
        for batch in test_loader:
            inputs, targets = batch.sentence, batch.label.long()
            # Forward pass
            outputs = model(inputs)
            # Accumulate the summed loss
            test_loss += loss_func(outputs, targets).item()
            # Predicted class = index of the highest score
            preds = outputs.argmax(dim=1)
            # Count correct predictions
            correct += preds.eq(targets).sum().item()
            
    test_loss /= len(test_loader.dataset)
    test_acc = correct / len(test_loader.dataset)
    print('Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:05.2f}%)'.format(
        test_loss, correct, len(test_loader.dataset), 100. * test_acc))
    return test_loss, test_acc


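def main(model, train_loader, test_loader, loss_func, optimizer, n_step, 
         save_path=None, print_step=30):
    """Main training loop."""
    test_accs = []
    best_acc = 0.0
    best_state_dict = None

    for step in range(1, n_step+1):
        # Training phase
        train(model, train_loader, loss_func, optimizer, 
              step=step, print_step=print_step)
        # Evaluation phase (summed loss, so test() can average per example)
        test_loss, test_acc = test(model, test_loader, 
                                   loss_func=torch.nn.CrossEntropyLoss(reduction='sum'))
        # Record test accuracy
        test_accs.append(test_acc)
        # Keep the parameters whenever test accuracy matches or beats the best so far
        if best_state_dict is None or test_acc >= best_acc:
            best_acc = test_acc
            # Clone the tensors: state_dict() returns references that later
            # optimizer steps would otherwise overwrite
            best_state_dict = {k: v.detach().clone()
                               for k, v in model.state_dict().items()}
            print("discard previous state, best model state saved!")
        print("")

    # Save the best parameters
    if save_path is not None and best_state_dict is not None:
        torch.save(best_state_dict, save_path)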

import torch
import torch.nn as nn
import torch.optim as optim
from torchtext.data import Field, TabularDataset, Iterator
from train_utils_nlp import main

# Preprocessing
source_folder = './data'

label_field = Field(sequential=False, use_vocab=False, batch_first=True, dtype=torch.long, is_target=True)
text_field = Field(tokenize='spacy', lower=True, batch_first=True)
fields = [ ('label', label_field), ('sentence', text_field)]

train, test = TabularDataset.splits(path=source_folder, train='train.csv', test='test.csv',
                                    format='CSV', fields=fields, skip_header=True)

# Build the vocabulary from the training split only, so test tokens do not leak in
text_field.build_vocab(train, min_freq=2)

batch_size = 128
device = 'cuda' if torch.cuda.is_available() else 'cpu'
epoch = 100

train_loader = Iterator(dataset=train, batch_size=batch_size, device=device)
test_loader = Iterator(dataset=test, batch_size=batch_size, device=device)


# Build the model
class IdeologyCls(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size, output_size, num_layers=1, batch_first=True, bidirec=True):
        super(IdeologyCls, self).__init__()
        self.hidden_size = hidden_size
        self.n_layers = num_layers
        self.n_direct = 2 if bidirec else 1
        self.embedding_layer = nn.Embedding(vocab_size, embed_size)
        self.rnn_layer = nn.LSTM(input_size=embed_size,
                                 hidden_size=hidden_size,
                                 num_layers=num_layers,
                                 batch_first=batch_first,
                                 bidirectional=bidirec)
        self.linear = nn.Linear(self.n_direct*hidden_size, output_size)

    def forward(self, x):
        embeded = self.embedding_layer(x)
        hidden, cell = self.init_hiddens(x.size(0), self.hidden_size, device=x.device)
        output, (hidden, cell) = self.rnn_layer(embeded, (hidden, cell))
        last_hidden = torch.cat([h for h in hidden[-self.n_direct:]], dim=1)
        scores = self.linear(last_hidden)
        # nn.NLLLoss expects log-probabilities, so return log_softmax rather than softmax
        scores = torch.log_softmax(scores, dim=1)
        return scores

    def init_hiddens(self, batch_size, hidden_size, device):
        hidden = torch.zeros(self.n_direct*self.n_layers, batch_size, hidden_size)
        cell = torch.zeros(self.n_direct*self.n_layers, batch_size, hidden_size)
        return hidden.to(device), cell.to(device)


embed_size = 16
hidden_size = 32
output_size = 3
num_layers = 3
batch_first = True
bidirec = True
lr = 0.000001

model = IdeologyCls(vocab_size=len(text_field.vocab),
                    embed_size=embed_size,
                    hidden_size=hidden_size,
                    output_size=output_size,
                    num_layers=num_layers,
                    batch_first=batch_first,
                    bidirec=bidirec).to(device)


loss_function = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)


main(model=model,
      train_loader=train_loader,
      test_loader=test_loader,
      loss_func=loss_function,
      optimizer=optimizer,
      n_step=epoch,
      save_path='./model/best_model.pt',  # torch.save needs a file path; this file name is a placeholder
      print_step=256)

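4. Remaining questions and future plans

The method proposed above recommends, among articles covering the same issue, those with the opposite political leaning. Even setting aside how well it performs, a question remains: will this approach actually be effective at popping the filter bubble? According to a joint study by Columbia University and New York University, exposure to opposing political views on social media actually deepens political polarization. (For details, see: Exposure to opposing views on social media can increase political polarization.) As one might expect, the cause is the backfire effect.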

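In fact, this was a concern from the very start of POP THE FILTER BUBBLE. For that reason, the model as first designed did not classify political leaning as a binary 0 or 1 but estimated it as a probability between 0 and 1. The ultimate goal was to let readers see where the article they are reading sits on a political-leaning bar running from 0 (liberal) to 1 (conservative), and to let them move the point on that bar to choose and read articles on similar subjects with a range of political leanings.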
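One way such a bar could work is sketched below, assuming the classifier emits two logits per article in the order [liberal, conservative]. The logits here are hypothetical values, and `conservative_probability` and `pick_near` are illustrative helper names, not part of the project code.

```python
import numpy as np

def conservative_probability(logits):
    """Softmax over two-class logits; returns P(conservative) in [0, 1]."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return probs[:, 1]

def pick_near(scores, target, candidates):
    """Among candidate article indices, pick the one closest to the target point on the bar."""
    return min(candidates, key=lambda i: abs(scores[i] - target))

# HYPOTHETICAL logits for three articles on a similar subject
logits = [[2.0, -1.0], [0.1, 0.0], [-1.5, 1.5]]
scores = conservative_probability(logits)
print(np.round(scores, 3))

# The reader moves the bar to 0.5 and gets the article nearest that point
chosen = pick_near(scores, target=0.5, candidates=[0, 1, 2])
print(chosen)
```

Note that a softmax probability reflects the classifier's confidence, which is not necessarily the same thing as how moderate or extreme an article is, so the scores would need validation before being shown to readers as a position on a spectrum.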

5. References

The Filter Bubble - How the new personalized web is changing what we read and how we think (2012.04, Eli Pariser)
Political Ideology Detection Using Recursive Neural Networks (2014.06, Mohit Iyyer et al., ACL Anthology)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2019.05, Jacob Devlin et al., arXiv)
DKN: Deep Knowledge-Aware Network for News Recommendation(2018.04, H Wang, F Zhang, X Xie, M Guo)
MIND: A Large-scale Dataset for News Recommendation(2020.07, F Wu, Y Qiao, JH Chen, C Wu, T Qi, J Lian)

	text	label
0	During the 1920s , Great Britain , terrified o...	1
1	Those who would save the Amazon from disastrou...	0
2	However , the report noted that , `` Deaths we...	0

	sentence	label
0	During the 1920s , Great Britain , terrified o...	1
1	Those who would save the Amazon from disastrou...	0
2	However , the report noted that , `` Deaths we...	0

	News ID	Category	Subcategory	Title	Abstract	URL	Title Entities	Abstract Entities
0	N9786	news	newspolitics	Elijah Cummings to lie in state at US Capitol ...	Cummings, a Democrat whose district included s...	https://assets.msn.com/labs/mind/AAJgNxm.html	[{"Label": "Elijah Cummings", "Type": "P", "Wi...	[{"Label": "Elijah Cummings", "Type": "P", "Wi...
1	N47214	news	newspolitics	Here are the lawmakers who are not seeking ree...	The battle for control of Congress is more tha...	https://assets.msn.com/labs/mind/AAFcidm.html	[]	[]
2	N24905	news	newspolitics	Grieder: Special election in House District 28...	The special election in Texas House District 2...	https://assets.msn.com/labs/mind/AAJJa4h.html	[{"Label": "2004 North Carolina General Assemb...	[{"Label": "Republican Party (United States)",...

	News ID	Category	Subcategory	Title	Abstract	URL	Title Entities	Abstract Entities	TitleAbstract
0	N9786	news	newspolitics	Elijah Cummings to lie in state at US Capitol ...	Cummings, a Democrat whose district included s...	https://assets.msn.com/labs/mind/AAJgNxm.html	[{"Label": "Elijah Cummings", "Type": "P", "Wi...	[{"Label": "Elijah Cummings", "Type": "P", "Wi...	Elijah Cummings to lie in state at US Capitol ...
1	N47214	news	newspolitics	Here are the lawmakers who are not seeking ree...	The battle for control of Congress is more tha...	https://assets.msn.com/labs/mind/AAFcidm.html	[]	[]	Here are the lawmakers who are not seeking ree...
2	N24905	news	newspolitics	Grieder: Special election in House District 28...	The special election in Texas House District 2...	https://assets.msn.com/labs/mind/AAJJa4h.html	[{"Label": "2004 North Carolina General Assemb...	[{"Label": "Republican Party (United States)",...	Grieder: Special election in House District 28...

	News ID	Category	Subcategory	Title	Abstract	URL	Title Entities	Abstract Entities	TitleAbstract	Ideology
0	N9786	news	newspolitics	Elijah Cummings to lie in state at US Capitol ...	Cummings, a Democrat whose district included s...	https://assets.msn.com/labs/mind/AAJgNxm.html	[{"Label": "Elijah Cummings", "Type": "P", "Wi...	[{"Label": "Elijah Cummings", "Type": "P", "Wi...	Elijah Cummings to lie in state at US Capitol ...	1
1	N47214	news	newspolitics	Here are the lawmakers who are not seeking ree...	The battle for control of Congress is more tha...	https://assets.msn.com/labs/mind/AAFcidm.html	[]	[]	Here are the lawmakers who are not seeking ree...	0
2	N24905	news	newspolitics	Grieder: Special election in House District 28...	The special election in Texas House District 2...	https://assets.msn.com/labs/mind/AAJJa4h.html	[{"Label": "2004 North Carolina General Assemb...	[{"Label": "Republican Party (United States)",...	Grieder: Special election in House District 28...	0

	Press	Sentence	Ideology_label_by_media_bias	Ideology_label_by_our_model	link
0	AlterNet	On Tuesday, writing for The Daily Beast, forme...	0	NaN	https://www.alternet.org/2021/02/capitol-riot-...
1	AlterNet	Democrats are currently negotiating with thems...	0	NaN	https://www.alternet.org/2021/02/manchin-biden/
2	AlterNet	Hours after President Biden declared that "dem...	0	NaN	https://www.alternet.org/2021/01/republican-el...
3	NationalReview	Last week, just as Texas was set to officially...	1	NaN	https://www.nationalreview.com/corner/district...
4	NationalReview	For the third year in a row, Senate Democrats ...	1	NaN	https://www.nationalreview.com/2021/02/democra...
5	NationalReview	Conservatism is having an identity crisis When...	1	NaN	https://www.nationalreview.com/2021/02/the-con...
6	TheFederalist	Democrats Distract Federal Government With ‘Gr...	1	NaN	https://thefederalist.com/2021/02/09/democrats...
7	TheFederalist	So the petty little fascists in the Democratic...	1	NaN	https://thefederalist.com/2021/02/08/this-impe...
8	TheFederalist	The conservative mindset may be to despair rig...	1	NaN	https://thefederalist.com/2021/02/09/exclusive...
9	TheNewYorker	As tens of thousands of Americans continue to ...	0	NaN	https://www.newyorker.com/news/our-columnists/...
10	TheNewYorker	Last week’s storming of the Capitol attracted ...	0	NaN	https://www.newyorker.com/news/news-desk/a-for...
11	TheNewYorker	Donald Trump is no longer the President of the...	0	NaN	https://www.newyorker.com/news/our-columnists/...

POP THE FILTER BUBBLE

Anyway, the point is that I am interested in filter bubbles!

In short, POP THE FILTER BUBBLE