Supervised Machine Learning
Language Classification
Abusive Language Detection
Dataset: Hate Speech and Offensive Language Dataset
Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import nltk
nltk.download('stopwords', quiet=True)
nltk.download('wordnet', quiet=True)
nltk.download('omw-1.4', quiet=True)
from nltk.stem.porter import PorterStemmer
from nltk.stem import WordNetLemmatizer
import string
import re
import demoji
from wordcloud import WordCloud, STOPWORDS
from textblob import TextBlob
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
from sklearn.metrics import confusion_matrix, classification_report
import warnings
warnings.filterwarnings('ignore')
The dataset used in this research is obtained from Kaggle.
Data Characteristics¶
The data used in this research is based on Twitter data and contains information about hate speech and offensive language. The dataset consists if tweets that are normal, racist, sexist, homophobic and generally offensive. The dataset obtained is a labeled dataset. The dataset contains 7 feature columns and 24783 Records.
Index¶
This column contains the index number for each record.
Count¶
It represents the number of users from CrowdFlower (CF) that coded each tweet ranges from 3 or more.
hate_speech¶
It represents the number of CF users who judged that the tweet content is hate speech
Offensive_language¶
It represents the number of Cf users who judged that the tweet content is offensive
neither¶
It represents the number of CF users who judged that the tweet content is neither hate speech nor offensive
class¶
This column represents the class label assigned to each tweet 0 for hate speech, 1 for offensive language and 2 for neither.
Tweet¶
This column represents the tweet content obtained from twitter.
tweet_data=pd.read_csv("labeled_data.csv")
print(tweet_data.shape)
(24783, 7)
tweet_data.columns
Index(['Unnamed: 0', 'count', 'hate_speech', 'offensive_language', 'neither', 'class', 'tweet'], dtype='object')
tweet_data.head()
Unnamed: 0 | count | hate_speech | offensive_language | neither | class | tweet | |
---|---|---|---|---|---|---|---|
0 | 0 | 3 | 0 | 0 | 3 | 2 | !!! RT @mayasolovely: As a woman you shouldn't... |
1 | 1 | 3 | 0 | 3 | 0 | 1 | !!!!! RT @mleew17: boy dats cold...tyga dwn ba... |
2 | 2 | 3 | 0 | 3 | 0 | 1 | !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby... |
3 | 3 | 3 | 0 | 2 | 1 | 1 | !!!!!!!!! RT @C_G_Anderson: @viva_based she lo... |
4 | 4 | 6 | 0 | 6 | 0 | 1 | !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you... |
tweet_data.tail()
Unnamed: 0 | count | hate_speech | offensive_language | neither | class | tweet | |
---|---|---|---|---|---|---|---|
24778 | 25291 | 3 | 0 | 2 | 1 | 1 | you's a muthaf***in lie “@LifeAsKing: @2... |
24779 | 25292 | 3 | 0 | 1 | 2 | 2 | you've gone and broke the wrong heart baby, an... |
24780 | 25294 | 3 | 0 | 3 | 0 | 1 | young buck wanna eat!!.. dat nigguh like I ain... |
24781 | 25295 | 6 | 0 | 6 | 0 | 1 | youu got wild bitches tellin you lies |
24782 | 25296 | 3 | 0 | 0 | 3 | 2 | ~~Ruffled | Ntac Eileen Dahlia - Beautiful col... |
tweet_data.sample(5)
Unnamed: 0 | count | hate_speech | offensive_language | neither | class | tweet | |
---|---|---|---|---|---|---|---|
10742 | 11023 | 3 | 0 | 3 | 0 | 1 | I made ya bitch sicc wit that one 😂 |
22914 | 23394 | 3 | 0 | 3 | 0 | 1 | Why you Worried bout a bitch, weed, clothes,se... |
929 | 949 | 3 | 0 | 3 | 0 | 1 | #youaremoreattractive if u a real bitch! |
1701 | 1736 | 3 | 1 | 2 | 0 | 1 | “@badnradbrad: @whattheflocka @MorbidMer... |
19568 | 20003 | 3 | 0 | 3 | 0 | 1 | RT @lnsaneTweets: I'm such a sarcastic bitch i... |
Data Pre-processing¶
Data pre-processing is required to clean and transform data to make sure that it is ready for analysis. Several steps are followed to pre-process this data. The steps are listed as follows
### The index column is irrelevant to data analysis so it is dropped.
tweet_data=tweet_data.drop(['Unnamed: 0'],axis=1)
tweet_data
count | hate_speech | offensive_language | neither | class | tweet | |
---|---|---|---|---|---|---|
0 | 3 | 0 | 0 | 3 | 2 | !!! RT @mayasolovely: As a woman you shouldn't... |
1 | 3 | 0 | 3 | 0 | 1 | !!!!! RT @mleew17: boy dats cold...tyga dwn ba... |
2 | 3 | 0 | 3 | 0 | 1 | !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby... |
3 | 3 | 0 | 2 | 1 | 1 | !!!!!!!!! RT @C_G_Anderson: @viva_based she lo... |
4 | 6 | 0 | 6 | 0 | 1 | !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you... |
... | ... | ... | ... | ... | ... | ... |
24778 | 3 | 0 | 2 | 1 | 1 | you's a muthaf***in lie “@LifeAsKing: @2... |
24779 | 3 | 0 | 1 | 2 | 2 | you've gone and broke the wrong heart baby, an... |
24780 | 3 | 0 | 3 | 0 | 1 | young buck wanna eat!!.. dat nigguh like I ain... |
24781 | 6 | 0 | 6 | 0 | 1 | youu got wild bitches tellin you lies |
24782 | 3 | 0 | 0 | 3 | 2 | ~~Ruffled | Ntac Eileen Dahlia - Beautiful col... |
24783 rows × 6 columns
The data is checked for null values but no null value is found.¶
tweet_data.isna().sum()
count 0 hate_speech 0 offensive_language 0 neither 0 class 0 tweet 0 dtype: int64
The data is checked for duplicated records but no duplicate value is found in the data¶
tweet_data.duplicated().sum()
0
Exploratory Data Analysis¶
• Summary statistics of the different feature columns in our data is presented using the following table
tweet_data.describe()
count | hate_speech | offensive_language | neither | class | |
---|---|---|---|---|---|
count | 24783.000000 | 24783.000000 | 24783.000000 | 24783.000000 | 24783.000000 |
mean | 3.243473 | 0.280515 | 2.413711 | 0.549247 | 1.110277 |
std | 0.883060 | 0.631851 | 1.399459 | 1.113299 | 0.462089 |
min | 3.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 3.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 |
50% | 3.000000 | 0.000000 | 3.000000 | 0.000000 | 1.000000 |
75% | 3.000000 | 0.000000 | 3.000000 | 0.000000 | 1.000000 |
max | 9.000000 | 7.000000 | 9.000000 | 9.000000 | 2.000000 |
• Box plot is a visualization technique used to display data by grouping numerical columns into quartiles. The data describe the median (Q2) with a line and 1st and 3rd quartile (Q1 to Q3) using a box. The points extending from the box presents data range and outliers. Box plots are also used to compare variables and distributions.
The box plot of different feature columns is represented as
data_plot=sns.boxplot(data=tweet_data)
data_plot.set_xticklabels(data_plot.get_xticklabels(), rotation=45)
[Text(0, 0, 'count'), Text(1, 0, 'hate_speech'), Text(2, 0, 'offensive_language'), Text(3, 0, 'neither'), Text(4, 0, 'class')]
• The distribution of count column shows that the number of users for reviewing each tweet ranges from 3 to 9. On average at least 3 users have reviewed each tweet and judged its content.
• The distribution of hate_Speech column shows that the number of users for reviewing each tweet as hate speech and their number ranges from 0 to 7 where 0 means that no user thinks that the tweet is hate speech. 4993 records i.e., 20.15% records have been filtered where number of users reviewing hate speech has a number greater than 0. Most of the tweet content is considered not a hate speech therefore the count of 0 users is the highest in this plot.
• The distribution of offensive_language column shows that the number of users for reviewing each tweet as offensive language and their number ranges from 0 to 8 where 0 means that no user thinks that the tweet is hate speech. 21308 i.e., 85.98% records have been filtered where number of users reviewing offensive language has a number greater than 0. More than 13000 tweet content is considered offensive language therefore the count of 3 users is the highest in this plot.
• The distribution of neither column shows that the number of users for reviewing each tweet as neither hate speech nor offensive language and their number ranges from 0 to 8 where 0 means that no user thinks that the tweet is hate speech. 5891 records i.e., 23.77% records have been filtered where number of users reviewing tweet content as neither has a number greater than 0. More than 2500 tweet content is considered neither as shown by the bar at 3 users.
• The distribution of count column shows the class assigned to each tweet content. 1430 records i.e., 5.77% have been assigned class 0 which is hate speech. 19190 records i.e., 77.43% have been assigned class 1 which is offensive language. 4163 records i.e., 16.8% have been assigned class 2 which is neither hate speech nor offensive language. Highest class assigned is offensive language while the lowest class assigned is hate speech as displayed in the histogram.
for col in tweet_data[['count', 'hate_speech', 'offensive_language', 'neither',
'class']]:
sns.histplot(tweet_data[col])
plt.show()
tweet_data[tweet_data['hate_speech']>0].describe()
count | hate_speech | offensive_language | neither | class | |
---|---|---|---|---|---|
count | 4993.000000 | 4993.000000 | 4993.000000 | 4993.000000 | 4993.000000 |
mean | 3.382936 | 1.392349 | 1.827759 | 0.162828 | 0.764070 |
std | 1.124272 | 0.658461 | 1.256703 | 0.594213 | 0.530344 |
min | 3.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 3.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 |
50% | 3.000000 | 1.000000 | 2.000000 | 0.000000 | 1.000000 |
75% | 3.000000 | 2.000000 | 2.000000 | 0.000000 | 1.000000 |
max | 9.000000 | 7.000000 | 8.000000 | 8.000000 | 2.000000 |
tweet_data[tweet_data['offensive_language']>0].describe()
count | hate_speech | offensive_language | neither | class | |
---|---|---|---|---|---|
count | 21308.000000 | 21308.000000 | 21308.000000 | 21308.000000 | 21308.000000 |
mean | 3.263328 | 0.266942 | 2.807349 | 0.189037 | 1.000751 |
std | 0.916658 | 0.578783 | 1.082944 | 0.572051 | 0.315283 |
min | 3.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 |
25% | 3.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 |
50% | 3.000000 | 0.000000 | 3.000000 | 0.000000 | 1.000000 |
75% | 3.000000 | 0.000000 | 3.000000 | 0.000000 | 1.000000 |
max | 9.000000 | 7.000000 | 9.000000 | 8.000000 | 2.000000 |
tweet_data[tweet_data['neither']>0].describe()
count | hate_speech | offensive_language | neither | class | |
---|---|---|---|---|---|
count | 5891.000000 | 5891.000000 | 5891.000000 | 5891.000000 | 5891.000000 |
mean | 3.248175 | 0.108301 | 0.829231 | 2.310643 | 1.685113 |
std | 0.899459 | 0.422802 | 1.138030 | 1.069688 | 0.508816 |
min | 3.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
25% | 3.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 |
50% | 3.000000 | 0.000000 | 0.000000 | 3.000000 | 2.000000 |
75% | 3.000000 | 0.000000 | 2.000000 | 3.000000 | 2.000000 |
max | 9.000000 | 7.000000 | 8.000000 | 9.000000 | 2.000000 |
tweet_data[tweet_data['class']==0].describe()
count | hate_speech | offensive_language | neither | class | |
---|---|---|---|---|---|
count | 1430.000000 | 1430.000000 | 1430.000000 | 1430.000000 | 1430.0 |
mean | 3.108392 | 2.256643 | 0.755944 | 0.095804 | 0.0 |
std | 0.648084 | 0.573994 | 0.487653 | 0.326007 | 0.0 |
min | 3.000000 | 2.000000 | 0.000000 | 0.000000 | 0.0 |
25% | 3.000000 | 2.000000 | 0.000000 | 0.000000 | 0.0 |
50% | 3.000000 | 2.000000 | 1.000000 | 0.000000 | 0.0 |
75% | 3.000000 | 2.000000 | 1.000000 | 0.000000 | 0.0 |
max | 9.000000 | 7.000000 | 4.000000 | 4.000000 | 0.0 |
tweet_data[tweet_data['class']==1].describe()
count | hate_speech | offensive_language | neither | class | |
---|---|---|---|---|---|
count | 19190.000000 | 19190.000000 | 19190.000000 | 19190.000000 | 19190.0 |
mean | 3.268890 | 0.180459 | 3.003544 | 0.084888 | 1.0 |
std | 0.923024 | 0.407220 | 0.954097 | 0.284093 | 0.0 |
min | 3.000000 | 0.000000 | 2.000000 | 0.000000 | 1.0 |
25% | 3.000000 | 0.000000 | 3.000000 | 0.000000 | 1.0 |
50% | 3.000000 | 0.000000 | 3.000000 | 0.000000 | 1.0 |
75% | 3.000000 | 0.000000 | 3.000000 | 0.000000 | 1.0 |
max | 9.000000 | 4.000000 | 9.000000 | 3.000000 | 1.0 |
tweet_data[tweet_data['class']==2].describe()
count | hate_speech | offensive_language | neither | class | |
---|---|---|---|---|---|
count | 4163.000000 | 4163.000000 | 4163.000000 | 4163.000000 | 4163.0 |
mean | 3.172712 | 0.062935 | 0.264233 | 2.845544 | 2.0 |
std | 0.746097 | 0.253524 | 0.461737 | 0.795181 | 0.0 |
min | 3.000000 | 0.000000 | 0.000000 | 2.000000 | 2.0 |
25% | 3.000000 | 0.000000 | 0.000000 | 2.000000 | 2.0 |
50% | 3.000000 | 0.000000 | 0.000000 | 3.000000 | 2.0 |
75% | 3.000000 | 0.000000 | 1.000000 | 3.000000 | 2.0 |
max | 9.000000 | 3.000000 | 4.000000 | 9.000000 | 2.0 |
sns.stripplot(data=tweet_data, x="hate_speech", y="class")
plt.show()
sns.stripplot(data=tweet_data, x="offensive_language", y="class")
plt.show()
sns.stripplot(data=tweet_data, x="neither", y="class")
plt.show()
Scatter plot are used to understand relationship between different variables. The first scatter plot is drawn between number of users reviewing hate speech and offensive language and the class assigned to them. The class is highlighted with a colour. The plot clearly shows that if number of users judging a tweet content to be both offensive and hate speech label is assigned according to the highest number of users and if the tweet is judged as neither hate speech nor offensive language neither i.e., 2 label is assigned.
# scatter plot hue parameter
sns.scatterplot(x = "hate_speech", y = "offensive_language", data = tweet_data, hue = "class")
plt.title("Scatter Plot for hate_speech vs offensive_language according to their class label")
plt.show()
tweet_data.groupby(['class']).count()#.apply(lambda x:100 * x / float(x.mean()))
count | hate_speech | offensive_language | neither | tweet | |
---|---|---|---|---|---|
class | |||||
0 | 1430 | 1430 | 1430 | 1430 | 1430 |
1 | 19190 | 19190 | 19190 | 19190 | 19190 |
2 | 4163 | 4163 | 4163 | 4163 | 4163 |
len(tweet_data[(tweet_data['hate_speech']>0)&(tweet_data['class']==0)])/len(tweet_data)*100
5.770084332001776
len(tweet_data[(tweet_data['hate_speech']>0)&(tweet_data['class']==1)])/len(tweet_data)*100
13.359964491788725
len(tweet_data[(tweet_data['hate_speech']>0)&(tweet_data['class']==2)])/len(tweet_data)*100
1.0168260501149982
len(tweet_data[(tweet_data['offensive_language']>0)&(tweet_data['class']==0)])/len(tweet_data)*100
4.240810232820885
len(tweet_data[(tweet_data['offensive_language']>0)&(tweet_data['class']==1)])/len(tweet_data)*100
77.43211072105879
len(tweet_data[(tweet_data['offensive_language']>0)&(tweet_data['class']==2)])/len(tweet_data)*100
4.305370616955171
len(tweet_data[(tweet_data['neither']>0)&(tweet_data['class']==0)])/len(tweet_data)*100
0.5124480490658919
len(tweet_data[(tweet_data['neither']>0)&(tweet_data['class']==1)])/len(tweet_data)*100
6.460073437436953
len(tweet_data[(tweet_data['neither']>0)&(tweet_data['class']==2)])/len(tweet_data)*100
16.797804946939436
tweet_data['class'].hist()
<Axes: >
The data in the tweet column is processed using text pre-processing. First the column data type is converted to string type.
tweet_data["tweet"] = tweet_data["tweet"].astype(str)
Punctuation marks in text is unnecessary information that does not provide and meaning to the text for building model or corpus. Punctuation is removed from tweets text in the preprocess_tweet column. All the further pre-processing is stored in the preprocess_tweet column.
print(string.punctuation)
def remove_punctuation(tweet):
punctuationfree="".join([i for i in tweet if i not in string.punctuation])
return punctuationfree
tweet_data['preprocess_tweet']= tweet_data['tweet'].apply(lambda x:remove_punctuation(x))
tweet_data.head()
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
count | hate_speech | offensive_language | neither | class | tweet | preprocess_tweet | |
---|---|---|---|---|---|---|---|
0 | 3 | 0 | 0 | 3 | 2 | !!! RT @mayasolovely: As a woman you shouldn't... | RT mayasolovely As a woman you shouldnt compl... |
1 | 3 | 0 | 3 | 0 | 1 | !!!!! RT @mleew17: boy dats cold...tyga dwn ba... | RT mleew17 boy dats coldtyga dwn bad for cuff... |
2 | 3 | 0 | 3 | 0 | 1 | !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby... | RT UrKindOfBrand Dawg RT 80sbaby4life You eve... |
3 | 3 | 0 | 2 | 1 | 1 | !!!!!!!!! RT @C_G_Anderson: @viva_based she lo... | RT CGAnderson vivabased she look like a tranny |
4 | 6 | 0 | 6 | 0 | 1 | !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you... | RT ShenikaRoberts The shit you hear about me ... |
Text conversion from uppercase to lowercase is required for standardization. The lower () method in python converts every uppercase letter to lowercase while the lowercase characters remain unchanged. The tweets text is converted into lower case.
tweet_data['preprocess_tweet']= tweet_data['preprocess_tweet'].apply(lambda x: x.lower())
tweet_data.head()
count | hate_speech | offensive_language | neither | class | tweet | preprocess_tweet | |
---|---|---|---|---|---|---|---|
0 | 3 | 0 | 0 | 3 | 2 | !!! RT @mayasolovely: As a woman you shouldn't... | rt mayasolovely as a woman you shouldnt compl... |
1 | 3 | 0 | 3 | 0 | 1 | !!!!! RT @mleew17: boy dats cold...tyga dwn ba... | rt mleew17 boy dats coldtyga dwn bad for cuff... |
2 | 3 | 0 | 3 | 0 | 1 | !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby... | rt urkindofbrand dawg rt 80sbaby4life you eve... |
3 | 3 | 0 | 2 | 1 | 1 | !!!!!!!!! RT @C_G_Anderson: @viva_based she lo... | rt cganderson vivabased she look like a tranny |
4 | 6 | 0 | 6 | 0 | 1 | !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you... | rt shenikaroberts the shit you hear about me ... |
Tokenization is applied to each sentence where sentences are split into words. In this step stream of words are broken down into smaller chunks called tokens. This step is required because it helps in understanding vocabulary and lexicon of the text and allows better pattern analysis.
def tokenization(tweet):
tokens = re.split('W+',tweet)
return tokens
tweet_data['preprocess_tweet']= tweet_data['preprocess_tweet'].apply(lambda x: tokenization(x))
tweet_data.head()
count | hate_speech | offensive_language | neither | class | tweet | preprocess_tweet | |
---|---|---|---|---|---|---|---|
0 | 3 | 0 | 0 | 3 | 2 | !!! RT @mayasolovely: As a woman you shouldn't... | [ rt mayasolovely as a woman you shouldnt comp... |
1 | 3 | 0 | 3 | 0 | 1 | !!!!! RT @mleew17: boy dats cold...tyga dwn ba... | [ rt mleew17 boy dats coldtyga dwn bad for cuf... |
2 | 3 | 0 | 3 | 0 | 1 | !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby... | [ rt urkindofbrand dawg rt 80sbaby4life you ev... |
3 | 3 | 0 | 2 | 1 | 1 | !!!!!!!!! RT @C_G_Anderson: @viva_based she lo... | [ rt cganderson vivabased she look like a tranny] |
4 | 6 | 0 | 6 | 0 | 1 | !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you... | [ rt shenikaroberts the shit you hear about me... |
Stop words are the common words that are frequently found in text but do not add significant meaning to the text and therefore interfere in NLP tasks. Stop words like ‘I’, ‘me’, ‘my’ etc are removed from the tweet text to focus on more meaningful words.
stopwords = nltk.corpus.stopwords.words('english')
print(stopwords[0:10])
def remove_stopwords(Tweet):
output= [i for i in Tweet if i not in stopwords]
return output
tweet_data['preprocess_tweet']= tweet_data['preprocess_tweet'].apply(lambda x:remove_stopwords(x))
tweet_data.head()
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're"]
count | hate_speech | offensive_language | neither | class | tweet | preprocess_tweet | |
---|---|---|---|---|---|---|---|
0 | 3 | 0 | 0 | 3 | 2 | !!! RT @mayasolovely: As a woman you shouldn't... | [ rt mayasolovely as a woman you shouldnt comp... |
1 | 3 | 0 | 3 | 0 | 1 | !!!!! RT @mleew17: boy dats cold...tyga dwn ba... | [ rt mleew17 boy dats coldtyga dwn bad for cuf... |
2 | 3 | 0 | 3 | 0 | 1 | !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby... | [ rt urkindofbrand dawg rt 80sbaby4life you ev... |
3 | 3 | 0 | 2 | 1 | 1 | !!!!!!!!! RT @C_G_Anderson: @viva_based she lo... | [ rt cganderson vivabased she look like a tranny] |
4 | 6 | 0 | 6 | 0 | 1 | !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you... | [ rt shenikaroberts the shit you hear about me... |
Stemming is an important technique for text normalization and its processing. The process of stemming involves getting different morphological variations given a root word. Stemming is applied using PorterStemmer(). In the stemming process stem or root word of each word is extracted.
porter_stemmer = PorterStemmer()
def stemming(Tweet):
stem_Tweet = [porter_stemmer.stem(word) for word in Tweet]
return stem_Tweet
tweet_data['preprocess_tweet']=tweet_data['preprocess_tweet'].apply(lambda x: stemming(x))
tweet_data.head()
count | hate_speech | offensive_language | neither | class | tweet | preprocess_tweet | |
---|---|---|---|---|---|---|---|
0 | 3 | 0 | 0 | 3 | 2 | !!! RT @mayasolovely: As a woman you shouldn't... | [ rt mayasolovely as a woman you shouldnt comp... |
1 | 3 | 0 | 3 | 0 | 1 | !!!!! RT @mleew17: boy dats cold...tyga dwn ba... | [ rt mleew17 boy dats coldtyga dwn bad for cuf... |
2 | 3 | 0 | 3 | 0 | 1 | !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby... | [ rt urkindofbrand dawg rt 80sbaby4life you ev... |
3 | 3 | 0 | 2 | 1 | 1 | !!!!!!!!! RT @C_G_Anderson: @viva_based she lo... | [ rt cganderson vivabased she look like a tranni] |
4 | 6 | 0 | 6 | 0 | 1 | !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you... | [ rt shenikaroberts the shit you hear about me... |
Lemmatization is applied using WordNetLemmatizer (). In the lemmatization process lemma of each word is extracted. The process of lemmatization is similar to stemming where a word is converted to its base form. The technique is used for inflection endings removal. Lemmatization returns the base or dictionary from of a word also known as lemma.
wordnet_lemmatizer = WordNetLemmatizer()
def lemmatizer(Tweet):
lemm_Tweet = [wordnet_lemmatizer.lemmatize(word) for word in Tweet]
return lemm_Tweet
tweet_data['preprocess_tweet']=tweet_data['preprocess_tweet'].apply(lambda x:lemmatizer(x))
tweet_data.head()
count | hate_speech | offensive_language | neither | class | tweet | preprocess_tweet | |
---|---|---|---|---|---|---|---|
0 | 3 | 0 | 0 | 3 | 2 | !!! RT @mayasolovely: As a woman you shouldn't... | [ rt mayasolovely as a woman you shouldnt comp... |
1 | 3 | 0 | 3 | 0 | 1 | !!!!! RT @mleew17: boy dats cold...tyga dwn ba... | [ rt mleew17 boy dats coldtyga dwn bad for cuf... |
2 | 3 | 0 | 3 | 0 | 1 | !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby... | [ rt urkindofbrand dawg rt 80sbaby4life you ev... |
3 | 3 | 0 | 2 | 1 | 1 | !!!!!!!!! RT @C_G_Anderson: @viva_based she lo... | [ rt cganderson vivabased she look like a tranni] |
4 | 6 | 0 | 6 | 0 | 1 | !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you... | [ rt shenikaroberts the shit you hear about me... |
The processed words are then recombined into sentences using space.
def get_sentence(words):
sentence = ' '.join(words)
return sentence
tweet_data['preprocess_tweet']=tweet_data['preprocess_tweet'].apply(lambda x: get_sentence(x))
tweet_data.head()
count | hate_speech | offensive_language | neither | class | tweet | preprocess_tweet | |
---|---|---|---|---|---|---|---|
0 | 3 | 0 | 0 | 3 | 2 | !!! RT @mayasolovely: As a woman you shouldn't... | rt mayasolovely as a woman you shouldnt compl... |
1 | 3 | 0 | 3 | 0 | 1 | !!!!! RT @mleew17: boy dats cold...tyga dwn ba... | rt mleew17 boy dats coldtyga dwn bad for cuff... |
2 | 3 | 0 | 3 | 0 | 1 | !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby... | rt urkindofbrand dawg rt 80sbaby4life you eve... |
3 | 3 | 0 | 2 | 1 | 1 | !!!!!!!!! RT @C_G_Anderson: @viva_based she lo... | rt cganderson vivabased she look like a tranni |
4 | 6 | 0 | 6 | 0 | 1 | !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you... | rt shenikaroberts the shit you hear about me ... |
Online content contains a lot of emoji or emoticons to represent emotions and feelings. But these emojis or emoticons interfere with the analysis process. Using the demoji library these emotion signals are removed from tweet sentences.
def remove_emoji(tweet):
dem = demoji.findall(tweet)
for item in dem.keys():
tweet = tweet.replace(item, '')
return tweet
tweet_data['preprocess_tweet']= tweet_data['preprocess_tweet'].apply(lambda x: remove_emoji(x))
tweet_data.head()
count | hate_speech | offensive_language | neither | class | tweet | preprocess_tweet | |
---|---|---|---|---|---|---|---|
0 | 3 | 0 | 0 | 3 | 2 | !!! RT @mayasolovely: As a woman you shouldn't... | rt mayasolovely as a woman you shouldnt compl... |
1 | 3 | 0 | 3 | 0 | 1 | !!!!! RT @mleew17: boy dats cold...tyga dwn ba... | rt mleew17 boy dats coldtyga dwn bad for cuff... |
2 | 3 | 0 | 3 | 0 | 1 | !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby... | rt urkindofbrand dawg rt 80sbaby4life you eve... |
3 | 3 | 0 | 2 | 1 | 1 | !!!!!!!!! RT @C_G_Anderson: @viva_based she lo... | rt cganderson vivabased she look like a tranni |
4 | 6 | 0 | 6 | 0 | 1 | !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you... | rt shenikaroberts the shit you hear about me ... |
tweet=" ".join(i for i in tweet_data.preprocess_tweet)
stopwords=set(STOPWORDS)
wordcloud = WordCloud(width = 1000, height = 500,
background_color ='white',
stopwords = stopwords, max_words=100,
min_font_size = 10).generate(tweet)
plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
tweet_data.head()
count | hate_speech | offensive_language | neither | class | tweet | preprocess_tweet | |
---|---|---|---|---|---|---|---|
0 | 3 | 0 | 0 | 3 | 2 | !!! RT @mayasolovely: As a woman you shouldn't... | rt mayasolovely as a woman you shouldnt compl... |
1 | 3 | 0 | 3 | 0 | 1 | !!!!! RT @mleew17: boy dats cold...tyga dwn ba... | rt mleew17 boy dats coldtyga dwn bad for cuff... |
2 | 3 | 0 | 3 | 0 | 1 | !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby... | rt urkindofbrand dawg rt 80sbaby4life you eve... |
3 | 3 | 0 | 2 | 1 | 1 | !!!!!!!!! RT @C_G_Anderson: @viva_based she lo... | rt cganderson vivabased she look like a tranni |
4 | 6 | 0 | 6 | 0 | 1 | !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you... | rt shenikaroberts the shit you hear about me ... |
tweet_data_hate = tweet_data[tweet_data["class"]==0]
tweet_data_offensive = tweet_data[tweet_data["class"]==1]
tweet_data_neither = tweet_data[tweet_data["class"]==2]
print(tweet_data_hate.shape)
(1430, 7)
tweet=" ".join(i for i in tweet_data_hate.preprocess_tweet)
stopwords=set(STOPWORDS)
wordcloud = WordCloud(width = 1000, height = 500,
background_color ='white',
stopwords = stopwords, max_words=100,
min_font_size = 10).generate(tweet)
plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
tweet=" ".join(i for i in tweet_data_offensive.preprocess_tweet)
stopwords=set(STOPWORDS)
wordcloud = WordCloud(width = 1000, height = 500,
background_color ='white',
stopwords = stopwords, max_words=100,
min_font_size = 10).generate(tweet)
plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
tweet=" ".join(i for i in tweet_data_neither.preprocess_tweet)
stopwords=set(STOPWORDS)
wordcloud = WordCloud(width = 1000, height = 500,
background_color ='white',
stopwords = stopwords, max_words=100,
min_font_size = 10).generate(tweet)
plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
Sentiment polarity displays the user’s sentiment of a particular text or phrase in the range of -1 to 1. This is also known as sentiment score. Polarity of each tweet content is computed and represented in a histogram. The chart describes that the polarity score count is highest between 0 and 0.2.
def getPolarity(Tweet):
return TextBlob(Tweet).sentiment.polarity
tweet_data['polarity']=tweet_data['preprocess_tweet'].apply(getPolarity)
tweet_data.sample(5)
count | hate_speech | offensive_language | neither | class | tweet | preprocess_tweet | polarity | |
---|---|---|---|---|---|---|---|---|
6158 | 3 | 0 | 3 | 0 | 1 | @illest_will @djdynamiq @edrobersonsf @sarahli... | illestwill djdynamiq edrobersonsf sarahlizsf t... | 0.5 |
17866 | 3 | 1 | 2 | 0 | 1 | RT @TryHardAlby: @TryHardSilva @davidam_23 get... | rt tryhardalby tryhardsilva davidam23 get down... | -0.6 |
22429 | 3 | 0 | 3 | 0 | 1 | W bitch | w bitch | 0.0 |
13494 | 3 | 0 | 3 | 0 | 1 | Now when I put this pussy on Vivian she bet no... | now when i put this pussy on vivian she bet no... | 0.0 |
5393 | 3 | 1 | 2 | 0 | 1 | @_cblaze @kieffer_jason ask your boy Jason kei... | cblaze kiefferjason ask your boy jason keiffer... | 0.0 |
def getAnalysis(score):
if score < 0:
return 'Negative'
elif score == 0:
return 'Neutral'
else:
return 'Positive'
tweet_data['sentiment']=tweet_data['polarity'].apply(getAnalysis)
tweet_data.sample(5)
count | hate_speech | offensive_language | neither | class | tweet | preprocess_tweet | polarity | sentiment | |
---|---|---|---|---|---|---|---|---|---|
3451 | 3 | 0 | 1 | 2 | 2 | @IHateStevenSing\nI ain't to show bout dem col... | ihatestevensing\ni aint to show bout dem color... | 0.000000 | Neutral |
4504 | 3 | 0 | 1 | 2 | 2 | @RealSkipBayless man what about Wilson being t... | realskipbayless man what about wilson being th... | 0.211111 | Positive |
21888 | 3 | 1 | 2 | 0 | 1 | They told me to fuc wit bitches but never trus... | they told me to fuc wit bitches but never trus... | 0.000000 | Neutral |
14925 | 3 | 0 | 3 | 0 | 1 | RT @Dan_OSU_Hashtag: You ever look at a bitch ... | rt danosuhashtag you ever look at a bitch and ... | 0.000000 | Neutral |
4395 | 3 | 0 | 1 | 2 | 2 | @Paulyy2 nah some honkey lookin dude at chr | paulyy2 nah some honkey lookin dude at chr | 0.000000 | Neutral |
sns.set(rc={'figure.figsize':(5,5)})
tweet_data['polarity'].hist()
plt.show()
# scatter plot hue parameter
sns.scatterplot(x = "hate_speech", y = "polarity", data = tweet_data, hue = "class")
plt.show()
The relationship between number of users judging a tweet to be offensive language with sentiment polarity is studied using a scatter plot while highlighting the label assigned to that tweet. The plot displays that sentiment polarity is not defining a tweet to be offensive language. Tweets with 1 class label contains all polarity scores.
# scatter plot hue parameter
sns.scatterplot(x = "offensive_language", y = "polarity", data = tweet_data, hue = "class")
plt.show()
# scatter plot hue parameter
sns.scatterplot(x = "neither", y = "polarity", data = tweet_data, hue = "class")
plt.show()
Sentiment polarity score is dived in three distributions where 0 denotes neutral sentiment positive value shows positive sentiment and negative value shows negative sentiment. Labels are assigned to each tweet content according to the score obtained and plotted using a histogram. The chart clearly describes that the number of neutral sentiments obtained are the highest.
tweet_data_negative = tweet_data[tweet_data["sentiment"]=='Negative']
tweet_data_positive = tweet_data[tweet_data["sentiment"]=='Positive']
tweet_data_neutral = tweet_data[tweet_data["sentiment"]=='Neutral']
def count_values_in_column(data,feature):
total=data.loc[:,feature].value_counts(dropna=False)
percentage=round(data.loc[:,feature].value_counts(dropna=False,normalize=True)*100,2)
return pd.concat([total,percentage],axis=1,keys=["Total","Percentage"])
count_values_in_column(tweet_data,"sentiment")
Total | Percentage | |
---|---|---|
sentiment | ||
Neutral | 10254 | 41.38 |
Negative | 7271 | 29.34 |
Positive | 7258 | 29.29 |
plt.figure(figsize=(13, 8), dpi=80)
pichart = count_values_in_column(tweet_data,"sentiment")
names= ["Positive","Neutral","Negative"]
size=pichart["Percentage"]
# Create a circle for the center of the plot
my_circle=plt.Circle( (0,0), 0.5, color='white')
plt.pie(size, labels=names, colors=['green','blue','red'])
p=plt.gcf()
p.gca().add_artist(my_circle)
plt.show()
sns.countplot(data=tweet_data, x="sentiment")
plt.show()
Three separate data subsets are obtained for each type of sentiment. The top 1000 words in negative sentiment are represented using the word cloud.
data_neg = tweet_data_negative['preprocess_tweet']
plt.figure(figsize = (20,20))
wc = WordCloud(max_words = 1000 , width = 1000 , height = 500,
collocations=False).generate(" ".join(data_neg))
plt.imshow(wc)
plt.show()
The top 1000 words in positive sentiment are represented using the word cloud.
data_pos = tweet_data_positive['preprocess_tweet']
plt.figure(figsize = (20,20))
wc = WordCloud(max_words = 1000 , width = 1000 , height = 500,
collocations=False).generate(" ".join(data_pos))
plt.imshow(wc)
plt.show()
Subjectivity refers to the degree the textual content in influenced by a user personal feelings and beliefs. Its value ranges from 0 to 1 where 0 denotes no subjectivity and 1 shows high subjectivity. Sentiment subjectivity is also computed for each tweet content and represented using a histogram. The plot shows that sentiment with no subjectivity is the highest.
def getSubjectivity(Tweet):
return TextBlob(Tweet).sentiment.subjectivity
tweet_data['subjectivity']=tweet_data['preprocess_tweet'].apply(getSubjectivity)
tweet_data.head()
count | hate_speech | offensive_language | neither | class | tweet | preprocess_tweet | polarity | sentiment | subjectivity | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 3 | 0 | 0 | 3 | 2 | !!! RT @mayasolovely: As a woman you shouldn't... | rt mayasolovely as a woman you shouldnt compl... | 0.000000 | Neutral | 0.000000 |
1 | 3 | 0 | 3 | 0 | 1 | !!!!! RT @mleew17: boy dats cold...tyga dwn ba... | rt mleew17 boy dats coldtyga dwn bad for cuff... | -0.700000 | Negative | 0.666667 |
2 | 3 | 0 | 3 | 0 | 1 | !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby... | rt urkindofbrand dawg rt 80sbaby4life you eve... | -0.333333 | Negative | 0.700000 |
3 | 3 | 0 | 2 | 1 | 1 | !!!!!!!!! RT @C_G_Anderson: @viva_based she lo... | rt cganderson vivabased she look like a tranni | 0.000000 | Neutral | 0.000000 |
4 | 6 | 0 | 6 | 0 | 1 | !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you... | rt shenikaroberts the shit you hear about me ... | 0.075000 | Positive | 0.725000 |
sns.set(rc={'figure.figsize':(5,5)})
tweet_data['subjectivity'].hist()
plt.show()
# scatter plot hue parameter
sns.scatterplot(x = "hate_speech", y = "subjectivity", data = tweet_data, hue = "class")
plt.show()
The relationship between number of users judging a tweet to be offensive with sentiment subjectivity is studied using a scatter plot while highlighting the label assigned to that tweet. The plot shows that a tweet's content does not have to be subjective in order for it to contain offensive language. Tweets that have been assigned label 1 include all subjectivity ratings.
# scatter plot hue parameter
sns.scatterplot(x = "offensive_language", y = "subjectivity", data = tweet_data, hue = "class")
plt.show()
# scatter plot hue parameter
sns.scatterplot(x = "neither", y = "subjectivity", data = tweet_data, hue = "class")
plt.show()
tweet_data.to_excel('preprcessed_labeled_data.xlsx',index=False)
tweet_data = pd.read_excel('preprcessed_labeled_data.xlsx')
print(tweet_data.shape)
(24783, 10)
tweet_data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 24783 entries, 0 to 24782 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 count 24783 non-null int64 1 hate_speech 24783 non-null int64 2 offensive_language 24783 non-null int64 3 neither 24783 non-null int64 4 class 24783 non-null int64 5 tweet 24782 non-null object 6 preprocess_tweet 24783 non-null object 7 polarity 24783 non-null float64 8 sentiment 24783 non-null object 9 subjectivity 24783 non-null float64 dtypes: float64(2), int64(5), object(3) memory usage: 1.9+ MB
numeric_df = tweet_data.select_dtypes(include=['number'])
# calculate the correlation matrix
corr = numeric_df.corr()
# plot the heatmap
sns.heatmap(corr,
xticklabels=corr.columns,
yticklabels=corr.columns)
plt.show()
Machine Learning¶
X=tweet_data['preprocess_tweet']
Y=tweet_data['class']
The input is the 70% data split for training. After model training, models will be evaluated using 30% data.
X_train, X_test, y_train, y_test = train_test_split(X,Y,test_size = 0.3, random_state = 0)
The number of records in X_train is 17348, the number of records in X_test is 7435.
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
(17348,) (7435,) (17348,) (7435,)
Count Vectorizer¶
Count vectorizer produced 32461 features for 17348 records.
vectoriser = CountVectorizer()
vectoriser.fit(X_train)
print(f'Vectoriser fitted.')
print('No. of feature_words: ', len(vectoriser.get_feature_names_out()))
Vectoriser fitted. No. of feature_words: 32461
X_train = vectoriser.transform(X_train)
X_test = vectoriser.transform(X_test)
print(f'Data Transformed.')
Data Transformed.
MultinomialNB¶
clf = MultinomialNB()
clf.fit(X_train, y_train)
MultinomialNB()In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
MultinomialNB()
predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)
84.76126429051783
print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))
Accuracy: 0.8476126429051782 precision recall f1-score support 0 0.50 0.02 0.04 427 1 0.84 0.99 0.91 5747 2 0.90 0.47 0.62 1261 accuracy 0.85 7435 macro avg 0.75 0.49 0.52 7435 weighted avg 0.83 0.85 0.81 7435
cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()
[[ 9 395 23] [ 8 5699 40] [ 1 666 594]]
TfidfVectorizer¶
X_train, X_test, y_train, y_test = train_test_split(X,Y,test_size = 0.3, random_state = 0)
TF-IDF vectorizer produced 32461 features for 17348 records.
vectoriser = TfidfVectorizer()
vectoriser.fit(X_train)
print(f'Vectoriser fitted.')
print('No. of feature_words: ', len(vectoriser.get_feature_names_out()))
Vectoriser fitted. No. of feature_words: 32461
X_train = vectoriser.transform(X_train)
X_test = vectoriser.transform(X_test)
print(f'Data Transformed.')
Data Transformed.
MultinomialNB¶
clf = MultinomialNB()
clf.fit(X_train, y_train)
MultinomialNB()In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
MultinomialNB()
predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)
78.37256220578345
print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))
Accuracy: 0.7837256220578346 precision recall f1-score support 0 0.00 0.00 0.00 427 1 0.78 1.00 0.88 5747 2 0.98 0.07 0.12 1261 accuracy 0.78 7435 macro avg 0.59 0.35 0.33 7435 weighted avg 0.77 0.78 0.70 7435
cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()
[[ 0 427 0] [ 0 5745 2] [ 0 1179 82]]
X_train, X_test, y_train, y_test = train_test_split(X,Y,test_size = 0.3, random_state = 0)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
(17348,) (7435,) (17348,) (7435,)
Count Vectorizer¶
vectoriser = CountVectorizer()
vectoriser.fit(X_train)
print(f'Vectoriser fitted.')
print('No. of feature_words: ', len(vectoriser.get_feature_names_out()))
Vectoriser fitted. No. of feature_words: 32461
X_train = vectoriser.transform(X_train)
X_test = vectoriser.transform(X_test)
print(f'Data Transformed.')
Data Transformed.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
DecisionTreeClassifier(random_state=0)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
DecisionTreeClassifier(random_state=0)
predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)
87.6126429051782
print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))
Accuracy: 0.8761264290517821 precision recall f1-score support 0 0.33 0.21 0.26 427 1 0.92 0.93 0.93 5747 2 0.79 0.85 0.82 1261 accuracy 0.88 7435 macro avg 0.68 0.66 0.67 7435 weighted avg 0.87 0.88 0.87 7435
cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()
[[ 89 286 52] [ 158 5356 233] [ 20 172 1069]]
X_train, X_test, y_train, y_test = train_test_split(X,Y,test_size = 0.3, random_state = 0)
vectoriser = TfidfVectorizer()
vectoriser.fit(X_train)
print(f'Vectoriser fitted.')
print('No. of feature_words: ', len(vectoriser.get_feature_names_out()))
Vectoriser fitted. No. of feature_words: 32461
X_train = vectoriser.transform(X_train)
X_test = vectoriser.transform(X_test)
print(f'Data Transformed.')
Data Transformed.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
DecisionTreeClassifier(random_state=0)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
DecisionTreeClassifier(random_state=0)
predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)
86.80564895763283
print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))
Accuracy: 0.8680564895763282 precision recall f1-score support 0 0.35 0.30 0.32 427 1 0.92 0.92 0.92 5747 2 0.78 0.80 0.79 1261 accuracy 0.87 7435 macro avg 0.68 0.68 0.68 7435 weighted avg 0.86 0.87 0.87 7435
cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()
[[ 127 255 45] [ 191 5313 243] [ 41 206 1014]]
Fine-Tuning¶
X_train, X_test, y_train, y_test = train_test_split(X,Y,test_size = 0.3, random_state = 0)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
(17348,) (7435,) (17348,) (7435,)
Count Vectorizer¶
vectoriser = CountVectorizer()
vectoriser.fit(X_train)
print(f'Vectoriser fitted.')
print('No. of feature_words: ', len(vectoriser.get_feature_names_out()))
Vectoriser fitted. No. of feature_words: 32461
X_train = vectoriser.transform(X_train)
X_test = vectoriser.transform(X_test)
print(f'Data Transformed.')
Data Transformed.
clf = MultinomialNB(alpha=1.0, fit_prior=False, class_prior=None)
clf.fit(X_train, y_train)
MultinomialNB(fit_prior=False)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
MultinomialNB(fit_prior=False)
predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)
86.21385339609952
print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))
Accuracy: 0.8621385339609953 precision recall f1-score support 0 0.41 0.12 0.19 427 1 0.87 0.97 0.92 5747 2 0.85 0.61 0.71 1261 accuracy 0.86 7435 macro avg 0.71 0.57 0.61 7435 weighted avg 0.84 0.86 0.84 7435
cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()
[[ 53 336 38] [ 62 5589 96] [ 13 480 768]]
DecisionTreeClassifier¶
clf = DecisionTreeClassifier(criterion='gini',splitter='best',max_features=None, random_state=0)
clf.fit(X_train, y_train)
DecisionTreeClassifier(random_state=0)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
DecisionTreeClassifier(random_state=0)
predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)
87.6126429051782
print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))
Accuracy: 0.8761264290517821 precision recall f1-score support 0 0.33 0.21 0.26 427 1 0.92 0.93 0.93 5747 2 0.79 0.85 0.82 1261 accuracy 0.88 7435 macro avg 0.68 0.66 0.67 7435 weighted avg 0.87 0.88 0.87 7435
cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()
[[ 89 286 52] [ 158 5356 233] [ 20 172 1069]]
Hyper parameter fine-tuning¶
from sklearn.model_selection import GridSearchCV
max_depth¶
This parameter selects the height of the tree at its tallest. This parameter takes integer value and the default value is None. If None, nodes are expanded either until all leaves are pure or until all leaves contain fewer samples than min_samples_split. With the default None max_depth the tree depth becomes 253. With GridSearchCV we used to find the best parameter value from the range of 1 to 100. And the best results were obtained using 65 as max_depth.
gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
max_depth=None, random_state=0),
param_grid={'max_depth': list(range(2, 100))},
verbose=3)
gs.fit(X_train, y_train)
Fitting 5 folds for each of 98 candidates, totalling 490 fits [CV 1/5] END .......................max_depth=2;, score=0.775 total time= 0.4s [CV 2/5] END .......................max_depth=2;, score=0.775 total time= 0.4s [CV 3/5] END .......................max_depth=2;, score=0.775 total time= 0.4s [CV 4/5] END .......................max_depth=2;, score=0.775 total time= 0.4s [CV 5/5] END .......................max_depth=2;, score=0.775 total time= 0.3s [CV 1/5] END .......................max_depth=3;, score=0.775 total time= 0.5s [CV 2/5] END .......................max_depth=3;, score=0.774 total time= 0.6s [CV 3/5] END .......................max_depth=3;, score=0.775 total time= 0.4s [CV 4/5] END .......................max_depth=3;, score=0.775 total time= 0.4s [CV 5/5] END .......................max_depth=3;, score=0.775 total time= 0.4s [CV 1/5] END .......................max_depth=4;, score=0.775 total time= 0.4s [CV 2/5] END .......................max_depth=4;, score=0.774 total time= 0.4s [CV 3/5] END .......................max_depth=4;, score=0.775 total time= 0.4s [CV 4/5] END .......................max_depth=4;, score=0.775 total time= 0.4s [CV 5/5] END .......................max_depth=4;, score=0.775 total time= 0.4s [CV 1/5] END .......................max_depth=5;, score=0.775 total time= 0.4s [CV 2/5] END .......................max_depth=5;, score=0.774 total time= 0.4s [CV 3/5] END .......................max_depth=5;, score=0.774 total time= 0.4s [CV 4/5] END .......................max_depth=5;, score=0.774 total time= 0.4s [CV 5/5] END .......................max_depth=5;, score=0.775 total time= 0.4s [CV 1/5] END .......................max_depth=6;, score=0.777 total time= 0.5s [CV 2/5] END .......................max_depth=6;, score=0.771 total time= 0.5s [CV 3/5] END .......................max_depth=6;, score=0.774 total time= 0.5s [CV 4/5] END .......................max_depth=6;, score=0.772 total time= 0.5s [CV 5/5] END .......................max_depth=6;, score=0.774 total time= 0.5s [CV 1/5] END .......................max_depth=7;, score=0.788 total time= 0.5s [CV 2/5] END .......................max_depth=7;, score=0.784 total time= 0.5s [CV 3/5] END .......................max_depth=7;, score=0.796 total time= 0.5s [CV 4/5] END .......................max_depth=7;, score=0.788 total time= 0.5s [CV 5/5] END .......................max_depth=7;, score=0.790 total time= 0.5s [CV 1/5] END .......................max_depth=8;, score=0.800 total time= 0.5s [CV 2/5] END .......................max_depth=8;, score=0.793 total time= 0.5s [CV 3/5] END .......................max_depth=8;, score=0.806 total time= 0.6s [CV 4/5] END .......................max_depth=8;, score=0.798 total time= 0.7s [CV 5/5] END .......................max_depth=8;, score=0.794 total time= 0.8s [CV 1/5] END .......................max_depth=9;, score=0.809 total time= 0.6s [CV 2/5] END .......................max_depth=9;, score=0.804 total time= 0.6s [CV 3/5] END .......................max_depth=9;, score=0.818 total time= 0.6s [CV 4/5] END .......................max_depth=9;, score=0.809 total time= 0.6s [CV 5/5] END .......................max_depth=9;, score=0.804 total time= 0.7s [CV 1/5] END ......................max_depth=10;, score=0.817 total time= 0.9s [CV 2/5] END ......................max_depth=10;, score=0.814 total time= 0.8s [CV 3/5] END ......................max_depth=10;, score=0.825 total time= 0.7s [CV 4/5] END ......................max_depth=10;, score=0.820 total time= 0.7s [CV 5/5] END ......................max_depth=10;, score=0.814 total time= 0.7s [CV 1/5] END ......................max_depth=11;, score=0.822 total time= 0.6s [CV 2/5] END ......................max_depth=11;, score=0.819 total time= 0.6s [CV 3/5] END ......................max_depth=11;, score=0.831 total time= 0.6s [CV 4/5] END ......................max_depth=11;, score=0.828 total time= 0.6s [CV 5/5] END ......................max_depth=11;, score=0.821 total time= 0.6s [CV 1/5] END ......................max_depth=12;, score=0.829 total time= 0.7s [CV 2/5] END ......................max_depth=12;, score=0.826 total time= 0.7s [CV 3/5] END ......................max_depth=12;, score=0.837 total time= 0.7s [CV 4/5] END ......................max_depth=12;, score=0.837 total time= 0.7s [CV 5/5] END ......................max_depth=12;, score=0.827 total time= 1.0s [CV 1/5] END ......................max_depth=13;, score=0.837 total time= 0.8s [CV 2/5] END ......................max_depth=13;, score=0.832 total time= 0.7s [CV 3/5] END ......................max_depth=13;, score=0.842 total time= 0.7s [CV 4/5] END ......................max_depth=13;, score=0.843 total time= 0.7s [CV 5/5] END ......................max_depth=13;, score=0.839 total time= 0.7s [CV 1/5] END ......................max_depth=14;, score=0.842 total time= 0.7s [CV 2/5] END ......................max_depth=14;, score=0.835 total time= 0.7s [CV 3/5] END ......................max_depth=14;, score=0.846 total time= 0.7s [CV 4/5] END ......................max_depth=14;, score=0.847 total time= 0.7s [CV 5/5] END ......................max_depth=14;, score=0.842 total time= 0.7s [CV 1/5] END ......................max_depth=15;, score=0.845 total time= 0.8s [CV 2/5] END ......................max_depth=15;, score=0.841 total time= 0.8s [CV 3/5] END ......................max_depth=15;, score=0.848 total time= 0.8s [CV 4/5] END ......................max_depth=15;, score=0.848 total time= 0.8s [CV 5/5] END ......................max_depth=15;, score=0.847 total time= 0.7s [CV 1/5] END ......................max_depth=16;, score=0.847 total time= 0.8s [CV 2/5] END ......................max_depth=16;, score=0.845 total time= 0.8s [CV 3/5] END ......................max_depth=16;, score=0.852 total time= 1.0s [CV 4/5] END ......................max_depth=16;, score=0.850 total time= 1.0s [CV 5/5] END ......................max_depth=16;, score=0.850 total time= 0.9s [CV 1/5] END ......................max_depth=17;, score=0.851 total time= 1.0s [CV 2/5] END ......................max_depth=17;, score=0.846 total time= 1.2s [CV 3/5] END ......................max_depth=17;, score=0.852 total time= 0.9s [CV 4/5] END ......................max_depth=17;, score=0.854 total time= 1.0s [CV 5/5] END ......................max_depth=17;, score=0.853 total time= 0.9s [CV 1/5] END ......................max_depth=18;, score=0.852 total time= 1.2s [CV 2/5] END ......................max_depth=18;, score=0.848 total time= 1.2s [CV 3/5] END ......................max_depth=18;, score=0.857 total time= 1.0s [CV 4/5] END ......................max_depth=18;, score=0.853 total time= 1.0s [CV 5/5] END ......................max_depth=18;, score=0.854 total time= 1.1s [CV 1/5] END ......................max_depth=19;, score=0.858 total time= 1.3s [CV 2/5] END ......................max_depth=19;, score=0.850 total time= 1.3s [CV 3/5] END ......................max_depth=19;, score=0.859 total time= 1.4s [CV 4/5] END ......................max_depth=19;, score=0.856 total time= 1.0s [CV 5/5] END ......................max_depth=19;, score=0.858 total time= 0.9s [CV 1/5] END ......................max_depth=20;, score=0.858 total time= 1.0s [CV 2/5] END ......................max_depth=20;, score=0.852 total time= 0.9s [CV 3/5] END ......................max_depth=20;, score=0.860 total time= 0.9s [CV 4/5] END ......................max_depth=20;, score=0.860 total time= 1.0s [CV 5/5] END ......................max_depth=20;, score=0.860 total time= 1.1s [CV 1/5] END ......................max_depth=21;, score=0.862 total time= 1.0s [CV 2/5] END ......................max_depth=21;, score=0.856 total time= 1.0s [CV 3/5] END ......................max_depth=21;, score=0.863 total time= 1.0s [CV 4/5] END ......................max_depth=21;, score=0.860 total time= 1.1s [CV 5/5] END ......................max_depth=21;, score=0.865 total time= 1.3s [CV 1/5] END ......................max_depth=22;, score=0.867 total time= 1.5s [CV 2/5] END ......................max_depth=22;, score=0.858 total time= 1.3s [CV 3/5] END ......................max_depth=22;, score=0.865 total time= 1.1s [CV 4/5] END ......................max_depth=22;, score=0.865 total time= 1.2s [CV 5/5] END ......................max_depth=22;, score=0.866 total time= 1.5s [CV 1/5] END ......................max_depth=23;, score=0.868 total time= 1.2s [CV 2/5] END ......................max_depth=23;, score=0.859 total time= 1.4s [CV 3/5] END ......................max_depth=23;, score=0.865 total time= 1.4s [CV 4/5] END ......................max_depth=23;, score=0.867 total time= 1.4s [CV 5/5] END ......................max_depth=23;, score=0.869 total time= 1.4s [CV 1/5] END ......................max_depth=24;, score=0.867 total time= 1.8s [CV 2/5] END ......................max_depth=24;, score=0.863 total time= 1.9s [CV 3/5] END ......................max_depth=24;, score=0.869 total time= 1.3s [CV 4/5] END ......................max_depth=24;, score=0.865 total time= 1.5s [CV 5/5] END ......................max_depth=24;, score=0.871 total time= 1.3s [CV 1/5] END ......................max_depth=25;, score=0.870 total time= 1.4s [CV 2/5] END ......................max_depth=25;, score=0.866 total time= 1.3s [CV 3/5] END ......................max_depth=25;, score=0.867 total time= 1.4s [CV 4/5] END ......................max_depth=25;, score=0.867 total time= 1.4s [CV 5/5] END ......................max_depth=25;, score=0.876 total time= 1.6s [CV 1/5] END ......................max_depth=26;, score=0.869 total time= 1.6s [CV 2/5] END ......................max_depth=26;, score=0.867 total time= 1.9s [CV 3/5] END ......................max_depth=26;, score=0.871 total time= 1.3s [CV 4/5] END ......................max_depth=26;, score=0.874 total time= 1.4s [CV 5/5] END ......................max_depth=26;, score=0.876 total time= 1.2s [CV 1/5] END ......................max_depth=27;, score=0.876 total time= 1.4s [CV 2/5] END ......................max_depth=27;, score=0.868 total time= 1.4s [CV 3/5] END ......................max_depth=27;, score=0.869 total time= 1.4s [CV 4/5] END ......................max_depth=27;, score=0.874 total time= 1.5s [CV 5/5] END ......................max_depth=27;, score=0.879 total time= 1.8s [CV 1/5] END ......................max_depth=28;, score=0.877 total time= 1.4s [CV 2/5] END ......................max_depth=28;, score=0.869 total time= 1.9s [CV 3/5] END ......................max_depth=28;, score=0.874 total time= 1.3s [CV 4/5] END ......................max_depth=28;, score=0.879 total time= 1.3s [CV 5/5] END ......................max_depth=28;, score=0.879 total time= 1.3s [CV 1/5] END ......................max_depth=29;, score=0.878 total time= 1.4s [CV 2/5] END ......................max_depth=29;, score=0.869 total time= 1.7s [CV 3/5] END ......................max_depth=29;, score=0.875 total time= 1.5s [CV 4/5] END ......................max_depth=29;, score=0.878 total time= 1.4s [CV 5/5] END ......................max_depth=29;, score=0.882 total time= 1.4s [CV 1/5] END ......................max_depth=30;, score=0.878 total time= 1.8s [CV 2/5] END ......................max_depth=30;, score=0.873 total time= 1.9s [CV 3/5] END ......................max_depth=30;, score=0.873 total time= 1.8s [CV 4/5] END ......................max_depth=30;, score=0.878 total time= 1.8s [CV 5/5] END ......................max_depth=30;, score=0.880 total time= 1.8s [CV 1/5] END ......................max_depth=31;, score=0.877 total time= 1.8s [CV 2/5] END ......................max_depth=31;, score=0.875 total time= 1.8s [CV 3/5] END ......................max_depth=31;, score=0.873 total time= 1.9s [CV 4/5] END ......................max_depth=31;, score=0.879 total time= 2.0s [CV 5/5] END ......................max_depth=31;, score=0.883 total time= 2.1s [CV 1/5] END ......................max_depth=32;, score=0.878 total time= 1.9s [CV 2/5] END ......................max_depth=32;, score=0.873 total time= 1.5s [CV 3/5] END ......................max_depth=32;, score=0.875 total time= 1.6s [CV 4/5] END ......................max_depth=32;, score=0.876 total time= 1.6s [CV 5/5] END ......................max_depth=32;, score=0.882 total time= 1.4s [CV 1/5] END ......................max_depth=33;, score=0.881 total time= 1.5s [CV 2/5] END ......................max_depth=33;, score=0.875 total time= 1.8s [CV 3/5] END ......................max_depth=33;, score=0.874 total time= 1.5s [CV 4/5] END ......................max_depth=33;, score=0.877 total time= 1.9s [CV 5/5] END ......................max_depth=33;, score=0.883 total time= 1.5s [CV 1/5] END ......................max_depth=34;, score=0.880 total time= 1.5s [CV 2/5] END ......................max_depth=34;, score=0.873 total time= 1.5s [CV 3/5] END ......................max_depth=34;, score=0.873 total time= 1.6s [CV 4/5] END ......................max_depth=34;, score=0.879 total time= 1.6s [CV 5/5] END ......................max_depth=34;, score=0.883 total time= 1.6s [CV 1/5] END ......................max_depth=35;, score=0.877 total time= 1.5s [CV 2/5] END ......................max_depth=35;, score=0.875 total time= 1.5s [CV 3/5] END ......................max_depth=35;, score=0.874 total time= 1.4s [CV 4/5] END ......................max_depth=35;, score=0.881 total time= 1.8s [CV 5/5] END ......................max_depth=35;, score=0.884 total time= 1.6s [CV 1/5] END ......................max_depth=36;, score=0.878 total time= 1.4s [CV 2/5] END ......................max_depth=36;, score=0.873 total time= 2.0s [CV 3/5] END ......................max_depth=36;, score=0.874 total time= 1.6s [CV 4/5] END ......................max_depth=36;, score=0.880 total time= 2.0s [CV 5/5] END ......................max_depth=36;, score=0.883 total time= 1.7s [CV 1/5] END ......................max_depth=37;, score=0.878 total time= 1.7s [CV 2/5] END ......................max_depth=37;, score=0.871 total time= 1.4s [CV 3/5] END ......................max_depth=37;, score=0.876 total time= 1.9s [CV 4/5] END ......................max_depth=37;, score=0.883 total time= 1.5s [CV 5/5] END ......................max_depth=37;, score=0.885 total time= 1.5s [CV 1/5] END ......................max_depth=38;, score=0.879 total time= 1.6s [CV 2/5] END ......................max_depth=38;, score=0.871 total time= 1.5s [CV 3/5] END ......................max_depth=38;, score=0.876 total time= 1.6s [CV 4/5] END ......................max_depth=38;, score=0.882 total time= 1.7s [CV 5/5] END ......................max_depth=38;, score=0.880 total time= 1.6s [CV 1/5] END ......................max_depth=39;, score=0.876 total time= 1.8s [CV 2/5] END ......................max_depth=39;, score=0.875 total time= 2.1s [CV 3/5] END ......................max_depth=39;, score=0.875 total time= 1.8s [CV 4/5] END ......................max_depth=39;, score=0.885 total time= 1.8s [CV 5/5] END ......................max_depth=39;, score=0.881 total time= 2.1s [CV 1/5] END ......................max_depth=40;, score=0.878 total time= 1.8s [CV 2/5] END ......................max_depth=40;, score=0.871 total time= 1.6s [CV 3/5] END ......................max_depth=40;, score=0.876 total time= 1.7s [CV 4/5] END ......................max_depth=40;, score=0.886 total time= 1.8s [CV 5/5] END ......................max_depth=40;, score=0.880 total time= 2.0s [CV 1/5] END ......................max_depth=41;, score=0.880 total time= 1.9s [CV 2/5] END ......................max_depth=41;, score=0.873 total time= 1.9s [CV 3/5] END ......................max_depth=41;, score=0.877 total time= 2.1s [CV 4/5] END ......................max_depth=41;, score=0.885 total time= 1.9s [CV 5/5] END ......................max_depth=41;, score=0.884 total time= 2.0s [CV 1/5] END ......................max_depth=42;, score=0.876 total time= 1.7s [CV 2/5] END ......................max_depth=42;, score=0.871 total time= 1.9s [CV 3/5] END ......................max_depth=42;, score=0.876 total time= 2.5s [CV 4/5] END ......................max_depth=42;, score=0.879 total time= 2.0s [CV 5/5] END ......................max_depth=42;, score=0.885 total time= 1.8s [CV 1/5] END ......................max_depth=43;, score=0.878 total time= 1.6s [CV 2/5] END ......................max_depth=43;, score=0.873 total time= 1.6s [CV 3/5] END ......................max_depth=43;, score=0.878 total time= 1.6s [CV 4/5] END ......................max_depth=43;, score=0.884 total time= 1.6s [CV 5/5] END ......................max_depth=43;, score=0.885 total time= 1.9s [CV 1/5] END ......................max_depth=44;, score=0.879 total time= 1.9s [CV 2/5] END ......................max_depth=44;, score=0.875 total time= 1.9s [CV 3/5] END ......................max_depth=44;, score=0.874 total time= 1.6s [CV 4/5] END ......................max_depth=44;, score=0.880 total time= 1.7s [CV 5/5] END ......................max_depth=44;, score=0.884 total time= 1.9s [CV 1/5] END ......................max_depth=45;, score=0.881 total time= 1.7s [CV 2/5] END ......................max_depth=45;, score=0.871 total time= 1.9s [CV 3/5] END ......................max_depth=45;, score=0.877 total time= 1.7s [CV 4/5] END ......................max_depth=45;, score=0.881 total time= 1.8s [CV 5/5] END ......................max_depth=45;, score=0.884 total time= 2.0s [CV 1/5] END ......................max_depth=46;, score=0.880 total time= 1.8s [CV 2/5] END ......................max_depth=46;, score=0.873 total time= 1.6s [CV 3/5] END ......................max_depth=46;, score=0.877 total time= 1.7s [CV 4/5] END ......................max_depth=46;, score=0.882 total time= 1.8s [CV 5/5] END ......................max_depth=46;, score=0.883 total time= 1.7s [CV 1/5] END ......................max_depth=47;, score=0.877 total time= 1.8s [CV 2/5] END ......................max_depth=47;, score=0.875 total time= 1.8s [CV 3/5] END ......................max_depth=47;, score=0.878 total time= 2.2s [CV 4/5] END ......................max_depth=47;, score=0.883 total time= 1.8s [CV 5/5] END ......................max_depth=47;, score=0.885 total time= 1.8s [CV 1/5] END ......................max_depth=48;, score=0.881 total time= 2.0s [CV 2/5] END ......................max_depth=48;, score=0.873 total time= 1.8s [CV 3/5] END ......................max_depth=48;, score=0.873 total time= 1.8s [CV 4/5] END ......................max_depth=48;, score=0.882 total time= 1.9s [CV 5/5] END ......................max_depth=48;, score=0.883 total time= 1.9s [CV 1/5] END ......................max_depth=49;, score=0.879 total time= 1.9s [CV 2/5] END ......................max_depth=49;, score=0.871 total time= 1.9s [CV 3/5] END ......................max_depth=49;, score=0.873 total time= 1.7s [CV 4/5] END ......................max_depth=49;, score=0.885 total time= 1.8s [CV 5/5] END ......................max_depth=49;, score=0.885 total time= 2.0s [CV 1/5] END ......................max_depth=50;, score=0.881 total time= 1.7s [CV 2/5] END ......................max_depth=50;, score=0.872 total time= 1.7s [CV 3/5] END ......................max_depth=50;, score=0.877 total time= 1.8s [CV 4/5] END ......................max_depth=50;, score=0.887 total time= 1.8s [CV 5/5] END ......................max_depth=50;, score=0.884 total time= 2.0s [CV 1/5] END ......................max_depth=51;, score=0.880 total time= 2.0s [CV 2/5] END ......................max_depth=51;, score=0.871 total time= 1.8s [CV 3/5] END ......................max_depth=51;, score=0.875 total time= 1.9s [CV 4/5] END ......................max_depth=51;, score=0.883 total time= 1.8s [CV 5/5] END ......................max_depth=51;, score=0.885 total time= 1.8s [CV 1/5] END ......................max_depth=52;, score=0.881 total time= 1.9s [CV 2/5] END ......................max_depth=52;, score=0.873 total time= 2.2s [CV 3/5] END ......................max_depth=52;, score=0.876 total time= 2.6s [CV 4/5] END ......................max_depth=52;, score=0.882 total time= 2.3s [CV 5/5] END ......................max_depth=52;, score=0.885 total time= 2.1s [CV 1/5] END ......................max_depth=53;, score=0.881 total time= 1.9s [CV 2/5] END ......................max_depth=53;, score=0.874 total time= 2.0s [CV 3/5] END ......................max_depth=53;, score=0.878 total time= 1.8s [CV 4/5] END ......................max_depth=53;, score=0.881 total time= 1.8s [CV 5/5] END ......................max_depth=53;, score=0.887 total time= 2.4s [CV 1/5] END ......................max_depth=54;, score=0.880 total time= 2.1s [CV 2/5] END ......................max_depth=54;, score=0.874 total time= 2.5s [CV 3/5] END ......................max_depth=54;, score=0.877 total time= 2.1s [CV 4/5] END ......................max_depth=54;, score=0.883 total time= 1.9s [CV 5/5] END ......................max_depth=54;, score=0.884 total time= 1.9s [CV 1/5] END ......................max_depth=55;, score=0.882 total time= 1.8s [CV 2/5] END ......................max_depth=55;, score=0.875 total time= 1.7s [CV 3/5] END ......................max_depth=55;, score=0.872 total time= 2.2s [CV 4/5] END ......................max_depth=55;, score=0.881 total time= 1.8s [CV 5/5] END ......................max_depth=55;, score=0.884 total time= 1.8s [CV 1/5] END ......................max_depth=56;, score=0.878 total time= 1.7s [CV 2/5] END ......................max_depth=56;, score=0.875 total time= 1.7s [CV 3/5] END ......................max_depth=56;, score=0.871 total time= 2.0s [CV 4/5] END ......................max_depth=56;, score=0.881 total time= 2.0s [CV 5/5] END ......................max_depth=56;, score=0.886 total time= 1.9s [CV 1/5] END ......................max_depth=57;, score=0.881 total time= 2.1s [CV 2/5] END ......................max_depth=57;, score=0.875 total time= 1.7s [CV 3/5] END ......................max_depth=57;, score=0.875 total time= 1.9s [CV 4/5] END ......................max_depth=57;, score=0.879 total time= 2.0s [CV 5/5] END ......................max_depth=57;, score=0.885 total time= 2.3s [CV 1/5] END ......................max_depth=58;, score=0.882 total time= 1.9s [CV 2/5] END ......................max_depth=58;, score=0.872 total time= 2.2s [CV 3/5] END ......................max_depth=58;, score=0.873 total time= 2.7s [CV 4/5] END ......................max_depth=58;, score=0.880 total time= 1.9s [CV 5/5] END ......................max_depth=58;, score=0.886 total time= 2.1s [CV 1/5] END ......................max_depth=59;, score=0.878 total time= 1.8s [CV 2/5] END ......................max_depth=59;, score=0.876 total time= 2.4s [CV 3/5] END ......................max_depth=59;, score=0.876 total time= 2.3s [CV 4/5] END ......................max_depth=59;, score=0.882 total time= 2.0s [CV 5/5] END ......................max_depth=59;, score=0.885 total time= 2.5s [CV 1/5] END ......................max_depth=60;, score=0.882 total time= 2.2s [CV 2/5] END ......................max_depth=60;, score=0.872 total time= 2.0s [CV 3/5] END ......................max_depth=60;, score=0.875 total time= 2.3s [CV 4/5] END ......................max_depth=60;, score=0.883 total time= 1.9s [CV 5/5] END ......................max_depth=60;, score=0.885 total time= 2.2s [CV 1/5] END ......................max_depth=61;, score=0.883 total time= 2.5s [CV 2/5] END ......................max_depth=61;, score=0.871 total time= 2.2s [CV 3/5] END ......................max_depth=61;, score=0.878 total time= 1.9s [CV 4/5] END ......................max_depth=61;, score=0.883 total time= 2.0s [CV 5/5] END ......................max_depth=61;, score=0.886 total time= 2.3s [CV 1/5] END ......................max_depth=62;, score=0.879 total time= 2.0s [CV 2/5] END ......................max_depth=62;, score=0.868 total time= 2.1s [CV 3/5] END ......................max_depth=62;, score=0.877 total time= 2.1s [CV 4/5] END ......................max_depth=62;, score=0.880 total time= 2.1s [CV 5/5] END ......................max_depth=62;, score=0.884 total time= 2.5s [CV 1/5] END ......................max_depth=63;, score=0.882 total time= 2.6s [CV 2/5] END ......................max_depth=63;, score=0.873 total time= 2.3s [CV 3/5] END ......................max_depth=63;, score=0.876 total time= 2.4s [CV 4/5] END ......................max_depth=63;, score=0.880 total time= 2.1s [CV 5/5] END ......................max_depth=63;, score=0.885 total time= 1.9s [CV 1/5] END ......................max_depth=64;, score=0.880 total time= 2.4s [CV 2/5] END ......................max_depth=64;, score=0.871 total time= 2.0s [CV 3/5] END ......................max_depth=64;, score=0.879 total time= 2.2s [CV 4/5] END ......................max_depth=64;, score=0.877 total time= 1.9s [CV 5/5] END ......................max_depth=64;, score=0.883 total time= 2.0s [CV 1/5] END ......................max_depth=65;, score=0.884 total time= 2.0s [CV 2/5] END ......................max_depth=65;, score=0.875 total time= 2.4s [CV 3/5] END ......................max_depth=65;, score=0.878 total time= 2.3s [CV 4/5] END ......................max_depth=65;, score=0.880 total time= 2.1s [CV 5/5] END ......................max_depth=65;, score=0.887 total time= 1.9s [CV 1/5] END ......................max_depth=66;, score=0.882 total time= 1.9s [CV 2/5] END ......................max_depth=66;, score=0.875 total time= 2.0s [CV 3/5] END ......................max_depth=66;, score=0.877 total time= 1.9s [CV 4/5] END ......................max_depth=66;, score=0.882 total time= 2.2s [CV 5/5] END ......................max_depth=66;, score=0.883 total time= 2.1s [CV 1/5] END ......................max_depth=67;, score=0.886 total time= 2.5s [CV 2/5] END ......................max_depth=67;, score=0.872 total time= 2.3s [CV 3/5] END ......................max_depth=67;, score=0.877 total time= 2.2s [CV 4/5] END ......................max_depth=67;, score=0.880 total time= 2.0s [CV 5/5] END ......................max_depth=67;, score=0.884 total time= 2.1s [CV 1/5] END ......................max_depth=68;, score=0.879 total time= 2.3s [CV 2/5] END ......................max_depth=68;, score=0.874 total time= 3.1s [CV 3/5] END ......................max_depth=68;, score=0.878 total time= 2.6s [CV 4/5] END ......................max_depth=68;, score=0.879 total time= 2.4s [CV 5/5] END ......................max_depth=68;, score=0.884 total time= 2.2s [CV 1/5] END ......................max_depth=69;, score=0.880 total time= 2.5s [CV 2/5] END ......................max_depth=69;, score=0.872 total time= 2.1s [CV 3/5] END ......................max_depth=69;, score=0.877 total time= 2.4s [CV 4/5] END ......................max_depth=69;, score=0.880 total time= 2.4s [CV 5/5] END ......................max_depth=69;, score=0.884 total time= 2.2s [CV 1/5] END ......................max_depth=70;, score=0.879 total time= 1.9s [CV 2/5] END ......................max_depth=70;, score=0.870 total time= 2.0s [CV 3/5] END ......................max_depth=70;, score=0.882 total time= 2.4s [CV 4/5] END ......................max_depth=70;, score=0.878 total time= 2.1s [CV 5/5] END ......................max_depth=70;, score=0.885 total time= 2.4s [CV 1/5] END ......................max_depth=71;, score=0.881 total time= 2.7s [CV 2/5] END ......................max_depth=71;, score=0.875 total time= 2.5s [CV 3/5] END ......................max_depth=71;, score=0.878 total time= 2.8s [CV 4/5] END ......................max_depth=71;, score=0.879 total time= 2.4s [CV 5/5] END ......................max_depth=71;, score=0.883 total time= 2.4s [CV 1/5] END ......................max_depth=72;, score=0.879 total time= 2.0s [CV 2/5] END ......................max_depth=72;, score=0.876 total time= 2.7s [CV 3/5] END ......................max_depth=72;, score=0.878 total time= 2.4s [CV 4/5] END ......................max_depth=72;, score=0.877 total time= 2.2s [CV 5/5] END ......................max_depth=72;, score=0.885 total time= 2.0s [CV 1/5] END ......................max_depth=73;, score=0.880 total time= 2.5s [CV 2/5] END ......................max_depth=73;, score=0.872 total time= 2.6s [CV 3/5] END ......................max_depth=73;, score=0.878 total time= 2.9s [CV 4/5] END ......................max_depth=73;, score=0.882 total time= 2.6s [CV 5/5] END ......................max_depth=73;, score=0.886 total time= 2.5s [CV 1/5] END ......................max_depth=74;, score=0.878 total time= 2.5s [CV 2/5] END ......................max_depth=74;, score=0.871 total time= 2.7s [CV 3/5] END ......................max_depth=74;, score=0.878 total time= 2.1s [CV 4/5] END ......................max_depth=74;, score=0.880 total time= 2.8s [CV 5/5] END ......................max_depth=74;, score=0.883 total time= 2.7s [CV 1/5] END ......................max_depth=75;, score=0.880 total time= 2.2s [CV 2/5] END ......................max_depth=75;, score=0.875 total time= 2.3s [CV 3/5] END ......................max_depth=75;, score=0.880 total time= 2.5s [CV 4/5] END ......................max_depth=75;, score=0.881 total time= 2.8s [CV 5/5] END ......................max_depth=75;, score=0.886 total time= 3.1s [CV 1/5] END ......................max_depth=76;, score=0.878 total time= 2.7s [CV 2/5] END ......................max_depth=76;, score=0.877 total time= 2.7s [CV 3/5] END ......................max_depth=76;, score=0.878 total time= 2.6s [CV 4/5] END ......................max_depth=76;, score=0.877 total time= 2.4s [CV 5/5] END ......................max_depth=76;, score=0.887 total time= 2.3s [CV 1/5] END ......................max_depth=77;, score=0.882 total time= 2.4s [CV 2/5] END ......................max_depth=77;, score=0.873 total time= 2.7s [CV 3/5] END ......................max_depth=77;, score=0.877 total time= 2.3s [CV 4/5] END ......................max_depth=77;, score=0.882 total time= 2.1s [CV 5/5] END ......................max_depth=77;, score=0.885 total time= 2.1s [CV 1/5] END ......................max_depth=78;, score=0.882 total time= 2.3s [CV 2/5] END ......................max_depth=78;, score=0.870 total time= 2.1s [CV 3/5] END ......................max_depth=78;, score=0.879 total time= 2.5s [CV 4/5] END ......................max_depth=78;, score=0.876 total time= 2.1s [CV 5/5] END ......................max_depth=78;, score=0.885 total time= 2.3s [CV 1/5] END ......................max_depth=79;, score=0.878 total time= 2.6s [CV 2/5] END ......................max_depth=79;, score=0.874 total time= 2.4s [CV 3/5] END ......................max_depth=79;, score=0.878 total time= 2.6s [CV 4/5] END ......................max_depth=79;, score=0.880 total time= 2.6s [CV 5/5] END ......................max_depth=79;, score=0.882 total time= 2.5s [CV 1/5] END ......................max_depth=80;, score=0.879 total time= 2.1s [CV 2/5] END ......................max_depth=80;, score=0.873 total time= 2.1s [CV 3/5] END ......................max_depth=80;, score=0.877 total time= 2.2s [CV 4/5] END ......................max_depth=80;, score=0.879 total time= 2.5s [CV 5/5] END ......................max_depth=80;, score=0.884 total time= 2.5s [CV 1/5] END ......................max_depth=81;, score=0.877 total time= 3.0s [CV 2/5] END ......................max_depth=81;, score=0.870 total time= 2.2s [CV 3/5] END ......................max_depth=81;, score=0.882 total time= 2.7s [CV 4/5] END ......................max_depth=81;, score=0.882 total time= 2.9s [CV 5/5] END ......................max_depth=81;, score=0.884 total time= 2.7s [CV 1/5] END ......................max_depth=82;, score=0.880 total time= 2.3s [CV 2/5] END ......................max_depth=82;, score=0.869 total time= 3.0s [CV 3/5] END ......................max_depth=82;, score=0.880 total time= 2.9s [CV 4/5] END ......................max_depth=82;, score=0.881 total time= 2.6s [CV 5/5] END ......................max_depth=82;, score=0.884 total time= 2.4s [CV 1/5] END ......................max_depth=83;, score=0.879 total time= 2.5s [CV 2/5] END ......................max_depth=83;, score=0.873 total time= 2.3s [CV 3/5] END ......................max_depth=83;, score=0.878 total time= 3.3s [CV 4/5] END ......................max_depth=83;, score=0.880 total time= 2.6s [CV 5/5] END ......................max_depth=83;, score=0.884 total time= 2.6s [CV 1/5] END ......................max_depth=84;, score=0.878 total time= 2.8s [CV 2/5] END ......................max_depth=84;, score=0.871 total time= 2.6s [CV 3/5] END ......................max_depth=84;, score=0.879 total time= 2.7s [CV 4/5] END ......................max_depth=84;, score=0.876 total time= 2.4s [CV 5/5] END ......................max_depth=84;, score=0.882 total time= 2.9s [CV 1/5] END ......................max_depth=85;, score=0.879 total time= 3.0s [CV 2/5] END ......................max_depth=85;, score=0.869 total time= 2.5s [CV 3/5] END ......................max_depth=85;, score=0.881 total time= 2.2s [CV 4/5] END ......................max_depth=85;, score=0.878 total time= 2.9s [CV 5/5] END ......................max_depth=85;, score=0.884 total time= 2.7s [CV 1/5] END ......................max_depth=86;, score=0.881 total time= 2.9s [CV 2/5] END ......................max_depth=86;, score=0.872 total time= 2.8s [CV 3/5] END ......................max_depth=86;, score=0.877 total time= 2.6s [CV 4/5] END ......................max_depth=86;, score=0.881 total time= 2.7s [CV 5/5] END ......................max_depth=86;, score=0.886 total time= 2.9s [CV 1/5] END ......................max_depth=87;, score=0.880 total time= 2.3s [CV 2/5] END ......................max_depth=87;, score=0.875 total time= 2.1s [CV 3/5] END ......................max_depth=87;, score=0.876 total time= 2.2s [CV 4/5] END ......................max_depth=87;, score=0.881 total time= 2.1s [CV 5/5] END ......................max_depth=87;, score=0.885 total time= 2.2s [CV 1/5] END ......................max_depth=88;, score=0.880 total time= 2.3s [CV 2/5] END ......................max_depth=88;, score=0.871 total time= 2.4s [CV 3/5] END ......................max_depth=88;, score=0.879 total time= 2.2s [CV 4/5] END ......................max_depth=88;, score=0.880 total time= 2.1s [CV 5/5] END ......................max_depth=88;, score=0.885 total time= 2.3s [CV 1/5] END ......................max_depth=89;, score=0.881 total time= 2.3s [CV 2/5] END ......................max_depth=89;, score=0.873 total time= 2.3s [CV 3/5] END ......................max_depth=89;, score=0.877 total time= 2.6s [CV 4/5] END ......................max_depth=89;, score=0.884 total time= 2.7s [CV 5/5] END ......................max_depth=89;, score=0.885 total time= 3.0s [CV 1/5] END ......................max_depth=90;, score=0.880 total time= 2.6s [CV 2/5] END ......................max_depth=90;, score=0.870 total time= 2.3s [CV 3/5] END ......................max_depth=90;, score=0.875 total time= 2.9s [CV 4/5] END ......................max_depth=90;, score=0.882 total time= 3.0s [CV 5/5] END ......................max_depth=90;, score=0.884 total time= 2.7s [CV 1/5] END ......................max_depth=91;, score=0.883 total time= 2.8s [CV 2/5] END ......................max_depth=91;, score=0.870 total time= 2.6s [CV 3/5] END ......................max_depth=91;, score=0.879 total time= 2.8s [CV 4/5] END ......................max_depth=91;, score=0.879 total time= 3.1s [CV 5/5] END ......................max_depth=91;, score=0.884 total time= 2.7s [CV 1/5] END ......................max_depth=92;, score=0.878 total time= 2.5s [CV 2/5] END ......................max_depth=92;, score=0.875 total time= 2.5s [CV 3/5] END ......................max_depth=92;, score=0.877 total time= 2.6s [CV 4/5] END ......................max_depth=92;, score=0.878 total time= 2.2s [CV 5/5] END ......................max_depth=92;, score=0.888 total time= 3.0s [CV 1/5] END ......................max_depth=93;, score=0.879 total time= 3.1s [CV 2/5] END ......................max_depth=93;, score=0.873 total time= 3.1s [CV 3/5] END ......................max_depth=93;, score=0.877 total time= 2.7s [CV 4/5] END ......................max_depth=93;, score=0.881 total time= 2.6s [CV 5/5] END ......................max_depth=93;, score=0.885 total time= 2.8s [CV 1/5] END ......................max_depth=94;, score=0.878 total time= 3.1s [CV 2/5] END ......................max_depth=94;, score=0.869 total time= 2.8s [CV 3/5] END ......................max_depth=94;, score=0.877 total time= 2.5s [CV 4/5] END ......................max_depth=94;, score=0.877 total time= 2.6s [CV 5/5] END ......................max_depth=94;, score=0.884 total time= 2.8s [CV 1/5] END ......................max_depth=95;, score=0.880 total time= 2.9s [CV 2/5] END ......................max_depth=95;, score=0.871 total time= 2.9s [CV 3/5] END ......................max_depth=95;, score=0.878 total time= 2.5s [CV 4/5] END ......................max_depth=95;, score=0.879 total time= 2.2s [CV 5/5] END ......................max_depth=95;, score=0.886 total time= 2.4s [CV 1/5] END ......................max_depth=96;, score=0.880 total time= 2.8s [CV 2/5] END ......................max_depth=96;, score=0.871 total time= 2.9s [CV 3/5] END ......................max_depth=96;, score=0.878 total time= 2.8s [CV 4/5] END ......................max_depth=96;, score=0.880 total time= 2.6s [CV 5/5] END ......................max_depth=96;, score=0.886 total time= 2.4s [CV 1/5] END ......................max_depth=97;, score=0.879 total time= 2.4s [CV 2/5] END ......................max_depth=97;, score=0.870 total time= 2.8s [CV 3/5] END ......................max_depth=97;, score=0.877 total time= 3.0s [CV 4/5] END ......................max_depth=97;, score=0.879 total time= 2.5s [CV 5/5] END ......................max_depth=97;, score=0.884 total time= 3.1s [CV 1/5] END ......................max_depth=98;, score=0.877 total time= 2.9s [CV 2/5] END ......................max_depth=98;, score=0.872 total time= 2.9s [CV 3/5] END ......................max_depth=98;, score=0.878 total time= 3.2s [CV 4/5] END ......................max_depth=98;, score=0.876 total time= 2.6s [CV 5/5] END ......................max_depth=98;, score=0.884 total time= 2.9s [CV 1/5] END ......................max_depth=99;, score=0.881 total time= 2.9s [CV 2/5] END ......................max_depth=99;, score=0.869 total time= 2.6s [CV 3/5] END ......................max_depth=99;, score=0.875 total time= 2.7s [CV 4/5] END ......................max_depth=99;, score=0.875 total time= 3.4s [CV 5/5] END ......................max_depth=99;, score=0.887 total time= 2.7s
GridSearchCV(estimator=DecisionTreeClassifier(random_state=0), param_grid={'max_depth': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, ...]}, verbose=3)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(estimator=DecisionTreeClassifier(random_state=0), param_grid={'max_depth': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, ...]}, verbose=3)
DecisionTreeClassifier(random_state=0)
DecisionTreeClassifier(random_state=0)
[CV 3/5] END ......................max_depth=83;, score=0.878 total time= 4.0s [CV 4/5] END ......................max_depth=83;, score=0.880 total time= 4.1s [CV 5/5] END ......................max_depth=83;, score=0.884 total time= 4.0s [CV 1/5] END ......................max_depth=84;, score=0.878 total time= 4.2s [CV 2/5] END ......................max_depth=84;, score=0.871 total time= 5.3s [CV 3/5] END ......................max_depth=84;, score=0.879 total time= 4.3s [CV 4/5] END ......................max_depth=84;, score=0.876 total time= 4.8s [CV 5/5] END ......................max_depth=84;, score=0.882 total time= 4.9s [CV 1/5] END ......................max_depth=85;, score=0.879 total time= 3.8s [CV 2/5] END ......................max_depth=85;, score=0.869 total time= 5.4s [CV 3/5] END ......................max_depth=85;, score=0.881 total time= 4.2s [CV 4/5] END ......................max_depth=85;, score=0.878 total time= 3.6s [CV 5/5] END ......................max_depth=85;, score=0.884 total time= 3.4s [CV 1/5] END ......................max_depth=86;, score=0.881 total time= 3.6s [CV 2/5] END ......................max_depth=86;, score=0.872 total time= 3.4s [CV 3/5] END ......................max_depth=86;, score=0.877 total time= 4.1s [CV 4/5] END ......................max_depth=86;, score=0.881 total time= 3.4s [CV 5/5] END ......................max_depth=86;, score=0.886 total time= 3.6s [CV 1/5] END ......................max_depth=87;, score=0.880 total time= 3.8s [CV 2/5] END ......................max_depth=87;, score=0.875 total time= 4.1s [CV 3/5] END ......................max_depth=87;, score=0.876 total time= 3.6s [CV 4/5] END ......................max_depth=87;, score=0.881 total time= 3.4s [CV 5/5] END ......................max_depth=87;, score=0.885 total time= 4.0s [CV 1/5] END ......................max_depth=88;, score=0.880 total time= 4.2s [CV 2/5] END ......................max_depth=88;, score=0.871 total time= 3.7s [CV 3/5] END ......................max_depth=88;, score=0.879 total time= 3.7s [CV 4/5] END ......................max_depth=88;, score=0.880 total time= 3.5s [CV 5/5] END ......................max_depth=88;, score=0.885 total time= 4.4s [CV 1/5] END ......................max_depth=89;, score=0.881 total time= 3.9s [CV 2/5] END ......................max_depth=89;, score=0.873 total time= 3.6s [CV 3/5] END ......................max_depth=89;, score=0.877 total time= 3.8s [CV 4/5] END ......................max_depth=89;, score=0.884 total time= 3.9s [CV 5/5] END ......................max_depth=89;, score=0.885 total time= 4.2s [CV 1/5] END ......................max_depth=90;, score=0.880 total time= 3.7s [CV 2/5] END ......................max_depth=90;, score=0.870 total time= 3.5s [CV 3/5] END ......................max_depth=90;, score=0.875 total time= 3.6s [CV 4/5] END ......................max_depth=90;, score=0.882 total time= 3.4s [CV 5/5] END ......................max_depth=90;, score=0.884 total time= 3.5s [CV 1/5] END ......................max_depth=91;, score=0.883 total time= 3.7s [CV 2/5] END ......................max_depth=91;, score=0.870 total time= 3.3s [CV 3/5] END ......................max_depth=91;, score=0.879 total time= 4.3s [CV 4/5] END ......................max_depth=91;, score=0.879 total time= 3.8s [CV 5/5] END ......................max_depth=91;, score=0.884 total time= 3.7s [CV 1/5] END ......................max_depth=92;, score=0.878 total time= 3.7s [CV 2/5] END ......................max_depth=92;, score=0.875 total time= 4.4s [CV 3/5] END ......................max_depth=92;, score=0.877 total time= 5.1s [CV 4/5] END ......................max_depth=92;, score=0.878 total time= 4.2s [CV 5/5] END ......................max_depth=92;, score=0.888 total time= 3.7s [CV 1/5] END ......................max_depth=93;, score=0.879 total time= 3.2s [CV 2/5] END ......................max_depth=93;, score=0.873 total time= 4.3s [CV 3/5] END ......................max_depth=93;, score=0.877 total time= 3.5s [CV 4/5] END ......................max_depth=93;, score=0.881 total time= 3.8s [CV 5/5] END ......................max_depth=93;, score=0.885 total time= 3.6s [CV 1/5] END ......................max_depth=94;, score=0.878 total time= 4.0s [CV 2/5] END ......................max_depth=94;, score=0.869 total time= 3.7s [CV 3/5] END ......................max_depth=94;, score=0.877 total time= 3.7s [CV 4/5] END ......................max_depth=94;, score=0.877 total time= 4.0s [CV 5/5] END ......................max_depth=94;, score=0.884 total time= 3.8s [CV 1/5] END ......................max_depth=95;, score=0.880 total time= 3.8s [CV 2/5] END ......................max_depth=95;, score=0.871 total time= 3.7s [CV 3/5] END ......................max_depth=95;, score=0.878 total time= 3.9s [CV 4/5] END ......................max_depth=95;, score=0.879 total time= 3.8s [CV 5/5] END ......................max_depth=95;, score=0.886 total time= 4.1s [CV 1/5] END ......................max_depth=96;, score=0.880 total time= 4.1s [CV 2/5] END ......................max_depth=96;, score=0.871 total time= 3.5s [CV 3/5] END ......................max_depth=96;, score=0.878 total time= 3.5s [CV 4/5] END ......................max_depth=96;, score=0.880 total time= 3.6s [CV 5/5] END ......................max_depth=96;, score=0.886 total time= 4.3s [CV 1/5] END ......................max_depth=97;, score=0.879 total time= 4.5s [CV 2/5] END ......................max_depth=97;, score=0.870 total time= 3.6s [CV 3/5] END ......................max_depth=97;, score=0.877 total time= 4.1s [CV 4/5] END ......................max_depth=97;, score=0.879 total time= 3.7s [CV 5/5] END ......................max_depth=97;, score=0.884 total time= 3.9s [CV 1/5] END ......................max_depth=98;, score=0.877 total time= 3.3s [CV 2/5] END ......................max_depth=98;, score=0.872 total time= 3.9s [CV 3/5] END ......................max_depth=98;, score=0.878 total time= 3.7s [CV 4/5] END ......................max_depth=98;, score=0.876 total time= 4.0s [CV 5/5] END ......................max_depth=98;, score=0.884 total time= 4.3s [CV 1/5] END ......................max_depth=99;, score=0.881 total time= 4.5s [CV 2/5] END ......................max_depth=99;, score=0.869 total time= 4.1s [CV 3/5] END ......................max_depth=99;, score=0.875 total time= 3.5s [CV 4/5] END ......................max_depth=99;, score=0.875 total time= 4.0s [CV 5/5] END ......................max_depth=99;, score=0.887 total time= 4.3s
GridSearchCV(estimator=DecisionTreeClassifier(random_state=0), param_grid={'max_depth': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, ...]}, verbose=3)
gs.best_estimator_
DecisionTreeClassifier(max_depth=65, random_state=0)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
DecisionTreeClassifier(max_depth=65, random_state=0)
min_samples_split¶
This parameter takes Integer or float value where minimum default value is 2. This parameter decides the least number of samples needed to split an internal node. Consider min_samples_split as the minimum number if int. If float, then min_samples_split is a fraction and ceil (min_samples_split * n_samples) is the minimum number of samples for each split. GridSearchCV is used to select the best value of this parameter, first integer values in the range of 2 to 10 and the then floating-point value in the range of 0.1 to 1 is tested. With GridSearchCV for integer value between 2 to 10, the best parameter value is 2 and for floating-point value the best value is found to be 0.1.
gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
max_depth=65, random_state=0),
param_grid={'min_samples_split': list(range(2, 10))},
verbose=3)
gs.fit(X_train, y_train)
Fitting 5 folds for each of 8 candidates, totalling 40 fits [CV 1/5] END ...............min_samples_split=2;, score=0.884 total time= 2.3s [CV 2/5] END ...............min_samples_split=2;, score=0.875 total time= 2.7s [CV 3/5] END ...............min_samples_split=2;, score=0.878 total time= 2.5s [CV 4/5] END ...............min_samples_split=2;, score=0.880 total time= 2.5s [CV 5/5] END ...............min_samples_split=2;, score=0.887 total time= 2.8s [CV 1/5] END ...............min_samples_split=3;, score=0.882 total time= 2.4s [CV 2/5] END ...............min_samples_split=3;, score=0.872 total time= 2.2s [CV 3/5] END ...............min_samples_split=3;, score=0.880 total time= 2.0s [CV 4/5] END ...............min_samples_split=3;, score=0.880 total time= 2.3s [CV 5/5] END ...............min_samples_split=3;, score=0.882 total time= 2.7s [CV 1/5] END ...............min_samples_split=4;, score=0.878 total time= 2.6s [CV 2/5] END ...............min_samples_split=4;, score=0.876 total time= 2.2s [CV 3/5] END ...............min_samples_split=4;, score=0.875 total time= 2.1s [CV 4/5] END ...............min_samples_split=4;, score=0.882 total time= 2.3s [CV 5/5] END ...............min_samples_split=4;, score=0.885 total time= 2.4s [CV 1/5] END ...............min_samples_split=5;, score=0.877 total time= 2.0s [CV 2/5] END ...............min_samples_split=5;, score=0.872 total time= 2.5s [CV 3/5] END ...............min_samples_split=5;, score=0.879 total time= 2.3s [CV 4/5] END ...............min_samples_split=5;, score=0.881 total time= 1.9s [CV 5/5] END ...............min_samples_split=5;, score=0.886 total time= 1.9s [CV 1/5] END ...............min_samples_split=6;, score=0.882 total time= 2.3s [CV 2/5] END ...............min_samples_split=6;, score=0.874 total time= 1.8s [CV 3/5] END ...............min_samples_split=6;, score=0.878 total time= 1.8s [CV 4/5] END ...............min_samples_split=6;, score=0.880 total time= 2.2s [CV 5/5] END ...............min_samples_split=6;, score=0.881 total time= 2.0s [CV 1/5] END ...............min_samples_split=7;, score=0.882 total time= 2.2s [CV 2/5] END ...............min_samples_split=7;, score=0.873 total time= 2.0s [CV 3/5] END ...............min_samples_split=7;, score=0.878 total time= 1.8s [CV 4/5] END ...............min_samples_split=7;, score=0.882 total time= 1.8s [CV 5/5] END ...............min_samples_split=7;, score=0.885 total time= 1.8s [CV 1/5] END ...............min_samples_split=8;, score=0.881 total time= 1.8s [CV 2/5] END ...............min_samples_split=8;, score=0.875 total time= 2.1s [CV 3/5] END ...............min_samples_split=8;, score=0.875 total time= 1.8s [CV 4/5] END ...............min_samples_split=8;, score=0.877 total time= 1.8s [CV 5/5] END ...............min_samples_split=8;, score=0.885 total time= 1.8s [CV 1/5] END ...............min_samples_split=9;, score=0.884 total time= 2.0s [CV 2/5] END ...............min_samples_split=9;, score=0.874 total time= 1.7s [CV 3/5] END ...............min_samples_split=9;, score=0.880 total time= 1.8s [CV 4/5] END ...............min_samples_split=9;, score=0.882 total time= 1.8s [CV 5/5] END ...............min_samples_split=9;, score=0.883 total time= 2.1s
GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, random_state=0), param_grid={'min_samples_split': [2, 3, 4, 5, 6, 7, 8, 9]}, verbose=3)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, random_state=0), param_grid={'min_samples_split': [2, 3, 4, 5, 6, 7, 8, 9]}, verbose=3)
DecisionTreeClassifier(max_depth=65, random_state=0)
DecisionTreeClassifier(max_depth=65, random_state=0)
gs.best_estimator_
DecisionTreeClassifier(max_depth=65, random_state=0)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
DecisionTreeClassifier(max_depth=65, random_state=0)
gs.best_params_
{'min_samples_split': 2}
gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
max_depth=65, random_state=0),
param_grid={'min_samples_split': [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]},
verbose=3)
gs.fit(X_train, y_train)
Fitting 5 folds for each of 9 candidates, totalling 45 fits [CV 1/5] END .............min_samples_split=0.1;, score=0.890 total time= 1.1s [CV 2/5] END .............min_samples_split=0.1;, score=0.883 total time= 1.0s [CV 3/5] END .............min_samples_split=0.1;, score=0.887 total time= 1.0s [CV 4/5] END .............min_samples_split=0.1;, score=0.894 total time= 1.1s [CV 5/5] END .............min_samples_split=0.1;, score=0.892 total time= 1.0s [CV 1/5] END .............min_samples_split=0.2;, score=0.883 total time= 0.9s [CV 2/5] END .............min_samples_split=0.2;, score=0.878 total time= 0.9s [CV 3/5] END .............min_samples_split=0.2;, score=0.882 total time= 0.9s [CV 4/5] END .............min_samples_split=0.2;, score=0.887 total time= 0.9s [CV 5/5] END .............min_samples_split=0.2;, score=0.888 total time= 0.9s [CV 1/5] END .............min_samples_split=0.3;, score=0.830 total time= 0.7s [CV 2/5] END .............min_samples_split=0.3;, score=0.828 total time= 0.7s [CV 3/5] END .............min_samples_split=0.3;, score=0.841 total time= 0.9s [CV 4/5] END .............min_samples_split=0.3;, score=0.840 total time= 0.5s [CV 5/5] END .............min_samples_split=0.3;, score=0.828 total time= 0.6s [CV 1/5] END .............min_samples_split=0.4;, score=0.778 total time= 0.4s [CV 2/5] END .............min_samples_split=0.4;, score=0.773 total time= 0.4s [CV 3/5] END .............min_samples_split=0.4;, score=0.775 total time= 0.4s [CV 4/5] END .............min_samples_split=0.4;, score=0.773 total time= 0.4s [CV 5/5] END .............min_samples_split=0.4;, score=0.774 total time= 0.4s [CV 1/5] END .............min_samples_split=0.5;, score=0.775 total time= 0.4s [CV 2/5] END .............min_samples_split=0.5;, score=0.775 total time= 0.4s [CV 3/5] END .............min_samples_split=0.5;, score=0.775 total time= 0.4s [CV 4/5] END .............min_samples_split=0.5;, score=0.775 total time= 0.4s [CV 5/5] END .............min_samples_split=0.5;, score=0.775 total time= 0.4s [CV 1/5] END .............min_samples_split=0.6;, score=0.775 total time= 0.3s [CV 2/5] END .............min_samples_split=0.6;, score=0.775 total time= 0.4s [CV 3/5] END .............min_samples_split=0.6;, score=0.775 total time= 0.4s [CV 4/5] END .............min_samples_split=0.6;, score=0.775 total time= 0.3s [CV 5/5] END .............min_samples_split=0.6;, score=0.775 total time= 0.4s [CV 1/5] END .............min_samples_split=0.7;, score=0.775 total time= 0.3s [CV 2/5] END .............min_samples_split=0.7;, score=0.775 total time= 0.3s [CV 3/5] END .............min_samples_split=0.7;, score=0.775 total time= 0.3s [CV 4/5] END .............min_samples_split=0.7;, score=0.775 total time= 0.3s [CV 5/5] END .............min_samples_split=0.7;, score=0.775 total time= 0.3s [CV 1/5] END .............min_samples_split=0.8;, score=0.775 total time= 0.4s [CV 2/5] END .............min_samples_split=0.8;, score=0.775 total time= 0.3s [CV 3/5] END .............min_samples_split=0.8;, score=0.775 total time= 0.3s [CV 4/5] END .............min_samples_split=0.8;, score=0.775 total time= 0.3s [CV 5/5] END .............min_samples_split=0.8;, score=0.775 total time= 0.3s [CV 1/5] END .............min_samples_split=0.9;, score=0.775 total time= 0.4s [CV 2/5] END .............min_samples_split=0.9;, score=0.775 total time= 0.4s [CV 3/5] END .............min_samples_split=0.9;, score=0.775 total time= 0.4s [CV 4/5] END .............min_samples_split=0.9;, score=0.775 total time= 0.4s [CV 5/5] END .............min_samples_split=0.9;, score=0.775 total time= 0.6s
GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, random_state=0), param_grid={'min_samples_split': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}, verbose=3)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, random_state=0), param_grid={'min_samples_split': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}, verbose=3)
DecisionTreeClassifier(max_depth=65, random_state=0)
DecisionTreeClassifier(max_depth=65, random_state=0)
gs.best_estimator_
DecisionTreeClassifier(max_depth=65, min_samples_split=0.1, random_state=0)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
DecisionTreeClassifier(max_depth=65, min_samples_split=0.1, random_state=0)
gs.best_params_
{'min_samples_split': 0.1}
min_samples_leaf¶
This parameter also takes both integer and floating-point value where the default value is 1. The value of this param decides the bare minimum of samples needed at a leaf node. Only split points that leave at least min_samples_leaf training samples in both the left and right branches will be taken into consideration. If int, then take min_samples_leaf as the lowest value. The minimum number of samples for each node is ceil (min_samples_leaf * n_samples), and the fraction min_samples_leaf is a fraction if the data type is float. GridSearchCV is used to select the best value of this parameter, first integer values in the range of 1 to 10 and the then floating-point value in the range of 0.1 to 1 is tested. With GridSearchCV for integer value between 1 to 10, the best parameter value is 4 and for floating-point value the best value is found to be 0.1
gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
max_depth=65,min_samples_split=0.1, random_state=0),
param_grid={'min_samples_leaf': list(range(1, 10))},
verbose=3)
gs.fit(X_train, y_train)
Fitting 5 folds for each of 9 candidates, totalling 45 fits [CV 1/5] END ................min_samples_leaf=1;, score=0.890 total time= 1.3s [CV 2/5] END ................min_samples_leaf=1;, score=0.883 total time= 1.2s [CV 3/5] END ................min_samples_leaf=1;, score=0.887 total time= 1.1s [CV 4/5] END ................min_samples_leaf=1;, score=0.894 total time= 1.1s [CV 5/5] END ................min_samples_leaf=1;, score=0.892 total time= 1.1s [CV 1/5] END ................min_samples_leaf=2;, score=0.888 total time= 1.0s [CV 2/5] END ................min_samples_leaf=2;, score=0.884 total time= 1.0s [CV 3/5] END ................min_samples_leaf=2;, score=0.883 total time= 1.0s [CV 4/5] END ................min_samples_leaf=2;, score=0.894 total time= 1.0s [CV 5/5] END ................min_samples_leaf=2;, score=0.891 total time= 1.0s [CV 1/5] END ................min_samples_leaf=3;, score=0.890 total time= 1.1s [CV 2/5] END ................min_samples_leaf=3;, score=0.884 total time= 1.0s [CV 3/5] END ................min_samples_leaf=3;, score=0.888 total time= 1.4s [CV 4/5] END ................min_samples_leaf=3;, score=0.897 total time= 1.0s [CV 5/5] END ................min_samples_leaf=3;, score=0.892 total time= 1.0s [CV 1/5] END ................min_samples_leaf=4;, score=0.890 total time= 1.0s [CV 2/5] END ................min_samples_leaf=4;, score=0.884 total time= 0.9s [CV 3/5] END ................min_samples_leaf=4;, score=0.889 total time= 1.0s [CV 4/5] END ................min_samples_leaf=4;, score=0.897 total time= 0.9s [CV 5/5] END ................min_samples_leaf=4;, score=0.892 total time= 1.0s [CV 1/5] END ................min_samples_leaf=5;, score=0.890 total time= 1.1s [CV 2/5] END ................min_samples_leaf=5;, score=0.884 total time= 0.9s [CV 3/5] END ................min_samples_leaf=5;, score=0.888 total time= 0.9s [CV 4/5] END ................min_samples_leaf=5;, score=0.896 total time= 1.0s [CV 5/5] END ................min_samples_leaf=5;, score=0.892 total time= 1.0s [CV 1/5] END ................min_samples_leaf=6;, score=0.889 total time= 1.0s [CV 2/5] END ................min_samples_leaf=6;, score=0.884 total time= 1.2s [CV 3/5] END ................min_samples_leaf=6;, score=0.887 total time= 1.0s [CV 4/5] END ................min_samples_leaf=6;, score=0.896 total time= 0.9s [CV 5/5] END ................min_samples_leaf=6;, score=0.892 total time= 0.9s [CV 1/5] END ................min_samples_leaf=7;, score=0.888 total time= 0.9s [CV 2/5] END ................min_samples_leaf=7;, score=0.883 total time= 0.8s [CV 3/5] END ................min_samples_leaf=7;, score=0.887 total time= 0.9s [CV 4/5] END ................min_samples_leaf=7;, score=0.896 total time= 0.9s [CV 5/5] END ................min_samples_leaf=7;, score=0.892 total time= 0.9s [CV 1/5] END ................min_samples_leaf=8;, score=0.888 total time= 0.9s [CV 2/5] END ................min_samples_leaf=8;, score=0.884 total time= 0.8s [CV 3/5] END ................min_samples_leaf=8;, score=0.886 total time= 0.9s [CV 4/5] END ................min_samples_leaf=8;, score=0.894 total time= 0.8s [CV 5/5] END ................min_samples_leaf=8;, score=0.891 total time= 0.9s [CV 1/5] END ................min_samples_leaf=9;, score=0.888 total time= 0.8s [CV 2/5] END ................min_samples_leaf=9;, score=0.882 total time= 0.8s [CV 3/5] END ................min_samples_leaf=9;, score=0.885 total time= 1.1s [CV 4/5] END ................min_samples_leaf=9;, score=0.894 total time= 0.8s [CV 5/5] END ................min_samples_leaf=9;, score=0.891 total time= 0.8s
GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, min_samples_split=0.1, random_state=0), param_grid={'min_samples_leaf': [1, 2, 3, 4, 5, 6, 7, 8, 9]}, verbose=3)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, min_samples_split=0.1, random_state=0), param_grid={'min_samples_leaf': [1, 2, 3, 4, 5, 6, 7, 8, 9]}, verbose=3)
DecisionTreeClassifier(max_depth=65, min_samples_split=0.1, random_state=0)
DecisionTreeClassifier(max_depth=65, min_samples_split=0.1, random_state=0)
gs.best_params_
{'min_samples_leaf': 4}
gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
max_depth=65,min_samples_split=0.1, random_state=0),
param_grid={'min_samples_leaf': [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]},
verbose=3)
gs.fit(X_train, y_train)
Fitting 5 folds for each of 9 candidates, totalling 45 fits [CV 1/5] END ..............min_samples_leaf=0.1;, score=0.775 total time= 0.5s [CV 2/5] END ..............min_samples_leaf=0.1;, score=0.775 total time= 0.4s [CV 3/5] END ..............min_samples_leaf=0.1;, score=0.775 total time= 0.4s [CV 4/5] END ..............min_samples_leaf=0.1;, score=0.775 total time= 0.4s [CV 5/5] END ..............min_samples_leaf=0.1;, score=0.775 total time= 0.4s [CV 1/5] END ..............min_samples_leaf=0.2;, score=0.775 total time= 0.4s [CV 2/5] END ..............min_samples_leaf=0.2;, score=0.775 total time= 0.4s [CV 3/5] END ..............min_samples_leaf=0.2;, score=0.775 total time= 0.4s [CV 4/5] END ..............min_samples_leaf=0.2;, score=0.775 total time= 0.4s [CV 5/5] END ..............min_samples_leaf=0.2;, score=0.775 total time= 0.4s [CV 1/5] END ..............min_samples_leaf=0.3;, score=0.775 total time= 0.3s [CV 2/5] END ..............min_samples_leaf=0.3;, score=0.775 total time= 0.4s [CV 3/5] END ..............min_samples_leaf=0.3;, score=0.775 total time= 0.4s [CV 4/5] END ..............min_samples_leaf=0.3;, score=0.775 total time= 0.4s [CV 5/5] END ..............min_samples_leaf=0.3;, score=0.775 total time= 0.3s [CV 1/5] END ..............min_samples_leaf=0.4;, score=0.775 total time= 0.3s [CV 2/5] END ..............min_samples_leaf=0.4;, score=0.775 total time= 0.4s [CV 3/5] END ..............min_samples_leaf=0.4;, score=0.775 total time= 0.3s [CV 4/5] END ..............min_samples_leaf=0.4;, score=0.775 total time= 0.3s [CV 5/5] END ..............min_samples_leaf=0.4;, score=0.775 total time= 0.3s [CV 1/5] END ..............min_samples_leaf=0.5;, score=0.775 total time= 0.4s [CV 2/5] END ..............min_samples_leaf=0.5;, score=0.775 total time= 0.3s [CV 3/5] END ..............min_samples_leaf=0.5;, score=0.775 total time= 0.3s [CV 4/5] END ..............min_samples_leaf=0.5;, score=0.775 total time= 0.0s [CV 5/5] END ..............min_samples_leaf=0.5;, score=0.775 total time= 0.0s [CV 1/5] END ..............min_samples_leaf=0.6;, score=0.775 total time= 0.0s [CV 2/5] END ..............min_samples_leaf=0.6;, score=0.775 total time= 0.0s [CV 3/5] END ..............min_samples_leaf=0.6;, score=0.775 total time= 0.0s [CV 4/5] END ..............min_samples_leaf=0.6;, score=0.775 total time= 0.0s [CV 5/5] END ..............min_samples_leaf=0.6;, score=0.775 total time= 0.0s [CV 1/5] END ..............min_samples_leaf=0.7;, score=0.775 total time= 0.0s [CV 2/5] END ..............min_samples_leaf=0.7;, score=0.775 total time= 0.0s [CV 3/5] END ..............min_samples_leaf=0.7;, score=0.775 total time= 0.0s [CV 4/5] END ..............min_samples_leaf=0.7;, score=0.775 total time= 0.0s [CV 5/5] END ..............min_samples_leaf=0.7;, score=0.775 total time= 0.0s [CV 1/5] END ..............min_samples_leaf=0.8;, score=0.775 total time= 0.0s [CV 2/5] END ..............min_samples_leaf=0.8;, score=0.775 total time= 0.0s [CV 3/5] END ..............min_samples_leaf=0.8;, score=0.775 total time= 0.0s [CV 4/5] END ..............min_samples_leaf=0.8;, score=0.775 total time= 0.0s [CV 5/5] END ..............min_samples_leaf=0.8;, score=0.775 total time= 0.0s [CV 1/5] END ..............min_samples_leaf=0.9;, score=0.775 total time= 0.0s [CV 2/5] END ..............min_samples_leaf=0.9;, score=0.775 total time= 0.0s [CV 3/5] END ..............min_samples_leaf=0.9;, score=0.775 total time= 0.0s [CV 4/5] END ..............min_samples_leaf=0.9;, score=0.775 total time= 0.0s [CV 5/5] END ..............min_samples_leaf=0.9;, score=0.775 total time= 0.0s
GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, min_samples_split=0.1, random_state=0), param_grid={'min_samples_leaf': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}, verbose=3)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, min_samples_split=0.1, random_state=0), param_grid={'min_samples_leaf': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}, verbose=3)
DecisionTreeClassifier(max_depth=65, min_samples_split=0.1, random_state=0)
DecisionTreeClassifier(max_depth=65, min_samples_split=0.1, random_state=0)
gs.best_params_
{'min_samples_leaf': 0.1}
max_features¶
This parameter values can be int, float or "auto," "sqrt," and "log2", where the default value is None. For integer value the number of features to take into account when choosing the best split is max_features while splitting.
In the event that floating point value is provided, max_features are a fraction and max (1, int (max_features * n_features_in_)) features are taken into account at each split. If "auto," max_features are equal to sqrt(n_features). Max_features = sqrt(n_features) if "sqrt" is true. Max_features = log2(n_features) if "log2". If None, max_features=n_features. Manual search is used for testing the values None, "auto," "sqrt," and "log2". GridSearchCV is used to select the best value of this parameter among integer values in the range of 1 to 10 and for floating-point value in the range of 0.1 to 1 is tested. With GridSearchCV for integer value between 1 to 10, the best parameter value is 9 and for floating-point value the best value is found to be 0.9.
gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
max_depth=65, min_samples_split=0.1,
min_samples_leaf=4,random_state=0),
param_grid={'max_features': list(range(1, 10))},
verbose=3)
gs.fit(X_train, y_train)
Fitting 5 folds for each of 9 candidates, totalling 45 fits [CV 1/5] END ....................max_features=1;, score=0.775 total time= 0.0s [CV 2/5] END ....................max_features=1;, score=0.775 total time= 0.0s [CV 3/5] END ....................max_features=1;, score=0.775 total time= 0.0s [CV 4/5] END ....................max_features=1;, score=0.775 total time= 0.0s [CV 5/5] END ....................max_features=1;, score=0.775 total time= 0.0s [CV 1/5] END ....................max_features=2;, score=0.775 total time= 0.0s [CV 2/5] END ....................max_features=2;, score=0.775 total time= 0.0s [CV 3/5] END ....................max_features=2;, score=0.775 total time= 0.0s [CV 4/5] END ....................max_features=2;, score=0.775 total time= 0.0s [CV 5/5] END ....................max_features=2;, score=0.775 total time= 0.0s [CV 1/5] END ....................max_features=3;, score=0.775 total time= 0.0s [CV 2/5] END ....................max_features=3;, score=0.775 total time= 0.0s [CV 3/5] END ....................max_features=3;, score=0.775 total time= 0.0s [CV 4/5] END ....................max_features=3;, score=0.775 total time= 0.0s [CV 5/5] END ....................max_features=3;, score=0.775 total time= 0.0s [CV 1/5] END ....................max_features=4;, score=0.775 total time= 0.0s [CV 2/5] END ....................max_features=4;, score=0.775 total time= 0.0s [CV 3/5] END ....................max_features=4;, score=0.775 total time= 0.0s [CV 4/5] END ....................max_features=4;, score=0.775 total time= 0.0s [CV 5/5] END ....................max_features=4;, score=0.775 total time= 0.0s [CV 1/5] END ....................max_features=5;, score=0.775 total time= 0.0s [CV 2/5] END ....................max_features=5;, score=0.775 total time= 0.0s [CV 3/5] END ....................max_features=5;, score=0.775 total time= 0.0s [CV 4/5] END ....................max_features=5;, score=0.775 total time= 0.0s [CV 5/5] END ....................max_features=5;, score=0.775 total time= 0.0s [CV 1/5] END ....................max_features=6;, score=0.775 total time= 0.0s [CV 2/5] END ....................max_features=6;, score=0.775 total time= 0.0s [CV 3/5] END ....................max_features=6;, score=0.775 total time= 0.0s [CV 4/5] END ....................max_features=6;, score=0.775 total time= 0.0s [CV 5/5] END ....................max_features=6;, score=0.775 total time= 0.0s [CV 1/5] END ....................max_features=7;, score=0.775 total time= 0.0s [CV 2/5] END ....................max_features=7;, score=0.775 total time= 0.0s [CV 3/5] END ....................max_features=7;, score=0.775 total time= 0.0s [CV 4/5] END ....................max_features=7;, score=0.775 total time= 0.0s [CV 5/5] END ....................max_features=7;, score=0.775 total time= 0.0s [CV 1/5] END ....................max_features=8;, score=0.775 total time= 0.0s [CV 2/5] END ....................max_features=8;, score=0.775 total time= 0.0s [CV 3/5] END ....................max_features=8;, score=0.775 total time= 0.0s [CV 4/5] END ....................max_features=8;, score=0.775 total time= 0.0s [CV 5/5] END ....................max_features=8;, score=0.775 total time= 0.0s [CV 1/5] END ....................max_features=9;, score=0.775 total time= 0.0s [CV 2/5] END ....................max_features=9;, score=0.775 total time= 0.0s [CV 3/5] END ....................max_features=9;, score=0.775 total time= 0.0s [CV 4/5] END ....................max_features=9;, score=0.775 total time= 0.0s [CV 5/5] END ....................max_features=9;, score=0.775 total time= 0.0s
GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0), param_grid={'max_features': [1, 2, 3, 4, 5, 6, 7, 8, 9]}, verbose=3)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0), param_grid={'max_features': [1, 2, 3, 4, 5, 6, 7, 8, 9]}, verbose=3)
DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0)
DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0)
gs.best_params_
{'max_features': 9}
gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
max_depth=65, min_samples_split=0.1,
min_samples_leaf=4,random_state=0),
param_grid={'max_features': [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]},
verbose=3)
gs.fit(X_train, y_train)
Fitting 5 folds for each of 9 candidates, totalling 45 fits [CV 1/5] END ..................max_features=0.1;, score=0.879 total time= 0.6s [CV 2/5] END ..................max_features=0.1;, score=0.871 total time= 0.8s [CV 3/5] END ..................max_features=0.1;, score=0.855 total time= 0.8s [CV 4/5] END ..................max_features=0.1;, score=0.864 total time= 0.6s [CV 5/5] END ..................max_features=0.1;, score=0.830 total time= 0.5s [CV 1/5] END ..................max_features=0.2;, score=0.839 total time= 0.8s [CV 2/5] END ..................max_features=0.2;, score=0.854 total time= 1.0s [CV 3/5] END ..................max_features=0.2;, score=0.868 total time= 0.9s [CV 4/5] END ..................max_features=0.2;, score=0.841 total time= 0.9s [CV 5/5] END ..................max_features=0.2;, score=0.839 total time= 0.8s [CV 1/5] END ..................max_features=0.3;, score=0.881 total time= 1.2s [CV 2/5] END ..................max_features=0.3;, score=0.849 total time= 0.9s [CV 3/5] END ..................max_features=0.3;, score=0.851 total time= 1.1s [CV 4/5] END ..................max_features=0.3;, score=0.869 total time= 1.3s [CV 5/5] END ..................max_features=0.3;, score=0.879 total time= 1.1s [CV 1/5] END ..................max_features=0.4;, score=0.882 total time= 1.3s [CV 2/5] END ..................max_features=0.4;, score=0.875 total time= 1.6s [CV 3/5] END ..................max_features=0.4;, score=0.887 total time= 1.3s [CV 4/5] END ..................max_features=0.4;, score=0.881 total time= 1.1s [CV 5/5] END ..................max_features=0.4;, score=0.876 total time= 1.1s [CV 1/5] END ..................max_features=0.5;, score=0.883 total time= 1.6s [CV 2/5] END ..................max_features=0.5;, score=0.872 total time= 1.3s [CV 3/5] END ..................max_features=0.5;, score=0.877 total time= 1.4s [CV 4/5] END ..................max_features=0.5;, score=0.896 total time= 1.2s [CV 5/5] END ..................max_features=0.5;, score=0.882 total time= 1.3s [CV 1/5] END ..................max_features=0.6;, score=0.882 total time= 1.5s [CV 2/5] END ..................max_features=0.6;, score=0.885 total time= 1.4s [CV 3/5] END ..................max_features=0.6;, score=0.888 total time= 1.6s [CV 4/5] END ..................max_features=0.6;, score=0.890 total time= 1.5s [CV 5/5] END ..................max_features=0.6;, score=0.884 total time= 1.4s [CV 1/5] END ..................max_features=0.7;, score=0.889 total time= 1.5s [CV 2/5] END ..................max_features=0.7;, score=0.884 total time= 1.4s [CV 3/5] END ..................max_features=0.7;, score=0.878 total time= 1.5s [CV 4/5] END ..................max_features=0.7;, score=0.897 total time= 1.4s [CV 5/5] END ..................max_features=0.7;, score=0.892 total time= 1.5s [CV 1/5] END ..................max_features=0.8;, score=0.890 total time= 1.6s [CV 2/5] END ..................max_features=0.8;, score=0.884 total time= 1.6s [CV 3/5] END ..................max_features=0.8;, score=0.879 total time= 2.0s [CV 4/5] END ..................max_features=0.8;, score=0.880 total time= 1.7s [CV 5/5] END ..................max_features=0.8;, score=0.882 total time= 1.6s [CV 1/5] END ..................max_features=0.9;, score=0.890 total time= 1.6s [CV 2/5] END ..................max_features=0.9;, score=0.884 total time= 1.6s [CV 3/5] END ..................max_features=0.9;, score=0.889 total time= 1.6s [CV 4/5] END ..................max_features=0.9;, score=0.897 total time= 1.6s [CV 5/5] END ..................max_features=0.9;, score=0.892 total time= 1.5s
GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0), param_grid={'max_features': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}, verbose=3)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0), param_grid={'max_features': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}, verbose=3)
DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0)
DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0)
gs.best_params_
{'max_features': 0.9}
max_leaf_nodes¶
This parameter takes integer values with None as its default value. With max_leaf_nodes, create a tree using the best-first method. Relative impurity reduction is the definition of best nodes. The number of leaf nodes is limitless if None. GridSearchCV is used to find the best parameter value from the range of 1 to 100. And the best results were obtained using 63 as max_leaf_nodes.
gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
max_depth=65, min_samples_split=0.1,
min_samples_leaf=4,max_features=None,random_state=0),
param_grid={'max_leaf_nodes': list(range(1, 100))},
verbose=3)
gs.fit(X_train, y_train)
Fitting 5 folds for each of 99 candidates, totalling 495 fits [CV 1/5] END ....................max_leaf_nodes=1;, score=nan total time= 0.0s [CV 2/5] END ....................max_leaf_nodes=1;, score=nan total time= 0.0s [CV 3/5] END ....................max_leaf_nodes=1;, score=nan total time= 0.0s [CV 4/5] END ....................max_leaf_nodes=1;, score=nan total time= 0.0s [CV 5/5] END ....................max_leaf_nodes=1;, score=nan total time= 0.0s [CV 1/5] END ..................max_leaf_nodes=2;, score=0.775 total time= 0.8s [CV 2/5] END ..................max_leaf_nodes=2;, score=0.775 total time= 0.8s [CV 3/5] END ..................max_leaf_nodes=2;, score=0.775 total time= 0.9s [CV 4/5] END ..................max_leaf_nodes=2;, score=0.775 total time= 1.0s [CV 5/5] END ..................max_leaf_nodes=2;, score=0.775 total time= 0.8s [CV 1/5] END ..................max_leaf_nodes=3;, score=0.775 total time= 1.0s [CV 2/5] END ..................max_leaf_nodes=3;, score=0.775 total time= 1.0s [CV 3/5] END ..................max_leaf_nodes=3;, score=0.775 total time= 1.0s [CV 4/5] END ..................max_leaf_nodes=3;, score=0.775 total time= 1.0s [CV 5/5] END ..................max_leaf_nodes=3;, score=0.775 total time= 1.0s [CV 1/5] END ..................max_leaf_nodes=4;, score=0.775 total time= 1.3s [CV 2/5] END ..................max_leaf_nodes=4;, score=0.775 total time= 1.2s [CV 3/5] END ..................max_leaf_nodes=4;, score=0.775 total time= 1.3s [CV 4/5] END ..................max_leaf_nodes=4;, score=0.775 total time= 1.3s [CV 5/5] END ..................max_leaf_nodes=4;, score=0.775 total time= 1.2s [CV 1/5] END ..................max_leaf_nodes=5;, score=0.775 total time= 1.7s [CV 2/5] END ..................max_leaf_nodes=5;, score=0.775 total time= 1.6s [CV 3/5] END ..................max_leaf_nodes=5;, score=0.775 total time= 1.4s [CV 4/5] END ..................max_leaf_nodes=5;, score=0.775 total time= 1.5s [CV 5/5] END ..................max_leaf_nodes=5;, score=0.775 total time= 1.4s [CV 1/5] END ..................max_leaf_nodes=6;, score=0.775 total time= 1.7s [CV 2/5] END ..................max_leaf_nodes=6;, score=0.775 total time= 1.6s [CV 3/5] END ..................max_leaf_nodes=6;, score=0.775 total time= 1.6s [CV 4/5] END ..................max_leaf_nodes=6;, score=0.775 total time= 1.6s [CV 5/5] END ..................max_leaf_nodes=6;, score=0.775 total time= 1.6s [CV 1/5] END ..................max_leaf_nodes=7;, score=0.778 total time= 2.2s [CV 2/5] END ..................max_leaf_nodes=7;, score=0.773 total time= 2.0s [CV 3/5] END ..................max_leaf_nodes=7;, score=0.775 total time= 2.3s [CV 4/5] END ..................max_leaf_nodes=7;, score=0.773 total time= 1.9s [CV 5/5] END ..................max_leaf_nodes=7;, score=0.774 total time= 1.7s [CV 1/5] END ..................max_leaf_nodes=8;, score=0.790 total time= 1.9s [CV 2/5] END ..................max_leaf_nodes=8;, score=0.786 total time= 1.9s [CV 3/5] END ..................max_leaf_nodes=8;, score=0.798 total time= 2.0s [CV 4/5] END ..................max_leaf_nodes=8;, score=0.789 total time= 2.1s [CV 5/5] END ..................max_leaf_nodes=8;, score=0.790 total time= 2.2s [CV 1/5] END ..................max_leaf_nodes=9;, score=0.800 total time= 2.2s [CV 2/5] END ..................max_leaf_nodes=9;, score=0.796 total time= 2.0s [CV 3/5] END ..................max_leaf_nodes=9;, score=0.808 total time= 2.0s [CV 4/5] END ..................max_leaf_nodes=9;, score=0.800 total time= 2.2s [CV 5/5] END ..................max_leaf_nodes=9;, score=0.794 total time= 2.0s [CV 1/5] END .................max_leaf_nodes=10;, score=0.809 total time= 2.6s [CV 2/5] END .................max_leaf_nodes=10;, score=0.807 total time= 2.2s [CV 3/5] END .................max_leaf_nodes=10;, score=0.821 total time= 2.2s [CV 4/5] END .................max_leaf_nodes=10;, score=0.812 total time= 2.2s [CV 5/5] END .................max_leaf_nodes=10;, score=0.805 total time= 2.2s [CV 1/5] END .................max_leaf_nodes=11;, score=0.818 total time= 2.3s [CV 2/5] END .................max_leaf_nodes=11;, score=0.817 total time= 2.6s [CV 3/5] END .................max_leaf_nodes=11;, score=0.829 total time= 2.4s [CV 4/5] END .................max_leaf_nodes=11;, score=0.822 total time= 2.4s [CV 5/5] END .................max_leaf_nodes=11;, score=0.816 total time= 2.3s [CV 1/5] END .................max_leaf_nodes=12;, score=0.823 total time= 2.5s [CV 2/5] END .................max_leaf_nodes=12;, score=0.823 total time= 2.6s [CV 3/5] END .................max_leaf_nodes=12;, score=0.835 total time= 2.8s [CV 4/5] END .................max_leaf_nodes=12;, score=0.830 total time= 3.2s [CV 5/5] END .................max_leaf_nodes=12;, score=0.822 total time= 2.6s [CV 1/5] END .................max_leaf_nodes=13;, score=0.831 total time= 2.7s [CV 2/5] END .................max_leaf_nodes=13;, score=0.830 total time= 2.7s [CV 3/5] END .................max_leaf_nodes=13;, score=0.842 total time= 2.8s [CV 4/5] END .................max_leaf_nodes=13;, score=0.839 total time= 3.3s [CV 5/5] END .................max_leaf_nodes=13;, score=0.828 total time= 2.7s [CV 1/5] END .................max_leaf_nodes=14;, score=0.838 total time= 2.8s [CV 2/5] END .................max_leaf_nodes=14;, score=0.837 total time= 2.8s [CV 3/5] END .................max_leaf_nodes=14;, score=0.847 total time= 2.8s [CV 4/5] END .................max_leaf_nodes=14;, score=0.846 total time= 3.2s [CV 5/5] END .................max_leaf_nodes=14;, score=0.840 total time= 2.9s [CV 1/5] END .................max_leaf_nodes=15;, score=0.842 total time= 3.0s [CV 2/5] END .................max_leaf_nodes=15;, score=0.840 total time= 2.9s [CV 3/5] END .................max_leaf_nodes=15;, score=0.852 total time= 3.0s [CV 4/5] END .................max_leaf_nodes=15;, score=0.852 total time= 3.1s [CV 5/5] END .................max_leaf_nodes=15;, score=0.845 total time= 3.3s [CV 1/5] END .................max_leaf_nodes=16;, score=0.847 total time= 3.1s [CV 2/5] END .................max_leaf_nodes=16;, score=0.847 total time= 3.0s [CV 3/5] END .................max_leaf_nodes=16;, score=0.856 total time= 3.1s [CV 4/5] END .................max_leaf_nodes=16;, score=0.855 total time= 3.1s [CV 5/5] END .................max_leaf_nodes=16;, score=0.850 total time= 3.1s [CV 1/5] END .................max_leaf_nodes=17;, score=0.848 total time= 3.1s [CV 2/5] END .................max_leaf_nodes=17;, score=0.851 total time= 3.0s [CV 3/5] END .................max_leaf_nodes=17;, score=0.861 total time= 3.7s [CV 4/5] END .................max_leaf_nodes=17;, score=0.860 total time= 3.7s [CV 5/5] END .................max_leaf_nodes=17;, score=0.854 total time= 3.1s [CV 1/5] END .................max_leaf_nodes=18;, score=0.854 total time= 3.2s [CV 2/5] END .................max_leaf_nodes=18;, score=0.852 total time= 3.4s [CV 3/5] END .................max_leaf_nodes=18;, score=0.862 total time= 3.2s [CV 4/5] END .................max_leaf_nodes=18;, score=0.862 total time= 3.6s [CV 5/5] END .................max_leaf_nodes=18;, score=0.855 total time= 3.4s [CV 1/5] END .................max_leaf_nodes=19;, score=0.857 total time= 3.5s [CV 2/5] END .................max_leaf_nodes=19;, score=0.855 total time= 3.7s [CV 3/5] END .................max_leaf_nodes=19;, score=0.863 total time= 4.1s [CV 4/5] END .................max_leaf_nodes=19;, score=0.864 total time= 3.4s [CV 5/5] END .................max_leaf_nodes=19;, score=0.857 total time= 3.3s [CV 1/5] END .................max_leaf_nodes=20;, score=0.861 total time= 3.5s [CV 2/5] END .................max_leaf_nodes=20;, score=0.857 total time= 3.6s [CV 3/5] END .................max_leaf_nodes=20;, score=0.865 total time= 4.2s [CV 4/5] END .................max_leaf_nodes=20;, score=0.867 total time= 3.7s [CV 5/5] END .................max_leaf_nodes=20;, score=0.860 total time= 3.7s [CV 1/5] END .................max_leaf_nodes=21;, score=0.863 total time= 3.8s [CV 2/5] END .................max_leaf_nodes=21;, score=0.861 total time= 4.5s [CV 3/5] END .................max_leaf_nodes=21;, score=0.867 total time= 3.9s [CV 4/5] END .................max_leaf_nodes=21;, score=0.871 total time= 3.7s [CV 5/5] END .................max_leaf_nodes=21;, score=0.865 total time= 3.7s [CV 1/5] END .................max_leaf_nodes=22;, score=0.867 total time= 4.2s [CV 2/5] END .................max_leaf_nodes=22;, score=0.864 total time= 3.8s [CV 3/5] END .................max_leaf_nodes=22;, score=0.871 total time= 3.8s [CV 4/5] END .................max_leaf_nodes=22;, score=0.873 total time= 3.8s [CV 5/5] END .................max_leaf_nodes=22;, score=0.868 total time= 4.2s [CV 1/5] END .................max_leaf_nodes=23;, score=0.870 total time= 3.8s [CV 2/5] END .................max_leaf_nodes=23;, score=0.867 total time= 4.4s [CV 3/5] END .................max_leaf_nodes=23;, score=0.874 total time= 4.2s [CV 4/5] END .................max_leaf_nodes=23;, score=0.874 total time= 3.9s [CV 5/5] END .................max_leaf_nodes=23;, score=0.870 total time= 3.8s [CV 1/5] END .................max_leaf_nodes=24;, score=0.872 total time= 3.9s [CV 2/5] END .................max_leaf_nodes=24;, score=0.868 total time= 4.1s [CV 3/5] END .................max_leaf_nodes=24;, score=0.876 total time= 4.1s [CV 4/5] END .................max_leaf_nodes=24;, score=0.876 total time= 4.0s [CV 5/5] END .................max_leaf_nodes=24;, score=0.872 total time= 3.9s [CV 1/5] END .................max_leaf_nodes=25;, score=0.872 total time= 4.3s [CV 2/5] END .................max_leaf_nodes=25;, score=0.869 total time= 4.0s [CV 3/5] END .................max_leaf_nodes=25;, score=0.878 total time= 4.0s [CV 4/5] END .................max_leaf_nodes=25;, score=0.876 total time= 4.0s [CV 5/5] END .................max_leaf_nodes=25;, score=0.875 total time= 4.3s [CV 1/5] END .................max_leaf_nodes=26;, score=0.876 total time= 4.3s [CV 2/5] END .................max_leaf_nodes=26;, score=0.871 total time= 4.2s [CV 3/5] END .................max_leaf_nodes=26;, score=0.878 total time= 4.6s [CV 4/5] END .................max_leaf_nodes=26;, score=0.879 total time= 5.1s [CV 5/5] END .................max_leaf_nodes=26;, score=0.879 total time= 4.3s [CV 1/5] END .................max_leaf_nodes=27;, score=0.877 total time= 4.5s [CV 2/5] END .................max_leaf_nodes=27;, score=0.874 total time= 4.7s [CV 3/5] END .................max_leaf_nodes=27;, score=0.881 total time= 4.4s [CV 4/5] END .................max_leaf_nodes=27;, score=0.884 total time= 4.4s [CV 5/5] END .................max_leaf_nodes=27;, score=0.881 total time= 4.4s [CV 1/5] END .................max_leaf_nodes=28;, score=0.881 total time= 4.8s [CV 2/5] END .................max_leaf_nodes=28;, score=0.876 total time= 4.4s [CV 3/5] END .................max_leaf_nodes=28;, score=0.882 total time= 4.5s [CV 4/5] END .................max_leaf_nodes=28;, score=0.886 total time= 4.6s [CV 5/5] END .................max_leaf_nodes=28;, score=0.883 total time= 4.2s [CV 1/5] END .................max_leaf_nodes=29;, score=0.881 total time= 4.7s [CV 2/5] END .................max_leaf_nodes=29;, score=0.877 total time= 5.3s [CV 3/5] END .................max_leaf_nodes=29;, score=0.883 total time= 4.4s [CV 4/5] END .................max_leaf_nodes=29;, score=0.888 total time= 4.4s [CV 5/5] END .................max_leaf_nodes=29;, score=0.883 total time= 4.4s [CV 1/5] END .................max_leaf_nodes=30;, score=0.883 total time= 4.8s [CV 2/5] END .................max_leaf_nodes=30;, score=0.879 total time= 4.5s [CV 3/5] END .................max_leaf_nodes=30;, score=0.884 total time= 4.6s [CV 4/5] END .................max_leaf_nodes=30;, score=0.888 total time= 4.9s [CV 5/5] END .................max_leaf_nodes=30;, score=0.885 total time= 4.5s [CV 1/5] END .................max_leaf_nodes=31;, score=0.884 total time= 4.6s [CV 2/5] END .................max_leaf_nodes=31;, score=0.879 total time= 4.9s [CV 3/5] END .................max_leaf_nodes=31;, score=0.884 total time= 5.1s [CV 4/5] END .................max_leaf_nodes=31;, score=0.888 total time= 5.7s [CV 5/5] END .................max_leaf_nodes=31;, score=0.886 total time= 5.3s [CV 1/5] END .................max_leaf_nodes=32;, score=0.884 total time= 5.0s [CV 2/5] END .................max_leaf_nodes=32;, score=0.881 total time= 4.8s [CV 3/5] END .................max_leaf_nodes=32;, score=0.884 total time= 4.9s [CV 4/5] END .................max_leaf_nodes=32;, score=0.888 total time= 5.4s [CV 5/5] END .................max_leaf_nodes=32;, score=0.888 total time= 4.9s [CV 1/5] END .................max_leaf_nodes=33;, score=0.884 total time= 5.0s [CV 2/5] END .................max_leaf_nodes=33;, score=0.880 total time= 5.3s [CV 3/5] END .................max_leaf_nodes=33;, score=0.884 total time= 4.8s [CV 4/5] END .................max_leaf_nodes=33;, score=0.888 total time= 5.0s [CV 5/5] END .................max_leaf_nodes=33;, score=0.888 total time= 5.2s [CV 1/5] END .................max_leaf_nodes=34;, score=0.886 total time= 5.0s [CV 2/5] END .................max_leaf_nodes=34;, score=0.881 total time= 4.8s [CV 3/5] END .................max_leaf_nodes=34;, score=0.884 total time= 5.2s [CV 4/5] END .................max_leaf_nodes=34;, score=0.888 total time= 4.9s [CV 5/5] END .................max_leaf_nodes=34;, score=0.888 total time= 4.9s [CV 1/5] END .................max_leaf_nodes=35;, score=0.886 total time= 5.3s [CV 2/5] END .................max_leaf_nodes=35;, score=0.881 total time= 5.0s [CV 3/5] END .................max_leaf_nodes=35;, score=0.884 total time= 5.0s [CV 4/5] END .................max_leaf_nodes=35;, score=0.890 total time= 5.4s [CV 5/5] END .................max_leaf_nodes=35;, score=0.888 total time= 5.2s [CV 1/5] END .................max_leaf_nodes=36;, score=0.886 total time= 5.3s [CV 2/5] END .................max_leaf_nodes=36;, score=0.881 total time= 5.7s [CV 3/5] END .................max_leaf_nodes=36;, score=0.884 total time= 5.3s [CV 4/5] END .................max_leaf_nodes=36;, score=0.892 total time= 5.3s [CV 5/5] END .................max_leaf_nodes=36;, score=0.888 total time= 5.6s [CV 1/5] END .................max_leaf_nodes=37;, score=0.886 total time= 5.3s [CV 2/5] END .................max_leaf_nodes=37;, score=0.881 total time= 5.3s [CV 3/5] END .................max_leaf_nodes=37;, score=0.885 total time= 5.7s [CV 4/5] END .................max_leaf_nodes=37;, score=0.892 total time= 5.4s [CV 5/5] END .................max_leaf_nodes=37;, score=0.888 total time= 6.0s [CV 1/5] END .................max_leaf_nodes=38;, score=0.886 total time= 5.5s [CV 2/5] END .................max_leaf_nodes=38;, score=0.881 total time= 5.3s [CV 3/5] END .................max_leaf_nodes=38;, score=0.886 total time= 5.8s [CV 4/5] END .................max_leaf_nodes=38;, score=0.893 total time= 5.3s [CV 5/5] END .................max_leaf_nodes=38;, score=0.888 total time= 5.3s [CV 1/5] END .................max_leaf_nodes=39;, score=0.886 total time= 5.7s [CV 2/5] END .................max_leaf_nodes=39;, score=0.881 total time= 5.3s [CV 3/5] END .................max_leaf_nodes=39;, score=0.886 total time= 5.5s [CV 4/5] END .................max_leaf_nodes=39;, score=0.894 total time= 5.7s [CV 5/5] END .................max_leaf_nodes=39;, score=0.887 total time= 5.3s [CV 1/5] END .................max_leaf_nodes=40;, score=0.885 total time= 5.5s [CV 2/5] END .................max_leaf_nodes=40;, score=0.880 total time= 5.9s [CV 3/5] END .................max_leaf_nodes=40;, score=0.886 total time= 6.3s [CV 4/5] END .................max_leaf_nodes=40;, score=0.895 total time= 5.8s [CV 5/5] END .................max_leaf_nodes=40;, score=0.888 total time= 6.1s [CV 1/5] END .................max_leaf_nodes=41;, score=0.885 total time= 5.7s [CV 2/5] END .................max_leaf_nodes=41;, score=0.880 total time= 6.4s [CV 3/5] END .................max_leaf_nodes=41;, score=0.886 total time= 5.9s [CV 4/5] END .................max_leaf_nodes=41;, score=0.895 total time= 5.7s [CV 5/5] END .................max_leaf_nodes=41;, score=0.888 total time= 6.0s [CV 1/5] END .................max_leaf_nodes=42;, score=0.886 total time= 5.6s [CV 2/5] END .................max_leaf_nodes=42;, score=0.880 total time= 5.5s [CV 3/5] END .................max_leaf_nodes=42;, score=0.886 total time= 6.6s [CV 4/5] END .................max_leaf_nodes=42;, score=0.894 total time= 5.8s [CV 5/5] END .................max_leaf_nodes=42;, score=0.889 total time= 5.7s [CV 1/5] END .................max_leaf_nodes=43;, score=0.887 total time= 5.7s [CV 2/5] END .................max_leaf_nodes=43;, score=0.881 total time= 5.8s [CV 3/5] END .................max_leaf_nodes=43;, score=0.886 total time= 6.0s [CV 4/5] END .................max_leaf_nodes=43;, score=0.893 total time= 6.2s [CV 5/5] END .................max_leaf_nodes=43;, score=0.890 total time= 5.8s [CV 1/5] END .................max_leaf_nodes=44;, score=0.888 total time= 6.1s [CV 2/5] END .................max_leaf_nodes=44;, score=0.883 total time= 6.0s [CV 3/5] END .................max_leaf_nodes=44;, score=0.886 total time= 6.4s [CV 4/5] END .................max_leaf_nodes=44;, score=0.894 total time= 6.1s [CV 5/5] END .................max_leaf_nodes=44;, score=0.890 total time= 5.9s [CV 1/5] END .................max_leaf_nodes=45;, score=0.888 total time= 6.3s [CV 2/5] END .................max_leaf_nodes=45;, score=0.883 total time= 5.9s [CV 3/5] END .................max_leaf_nodes=45;, score=0.887 total time= 6.3s [CV 4/5] END .................max_leaf_nodes=45;, score=0.894 total time= 6.0s [CV 5/5] END .................max_leaf_nodes=45;, score=0.891 total time= 5.8s [CV 1/5] END .................max_leaf_nodes=46;, score=0.889 total time= 6.6s [CV 2/5] END .................max_leaf_nodes=46;, score=0.883 total time= 5.9s [CV 3/5] END .................max_leaf_nodes=46;, score=0.887 total time= 6.2s [CV 4/5] END .................max_leaf_nodes=46;, score=0.894 total time= 6.0s [CV 5/5] END .................max_leaf_nodes=46;, score=0.890 total time= 5.7s [CV 1/5] END .................max_leaf_nodes=47;, score=0.889 total time= 6.2s [CV 2/5] END .................max_leaf_nodes=47;, score=0.884 total time= 5.8s [CV 3/5] END .................max_leaf_nodes=47;, score=0.887 total time= 5.9s [CV 4/5] END .................max_leaf_nodes=47;, score=0.894 total time= 6.2s [CV 5/5] END .................max_leaf_nodes=47;, score=0.890 total time= 6.0s [CV 1/5] END .................max_leaf_nodes=48;, score=0.888 total time= 6.4s [CV 2/5] END .................max_leaf_nodes=48;, score=0.884 total time= 6.6s [CV 3/5] END .................max_leaf_nodes=48;, score=0.887 total time= 6.7s [CV 4/5] END .................max_leaf_nodes=48;, score=0.895 total time= 6.6s [CV 5/5] END .................max_leaf_nodes=48;, score=0.890 total time= 6.1s [CV 1/5] END .................max_leaf_nodes=49;, score=0.888 total time= 6.5s [CV 2/5] END .................max_leaf_nodes=49;, score=0.883 total time= 6.9s [CV 3/5] END .................max_leaf_nodes=49;, score=0.886 total time= 7.1s [CV 4/5] END .................max_leaf_nodes=49;, score=0.895 total time= 8.6s [CV 5/5] END .................max_leaf_nodes=49;, score=0.890 total time= 9.0s [CV 1/5] END .................max_leaf_nodes=50;, score=0.888 total time= 8.6s [CV 2/5] END .................max_leaf_nodes=50;, score=0.883 total time= 7.7s [CV 3/5] END .................max_leaf_nodes=50;, score=0.886 total time= 7.5s [CV 4/5] END .................max_leaf_nodes=50;, score=0.896 total time= 7.2s [CV 5/5] END .................max_leaf_nodes=50;, score=0.890 total time= 6.3s [CV 1/5] END .................max_leaf_nodes=51;, score=0.888 total time= 6.3s [CV 2/5] END .................max_leaf_nodes=51;, score=0.883 total time= 7.5s [CV 3/5] END .................max_leaf_nodes=51;, score=0.886 total time= 7.1s [CV 4/5] END .................max_leaf_nodes=51;, score=0.895 total time= 8.0s [CV 5/5] END .................max_leaf_nodes=51;, score=0.890 total time= 7.7s [CV 1/5] END .................max_leaf_nodes=52;, score=0.888 total time= 7.0s [CV 2/5] END .................max_leaf_nodes=52;, score=0.883 total time= 6.8s [CV 3/5] END .................max_leaf_nodes=52;, score=0.886 total time= 7.6s [CV 4/5] END .................max_leaf_nodes=52;, score=0.896 total time= 7.5s [CV 5/5] END .................max_leaf_nodes=52;, score=0.890 total time= 6.6s [CV 1/5] END .................max_leaf_nodes=53;, score=0.889 total time=24.5min [CV 2/5] END .................max_leaf_nodes=53;, score=0.883 total time= 10.6s [CV 3/5] END .................max_leaf_nodes=53;, score=0.886 total time= 6.8s [CV 4/5] END .................max_leaf_nodes=53;, score=0.896 total time= 6.6s [CV 5/5] END .................max_leaf_nodes=53;, score=0.890 total time= 6.5s [CV 1/5] END .................max_leaf_nodes=54;, score=0.889 total time= 8.1s [CV 2/5] END .................max_leaf_nodes=54;, score=0.883 total time= 7.0s [CV 3/5] END .................max_leaf_nodes=54;, score=0.886 total time= 7.3s [CV 4/5] END .................max_leaf_nodes=54;, score=0.896 total time= 7.3s [CV 5/5] END .................max_leaf_nodes=54;, score=0.890 total time= 7.6s [CV 1/5] END .................max_leaf_nodes=55;, score=0.889 total time= 7.2s [CV 2/5] END .................max_leaf_nodes=55;, score=0.884 total time= 7.2s [CV 3/5] END .................max_leaf_nodes=55;, score=0.886 total time= 8.4s [CV 4/5] END .................max_leaf_nodes=55;, score=0.896 total time= 7.9s [CV 5/5] END .................max_leaf_nodes=55;, score=0.890 total time= 7.0s [CV 1/5] END .................max_leaf_nodes=56;, score=0.889 total time= 7.1s [CV 2/5] END .................max_leaf_nodes=56;, score=0.884 total time= 7.0s [CV 3/5] END .................max_leaf_nodes=56;, score=0.886 total time= 6.9s [CV 4/5] END .................max_leaf_nodes=56;, score=0.896 total time= 7.1s [CV 5/5] END .................max_leaf_nodes=56;, score=0.890 total time= 6.8s [CV 1/5] END .................max_leaf_nodes=57;, score=0.889 total time= 7.6s [CV 2/5] END .................max_leaf_nodes=57;, score=0.884 total time= 7.5s [CV 3/5] END .................max_leaf_nodes=57;, score=0.886 total time= 7.6s [CV 4/5] END .................max_leaf_nodes=57;, score=0.896 total time= 7.1s [CV 5/5] END .................max_leaf_nodes=57;, score=0.891 total time= 7.3s [CV 1/5] END .................max_leaf_nodes=58;, score=0.889 total time= 7.1s [CV 2/5] END .................max_leaf_nodes=58;, score=0.884 total time= 7.1s [CV 3/5] END .................max_leaf_nodes=58;, score=0.887 total time= 7.2s [CV 4/5] END .................max_leaf_nodes=58;, score=0.896 total time= 7.3s [CV 5/5] END .................max_leaf_nodes=58;, score=0.891 total time= 6.9s [CV 1/5] END .................max_leaf_nodes=59;, score=0.889 total time= 7.0s [CV 2/5] END .................max_leaf_nodes=59;, score=0.884 total time= 7.4s [CV 3/5] END .................max_leaf_nodes=59;, score=0.887 total time= 7.6s [CV 4/5] END .................max_leaf_nodes=59;, score=0.896 total time= 7.4s [CV 5/5] END .................max_leaf_nodes=59;, score=0.892 total time= 7.2s [CV 1/5] END .................max_leaf_nodes=60;, score=0.890 total time= 7.4s [CV 2/5] END .................max_leaf_nodes=60;, score=0.884 total time= 7.3s [CV 3/5] END .................max_leaf_nodes=60;, score=0.887 total time= 7.2s [CV 4/5] END .................max_leaf_nodes=60;, score=0.897 total time= 6.9s [CV 5/5] END .................max_leaf_nodes=60;, score=0.892 total time= 6.9s [CV 1/5] END .................max_leaf_nodes=61;, score=0.890 total time= 6.9s [CV 2/5] END .................max_leaf_nodes=61;, score=0.884 total time= 7.1s [CV 3/5] END .................max_leaf_nodes=61;, score=0.887 total time= 7.1s [CV 4/5] END .................max_leaf_nodes=61;, score=0.897 total time= 6.9s [CV 5/5] END .................max_leaf_nodes=61;, score=0.892 total time= 6.9s [CV 1/5] END .................max_leaf_nodes=62;, score=0.890 total time= 7.1s [CV 2/5] END .................max_leaf_nodes=62;, score=0.884 total time= 7.5s [CV 3/5] END .................max_leaf_nodes=62;, score=0.888 total time= 7.4s [CV 4/5] END .................max_leaf_nodes=62;, score=0.897 total time= 7.2s [CV 5/5] END .................max_leaf_nodes=62;, score=0.892 total time= 6.9s [CV 1/5] END .................max_leaf_nodes=63;, score=0.890 total time= 7.2s [CV 2/5] END .................max_leaf_nodes=63;, score=0.884 total time= 7.3s [CV 3/5] END .................max_leaf_nodes=63;, score=0.888 total time= 7.3s [CV 4/5] END .................max_leaf_nodes=63;, score=0.897 total time= 7.3s [CV 5/5] END .................max_leaf_nodes=63;, score=0.892 total time= 6.9s [CV 1/5] END .................max_leaf_nodes=64;, score=0.890 total time= 7.2s [CV 2/5] END .................max_leaf_nodes=64;, score=0.884 total time= 7.4s [CV 3/5] END .................max_leaf_nodes=64;, score=0.888 total time= 7.3s [CV 4/5] END .................max_leaf_nodes=64;, score=0.897 total time= 7.2s [CV 5/5] END .................max_leaf_nodes=64;, score=0.892 total time= 6.9s [CV 1/5] END .................max_leaf_nodes=65;, score=0.890 total time= 7.1s [CV 2/5] END .................max_leaf_nodes=65;, score=0.884 total time= 7.6s [CV 3/5] END .................max_leaf_nodes=65;, score=0.888 total time= 7.1s [CV 4/5] END .................max_leaf_nodes=65;, score=0.897 total time= 7.2s [CV 5/5] END .................max_leaf_nodes=65;, score=0.892 total time= 7.1s [CV 1/5] END .................max_leaf_nodes=66;, score=0.890 total time= 7.1s [CV 2/5] END .................max_leaf_nodes=66;, score=0.884 total time= 7.8s [CV 3/5] END .................max_leaf_nodes=66;, score=0.888 total time= 7.1s [CV 4/5] END .................max_leaf_nodes=66;, score=0.897 total time= 7.4s [CV 5/5] END .................max_leaf_nodes=66;, score=0.892 total time= 7.1s [CV 1/5] END .................max_leaf_nodes=67;, score=0.890 total time= 7.2s [CV 2/5] END .................max_leaf_nodes=67;, score=0.884 total time= 7.8s [CV 3/5] END .................max_leaf_nodes=67;, score=0.889 total time= 7.3s [CV 4/5] END .................max_leaf_nodes=67;, score=0.897 total time= 8.0s [CV 5/5] END .................max_leaf_nodes=67;, score=0.892 total time= 7.4s [CV 1/5] END .................max_leaf_nodes=68;, score=0.890 total time= 7.3s [CV 2/5] END .................max_leaf_nodes=68;, score=0.884 total time= 8.0s [CV 3/5] END .................max_leaf_nodes=68;, score=0.889 total time= 7.5s [CV 4/5] END .................max_leaf_nodes=68;, score=0.897 total time= 7.7s [CV 5/5] END .................max_leaf_nodes=68;, score=0.892 total time= 7.4s [CV 1/5] END .................max_leaf_nodes=69;, score=0.890 total time= 7.4s [CV 2/5] END .................max_leaf_nodes=69;, score=0.884 total time= 8.1s [CV 3/5] END .................max_leaf_nodes=69;, score=0.889 total time= 7.6s [CV 4/5] END .................max_leaf_nodes=69;, score=0.897 total time= 7.8s [CV 5/5] END .................max_leaf_nodes=69;, score=0.892 total time= 7.5s [CV 1/5] END .................max_leaf_nodes=70;, score=0.890 total time= 7.6s [CV 2/5] END .................max_leaf_nodes=70;, score=0.884 total time= 8.6s [CV 3/5] END .................max_leaf_nodes=70;, score=0.889 total time= 8.5s [CV 4/5] END .................max_leaf_nodes=70;, score=0.897 total time= 8.2s [CV 5/5] END .................max_leaf_nodes=70;, score=0.892 total time= 7.8s [CV 1/5] END .................max_leaf_nodes=71;, score=0.890 total time= 7.7s [CV 2/5] END .................max_leaf_nodes=71;, score=0.884 total time= 8.6s [CV 3/5] END .................max_leaf_nodes=71;, score=0.889 total time= 7.9s [CV 4/5] END .................max_leaf_nodes=71;, score=0.897 total time= 8.1s [CV 5/5] END .................max_leaf_nodes=71;, score=0.892 total time= 7.9s [CV 1/5] END .................max_leaf_nodes=72;, score=0.890 total time= 7.9s [CV 2/5] END .................max_leaf_nodes=72;, score=0.884 total time= 8.5s [CV 3/5] END .................max_leaf_nodes=72;, score=0.889 total time= 7.9s [CV 4/5] END .................max_leaf_nodes=72;, score=0.897 total time= 8.2s [CV 5/5] END .................max_leaf_nodes=72;, score=0.892 total time= 7.8s [CV 1/5] END .................max_leaf_nodes=73;, score=0.890 total time= 8.0s [CV 2/5] END .................max_leaf_nodes=73;, score=0.884 total time= 8.6s [CV 3/5] END .................max_leaf_nodes=73;, score=0.889 total time= 8.1s [CV 4/5] END .................max_leaf_nodes=73;, score=0.897 total time= 8.5s [CV 5/5] END .................max_leaf_nodes=73;, score=0.892 total time= 9.0s [CV 1/5] END .................max_leaf_nodes=74;, score=0.890 total time= 9.0s [CV 2/5] END .................max_leaf_nodes=74;, score=0.884 total time= 9.3s [CV 3/5] END .................max_leaf_nodes=74;, score=0.889 total time= 8.9s [CV 4/5] END .................max_leaf_nodes=74;, score=0.897 total time= 9.2s [CV 5/5] END .................max_leaf_nodes=74;, score=0.892 total time= 8.8s [CV 1/5] END .................max_leaf_nodes=75;, score=0.890 total time= 8.8s [CV 2/5] END .................max_leaf_nodes=75;, score=0.884 total time= 9.4s [CV 3/5] END .................max_leaf_nodes=75;, score=0.889 total time= 9.0s [CV 4/5] END .................max_leaf_nodes=75;, score=0.897 total time= 9.1s [CV 5/5] END .................max_leaf_nodes=75;, score=0.892 total time= 8.6s [CV 1/5] END .................max_leaf_nodes=76;, score=0.890 total time= 8.7s [CV 2/5] END .................max_leaf_nodes=76;, score=0.884 total time= 9.4s [CV 3/5] END .................max_leaf_nodes=76;, score=0.889 total time= 9.0s [CV 4/5] END .................max_leaf_nodes=76;, score=0.897 total time= 9.4s [CV 5/5] END .................max_leaf_nodes=76;, score=0.892 total time= 8.4s [CV 1/5] END .................max_leaf_nodes=77;, score=0.890 total time= 8.5s [CV 2/5] END .................max_leaf_nodes=77;, score=0.884 total time= 9.0s [CV 3/5] END .................max_leaf_nodes=77;, score=0.889 total time= 8.7s [CV 4/5] END .................max_leaf_nodes=77;, score=0.897 total time= 8.8s [CV 5/5] END .................max_leaf_nodes=77;, score=0.892 total time= 8.6s [CV 1/5] END .................max_leaf_nodes=78;, score=0.890 total time= 8.7s [CV 2/5] END .................max_leaf_nodes=78;, score=0.884 total time= 9.6s [CV 3/5] END .................max_leaf_nodes=78;, score=0.889 total time= 8.6s [CV 4/5] END .................max_leaf_nodes=78;, score=0.897 total time= 9.3s [CV 5/5] END .................max_leaf_nodes=78;, score=0.892 total time= 9.0s [CV 1/5] END .................max_leaf_nodes=79;, score=0.890 total time= 8.7s [CV 2/5] END .................max_leaf_nodes=79;, score=0.884 total time= 9.4s [CV 3/5] END .................max_leaf_nodes=79;, score=0.889 total time= 8.8s [CV 4/5] END .................max_leaf_nodes=79;, score=0.897 total time= 9.0s [CV 5/5] END .................max_leaf_nodes=79;, score=0.892 total time= 8.8s [CV 1/5] END .................max_leaf_nodes=80;, score=0.890 total time= 8.9s [CV 2/5] END .................max_leaf_nodes=80;, score=0.884 total time= 9.5s [CV 3/5] END .................max_leaf_nodes=80;, score=0.889 total time= 8.9s [CV 4/5] END .................max_leaf_nodes=80;, score=0.897 total time= 9.2s [CV 5/5] END .................max_leaf_nodes=80;, score=0.892 total time= 9.0s [CV 1/5] END .................max_leaf_nodes=81;, score=0.890 total time= 8.9s [CV 2/5] END .................max_leaf_nodes=81;, score=0.884 total time= 9.6s [CV 3/5] END .................max_leaf_nodes=81;, score=0.889 total time= 8.9s [CV 4/5] END .................max_leaf_nodes=81;, score=0.897 total time= 9.3s [CV 5/5] END .................max_leaf_nodes=81;, score=0.892 total time= 8.9s [CV 1/5] END .................max_leaf_nodes=82;, score=0.890 total time= 9.2s [CV 2/5] END .................max_leaf_nodes=82;, score=0.884 total time= 9.6s [CV 3/5] END .................max_leaf_nodes=82;, score=0.889 total time= 9.1s [CV 4/5] END .................max_leaf_nodes=82;, score=0.897 total time= 9.5s [CV 5/5] END .................max_leaf_nodes=82;, score=0.892 total time= 9.2s [CV 1/5] END .................max_leaf_nodes=83;, score=0.890 total time= 9.6s [CV 2/5] END .................max_leaf_nodes=83;, score=0.884 total time= 9.8s [CV 3/5] END .................max_leaf_nodes=83;, score=0.889 total time= 9.3s [CV 4/5] END .................max_leaf_nodes=83;, score=0.897 total time= 9.9s [CV 5/5] END .................max_leaf_nodes=83;, score=0.892 total time= 9.4s [CV 1/5] END .................max_leaf_nodes=84;, score=0.890 total time= 9.4s [CV 2/5] END .................max_leaf_nodes=84;, score=0.884 total time= 9.8s [CV 3/5] END .................max_leaf_nodes=84;, score=0.889 total time= 9.3s [CV 4/5] END .................max_leaf_nodes=84;, score=0.897 total time= 9.7s [CV 5/5] END .................max_leaf_nodes=84;, score=0.892 total time= 9.3s [CV 1/5] END .................max_leaf_nodes=85;, score=0.890 total time= 9.6s [CV 2/5] END .................max_leaf_nodes=85;, score=0.884 total time= 10.0s [CV 3/5] END .................max_leaf_nodes=85;, score=0.889 total time= 9.5s [CV 4/5] END .................max_leaf_nodes=85;, score=0.897 total time= 9.9s [CV 5/5] END .................max_leaf_nodes=85;, score=0.892 total time= 9.6s [CV 1/5] END .................max_leaf_nodes=86;, score=0.890 total time= 9.6s [CV 2/5] END .................max_leaf_nodes=86;, score=0.884 total time= 10.2s [CV 3/5] END .................max_leaf_nodes=86;, score=0.889 total time= 9.6s [CV 4/5] END .................max_leaf_nodes=86;, score=0.897 total time= 10.1s [CV 5/5] END .................max_leaf_nodes=86;, score=0.892 total time= 9.7s [CV 1/5] END .................max_leaf_nodes=87;, score=0.890 total time= 9.8s [CV 2/5] END .................max_leaf_nodes=87;, score=0.884 total time= 10.3s [CV 3/5] END .................max_leaf_nodes=87;, score=0.889 total time= 9.7s [CV 4/5] END .................max_leaf_nodes=87;, score=0.897 total time= 10.1s [CV 5/5] END .................max_leaf_nodes=87;, score=0.892 total time= 9.8s [CV 1/5] END .................max_leaf_nodes=88;, score=0.890 total time= 9.9s [CV 2/5] END .................max_leaf_nodes=88;, score=0.884 total time= 10.3s [CV 3/5] END .................max_leaf_nodes=88;, score=0.889 total time= 10.0s [CV 4/5] END .................max_leaf_nodes=88;, score=0.897 total time= 10.5s [CV 5/5] END .................max_leaf_nodes=88;, score=0.892 total time= 10.7s [CV 1/5] END .................max_leaf_nodes=89;, score=0.890 total time= 10.7s [CV 2/5] END .................max_leaf_nodes=89;, score=0.884 total time= 11.1s [CV 3/5] END .................max_leaf_nodes=89;, score=0.889 total time= 10.7s [CV 4/5] END .................max_leaf_nodes=89;, score=0.897 total time= 11.1s [CV 5/5] END .................max_leaf_nodes=89;, score=0.892 total time= 10.7s [CV 1/5] END .................max_leaf_nodes=90;, score=0.890 total time= 10.8s [CV 2/5] END .................max_leaf_nodes=90;, score=0.884 total time= 10.9s [CV 3/5] END .................max_leaf_nodes=90;, score=0.889 total time= 10.2s [CV 4/5] END .................max_leaf_nodes=90;, score=0.897 total time= 10.6s [CV 5/5] END .................max_leaf_nodes=90;, score=0.892 total time= 10.4s [CV 1/5] END .................max_leaf_nodes=91;, score=0.890 total time= 10.7s [CV 2/5] END .................max_leaf_nodes=91;, score=0.884 total time= 11.3s [CV 3/5] END .................max_leaf_nodes=91;, score=0.889 total time= 11.0s [CV 4/5] END .................max_leaf_nodes=91;, score=0.897 total time= 10.9s [CV 5/5] END .................max_leaf_nodes=91;, score=0.892 total time= 10.8s [CV 1/5] END .................max_leaf_nodes=92;, score=0.890 total time= 10.9s [CV 2/5] END .................max_leaf_nodes=92;, score=0.884 total time= 11.2s [CV 3/5] END .................max_leaf_nodes=92;, score=0.889 total time= 10.7s [CV 4/5] END .................max_leaf_nodes=92;, score=0.897 total time= 11.1s [CV 5/5] END .................max_leaf_nodes=92;, score=0.892 total time= 10.8s [CV 1/5] END .................max_leaf_nodes=93;, score=0.890 total time= 10.6s [CV 2/5] END .................max_leaf_nodes=93;, score=0.884 total time= 11.2s [CV 3/5] END .................max_leaf_nodes=93;, score=0.889 total time= 10.6s [CV 4/5] END .................max_leaf_nodes=93;, score=0.897 total time= 10.7s [CV 5/5] END .................max_leaf_nodes=93;, score=0.892 total time= 10.6s [CV 1/5] END .................max_leaf_nodes=94;, score=0.890 total time= 10.8s [CV 2/5] END .................max_leaf_nodes=94;, score=0.884 total time= 10.9s [CV 3/5] END .................max_leaf_nodes=94;, score=0.889 total time= 10.8s [CV 4/5] END .................max_leaf_nodes=94;, score=0.897 total time= 11.1s [CV 5/5] END .................max_leaf_nodes=94;, score=0.892 total time= 10.6s [CV 1/5] END .................max_leaf_nodes=95;, score=0.890 total time= 11.2s [CV 2/5] END .................max_leaf_nodes=95;, score=0.884 total time= 11.4s [CV 3/5] END .................max_leaf_nodes=95;, score=0.889 total time= 10.7s [CV 4/5] END .................max_leaf_nodes=95;, score=0.897 total time= 11.1s [CV 5/5] END .................max_leaf_nodes=95;, score=0.892 total time= 10.7s [CV 1/5] END .................max_leaf_nodes=96;, score=0.890 total time= 11.0s [CV 2/5] END .................max_leaf_nodes=96;, score=0.884 total time= 11.2s [CV 3/5] END .................max_leaf_nodes=96;, score=0.889 total time= 11.0s [CV 4/5] END .................max_leaf_nodes=96;, score=0.897 total time= 11.2s [CV 5/5] END .................max_leaf_nodes=96;, score=0.892 total time= 10.8s [CV 1/5] END .................max_leaf_nodes=97;, score=0.890 total time= 11.3s [CV 2/5] END .................max_leaf_nodes=97;, score=0.884 total time= 11.5s [CV 3/5] END .................max_leaf_nodes=97;, score=0.889 total time= 11.1s [CV 4/5] END .................max_leaf_nodes=97;, score=0.897 total time= 11.2s [CV 5/5] END .................max_leaf_nodes=97;, score=0.892 total time= 11.0s [CV 1/5] END .................max_leaf_nodes=98;, score=0.890 total time= 11.1s [CV 2/5] END .................max_leaf_nodes=98;, score=0.884 total time= 11.5s [CV 3/5] END .................max_leaf_nodes=98;, score=0.889 total time= 11.2s [CV 4/5] END .................max_leaf_nodes=98;, score=0.897 total time= 11.4s [CV 5/5] END .................max_leaf_nodes=98;, score=0.892 total time= 11.2s [CV 1/5] END .................max_leaf_nodes=99;, score=0.890 total time= 12.1s [CV 2/5] END .................max_leaf_nodes=99;, score=0.884 total time= 12.3s [CV 3/5] END .................max_leaf_nodes=99;, score=0.889 total time= 12.3s [CV 4/5] END .................max_leaf_nodes=99;, score=0.897 total time= 12.4s [CV 5/5] END .................max_leaf_nodes=99;, score=0.892 total time= 11.8s
GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0), param_grid={'max_leaf_nodes': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, ...]}, verbose=3)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0), param_grid={'max_leaf_nodes': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, ...]}, verbose=3)
DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0)
DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0)
GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0), param_grid={'max_leaf_nodes': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, ...]}, verbose=3)
gs.best_params_
{'max_leaf_nodes': 63}
min_impurity_decrease¶
This parameter takes floating point value where the default value is 0.0. If a split causes an impurity level to drop by more than or equal to this amount, a node will be split. GridSearchCV is used to find the best parameter value from the range of 0.1 to 1. And the best results were obtained using 0.1 as min_impurity_decrease.
gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
max_depth=65, min_samples_split=0.1,
min_samples_leaf=4,max_features=None,
max_leaf_nodes=63,random_state=0),
param_grid={'min_impurity_decrease': [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]},
verbose=3)
gs.fit(X_train, y_train)
Fitting 5 folds for each of 9 candidates, totalling 45 fits [CV 1/5] END .........min_impurity_decrease=0.1;, score=0.775 total time= 0.4s [CV 2/5] END .........min_impurity_decrease=0.1;, score=0.775 total time= 0.4s [CV 3/5] END .........min_impurity_decrease=0.1;, score=0.775 total time= 0.4s [CV 4/5] END .........min_impurity_decrease=0.1;, score=0.775 total time= 0.4s [CV 5/5] END .........min_impurity_decrease=0.1;, score=0.775 total time= 0.4s [CV 1/5] END .........min_impurity_decrease=0.2;, score=0.775 total time= 0.4s [CV 2/5] END .........min_impurity_decrease=0.2;, score=0.775 total time= 0.4s [CV 3/5] END .........min_impurity_decrease=0.2;, score=0.775 total time= 0.4s [CV 4/5] END .........min_impurity_decrease=0.2;, score=0.775 total time= 0.3s [CV 5/5] END .........min_impurity_decrease=0.2;, score=0.775 total time= 0.4s [CV 1/5] END .........min_impurity_decrease=0.3;, score=0.775 total time= 0.4s [CV 2/5] END .........min_impurity_decrease=0.3;, score=0.775 total time= 0.4s [CV 3/5] END .........min_impurity_decrease=0.3;, score=0.775 total time= 0.4s [CV 4/5] END .........min_impurity_decrease=0.3;, score=0.775 total time= 0.4s [CV 5/5] END .........min_impurity_decrease=0.3;, score=0.775 total time= 0.4s [CV 1/5] END .........min_impurity_decrease=0.4;, score=0.775 total time= 0.4s [CV 2/5] END .........min_impurity_decrease=0.4;, score=0.775 total time= 0.4s [CV 3/5] END .........min_impurity_decrease=0.4;, score=0.775 total time= 0.4s [CV 4/5] END .........min_impurity_decrease=0.4;, score=0.775 total time= 0.4s [CV 5/5] END .........min_impurity_decrease=0.4;, score=0.775 total time= 0.3s [CV 1/5] END .........min_impurity_decrease=0.5;, score=0.775 total time= 0.3s [CV 2/5] END .........min_impurity_decrease=0.5;, score=0.775 total time= 0.3s [CV 3/5] END .........min_impurity_decrease=0.5;, score=0.775 total time= 0.3s [CV 4/5] END .........min_impurity_decrease=0.5;, score=0.775 total time= 0.3s [CV 5/5] END .........min_impurity_decrease=0.5;, score=0.775 total time= 0.3s [CV 1/5] END .........min_impurity_decrease=0.6;, score=0.775 total time= 0.3s [CV 2/5] END .........min_impurity_decrease=0.6;, score=0.775 total time= 0.3s [CV 3/5] END .........min_impurity_decrease=0.6;, score=0.775 total time= 0.3s [CV 4/5] END .........min_impurity_decrease=0.6;, score=0.775 total time= 0.3s [CV 5/5] END .........min_impurity_decrease=0.6;, score=0.775 total time= 0.3s [CV 1/5] END .........min_impurity_decrease=0.7;, score=0.775 total time= 0.3s [CV 2/5] END .........min_impurity_decrease=0.7;, score=0.775 total time= 0.3s [CV 3/5] END .........min_impurity_decrease=0.7;, score=0.775 total time= 0.3s [CV 4/5] END .........min_impurity_decrease=0.7;, score=0.775 total time= 0.3s [CV 5/5] END .........min_impurity_decrease=0.7;, score=0.775 total time= 0.3s [CV 1/5] END .........min_impurity_decrease=0.8;, score=0.775 total time= 0.3s [CV 2/5] END .........min_impurity_decrease=0.8;, score=0.775 total time= 0.3s [CV 3/5] END .........min_impurity_decrease=0.8;, score=0.775 total time= 0.3s [CV 4/5] END .........min_impurity_decrease=0.8;, score=0.775 total time= 0.3s [CV 5/5] END .........min_impurity_decrease=0.8;, score=0.775 total time= 0.3s [CV 1/5] END .........min_impurity_decrease=0.9;, score=0.775 total time= 0.4s [CV 2/5] END .........min_impurity_decrease=0.9;, score=0.775 total time= 0.4s [CV 3/5] END .........min_impurity_decrease=0.9;, score=0.775 total time= 0.4s [CV 4/5] END .........min_impurity_decrease=0.9;, score=0.775 total time= 0.3s [CV 5/5] END .........min_impurity_decrease=0.9;, score=0.775 total time= 0.3s
GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, max_leaf_nodes=63, min_samples_leaf=4, min_samples_split=0.1, random_state=0), param_grid={'min_impurity_decrease': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}, verbose=3)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, max_leaf_nodes=63, min_samples_leaf=4, min_samples_split=0.1, random_state=0), param_grid={'min_impurity_decrease': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}, verbose=3)
DecisionTreeClassifier(max_depth=65, max_leaf_nodes=63, min_samples_leaf=4, min_samples_split=0.1, random_state=0)
DecisionTreeClassifier(max_depth=65, max_leaf_nodes=63, min_samples_leaf=4, min_samples_split=0.1, random_state=0)
clf.get_params()
{'ccp_alpha': 0.0, 'class_weight': None, 'criterion': 'gini', 'max_depth': None, 'max_features': None, 'max_leaf_nodes': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'random_state': 0, 'splitter': 'best'}
gs.best_params_
{'min_impurity_decrease': 0.1}
The fine-tuned hyperparameters with their values are
DecisionTreeClassifier (criterion='gini', splitter='best', max_depth=65, min_samples_split=0.1, min_samples_leaf=4, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, random_state=0)
clf = DecisionTreeClassifier(criterion='gini',splitter='best',
max_depth=65, min_samples_split=0.1,
min_samples_leaf=4,max_features=None,
max_leaf_nodes=None,min_impurity_decrease=0.0,
random_state=0)
clf.fit(X_train, y_train)
DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0)
predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)
89.45527908540686
print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))
Accuracy: 0.8945527908540686 precision recall f1-score support 0 0.44 0.15 0.22 427 1 0.94 0.94 0.94 5747 2 0.77 0.93 0.84 1261 accuracy 0.89 7435 macro avg 0.72 0.67 0.67 7435 weighted avg 0.88 0.89 0.88 7435
cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()
[[ 63 290 74] [ 58 5410 279] [ 22 61 1178]]
Ensemble¶
BaggingClassifier¶
A bagging classifier is an ensemble meta-estimator that fits base classifiers one at a time to random subsets of the original dataset, then combines each prediction (either through voting or average) to create the final prediction. Typically, a black-box estimator's variance can be decreased by using a meta-estimator of this type (E.g., a decision tree) by adding randomization to its construction process and creating an ensemble from it. Bagging Classifier is applied in two ways. First the bagging classifier is applied on decision tree with default parameters and then using fine-tuned parameters.
from sklearn.ensemble import BaggingClassifier
clf = BaggingClassifier(DecisionTreeClassifier(
random_state=0))
clf.fit(X_train, y_train)
BaggingClassifier(estimator=DecisionTreeClassifier(random_state=0))In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
BaggingClassifier(estimator=DecisionTreeClassifier(random_state=0))
DecisionTreeClassifier(random_state=0)
DecisionTreeClassifier(random_state=0)
predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)
89.44182918628111
print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))
Accuracy: 0.8944182918628111 precision recall f1-score support 0 0.46 0.22 0.30 427 1 0.93 0.94 0.94 5747 2 0.80 0.92 0.85 1261 accuracy 0.89 7435 macro avg 0.73 0.69 0.70 7435 weighted avg 0.88 0.89 0.89 7435
cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()
[[ 94 278 55] [ 104 5401 242] [ 7 99 1155]]
The Bagging classifier modeled with fine-tuned parameters performed better. The confusion matrix for bagging classifier created with fine-tuned decision tree parameters is
from sklearn.ensemble import BaggingClassifier
clf = BaggingClassifier(DecisionTreeClassifier(criterion='gini',splitter='best',
max_depth=65, min_samples_split=0.1,
min_samples_leaf=4,max_features=None,
max_leaf_nodes=None,min_impurity_decrease=0.0,
random_state=0))
clf.fit(X_train, y_train)
BaggingClassifier(estimator=DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0))In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
BaggingClassifier(estimator=DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0))
DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0)
DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0)
predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)
89.06523201075991
print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))
Accuracy: 0.8906523201075992 precision recall f1-score support 0 0.44 0.16 0.23 427 1 0.95 0.93 0.94 5747 2 0.74 0.95 0.83 1261 accuracy 0.89 7435 macro avg 0.71 0.68 0.67 7435 weighted avg 0.88 0.89 0.88 7435
cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()
[[ 68 261 98] [ 65 5356 326] [ 20 43 1198]]
from sklearn.ensemble import BaggingClassifier
clf = BaggingClassifier(
DecisionTreeClassifier(
random_state=0))
clf.fit(X_train, y_train)
BaggingClassifier(estimator=DecisionTreeClassifier(random_state=0))In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
BaggingClassifier(estimator=DecisionTreeClassifier(random_state=0))
DecisionTreeClassifier(random_state=0)
DecisionTreeClassifier(random_state=0)
predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)
88.97108271687962
print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))
Accuracy: 0.8897108271687962 precision recall f1-score support 0 0.41 0.22 0.28 427 1 0.94 0.93 0.93 5747 2 0.79 0.91 0.85 1261 accuracy 0.89 7435 macro avg 0.71 0.69 0.69 7435 weighted avg 0.88 0.89 0.88 7435
cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()
[[ 92 280 55] [ 120 5370 257] [ 15 93 1153]]
AdaBoostClassifier¶
A meta-estimator called an AdaBoost classifier starts by fitting a classifier to the initial dataset. It then fits additional copies of the classifier to the same dataset, but with the weights of instances that were incorrectly classified being changed so that later classifiers would concentrate more on challenging cases. AdaBoost Classifier is applied in two ways. First the AdaBoost classifier is applied on decision tree with default parameters and then using fine-tuned parameters.
from sklearn.ensemble import AdaBoostClassifier
clf = AdaBoostClassifier(
base_estimator=DecisionTreeClassifier(
random_state=0))
clf.fit(X_train, y_train)
AdaBoostClassifier(base_estimator=DecisionTreeClassifier(random_state=0))In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
AdaBoostClassifier(base_estimator=DecisionTreeClassifier(random_state=0))
DecisionTreeClassifier(random_state=0)
DecisionTreeClassifier(random_state=0)
predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)
88.15063887020848
print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))
Accuracy: 0.8815063887020848 precision recall f1-score support 0 0.37 0.24 0.29 427 1 0.93 0.93 0.93 5747 2 0.79 0.86 0.83 1261 accuracy 0.88 7435 macro avg 0.70 0.68 0.68 7435 weighted avg 0.87 0.88 0.88 7435
cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()
[[ 104 266 57] [ 154 5361 232] [ 23 149 1089]]
The AdaBoost classifier modeled with default parameters performed better. The confusion matrix for AdaBoost classifier created with default decision tree parameters is
from sklearn.ensemble import AdaBoostClassifier
clf = AdaBoostClassifier(
base_estimator=DecisionTreeClassifier(criterion='gini',splitter='best',
max_depth=65, min_samples_split=0.1,
min_samples_leaf=4,max_features=None,
max_leaf_nodes=None,min_impurity_decrease=0.0,
random_state=0))
clf.fit(X_train, y_train)
AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0))In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0))
DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0)
DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0)
predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)
83.81977135171486
print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))
Accuracy: 0.8381977135171487 precision recall f1-score support 0 0.24 0.27 0.25 427 1 0.89 0.91 0.90 5747 2 0.81 0.71 0.76 1261 accuracy 0.84 7435 macro avg 0.65 0.63 0.64 7435 weighted avg 0.84 0.84 0.84 7435
cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()
[[ 115 275 37] [ 355 5225 167] [ 16 353 892]]
VotingClassifier¶
Using a majority vote or the average predicted probabilities (soft vote), the Voting Classifier combines conceptually distinct machine learning classifiers to predict class labels. In order to counteract the weaknesses of a group of models with similar performance, a classifier like this can be helpful. In order to predict class labels, the Voting Classifier combines conceptually different machine learning classifiers and casts a vote based on a majority or the average predicted probabilities (soft vote). A classifier like this can be useful to compensate for the shortcomings of a set of models with comparable performance. Three classifiers are used in Voting classifier. The first classifier is are MultinomialNB with fine-tuned parameters, the second is DecisionTreeClassifier with default parameters and the third is DecisionTreeClassifier with fine-tuned parameters. Different weights are assigned to all these classifiers.
from sklearn.ensemble import VotingClassifier
The DecisionTreeClassifier classifier modeled with the weights 1 for MultinomialNB with fine-tuned parameters, 1, for decision tree with default parameters and 2 for decision tree with fine-tuned parameters performed the best. The confusion matrix for voting classifier created with three classifiers with weights 1,1,2 is
clf1 = MultinomialNB(alpha=1.0, fit_prior=False, class_prior=None)
clf2 = DecisionTreeClassifier(random_state=0)
clf3 = DecisionTreeClassifier(criterion='gini',splitter='best',
max_depth=65, min_samples_split=0.1,
min_samples_leaf=4,max_features=None,
max_leaf_nodes=None,min_impurity_decrease=0.0,
random_state=0)
eclf = VotingClassifier(estimators=[('mnb', clf1), ('dt', clf2), ('ft-dt', clf3)],
voting='soft', weights=[1, 1, 2])
#
eclf.fit(X_train, y_train)
VotingClassifier(estimators=[('mnb', MultinomialNB(fit_prior=False)), ('dt', DecisionTreeClassifier(random_state=0)), ('ft-dt', DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0))], voting='soft', weights=[1, 1, 2])In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
VotingClassifier(estimators=[('mnb', MultinomialNB(fit_prior=False)), ('dt', DecisionTreeClassifier(random_state=0)), ('ft-dt', DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0))], voting='soft', weights=[1, 1, 2])
MultinomialNB(fit_prior=False)
DecisionTreeClassifier(random_state=0)
DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1, random_state=0)
predictions=eclf.predict(X_test)
score=eclf.score(X_test,y_test)
print(score*100)
89.14593140551446
print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))
Accuracy: 0.8914593140551446 precision recall f1-score support 0 0.47 0.12 0.20 427 1 0.92 0.95 0.94 5747 2 0.79 0.88 0.84 1261 accuracy 0.89 7435 macro avg 0.73 0.65 0.66 7435 weighted avg 0.87 0.89 0.88 7435
cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()
[[ 53 315 59] [ 54 5465 228] [ 6 145 1110]]
Comments
Post a Comment