Import necessary libraries

In [24]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import nltk
nltk.download('stopwords', quiet=True)
nltk.download('wordnet', quiet=True)
nltk.download('omw-1.4', quiet=True)
from nltk.stem.porter import PorterStemmer
from nltk.stem import WordNetLemmatizer

import string
import re
import demoji
from wordcloud import WordCloud, STOPWORDS
from textblob import TextBlob

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

from sklearn import metrics
from sklearn.metrics import confusion_matrix, classification_report

In [25]:

import warnings
warnings.filterwarnings('ignore')

The dataset used in this research is obtained from Kaggle.

Data Characteristics¶

The data used in this research is based on Twitter data and contains information about hate speech and offensive language. The dataset consists if tweets that are normal, racist, sexist, homophobic and generally offensive. The dataset obtained is a labeled dataset. The dataset contains 7 feature columns and 24783 Records.

Index¶

This column contains the index number for each record.

Count¶

It represents the number of users from CrowdFlower (CF) that coded each tweet ranges from 3 or more.

hate_speech¶

It represents the number of CF users who judged that the tweet content is hate speech

Offensive_language¶

It represents the number of Cf users who judged that the tweet content is offensive

neither¶

It represents the number of CF users who judged that the tweet content is neither hate speech nor offensive

class¶

This column represents the class label assigned to each tweet 0 for hate speech, 1 for offensive language and 2 for neither.

Tweet¶

This column represents the tweet content obtained from twitter.

In [27]:

tweet_data=pd.read_csv("labeled_data.csv")

In [28]:

print(tweet_data.shape)

(24783, 7)

In [29]:

tweet_data.columns

Out[29]:

Index(['Unnamed: 0', 'count', 'hate_speech', 'offensive_language', 'neither',
       'class', 'tweet'],
      dtype='object')

In [30]:

tweet_data.head()

Out[30]:

	Unnamed: 0	count	offensive_language	neither	class	tweet
0	0	3	0	3	2	!!! RT @mayasolovely: As a woman you shouldn't...
1	1	3	3	0	1	!!!!! RT @mleew17: boy dats cold...tyga dwn ba...
2	2	3	3	0	1	!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...
3	3	3	2	1	1	!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...
4	4	6	6	0	1	!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...

In [31]:

tweet_data.tail()

Out[31]:

	Unnamed: 0	count	offensive_language	neither	class	tweet
24778	25291	3	2	1	1	you's a muthaf***in lie “@LifeAsKing: @2...
24779	25292	3	1	2	2	you've gone and broke the wrong heart baby, an...
24780	25294	3	3	0	1	young buck wanna eat!!.. dat nigguh like I ain...
24781	25295	6	6	0	1	youu got wild bitches tellin you lies
24782	25296	3	0	3	2	~~Ruffled \| Ntac Eileen Dahlia - Beautiful col...

In [32]:

tweet_data.sample(5)

Out[32]:

	Unnamed: 0	count	hate_speech	offensive_language	class	tweet
10742	11023	3	0	3	1	I made ya bitch sicc wit that one 😂
22914	23394	3	0	3	1	Why you Worried bout a bitch, weed, clothes,se...
929	949	3	0	3	1	#youaremoreattractive if u a real bitch!
1701	1736	3	1	2	1	“@badnradbrad: @whattheflocka @MorbidMer...
19568	20003	3	0	3	1	RT @lnsaneTweets: I'm such a sarcastic bitch i...

Data Pre-processing¶

Data pre-processing is required to clean and transform data to make sure that it is ready for analysis. Several steps are followed to pre-process this data. The steps are listed as follows

### The index column is irrelevant to data analysis so it is dropped.

In [34]:

tweet_data=tweet_data.drop(['Unnamed: 0'],axis=1)
tweet_data

Out[34]:

	count	hate_speech	offensive_language	neither	class	tweet
0	3	0	0	3	2	!!! RT @mayasolovely: As a woman you shouldn't...
1	3	0	3	0	1	!!!!! RT @mleew17: boy dats cold...tyga dwn ba...
2	3	0	3	0	1	!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...
3	3	0	2	1	1	!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...
4	6	0	6	0	1	!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...
...	...	...	...	...	...	...
24778	3	0	2	1	1	you's a muthaf***in lie “@LifeAsKing: @2...
24779	3	0	1	2	2	you've gone and broke the wrong heart baby, an...
24780	3	0	3	0	1	young buck wanna eat!!.. dat nigguh like I ain...
24781	6	0	6	0	1	youu got wild bitches tellin you lies
24782	3	0	0	3	2	~~Ruffled \| Ntac Eileen Dahlia - Beautiful col...

24783 rows × 6 columns

The data is checked for null values but no null value is found.¶

In [36]:

tweet_data.isna().sum()

Out[36]:

count                 0
hate_speech           0
offensive_language    0
neither               0
class                 0
tweet                 0
dtype: int64

The data is checked for duplicated records but no duplicate value is found in the data¶

In [38]:

tweet_data.duplicated().sum()

Out[38]:

Exploratory Data Analysis¶

• Summary statistics of the different feature columns in our data is presented using the following table

In [41]:

tweet_data.describe()

Out[41]:

	count	hate_speech	offensive_language	neither	class
count	24783.000000	24783.000000	24783.000000	24783.000000	24783.000000
mean	3.243473	0.280515	2.413711	0.549247	1.110277
std	0.883060	0.631851	1.399459	1.113299	0.462089
min	3.000000	0.000000	0.000000	0.000000	0.000000
25%	3.000000	0.000000	2.000000	0.000000	1.000000
50%	3.000000	0.000000	3.000000	0.000000	1.000000
75%	3.000000	0.000000	3.000000	0.000000	1.000000
max	9.000000	7.000000	9.000000	9.000000	2.000000

• Box plot is a visualization technique used to display data by grouping numerical columns into quartiles. The data describe the median (Q2) with a line and 1st and 3rd quartile (Q1 to Q3) using a box. The points extending from the box presents data range and outliers. Box plots are also used to compare variables and distributions.

The box plot of different feature columns is represented as

In [55]:

data_plot=sns.boxplot(data=tweet_data)

data_plot.set_xticklabels(data_plot.get_xticklabels(), rotation=45)

Out[55]:

[Text(0, 0, 'count'),
 Text(1, 0, 'hate_speech'),
 Text(2, 0, 'offensive_language'),
 Text(3, 0, 'neither'),
 Text(4, 0, 'class')]

No description has been provided for this image

• The distribution of count column shows that the number of users for reviewing each tweet ranges from 3 to 9. On average at least 3 users have reviewed each tweet and judged its content.
• The distribution of hate_Speech column shows that the number of users for reviewing each tweet as hate speech and their number ranges from 0 to 7 where 0 means that no user thinks that the tweet is hate speech. 4993 records i.e., 20.15% records have been filtered where number of users reviewing hate speech has a number greater than 0. Most of the tweet content is considered not a hate speech therefore the count of 0 users is the highest in this plot.

• The distribution of offensive_language column shows that the number of users for reviewing each tweet as offensive language and their number ranges from 0 to 8 where 0 means that no user thinks that the tweet is hate speech. 21308 i.e., 85.98% records have been filtered where number of users reviewing offensive language has a number greater than 0. More than 13000 tweet content is considered offensive language therefore the count of 3 users is the highest in this plot.
• The distribution of neither column shows that the number of users for reviewing each tweet as neither hate speech nor offensive language and their number ranges from 0 to 8 where 0 means that no user thinks that the tweet is hate speech. 5891 records i.e., 23.77% records have been filtered where number of users reviewing tweet content as neither has a number greater than 0. More than 2500 tweet content is considered neither as shown by the bar at 3 users.
• The distribution of count column shows the class assigned to each tweet content. 1430 records i.e., 5.77% have been assigned class 0 which is hate speech. 19190 records i.e., 77.43% have been assigned class 1 which is offensive language. 4163 records i.e., 16.8% have been assigned class 2 which is neither hate speech nor offensive language. Highest class assigned is offensive language while the lowest class assigned is hate speech as displayed in the histogram.

In [323]:

for col in tweet_data[['count', 'hate_speech', 'offensive_language', 'neither',
       'class']]:
    sns.histplot(tweet_data[col])
    plt.show()

In [324]:

tweet_data[tweet_data['hate_speech']>0].describe()

Out[324]:

	count	hate_speech	offensive_language	neither	class
count	4993.000000	4993.000000	4993.000000	4993.000000	4993.000000
mean	3.382936	1.392349	1.827759	0.162828	0.764070
std	1.124272	0.658461	1.256703	0.594213	0.530344
min	3.000000	1.000000	0.000000	0.000000	0.000000
25%	3.000000	1.000000	1.000000	0.000000	0.000000
50%	3.000000	1.000000	2.000000	0.000000	1.000000
75%	3.000000	2.000000	2.000000	0.000000	1.000000
max	9.000000	7.000000	8.000000	8.000000	2.000000

In [325]:

tweet_data[tweet_data['offensive_language']>0].describe()

Out[325]:

	count	hate_speech	offensive_language	neither	class
count	21308.000000	21308.000000	21308.000000	21308.000000	21308.000000
mean	3.263328	0.266942	2.807349	0.189037	1.000751
std	0.916658	0.578783	1.082944	0.572051	0.315283
min	3.000000	0.000000	1.000000	0.000000	0.000000
25%	3.000000	0.000000	2.000000	0.000000	1.000000
50%	3.000000	0.000000	3.000000	0.000000	1.000000
75%	3.000000	0.000000	3.000000	0.000000	1.000000
max	9.000000	7.000000	9.000000	8.000000	2.000000

In [326]:

tweet_data[tweet_data['neither']>0].describe()

Out[326]:

	count	hate_speech	offensive_language	neither	class
count	5891.000000	5891.000000	5891.000000	5891.000000	5891.000000
mean	3.248175	0.108301	0.829231	2.310643	1.685113
std	0.899459	0.422802	1.138030	1.069688	0.508816
min	3.000000	0.000000	0.000000	1.000000	0.000000
25%	3.000000	0.000000	0.000000	1.000000	1.000000
50%	3.000000	0.000000	0.000000	3.000000	2.000000
75%	3.000000	0.000000	2.000000	3.000000	2.000000
max	9.000000	7.000000	8.000000	9.000000	2.000000

In [327]:

tweet_data[tweet_data['class']==0].describe()

Out[327]:

	count	hate_speech	offensive_language	neither	class
count	1430.000000	1430.000000	1430.000000	1430.000000	1430.0
mean	3.108392	2.256643	0.755944	0.095804	0.0
std	0.648084	0.573994	0.487653	0.326007	0.0
min	3.000000	2.000000	0.000000	0.000000	0.0
25%	3.000000	2.000000	0.000000	0.000000	0.0
50%	3.000000	2.000000	1.000000	0.000000	0.0
75%	3.000000	2.000000	1.000000	0.000000	0.0
max	9.000000	7.000000	4.000000	4.000000	0.0

In [328]:

tweet_data[tweet_data['class']==1].describe()

Out[328]:

	count	hate_speech	offensive_language	neither	class
count	19190.000000	19190.000000	19190.000000	19190.000000	19190.0
mean	3.268890	0.180459	3.003544	0.084888	1.0
std	0.923024	0.407220	0.954097	0.284093	0.0
min	3.000000	0.000000	2.000000	0.000000	1.0
25%	3.000000	0.000000	3.000000	0.000000	1.0
50%	3.000000	0.000000	3.000000	0.000000	1.0
75%	3.000000	0.000000	3.000000	0.000000	1.0
max	9.000000	4.000000	9.000000	3.000000	1.0

In [329]:

tweet_data[tweet_data['class']==2].describe()

Out[329]:

	count	hate_speech	offensive_language	neither	class
count	4163.000000	4163.000000	4163.000000	4163.000000	4163.0
mean	3.172712	0.062935	0.264233	2.845544	2.0
std	0.746097	0.253524	0.461737	0.795181	0.0
min	3.000000	0.000000	0.000000	2.000000	2.0
25%	3.000000	0.000000	0.000000	2.000000	2.0
50%	3.000000	0.000000	0.000000	3.000000	2.0
75%	3.000000	0.000000	1.000000	3.000000	2.0
max	9.000000	3.000000	4.000000	9.000000	2.0

In [330]:

sns.stripplot(data=tweet_data, x="hate_speech", y="class")
plt.show()

In [331]:

sns.stripplot(data=tweet_data, x="offensive_language", y="class")
plt.show()

In [332]:

sns.stripplot(data=tweet_data, x="neither", y="class")
plt.show()

Scatter plot are used to understand relationship between different variables. The first scatter plot is drawn between number of users reviewing hate speech and offensive language and the class assigned to them. The class is highlighted with a colour. The plot clearly shows that if number of users judging a tweet content to be both offensive and hate speech label is assigned according to the highest number of users and if the tweet is judged as neither hate speech nor offensive language neither i.e., 2 label is assigned.

In [333]:

# scatter plot hue parameter
sns.scatterplot(x = "hate_speech", y = "offensive_language", data = tweet_data, hue = "class")
plt.title("Scatter Plot for hate_speech vs offensive_language according to their class label")
plt.show()

In [334]:

tweet_data.groupby(['class']).count()#.apply(lambda x:100 * x / float(x.mean()))

Out[334]:

	count	hate_speech	offensive_language	neither	tweet
class
0	1430	1430	1430	1430	1430
1	19190	19190	19190	19190	19190
2	4163	4163	4163	4163	4163

In [335]:

len(tweet_data[(tweet_data['hate_speech']>0)&(tweet_data['class']==0)])/len(tweet_data)*100

Out[335]:

5.770084332001776

In [336]:

len(tweet_data[(tweet_data['hate_speech']>0)&(tweet_data['class']==1)])/len(tweet_data)*100

Out[336]:

13.359964491788725

In [337]:

len(tweet_data[(tweet_data['hate_speech']>0)&(tweet_data['class']==2)])/len(tweet_data)*100

Out[337]:

1.0168260501149982

In [338]:

len(tweet_data[(tweet_data['offensive_language']>0)&(tweet_data['class']==0)])/len(tweet_data)*100

Out[338]:

4.240810232820885

In [339]:

len(tweet_data[(tweet_data['offensive_language']>0)&(tweet_data['class']==1)])/len(tweet_data)*100

Out[339]:

77.43211072105879

In [340]:

len(tweet_data[(tweet_data['offensive_language']>0)&(tweet_data['class']==2)])/len(tweet_data)*100

Out[340]:

4.305370616955171

In [341]:

len(tweet_data[(tweet_data['neither']>0)&(tweet_data['class']==0)])/len(tweet_data)*100

Out[341]:

0.5124480490658919

In [342]:

len(tweet_data[(tweet_data['neither']>0)&(tweet_data['class']==1)])/len(tweet_data)*100

Out[342]:

6.460073437436953

In [343]:

len(tweet_data[(tweet_data['neither']>0)&(tweet_data['class']==2)])/len(tweet_data)*100

Out[343]:

16.797804946939436

In [344]:

tweet_data['class'].hist()

Out[344]:

<Axes: >

The data in the tweet column is processed using text pre-processing. First the column data type is converted to string type.

In [345]:

tweet_data["tweet"] = tweet_data["tweet"].astype(str)

Punctuation marks in text is unnecessary information that does not provide and meaning to the text for building model or corpus. Punctuation is removed from tweets text in the preprocess_tweet column. All the further pre-processing is stored in the preprocess_tweet column.

In [346]:

print(string.punctuation)
def remove_punctuation(tweet):
    punctuationfree="".join([i for i in tweet if i not in string.punctuation])
    return punctuationfree

tweet_data['preprocess_tweet']= tweet_data['tweet'].apply(lambda x:remove_punctuation(x))
tweet_data.head()

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

Out[346]:

	count	offensive_language	neither	class	tweet	preprocess_tweet
0	3	0	3	2	!!! RT @mayasolovely: As a woman you shouldn't...	RT mayasolovely As a woman you shouldnt compl...
1	3	3	0	1	!!!!! RT @mleew17: boy dats cold...tyga dwn ba...	RT mleew17 boy dats coldtyga dwn bad for cuff...
2	3	3	0	1	!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...	RT UrKindOfBrand Dawg RT 80sbaby4life You eve...
3	3	2	1	1	!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...	RT CGAnderson vivabased she look like a tranny
4	6	6	0	1	!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...	RT ShenikaRoberts The shit you hear about me ...

Text conversion from uppercase to lowercase is required for standardization. The lower () method in python converts every uppercase letter to lowercase while the lowercase characters remain unchanged. The tweets text is converted into lower case.

In [347]:

tweet_data['preprocess_tweet']= tweet_data['preprocess_tweet'].apply(lambda x: x.lower())
tweet_data.head()

Out[347]:

	count	offensive_language	neither	class	tweet	preprocess_tweet
0	3	0	3	2	!!! RT @mayasolovely: As a woman you shouldn't...	rt mayasolovely as a woman you shouldnt compl...
1	3	3	0	1	!!!!! RT @mleew17: boy dats cold...tyga dwn ba...	rt mleew17 boy dats coldtyga dwn bad for cuff...
2	3	3	0	1	!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...	rt urkindofbrand dawg rt 80sbaby4life you eve...
3	3	2	1	1	!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...	rt cganderson vivabased she look like a tranny
4	6	6	0	1	!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...	rt shenikaroberts the shit you hear about me ...

Tokenization is applied to each sentence where sentences are split into words. In this step stream of words are broken down into smaller chunks called tokens. This step is required because it helps in understanding vocabulary and lexicon of the text and allows better pattern analysis.

In [348]:

def tokenization(tweet):
    tokens = re.split('W+',tweet)
    return tokens

tweet_data['preprocess_tweet']= tweet_data['preprocess_tweet'].apply(lambda x: tokenization(x))
tweet_data.head()

Out[348]:

	count	offensive_language	neither	class	tweet	preprocess_tweet
0	3	0	3	2	!!! RT @mayasolovely: As a woman you shouldn't...	[ rt mayasolovely as a woman you shouldnt comp...
1	3	3	0	1	!!!!! RT @mleew17: boy dats cold...tyga dwn ba...	[ rt mleew17 boy dats coldtyga dwn bad for cuf...
2	3	3	0	1	!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...	[ rt urkindofbrand dawg rt 80sbaby4life you ev...
3	3	2	1	1	!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...	[ rt cganderson vivabased she look like a tranny]
4	6	6	0	1	!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...	[ rt shenikaroberts the shit you hear about me...

Stop words are the common words that are frequently found in text but do not add significant meaning to the text and therefore interfere in NLP tasks. Stop words like ‘I’, ‘me’, ‘my’ etc are removed from the tweet text to focus on more meaningful words.

In [349]:

stopwords = nltk.corpus.stopwords.words('english')
print(stopwords[0:10])

def remove_stopwords(Tweet):
    output= [i for i in Tweet if i not in stopwords]
    return output

tweet_data['preprocess_tweet']= tweet_data['preprocess_tweet'].apply(lambda x:remove_stopwords(x))
tweet_data.head()

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're"]

Out[349]:

	count	offensive_language	neither	class	tweet	preprocess_tweet
0	3	0	3	2	!!! RT @mayasolovely: As a woman you shouldn't...	[ rt mayasolovely as a woman you shouldnt comp...
1	3	3	0	1	!!!!! RT @mleew17: boy dats cold...tyga dwn ba...	[ rt mleew17 boy dats coldtyga dwn bad for cuf...
2	3	3	0	1	!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...	[ rt urkindofbrand dawg rt 80sbaby4life you ev...
3	3	2	1	1	!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...	[ rt cganderson vivabased she look like a tranny]
4	6	6	0	1	!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...	[ rt shenikaroberts the shit you hear about me...

Stemming is an important technique for text normalization and its processing. The process of stemming involves getting different morphological variations given a root word. Stemming is applied using PorterStemmer(). In the stemming process stem or root word of each word is extracted.

In [350]:

porter_stemmer = PorterStemmer()

def stemming(Tweet):
    stem_Tweet = [porter_stemmer.stem(word) for word in Tweet]
    return stem_Tweet

tweet_data['preprocess_tweet']=tweet_data['preprocess_tweet'].apply(lambda x: stemming(x))
tweet_data.head()

Out[350]:

	count	offensive_language	neither	class	tweet	preprocess_tweet
0	3	0	3	2	!!! RT @mayasolovely: As a woman you shouldn't...	[ rt mayasolovely as a woman you shouldnt comp...
1	3	3	0	1	!!!!! RT @mleew17: boy dats cold...tyga dwn ba...	[ rt mleew17 boy dats coldtyga dwn bad for cuf...
2	3	3	0	1	!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...	[ rt urkindofbrand dawg rt 80sbaby4life you ev...
3	3	2	1	1	!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...	[ rt cganderson vivabased she look like a tranni]
4	6	6	0	1	!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...	[ rt shenikaroberts the shit you hear about me...

Lemmatization is applied using WordNetLemmatizer (). In the lemmatization process lemma of each word is extracted. The process of lemmatization is similar to stemming where a word is converted to its base form. The technique is used for inflection endings removal. Lemmatization returns the base or dictionary from of a word also known as lemma.

In [351]:

wordnet_lemmatizer = WordNetLemmatizer()

def lemmatizer(Tweet):
    lemm_Tweet = [wordnet_lemmatizer.lemmatize(word) for word in Tweet]
    return lemm_Tweet

tweet_data['preprocess_tweet']=tweet_data['preprocess_tweet'].apply(lambda x:lemmatizer(x))
tweet_data.head()

Out[351]:

	count	offensive_language	neither	class	tweet	preprocess_tweet
0	3	0	3	2	!!! RT @mayasolovely: As a woman you shouldn't...	[ rt mayasolovely as a woman you shouldnt comp...
1	3	3	0	1	!!!!! RT @mleew17: boy dats cold...tyga dwn ba...	[ rt mleew17 boy dats coldtyga dwn bad for cuf...
2	3	3	0	1	!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...	[ rt urkindofbrand dawg rt 80sbaby4life you ev...
3	3	2	1	1	!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...	[ rt cganderson vivabased she look like a tranni]
4	6	6	0	1	!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...	[ rt shenikaroberts the shit you hear about me...

The processed words are then recombined into sentences using space.

In [352]:

def get_sentence(words):
    sentence = ' '.join(words)
    return sentence

tweet_data['preprocess_tweet']=tweet_data['preprocess_tweet'].apply(lambda x: get_sentence(x))
tweet_data.head()

Out[352]:

	count	offensive_language	neither	class	tweet	preprocess_tweet
0	3	0	3	2	!!! RT @mayasolovely: As a woman you shouldn't...	rt mayasolovely as a woman you shouldnt compl...
1	3	3	0	1	!!!!! RT @mleew17: boy dats cold...tyga dwn ba...	rt mleew17 boy dats coldtyga dwn bad for cuff...
2	3	3	0	1	!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...	rt urkindofbrand dawg rt 80sbaby4life you eve...
3	3	2	1	1	!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...	rt cganderson vivabased she look like a tranni
4	6	6	0	1	!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...	rt shenikaroberts the shit you hear about me ...

Online content contains a lot of emoji or emoticons to represent emotions and feelings. But these emojis or emoticons interfere with the analysis process. Using the demoji library these emotion signals are removed from tweet sentences.

In [353]:

def remove_emoji(tweet):
    dem = demoji.findall(tweet)
    for item in dem.keys():
        tweet = tweet.replace(item, '')
    return tweet

tweet_data['preprocess_tweet']= tweet_data['preprocess_tweet'].apply(lambda x: remove_emoji(x))
tweet_data.head()

Out[353]:

	count	offensive_language	neither	class	tweet	preprocess_tweet
0	3	0	3	2	!!! RT @mayasolovely: As a woman you shouldn't...	rt mayasolovely as a woman you shouldnt compl...
1	3	3	0	1	!!!!! RT @mleew17: boy dats cold...tyga dwn ba...	rt mleew17 boy dats coldtyga dwn bad for cuff...
2	3	3	0	1	!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...	rt urkindofbrand dawg rt 80sbaby4life you eve...
3	3	2	1	1	!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...	rt cganderson vivabased she look like a tranni
4	6	6	0	1	!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...	rt shenikaroberts the shit you hear about me ...

In [354]:

tweet=" ".join(i for i in tweet_data.preprocess_tweet)
stopwords=set(STOPWORDS)
wordcloud = WordCloud(width = 1000, height = 500,
                background_color ='white',
                stopwords = stopwords, max_words=100,
                min_font_size = 10).generate(tweet)
plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

In [355]:

tweet_data.head()

Out[355]:

	count	offensive_language	neither	class	tweet	preprocess_tweet
0	3	0	3	2	!!! RT @mayasolovely: As a woman you shouldn't...	rt mayasolovely as a woman you shouldnt compl...
1	3	3	0	1	!!!!! RT @mleew17: boy dats cold...tyga dwn ba...	rt mleew17 boy dats coldtyga dwn bad for cuff...
2	3	3	0	1	!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...	rt urkindofbrand dawg rt 80sbaby4life you eve...
3	3	2	1	1	!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...	rt cganderson vivabased she look like a tranni
4	6	6	0	1	!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...	rt shenikaroberts the shit you hear about me ...

In [356]:

tweet_data_hate = tweet_data[tweet_data["class"]==0]
tweet_data_offensive = tweet_data[tweet_data["class"]==1]
tweet_data_neither = tweet_data[tweet_data["class"]==2]

In [357]:

print(tweet_data_hate.shape)

(1430, 7)

In [358]:

tweet=" ".join(i for i in tweet_data_hate.preprocess_tweet)
stopwords=set(STOPWORDS)
wordcloud = WordCloud(width = 1000, height = 500,
                background_color ='white',
                stopwords = stopwords, max_words=100,
                min_font_size = 10).generate(tweet)
plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

In [359]:

tweet=" ".join(i for i in tweet_data_offensive.preprocess_tweet)
stopwords=set(STOPWORDS)
wordcloud = WordCloud(width = 1000, height = 500,
                background_color ='white',
                stopwords = stopwords, max_words=100,
                min_font_size = 10).generate(tweet)
plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

In [360]:

tweet=" ".join(i for i in tweet_data_neither.preprocess_tweet)
stopwords=set(STOPWORDS)
wordcloud = WordCloud(width = 1000, height = 500,
                background_color ='white',
                stopwords = stopwords, max_words=100,
                min_font_size = 10).generate(tweet)
plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

Sentiment polarity displays the user’s sentiment of a particular text or phrase in the range of -1 to 1. This is also known as sentiment score. Polarity of each tweet content is computed and represented in a histogram. The chart describes that the polarity score count is highest between 0 and 0.2.

In [361]:

def getPolarity(Tweet):
    return TextBlob(Tweet).sentiment.polarity

tweet_data['polarity']=tweet_data['preprocess_tweet'].apply(getPolarity)
tweet_data.sample(5)

Out[361]:

	count	hate_speech	offensive_language	class	tweet	preprocess_tweet	polarity
6158	3	0	3	1	@illest_will @djdynamiq @edrobersonsf @sarahli...	illestwill djdynamiq edrobersonsf sarahlizsf t...	0.5
17866	3	1	2	1	RT @TryHardAlby: @TryHardSilva @davidam_23 get...	rt tryhardalby tryhardsilva davidam23 get down...	-0.6
22429	3	0	3	1	W bitch	w bitch	0.0
13494	3	0	3	1	Now when I put this pussy on Vivian she bet no...	now when i put this pussy on vivian she bet no...	0.0
5393	3	1	2	1	@_cblaze @kieffer_jason ask your boy Jason kei...	cblaze kiefferjason ask your boy jason keiffer...	0.0

In [362]:

def getAnalysis(score):
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    else:
        return 'Positive'
tweet_data['sentiment']=tweet_data['polarity'].apply(getAnalysis)
tweet_data.sample(5)

Out[362]:

	count	hate_speech	offensive_language	neither	class	tweet	preprocess_tweet	polarity	sentiment
3451	3	0	1	2	2	@IHateStevenSing\nI ain't to show bout dem col...	ihatestevensing\ni aint to show bout dem color...	0.000000	Neutral
4504	3	0	1	2	2	@RealSkipBayless man what about Wilson being t...	realskipbayless man what about wilson being th...	0.211111	Positive
21888	3	1	2	0	1	They told me to fuc wit bitches but never trus...	they told me to fuc wit bitches but never trus...	0.000000	Neutral
14925	3	0	3	0	1	RT @Dan_OSU_Hashtag: You ever look at a bitch ...	rt danosuhashtag you ever look at a bitch and ...	0.000000	Neutral
4395	3	0	1	2	2	@Paulyy2 nah some honkey lookin dude at chr	paulyy2 nah some honkey lookin dude at chr	0.000000	Neutral

In [363]:

sns.set(rc={'figure.figsize':(5,5)})
tweet_data['polarity'].hist()
plt.show()

In [364]:

# scatter plot hue parameter
sns.scatterplot(x = "hate_speech", y = "polarity", data = tweet_data, hue = "class")
plt.show()

The relationship between number of users judging a tweet to be offensive language with sentiment polarity is studied using a scatter plot while highlighting the label assigned to that tweet. The plot displays that sentiment polarity is not defining a tweet to be offensive language. Tweets with 1 class label contains all polarity scores.

In [365]:

# scatter plot hue parameter
sns.scatterplot(x = "offensive_language", y = "polarity", data = tweet_data, hue = "class")
plt.show()

In [366]:

# scatter plot hue parameter
sns.scatterplot(x = "neither", y = "polarity", data = tweet_data, hue = "class")
plt.show()

Sentiment polarity score is dived in three distributions where 0 denotes neutral sentiment positive value shows positive sentiment and negative value shows negative sentiment. Labels are assigned to each tweet content according to the score obtained and plotted using a histogram. The chart clearly describes that the number of neutral sentiments obtained are the highest.

In [367]:

tweet_data_negative = tweet_data[tweet_data["sentiment"]=='Negative']
tweet_data_positive = tweet_data[tweet_data["sentiment"]=='Positive']
tweet_data_neutral = tweet_data[tweet_data["sentiment"]=='Neutral']

In [368]:

def count_values_in_column(data,feature):
    total=data.loc[:,feature].value_counts(dropna=False)
    percentage=round(data.loc[:,feature].value_counts(dropna=False,normalize=True)*100,2)
    return pd.concat([total,percentage],axis=1,keys=["Total","Percentage"])
count_values_in_column(tweet_data,"sentiment")

Out[368]:

	Total	Percentage
sentiment
Neutral	10254	41.38
Negative	7271	29.34
Positive	7258	29.29

In [369]:

plt.figure(figsize=(13, 8), dpi=80)
pichart = count_values_in_column(tweet_data,"sentiment")
names= ["Positive","Neutral","Negative"]
size=pichart["Percentage"]
 
# Create a circle for the center of the plot
my_circle=plt.Circle( (0,0), 0.5, color='white')
plt.pie(size, labels=names, colors=['green','blue','red'])
p=plt.gcf()
p.gca().add_artist(my_circle)
plt.show()

In [370]:

sns.countplot(data=tweet_data, x="sentiment")
plt.show()

Three separate data subsets are obtained for each type of sentiment. The top 1000 words in negative sentiment are represented using the word cloud.

In [371]:

data_neg = tweet_data_negative['preprocess_tweet']
plt.figure(figsize = (20,20))
wc = WordCloud(max_words = 1000 , width = 1000 , height = 500,
               collocations=False).generate(" ".join(data_neg))
plt.imshow(wc)
plt.show()

The top 1000 words in positive sentiment are represented using the word cloud.

In [372]:

data_pos = tweet_data_positive['preprocess_tweet']
plt.figure(figsize = (20,20))
wc = WordCloud(max_words = 1000 , width = 1000 , height = 500,
               collocations=False).generate(" ".join(data_pos))
plt.imshow(wc)
plt.show()

Subjectivity refers to the degree the textual content in influenced by a user personal feelings and beliefs. Its value ranges from 0 to 1 where 0 denotes no subjectivity and 1 shows high subjectivity. Sentiment subjectivity is also computed for each tweet content and represented using a histogram. The plot shows that sentiment with no subjectivity is the highest.

In [373]:

def getSubjectivity(Tweet):
    return TextBlob(Tweet).sentiment.subjectivity

In [374]:

tweet_data['subjectivity']=tweet_data['preprocess_tweet'].apply(getSubjectivity)
tweet_data.head()

Out[374]:

	count	offensive_language	neither	class	tweet	preprocess_tweet	polarity	sentiment	subjectivity
0	3	0	3	2	!!! RT @mayasolovely: As a woman you shouldn't...	rt mayasolovely as a woman you shouldnt compl...	0.000000	Neutral	0.000000
1	3	3	0	1	!!!!! RT @mleew17: boy dats cold...tyga dwn ba...	rt mleew17 boy dats coldtyga dwn bad for cuff...	-0.700000	Negative	0.666667
2	3	3	0	1	!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...	rt urkindofbrand dawg rt 80sbaby4life you eve...	-0.333333	Negative	0.700000
3	3	2	1	1	!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...	rt cganderson vivabased she look like a tranni	0.000000	Neutral	0.000000
4	6	6	0	1	!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...	rt shenikaroberts the shit you hear about me ...	0.075000	Positive	0.725000

In [375]:

sns.set(rc={'figure.figsize':(5,5)})
tweet_data['subjectivity'].hist()
plt.show()

In [376]:

# scatter plot hue parameter
sns.scatterplot(x = "hate_speech", y = "subjectivity", data = tweet_data, hue = "class")
plt.show()

The relationship between number of users judging a tweet to be offensive with sentiment subjectivity is studied using a scatter plot while highlighting the label assigned to that tweet. The plot shows that a tweet's content does not have to be subjective in order for it to contain offensive language. Tweets that have been assigned label 1 include all subjectivity ratings.

In [377]:

# scatter plot hue parameter
sns.scatterplot(x = "offensive_language", y = "subjectivity", data = tweet_data, hue = "class")
plt.show()

In [378]:

# scatter plot hue parameter
sns.scatterplot(x = "neither", y = "subjectivity", data = tweet_data, hue = "class")
plt.show()

In [379]:

tweet_data.to_excel('preprcessed_labeled_data.xlsx',index=False)

In [380]:

tweet_data = pd.read_excel('preprcessed_labeled_data.xlsx')
print(tweet_data.shape)

(24783, 10)

In [381]:

tweet_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24783 entries, 0 to 24782
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   count               24783 non-null  int64  
 1   hate_speech         24783 non-null  int64  
 2   offensive_language  24783 non-null  int64  
 3   neither             24783 non-null  int64  
 4   class               24783 non-null  int64  
 5   tweet               24782 non-null  object 
 6   preprocess_tweet    24783 non-null  object 
 7   polarity            24783 non-null  float64
 8   sentiment           24783 non-null  object 
 9   subjectivity        24783 non-null  float64
dtypes: float64(2), int64(5), object(3)
memory usage: 1.9+ MB

In [418]:

numeric_df = tweet_data.select_dtypes(include=['number'])

In [420]:

# calculate the correlation matrix
corr = numeric_df.corr()

# plot the heatmap
sns.heatmap(corr, 
        xticklabels=corr.columns,
        yticklabels=corr.columns)
plt.show()

Machine Learning¶

In [422]:

X=tweet_data['preprocess_tweet']
Y=tweet_data['class']

The input is the 70% data split for training. After model training, models will be evaluated using 30% data.

In [423]:

X_train, X_test, y_train, y_test = train_test_split(X,Y,test_size = 0.3, random_state = 0)

The number of records in X_train is 17348, the number of records in X_test is 7435.

In [424]:

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(17348,)
(7435,)
(17348,)
(7435,)

Count Vectorizer¶

Count vectorizer produced 32461 features for 17348 records.

In [425]:

vectoriser = CountVectorizer()
vectoriser.fit(X_train)
print(f'Vectoriser fitted.')
print('No. of feature_words: ', len(vectoriser.get_feature_names_out()))

Vectoriser fitted.
No. of feature_words:  32461

In [426]:

X_train = vectoriser.transform(X_train)
X_test  = vectoriser.transform(X_test)
print(f'Data Transformed.')

Data Transformed.

MultinomialNB¶

In [427]:

clf = MultinomialNB()
clf.fit(X_train, y_train)

Out[427]:

MultinomialNB()

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

In [428]:

predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)

84.76126429051783

In [429]:

print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))

Accuracy: 0.8476126429051782
              precision    recall  f1-score   support

           0       0.50      0.02      0.04       427
           1       0.84      0.99      0.91      5747
           2       0.90      0.47      0.62      1261

    accuracy                           0.85      7435
   macro avg       0.75      0.49      0.52      7435
weighted avg       0.83      0.85      0.81      7435

In [430]:

cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()

[[   9  395   23]
 [   8 5699   40]
 [   1  666  594]]

TfidfVectorizer¶

In [431]:

X_train, X_test, y_train, y_test = train_test_split(X,Y,test_size = 0.3, random_state = 0)

TF-IDF vectorizer produced 32461 features for 17348 records.

In [432]:

vectoriser = TfidfVectorizer()
vectoriser.fit(X_train)
print(f'Vectoriser fitted.')
print('No. of feature_words: ', len(vectoriser.get_feature_names_out()))

Vectoriser fitted.
No. of feature_words:  32461

In [433]:

X_train = vectoriser.transform(X_train)
X_test  = vectoriser.transform(X_test)
print(f'Data Transformed.')

Data Transformed.

MultinomialNB¶

In [434]:

clf = MultinomialNB()
clf.fit(X_train, y_train)

Out[434]:

MultinomialNB()

In [435]:

predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)

78.37256220578345

In [436]:

print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))

Accuracy: 0.7837256220578346
              precision    recall  f1-score   support

           0       0.00      0.00      0.00       427
           1       0.78      1.00      0.88      5747
           2       0.98      0.07      0.12      1261

    accuracy                           0.78      7435
   macro avg       0.59      0.35      0.33      7435
weighted avg       0.77      0.78      0.70      7435

In [437]:

cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()

[[   0  427    0]
 [   0 5745    2]
 [   0 1179   82]]

In [438]:

X_train, X_test, y_train, y_test = train_test_split(X,Y,test_size = 0.3, random_state = 0)

In [439]:

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(17348,)
(7435,)
(17348,)
(7435,)

Count Vectorizer¶

In [440]:

vectoriser = CountVectorizer()
vectoriser.fit(X_train)
print(f'Vectoriser fitted.')
print('No. of feature_words: ', len(vectoriser.get_feature_names_out()))

Vectoriser fitted.
No. of feature_words:  32461

In [441]:

X_train = vectoriser.transform(X_train)
X_test  = vectoriser.transform(X_test)
print(f'Data Transformed.')

Data Transformed.

In [442]:

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

Out[442]:

DecisionTreeClassifier(random_state=0)

In [443]:

predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)

87.6126429051782

In [444]:

print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))

Accuracy: 0.8761264290517821
              precision    recall  f1-score   support

           0       0.33      0.21      0.26       427
           1       0.92      0.93      0.93      5747
           2       0.79      0.85      0.82      1261

    accuracy                           0.88      7435
   macro avg       0.68      0.66      0.67      7435
weighted avg       0.87      0.88      0.87      7435

In [445]:

cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()

[[  89  286   52]
 [ 158 5356  233]
 [  20  172 1069]]

In [446]:

X_train, X_test, y_train, y_test = train_test_split(X,Y,test_size = 0.3, random_state = 0)

In [447]:

vectoriser = TfidfVectorizer()
vectoriser.fit(X_train)
print(f'Vectoriser fitted.')
print('No. of feature_words: ', len(vectoriser.get_feature_names_out()))

Vectoriser fitted.
No. of feature_words:  32461

In [448]:

X_train = vectoriser.transform(X_train)
X_test  = vectoriser.transform(X_test)
print(f'Data Transformed.')

Data Transformed.

In [449]:

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

Out[449]:

DecisionTreeClassifier(random_state=0)

In [450]:

predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)

86.80564895763283

In [451]:

print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))

Accuracy: 0.8680564895763282
              precision    recall  f1-score   support

           0       0.35      0.30      0.32       427
           1       0.92      0.92      0.92      5747
           2       0.78      0.80      0.79      1261

    accuracy                           0.87      7435
   macro avg       0.68      0.68      0.68      7435
weighted avg       0.86      0.87      0.87      7435

In [452]:

cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()

[[ 127  255   45]
 [ 191 5313  243]
 [  41  206 1014]]

Fine-Tuning¶

In [454]:

X_train, X_test, y_train, y_test = train_test_split(X,Y,test_size = 0.3, random_state = 0)

In [455]:

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(17348,)
(7435,)
(17348,)
(7435,)

Count Vectorizer¶

In [456]:

vectoriser = CountVectorizer()
vectoriser.fit(X_train)
print(f'Vectoriser fitted.')
print('No. of feature_words: ', len(vectoriser.get_feature_names_out()))

Vectoriser fitted.
No. of feature_words:  32461

In [457]:

X_train = vectoriser.transform(X_train)
X_test  = vectoriser.transform(X_test)
print(f'Data Transformed.')

Data Transformed.

In [458]:

clf = MultinomialNB(alpha=1.0, fit_prior=False, class_prior=None)
clf.fit(X_train, y_train)

Out[458]:

MultinomialNB(fit_prior=False)

In [459]:

predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)

86.21385339609952

In [460]:

print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))

Accuracy: 0.8621385339609953
              precision    recall  f1-score   support

           0       0.41      0.12      0.19       427
           1       0.87      0.97      0.92      5747
           2       0.85      0.61      0.71      1261

    accuracy                           0.86      7435
   macro avg       0.71      0.57      0.61      7435
weighted avg       0.84      0.86      0.84      7435

In [461]:

cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()

[[  53  336   38]
 [  62 5589   96]
 [  13  480  768]]

DecisionTreeClassifier¶

In [463]:

clf = DecisionTreeClassifier(criterion='gini',splitter='best',max_features=None, random_state=0)
clf.fit(X_train, y_train)

Out[463]:

DecisionTreeClassifier(random_state=0)

In [464]:

predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)

87.6126429051782

In [465]:

print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))

Accuracy: 0.8761264290517821
              precision    recall  f1-score   support

           0       0.33      0.21      0.26       427
           1       0.92      0.93      0.93      5747
           2       0.79      0.85      0.82      1261

    accuracy                           0.88      7435
   macro avg       0.68      0.66      0.67      7435
weighted avg       0.87      0.88      0.87      7435

In [466]:

cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()

[[  89  286   52]
 [ 158 5356  233]
 [  20  172 1069]]

Hyper parameter fine-tuning¶

In [467]:

from sklearn.model_selection import GridSearchCV

max_depth¶

This parameter selects the height of the tree at its tallest. This parameter takes integer value and the default value is None. If None, nodes are expanded either until all leaves are pure or until all leaves contain fewer samples than min_samples_split. With the default None max_depth the tree depth becomes 253. With GridSearchCV we used to find the best parameter value from the range of 1 to 100. And the best results were obtained using 65 as max_depth.

In [469]:

gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
                             max_depth=None, random_state=0),
             param_grid={'max_depth': list(range(2, 100))},
             verbose=3)

In [470]:

gs.fit(X_train, y_train)

Fitting 5 folds for each of 98 candidates, totalling 490 fits
[CV 1/5] END .......................max_depth=2;, score=0.775 total time=   0.4s
[CV 2/5] END .......................max_depth=2;, score=0.775 total time=   0.4s
[CV 3/5] END .......................max_depth=2;, score=0.775 total time=   0.4s
[CV 4/5] END .......................max_depth=2;, score=0.775 total time=   0.4s
[CV 5/5] END .......................max_depth=2;, score=0.775 total time=   0.3s
[CV 1/5] END .......................max_depth=3;, score=0.775 total time=   0.5s
[CV 2/5] END .......................max_depth=3;, score=0.774 total time=   0.6s
[CV 3/5] END .......................max_depth=3;, score=0.775 total time=   0.4s
[CV 4/5] END .......................max_depth=3;, score=0.775 total time=   0.4s
[CV 5/5] END .......................max_depth=3;, score=0.775 total time=   0.4s
[CV 1/5] END .......................max_depth=4;, score=0.775 total time=   0.4s
[CV 2/5] END .......................max_depth=4;, score=0.774 total time=   0.4s
[CV 3/5] END .......................max_depth=4;, score=0.775 total time=   0.4s
[CV 4/5] END .......................max_depth=4;, score=0.775 total time=   0.4s
[CV 5/5] END .......................max_depth=4;, score=0.775 total time=   0.4s
[CV 1/5] END .......................max_depth=5;, score=0.775 total time=   0.4s
[CV 2/5] END .......................max_depth=5;, score=0.774 total time=   0.4s
[CV 3/5] END .......................max_depth=5;, score=0.774 total time=   0.4s
[CV 4/5] END .......................max_depth=5;, score=0.774 total time=   0.4s
[CV 5/5] END .......................max_depth=5;, score=0.775 total time=   0.4s
[CV 1/5] END .......................max_depth=6;, score=0.777 total time=   0.5s
[CV 2/5] END .......................max_depth=6;, score=0.771 total time=   0.5s
[CV 3/5] END .......................max_depth=6;, score=0.774 total time=   0.5s
[CV 4/5] END .......................max_depth=6;, score=0.772 total time=   0.5s
[CV 5/5] END .......................max_depth=6;, score=0.774 total time=   0.5s
[CV 1/5] END .......................max_depth=7;, score=0.788 total time=   0.5s
[CV 2/5] END .......................max_depth=7;, score=0.784 total time=   0.5s
[CV 3/5] END .......................max_depth=7;, score=0.796 total time=   0.5s
[CV 4/5] END .......................max_depth=7;, score=0.788 total time=   0.5s
[CV 5/5] END .......................max_depth=7;, score=0.790 total time=   0.5s
[CV 1/5] END .......................max_depth=8;, score=0.800 total time=   0.5s
[CV 2/5] END .......................max_depth=8;, score=0.793 total time=   0.5s
[CV 3/5] END .......................max_depth=8;, score=0.806 total time=   0.6s
[CV 4/5] END .......................max_depth=8;, score=0.798 total time=   0.7s
[CV 5/5] END .......................max_depth=8;, score=0.794 total time=   0.8s
[CV 1/5] END .......................max_depth=9;, score=0.809 total time=   0.6s
[CV 2/5] END .......................max_depth=9;, score=0.804 total time=   0.6s
[CV 3/5] END .......................max_depth=9;, score=0.818 total time=   0.6s
[CV 4/5] END .......................max_depth=9;, score=0.809 total time=   0.6s
[CV 5/5] END .......................max_depth=9;, score=0.804 total time=   0.7s
[CV 1/5] END ......................max_depth=10;, score=0.817 total time=   0.9s
[CV 2/5] END ......................max_depth=10;, score=0.814 total time=   0.8s
[CV 3/5] END ......................max_depth=10;, score=0.825 total time=   0.7s
[CV 4/5] END ......................max_depth=10;, score=0.820 total time=   0.7s
[CV 5/5] END ......................max_depth=10;, score=0.814 total time=   0.7s
[CV 1/5] END ......................max_depth=11;, score=0.822 total time=   0.6s
[CV 2/5] END ......................max_depth=11;, score=0.819 total time=   0.6s
[CV 3/5] END ......................max_depth=11;, score=0.831 total time=   0.6s
[CV 4/5] END ......................max_depth=11;, score=0.828 total time=   0.6s
[CV 5/5] END ......................max_depth=11;, score=0.821 total time=   0.6s
[CV 1/5] END ......................max_depth=12;, score=0.829 total time=   0.7s
[CV 2/5] END ......................max_depth=12;, score=0.826 total time=   0.7s
[CV 3/5] END ......................max_depth=12;, score=0.837 total time=   0.7s
[CV 4/5] END ......................max_depth=12;, score=0.837 total time=   0.7s
[CV 5/5] END ......................max_depth=12;, score=0.827 total time=   1.0s
[CV 1/5] END ......................max_depth=13;, score=0.837 total time=   0.8s
[CV 2/5] END ......................max_depth=13;, score=0.832 total time=   0.7s
[CV 3/5] END ......................max_depth=13;, score=0.842 total time=   0.7s
[CV 4/5] END ......................max_depth=13;, score=0.843 total time=   0.7s
[CV 5/5] END ......................max_depth=13;, score=0.839 total time=   0.7s
[CV 1/5] END ......................max_depth=14;, score=0.842 total time=   0.7s
[CV 2/5] END ......................max_depth=14;, score=0.835 total time=   0.7s
[CV 3/5] END ......................max_depth=14;, score=0.846 total time=   0.7s
[CV 4/5] END ......................max_depth=14;, score=0.847 total time=   0.7s
[CV 5/5] END ......................max_depth=14;, score=0.842 total time=   0.7s
[CV 1/5] END ......................max_depth=15;, score=0.845 total time=   0.8s
[CV 2/5] END ......................max_depth=15;, score=0.841 total time=   0.8s
[CV 3/5] END ......................max_depth=15;, score=0.848 total time=   0.8s
[CV 4/5] END ......................max_depth=15;, score=0.848 total time=   0.8s
[CV 5/5] END ......................max_depth=15;, score=0.847 total time=   0.7s
[CV 1/5] END ......................max_depth=16;, score=0.847 total time=   0.8s
[CV 2/5] END ......................max_depth=16;, score=0.845 total time=   0.8s
[CV 3/5] END ......................max_depth=16;, score=0.852 total time=   1.0s
[CV 4/5] END ......................max_depth=16;, score=0.850 total time=   1.0s
[CV 5/5] END ......................max_depth=16;, score=0.850 total time=   0.9s
[CV 1/5] END ......................max_depth=17;, score=0.851 total time=   1.0s
[CV 2/5] END ......................max_depth=17;, score=0.846 total time=   1.2s
[CV 3/5] END ......................max_depth=17;, score=0.852 total time=   0.9s
[CV 4/5] END ......................max_depth=17;, score=0.854 total time=   1.0s
[CV 5/5] END ......................max_depth=17;, score=0.853 total time=   0.9s
[CV 1/5] END ......................max_depth=18;, score=0.852 total time=   1.2s
[CV 2/5] END ......................max_depth=18;, score=0.848 total time=   1.2s
[CV 3/5] END ......................max_depth=18;, score=0.857 total time=   1.0s
[CV 4/5] END ......................max_depth=18;, score=0.853 total time=   1.0s
[CV 5/5] END ......................max_depth=18;, score=0.854 total time=   1.1s
[CV 1/5] END ......................max_depth=19;, score=0.858 total time=   1.3s
[CV 2/5] END ......................max_depth=19;, score=0.850 total time=   1.3s
[CV 3/5] END ......................max_depth=19;, score=0.859 total time=   1.4s
[CV 4/5] END ......................max_depth=19;, score=0.856 total time=   1.0s
[CV 5/5] END ......................max_depth=19;, score=0.858 total time=   0.9s
[CV 1/5] END ......................max_depth=20;, score=0.858 total time=   1.0s
[CV 2/5] END ......................max_depth=20;, score=0.852 total time=   0.9s
[CV 3/5] END ......................max_depth=20;, score=0.860 total time=   0.9s
[CV 4/5] END ......................max_depth=20;, score=0.860 total time=   1.0s
[CV 5/5] END ......................max_depth=20;, score=0.860 total time=   1.1s
[CV 1/5] END ......................max_depth=21;, score=0.862 total time=   1.0s
[CV 2/5] END ......................max_depth=21;, score=0.856 total time=   1.0s
[CV 3/5] END ......................max_depth=21;, score=0.863 total time=   1.0s
[CV 4/5] END ......................max_depth=21;, score=0.860 total time=   1.1s
[CV 5/5] END ......................max_depth=21;, score=0.865 total time=   1.3s
[CV 1/5] END ......................max_depth=22;, score=0.867 total time=   1.5s
[CV 2/5] END ......................max_depth=22;, score=0.858 total time=   1.3s
[CV 3/5] END ......................max_depth=22;, score=0.865 total time=   1.1s
[CV 4/5] END ......................max_depth=22;, score=0.865 total time=   1.2s
[CV 5/5] END ......................max_depth=22;, score=0.866 total time=   1.5s
[CV 1/5] END ......................max_depth=23;, score=0.868 total time=   1.2s
[CV 2/5] END ......................max_depth=23;, score=0.859 total time=   1.4s
[CV 3/5] END ......................max_depth=23;, score=0.865 total time=   1.4s
[CV 4/5] END ......................max_depth=23;, score=0.867 total time=   1.4s
[CV 5/5] END ......................max_depth=23;, score=0.869 total time=   1.4s
[CV 1/5] END ......................max_depth=24;, score=0.867 total time=   1.8s
[CV 2/5] END ......................max_depth=24;, score=0.863 total time=   1.9s
[CV 3/5] END ......................max_depth=24;, score=0.869 total time=   1.3s
[CV 4/5] END ......................max_depth=24;, score=0.865 total time=   1.5s
[CV 5/5] END ......................max_depth=24;, score=0.871 total time=   1.3s
[CV 1/5] END ......................max_depth=25;, score=0.870 total time=   1.4s
[CV 2/5] END ......................max_depth=25;, score=0.866 total time=   1.3s
[CV 3/5] END ......................max_depth=25;, score=0.867 total time=   1.4s
[CV 4/5] END ......................max_depth=25;, score=0.867 total time=   1.4s
[CV 5/5] END ......................max_depth=25;, score=0.876 total time=   1.6s
[CV 1/5] END ......................max_depth=26;, score=0.869 total time=   1.6s
[CV 2/5] END ......................max_depth=26;, score=0.867 total time=   1.9s
[CV 3/5] END ......................max_depth=26;, score=0.871 total time=   1.3s
[CV 4/5] END ......................max_depth=26;, score=0.874 total time=   1.4s
[CV 5/5] END ......................max_depth=26;, score=0.876 total time=   1.2s
[CV 1/5] END ......................max_depth=27;, score=0.876 total time=   1.4s
[CV 2/5] END ......................max_depth=27;, score=0.868 total time=   1.4s
[CV 3/5] END ......................max_depth=27;, score=0.869 total time=   1.4s
[CV 4/5] END ......................max_depth=27;, score=0.874 total time=   1.5s
[CV 5/5] END ......................max_depth=27;, score=0.879 total time=   1.8s
[CV 1/5] END ......................max_depth=28;, score=0.877 total time=   1.4s
[CV 2/5] END ......................max_depth=28;, score=0.869 total time=   1.9s
[CV 3/5] END ......................max_depth=28;, score=0.874 total time=   1.3s
[CV 4/5] END ......................max_depth=28;, score=0.879 total time=   1.3s
[CV 5/5] END ......................max_depth=28;, score=0.879 total time=   1.3s
[CV 1/5] END ......................max_depth=29;, score=0.878 total time=   1.4s
[CV 2/5] END ......................max_depth=29;, score=0.869 total time=   1.7s
[CV 3/5] END ......................max_depth=29;, score=0.875 total time=   1.5s
[CV 4/5] END ......................max_depth=29;, score=0.878 total time=   1.4s
[CV 5/5] END ......................max_depth=29;, score=0.882 total time=   1.4s
[CV 1/5] END ......................max_depth=30;, score=0.878 total time=   1.8s
[CV 2/5] END ......................max_depth=30;, score=0.873 total time=   1.9s
[CV 3/5] END ......................max_depth=30;, score=0.873 total time=   1.8s
[CV 4/5] END ......................max_depth=30;, score=0.878 total time=   1.8s
[CV 5/5] END ......................max_depth=30;, score=0.880 total time=   1.8s
[CV 1/5] END ......................max_depth=31;, score=0.877 total time=   1.8s
[CV 2/5] END ......................max_depth=31;, score=0.875 total time=   1.8s
[CV 3/5] END ......................max_depth=31;, score=0.873 total time=   1.9s
[CV 4/5] END ......................max_depth=31;, score=0.879 total time=   2.0s
[CV 5/5] END ......................max_depth=31;, score=0.883 total time=   2.1s
[CV 1/5] END ......................max_depth=32;, score=0.878 total time=   1.9s
[CV 2/5] END ......................max_depth=32;, score=0.873 total time=   1.5s
[CV 3/5] END ......................max_depth=32;, score=0.875 total time=   1.6s
[CV 4/5] END ......................max_depth=32;, score=0.876 total time=   1.6s
[CV 5/5] END ......................max_depth=32;, score=0.882 total time=   1.4s
[CV 1/5] END ......................max_depth=33;, score=0.881 total time=   1.5s
[CV 2/5] END ......................max_depth=33;, score=0.875 total time=   1.8s
[CV 3/5] END ......................max_depth=33;, score=0.874 total time=   1.5s
[CV 4/5] END ......................max_depth=33;, score=0.877 total time=   1.9s
[CV 5/5] END ......................max_depth=33;, score=0.883 total time=   1.5s
[CV 1/5] END ......................max_depth=34;, score=0.880 total time=   1.5s
[CV 2/5] END ......................max_depth=34;, score=0.873 total time=   1.5s
[CV 3/5] END ......................max_depth=34;, score=0.873 total time=   1.6s
[CV 4/5] END ......................max_depth=34;, score=0.879 total time=   1.6s
[CV 5/5] END ......................max_depth=34;, score=0.883 total time=   1.6s
[CV 1/5] END ......................max_depth=35;, score=0.877 total time=   1.5s
[CV 2/5] END ......................max_depth=35;, score=0.875 total time=   1.5s
[CV 3/5] END ......................max_depth=35;, score=0.874 total time=   1.4s
[CV 4/5] END ......................max_depth=35;, score=0.881 total time=   1.8s
[CV 5/5] END ......................max_depth=35;, score=0.884 total time=   1.6s
[CV 1/5] END ......................max_depth=36;, score=0.878 total time=   1.4s
[CV 2/5] END ......................max_depth=36;, score=0.873 total time=   2.0s
[CV 3/5] END ......................max_depth=36;, score=0.874 total time=   1.6s
[CV 4/5] END ......................max_depth=36;, score=0.880 total time=   2.0s
[CV 5/5] END ......................max_depth=36;, score=0.883 total time=   1.7s
[CV 1/5] END ......................max_depth=37;, score=0.878 total time=   1.7s
[CV 2/5] END ......................max_depth=37;, score=0.871 total time=   1.4s
[CV 3/5] END ......................max_depth=37;, score=0.876 total time=   1.9s
[CV 4/5] END ......................max_depth=37;, score=0.883 total time=   1.5s
[CV 5/5] END ......................max_depth=37;, score=0.885 total time=   1.5s
[CV 1/5] END ......................max_depth=38;, score=0.879 total time=   1.6s
[CV 2/5] END ......................max_depth=38;, score=0.871 total time=   1.5s
[CV 3/5] END ......................max_depth=38;, score=0.876 total time=   1.6s
[CV 4/5] END ......................max_depth=38;, score=0.882 total time=   1.7s
[CV 5/5] END ......................max_depth=38;, score=0.880 total time=   1.6s
[CV 1/5] END ......................max_depth=39;, score=0.876 total time=   1.8s
[CV 2/5] END ......................max_depth=39;, score=0.875 total time=   2.1s
[CV 3/5] END ......................max_depth=39;, score=0.875 total time=   1.8s
[CV 4/5] END ......................max_depth=39;, score=0.885 total time=   1.8s
[CV 5/5] END ......................max_depth=39;, score=0.881 total time=   2.1s
[CV 1/5] END ......................max_depth=40;, score=0.878 total time=   1.8s
[CV 2/5] END ......................max_depth=40;, score=0.871 total time=   1.6s
[CV 3/5] END ......................max_depth=40;, score=0.876 total time=   1.7s
[CV 4/5] END ......................max_depth=40;, score=0.886 total time=   1.8s
[CV 5/5] END ......................max_depth=40;, score=0.880 total time=   2.0s
[CV 1/5] END ......................max_depth=41;, score=0.880 total time=   1.9s
[CV 2/5] END ......................max_depth=41;, score=0.873 total time=   1.9s
[CV 3/5] END ......................max_depth=41;, score=0.877 total time=   2.1s
[CV 4/5] END ......................max_depth=41;, score=0.885 total time=   1.9s
[CV 5/5] END ......................max_depth=41;, score=0.884 total time=   2.0s
[CV 1/5] END ......................max_depth=42;, score=0.876 total time=   1.7s
[CV 2/5] END ......................max_depth=42;, score=0.871 total time=   1.9s
[CV 3/5] END ......................max_depth=42;, score=0.876 total time=   2.5s
[CV 4/5] END ......................max_depth=42;, score=0.879 total time=   2.0s
[CV 5/5] END ......................max_depth=42;, score=0.885 total time=   1.8s
[CV 1/5] END ......................max_depth=43;, score=0.878 total time=   1.6s
[CV 2/5] END ......................max_depth=43;, score=0.873 total time=   1.6s
[CV 3/5] END ......................max_depth=43;, score=0.878 total time=   1.6s
[CV 4/5] END ......................max_depth=43;, score=0.884 total time=   1.6s
[CV 5/5] END ......................max_depth=43;, score=0.885 total time=   1.9s
[CV 1/5] END ......................max_depth=44;, score=0.879 total time=   1.9s
[CV 2/5] END ......................max_depth=44;, score=0.875 total time=   1.9s
[CV 3/5] END ......................max_depth=44;, score=0.874 total time=   1.6s
[CV 4/5] END ......................max_depth=44;, score=0.880 total time=   1.7s
[CV 5/5] END ......................max_depth=44;, score=0.884 total time=   1.9s
[CV 1/5] END ......................max_depth=45;, score=0.881 total time=   1.7s
[CV 2/5] END ......................max_depth=45;, score=0.871 total time=   1.9s
[CV 3/5] END ......................max_depth=45;, score=0.877 total time=   1.7s
[CV 4/5] END ......................max_depth=45;, score=0.881 total time=   1.8s
[CV 5/5] END ......................max_depth=45;, score=0.884 total time=   2.0s
[CV 1/5] END ......................max_depth=46;, score=0.880 total time=   1.8s
[CV 2/5] END ......................max_depth=46;, score=0.873 total time=   1.6s
[CV 3/5] END ......................max_depth=46;, score=0.877 total time=   1.7s
[CV 4/5] END ......................max_depth=46;, score=0.882 total time=   1.8s
[CV 5/5] END ......................max_depth=46;, score=0.883 total time=   1.7s
[CV 1/5] END ......................max_depth=47;, score=0.877 total time=   1.8s
[CV 2/5] END ......................max_depth=47;, score=0.875 total time=   1.8s
[CV 3/5] END ......................max_depth=47;, score=0.878 total time=   2.2s
[CV 4/5] END ......................max_depth=47;, score=0.883 total time=   1.8s
[CV 5/5] END ......................max_depth=47;, score=0.885 total time=   1.8s
[CV 1/5] END ......................max_depth=48;, score=0.881 total time=   2.0s
[CV 2/5] END ......................max_depth=48;, score=0.873 total time=   1.8s
[CV 3/5] END ......................max_depth=48;, score=0.873 total time=   1.8s
[CV 4/5] END ......................max_depth=48;, score=0.882 total time=   1.9s
[CV 5/5] END ......................max_depth=48;, score=0.883 total time=   1.9s
[CV 1/5] END ......................max_depth=49;, score=0.879 total time=   1.9s
[CV 2/5] END ......................max_depth=49;, score=0.871 total time=   1.9s
[CV 3/5] END ......................max_depth=49;, score=0.873 total time=   1.7s
[CV 4/5] END ......................max_depth=49;, score=0.885 total time=   1.8s
[CV 5/5] END ......................max_depth=49;, score=0.885 total time=   2.0s
[CV 1/5] END ......................max_depth=50;, score=0.881 total time=   1.7s
[CV 2/5] END ......................max_depth=50;, score=0.872 total time=   1.7s
[CV 3/5] END ......................max_depth=50;, score=0.877 total time=   1.8s
[CV 4/5] END ......................max_depth=50;, score=0.887 total time=   1.8s
[CV 5/5] END ......................max_depth=50;, score=0.884 total time=   2.0s
[CV 1/5] END ......................max_depth=51;, score=0.880 total time=   2.0s
[CV 2/5] END ......................max_depth=51;, score=0.871 total time=   1.8s
[CV 3/5] END ......................max_depth=51;, score=0.875 total time=   1.9s
[CV 4/5] END ......................max_depth=51;, score=0.883 total time=   1.8s
[CV 5/5] END ......................max_depth=51;, score=0.885 total time=   1.8s
[CV 1/5] END ......................max_depth=52;, score=0.881 total time=   1.9s
[CV 2/5] END ......................max_depth=52;, score=0.873 total time=   2.2s
[CV 3/5] END ......................max_depth=52;, score=0.876 total time=   2.6s
[CV 4/5] END ......................max_depth=52;, score=0.882 total time=   2.3s
[CV 5/5] END ......................max_depth=52;, score=0.885 total time=   2.1s
[CV 1/5] END ......................max_depth=53;, score=0.881 total time=   1.9s
[CV 2/5] END ......................max_depth=53;, score=0.874 total time=   2.0s
[CV 3/5] END ......................max_depth=53;, score=0.878 total time=   1.8s
[CV 4/5] END ......................max_depth=53;, score=0.881 total time=   1.8s
[CV 5/5] END ......................max_depth=53;, score=0.887 total time=   2.4s
[CV 1/5] END ......................max_depth=54;, score=0.880 total time=   2.1s
[CV 2/5] END ......................max_depth=54;, score=0.874 total time=   2.5s
[CV 3/5] END ......................max_depth=54;, score=0.877 total time=   2.1s
[CV 4/5] END ......................max_depth=54;, score=0.883 total time=   1.9s
[CV 5/5] END ......................max_depth=54;, score=0.884 total time=   1.9s
[CV 1/5] END ......................max_depth=55;, score=0.882 total time=   1.8s
[CV 2/5] END ......................max_depth=55;, score=0.875 total time=   1.7s
[CV 3/5] END ......................max_depth=55;, score=0.872 total time=   2.2s
[CV 4/5] END ......................max_depth=55;, score=0.881 total time=   1.8s
[CV 5/5] END ......................max_depth=55;, score=0.884 total time=   1.8s
[CV 1/5] END ......................max_depth=56;, score=0.878 total time=   1.7s
[CV 2/5] END ......................max_depth=56;, score=0.875 total time=   1.7s
[CV 3/5] END ......................max_depth=56;, score=0.871 total time=   2.0s
[CV 4/5] END ......................max_depth=56;, score=0.881 total time=   2.0s
[CV 5/5] END ......................max_depth=56;, score=0.886 total time=   1.9s
[CV 1/5] END ......................max_depth=57;, score=0.881 total time=   2.1s
[CV 2/5] END ......................max_depth=57;, score=0.875 total time=   1.7s
[CV 3/5] END ......................max_depth=57;, score=0.875 total time=   1.9s
[CV 4/5] END ......................max_depth=57;, score=0.879 total time=   2.0s
[CV 5/5] END ......................max_depth=57;, score=0.885 total time=   2.3s
[CV 1/5] END ......................max_depth=58;, score=0.882 total time=   1.9s
[CV 2/5] END ......................max_depth=58;, score=0.872 total time=   2.2s
[CV 3/5] END ......................max_depth=58;, score=0.873 total time=   2.7s
[CV 4/5] END ......................max_depth=58;, score=0.880 total time=   1.9s
[CV 5/5] END ......................max_depth=58;, score=0.886 total time=   2.1s
[CV 1/5] END ......................max_depth=59;, score=0.878 total time=   1.8s
[CV 2/5] END ......................max_depth=59;, score=0.876 total time=   2.4s
[CV 3/5] END ......................max_depth=59;, score=0.876 total time=   2.3s
[CV 4/5] END ......................max_depth=59;, score=0.882 total time=   2.0s
[CV 5/5] END ......................max_depth=59;, score=0.885 total time=   2.5s
[CV 1/5] END ......................max_depth=60;, score=0.882 total time=   2.2s
[CV 2/5] END ......................max_depth=60;, score=0.872 total time=   2.0s
[CV 3/5] END ......................max_depth=60;, score=0.875 total time=   2.3s
[CV 4/5] END ......................max_depth=60;, score=0.883 total time=   1.9s
[CV 5/5] END ......................max_depth=60;, score=0.885 total time=   2.2s
[CV 1/5] END ......................max_depth=61;, score=0.883 total time=   2.5s
[CV 2/5] END ......................max_depth=61;, score=0.871 total time=   2.2s
[CV 3/5] END ......................max_depth=61;, score=0.878 total time=   1.9s
[CV 4/5] END ......................max_depth=61;, score=0.883 total time=   2.0s
[CV 5/5] END ......................max_depth=61;, score=0.886 total time=   2.3s
[CV 1/5] END ......................max_depth=62;, score=0.879 total time=   2.0s
[CV 2/5] END ......................max_depth=62;, score=0.868 total time=   2.1s
[CV 3/5] END ......................max_depth=62;, score=0.877 total time=   2.1s
[CV 4/5] END ......................max_depth=62;, score=0.880 total time=   2.1s
[CV 5/5] END ......................max_depth=62;, score=0.884 total time=   2.5s
[CV 1/5] END ......................max_depth=63;, score=0.882 total time=   2.6s
[CV 2/5] END ......................max_depth=63;, score=0.873 total time=   2.3s
[CV 3/5] END ......................max_depth=63;, score=0.876 total time=   2.4s
[CV 4/5] END ......................max_depth=63;, score=0.880 total time=   2.1s
[CV 5/5] END ......................max_depth=63;, score=0.885 total time=   1.9s
[CV 1/5] END ......................max_depth=64;, score=0.880 total time=   2.4s
[CV 2/5] END ......................max_depth=64;, score=0.871 total time=   2.0s
[CV 3/5] END ......................max_depth=64;, score=0.879 total time=   2.2s
[CV 4/5] END ......................max_depth=64;, score=0.877 total time=   1.9s
[CV 5/5] END ......................max_depth=64;, score=0.883 total time=   2.0s
[CV 1/5] END ......................max_depth=65;, score=0.884 total time=   2.0s
[CV 2/5] END ......................max_depth=65;, score=0.875 total time=   2.4s
[CV 3/5] END ......................max_depth=65;, score=0.878 total time=   2.3s
[CV 4/5] END ......................max_depth=65;, score=0.880 total time=   2.1s
[CV 5/5] END ......................max_depth=65;, score=0.887 total time=   1.9s
[CV 1/5] END ......................max_depth=66;, score=0.882 total time=   1.9s
[CV 2/5] END ......................max_depth=66;, score=0.875 total time=   2.0s
[CV 3/5] END ......................max_depth=66;, score=0.877 total time=   1.9s
[CV 4/5] END ......................max_depth=66;, score=0.882 total time=   2.2s
[CV 5/5] END ......................max_depth=66;, score=0.883 total time=   2.1s
[CV 1/5] END ......................max_depth=67;, score=0.886 total time=   2.5s
[CV 2/5] END ......................max_depth=67;, score=0.872 total time=   2.3s
[CV 3/5] END ......................max_depth=67;, score=0.877 total time=   2.2s
[CV 4/5] END ......................max_depth=67;, score=0.880 total time=   2.0s
[CV 5/5] END ......................max_depth=67;, score=0.884 total time=   2.1s
[CV 1/5] END ......................max_depth=68;, score=0.879 total time=   2.3s
[CV 2/5] END ......................max_depth=68;, score=0.874 total time=   3.1s
[CV 3/5] END ......................max_depth=68;, score=0.878 total time=   2.6s
[CV 4/5] END ......................max_depth=68;, score=0.879 total time=   2.4s
[CV 5/5] END ......................max_depth=68;, score=0.884 total time=   2.2s
[CV 1/5] END ......................max_depth=69;, score=0.880 total time=   2.5s
[CV 2/5] END ......................max_depth=69;, score=0.872 total time=   2.1s
[CV 3/5] END ......................max_depth=69;, score=0.877 total time=   2.4s
[CV 4/5] END ......................max_depth=69;, score=0.880 total time=   2.4s
[CV 5/5] END ......................max_depth=69;, score=0.884 total time=   2.2s
[CV 1/5] END ......................max_depth=70;, score=0.879 total time=   1.9s
[CV 2/5] END ......................max_depth=70;, score=0.870 total time=   2.0s
[CV 3/5] END ......................max_depth=70;, score=0.882 total time=   2.4s
[CV 4/5] END ......................max_depth=70;, score=0.878 total time=   2.1s
[CV 5/5] END ......................max_depth=70;, score=0.885 total time=   2.4s
[CV 1/5] END ......................max_depth=71;, score=0.881 total time=   2.7s
[CV 2/5] END ......................max_depth=71;, score=0.875 total time=   2.5s
[CV 3/5] END ......................max_depth=71;, score=0.878 total time=   2.8s
[CV 4/5] END ......................max_depth=71;, score=0.879 total time=   2.4s
[CV 5/5] END ......................max_depth=71;, score=0.883 total time=   2.4s
[CV 1/5] END ......................max_depth=72;, score=0.879 total time=   2.0s
[CV 2/5] END ......................max_depth=72;, score=0.876 total time=   2.7s
[CV 3/5] END ......................max_depth=72;, score=0.878 total time=   2.4s
[CV 4/5] END ......................max_depth=72;, score=0.877 total time=   2.2s
[CV 5/5] END ......................max_depth=72;, score=0.885 total time=   2.0s
[CV 1/5] END ......................max_depth=73;, score=0.880 total time=   2.5s
[CV 2/5] END ......................max_depth=73;, score=0.872 total time=   2.6s
[CV 3/5] END ......................max_depth=73;, score=0.878 total time=   2.9s
[CV 4/5] END ......................max_depth=73;, score=0.882 total time=   2.6s
[CV 5/5] END ......................max_depth=73;, score=0.886 total time=   2.5s
[CV 1/5] END ......................max_depth=74;, score=0.878 total time=   2.5s
[CV 2/5] END ......................max_depth=74;, score=0.871 total time=   2.7s
[CV 3/5] END ......................max_depth=74;, score=0.878 total time=   2.1s
[CV 4/5] END ......................max_depth=74;, score=0.880 total time=   2.8s
[CV 5/5] END ......................max_depth=74;, score=0.883 total time=   2.7s
[CV 1/5] END ......................max_depth=75;, score=0.880 total time=   2.2s
[CV 2/5] END ......................max_depth=75;, score=0.875 total time=   2.3s
[CV 3/5] END ......................max_depth=75;, score=0.880 total time=   2.5s
[CV 4/5] END ......................max_depth=75;, score=0.881 total time=   2.8s
[CV 5/5] END ......................max_depth=75;, score=0.886 total time=   3.1s
[CV 1/5] END ......................max_depth=76;, score=0.878 total time=   2.7s
[CV 2/5] END ......................max_depth=76;, score=0.877 total time=   2.7s
[CV 3/5] END ......................max_depth=76;, score=0.878 total time=   2.6s
[CV 4/5] END ......................max_depth=76;, score=0.877 total time=   2.4s
[CV 5/5] END ......................max_depth=76;, score=0.887 total time=   2.3s
[CV 1/5] END ......................max_depth=77;, score=0.882 total time=   2.4s
[CV 2/5] END ......................max_depth=77;, score=0.873 total time=   2.7s
[CV 3/5] END ......................max_depth=77;, score=0.877 total time=   2.3s
[CV 4/5] END ......................max_depth=77;, score=0.882 total time=   2.1s
[CV 5/5] END ......................max_depth=77;, score=0.885 total time=   2.1s
[CV 1/5] END ......................max_depth=78;, score=0.882 total time=   2.3s
[CV 2/5] END ......................max_depth=78;, score=0.870 total time=   2.1s
[CV 3/5] END ......................max_depth=78;, score=0.879 total time=   2.5s
[CV 4/5] END ......................max_depth=78;, score=0.876 total time=   2.1s
[CV 5/5] END ......................max_depth=78;, score=0.885 total time=   2.3s
[CV 1/5] END ......................max_depth=79;, score=0.878 total time=   2.6s
[CV 2/5] END ......................max_depth=79;, score=0.874 total time=   2.4s
[CV 3/5] END ......................max_depth=79;, score=0.878 total time=   2.6s
[CV 4/5] END ......................max_depth=79;, score=0.880 total time=   2.6s
[CV 5/5] END ......................max_depth=79;, score=0.882 total time=   2.5s
[CV 1/5] END ......................max_depth=80;, score=0.879 total time=   2.1s
[CV 2/5] END ......................max_depth=80;, score=0.873 total time=   2.1s
[CV 3/5] END ......................max_depth=80;, score=0.877 total time=   2.2s
[CV 4/5] END ......................max_depth=80;, score=0.879 total time=   2.5s
[CV 5/5] END ......................max_depth=80;, score=0.884 total time=   2.5s
[CV 1/5] END ......................max_depth=81;, score=0.877 total time=   3.0s
[CV 2/5] END ......................max_depth=81;, score=0.870 total time=   2.2s
[CV 3/5] END ......................max_depth=81;, score=0.882 total time=   2.7s
[CV 4/5] END ......................max_depth=81;, score=0.882 total time=   2.9s
[CV 5/5] END ......................max_depth=81;, score=0.884 total time=   2.7s
[CV 1/5] END ......................max_depth=82;, score=0.880 total time=   2.3s
[CV 2/5] END ......................max_depth=82;, score=0.869 total time=   3.0s
[CV 3/5] END ......................max_depth=82;, score=0.880 total time=   2.9s
[CV 4/5] END ......................max_depth=82;, score=0.881 total time=   2.6s
[CV 5/5] END ......................max_depth=82;, score=0.884 total time=   2.4s
[CV 1/5] END ......................max_depth=83;, score=0.879 total time=   2.5s
[CV 2/5] END ......................max_depth=83;, score=0.873 total time=   2.3s
[CV 3/5] END ......................max_depth=83;, score=0.878 total time=   3.3s
[CV 4/5] END ......................max_depth=83;, score=0.880 total time=   2.6s
[CV 5/5] END ......................max_depth=83;, score=0.884 total time=   2.6s
[CV 1/5] END ......................max_depth=84;, score=0.878 total time=   2.8s
[CV 2/5] END ......................max_depth=84;, score=0.871 total time=   2.6s
[CV 3/5] END ......................max_depth=84;, score=0.879 total time=   2.7s
[CV 4/5] END ......................max_depth=84;, score=0.876 total time=   2.4s
[CV 5/5] END ......................max_depth=84;, score=0.882 total time=   2.9s
[CV 1/5] END ......................max_depth=85;, score=0.879 total time=   3.0s
[CV 2/5] END ......................max_depth=85;, score=0.869 total time=   2.5s
[CV 3/5] END ......................max_depth=85;, score=0.881 total time=   2.2s
[CV 4/5] END ......................max_depth=85;, score=0.878 total time=   2.9s
[CV 5/5] END ......................max_depth=85;, score=0.884 total time=   2.7s
[CV 1/5] END ......................max_depth=86;, score=0.881 total time=   2.9s
[CV 2/5] END ......................max_depth=86;, score=0.872 total time=   2.8s
[CV 3/5] END ......................max_depth=86;, score=0.877 total time=   2.6s
[CV 4/5] END ......................max_depth=86;, score=0.881 total time=   2.7s
[CV 5/5] END ......................max_depth=86;, score=0.886 total time=   2.9s
[CV 1/5] END ......................max_depth=87;, score=0.880 total time=   2.3s
[CV 2/5] END ......................max_depth=87;, score=0.875 total time=   2.1s
[CV 3/5] END ......................max_depth=87;, score=0.876 total time=   2.2s
[CV 4/5] END ......................max_depth=87;, score=0.881 total time=   2.1s
[CV 5/5] END ......................max_depth=87;, score=0.885 total time=   2.2s
[CV 1/5] END ......................max_depth=88;, score=0.880 total time=   2.3s
[CV 2/5] END ......................max_depth=88;, score=0.871 total time=   2.4s
[CV 3/5] END ......................max_depth=88;, score=0.879 total time=   2.2s
[CV 4/5] END ......................max_depth=88;, score=0.880 total time=   2.1s
[CV 5/5] END ......................max_depth=88;, score=0.885 total time=   2.3s
[CV 1/5] END ......................max_depth=89;, score=0.881 total time=   2.3s
[CV 2/5] END ......................max_depth=89;, score=0.873 total time=   2.3s
[CV 3/5] END ......................max_depth=89;, score=0.877 total time=   2.6s
[CV 4/5] END ......................max_depth=89;, score=0.884 total time=   2.7s
[CV 5/5] END ......................max_depth=89;, score=0.885 total time=   3.0s
[CV 1/5] END ......................max_depth=90;, score=0.880 total time=   2.6s
[CV 2/5] END ......................max_depth=90;, score=0.870 total time=   2.3s
[CV 3/5] END ......................max_depth=90;, score=0.875 total time=   2.9s
[CV 4/5] END ......................max_depth=90;, score=0.882 total time=   3.0s
[CV 5/5] END ......................max_depth=90;, score=0.884 total time=   2.7s
[CV 1/5] END ......................max_depth=91;, score=0.883 total time=   2.8s
[CV 2/5] END ......................max_depth=91;, score=0.870 total time=   2.6s
[CV 3/5] END ......................max_depth=91;, score=0.879 total time=   2.8s
[CV 4/5] END ......................max_depth=91;, score=0.879 total time=   3.1s
[CV 5/5] END ......................max_depth=91;, score=0.884 total time=   2.7s
[CV 1/5] END ......................max_depth=92;, score=0.878 total time=   2.5s
[CV 2/5] END ......................max_depth=92;, score=0.875 total time=   2.5s
[CV 3/5] END ......................max_depth=92;, score=0.877 total time=   2.6s
[CV 4/5] END ......................max_depth=92;, score=0.878 total time=   2.2s
[CV 5/5] END ......................max_depth=92;, score=0.888 total time=   3.0s
[CV 1/5] END ......................max_depth=93;, score=0.879 total time=   3.1s
[CV 2/5] END ......................max_depth=93;, score=0.873 total time=   3.1s
[CV 3/5] END ......................max_depth=93;, score=0.877 total time=   2.7s
[CV 4/5] END ......................max_depth=93;, score=0.881 total time=   2.6s
[CV 5/5] END ......................max_depth=93;, score=0.885 total time=   2.8s
[CV 1/5] END ......................max_depth=94;, score=0.878 total time=   3.1s
[CV 2/5] END ......................max_depth=94;, score=0.869 total time=   2.8s
[CV 3/5] END ......................max_depth=94;, score=0.877 total time=   2.5s
[CV 4/5] END ......................max_depth=94;, score=0.877 total time=   2.6s
[CV 5/5] END ......................max_depth=94;, score=0.884 total time=   2.8s
[CV 1/5] END ......................max_depth=95;, score=0.880 total time=   2.9s
[CV 2/5] END ......................max_depth=95;, score=0.871 total time=   2.9s
[CV 3/5] END ......................max_depth=95;, score=0.878 total time=   2.5s
[CV 4/5] END ......................max_depth=95;, score=0.879 total time=   2.2s
[CV 5/5] END ......................max_depth=95;, score=0.886 total time=   2.4s
[CV 1/5] END ......................max_depth=96;, score=0.880 total time=   2.8s
[CV 2/5] END ......................max_depth=96;, score=0.871 total time=   2.9s
[CV 3/5] END ......................max_depth=96;, score=0.878 total time=   2.8s
[CV 4/5] END ......................max_depth=96;, score=0.880 total time=   2.6s
[CV 5/5] END ......................max_depth=96;, score=0.886 total time=   2.4s
[CV 1/5] END ......................max_depth=97;, score=0.879 total time=   2.4s
[CV 2/5] END ......................max_depth=97;, score=0.870 total time=   2.8s
[CV 3/5] END ......................max_depth=97;, score=0.877 total time=   3.0s
[CV 4/5] END ......................max_depth=97;, score=0.879 total time=   2.5s
[CV 5/5] END ......................max_depth=97;, score=0.884 total time=   3.1s
[CV 1/5] END ......................max_depth=98;, score=0.877 total time=   2.9s
[CV 2/5] END ......................max_depth=98;, score=0.872 total time=   2.9s
[CV 3/5] END ......................max_depth=98;, score=0.878 total time=   3.2s
[CV 4/5] END ......................max_depth=98;, score=0.876 total time=   2.6s
[CV 5/5] END ......................max_depth=98;, score=0.884 total time=   2.9s
[CV 1/5] END ......................max_depth=99;, score=0.881 total time=   2.9s
[CV 2/5] END ......................max_depth=99;, score=0.869 total time=   2.6s
[CV 3/5] END ......................max_depth=99;, score=0.875 total time=   2.7s
[CV 4/5] END ......................max_depth=99;, score=0.875 total time=   3.4s
[CV 5/5] END ......................max_depth=99;, score=0.887 total time=   2.7s

Out[470]:

GridSearchCV(estimator=DecisionTreeClassifier(random_state=0),
             param_grid={'max_depth': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
                                       14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
                                       24, 25, 26, 27, 28, 29, 30, 31, ...]},
             verbose=3)

[CV 3/5] END ......................max_depth=83;, score=0.878 total time=   4.0s
[CV 4/5] END ......................max_depth=83;, score=0.880 total time=   4.1s
[CV 5/5] END ......................max_depth=83;, score=0.884 total time=   4.0s
[CV 1/5] END ......................max_depth=84;, score=0.878 total time=   4.2s
[CV 2/5] END ......................max_depth=84;, score=0.871 total time=   5.3s
[CV 3/5] END ......................max_depth=84;, score=0.879 total time=   4.3s
[CV 4/5] END ......................max_depth=84;, score=0.876 total time=   4.8s
[CV 5/5] END ......................max_depth=84;, score=0.882 total time=   4.9s
[CV 1/5] END ......................max_depth=85;, score=0.879 total time=   3.8s
[CV 2/5] END ......................max_depth=85;, score=0.869 total time=   5.4s
[CV 3/5] END ......................max_depth=85;, score=0.881 total time=   4.2s
[CV 4/5] END ......................max_depth=85;, score=0.878 total time=   3.6s
[CV 5/5] END ......................max_depth=85;, score=0.884 total time=   3.4s
[CV 1/5] END ......................max_depth=86;, score=0.881 total time=   3.6s
[CV 2/5] END ......................max_depth=86;, score=0.872 total time=   3.4s
[CV 3/5] END ......................max_depth=86;, score=0.877 total time=   4.1s
[CV 4/5] END ......................max_depth=86;, score=0.881 total time=   3.4s
[CV 5/5] END ......................max_depth=86;, score=0.886 total time=   3.6s
[CV 1/5] END ......................max_depth=87;, score=0.880 total time=   3.8s
[CV 2/5] END ......................max_depth=87;, score=0.875 total time=   4.1s
[CV 3/5] END ......................max_depth=87;, score=0.876 total time=   3.6s
[CV 4/5] END ......................max_depth=87;, score=0.881 total time=   3.4s
[CV 5/5] END ......................max_depth=87;, score=0.885 total time=   4.0s
[CV 1/5] END ......................max_depth=88;, score=0.880 total time=   4.2s
[CV 2/5] END ......................max_depth=88;, score=0.871 total time=   3.7s
[CV 3/5] END ......................max_depth=88;, score=0.879 total time=   3.7s
[CV 4/5] END ......................max_depth=88;, score=0.880 total time=   3.5s
[CV 5/5] END ......................max_depth=88;, score=0.885 total time=   4.4s
[CV 1/5] END ......................max_depth=89;, score=0.881 total time=   3.9s
[CV 2/5] END ......................max_depth=89;, score=0.873 total time=   3.6s
[CV 3/5] END ......................max_depth=89;, score=0.877 total time=   3.8s
[CV 4/5] END ......................max_depth=89;, score=0.884 total time=   3.9s
[CV 5/5] END ......................max_depth=89;, score=0.885 total time=   4.2s
[CV 1/5] END ......................max_depth=90;, score=0.880 total time=   3.7s
[CV 2/5] END ......................max_depth=90;, score=0.870 total time=   3.5s
[CV 3/5] END ......................max_depth=90;, score=0.875 total time=   3.6s
[CV 4/5] END ......................max_depth=90;, score=0.882 total time=   3.4s
[CV 5/5] END ......................max_depth=90;, score=0.884 total time=   3.5s
[CV 1/5] END ......................max_depth=91;, score=0.883 total time=   3.7s
[CV 2/5] END ......................max_depth=91;, score=0.870 total time=   3.3s
[CV 3/5] END ......................max_depth=91;, score=0.879 total time=   4.3s
[CV 4/5] END ......................max_depth=91;, score=0.879 total time=   3.8s
[CV 5/5] END ......................max_depth=91;, score=0.884 total time=   3.7s
[CV 1/5] END ......................max_depth=92;, score=0.878 total time=   3.7s
[CV 2/5] END ......................max_depth=92;, score=0.875 total time=   4.4s
[CV 3/5] END ......................max_depth=92;, score=0.877 total time=   5.1s
[CV 4/5] END ......................max_depth=92;, score=0.878 total time=   4.2s
[CV 5/5] END ......................max_depth=92;, score=0.888 total time=   3.7s
[CV 1/5] END ......................max_depth=93;, score=0.879 total time=   3.2s
[CV 2/5] END ......................max_depth=93;, score=0.873 total time=   4.3s
[CV 3/5] END ......................max_depth=93;, score=0.877 total time=   3.5s
[CV 4/5] END ......................max_depth=93;, score=0.881 total time=   3.8s
[CV 5/5] END ......................max_depth=93;, score=0.885 total time=   3.6s
[CV 1/5] END ......................max_depth=94;, score=0.878 total time=   4.0s
[CV 2/5] END ......................max_depth=94;, score=0.869 total time=   3.7s
[CV 3/5] END ......................max_depth=94;, score=0.877 total time=   3.7s
[CV 4/5] END ......................max_depth=94;, score=0.877 total time=   4.0s
[CV 5/5] END ......................max_depth=94;, score=0.884 total time=   3.8s
[CV 1/5] END ......................max_depth=95;, score=0.880 total time=   3.8s
[CV 2/5] END ......................max_depth=95;, score=0.871 total time=   3.7s
[CV 3/5] END ......................max_depth=95;, score=0.878 total time=   3.9s
[CV 4/5] END ......................max_depth=95;, score=0.879 total time=   3.8s
[CV 5/5] END ......................max_depth=95;, score=0.886 total time=   4.1s
[CV 1/5] END ......................max_depth=96;, score=0.880 total time=   4.1s
[CV 2/5] END ......................max_depth=96;, score=0.871 total time=   3.5s
[CV 3/5] END ......................max_depth=96;, score=0.878 total time=   3.5s
[CV 4/5] END ......................max_depth=96;, score=0.880 total time=   3.6s
[CV 5/5] END ......................max_depth=96;, score=0.886 total time=   4.3s
[CV 1/5] END ......................max_depth=97;, score=0.879 total time=   4.5s
[CV 2/5] END ......................max_depth=97;, score=0.870 total time=   3.6s
[CV 3/5] END ......................max_depth=97;, score=0.877 total time=   4.1s
[CV 4/5] END ......................max_depth=97;, score=0.879 total time=   3.7s
[CV 5/5] END ......................max_depth=97;, score=0.884 total time=   3.9s
[CV 1/5] END ......................max_depth=98;, score=0.877 total time=   3.3s
[CV 2/5] END ......................max_depth=98;, score=0.872 total time=   3.9s
[CV 3/5] END ......................max_depth=98;, score=0.878 total time=   3.7s
[CV 4/5] END ......................max_depth=98;, score=0.876 total time=   4.0s
[CV 5/5] END ......................max_depth=98;, score=0.884 total time=   4.3s
[CV 1/5] END ......................max_depth=99;, score=0.881 total time=   4.5s
[CV 2/5] END ......................max_depth=99;, score=0.869 total time=   4.1s
[CV 3/5] END ......................max_depth=99;, score=0.875 total time=   3.5s
[CV 4/5] END ......................max_depth=99;, score=0.875 total time=   4.0s
[CV 5/5] END ......................max_depth=99;, score=0.887 total time=   4.3s

Out[470]:

GridSearchCV(estimator=DecisionTreeClassifier(random_state=0),
             param_grid={'max_depth': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
                                       14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
                                       24, 25, 26, 27, 28, 29, 30, 31, ...]},
             verbose=3)

In [471]:

gs.best_estimator_

Out[471]:

DecisionTreeClassifier(max_depth=65, random_state=0)

min_samples_split¶

This parameter takes Integer or float value where minimum default value is 2. This parameter decides the least number of samples needed to split an internal node. Consider min_samples_split as the minimum number if int. If float, then min_samples_split is a fraction and ceil (min_samples_split * n_samples) is the minimum number of samples for each split. GridSearchCV is used to select the best value of this parameter, first integer values in the range of 2 to 10 and the then floating-point value in the range of 0.1 to 1 is tested. With GridSearchCV for integer value between 2 to 10, the best parameter value is 2 and for floating-point value the best value is found to be 0.1.

In [473]:

gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
                             max_depth=65, random_state=0),
             param_grid={'min_samples_split': list(range(2, 10))},
             verbose=3)

In [474]:

gs.fit(X_train, y_train)

Fitting 5 folds for each of 8 candidates, totalling 40 fits
[CV 1/5] END ...............min_samples_split=2;, score=0.884 total time=   2.3s
[CV 2/5] END ...............min_samples_split=2;, score=0.875 total time=   2.7s
[CV 3/5] END ...............min_samples_split=2;, score=0.878 total time=   2.5s
[CV 4/5] END ...............min_samples_split=2;, score=0.880 total time=   2.5s
[CV 5/5] END ...............min_samples_split=2;, score=0.887 total time=   2.8s
[CV 1/5] END ...............min_samples_split=3;, score=0.882 total time=   2.4s
[CV 2/5] END ...............min_samples_split=3;, score=0.872 total time=   2.2s
[CV 3/5] END ...............min_samples_split=3;, score=0.880 total time=   2.0s
[CV 4/5] END ...............min_samples_split=3;, score=0.880 total time=   2.3s
[CV 5/5] END ...............min_samples_split=3;, score=0.882 total time=   2.7s
[CV 1/5] END ...............min_samples_split=4;, score=0.878 total time=   2.6s
[CV 2/5] END ...............min_samples_split=4;, score=0.876 total time=   2.2s
[CV 3/5] END ...............min_samples_split=4;, score=0.875 total time=   2.1s
[CV 4/5] END ...............min_samples_split=4;, score=0.882 total time=   2.3s
[CV 5/5] END ...............min_samples_split=4;, score=0.885 total time=   2.4s
[CV 1/5] END ...............min_samples_split=5;, score=0.877 total time=   2.0s
[CV 2/5] END ...............min_samples_split=5;, score=0.872 total time=   2.5s
[CV 3/5] END ...............min_samples_split=5;, score=0.879 total time=   2.3s
[CV 4/5] END ...............min_samples_split=5;, score=0.881 total time=   1.9s
[CV 5/5] END ...............min_samples_split=5;, score=0.886 total time=   1.9s
[CV 1/5] END ...............min_samples_split=6;, score=0.882 total time=   2.3s
[CV 2/5] END ...............min_samples_split=6;, score=0.874 total time=   1.8s
[CV 3/5] END ...............min_samples_split=6;, score=0.878 total time=   1.8s
[CV 4/5] END ...............min_samples_split=6;, score=0.880 total time=   2.2s
[CV 5/5] END ...............min_samples_split=6;, score=0.881 total time=   2.0s
[CV 1/5] END ...............min_samples_split=7;, score=0.882 total time=   2.2s
[CV 2/5] END ...............min_samples_split=7;, score=0.873 total time=   2.0s
[CV 3/5] END ...............min_samples_split=7;, score=0.878 total time=   1.8s
[CV 4/5] END ...............min_samples_split=7;, score=0.882 total time=   1.8s
[CV 5/5] END ...............min_samples_split=7;, score=0.885 total time=   1.8s
[CV 1/5] END ...............min_samples_split=8;, score=0.881 total time=   1.8s
[CV 2/5] END ...............min_samples_split=8;, score=0.875 total time=   2.1s
[CV 3/5] END ...............min_samples_split=8;, score=0.875 total time=   1.8s
[CV 4/5] END ...............min_samples_split=8;, score=0.877 total time=   1.8s
[CV 5/5] END ...............min_samples_split=8;, score=0.885 total time=   1.8s
[CV 1/5] END ...............min_samples_split=9;, score=0.884 total time=   2.0s
[CV 2/5] END ...............min_samples_split=9;, score=0.874 total time=   1.7s
[CV 3/5] END ...............min_samples_split=9;, score=0.880 total time=   1.8s
[CV 4/5] END ...............min_samples_split=9;, score=0.882 total time=   1.8s
[CV 5/5] END ...............min_samples_split=9;, score=0.883 total time=   2.1s

Out[474]:

GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, random_state=0),
             param_grid={'min_samples_split': [2, 3, 4, 5, 6, 7, 8, 9]},
             verbose=3)

In [475]:

gs.best_estimator_

Out[475]:

DecisionTreeClassifier(max_depth=65, random_state=0)

In [476]:

gs.best_params_

Out[476]:

{'min_samples_split': 2}

In [477]:

gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
                             max_depth=65, random_state=0),
             param_grid={'min_samples_split': [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]},
             verbose=3)

In [478]:

gs.fit(X_train, y_train)

Fitting 5 folds for each of 9 candidates, totalling 45 fits
[CV 1/5] END .............min_samples_split=0.1;, score=0.890 total time=   1.1s
[CV 2/5] END .............min_samples_split=0.1;, score=0.883 total time=   1.0s
[CV 3/5] END .............min_samples_split=0.1;, score=0.887 total time=   1.0s
[CV 4/5] END .............min_samples_split=0.1;, score=0.894 total time=   1.1s
[CV 5/5] END .............min_samples_split=0.1;, score=0.892 total time=   1.0s
[CV 1/5] END .............min_samples_split=0.2;, score=0.883 total time=   0.9s
[CV 2/5] END .............min_samples_split=0.2;, score=0.878 total time=   0.9s
[CV 3/5] END .............min_samples_split=0.2;, score=0.882 total time=   0.9s
[CV 4/5] END .............min_samples_split=0.2;, score=0.887 total time=   0.9s
[CV 5/5] END .............min_samples_split=0.2;, score=0.888 total time=   0.9s
[CV 1/5] END .............min_samples_split=0.3;, score=0.830 total time=   0.7s
[CV 2/5] END .............min_samples_split=0.3;, score=0.828 total time=   0.7s
[CV 3/5] END .............min_samples_split=0.3;, score=0.841 total time=   0.9s
[CV 4/5] END .............min_samples_split=0.3;, score=0.840 total time=   0.5s
[CV 5/5] END .............min_samples_split=0.3;, score=0.828 total time=   0.6s
[CV 1/5] END .............min_samples_split=0.4;, score=0.778 total time=   0.4s
[CV 2/5] END .............min_samples_split=0.4;, score=0.773 total time=   0.4s
[CV 3/5] END .............min_samples_split=0.4;, score=0.775 total time=   0.4s
[CV 4/5] END .............min_samples_split=0.4;, score=0.773 total time=   0.4s
[CV 5/5] END .............min_samples_split=0.4;, score=0.774 total time=   0.4s
[CV 1/5] END .............min_samples_split=0.5;, score=0.775 total time=   0.4s
[CV 2/5] END .............min_samples_split=0.5;, score=0.775 total time=   0.4s
[CV 3/5] END .............min_samples_split=0.5;, score=0.775 total time=   0.4s
[CV 4/5] END .............min_samples_split=0.5;, score=0.775 total time=   0.4s
[CV 5/5] END .............min_samples_split=0.5;, score=0.775 total time=   0.4s
[CV 1/5] END .............min_samples_split=0.6;, score=0.775 total time=   0.3s
[CV 2/5] END .............min_samples_split=0.6;, score=0.775 total time=   0.4s
[CV 3/5] END .............min_samples_split=0.6;, score=0.775 total time=   0.4s
[CV 4/5] END .............min_samples_split=0.6;, score=0.775 total time=   0.3s
[CV 5/5] END .............min_samples_split=0.6;, score=0.775 total time=   0.4s
[CV 1/5] END .............min_samples_split=0.7;, score=0.775 total time=   0.3s
[CV 2/5] END .............min_samples_split=0.7;, score=0.775 total time=   0.3s
[CV 3/5] END .............min_samples_split=0.7;, score=0.775 total time=   0.3s
[CV 4/5] END .............min_samples_split=0.7;, score=0.775 total time=   0.3s
[CV 5/5] END .............min_samples_split=0.7;, score=0.775 total time=   0.3s
[CV 1/5] END .............min_samples_split=0.8;, score=0.775 total time=   0.4s
[CV 2/5] END .............min_samples_split=0.8;, score=0.775 total time=   0.3s
[CV 3/5] END .............min_samples_split=0.8;, score=0.775 total time=   0.3s
[CV 4/5] END .............min_samples_split=0.8;, score=0.775 total time=   0.3s
[CV 5/5] END .............min_samples_split=0.8;, score=0.775 total time=   0.3s
[CV 1/5] END .............min_samples_split=0.9;, score=0.775 total time=   0.4s
[CV 2/5] END .............min_samples_split=0.9;, score=0.775 total time=   0.4s
[CV 3/5] END .............min_samples_split=0.9;, score=0.775 total time=   0.4s
[CV 4/5] END .............min_samples_split=0.9;, score=0.775 total time=   0.4s
[CV 5/5] END .............min_samples_split=0.9;, score=0.775 total time=   0.6s

Out[478]:

GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, random_state=0),
             param_grid={'min_samples_split': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6,
                                               0.7, 0.8, 0.9]},
             verbose=3)

In [479]:

gs.best_estimator_

Out[479]:

DecisionTreeClassifier(max_depth=65, min_samples_split=0.1, random_state=0)

In [480]:

gs.best_params_

Out[480]:

{'min_samples_split': 0.1}

min_samples_leaf¶

This parameter also takes both integer and floating-point value where the default value is 1. The value of this param decides the bare minimum of samples needed at a leaf node. Only split points that leave at least min_samples_leaf training samples in both the left and right branches will be taken into consideration. If int, then take min_samples_leaf as the lowest value. The minimum number of samples for each node is ceil (min_samples_leaf * n_samples), and the fraction min_samples_leaf is a fraction if the data type is float. GridSearchCV is used to select the best value of this parameter, first integer values in the range of 1 to 10 and the then floating-point value in the range of 0.1 to 1 is tested. With GridSearchCV for integer value between 1 to 10, the best parameter value is 4 and for floating-point value the best value is found to be 0.1

In [482]:

    gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
                                 max_depth=65,min_samples_split=0.1, random_state=0),
             param_grid={'min_samples_leaf': list(range(1, 10))},
             verbose=3)

In [483]:

gs.fit(X_train, y_train)

Fitting 5 folds for each of 9 candidates, totalling 45 fits
[CV 1/5] END ................min_samples_leaf=1;, score=0.890 total time=   1.3s
[CV 2/5] END ................min_samples_leaf=1;, score=0.883 total time=   1.2s
[CV 3/5] END ................min_samples_leaf=1;, score=0.887 total time=   1.1s
[CV 4/5] END ................min_samples_leaf=1;, score=0.894 total time=   1.1s
[CV 5/5] END ................min_samples_leaf=1;, score=0.892 total time=   1.1s
[CV 1/5] END ................min_samples_leaf=2;, score=0.888 total time=   1.0s
[CV 2/5] END ................min_samples_leaf=2;, score=0.884 total time=   1.0s
[CV 3/5] END ................min_samples_leaf=2;, score=0.883 total time=   1.0s
[CV 4/5] END ................min_samples_leaf=2;, score=0.894 total time=   1.0s
[CV 5/5] END ................min_samples_leaf=2;, score=0.891 total time=   1.0s
[CV 1/5] END ................min_samples_leaf=3;, score=0.890 total time=   1.1s
[CV 2/5] END ................min_samples_leaf=3;, score=0.884 total time=   1.0s
[CV 3/5] END ................min_samples_leaf=3;, score=0.888 total time=   1.4s
[CV 4/5] END ................min_samples_leaf=3;, score=0.897 total time=   1.0s
[CV 5/5] END ................min_samples_leaf=3;, score=0.892 total time=   1.0s
[CV 1/5] END ................min_samples_leaf=4;, score=0.890 total time=   1.0s
[CV 2/5] END ................min_samples_leaf=4;, score=0.884 total time=   0.9s
[CV 3/5] END ................min_samples_leaf=4;, score=0.889 total time=   1.0s
[CV 4/5] END ................min_samples_leaf=4;, score=0.897 total time=   0.9s
[CV 5/5] END ................min_samples_leaf=4;, score=0.892 total time=   1.0s
[CV 1/5] END ................min_samples_leaf=5;, score=0.890 total time=   1.1s
[CV 2/5] END ................min_samples_leaf=5;, score=0.884 total time=   0.9s
[CV 3/5] END ................min_samples_leaf=5;, score=0.888 total time=   0.9s
[CV 4/5] END ................min_samples_leaf=5;, score=0.896 total time=   1.0s
[CV 5/5] END ................min_samples_leaf=5;, score=0.892 total time=   1.0s
[CV 1/5] END ................min_samples_leaf=6;, score=0.889 total time=   1.0s
[CV 2/5] END ................min_samples_leaf=6;, score=0.884 total time=   1.2s
[CV 3/5] END ................min_samples_leaf=6;, score=0.887 total time=   1.0s
[CV 4/5] END ................min_samples_leaf=6;, score=0.896 total time=   0.9s
[CV 5/5] END ................min_samples_leaf=6;, score=0.892 total time=   0.9s
[CV 1/5] END ................min_samples_leaf=7;, score=0.888 total time=   0.9s
[CV 2/5] END ................min_samples_leaf=7;, score=0.883 total time=   0.8s
[CV 3/5] END ................min_samples_leaf=7;, score=0.887 total time=   0.9s
[CV 4/5] END ................min_samples_leaf=7;, score=0.896 total time=   0.9s
[CV 5/5] END ................min_samples_leaf=7;, score=0.892 total time=   0.9s
[CV 1/5] END ................min_samples_leaf=8;, score=0.888 total time=   0.9s
[CV 2/5] END ................min_samples_leaf=8;, score=0.884 total time=   0.8s
[CV 3/5] END ................min_samples_leaf=8;, score=0.886 total time=   0.9s
[CV 4/5] END ................min_samples_leaf=8;, score=0.894 total time=   0.8s
[CV 5/5] END ................min_samples_leaf=8;, score=0.891 total time=   0.9s
[CV 1/5] END ................min_samples_leaf=9;, score=0.888 total time=   0.8s
[CV 2/5] END ................min_samples_leaf=9;, score=0.882 total time=   0.8s
[CV 3/5] END ................min_samples_leaf=9;, score=0.885 total time=   1.1s
[CV 4/5] END ................min_samples_leaf=9;, score=0.894 total time=   0.8s
[CV 5/5] END ................min_samples_leaf=9;, score=0.891 total time=   0.8s

Out[483]:

GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65,
                                              min_samples_split=0.1,
                                              random_state=0),
             param_grid={'min_samples_leaf': [1, 2, 3, 4, 5, 6, 7, 8, 9]},
             verbose=3)

In [484]:

gs.best_params_

Out[484]:

{'min_samples_leaf': 4}

In [485]:

gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
                             max_depth=65,min_samples_split=0.1, random_state=0),
             param_grid={'min_samples_leaf': [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]},
             verbose=3)

In [486]:

gs.fit(X_train, y_train)

Fitting 5 folds for each of 9 candidates, totalling 45 fits
[CV 1/5] END ..............min_samples_leaf=0.1;, score=0.775 total time=   0.5s
[CV 2/5] END ..............min_samples_leaf=0.1;, score=0.775 total time=   0.4s
[CV 3/5] END ..............min_samples_leaf=0.1;, score=0.775 total time=   0.4s
[CV 4/5] END ..............min_samples_leaf=0.1;, score=0.775 total time=   0.4s
[CV 5/5] END ..............min_samples_leaf=0.1;, score=0.775 total time=   0.4s
[CV 1/5] END ..............min_samples_leaf=0.2;, score=0.775 total time=   0.4s
[CV 2/5] END ..............min_samples_leaf=0.2;, score=0.775 total time=   0.4s
[CV 3/5] END ..............min_samples_leaf=0.2;, score=0.775 total time=   0.4s
[CV 4/5] END ..............min_samples_leaf=0.2;, score=0.775 total time=   0.4s
[CV 5/5] END ..............min_samples_leaf=0.2;, score=0.775 total time=   0.4s
[CV 1/5] END ..............min_samples_leaf=0.3;, score=0.775 total time=   0.3s
[CV 2/5] END ..............min_samples_leaf=0.3;, score=0.775 total time=   0.4s
[CV 3/5] END ..............min_samples_leaf=0.3;, score=0.775 total time=   0.4s
[CV 4/5] END ..............min_samples_leaf=0.3;, score=0.775 total time=   0.4s
[CV 5/5] END ..............min_samples_leaf=0.3;, score=0.775 total time=   0.3s
[CV 1/5] END ..............min_samples_leaf=0.4;, score=0.775 total time=   0.3s
[CV 2/5] END ..............min_samples_leaf=0.4;, score=0.775 total time=   0.4s
[CV 3/5] END ..............min_samples_leaf=0.4;, score=0.775 total time=   0.3s
[CV 4/5] END ..............min_samples_leaf=0.4;, score=0.775 total time=   0.3s
[CV 5/5] END ..............min_samples_leaf=0.4;, score=0.775 total time=   0.3s
[CV 1/5] END ..............min_samples_leaf=0.5;, score=0.775 total time=   0.4s
[CV 2/5] END ..............min_samples_leaf=0.5;, score=0.775 total time=   0.3s
[CV 3/5] END ..............min_samples_leaf=0.5;, score=0.775 total time=   0.3s
[CV 4/5] END ..............min_samples_leaf=0.5;, score=0.775 total time=   0.0s
[CV 5/5] END ..............min_samples_leaf=0.5;, score=0.775 total time=   0.0s
[CV 1/5] END ..............min_samples_leaf=0.6;, score=0.775 total time=   0.0s
[CV 2/5] END ..............min_samples_leaf=0.6;, score=0.775 total time=   0.0s
[CV 3/5] END ..............min_samples_leaf=0.6;, score=0.775 total time=   0.0s
[CV 4/5] END ..............min_samples_leaf=0.6;, score=0.775 total time=   0.0s
[CV 5/5] END ..............min_samples_leaf=0.6;, score=0.775 total time=   0.0s
[CV 1/5] END ..............min_samples_leaf=0.7;, score=0.775 total time=   0.0s
[CV 2/5] END ..............min_samples_leaf=0.7;, score=0.775 total time=   0.0s
[CV 3/5] END ..............min_samples_leaf=0.7;, score=0.775 total time=   0.0s
[CV 4/5] END ..............min_samples_leaf=0.7;, score=0.775 total time=   0.0s
[CV 5/5] END ..............min_samples_leaf=0.7;, score=0.775 total time=   0.0s
[CV 1/5] END ..............min_samples_leaf=0.8;, score=0.775 total time=   0.0s
[CV 2/5] END ..............min_samples_leaf=0.8;, score=0.775 total time=   0.0s
[CV 3/5] END ..............min_samples_leaf=0.8;, score=0.775 total time=   0.0s
[CV 4/5] END ..............min_samples_leaf=0.8;, score=0.775 total time=   0.0s
[CV 5/5] END ..............min_samples_leaf=0.8;, score=0.775 total time=   0.0s
[CV 1/5] END ..............min_samples_leaf=0.9;, score=0.775 total time=   0.0s
[CV 2/5] END ..............min_samples_leaf=0.9;, score=0.775 total time=   0.0s
[CV 3/5] END ..............min_samples_leaf=0.9;, score=0.775 total time=   0.0s
[CV 4/5] END ..............min_samples_leaf=0.9;, score=0.775 total time=   0.0s
[CV 5/5] END ..............min_samples_leaf=0.9;, score=0.775 total time=   0.0s

Out[486]:

GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65,
                                              min_samples_split=0.1,
                                              random_state=0),
             param_grid={'min_samples_leaf': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7,
                                              0.8, 0.9]},
             verbose=3)

In [487]:

gs.best_params_

Out[487]:

{'min_samples_leaf': 0.1}

max_features¶

This parameter values can be int, float or "auto," "sqrt," and "log2", where the default value is None. For integer value the number of features to take into account when choosing the best split is max_features while splitting.

In the event that floating point value is provided, max_features are a fraction and max (1, int (max_features * n_features_in_)) features are taken into account at each split. If "auto," max_features are equal to sqrt(n_features). Max_features = sqrt(n_features) if "sqrt" is true. Max_features = log2(n_features) if "log2". If None, max_features=n_features. Manual search is used for testing the values None, "auto," "sqrt," and "log2". GridSearchCV is used to select the best value of this parameter among integer values in the range of 1 to 10 and for floating-point value in the range of 0.1 to 1 is tested. With GridSearchCV for integer value between 1 to 10, the best parameter value is 9 and for floating-point value the best value is found to be 0.9.

In [489]:

gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
                             max_depth=65, min_samples_split=0.1, 
                             min_samples_leaf=4,random_state=0),
             param_grid={'max_features': list(range(1, 10))},
             verbose=3)

In [490]:

gs.fit(X_train, y_train)

Fitting 5 folds for each of 9 candidates, totalling 45 fits
[CV 1/5] END ....................max_features=1;, score=0.775 total time=   0.0s
[CV 2/5] END ....................max_features=1;, score=0.775 total time=   0.0s
[CV 3/5] END ....................max_features=1;, score=0.775 total time=   0.0s
[CV 4/5] END ....................max_features=1;, score=0.775 total time=   0.0s
[CV 5/5] END ....................max_features=1;, score=0.775 total time=   0.0s
[CV 1/5] END ....................max_features=2;, score=0.775 total time=   0.0s
[CV 2/5] END ....................max_features=2;, score=0.775 total time=   0.0s
[CV 3/5] END ....................max_features=2;, score=0.775 total time=   0.0s
[CV 4/5] END ....................max_features=2;, score=0.775 total time=   0.0s
[CV 5/5] END ....................max_features=2;, score=0.775 total time=   0.0s
[CV 1/5] END ....................max_features=3;, score=0.775 total time=   0.0s
[CV 2/5] END ....................max_features=3;, score=0.775 total time=   0.0s
[CV 3/5] END ....................max_features=3;, score=0.775 total time=   0.0s
[CV 4/5] END ....................max_features=3;, score=0.775 total time=   0.0s
[CV 5/5] END ....................max_features=3;, score=0.775 total time=   0.0s
[CV 1/5] END ....................max_features=4;, score=0.775 total time=   0.0s
[CV 2/5] END ....................max_features=4;, score=0.775 total time=   0.0s
[CV 3/5] END ....................max_features=4;, score=0.775 total time=   0.0s
[CV 4/5] END ....................max_features=4;, score=0.775 total time=   0.0s
[CV 5/5] END ....................max_features=4;, score=0.775 total time=   0.0s
[CV 1/5] END ....................max_features=5;, score=0.775 total time=   0.0s
[CV 2/5] END ....................max_features=5;, score=0.775 total time=   0.0s
[CV 3/5] END ....................max_features=5;, score=0.775 total time=   0.0s
[CV 4/5] END ....................max_features=5;, score=0.775 total time=   0.0s
[CV 5/5] END ....................max_features=5;, score=0.775 total time=   0.0s
[CV 1/5] END ....................max_features=6;, score=0.775 total time=   0.0s
[CV 2/5] END ....................max_features=6;, score=0.775 total time=   0.0s
[CV 3/5] END ....................max_features=6;, score=0.775 total time=   0.0s
[CV 4/5] END ....................max_features=6;, score=0.775 total time=   0.0s
[CV 5/5] END ....................max_features=6;, score=0.775 total time=   0.0s
[CV 1/5] END ....................max_features=7;, score=0.775 total time=   0.0s
[CV 2/5] END ....................max_features=7;, score=0.775 total time=   0.0s
[CV 3/5] END ....................max_features=7;, score=0.775 total time=   0.0s
[CV 4/5] END ....................max_features=7;, score=0.775 total time=   0.0s
[CV 5/5] END ....................max_features=7;, score=0.775 total time=   0.0s
[CV 1/5] END ....................max_features=8;, score=0.775 total time=   0.0s
[CV 2/5] END ....................max_features=8;, score=0.775 total time=   0.0s
[CV 3/5] END ....................max_features=8;, score=0.775 total time=   0.0s
[CV 4/5] END ....................max_features=8;, score=0.775 total time=   0.0s
[CV 5/5] END ....................max_features=8;, score=0.775 total time=   0.0s
[CV 1/5] END ....................max_features=9;, score=0.775 total time=   0.0s
[CV 2/5] END ....................max_features=9;, score=0.775 total time=   0.0s
[CV 3/5] END ....................max_features=9;, score=0.775 total time=   0.0s
[CV 4/5] END ....................max_features=9;, score=0.775 total time=   0.0s
[CV 5/5] END ....................max_features=9;, score=0.775 total time=   0.0s

Out[490]:

GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, min_samples_leaf=4,
                                              min_samples_split=0.1,
                                              random_state=0),
             param_grid={'max_features': [1, 2, 3, 4, 5, 6, 7, 8, 9]},
             verbose=3)

In [491]:

gs.best_params_

Out[491]:

{'max_features': 9}

In [492]:

gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
                             max_depth=65, min_samples_split=0.1, 
                             min_samples_leaf=4,random_state=0),
             param_grid={'max_features': [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]},
             verbose=3)

In [493]:

gs.fit(X_train, y_train)

Fitting 5 folds for each of 9 candidates, totalling 45 fits
[CV 1/5] END ..................max_features=0.1;, score=0.879 total time=   0.6s
[CV 2/5] END ..................max_features=0.1;, score=0.871 total time=   0.8s
[CV 3/5] END ..................max_features=0.1;, score=0.855 total time=   0.8s
[CV 4/5] END ..................max_features=0.1;, score=0.864 total time=   0.6s
[CV 5/5] END ..................max_features=0.1;, score=0.830 total time=   0.5s
[CV 1/5] END ..................max_features=0.2;, score=0.839 total time=   0.8s
[CV 2/5] END ..................max_features=0.2;, score=0.854 total time=   1.0s
[CV 3/5] END ..................max_features=0.2;, score=0.868 total time=   0.9s
[CV 4/5] END ..................max_features=0.2;, score=0.841 total time=   0.9s
[CV 5/5] END ..................max_features=0.2;, score=0.839 total time=   0.8s
[CV 1/5] END ..................max_features=0.3;, score=0.881 total time=   1.2s
[CV 2/5] END ..................max_features=0.3;, score=0.849 total time=   0.9s
[CV 3/5] END ..................max_features=0.3;, score=0.851 total time=   1.1s
[CV 4/5] END ..................max_features=0.3;, score=0.869 total time=   1.3s
[CV 5/5] END ..................max_features=0.3;, score=0.879 total time=   1.1s
[CV 1/5] END ..................max_features=0.4;, score=0.882 total time=   1.3s
[CV 2/5] END ..................max_features=0.4;, score=0.875 total time=   1.6s
[CV 3/5] END ..................max_features=0.4;, score=0.887 total time=   1.3s
[CV 4/5] END ..................max_features=0.4;, score=0.881 total time=   1.1s
[CV 5/5] END ..................max_features=0.4;, score=0.876 total time=   1.1s
[CV 1/5] END ..................max_features=0.5;, score=0.883 total time=   1.6s
[CV 2/5] END ..................max_features=0.5;, score=0.872 total time=   1.3s
[CV 3/5] END ..................max_features=0.5;, score=0.877 total time=   1.4s
[CV 4/5] END ..................max_features=0.5;, score=0.896 total time=   1.2s
[CV 5/5] END ..................max_features=0.5;, score=0.882 total time=   1.3s
[CV 1/5] END ..................max_features=0.6;, score=0.882 total time=   1.5s
[CV 2/5] END ..................max_features=0.6;, score=0.885 total time=   1.4s
[CV 3/5] END ..................max_features=0.6;, score=0.888 total time=   1.6s
[CV 4/5] END ..................max_features=0.6;, score=0.890 total time=   1.5s
[CV 5/5] END ..................max_features=0.6;, score=0.884 total time=   1.4s
[CV 1/5] END ..................max_features=0.7;, score=0.889 total time=   1.5s
[CV 2/5] END ..................max_features=0.7;, score=0.884 total time=   1.4s
[CV 3/5] END ..................max_features=0.7;, score=0.878 total time=   1.5s
[CV 4/5] END ..................max_features=0.7;, score=0.897 total time=   1.4s
[CV 5/5] END ..................max_features=0.7;, score=0.892 total time=   1.5s
[CV 1/5] END ..................max_features=0.8;, score=0.890 total time=   1.6s
[CV 2/5] END ..................max_features=0.8;, score=0.884 total time=   1.6s
[CV 3/5] END ..................max_features=0.8;, score=0.879 total time=   2.0s
[CV 4/5] END ..................max_features=0.8;, score=0.880 total time=   1.7s
[CV 5/5] END ..................max_features=0.8;, score=0.882 total time=   1.6s
[CV 1/5] END ..................max_features=0.9;, score=0.890 total time=   1.6s
[CV 2/5] END ..................max_features=0.9;, score=0.884 total time=   1.6s
[CV 3/5] END ..................max_features=0.9;, score=0.889 total time=   1.6s
[CV 4/5] END ..................max_features=0.9;, score=0.897 total time=   1.6s
[CV 5/5] END ..................max_features=0.9;, score=0.892 total time=   1.5s

Out[493]:

GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, min_samples_leaf=4,
                                              min_samples_split=0.1,
                                              random_state=0),
             param_grid={'max_features': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7,
                                          0.8, 0.9]},
             verbose=3)

In [494]:

gs.best_params_

Out[494]:

{'max_features': 0.9}

max_leaf_nodes¶

This parameter takes integer values with None as its default value. With max_leaf_nodes, create a tree using the best-first method. Relative impurity reduction is the definition of best nodes. The number of leaf nodes is limitless if None. GridSearchCV is used to find the best parameter value from the range of 1 to 100. And the best results were obtained using 63 as max_leaf_nodes.

In [496]:

gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
                             max_depth=65, min_samples_split=0.1, 
                             min_samples_leaf=4,max_features=None,random_state=0),
             param_grid={'max_leaf_nodes': list(range(1, 100))},
             verbose=3)

In [497]:

gs.fit(X_train, y_train)

Fitting 5 folds for each of 99 candidates, totalling 495 fits
[CV 1/5] END ....................max_leaf_nodes=1;, score=nan total time=   0.0s
[CV 2/5] END ....................max_leaf_nodes=1;, score=nan total time=   0.0s
[CV 3/5] END ....................max_leaf_nodes=1;, score=nan total time=   0.0s
[CV 4/5] END ....................max_leaf_nodes=1;, score=nan total time=   0.0s
[CV 5/5] END ....................max_leaf_nodes=1;, score=nan total time=   0.0s
[CV 1/5] END ..................max_leaf_nodes=2;, score=0.775 total time=   0.8s
[CV 2/5] END ..................max_leaf_nodes=2;, score=0.775 total time=   0.8s
[CV 3/5] END ..................max_leaf_nodes=2;, score=0.775 total time=   0.9s
[CV 4/5] END ..................max_leaf_nodes=2;, score=0.775 total time=   1.0s
[CV 5/5] END ..................max_leaf_nodes=2;, score=0.775 total time=   0.8s
[CV 1/5] END ..................max_leaf_nodes=3;, score=0.775 total time=   1.0s
[CV 2/5] END ..................max_leaf_nodes=3;, score=0.775 total time=   1.0s
[CV 3/5] END ..................max_leaf_nodes=3;, score=0.775 total time=   1.0s
[CV 4/5] END ..................max_leaf_nodes=3;, score=0.775 total time=   1.0s
[CV 5/5] END ..................max_leaf_nodes=3;, score=0.775 total time=   1.0s
[CV 1/5] END ..................max_leaf_nodes=4;, score=0.775 total time=   1.3s
[CV 2/5] END ..................max_leaf_nodes=4;, score=0.775 total time=   1.2s
[CV 3/5] END ..................max_leaf_nodes=4;, score=0.775 total time=   1.3s
[CV 4/5] END ..................max_leaf_nodes=4;, score=0.775 total time=   1.3s
[CV 5/5] END ..................max_leaf_nodes=4;, score=0.775 total time=   1.2s
[CV 1/5] END ..................max_leaf_nodes=5;, score=0.775 total time=   1.7s
[CV 2/5] END ..................max_leaf_nodes=5;, score=0.775 total time=   1.6s
[CV 3/5] END ..................max_leaf_nodes=5;, score=0.775 total time=   1.4s
[CV 4/5] END ..................max_leaf_nodes=5;, score=0.775 total time=   1.5s
[CV 5/5] END ..................max_leaf_nodes=5;, score=0.775 total time=   1.4s
[CV 1/5] END ..................max_leaf_nodes=6;, score=0.775 total time=   1.7s
[CV 2/5] END ..................max_leaf_nodes=6;, score=0.775 total time=   1.6s
[CV 3/5] END ..................max_leaf_nodes=6;, score=0.775 total time=   1.6s
[CV 4/5] END ..................max_leaf_nodes=6;, score=0.775 total time=   1.6s
[CV 5/5] END ..................max_leaf_nodes=6;, score=0.775 total time=   1.6s
[CV 1/5] END ..................max_leaf_nodes=7;, score=0.778 total time=   2.2s
[CV 2/5] END ..................max_leaf_nodes=7;, score=0.773 total time=   2.0s
[CV 3/5] END ..................max_leaf_nodes=7;, score=0.775 total time=   2.3s
[CV 4/5] END ..................max_leaf_nodes=7;, score=0.773 total time=   1.9s
[CV 5/5] END ..................max_leaf_nodes=7;, score=0.774 total time=   1.7s
[CV 1/5] END ..................max_leaf_nodes=8;, score=0.790 total time=   1.9s
[CV 2/5] END ..................max_leaf_nodes=8;, score=0.786 total time=   1.9s
[CV 3/5] END ..................max_leaf_nodes=8;, score=0.798 total time=   2.0s
[CV 4/5] END ..................max_leaf_nodes=8;, score=0.789 total time=   2.1s
[CV 5/5] END ..................max_leaf_nodes=8;, score=0.790 total time=   2.2s
[CV 1/5] END ..................max_leaf_nodes=9;, score=0.800 total time=   2.2s
[CV 2/5] END ..................max_leaf_nodes=9;, score=0.796 total time=   2.0s
[CV 3/5] END ..................max_leaf_nodes=9;, score=0.808 total time=   2.0s
[CV 4/5] END ..................max_leaf_nodes=9;, score=0.800 total time=   2.2s
[CV 5/5] END ..................max_leaf_nodes=9;, score=0.794 total time=   2.0s
[CV 1/5] END .................max_leaf_nodes=10;, score=0.809 total time=   2.6s
[CV 2/5] END .................max_leaf_nodes=10;, score=0.807 total time=   2.2s
[CV 3/5] END .................max_leaf_nodes=10;, score=0.821 total time=   2.2s
[CV 4/5] END .................max_leaf_nodes=10;, score=0.812 total time=   2.2s
[CV 5/5] END .................max_leaf_nodes=10;, score=0.805 total time=   2.2s
[CV 1/5] END .................max_leaf_nodes=11;, score=0.818 total time=   2.3s
[CV 2/5] END .................max_leaf_nodes=11;, score=0.817 total time=   2.6s
[CV 3/5] END .................max_leaf_nodes=11;, score=0.829 total time=   2.4s
[CV 4/5] END .................max_leaf_nodes=11;, score=0.822 total time=   2.4s
[CV 5/5] END .................max_leaf_nodes=11;, score=0.816 total time=   2.3s
[CV 1/5] END .................max_leaf_nodes=12;, score=0.823 total time=   2.5s
[CV 2/5] END .................max_leaf_nodes=12;, score=0.823 total time=   2.6s
[CV 3/5] END .................max_leaf_nodes=12;, score=0.835 total time=   2.8s
[CV 4/5] END .................max_leaf_nodes=12;, score=0.830 total time=   3.2s
[CV 5/5] END .................max_leaf_nodes=12;, score=0.822 total time=   2.6s
[CV 1/5] END .................max_leaf_nodes=13;, score=0.831 total time=   2.7s
[CV 2/5] END .................max_leaf_nodes=13;, score=0.830 total time=   2.7s
[CV 3/5] END .................max_leaf_nodes=13;, score=0.842 total time=   2.8s
[CV 4/5] END .................max_leaf_nodes=13;, score=0.839 total time=   3.3s
[CV 5/5] END .................max_leaf_nodes=13;, score=0.828 total time=   2.7s
[CV 1/5] END .................max_leaf_nodes=14;, score=0.838 total time=   2.8s
[CV 2/5] END .................max_leaf_nodes=14;, score=0.837 total time=   2.8s
[CV 3/5] END .................max_leaf_nodes=14;, score=0.847 total time=   2.8s
[CV 4/5] END .................max_leaf_nodes=14;, score=0.846 total time=   3.2s
[CV 5/5] END .................max_leaf_nodes=14;, score=0.840 total time=   2.9s
[CV 1/5] END .................max_leaf_nodes=15;, score=0.842 total time=   3.0s
[CV 2/5] END .................max_leaf_nodes=15;, score=0.840 total time=   2.9s
[CV 3/5] END .................max_leaf_nodes=15;, score=0.852 total time=   3.0s
[CV 4/5] END .................max_leaf_nodes=15;, score=0.852 total time=   3.1s
[CV 5/5] END .................max_leaf_nodes=15;, score=0.845 total time=   3.3s
[CV 1/5] END .................max_leaf_nodes=16;, score=0.847 total time=   3.1s
[CV 2/5] END .................max_leaf_nodes=16;, score=0.847 total time=   3.0s
[CV 3/5] END .................max_leaf_nodes=16;, score=0.856 total time=   3.1s
[CV 4/5] END .................max_leaf_nodes=16;, score=0.855 total time=   3.1s
[CV 5/5] END .................max_leaf_nodes=16;, score=0.850 total time=   3.1s
[CV 1/5] END .................max_leaf_nodes=17;, score=0.848 total time=   3.1s
[CV 2/5] END .................max_leaf_nodes=17;, score=0.851 total time=   3.0s
[CV 3/5] END .................max_leaf_nodes=17;, score=0.861 total time=   3.7s
[CV 4/5] END .................max_leaf_nodes=17;, score=0.860 total time=   3.7s
[CV 5/5] END .................max_leaf_nodes=17;, score=0.854 total time=   3.1s
[CV 1/5] END .................max_leaf_nodes=18;, score=0.854 total time=   3.2s
[CV 2/5] END .................max_leaf_nodes=18;, score=0.852 total time=   3.4s
[CV 3/5] END .................max_leaf_nodes=18;, score=0.862 total time=   3.2s
[CV 4/5] END .................max_leaf_nodes=18;, score=0.862 total time=   3.6s
[CV 5/5] END .................max_leaf_nodes=18;, score=0.855 total time=   3.4s
[CV 1/5] END .................max_leaf_nodes=19;, score=0.857 total time=   3.5s
[CV 2/5] END .................max_leaf_nodes=19;, score=0.855 total time=   3.7s
[CV 3/5] END .................max_leaf_nodes=19;, score=0.863 total time=   4.1s
[CV 4/5] END .................max_leaf_nodes=19;, score=0.864 total time=   3.4s
[CV 5/5] END .................max_leaf_nodes=19;, score=0.857 total time=   3.3s
[CV 1/5] END .................max_leaf_nodes=20;, score=0.861 total time=   3.5s
[CV 2/5] END .................max_leaf_nodes=20;, score=0.857 total time=   3.6s
[CV 3/5] END .................max_leaf_nodes=20;, score=0.865 total time=   4.2s
[CV 4/5] END .................max_leaf_nodes=20;, score=0.867 total time=   3.7s
[CV 5/5] END .................max_leaf_nodes=20;, score=0.860 total time=   3.7s
[CV 1/5] END .................max_leaf_nodes=21;, score=0.863 total time=   3.8s
[CV 2/5] END .................max_leaf_nodes=21;, score=0.861 total time=   4.5s
[CV 3/5] END .................max_leaf_nodes=21;, score=0.867 total time=   3.9s
[CV 4/5] END .................max_leaf_nodes=21;, score=0.871 total time=   3.7s
[CV 5/5] END .................max_leaf_nodes=21;, score=0.865 total time=   3.7s
[CV 1/5] END .................max_leaf_nodes=22;, score=0.867 total time=   4.2s
[CV 2/5] END .................max_leaf_nodes=22;, score=0.864 total time=   3.8s
[CV 3/5] END .................max_leaf_nodes=22;, score=0.871 total time=   3.8s
[CV 4/5] END .................max_leaf_nodes=22;, score=0.873 total time=   3.8s
[CV 5/5] END .................max_leaf_nodes=22;, score=0.868 total time=   4.2s
[CV 1/5] END .................max_leaf_nodes=23;, score=0.870 total time=   3.8s
[CV 2/5] END .................max_leaf_nodes=23;, score=0.867 total time=   4.4s
[CV 3/5] END .................max_leaf_nodes=23;, score=0.874 total time=   4.2s
[CV 4/5] END .................max_leaf_nodes=23;, score=0.874 total time=   3.9s
[CV 5/5] END .................max_leaf_nodes=23;, score=0.870 total time=   3.8s
[CV 1/5] END .................max_leaf_nodes=24;, score=0.872 total time=   3.9s
[CV 2/5] END .................max_leaf_nodes=24;, score=0.868 total time=   4.1s
[CV 3/5] END .................max_leaf_nodes=24;, score=0.876 total time=   4.1s
[CV 4/5] END .................max_leaf_nodes=24;, score=0.876 total time=   4.0s
[CV 5/5] END .................max_leaf_nodes=24;, score=0.872 total time=   3.9s
[CV 1/5] END .................max_leaf_nodes=25;, score=0.872 total time=   4.3s
[CV 2/5] END .................max_leaf_nodes=25;, score=0.869 total time=   4.0s
[CV 3/5] END .................max_leaf_nodes=25;, score=0.878 total time=   4.0s
[CV 4/5] END .................max_leaf_nodes=25;, score=0.876 total time=   4.0s
[CV 5/5] END .................max_leaf_nodes=25;, score=0.875 total time=   4.3s
[CV 1/5] END .................max_leaf_nodes=26;, score=0.876 total time=   4.3s
[CV 2/5] END .................max_leaf_nodes=26;, score=0.871 total time=   4.2s
[CV 3/5] END .................max_leaf_nodes=26;, score=0.878 total time=   4.6s
[CV 4/5] END .................max_leaf_nodes=26;, score=0.879 total time=   5.1s
[CV 5/5] END .................max_leaf_nodes=26;, score=0.879 total time=   4.3s
[CV 1/5] END .................max_leaf_nodes=27;, score=0.877 total time=   4.5s
[CV 2/5] END .................max_leaf_nodes=27;, score=0.874 total time=   4.7s
[CV 3/5] END .................max_leaf_nodes=27;, score=0.881 total time=   4.4s
[CV 4/5] END .................max_leaf_nodes=27;, score=0.884 total time=   4.4s
[CV 5/5] END .................max_leaf_nodes=27;, score=0.881 total time=   4.4s
[CV 1/5] END .................max_leaf_nodes=28;, score=0.881 total time=   4.8s
[CV 2/5] END .................max_leaf_nodes=28;, score=0.876 total time=   4.4s
[CV 3/5] END .................max_leaf_nodes=28;, score=0.882 total time=   4.5s
[CV 4/5] END .................max_leaf_nodes=28;, score=0.886 total time=   4.6s
[CV 5/5] END .................max_leaf_nodes=28;, score=0.883 total time=   4.2s
[CV 1/5] END .................max_leaf_nodes=29;, score=0.881 total time=   4.7s
[CV 2/5] END .................max_leaf_nodes=29;, score=0.877 total time=   5.3s
[CV 3/5] END .................max_leaf_nodes=29;, score=0.883 total time=   4.4s
[CV 4/5] END .................max_leaf_nodes=29;, score=0.888 total time=   4.4s
[CV 5/5] END .................max_leaf_nodes=29;, score=0.883 total time=   4.4s
[CV 1/5] END .................max_leaf_nodes=30;, score=0.883 total time=   4.8s
[CV 2/5] END .................max_leaf_nodes=30;, score=0.879 total time=   4.5s
[CV 3/5] END .................max_leaf_nodes=30;, score=0.884 total time=   4.6s
[CV 4/5] END .................max_leaf_nodes=30;, score=0.888 total time=   4.9s
[CV 5/5] END .................max_leaf_nodes=30;, score=0.885 total time=   4.5s
[CV 1/5] END .................max_leaf_nodes=31;, score=0.884 total time=   4.6s
[CV 2/5] END .................max_leaf_nodes=31;, score=0.879 total time=   4.9s
[CV 3/5] END .................max_leaf_nodes=31;, score=0.884 total time=   5.1s
[CV 4/5] END .................max_leaf_nodes=31;, score=0.888 total time=   5.7s
[CV 5/5] END .................max_leaf_nodes=31;, score=0.886 total time=   5.3s
[CV 1/5] END .................max_leaf_nodes=32;, score=0.884 total time=   5.0s
[CV 2/5] END .................max_leaf_nodes=32;, score=0.881 total time=   4.8s
[CV 3/5] END .................max_leaf_nodes=32;, score=0.884 total time=   4.9s
[CV 4/5] END .................max_leaf_nodes=32;, score=0.888 total time=   5.4s
[CV 5/5] END .................max_leaf_nodes=32;, score=0.888 total time=   4.9s
[CV 1/5] END .................max_leaf_nodes=33;, score=0.884 total time=   5.0s
[CV 2/5] END .................max_leaf_nodes=33;, score=0.880 total time=   5.3s
[CV 3/5] END .................max_leaf_nodes=33;, score=0.884 total time=   4.8s
[CV 4/5] END .................max_leaf_nodes=33;, score=0.888 total time=   5.0s
[CV 5/5] END .................max_leaf_nodes=33;, score=0.888 total time=   5.2s
[CV 1/5] END .................max_leaf_nodes=34;, score=0.886 total time=   5.0s
[CV 2/5] END .................max_leaf_nodes=34;, score=0.881 total time=   4.8s
[CV 3/5] END .................max_leaf_nodes=34;, score=0.884 total time=   5.2s
[CV 4/5] END .................max_leaf_nodes=34;, score=0.888 total time=   4.9s
[CV 5/5] END .................max_leaf_nodes=34;, score=0.888 total time=   4.9s
[CV 1/5] END .................max_leaf_nodes=35;, score=0.886 total time=   5.3s
[CV 2/5] END .................max_leaf_nodes=35;, score=0.881 total time=   5.0s
[CV 3/5] END .................max_leaf_nodes=35;, score=0.884 total time=   5.0s
[CV 4/5] END .................max_leaf_nodes=35;, score=0.890 total time=   5.4s
[CV 5/5] END .................max_leaf_nodes=35;, score=0.888 total time=   5.2s
[CV 1/5] END .................max_leaf_nodes=36;, score=0.886 total time=   5.3s
[CV 2/5] END .................max_leaf_nodes=36;, score=0.881 total time=   5.7s
[CV 3/5] END .................max_leaf_nodes=36;, score=0.884 total time=   5.3s
[CV 4/5] END .................max_leaf_nodes=36;, score=0.892 total time=   5.3s
[CV 5/5] END .................max_leaf_nodes=36;, score=0.888 total time=   5.6s
[CV 1/5] END .................max_leaf_nodes=37;, score=0.886 total time=   5.3s
[CV 2/5] END .................max_leaf_nodes=37;, score=0.881 total time=   5.3s
[CV 3/5] END .................max_leaf_nodes=37;, score=0.885 total time=   5.7s
[CV 4/5] END .................max_leaf_nodes=37;, score=0.892 total time=   5.4s
[CV 5/5] END .................max_leaf_nodes=37;, score=0.888 total time=   6.0s
[CV 1/5] END .................max_leaf_nodes=38;, score=0.886 total time=   5.5s
[CV 2/5] END .................max_leaf_nodes=38;, score=0.881 total time=   5.3s
[CV 3/5] END .................max_leaf_nodes=38;, score=0.886 total time=   5.8s
[CV 4/5] END .................max_leaf_nodes=38;, score=0.893 total time=   5.3s
[CV 5/5] END .................max_leaf_nodes=38;, score=0.888 total time=   5.3s
[CV 1/5] END .................max_leaf_nodes=39;, score=0.886 total time=   5.7s
[CV 2/5] END .................max_leaf_nodes=39;, score=0.881 total time=   5.3s
[CV 3/5] END .................max_leaf_nodes=39;, score=0.886 total time=   5.5s
[CV 4/5] END .................max_leaf_nodes=39;, score=0.894 total time=   5.7s
[CV 5/5] END .................max_leaf_nodes=39;, score=0.887 total time=   5.3s
[CV 1/5] END .................max_leaf_nodes=40;, score=0.885 total time=   5.5s
[CV 2/5] END .................max_leaf_nodes=40;, score=0.880 total time=   5.9s
[CV 3/5] END .................max_leaf_nodes=40;, score=0.886 total time=   6.3s
[CV 4/5] END .................max_leaf_nodes=40;, score=0.895 total time=   5.8s
[CV 5/5] END .................max_leaf_nodes=40;, score=0.888 total time=   6.1s
[CV 1/5] END .................max_leaf_nodes=41;, score=0.885 total time=   5.7s
[CV 2/5] END .................max_leaf_nodes=41;, score=0.880 total time=   6.4s
[CV 3/5] END .................max_leaf_nodes=41;, score=0.886 total time=   5.9s
[CV 4/5] END .................max_leaf_nodes=41;, score=0.895 total time=   5.7s
[CV 5/5] END .................max_leaf_nodes=41;, score=0.888 total time=   6.0s
[CV 1/5] END .................max_leaf_nodes=42;, score=0.886 total time=   5.6s
[CV 2/5] END .................max_leaf_nodes=42;, score=0.880 total time=   5.5s
[CV 3/5] END .................max_leaf_nodes=42;, score=0.886 total time=   6.6s
[CV 4/5] END .................max_leaf_nodes=42;, score=0.894 total time=   5.8s
[CV 5/5] END .................max_leaf_nodes=42;, score=0.889 total time=   5.7s
[CV 1/5] END .................max_leaf_nodes=43;, score=0.887 total time=   5.7s
[CV 2/5] END .................max_leaf_nodes=43;, score=0.881 total time=   5.8s
[CV 3/5] END .................max_leaf_nodes=43;, score=0.886 total time=   6.0s
[CV 4/5] END .................max_leaf_nodes=43;, score=0.893 total time=   6.2s
[CV 5/5] END .................max_leaf_nodes=43;, score=0.890 total time=   5.8s
[CV 1/5] END .................max_leaf_nodes=44;, score=0.888 total time=   6.1s
[CV 2/5] END .................max_leaf_nodes=44;, score=0.883 total time=   6.0s
[CV 3/5] END .................max_leaf_nodes=44;, score=0.886 total time=   6.4s
[CV 4/5] END .................max_leaf_nodes=44;, score=0.894 total time=   6.1s
[CV 5/5] END .................max_leaf_nodes=44;, score=0.890 total time=   5.9s
[CV 1/5] END .................max_leaf_nodes=45;, score=0.888 total time=   6.3s
[CV 2/5] END .................max_leaf_nodes=45;, score=0.883 total time=   5.9s
[CV 3/5] END .................max_leaf_nodes=45;, score=0.887 total time=   6.3s
[CV 4/5] END .................max_leaf_nodes=45;, score=0.894 total time=   6.0s
[CV 5/5] END .................max_leaf_nodes=45;, score=0.891 total time=   5.8s
[CV 1/5] END .................max_leaf_nodes=46;, score=0.889 total time=   6.6s
[CV 2/5] END .................max_leaf_nodes=46;, score=0.883 total time=   5.9s
[CV 3/5] END .................max_leaf_nodes=46;, score=0.887 total time=   6.2s
[CV 4/5] END .................max_leaf_nodes=46;, score=0.894 total time=   6.0s
[CV 5/5] END .................max_leaf_nodes=46;, score=0.890 total time=   5.7s
[CV 1/5] END .................max_leaf_nodes=47;, score=0.889 total time=   6.2s
[CV 2/5] END .................max_leaf_nodes=47;, score=0.884 total time=   5.8s
[CV 3/5] END .................max_leaf_nodes=47;, score=0.887 total time=   5.9s
[CV 4/5] END .................max_leaf_nodes=47;, score=0.894 total time=   6.2s
[CV 5/5] END .................max_leaf_nodes=47;, score=0.890 total time=   6.0s
[CV 1/5] END .................max_leaf_nodes=48;, score=0.888 total time=   6.4s
[CV 2/5] END .................max_leaf_nodes=48;, score=0.884 total time=   6.6s
[CV 3/5] END .................max_leaf_nodes=48;, score=0.887 total time=   6.7s
[CV 4/5] END .................max_leaf_nodes=48;, score=0.895 total time=   6.6s
[CV 5/5] END .................max_leaf_nodes=48;, score=0.890 total time=   6.1s
[CV 1/5] END .................max_leaf_nodes=49;, score=0.888 total time=   6.5s
[CV 2/5] END .................max_leaf_nodes=49;, score=0.883 total time=   6.9s
[CV 3/5] END .................max_leaf_nodes=49;, score=0.886 total time=   7.1s
[CV 4/5] END .................max_leaf_nodes=49;, score=0.895 total time=   8.6s
[CV 5/5] END .................max_leaf_nodes=49;, score=0.890 total time=   9.0s
[CV 1/5] END .................max_leaf_nodes=50;, score=0.888 total time=   8.6s
[CV 2/5] END .................max_leaf_nodes=50;, score=0.883 total time=   7.7s
[CV 3/5] END .................max_leaf_nodes=50;, score=0.886 total time=   7.5s
[CV 4/5] END .................max_leaf_nodes=50;, score=0.896 total time=   7.2s
[CV 5/5] END .................max_leaf_nodes=50;, score=0.890 total time=   6.3s
[CV 1/5] END .................max_leaf_nodes=51;, score=0.888 total time=   6.3s
[CV 2/5] END .................max_leaf_nodes=51;, score=0.883 total time=   7.5s
[CV 3/5] END .................max_leaf_nodes=51;, score=0.886 total time=   7.1s
[CV 4/5] END .................max_leaf_nodes=51;, score=0.895 total time=   8.0s
[CV 5/5] END .................max_leaf_nodes=51;, score=0.890 total time=   7.7s
[CV 1/5] END .................max_leaf_nodes=52;, score=0.888 total time=   7.0s
[CV 2/5] END .................max_leaf_nodes=52;, score=0.883 total time=   6.8s
[CV 3/5] END .................max_leaf_nodes=52;, score=0.886 total time=   7.6s
[CV 4/5] END .................max_leaf_nodes=52;, score=0.896 total time=   7.5s
[CV 5/5] END .................max_leaf_nodes=52;, score=0.890 total time=   6.6s
[CV 1/5] END .................max_leaf_nodes=53;, score=0.889 total time=24.5min
[CV 2/5] END .................max_leaf_nodes=53;, score=0.883 total time=  10.6s
[CV 3/5] END .................max_leaf_nodes=53;, score=0.886 total time=   6.8s
[CV 4/5] END .................max_leaf_nodes=53;, score=0.896 total time=   6.6s
[CV 5/5] END .................max_leaf_nodes=53;, score=0.890 total time=   6.5s
[CV 1/5] END .................max_leaf_nodes=54;, score=0.889 total time=   8.1s
[CV 2/5] END .................max_leaf_nodes=54;, score=0.883 total time=   7.0s
[CV 3/5] END .................max_leaf_nodes=54;, score=0.886 total time=   7.3s
[CV 4/5] END .................max_leaf_nodes=54;, score=0.896 total time=   7.3s
[CV 5/5] END .................max_leaf_nodes=54;, score=0.890 total time=   7.6s
[CV 1/5] END .................max_leaf_nodes=55;, score=0.889 total time=   7.2s
[CV 2/5] END .................max_leaf_nodes=55;, score=0.884 total time=   7.2s
[CV 3/5] END .................max_leaf_nodes=55;, score=0.886 total time=   8.4s
[CV 4/5] END .................max_leaf_nodes=55;, score=0.896 total time=   7.9s
[CV 5/5] END .................max_leaf_nodes=55;, score=0.890 total time=   7.0s
[CV 1/5] END .................max_leaf_nodes=56;, score=0.889 total time=   7.1s
[CV 2/5] END .................max_leaf_nodes=56;, score=0.884 total time=   7.0s
[CV 3/5] END .................max_leaf_nodes=56;, score=0.886 total time=   6.9s
[CV 4/5] END .................max_leaf_nodes=56;, score=0.896 total time=   7.1s
[CV 5/5] END .................max_leaf_nodes=56;, score=0.890 total time=   6.8s
[CV 1/5] END .................max_leaf_nodes=57;, score=0.889 total time=   7.6s
[CV 2/5] END .................max_leaf_nodes=57;, score=0.884 total time=   7.5s
[CV 3/5] END .................max_leaf_nodes=57;, score=0.886 total time=   7.6s
[CV 4/5] END .................max_leaf_nodes=57;, score=0.896 total time=   7.1s
[CV 5/5] END .................max_leaf_nodes=57;, score=0.891 total time=   7.3s
[CV 1/5] END .................max_leaf_nodes=58;, score=0.889 total time=   7.1s
[CV 2/5] END .................max_leaf_nodes=58;, score=0.884 total time=   7.1s
[CV 3/5] END .................max_leaf_nodes=58;, score=0.887 total time=   7.2s
[CV 4/5] END .................max_leaf_nodes=58;, score=0.896 total time=   7.3s
[CV 5/5] END .................max_leaf_nodes=58;, score=0.891 total time=   6.9s
[CV 1/5] END .................max_leaf_nodes=59;, score=0.889 total time=   7.0s
[CV 2/5] END .................max_leaf_nodes=59;, score=0.884 total time=   7.4s
[CV 3/5] END .................max_leaf_nodes=59;, score=0.887 total time=   7.6s
[CV 4/5] END .................max_leaf_nodes=59;, score=0.896 total time=   7.4s
[CV 5/5] END .................max_leaf_nodes=59;, score=0.892 total time=   7.2s
[CV 1/5] END .................max_leaf_nodes=60;, score=0.890 total time=   7.4s
[CV 2/5] END .................max_leaf_nodes=60;, score=0.884 total time=   7.3s
[CV 3/5] END .................max_leaf_nodes=60;, score=0.887 total time=   7.2s
[CV 4/5] END .................max_leaf_nodes=60;, score=0.897 total time=   6.9s
[CV 5/5] END .................max_leaf_nodes=60;, score=0.892 total time=   6.9s
[CV 1/5] END .................max_leaf_nodes=61;, score=0.890 total time=   6.9s
[CV 2/5] END .................max_leaf_nodes=61;, score=0.884 total time=   7.1s
[CV 3/5] END .................max_leaf_nodes=61;, score=0.887 total time=   7.1s
[CV 4/5] END .................max_leaf_nodes=61;, score=0.897 total time=   6.9s
[CV 5/5] END .................max_leaf_nodes=61;, score=0.892 total time=   6.9s
[CV 1/5] END .................max_leaf_nodes=62;, score=0.890 total time=   7.1s
[CV 2/5] END .................max_leaf_nodes=62;, score=0.884 total time=   7.5s
[CV 3/5] END .................max_leaf_nodes=62;, score=0.888 total time=   7.4s
[CV 4/5] END .................max_leaf_nodes=62;, score=0.897 total time=   7.2s
[CV 5/5] END .................max_leaf_nodes=62;, score=0.892 total time=   6.9s
[CV 1/5] END .................max_leaf_nodes=63;, score=0.890 total time=   7.2s
[CV 2/5] END .................max_leaf_nodes=63;, score=0.884 total time=   7.3s
[CV 3/5] END .................max_leaf_nodes=63;, score=0.888 total time=   7.3s
[CV 4/5] END .................max_leaf_nodes=63;, score=0.897 total time=   7.3s
[CV 5/5] END .................max_leaf_nodes=63;, score=0.892 total time=   6.9s
[CV 1/5] END .................max_leaf_nodes=64;, score=0.890 total time=   7.2s
[CV 2/5] END .................max_leaf_nodes=64;, score=0.884 total time=   7.4s
[CV 3/5] END .................max_leaf_nodes=64;, score=0.888 total time=   7.3s
[CV 4/5] END .................max_leaf_nodes=64;, score=0.897 total time=   7.2s
[CV 5/5] END .................max_leaf_nodes=64;, score=0.892 total time=   6.9s
[CV 1/5] END .................max_leaf_nodes=65;, score=0.890 total time=   7.1s
[CV 2/5] END .................max_leaf_nodes=65;, score=0.884 total time=   7.6s
[CV 3/5] END .................max_leaf_nodes=65;, score=0.888 total time=   7.1s
[CV 4/5] END .................max_leaf_nodes=65;, score=0.897 total time=   7.2s
[CV 5/5] END .................max_leaf_nodes=65;, score=0.892 total time=   7.1s
[CV 1/5] END .................max_leaf_nodes=66;, score=0.890 total time=   7.1s
[CV 2/5] END .................max_leaf_nodes=66;, score=0.884 total time=   7.8s
[CV 3/5] END .................max_leaf_nodes=66;, score=0.888 total time=   7.1s
[CV 4/5] END .................max_leaf_nodes=66;, score=0.897 total time=   7.4s
[CV 5/5] END .................max_leaf_nodes=66;, score=0.892 total time=   7.1s
[CV 1/5] END .................max_leaf_nodes=67;, score=0.890 total time=   7.2s
[CV 2/5] END .................max_leaf_nodes=67;, score=0.884 total time=   7.8s
[CV 3/5] END .................max_leaf_nodes=67;, score=0.889 total time=   7.3s
[CV 4/5] END .................max_leaf_nodes=67;, score=0.897 total time=   8.0s
[CV 5/5] END .................max_leaf_nodes=67;, score=0.892 total time=   7.4s
[CV 1/5] END .................max_leaf_nodes=68;, score=0.890 total time=   7.3s
[CV 2/5] END .................max_leaf_nodes=68;, score=0.884 total time=   8.0s
[CV 3/5] END .................max_leaf_nodes=68;, score=0.889 total time=   7.5s
[CV 4/5] END .................max_leaf_nodes=68;, score=0.897 total time=   7.7s
[CV 5/5] END .................max_leaf_nodes=68;, score=0.892 total time=   7.4s
[CV 1/5] END .................max_leaf_nodes=69;, score=0.890 total time=   7.4s
[CV 2/5] END .................max_leaf_nodes=69;, score=0.884 total time=   8.1s
[CV 3/5] END .................max_leaf_nodes=69;, score=0.889 total time=   7.6s
[CV 4/5] END .................max_leaf_nodes=69;, score=0.897 total time=   7.8s
[CV 5/5] END .................max_leaf_nodes=69;, score=0.892 total time=   7.5s
[CV 1/5] END .................max_leaf_nodes=70;, score=0.890 total time=   7.6s
[CV 2/5] END .................max_leaf_nodes=70;, score=0.884 total time=   8.6s
[CV 3/5] END .................max_leaf_nodes=70;, score=0.889 total time=   8.5s
[CV 4/5] END .................max_leaf_nodes=70;, score=0.897 total time=   8.2s
[CV 5/5] END .................max_leaf_nodes=70;, score=0.892 total time=   7.8s
[CV 1/5] END .................max_leaf_nodes=71;, score=0.890 total time=   7.7s
[CV 2/5] END .................max_leaf_nodes=71;, score=0.884 total time=   8.6s
[CV 3/5] END .................max_leaf_nodes=71;, score=0.889 total time=   7.9s
[CV 4/5] END .................max_leaf_nodes=71;, score=0.897 total time=   8.1s
[CV 5/5] END .................max_leaf_nodes=71;, score=0.892 total time=   7.9s
[CV 1/5] END .................max_leaf_nodes=72;, score=0.890 total time=   7.9s
[CV 2/5] END .................max_leaf_nodes=72;, score=0.884 total time=   8.5s
[CV 3/5] END .................max_leaf_nodes=72;, score=0.889 total time=   7.9s
[CV 4/5] END .................max_leaf_nodes=72;, score=0.897 total time=   8.2s
[CV 5/5] END .................max_leaf_nodes=72;, score=0.892 total time=   7.8s
[CV 1/5] END .................max_leaf_nodes=73;, score=0.890 total time=   8.0s
[CV 2/5] END .................max_leaf_nodes=73;, score=0.884 total time=   8.6s
[CV 3/5] END .................max_leaf_nodes=73;, score=0.889 total time=   8.1s
[CV 4/5] END .................max_leaf_nodes=73;, score=0.897 total time=   8.5s
[CV 5/5] END .................max_leaf_nodes=73;, score=0.892 total time=   9.0s
[CV 1/5] END .................max_leaf_nodes=74;, score=0.890 total time=   9.0s
[CV 2/5] END .................max_leaf_nodes=74;, score=0.884 total time=   9.3s
[CV 3/5] END .................max_leaf_nodes=74;, score=0.889 total time=   8.9s
[CV 4/5] END .................max_leaf_nodes=74;, score=0.897 total time=   9.2s
[CV 5/5] END .................max_leaf_nodes=74;, score=0.892 total time=   8.8s
[CV 1/5] END .................max_leaf_nodes=75;, score=0.890 total time=   8.8s
[CV 2/5] END .................max_leaf_nodes=75;, score=0.884 total time=   9.4s
[CV 3/5] END .................max_leaf_nodes=75;, score=0.889 total time=   9.0s
[CV 4/5] END .................max_leaf_nodes=75;, score=0.897 total time=   9.1s
[CV 5/5] END .................max_leaf_nodes=75;, score=0.892 total time=   8.6s
[CV 1/5] END .................max_leaf_nodes=76;, score=0.890 total time=   8.7s
[CV 2/5] END .................max_leaf_nodes=76;, score=0.884 total time=   9.4s
[CV 3/5] END .................max_leaf_nodes=76;, score=0.889 total time=   9.0s
[CV 4/5] END .................max_leaf_nodes=76;, score=0.897 total time=   9.4s
[CV 5/5] END .................max_leaf_nodes=76;, score=0.892 total time=   8.4s
[CV 1/5] END .................max_leaf_nodes=77;, score=0.890 total time=   8.5s
[CV 2/5] END .................max_leaf_nodes=77;, score=0.884 total time=   9.0s
[CV 3/5] END .................max_leaf_nodes=77;, score=0.889 total time=   8.7s
[CV 4/5] END .................max_leaf_nodes=77;, score=0.897 total time=   8.8s
[CV 5/5] END .................max_leaf_nodes=77;, score=0.892 total time=   8.6s
[CV 1/5] END .................max_leaf_nodes=78;, score=0.890 total time=   8.7s
[CV 2/5] END .................max_leaf_nodes=78;, score=0.884 total time=   9.6s
[CV 3/5] END .................max_leaf_nodes=78;, score=0.889 total time=   8.6s
[CV 4/5] END .................max_leaf_nodes=78;, score=0.897 total time=   9.3s
[CV 5/5] END .................max_leaf_nodes=78;, score=0.892 total time=   9.0s
[CV 1/5] END .................max_leaf_nodes=79;, score=0.890 total time=   8.7s
[CV 2/5] END .................max_leaf_nodes=79;, score=0.884 total time=   9.4s
[CV 3/5] END .................max_leaf_nodes=79;, score=0.889 total time=   8.8s
[CV 4/5] END .................max_leaf_nodes=79;, score=0.897 total time=   9.0s
[CV 5/5] END .................max_leaf_nodes=79;, score=0.892 total time=   8.8s
[CV 1/5] END .................max_leaf_nodes=80;, score=0.890 total time=   8.9s
[CV 2/5] END .................max_leaf_nodes=80;, score=0.884 total time=   9.5s
[CV 3/5] END .................max_leaf_nodes=80;, score=0.889 total time=   8.9s
[CV 4/5] END .................max_leaf_nodes=80;, score=0.897 total time=   9.2s
[CV 5/5] END .................max_leaf_nodes=80;, score=0.892 total time=   9.0s
[CV 1/5] END .................max_leaf_nodes=81;, score=0.890 total time=   8.9s
[CV 2/5] END .................max_leaf_nodes=81;, score=0.884 total time=   9.6s
[CV 3/5] END .................max_leaf_nodes=81;, score=0.889 total time=   8.9s
[CV 4/5] END .................max_leaf_nodes=81;, score=0.897 total time=   9.3s
[CV 5/5] END .................max_leaf_nodes=81;, score=0.892 total time=   8.9s
[CV 1/5] END .................max_leaf_nodes=82;, score=0.890 total time=   9.2s
[CV 2/5] END .................max_leaf_nodes=82;, score=0.884 total time=   9.6s
[CV 3/5] END .................max_leaf_nodes=82;, score=0.889 total time=   9.1s
[CV 4/5] END .................max_leaf_nodes=82;, score=0.897 total time=   9.5s
[CV 5/5] END .................max_leaf_nodes=82;, score=0.892 total time=   9.2s
[CV 1/5] END .................max_leaf_nodes=83;, score=0.890 total time=   9.6s
[CV 2/5] END .................max_leaf_nodes=83;, score=0.884 total time=   9.8s
[CV 3/5] END .................max_leaf_nodes=83;, score=0.889 total time=   9.3s
[CV 4/5] END .................max_leaf_nodes=83;, score=0.897 total time=   9.9s
[CV 5/5] END .................max_leaf_nodes=83;, score=0.892 total time=   9.4s
[CV 1/5] END .................max_leaf_nodes=84;, score=0.890 total time=   9.4s
[CV 2/5] END .................max_leaf_nodes=84;, score=0.884 total time=   9.8s
[CV 3/5] END .................max_leaf_nodes=84;, score=0.889 total time=   9.3s
[CV 4/5] END .................max_leaf_nodes=84;, score=0.897 total time=   9.7s
[CV 5/5] END .................max_leaf_nodes=84;, score=0.892 total time=   9.3s
[CV 1/5] END .................max_leaf_nodes=85;, score=0.890 total time=   9.6s
[CV 2/5] END .................max_leaf_nodes=85;, score=0.884 total time=  10.0s
[CV 3/5] END .................max_leaf_nodes=85;, score=0.889 total time=   9.5s
[CV 4/5] END .................max_leaf_nodes=85;, score=0.897 total time=   9.9s
[CV 5/5] END .................max_leaf_nodes=85;, score=0.892 total time=   9.6s
[CV 1/5] END .................max_leaf_nodes=86;, score=0.890 total time=   9.6s
[CV 2/5] END .................max_leaf_nodes=86;, score=0.884 total time=  10.2s
[CV 3/5] END .................max_leaf_nodes=86;, score=0.889 total time=   9.6s
[CV 4/5] END .................max_leaf_nodes=86;, score=0.897 total time=  10.1s
[CV 5/5] END .................max_leaf_nodes=86;, score=0.892 total time=   9.7s
[CV 1/5] END .................max_leaf_nodes=87;, score=0.890 total time=   9.8s
[CV 2/5] END .................max_leaf_nodes=87;, score=0.884 total time=  10.3s
[CV 3/5] END .................max_leaf_nodes=87;, score=0.889 total time=   9.7s
[CV 4/5] END .................max_leaf_nodes=87;, score=0.897 total time=  10.1s
[CV 5/5] END .................max_leaf_nodes=87;, score=0.892 total time=   9.8s
[CV 1/5] END .................max_leaf_nodes=88;, score=0.890 total time=   9.9s
[CV 2/5] END .................max_leaf_nodes=88;, score=0.884 total time=  10.3s
[CV 3/5] END .................max_leaf_nodes=88;, score=0.889 total time=  10.0s
[CV 4/5] END .................max_leaf_nodes=88;, score=0.897 total time=  10.5s
[CV 5/5] END .................max_leaf_nodes=88;, score=0.892 total time=  10.7s
[CV 1/5] END .................max_leaf_nodes=89;, score=0.890 total time=  10.7s
[CV 2/5] END .................max_leaf_nodes=89;, score=0.884 total time=  11.1s
[CV 3/5] END .................max_leaf_nodes=89;, score=0.889 total time=  10.7s
[CV 4/5] END .................max_leaf_nodes=89;, score=0.897 total time=  11.1s
[CV 5/5] END .................max_leaf_nodes=89;, score=0.892 total time=  10.7s
[CV 1/5] END .................max_leaf_nodes=90;, score=0.890 total time=  10.8s
[CV 2/5] END .................max_leaf_nodes=90;, score=0.884 total time=  10.9s
[CV 3/5] END .................max_leaf_nodes=90;, score=0.889 total time=  10.2s
[CV 4/5] END .................max_leaf_nodes=90;, score=0.897 total time=  10.6s
[CV 5/5] END .................max_leaf_nodes=90;, score=0.892 total time=  10.4s
[CV 1/5] END .................max_leaf_nodes=91;, score=0.890 total time=  10.7s
[CV 2/5] END .................max_leaf_nodes=91;, score=0.884 total time=  11.3s
[CV 3/5] END .................max_leaf_nodes=91;, score=0.889 total time=  11.0s
[CV 4/5] END .................max_leaf_nodes=91;, score=0.897 total time=  10.9s
[CV 5/5] END .................max_leaf_nodes=91;, score=0.892 total time=  10.8s
[CV 1/5] END .................max_leaf_nodes=92;, score=0.890 total time=  10.9s
[CV 2/5] END .................max_leaf_nodes=92;, score=0.884 total time=  11.2s
[CV 3/5] END .................max_leaf_nodes=92;, score=0.889 total time=  10.7s
[CV 4/5] END .................max_leaf_nodes=92;, score=0.897 total time=  11.1s
[CV 5/5] END .................max_leaf_nodes=92;, score=0.892 total time=  10.8s
[CV 1/5] END .................max_leaf_nodes=93;, score=0.890 total time=  10.6s
[CV 2/5] END .................max_leaf_nodes=93;, score=0.884 total time=  11.2s
[CV 3/5] END .................max_leaf_nodes=93;, score=0.889 total time=  10.6s
[CV 4/5] END .................max_leaf_nodes=93;, score=0.897 total time=  10.7s
[CV 5/5] END .................max_leaf_nodes=93;, score=0.892 total time=  10.6s
[CV 1/5] END .................max_leaf_nodes=94;, score=0.890 total time=  10.8s
[CV 2/5] END .................max_leaf_nodes=94;, score=0.884 total time=  10.9s
[CV 3/5] END .................max_leaf_nodes=94;, score=0.889 total time=  10.8s
[CV 4/5] END .................max_leaf_nodes=94;, score=0.897 total time=  11.1s
[CV 5/5] END .................max_leaf_nodes=94;, score=0.892 total time=  10.6s
[CV 1/5] END .................max_leaf_nodes=95;, score=0.890 total time=  11.2s
[CV 2/5] END .................max_leaf_nodes=95;, score=0.884 total time=  11.4s
[CV 3/5] END .................max_leaf_nodes=95;, score=0.889 total time=  10.7s
[CV 4/5] END .................max_leaf_nodes=95;, score=0.897 total time=  11.1s
[CV 5/5] END .................max_leaf_nodes=95;, score=0.892 total time=  10.7s
[CV 1/5] END .................max_leaf_nodes=96;, score=0.890 total time=  11.0s
[CV 2/5] END .................max_leaf_nodes=96;, score=0.884 total time=  11.2s
[CV 3/5] END .................max_leaf_nodes=96;, score=0.889 total time=  11.0s
[CV 4/5] END .................max_leaf_nodes=96;, score=0.897 total time=  11.2s
[CV 5/5] END .................max_leaf_nodes=96;, score=0.892 total time=  10.8s
[CV 1/5] END .................max_leaf_nodes=97;, score=0.890 total time=  11.3s
[CV 2/5] END .................max_leaf_nodes=97;, score=0.884 total time=  11.5s
[CV 3/5] END .................max_leaf_nodes=97;, score=0.889 total time=  11.1s
[CV 4/5] END .................max_leaf_nodes=97;, score=0.897 total time=  11.2s
[CV 5/5] END .................max_leaf_nodes=97;, score=0.892 total time=  11.0s
[CV 1/5] END .................max_leaf_nodes=98;, score=0.890 total time=  11.1s
[CV 2/5] END .................max_leaf_nodes=98;, score=0.884 total time=  11.5s
[CV 3/5] END .................max_leaf_nodes=98;, score=0.889 total time=  11.2s
[CV 4/5] END .................max_leaf_nodes=98;, score=0.897 total time=  11.4s
[CV 5/5] END .................max_leaf_nodes=98;, score=0.892 total time=  11.2s
[CV 1/5] END .................max_leaf_nodes=99;, score=0.890 total time=  12.1s
[CV 2/5] END .................max_leaf_nodes=99;, score=0.884 total time=  12.3s
[CV 3/5] END .................max_leaf_nodes=99;, score=0.889 total time=  12.3s
[CV 4/5] END .................max_leaf_nodes=99;, score=0.897 total time=  12.4s
[CV 5/5] END .................max_leaf_nodes=99;, score=0.892 total time=  11.8s

Out[497]:

GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, min_samples_leaf=4,
                                              min_samples_split=0.1,
                                              random_state=0),
             param_grid={'max_leaf_nodes': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
                                            12, 13, 14, 15, 16, 17, 18, 19, 20,
                                            21, 22, 23, 24, 25, 26, 27, 28, 29,
                                            30, ...]},
             verbose=3)

Out[497]:

GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, min_samples_leaf=4,
                                              min_samples_split=0.1,
                                              random_state=0),
             param_grid={'max_leaf_nodes': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
                                            12, 13, 14, 15, 16, 17, 18, 19, 20,
                                            21, 22, 23, 24, 25, 26, 27, 28, 29,
                                            30, ...]},
             verbose=3)

In [498]:

gs.best_params_

Out[498]:

{'max_leaf_nodes': 63}

min_impurity_decrease¶

This parameter takes floating point value where the default value is 0.0. If a split causes an impurity level to drop by more than or equal to this amount, a node will be split. GridSearchCV is used to find the best parameter value from the range of 0.1 to 1. And the best results were obtained using 0.1 as min_impurity_decrease.

In [500]:

gs=GridSearchCV(DecisionTreeClassifier(criterion='gini',splitter='best',
                             max_depth=65, min_samples_split=0.1, 
                             min_samples_leaf=4,max_features=None,
                                       max_leaf_nodes=63,random_state=0),
             param_grid={'min_impurity_decrease': [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]},
             verbose=3)

In [501]:

gs.fit(X_train, y_train)

Fitting 5 folds for each of 9 candidates, totalling 45 fits
[CV 1/5] END .........min_impurity_decrease=0.1;, score=0.775 total time=   0.4s
[CV 2/5] END .........min_impurity_decrease=0.1;, score=0.775 total time=   0.4s
[CV 3/5] END .........min_impurity_decrease=0.1;, score=0.775 total time=   0.4s
[CV 4/5] END .........min_impurity_decrease=0.1;, score=0.775 total time=   0.4s
[CV 5/5] END .........min_impurity_decrease=0.1;, score=0.775 total time=   0.4s
[CV 1/5] END .........min_impurity_decrease=0.2;, score=0.775 total time=   0.4s
[CV 2/5] END .........min_impurity_decrease=0.2;, score=0.775 total time=   0.4s
[CV 3/5] END .........min_impurity_decrease=0.2;, score=0.775 total time=   0.4s
[CV 4/5] END .........min_impurity_decrease=0.2;, score=0.775 total time=   0.3s
[CV 5/5] END .........min_impurity_decrease=0.2;, score=0.775 total time=   0.4s
[CV 1/5] END .........min_impurity_decrease=0.3;, score=0.775 total time=   0.4s
[CV 2/5] END .........min_impurity_decrease=0.3;, score=0.775 total time=   0.4s
[CV 3/5] END .........min_impurity_decrease=0.3;, score=0.775 total time=   0.4s
[CV 4/5] END .........min_impurity_decrease=0.3;, score=0.775 total time=   0.4s
[CV 5/5] END .........min_impurity_decrease=0.3;, score=0.775 total time=   0.4s
[CV 1/5] END .........min_impurity_decrease=0.4;, score=0.775 total time=   0.4s
[CV 2/5] END .........min_impurity_decrease=0.4;, score=0.775 total time=   0.4s
[CV 3/5] END .........min_impurity_decrease=0.4;, score=0.775 total time=   0.4s
[CV 4/5] END .........min_impurity_decrease=0.4;, score=0.775 total time=   0.4s
[CV 5/5] END .........min_impurity_decrease=0.4;, score=0.775 total time=   0.3s
[CV 1/5] END .........min_impurity_decrease=0.5;, score=0.775 total time=   0.3s
[CV 2/5] END .........min_impurity_decrease=0.5;, score=0.775 total time=   0.3s
[CV 3/5] END .........min_impurity_decrease=0.5;, score=0.775 total time=   0.3s
[CV 4/5] END .........min_impurity_decrease=0.5;, score=0.775 total time=   0.3s
[CV 5/5] END .........min_impurity_decrease=0.5;, score=0.775 total time=   0.3s
[CV 1/5] END .........min_impurity_decrease=0.6;, score=0.775 total time=   0.3s
[CV 2/5] END .........min_impurity_decrease=0.6;, score=0.775 total time=   0.3s
[CV 3/5] END .........min_impurity_decrease=0.6;, score=0.775 total time=   0.3s
[CV 4/5] END .........min_impurity_decrease=0.6;, score=0.775 total time=   0.3s
[CV 5/5] END .........min_impurity_decrease=0.6;, score=0.775 total time=   0.3s
[CV 1/5] END .........min_impurity_decrease=0.7;, score=0.775 total time=   0.3s
[CV 2/5] END .........min_impurity_decrease=0.7;, score=0.775 total time=   0.3s
[CV 3/5] END .........min_impurity_decrease=0.7;, score=0.775 total time=   0.3s
[CV 4/5] END .........min_impurity_decrease=0.7;, score=0.775 total time=   0.3s
[CV 5/5] END .........min_impurity_decrease=0.7;, score=0.775 total time=   0.3s
[CV 1/5] END .........min_impurity_decrease=0.8;, score=0.775 total time=   0.3s
[CV 2/5] END .........min_impurity_decrease=0.8;, score=0.775 total time=   0.3s
[CV 3/5] END .........min_impurity_decrease=0.8;, score=0.775 total time=   0.3s
[CV 4/5] END .........min_impurity_decrease=0.8;, score=0.775 total time=   0.3s
[CV 5/5] END .........min_impurity_decrease=0.8;, score=0.775 total time=   0.3s
[CV 1/5] END .........min_impurity_decrease=0.9;, score=0.775 total time=   0.4s
[CV 2/5] END .........min_impurity_decrease=0.9;, score=0.775 total time=   0.4s
[CV 3/5] END .........min_impurity_decrease=0.9;, score=0.775 total time=   0.4s
[CV 4/5] END .........min_impurity_decrease=0.9;, score=0.775 total time=   0.3s
[CV 5/5] END .........min_impurity_decrease=0.9;, score=0.775 total time=   0.3s

Out[501]:

GridSearchCV(estimator=DecisionTreeClassifier(max_depth=65, max_leaf_nodes=63,
                                              min_samples_leaf=4,
                                              min_samples_split=0.1,
                                              random_state=0),
             param_grid={'min_impurity_decrease': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6,
                                                   0.7, 0.8, 0.9]},
             verbose=3)

In [502]:

clf.get_params()

Out[502]:

{'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': None,
 'max_features': None,
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'random_state': 0,
 'splitter': 'best'}

In [503]:

gs.best_params_

Out[503]:

{'min_impurity_decrease': 0.1}

The fine-tuned hyperparameters with their values are
DecisionTreeClassifier (criterion='gini', splitter='best', max_depth=65, min_samples_split=0.1, min_samples_leaf=4, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, random_state=0)

In [504]:

clf = DecisionTreeClassifier(criterion='gini',splitter='best',
                             max_depth=65, min_samples_split=0.1, 
                             min_samples_leaf=4,max_features=None,
                             max_leaf_nodes=None,min_impurity_decrease=0.0,
                             random_state=0)
clf.fit(X_train, y_train)

Out[504]:

DecisionTreeClassifier(max_depth=65, min_samples_leaf=4, min_samples_split=0.1,
                       random_state=0)

In [505]:

predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)

89.45527908540686

In [506]:

print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))

Accuracy: 0.8945527908540686
              precision    recall  f1-score   support

           0       0.44      0.15      0.22       427
           1       0.94      0.94      0.94      5747
           2       0.77      0.93      0.84      1261

    accuracy                           0.89      7435
   macro avg       0.72      0.67      0.67      7435
weighted avg       0.88      0.89      0.88      7435

In [507]:

cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()

[[  63  290   74]
 [  58 5410  279]
 [  22   61 1178]]

Ensemble¶

BaggingClassifier¶

A bagging classifier is an ensemble meta-estimator that fits base classifiers one at a time to random subsets of the original dataset, then combines each prediction (either through voting or average) to create the final prediction. Typically, a black-box estimator's variance can be decreased by using a meta-estimator of this type (E.g., a decision tree) by adding randomization to its construction process and creating an ensemble from it. Bagging Classifier is applied in two ways. First the bagging classifier is applied on decision tree with default parameters and then using fine-tuned parameters.

In [509]:

from sklearn.ensemble import BaggingClassifier
clf = BaggingClassifier(DecisionTreeClassifier(
                             random_state=0))
clf.fit(X_train, y_train)

Out[509]:

BaggingClassifier(estimator=DecisionTreeClassifier(random_state=0))

In [510]:

predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)

89.44182918628111

In [511]:

print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))

Accuracy: 0.8944182918628111
              precision    recall  f1-score   support

           0       0.46      0.22      0.30       427
           1       0.93      0.94      0.94      5747
           2       0.80      0.92      0.85      1261

    accuracy                           0.89      7435
   macro avg       0.73      0.69      0.70      7435
weighted avg       0.88      0.89      0.89      7435

In [512]:

cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()

[[  94  278   55]
 [ 104 5401  242]
 [   7   99 1155]]

The Bagging classifier modeled with fine-tuned parameters performed better. The confusion matrix for bagging classifier created with fine-tuned decision tree parameters is

In [513]:

from sklearn.ensemble import BaggingClassifier
clf = BaggingClassifier(DecisionTreeClassifier(criterion='gini',splitter='best',
                             max_depth=65, min_samples_split=0.1, 
                             min_samples_leaf=4,max_features=None,
                             max_leaf_nodes=None,min_impurity_decrease=0.0,
                             random_state=0))
clf.fit(X_train, y_train)

Out[513]:

BaggingClassifier(estimator=DecisionTreeClassifier(max_depth=65,
                                                   min_samples_leaf=4,
                                                   min_samples_split=0.1,
                                                   random_state=0))

In [514]:

predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)

89.06523201075991

In [515]:

print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))

Accuracy: 0.8906523201075992
              precision    recall  f1-score   support

           0       0.44      0.16      0.23       427
           1       0.95      0.93      0.94      5747
           2       0.74      0.95      0.83      1261

    accuracy                           0.89      7435
   macro avg       0.71      0.68      0.67      7435
weighted avg       0.88      0.89      0.88      7435

In [516]:

cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()

[[  68  261   98]
 [  65 5356  326]
 [  20   43 1198]]

In [517]:

from sklearn.ensemble import BaggingClassifier
clf = BaggingClassifier(
    DecisionTreeClassifier(
    random_state=0))
clf.fit(X_train, y_train)

Out[517]:

BaggingClassifier(estimator=DecisionTreeClassifier(random_state=0))

In [518]:

predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)

88.97108271687962

In [519]:

print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))

Accuracy: 0.8897108271687962
              precision    recall  f1-score   support

           0       0.41      0.22      0.28       427
           1       0.94      0.93      0.93      5747
           2       0.79      0.91      0.85      1261

    accuracy                           0.89      7435
   macro avg       0.71      0.69      0.69      7435
weighted avg       0.88      0.89      0.88      7435

In [520]:

cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()

[[  92  280   55]
 [ 120 5370  257]
 [  15   93 1153]]

AdaBoostClassifier¶

A meta-estimator called an AdaBoost classifier starts by fitting a classifier to the initial dataset. It then fits additional copies of the classifier to the same dataset, but with the weights of instances that were incorrectly classified being changed so that later classifiers would concentrate more on challenging cases. AdaBoost Classifier is applied in two ways. First the AdaBoost classifier is applied on decision tree with default parameters and then using fine-tuned parameters.

In [521]:

from sklearn.ensemble import AdaBoostClassifier
clf = AdaBoostClassifier(
    base_estimator=DecisionTreeClassifier(
                             random_state=0))
clf.fit(X_train, y_train)

Out[521]:

AdaBoostClassifier(base_estimator=DecisionTreeClassifier(random_state=0))

In [522]:

predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)

88.15063887020848

In [523]:

print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))

Accuracy: 0.8815063887020848
              precision    recall  f1-score   support

           0       0.37      0.24      0.29       427
           1       0.93      0.93      0.93      5747
           2       0.79      0.86      0.83      1261

    accuracy                           0.88      7435
   macro avg       0.70      0.68      0.68      7435
weighted avg       0.87      0.88      0.88      7435

In [524]:

cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()

[[ 104  266   57]
 [ 154 5361  232]
 [  23  149 1089]]

The AdaBoost classifier modeled with default parameters performed better. The confusion matrix for AdaBoost classifier created with default decision tree parameters is

In [525]:

from sklearn.ensemble import AdaBoostClassifier
clf = AdaBoostClassifier(
    base_estimator=DecisionTreeClassifier(criterion='gini',splitter='best',
                             max_depth=65, min_samples_split=0.1, 
                             min_samples_leaf=4,max_features=None,
                             max_leaf_nodes=None,min_impurity_decrease=0.0,
                             random_state=0))
clf.fit(X_train, y_train)

Out[525]:

AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=65,
                                                         min_samples_leaf=4,
                                                         min_samples_split=0.1,
                                                         random_state=0))

In [526]:

predictions=clf.predict(X_test)
score=clf.score(X_test,y_test)
print(score*100)

83.81977135171486

In [527]:

print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))

Accuracy: 0.8381977135171487
              precision    recall  f1-score   support

           0       0.24      0.27      0.25       427
           1       0.89      0.91      0.90      5747
           2       0.81      0.71      0.76      1261

    accuracy                           0.84      7435
   macro avg       0.65      0.63      0.64      7435
weighted avg       0.84      0.84      0.84      7435

In [528]:

cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()

[[ 115  275   37]
 [ 355 5225  167]
 [  16  353  892]]

VotingClassifier¶

Using a majority vote or the average predicted probabilities (soft vote), the Voting Classifier combines conceptually distinct machine learning classifiers to predict class labels. In order to counteract the weaknesses of a group of models with similar performance, a classifier like this can be helpful. In order to predict class labels, the Voting Classifier combines conceptually different machine learning classifiers and casts a vote based on a majority or the average predicted probabilities (soft vote). A classifier like this can be useful to compensate for the shortcomings of a set of models with comparable performance. Three classifiers are used in Voting classifier. The first classifier is are MultinomialNB with fine-tuned parameters, the second is DecisionTreeClassifier with default parameters and the third is DecisionTreeClassifier with fine-tuned parameters. Different weights are assigned to all these classifiers.

In [529]:

from sklearn.ensemble import VotingClassifier

The DecisionTreeClassifier classifier modeled with the weights 1 for MultinomialNB with fine-tuned parameters, 1, for decision tree with default parameters and 2 for decision tree with fine-tuned parameters performed the best. The confusion matrix for voting classifier created with three classifiers with weights 1,1,2 is

In [530]:

clf1 = MultinomialNB(alpha=1.0, fit_prior=False, class_prior=None)
clf2 = DecisionTreeClassifier(random_state=0)
clf3 = DecisionTreeClassifier(criterion='gini',splitter='best',
                             max_depth=65, min_samples_split=0.1, 
                             min_samples_leaf=4,max_features=None,
                             max_leaf_nodes=None,min_impurity_decrease=0.0,
                             random_state=0)
eclf = VotingClassifier(estimators=[('mnb', clf1), ('dt', clf2), ('ft-dt', clf3)],
                        voting='soft', weights=[1, 1, 2])
#
eclf.fit(X_train, y_train)

Out[530]:

VotingClassifier(estimators=[('mnb', MultinomialNB(fit_prior=False)),
                             ('dt', DecisionTreeClassifier(random_state=0)),
                             ('ft-dt',
                              DecisionTreeClassifier(max_depth=65,
                                                     min_samples_leaf=4,
                                                     min_samples_split=0.1,
                                                     random_state=0))],
                 voting='soft', weights=[1, 1, 2])

In [531]:

predictions=eclf.predict(X_test)
score=eclf.score(X_test,y_test)
print(score*100)

89.14593140551446

In [532]:

print("Accuracy:",metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test,predictions))

Accuracy: 0.8914593140551446
              precision    recall  f1-score   support

           0       0.47      0.12      0.20       427
           1       0.92      0.95      0.94      5747
           2       0.79      0.88      0.84      1261

    accuracy                           0.89      7435
   macro avg       0.73      0.65      0.66      7435
weighted avg       0.87      0.89      0.88      7435

In [533]:

cm=metrics.confusion_matrix(y_test,predictions)
print(cm)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap="Blues_r");
plt.ylabel("Actual label");
plt.xlabel("Predicted label")
all_sample_title="Accuracy score: {0}".format(score)
plt.title(all_sample_title,size=15)
plt.show()

[[  53  315   59]
 [  54 5465  228]
 [   6  145 1110]]

In [ ]:

Abusive Language Detection Code - Code Explanation

Supervised Machine Learning

Language Classification

Abusive Language Detection

Dataset: Hate Speech and Offensive Language Dataset

Data Characteristics¶

Index¶

Count¶

hate_speech¶

Offensive_language¶

neither¶

class¶

Tweet¶

Data Pre-processing¶

The data is checked for null values but no null value is found.¶

The data is checked for duplicated records but no duplicate value is found in the data¶

Exploratory Data Analysis¶

Machine Learning¶

Count Vectorizer¶

MultinomialNB¶

TfidfVectorizer¶

MultinomialNB¶

Count Vectorizer¶

Fine-Tuning¶

Count Vectorizer¶

DecisionTreeClassifier¶

Hyper parameter fine-tuning¶

max_depth¶

min_samples_split¶

min_samples_leaf¶

max_features¶

max_leaf_nodes¶

min_impurity_decrease¶

Ensemble¶

BaggingClassifier¶

AdaBoostClassifier¶

VotingClassifier¶

Similar Post:

Comments

Post a Comment