Decision tree in r is not forming with my training data
library(caret)
library(rpart.plot)
car_df <- read.csv("TrainingDataSet.csv", sep = ',', header = TRUE)
str(car_df)
set.seed(3033)
intrain <- createDataPartition(y = car_df$Result, p= 0.7, list = FALSE)
training <- car_df[intrain,]
testing <- car_df[-intrain,]
dim(training)
dim(testing)
anyNA(car_df)
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
set.seed(3333)
dtree_fit <- train(Result ~., data = training, method = "rpart",
parms = list(split = "infromation"),
trControl=trctrl,
tuneLength = 10)
I get this warning:
Warning message: In nominalTrainWorkflow(x = x, y = y, wts = weights,
info = trainInfo, : There were missing values in resampled
performance measures.
I am trying to classify whether a movie is a hit or a flop using positive and negative sentiment percentages. Here is my data:
dput(car_df)
structure(list(MovieName = structure(c(20L, 5L, 31L, 26L, 27L,
12L, 36L, 29L, 38L, 4L, 6L, 8L, 10L, 15L, 18L, 21L, 24L, 34L,
35L, 7L, 37L, 25L, 23L, 2L, 11L, 40L, 33L, 28L, 14L, 3L, 17L,
16L, 32L, 22L, 30L, 1L, 19L, 39L, 9L, 13L), .Label = c("#96Movie",
"#alphamovie", "#APrivateWar", "#AStarIsBorn", "#BlackPanther",
"#BohemianRhapsody", "#CCV", "#Creed2", "#CrimesOfGrindelwald",
"#Deadpool2", "#firstman", "#GameNight", "#GreenBookMovie", "#grinchmovie",
"#Incredibles2", "#indivisiblemovie", "#InstantFamily", "#JurassicWorld",
"#KolamaavuKokila", "#Oceans8", "#Overlord", "#PariyerumPerumal",
"#RalphBreaksTheInternet", "#Rampage", "#Ratchasan", "#ReadyPlayerOne",
"#RedSparrow", "#RobinHoodMovie", "#Sarkar", "#Seemaraja", "#Skyscraper",
"#Suspiria", "#TheLastKey", "#TheNun", "#ThugsOfHindostan", "#TombRaider",
"#VadaChennai", "#Venom", "#Vishwaroopam2", "#WidowsMovie"), class = "factor"),
PositivePercent = c(40.10554, 67.65609, 80.46796, 71.34831,
45.36082, 68.82591, 46.78068, 63.85787, 47.20497, 32.11753,
63.7, 39.2, 82.76553, 88.78613, 72.18274, 72.43187, 31.0089,
38.50932, 38.9, 19.9, 84.26854, 29.4382, 58.13953, 86.9281,
64.54965, 56, 0, 56.61914, 58.82353, 54.98891, 78.21682,
90, 64.3002, 85.8, 51.625, 67.71894, 92.21557, 53.84615,
40.12158, 68.08081), NegativePercent = c(11.34565, 21.28966,
6.408952, 13.10861, 26.80412, 17.10526, 18.61167, 10.55838,
46.48033, 56.231, 9.9, 12.1, 9.018036, 6.473988, 13.90863,
16.77149, 63.20475, 42.54658, 40.9, 5.4, 3.907816, 2.022472,
10.51567, 3.267974, 15.12702, 15.3, 100, 18.12627, 11.76471,
13.41463, 5.775076, 10, 20.08114, 2.1, 5.5, 7.739308, 0,
34.61538, 12.86727, 10.70707), Result = structure(c(2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Flop", "Hit"
), class = "factor")), class = "data.frame", row.names = c(NA,
-40L))
Tags: r, machine-learning, decision-tree
Can you include a sample of your data using the dput(car_df) command?
– RAB
Nov 25 '18 at 23:52
I have edited; please find it above @user10626943
– Rakesh
Nov 25 '18 at 23:55
This is a warning, not an error. Does your tree get fitted or not?
– desertnaut
Nov 26 '18 at 0:01
No, it does not fit @desertnaut, but all my features are not "Factor". Out of my 4 columns, two are factors and two are num. Could that be the reason?
– Rakesh
Nov 26 '18 at 0:03
I converted to factor but still get the same error msg.
– Rakesh
Nov 26 '18 at 0:13
asked Nov 25 '18 at 23:48 by Rakesh, edited Nov 26 '18 at 0:33 by mischva11
1 Answer
> str(car_df)
'data.frame': 40 obs. of 4 variables:
$ MovieName : Factor w/ 40 levels "#96Movie","#alphamovie",..: 20 5 31 26 27 12 36 29 38 4 ...
$ PositivePercent: num 40.1 67.7 80.5 71.3 45.4 ...
$ NegativePercent: num 11.35 21.29 6.41 13.11 26.8 ...
$ Result : Factor w/ 2 levels "Flop","Hit": 2 2 2 2 2 2 2 2 2 1 ...
> with(car_df, table( Result))
Result
Flop Hit
5 35
> dtree_fit
CART
29 samples
3 predictor
2 classes: 'Flop', 'Hit'
So you have an outcome with only 5 flops, and one of the predictors is a variable with 40 different values, one per row. This is not surprising given that each of your cases is unique and you have a severely unbalanced outcome. The presence of data does not guarantee the possibility of substantial conclusions. If there's any error here, it's the lack of code in the fitter that would say something along the lines of "Really? You think statistical packages should be able to solve a severe lack of data?"
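To see concretely why repeated 10-fold CV produces missing performance measures here, this sketch (assuming caret is installed; the toy outcome mimics the ~29-row training split) counts the minority class in each held-out fold:

```r
library(caret)

# Toy outcome shaped like the training split: 4 flops, 25 hits.
set.seed(1)
y <- factor(c(rep("Flop", 4), rep("Hit", 25)))

# createFolds() returns the held-out indices for each of the 10 folds.
folds <- createFolds(y, k = 10)
flops_per_fold <- sapply(folds, function(idx) sum(y[idx] == "Flop"))
flops_per_fold
# With only 4 flops spread over 10 folds, most folds hold out no Flop
# at all, so class-wise performance in those resamples is NA.
```

Those NA resamples are exactly what the nominalTrainWorkflow warning is reporting.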
BTW: the split argument should be spelled as below (fixing the typo unsurprisingly doesn't clear the warning, though):
(split = "information")
If you change the number of cross-validation bins to a number that allows the flops to be distributed among the various bins, you can get a result without the warning. Whether it has much validity remains questionable, given the small sample size:
trctrl <- trainControl(method = "repeatedcv", number = 3, repeats = 3)
set.seed(3333)
dtree_fit <- train(Result ~ ., data = training, method = "rpart",
                   parms = list(split = "information"),
                   trControl = trctrl,
                   tuneLength = 10)
# no warning on one of my runs
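A hedged follow-up sketch (hypothetical continuation, assuming the `dtree_fit` and `testing` objects from the question exist): even when the warning disappears, check whether the tree learned anything beyond the class prior. With 35 hits out of 40, always predicting "Hit" already gives:

```r
# Majority-class baseline: 35 of the 40 movies are hits.
baseline_accuracy <- 35 / 40   # 0.875

# If the fitted tree reports accuracy near this value with Kappa ~ 0,
# it is predicting "Hit" for every row, i.e. no real signal was found.
# Hypothetical check against the held-out split:
#   confusionMatrix(predict(dtree_fit, newdata = testing), testing$Result)
baseline_accuracy
```

Kappa corrects accuracy for chance agreement, so Kappa = 0 at 87.5% accuracy means the model is no better than the trivial all-"Hit" rule.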
There's no need to be snarky; there are much nicer ways to say what you did.
– RAB
Nov 26 '18 at 0:42
So what can I do to overcome this issue? Please help.
– Rakesh
Nov 26 '18 at 0:44
Get more data, or increase the amount of flop data that gets passed to the CV steps.
– 42-
Nov 26 '18 at 0:46
Is that the reason for the above warning? Lack of flop data? @42
– Rakesh
Nov 26 '18 at 0:54
I ran your code @42- and, as you said, the warning did not pop up. But: Accuracy 0.862963, Kappa 0, and "Tuning parameter 'cp' was held constant at a value of 0".
– Rakesh
Nov 26 '18 at 1:04
answered Nov 26 '18 at 0:35 by 42-, edited Nov 26 '18 at 1:11