Questions tagged [xgboost]

XGBoost is a library for constructing boosted tree models in R, Python, Java, Scala, and C++. Use this tag for issues specific to the package (i.e. input/output, installation, functionality).

0
votes
0answers
11 views

Request for an intuitive explanation of xgboost leaf scores

I refer, once again, to the Stack Overflow question at this link regarding how the xgboost algorithm calculates scores at the leaves. I have searched the documentation a lot but could not find any intuitive ...
1
vote
1answer
30 views

XGBoost decision tree selection

I have a question regarding which decision tree I should choose from XGBoost. I will use the following code as an example. #import packages import xgboost as xgb import matplotlib.pyplot as plt # ...
0
votes
0answers
21 views

Integrating missing values in Isolation Forests

Current XGBoost algorithms are able to handle missing values by choosing the best direction during training by minimizing the loss (source). Within our institution this feature has been of great ...
0
votes
0answers
23 views

Is it possible to pass additional arguments to xgboost custom cost functions in python?

All examples I've managed to find of using xgboost with custom cost functions involve writing a cost function which takes two arguments, the first being a vector of predictions, the second being an ...
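One common way to get extra parameters into a two-argument objective is to bind them ahead of time with functools.partial (a closure works the same way). The sketch below is only an illustration: FakeDMatrix is a hypothetical stand-in for xgb.DMatrix that exposes just get_label(), and weighted_squared_error and under_weight are made-up names, not part of xgboost.

```python
from functools import partial


class FakeDMatrix:
    """Hypothetical stand-in for xgb.DMatrix; only get_label() is provided."""

    def __init__(self, labels):
        self._labels = list(labels)

    def get_label(self):
        return self._labels


def weighted_squared_error(preds, dtrain, under_weight):
    """Squared error whose gradient and Hessian are scaled by `under_weight`
    for rows where the prediction is below the label."""
    labels = dtrain.get_label()
    grad, hess = [], []
    for p, y in zip(preds, labels):
        w = under_weight if p < y else 1.0
        grad.append(w * (p - y))  # first derivative of 0.5 * w * (p - y)**2
        hess.append(w)            # second derivative of the same loss
    return grad, hess


# Bind the extra argument; the result takes only (preds, dtrain).
objective = partial(weighted_squared_error, under_weight=3.0)

grad, hess = objective([1.0, 4.0], FakeDMatrix([2.0, 3.0]))
print(grad, hess)
```

The bound callable keeps the usual (preds, dtrain) shape, so it should be usable wherever a two-argument custom objective is expected.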
0
votes
1answer
18 views

XGBoost - how should I set the nthread parameter?

I am trying to optimize my Python training script (I need to run it multiple times, so it makes sense to try to speed it up). I have a dataset composed of 9 months of data. The validation setup is a kind ...
0
votes
0answers
14 views

How to add focal loss to the custom objective function of LightGBM and XGBoost

Here is my custom objective function; I would like to add focal loss to LightGBM. ############### my code from numpy import log def grad_func(y_true,y_pred): if y_true==1: return -0....
0
votes
0answers
22 views

What does alpha in xgboost signify?

I am a beginner in data science and just started to explore xgboost. To understand different parameters, I was looping through them using gridSearchCV. However I am unable to understand what happened ...
0
votes
0answers
10 views

Confusion of 'label' parameter in xgb.DMatrix

I'm learning XGBoost from Datacamp. When they talk about boosting, they use xgb.DMatrix(data=X, label=y). I also find people using xgb.DMatrix(data=X, label=label) in other sources when I try to ...
0
votes
1answer
30 views

Implementing XGBoost on Methyl450k data set in R

I'm attempting to implement XGBoost on a Methyl450k data set. The data has approximately 480000+ specific CpG sites with subsequent beta values between 0 and 1. Here is a look at the data (...
1
vote
1answer
54 views

XGBoost model tree value insight

I have created an XGBoost model in Python and have been using the following code to understand the model better: xgb.plot_importance(model) or xgb.plot_importance(model, importance_type="gain") I can get ...
0
votes
0answers
17 views

Confusion about xgboost sklearn api plot_tree()

I am trying to train on one such sample. X= [ [1 2] [1 2] [2 2]] y = [5, 5 , 8] And I am trying to use the code below to train on the sample reg=XGBRegressor(max_depth=2,learning_rate=1.0, ...
0
votes
1answer
14 views

offset in makeRegrTask mlr package

Is it possible to have an offset in mlr makeRegrTask? I want to use xgboost with a poisson count objective with non-unit weights and hence wish to use base_margin but unless I am missing something ...
0
votes
1answer
16 views

Blankspace and colon not found in firstline

I have a jupyter notebook in SageMaker in which I want to run the XGBoost algorithm. The data has to match 3 criteria: -No header row -Outcome variable in the first column, features in the rest of ...
0
votes
1answer
30 views

extract feature names from trained model

I have a pre-trained XGBoost model read from a pickle file. When I was trying to make predictions on a new dataset with some columns outside of the feature set of the model, I received the error ...
0
votes
1answer
17 views

Install Graphviz on AWS SageMaker

I'm on a Jupyter notebook using Python3 and trying to plot a tree with code like this: import xgboost as xgb from xgboost import plot_tree plot_tree(model, num_trees=4) On the last line I get: ~/...
1
vote
0answers
17 views

Dask hangs when using dask_xgboost train method

I am trying to reproduce the dask xgboost example from the dask-ml docs at http://ml.dask.org/examples/xgboost.html. Unfortunately, Dask doesn't seem to complete the training and I'm having a hard ...
0
votes
0answers
12 views

Evaluate and track/plot of TimeSeriesSplit

I have got time series data. So I used TimeSeriesSplit with 3 splits for XGBRegressor, see the following code. from sklearn.model_selection import TimeSeriesSplit from xgboost.sklearn import ...
1
vote
0answers
23 views

How to know whether xgboost is running in parallel?

I'm training a xgboost regressor on a Linux server with 28 cores. I installed xgboost by running mkdir build cd build cmake .. make -j28 The xgboost regressor is constructed with from xgboost ...
-1
votes
0answers
26 views

The best way to calculate the prediction interval for a xgboost regression model?

I have constructed two different xgboost regression models using XGBoostRegressor() and an xgboost classifier. Due to the distributions of the data one of the regressor models uses the objective reg:...
1
vote
0answers
17 views

Does `tree_method = 'exact'` in `xgboost` really mean exact greedy algorithm?

Does tree_method ='exact' in xgboost really mean using the exact greedy algorithm for split finding? I'm asking this question because xgboost runs unreasonably fast. Here is the script that I used ...
0
votes
0answers
18 views

Plot tree xgboost giving me an error in R

I ran an xgboost model, and I am trying to plot the trees but the formula is giving me an error. The model for the X-gboost training data: tuneGridXGB <- expand.grid( nrounds=c(150), ...
0
votes
0answers
16 views

A/E for XGBoost - Poisson distribution with varying exposure / offset is not correct

I'm wondering how to get A/E close to 1, for the R example given by XGBoost - Poisson distribution with varying exposure / offset > sum(d$claims) [1] 20640 > #predicted value > sum(d$...
0
votes
0answers
14 views

xgboost predict contrib to probabilities

I am using xgboost's pred_contribs feature in order to get some kind of interpretability (Shapley values) for each sample of my model. booster.predict(test, pred_contribs=True) It returns a vector of ...
0
votes
1answer
24 views

Custom Error Metric not changing predictions XGBoost R

I have created a custom error metric which prints as I run the XGBoost xgb.train but does not actually have any effect on the output. From what I can tell it is simply printing the custom error metric ...
0
votes
1answer
34 views

XGBoost too large for pickle/joblib

I'm having difficulty loading an XGBoost regression with both pickle and joblib. One difficulty could be the fact I am writing the pickle/joblib on a Windows desktop, but I am trying to load on a ...
-1
votes
0answers
5 views

What is the effect of using xgb.DMatrix?

Many Kagglers use DMatrix when they use the xgboost algorithm. I want to know what effect using DMatrix has, and why it is used.
-3
votes
0answers
22 views

Sales Forecasting in Python based on supervised machine learning approach

We are creating a machine learning model for sales forecasting in Python and integrating it with Power BI. Now we need to predict the sales for the next 3 months. Currently, I am ...
-2
votes
0answers
22 views

Unmatrix a DMatrix object in R?

I am new to using xgboost and would like to understand if it is possible to obtain the predicted values beside my original test dataset. The problem I am facing is that the test dataset is presently in ...
-1
votes
1answer
22 views

How to get a model summary of a machine learning model (particularly XGBoost) like the screenshot attached below?

I have run an XGBoost regressor on a data set and I need the model summary (e.g. R squared, kurtosis, etc.) similar to the screenshot attached. Any help will be greatly appreciated. Screenshot
-1
votes
2answers
37 views

Is there a way to get the probability of a prediction using XGBoostRegressor?

I have built an XGBoostRegressor model using around 200 categorical features predicting a continuous time variable. But I would want to get both the actual prediction and the probability of that ...
0
votes
0answers
19 views

TypeError when writing my own evaluation_metric for xgboost in Python

I want to create my own f_score evaluation metric when using xgboost classifier for binary classification. My metric is def f_score(preds,y): df = pd.DataFrame({'predict':pred, 'true':y....
1
vote
1answer
30 views

How would I construct my own evaluation metric for minimizing test error for my highly unbalanced class using XGBoost?

I have collected data on how long it takes for a product to be released in a release pipeline. 95% of the data so far has taken <400 minutes [outlier = 0]. Then 5% of the data is between [700,40 ...
2
votes
1answer
25 views

The impact of number of negative samples used in a highly imbalanced dataset (XGBoost)

I am trying to model a classifier using XGBoost on a highly imbalanced data-set, with a limited number of positive samples and practically infinite number of negative samples. Is it possible that ...
-3
votes
0answers
15 views

When to use SVR over other regression models

I am confused about when to use Support Vector Regression over other models like Random Forest Regression and XGBoost. I expected XGBoost to give the best prediction score(r2_score) for my regression ...
0
votes
0answers
11 views

hyperopt result exceeds my hp.choice restriction, why? (XGBoost)

I ran into a strange problem: I defined my XGB hyper-parameter 'max_depth' with hyperopt as hp.choice('max_depth',range(2,20)) But I got a result of 'max_depth' = 0 or 1, which is not within the [2,20) restriction. ...
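A likely explanation, offered as an assumption rather than a diagnosis of this exact script: for hp.choice, hyperopt's fmin reports the index of the selected option, not the option itself. With range(2, 20) as the options, an index of 0 or 1 would correspond to a max_depth of 2 or 3. A minimal sketch of mapping the index back, using plain Python and a hypothetical `best` dictionary in place of a real fmin result:

```python
# The options that were handed to hp.choice in the question.
choices = list(range(2, 20))

# Hypothetical result dictionary; under the assumption above, the value
# is an index into `choices`, not a tree depth.
best = {"max_depth": 0}

# Translate the index back into the hyper-parameter value.
actual_max_depth = choices[best["max_depth"]]
print(actual_max_depth)
```

hyperopt also ships a space_eval(space, best) helper that performs this translation for a whole search space.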
0
votes
1answer
31 views

Is there a way to print one XGBoostRegressor tree in python?

I have constructed a XGBoostRegressor model where I now want to try and plot one of the trees. I know that regular xgb classifier has the function plot_tree but unfortunately XGBoostRegressor does not....
1
vote
0answers
23 views

Gradient and leaf score for the first tree of XGBoost

So, when deriving the first tree in xgboost, we need to know the first order and second order gradients, p-y and p(1-p) to calculate the leaf weights and overall tree score. But since we do not have ...
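A small worked example may help here. It assumes (this is not stated in the excerpt) that every row starts from the same constant prediction, p = 0.5, and that the L2 regularization term lambda is 1; the labels are invented. Under those assumptions the gradients p - y and Hessians p(1 - p) are available before any tree exists, and the leaf weight follows the standard formula w = -G / (H + lambda).

```python
def logistic_grad_hess(p, y):
    """First and second derivatives of the log loss with respect to the
    raw score, written in terms of the predicted probability p."""
    return p - y, p * (1.0 - p)


def leaf_weight(grads, hessians, reg_lambda=1.0):
    """Optimal leaf weight w = -G / (H + lambda) for the rows in one leaf."""
    return -sum(grads) / (sum(hessians) + reg_lambda)


labels = [1, 0, 1, 1]   # invented labels, all falling in the same leaf
base_p = 0.5            # assumed constant starting prediction

pairs = [logistic_grad_hess(base_p, y) for y in labels]
grads = [g for g, _ in pairs]
hessians = [h for _, h in pairs]

w = leaf_weight(grads, hessians)
print(grads, hessians, w)
```

Here G = -1.0 and H = 1.0, so the leaf weight is 0.5 before any learning-rate shrinkage is applied.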
0
votes
0answers
16 views

hyperparameter tuning xgboost learning api for ranking task

I'm using XGBoost with the learning API. First, is there any difference between the learning API and the scikit-learn API for ranking tasks? I'm trying to use the cv (cross-validation) in order to do ...
1
vote
0answers
66 views

How to get actual feature names in XGBoost feature importance plot without retraining the model?

I have come across several questions on Stackoverflow, where the problem faced by masses is that they preprocess the training data, such as using centre and scale etc. before fitting/training the ...
0
votes
0answers
24 views

Spark Scala, train many models concurrently

I have a train and test dataset with features, and several thousand customerId values. My goal is to concurrently train one binary xgboost classifier per customerId in Spark. I'm essentially trying ...
0
votes
1answer
26 views

XGBoost best max_depth=1

I use xgboost to train a classification model. GridSearchCV gives the best max_depth=1. This means all my hundreds of trees are split at a single node. Does this mean that the problem/dataset that I ...
0
votes
0answers
19 views

Windows permission error to delete joblibmem mapping folder in python

I am trying to run Python code that trains XGBoost and I wanted it to run in parallel to take less time in building a model. But I am facing this issue while running the code. PermissionError: [...
0
votes
1answer
37 views

XGBoost produces non-binary predictions

After training my model with XGBoost, I tried to test the model but the predictions are floating point numbers of some sort, which cause an error when I want to get performance measures. This is the code:...
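If the model was trained with a logistic objective such as binary:logistic (an assumption, since the excerpt's code is cut off), the floating point outputs are probabilities of the positive class and need a cutoff before they can be scored as labels. A minimal sketch with invented values and a 0.5 threshold:

```python
def to_labels(probabilities, threshold=0.5):
    """Convert positive-class probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Invented probabilities standing in for the model's predictions.
predicted = to_labels([0.12, 0.87, 0.5, 0.49])
print(predicted)
```

The threshold is a choice, not a constant; 0.5 is only the conventional default.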
0
votes
0answers
22 views

XGBoost - get probabilities after multi:softmax function

I have a question regarding xgboost and multiclass. I am not using the sklearn wrapper as I always struggle with some parameters. I was wondering if it is possible to get the probability vector plus ...
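For context: multi:softmax returns only the winning class, while the multi:softprob objective returns one probability per class, and the class can then be recovered with an argmax. The sketch below shows that relationship in plain Python on invented raw scores for a single row; it does not call xgboost.

```python
import math


def softmax(scores):
    """Turn raw per-class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Invented raw scores for one row and three classes.
probs = softmax([1.0, 2.0, 0.5])

# The class a softmax-style objective would report is the argmax.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```

Keeping the probability vector and deriving the label from it gives both pieces of information from a single prediction.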
0
votes
1answer
32 views

In my Xgboost machine learning model, when features have 0 importance, should you discard them or group them together?

I have been trying to build an ML model which predicts the time it takes for different products to go through a deployment pipeline. I have created around 30-40 different features, 90% of which are ...
-2
votes
0answers
11 views

Suggestions for predicting low volume data

Problem statement: make predictions for every minute of tomorrow (only one day into the future, no more than that). I have a time series composed of very low volume data, which typically ranges from ...
0
votes
1answer
26 views

The evaluation metric in the validation set in xgboost Python differs from the one I get when making a prediction

I am using an evaluation set to implement early stopping with xgboost in Python. What puzzles me is that the evaluation metric reported during the training as optimal is much better than the one I ...
2
votes
1answer
38 views

XGBoost, handling continuous and fixed data for loan dataset

Background: I am using XGBoost to develop a model to predict whether a particular loan will default or not. I have now included time-series data on Fico score, and other variables that change ...
1
vote
0answers
26 views

How to create an integrated hybrid model of LSTM with XGBoost for classifying tabular (non-image) data?

I have a tabular dataset with the following structure: Sample Structure of the Dataset: I am using an LSTM for sequence classification of the 14 different classes in the last column of the data. I ...
0
votes
0answers
24 views

XGBoost: how to map one CSV to one label in DMatrix

I tried to use xgboost to build a regression model. Each of my inputs is a 2-D array, and each input has one corresponding value. However, I had a problem converting the data into a DMatrix. ...
