# Questions tagged [xgboost]

XGBoost is a library for constructing boosted tree models in R, Python, Java, Scala, and C++. Use this tag for issues specific to the package (e.g. input/output, installation, functionality).

**0**

votes

**0**answers

11 views

### Request for an intuitive explanation of xgboost leaf scores

I refer once again to the Stack Overflow question at this link regarding how the xgboost algorithm calculates the scores at the leaves. I have searched the documentation extensively but could not find any intuitive ...

**1**

vote

**1**answer

30 views

### XGBoost decision tree selection

I have a question regarding which decision tree I should choose from XGBoost.
I will use the following code as an example.
#import packages
import xgboost as xgb
import matplotlib.pyplot as plt
# ...

**0**

votes

**0**answers

21 views

### Integrating missing values in Isolation Forests

Current XGBoost algorithms are able to handle missing values by choosing the best direction during training by minimizing the loss (source). Within our institution this feature has been of great ...

**0**

votes

**0**answers

23 views

### Is it possible to pass additional arguments to xgboost custom cost functions in python?

All examples I've managed to find of using xgboost with custom cost functions involve writing a cost function which takes two arguments, the first being a vector of predictions, the second being an ...
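One common pattern for this, sketched here without xgboost itself, is to bind the extra arguments ahead of time so that the caller still sees a two-argument callable. `FakeDMatrix` below is a hypothetical stand-in that mimics only `get_label()`, and `weighted_squared_error` and its `weight` parameter are made up for illustration.

```python
from functools import partial


class FakeDMatrix:
    """Hypothetical stand-in for xgb.DMatrix; only get_label() is mimicked."""

    def __init__(self, labels):
        self._labels = list(labels)

    def get_label(self):
        return self._labels


def weighted_squared_error(preds, dtrain, weight):
    """Gradient and Hessian of 0.5 * weight * (pred - label)^2 per row."""
    labels = dtrain.get_label()
    grad = [weight * (p - y) for p, y in zip(preds, labels)]
    hess = [weight for _ in preds]
    return grad, hess


# Bind the extra argument; the resulting callable takes only (preds, dtrain).
objective = partial(weighted_squared_error, weight=2.0)

grad, hess = objective([0.5, 1.5], FakeDMatrix([1.0, 1.0]))
print(grad, hess)
```

With the real library, a callable bound this way could presumably be passed wherever a two-argument objective is expected, such as the `obj` argument of `xgb.train`, where predictions and labels would be NumPy arrays rather than lists. A closure or a `lambda` would serve the same purpose.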

**0**

votes

**1**answer

18 views

### XGBoost - how should I set the nthread parameter?

I am trying to optimize my Python training script (I need to run it multiple times, so it makes sense to try to speed it up). I have a dataset composed of 9 months of data. The validation setup is a kind ...

**0**

votes

**0**answers

14 views

### How to add focal loss to the custom objective function of lightgbm & xgboost

Here is my custom objective function. I would like to add the 'focal loss' to lightgbm.
############### my code
from numpy import log
def grad_func(y_true,y_pred):
if y_true==1:
return -0....

**0**

votes

**0**answers

22 views

### what does alpha in xgboost signify?

I am a beginner in data science and have just started to explore xgboost. To understand the different parameters, I was looping through them using GridSearchCV. However, I am unable to understand what happened ...

**0**

votes

**0**answers

10 views

### Confusion of 'label' parameter in xgb.DMatrix

I'm learning XGBoost from DataCamp. When they talk about boosting, they use xgb.DMatrix(data=X, label=y). I also find people using xgb.DMatrix(data=X, label=label) in other sources when I try to ...

**0**

votes

**1**answer

30 views

### Implementing XGBoost on Methyl450k data set in R

I'm attempting to implement XGBoost on a Methyl450k data set. The data has more than 480,000 specific CpG sites, each with a beta value between 0 and 1. Here is a look at the data (...

**1**

vote

**1**answer

54 views

### XGBoost model tree value insight

I have created an XGBoost model in Python and have been using the following code to understand the model better:
xgb.plot_importance(model)
or
xgb.plot_importance(model, importance_type="gain")
I can get ...

**0**

votes

**0**answers

17 views

### Confusion about xgboost sklearn api plot_tree()

I am trying to train one such sample.
X= [ [1 2]
[1 2]
[2 2]]
y = [5, 5 , 8]
And I am trying to use the code below to train on the sample
reg=XGBRegressor(max_depth=2,learning_rate=1.0, ...

**0**

votes

**1**answer

14 views

### offset in makeRegrTask mlr package

Is it possible to have an offset in mlr makeRegrTask?
I want to use xgboost with a poisson count objective with non-unit weights and hence wish to use base_margin but unless I am missing something ...

**0**

votes

**1**answer

16 views

### Blankspace and colon not found in firstline

I have a jupyter notebook in SageMaker in which I want to run the XGBoost algorithm. The data has to match 3 criteria:
-No header row
-Outcome variable in the first column, features in the rest of ...

**0**

votes

**1**answer

30 views

### extract feature names from trained model

I have a pre-trained XGBoost model read from a pickle file. When I was trying to make predictions on a new dataset with some columns outside of the feature set of the model, I received the error ...

**0**

votes

**1**answer

17 views

### Install graphviz on AWS SageMaker

I'm on a Jupyter notebook using Python3 and trying to plot a tree with code like this:
import xgboost as xgb
from xgboost import plot_tree
plot_tree(model, num_trees=4)
On the last line I get:
~/...

**1**

vote

**0**answers

17 views

### Dask hangs when using dask_xgboost train method

I am trying to reproduce the dask xgboost example from the dask-ml docs at http://ml.dask.org/examples/xgboost.html. Unfortunately, Dask doesn't seem to complete the training and I'm having a hard ...

**0**

votes

**0**answers

12 views

### Evaluate and track/plot of TimeSeriesSplit

I have got time series data. So I used TimeSeriesSplit with 3 splits for XGBRegressor, see the following code.
from sklearn.model_selection import TimeSeriesSplit
from xgboost.sklearn import ...

**1**

vote

**0**answers

23 views

### How to know whether xgboost is running in parallel?

I'm training an xgboost regressor on a Linux server with 28 cores.
I installed xgboost by running
mkdir build
cd build
cmake ..
make -j28
The xgboost regressor is constructed with
from xgboost ...

**-1**

votes

**0**answers

26 views

### The best way to calculate the prediction interval for an xgboost regression model?

I have constructed two different xgboost regression models using XGBoostRegressor() and an xgboost classifier. Due to the distributions of the data, one of the regressor models uses the objective reg:...

**1**

vote

**0**answers

17 views

### Does `tree_method = 'exact'` in `xgboost` really mean exact greedy algorithm?

Does tree_method ='exact' in xgboost really mean using the exact greedy algorithm for split finding?
I'm asking this question because xgboost runs unreasonably fast. Here is the script that I used ...

**0**

votes

**0**answers

18 views

### Plot tree xgboost giving me an error in R

I ran an xgboost model, and I am trying to plot the trees, but the formula is giving me an error.
The model for the XGBoost training data:
tuneGridXGB <- expand.grid(
nrounds=c(150),
...

**0**

votes

**0**answers

16 views

### A/E for XGBoost - Poisson distribution with varying exposure / offset is not correct

I'm wondering how to get A/E close to 1, for the R example given by XGBoost - Poisson distribution with varying exposure / offset
> sum(d$claims)
[1] 20640
> #predicted value
> sum(d$...

**0**

votes

**0**answers

14 views

### xgboost predict contrib to probabilities

I am using xgboost's pred_contribs feature in order to get some kind of interpretability (Shapley values) for each sample of my model.
booster.predict(test, pred_contribs=True)
It returns a vector of ...
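For a binary logistic model, these contributions are expressed in margin (log-odds) space, so they cannot be read as probabilities directly. A minimal sketch, assuming a `binary:logistic` objective and that the last entry of each row is the bias term; the numbers in `contribs_row` are made up:

```python
import math


def sigmoid(x):
    """Map a log-odds margin to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical output row for one sample: three feature contributions
# followed by the bias term, all in log-odds units.
contribs_row = [0.8, -0.3, 0.1, -0.6]

margin = sum(contribs_row)       # raw (untransformed) model output
probability = sigmoid(margin)    # predicted probability of the positive class

print(round(margin, 6), round(probability, 6))
```

Because the sigmoid is nonlinear, only the sum of a row maps to a probability; applying it to each contribution separately would not give per-feature probability shares.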

**0**

votes

**1**answer

24 views

### Custom Error Metric not changing predictions XGBoost R

I have created a custom error metric which prints as I run the XGBoost xgb.train but does not actually have any effect on the output. From what I can tell it is simply printing the custom error metric ...

**0**

votes

**1**answer

34 views

### XGBoost too large for pickle/joblib

I'm having difficulty loading an XGBoost regression with both pickle and joblib.
One difficulty could be the fact I am writing the pickle/joblib on a Windows desktop, but I am trying to load on a ...

**-1**

votes

**0**answers

5 views

### What is the effect of using xgb.DMatrix?

Many Kagglers use DMatrix when they use the xgboost algorithm.
I want to know what effect using DMatrix has, and why it is used.

**-3**

votes

**0**answers

22 views

### Sales Forecasting in Python based on supervised machine learning approach

We are creating a machine learning model for sales forecasting in Python and integrating it with Power BI. Now we need to predict the sales for the next 3 months. Currently, I am ...

**-2**

votes

**0**answers

22 views

### Unmatrix a DMatrix object in R?

I am new to using xgboost and would like to understand whether it is possible to obtain the predicted values alongside my original test dataset. The problem I am facing is that the test dataset is presently in ...

**-1**

votes

**1**answer

22 views

### How to get a model summary of a machine learning model (particularly XGBoost) like the screenshot attached below?

I have run an XGBoost regressor on a data set and I need a model summary (e.g. R squared, kurtosis, etc.) similar to the screenshot attached. Any help will be greatly appreciated.
Screenshot

**-1**

votes

**2**answers

37 views

### Is there a way to get the probability of a prediction using XGBoostRegressor?

I have built an XGBoostRegressor model using around 200 categorical features, predicting a continuous time variable.
But I want to get both the actual prediction and the probability of that ...

**0**

votes

**0**answers

19 views

### TypeError when writing my own evaluation_metric for xgboost in Python

I want to create my own f_score evaluation metric when using xgboost classifier for binary classification.
My metric is
def f_score(preds,y):
df = pd.DataFrame({'predict':pred, 'true':y....

**1**

vote

**1**answer

30 views

### How would I construct my own evaluation metric for minimizing test error for my highly unbalanced class using XGBoost?

I have collected data on how long it takes for a product to be released in a release pipeline. 95% of the data so far has taken <400 minutes [outlier = 0]. Then 5% of the data is between [700,40 ...

**2**

votes

**1**answer

25 views

### The impact of number of negative samples used in a highly imbalanced dataset (XGBoost)

I am trying to model a classifier using XGBoost on a highly imbalanced data-set, with a limited number of positive samples and practically infinite number of negative samples.
Is it possible that ...

**-3**

votes

**0**answers

15 views

### When to use SVR over other regression models

I am confused about when to use Support Vector Regression over other models like Random Forest Regression and XGBoost. I expected XGBoost to give the best prediction score (r2_score) for my regression ...

**0**

votes

**0**answers

11 views

### hyperopt result exceeds my hp.choice restriction, why? (XGBoost)

I ran into a strange problem:
I defined my XGB hyper-parameter 'max_depth' with hyperopt
hp.choice('max_depth',range(2,20))
But I got a result of 'max_depth' = 0 or 1, which is not within the [2,20) restriction. ...
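A likely explanation is that `fmin` reports the index of the selected option for `hp.choice` rather than the option itself. The mapping can be sketched without hyperopt; `best` below is a hypothetical result dictionary, not output from a real run:

```python
# Options in the same order as passed to hp.choice('max_depth', range(2, 20)).
max_depth_options = list(range(2, 20))

# Hypothetical result in the shape fmin returns: an index, not a value.
best = {"max_depth": 0}

# Translate the index back to the actual hyper-parameter value.
actual_max_depth = max_depth_options[best["max_depth"]]
print(actual_max_depth)
```

hyperopt also provides `space_eval(space, best)`, which performs this translation for a whole search space.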

**0**

votes

**1**answer

31 views

### Is there a way to print one XGBoostRegressor tree in python?

I have constructed an XGBoostRegressor model and now want to try to plot one of the trees. I know that the regular xgb classifier has the function plot_tree but unfortunately XGBoostRegressor does not....

**1**

vote

**0**answers

23 views

### Gradient and leaf score for the first tree of XGBoost

So, when deriving the first tree in xgboost, we need to know the first order and second order gradients, p-y and p(1-p) to calculate the leaf weights and overall tree score. But since we do not have ...
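The usual way out is that boosting starts from a constant initial prediction, so gradients exist before any tree is built. A worked sketch for logistic loss, assuming an initial probability of 0.5 for every row, a single leaf holding all rows, and an L2 penalty named `lam` here (the starting value and the parameter names are assumptions for illustration):

```python
def first_tree_leaf_weight(labels, p0=0.5, lam=1.0):
    """Leaf weight -G / (H + lam) when every row starts at probability p0."""
    grads = [p0 - y for y in labels]            # first-order terms: p - y
    hess = [p0 * (1.0 - p0) for _ in labels]    # second-order terms: p(1 - p)
    return -sum(grads) / (sum(hess) + lam)


labels = [1, 1, 1, 0]
weight = first_tree_leaf_weight(labels)
print(weight)
```

The resulting weight lives in log-odds space and would be scaled by the learning rate before being added to the running prediction.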

**0**

votes

**0**answers

16 views

### hyperparameter tuning xgboost learning api for ranking task

I'm using XGBoost with the learning API. First, is there any difference between the learning API and the scikit-learn API for ranking tasks?
I'm trying to use the cv (cross-validation) in order to do ...

**1**

vote

**0**answers

66 views

### How to get actual feature names in XGBoost feature importance plot without retraining the model?

I have come across several questions on Stack Overflow where a common problem is that people preprocess the training data, such as by centering and scaling, before fitting/training the ...

**0**

votes

**0**answers

24 views

### Spark Scala, train many models concurrently

I have a train and test dataset with features, and several thousand customerId values.
My goal is to concurrently train one binary xgboost classifier per customerId in Spark.
I'm essentially trying ...

**0**

votes

**1**answer

26 views

### XGBoost best max_depth=1

I use xgboost to train a classification model. GridSearchCV gives the best max_depth=1. This means each of my hundreds of trees is split at a single node.
Does this mean that the problem/dataset that I ...

**0**

votes

**0**answers

19 views

### Windows permission error to delete joblibmem mapping folder in python

I am trying to run Python code that performs XGBoost training, and I wanted it to run in parallel so that building a model takes less time. But I am facing this issue while running the code.
PermissionError: [...

**0**

votes

**1**answer

37 views

### XGBoost produces non-binary predictions

After training my model with XGBoost, I tried to test the model, but the predictions are floating-point numbers, which causes an error when I try to compute performance measures. This is the code:...
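If the model was trained with a logistic objective, the floats are probably predicted probabilities, and most classification metrics expect hard labels. A minimal sketch, assuming a 0.5 cut-off and made-up values in `probabilities`:

```python
def to_labels(probabilities, threshold=0.5):
    """Convert predicted probabilities to 0/1 class labels."""
    return [1 if p > threshold else 0 for p in probabilities]


probabilities = [0.12, 0.87, 0.5, 0.64]   # hypothetical model output
labels = to_labels(probabilities)
print(labels)
```

The cut-off is a modelling choice; 0.5 is only a conventional starting point and may not suit imbalanced data.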

**0**

votes

**0**answers

22 views

### XGBoost - get probabilities after multi:softmax function

I have a question regarding xgboost and multiclass. I am not using the sklearn wrapper as I always struggle with some parameters. I was wondering if it is possible to get the probability vector plus ...
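One option worth checking is the `multi:softprob` objective, which returns per-class probabilities instead of only the winning class; the class label can then be recovered with an argmax. A sketch of that last step, assuming one row of probabilities per sample and using made-up numbers:

```python
def argmax(row):
    """Index of the largest entry in a list of class probabilities."""
    return max(range(len(row)), key=row.__getitem__)


# Hypothetical probabilities for two samples over three classes.
prob_rows = [
    [0.1, 0.7, 0.2],
    [0.5, 0.2, 0.3],
]

predicted_classes = [argmax(row) for row in prob_rows]
print(predicted_classes)
```

This keeps both pieces of information: the full probability vector and the class that `multi:softmax` would have reported.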

**0**

votes

**1**answer

32 views

### In my Xgboost machine learning model, when features have 0 importance, should you discard them or group them together?

I have been trying to build an ML model which predicts the time it takes for different products to go through a deployment pipeline. I have created around 30-40 different features, 90% of which are ...

**-2**

votes

**0**answers

11 views

### Suggestions for predicting low volume data

Problem statement - Make predictions for every minute of tomorrow (only one day in the future and not more than that)
I have a time series which is composed of very low volume data, which typically ranges from ...

**0**

votes

**1**answer

26 views

### The evaluation metric in the validation set in xgboost Python differs from the one I get when making a prediction

I am using an evaluation set to implement early stopping with xgboost in Python. What puzzles me is that the evaluation metric reported during the training as optimal is much better than the one I ...

**2**

votes

**1**answer

38 views

### XGBoost, handling continuous and fixed data for loan dataset

Background:
I am using XGBoost to develop a model to predict whether a particular loan will default or not. I have now included time-series data on FICO score, and other variables that change ...

**1**

vote

**0**answers

26 views

### How to create an integrated hybrid model of LSTM with XGBoost for classifying tabular (non-image) data?

I have a tabular dataset with the following structure: Sample Structure of the Dataset:
I have been using an LSTM for sequence classification of the 14 different classes in the last column of the data. I ...

**0**

votes

**0**answers

24 views

### XGBoost: How to map one CSV to one label in DMatrix

I tried to use xgboost to build a regression model.
Currently, each of my inputs is a 2-D array, and each input has one corresponding value.
However, I had a problem converting the data into a DMatrix.
...