TypeError in Categorical Data

Hi Francois!

I’ve just discovered Sweetviz and decided to try some EDA with it. It’s really nice work! It worked perfectly on some data I have, but when I tried to create some visualizations using explicitly categorical data, I received this error:

TypeError: Categorical cannot perform the operation sum

In your documentation you say that Sweetviz can be used with categorical data. Am I missing something?

My data consists of 3 categorical columns that are encoded as integers. When the dataframe columns are left as integers, everything works fine. However, after manually setting the columns to the Categorical dtype, I got the error above.

The command I used to configure the data:
df['field'] = df['field'].astype('category')
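
For reference, a minimal sketch that reproduces the same kind of TypeError outside Sweetviz. The column name and values are made up, and the exact wording of the message depends on the pandas version. That Sweetviz hits an equivalent sum() internally is only an assumption based on the message above.

```python
import pandas as pd

# Hypothetical data: one integer-encoded categorical column.
df = pd.DataFrame({"field": [1, 2, 3, 2, 1]})

# As plain integers, a numeric reduction works.
int_total = df["field"].sum()

# After converting to the pandas "category" dtype, the same
# reduction is rejected with a TypeError.
df["field"] = df["field"].astype("category")
try:
    df["field"].sum()
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)
```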


Hello! Thank you for the kind words and for taking the time to report this!

Categorical data is indeed supported… but not through the formal “categorical” data type. What I do is auto-detect the number of distinct values (“categories”) in a column, and if it is low, I treat the data as categorical.
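
To illustrate the idea only, here is a rough sketch of that kind of heuristic. It is not the actual Sweetviz code: the function name looks_categorical and the threshold of 10 distinct values are assumptions picked for the example.

```python
def looks_categorical(values, max_distinct=10):
    """Guess whether a column is categorical from its distinct-value count.

    Hypothetical helper: `max_distinct` is an arbitrary cutoff,
    not a value taken from Sweetviz.
    """
    # Ignore missing entries when counting distinct values.
    distinct = {v for v in values if v is not None}
    return 0 < len(distinct) <= max_distinct


# Few distinct integers: treated as categorical.
few = looks_categorical([1, 2, 3, 1, 2, 3, 2, 1])

# Many distinct integers: treated as numerical.
many = looks_categorical(list(range(1000)))
```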

This is all done while the data is numerical or string, and I do the “categorization” myself. This is because pandas’ built-in “category” datatype is seldom used (although I believe this is changing).

So I have not tested with the pandas “category” datatype. I will do so and keep you posted. In the meantime, simply leave data as integers or strings.
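
If the columns have already been converted, a sketch like the following turns them back before building the report. It assumes the columns contain no missing values (an integer column with NaN cannot be cast back to int this way), and the column name is just an example.

```python
import pandas as pd

# Hypothetical frame whose column was converted to "category".
df = pd.DataFrame({"field": [1, 2, 3, 2, 1]})
df["field"] = df["field"].astype("category")

# Cast every "category" column back to the dtype of its categories
# (integers stay integers, strings stay strings).
for col in df.select_dtypes(include="category").columns:
    df[col] = df[col].astype(df[col].cat.categories.dtype)

# Numeric reductions work again on the restored column.
total = df["field"].sum()
```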

If you find that the system is auto-detecting your data as numerical or text, simply use the FeatureConfig object to specify which columns you want to force as categorical. (See the README on GitHub for details.)
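
A short sketch of what that can look like, assuming Sweetviz is installed and the column is still stored as integers. The data, column names and output file name are made up for the example.

```python
import pandas as pd
import sweetviz as sv

# Hypothetical data: "field" is an integer-encoded category.
df = pd.DataFrame({
    "field": [1, 2, 3, 2, 1, 3],
    "value": [0.5, 1.2, 3.3, 2.1, 0.9, 4.0],
})

# Force "field" to be analyzed as categorical, whatever the
# auto-detection would have decided.
feature_config = sv.FeatureConfig(force_cat=["field"])

report = sv.analyze(df, feat_cfg=feature_config)
report.show_html("report.html")
```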

But as I said, I will get back to you on supporting this. Since I use many operations for the reports and graphs, I am not sure yet how easy this will be.

Thanks again,

Francois

Hey folks,
I have just opened an issue that looks related to this question.
I used FeatureConfig to indicate a categorical column, but the report failed while converting something to boolean.