Trouble Implementing FeatureConfig

I’m using a large Crime database and trying to force one of the columns to be recognized as “categorical” - I need the “Type of Incident” column, i.e. robbery, vehicle thefts, etc., to be part of the correlation matrix. My code looks like this:

feature_config = sweetviz.FeatureConfig(force_cat=["Type of Incident"])
my_report = sweetviz.analyze([incidents_cutdown_df, "DPD Incidents Since 2018"], feature_config)

The problem is that feature_config keeps getting recognized as the target variable rather than as a call to use the feature_config created in the line above. I’ve tried to insert different things like Target = None, but that doesn’t work, and I keep getting an error that says: “Feature ‘<sweetviz.feature_config.FeatureConfig object at 0x000001F9ED113B48>’ was specified as TARGET, but is NOT FOUND in the dataframe (watch case-sensitivity?).”

How can I run a simple analysis on a single dataframe, with no target, but with a custom feature_config? Thanks!

I’m new to Python / coding, so I might be asking too much for such a huge data set. Each 911 Call is an “incident”, 1 for each row. Right now I have 319160 rows and 34 columns with information like the time of call, time of dispatch, “Type of Incident”, lat long, etc. - There are 734 distinct values for “Type of Incident” though. That would make an unreadable table, so I can make a new data frame with only the top 50 or something, but is it possible to show these in an Associations table against Victim Race, Time of Day, Beat / Division / Area, etc., the other columns in the data frame?
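The "top 50" idea above can be sketched in pandas. This is only an illustration under assumptions: the tiny dataframe, the "Division" values, and the helper name `keep_top_categories` are made up for the example, and `n=2` stands in for 50.

```python
import pandas as pd

# Hypothetical stand-in for the real incidents dataframe (which has ~319,160 rows).
incidents_df = pd.DataFrame({
    "Type of Incident": ["ROBBERY", "THEFT", "THEFT", "ASSAULT", "THEFT", "ROBBERY", "FRAUD"],
    "Division": ["Central", "North", "Central", "South", "North", "Central", "South"],
})

def keep_top_categories(df, column, n):
    """Return only the rows whose value in `column` is among the n most frequent."""
    top_values = df[column].value_counts().nlargest(n).index
    return df[df[column].isin(top_values)].copy()

# With the real data this would be n=50; n=2 keeps the toy example readable.
top_incidents_df = keep_top_categories(incidents_df, "Type of Incident", n=2)
print(top_incidents_df["Type of Incident"].value_counts())
```

The filtered dataframe keeps every other column, so it can be passed to Sweetviz in place of the full one.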

Hi over-man39! I am happy you are looking into using Sweetviz and I hope it turns out to be useful for you! Regarding the error with the target variable: you were pretty close in trying to specify the target as “None”, you were just using the wrong name. The parameter name is “target_feat”, so target_feat=None should work. Alternatively, you can pass None as the second positional argument, e.g.:

sweetviz.analyze([incidents_cutdown_df, "DPD Incidents Since 2018"], None, feature_config)
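The same call can also be written with keyword arguments, which avoids depending on argument order. This is a rough sketch rather than the definitive form: `target_feat` is the name given above, while `feat_cfg` is my understanding of the keyword for the feature configuration, so check it against your installed Sweetviz version. The wrapper name `build_report` and the output filename are made up for illustration.

```python
def build_report(df, title, forced_categorical):
    """Analyze one dataframe with no target and a custom FeatureConfig."""
    # Imported inside the function so that defining it does not require Sweetviz.
    import sweetviz

    feature_config = sweetviz.FeatureConfig(force_cat=forced_categorical)
    # Keyword arguments keep feature_config out of the target_feat slot.
    # Assumption: feat_cfg is the keyword name in your Sweetviz version.
    return sweetviz.analyze([df, title], target_feat=None, feat_cfg=feature_config)

# Usage with the dataframe from the question (commented out: needs the real data):
# my_report = build_report(incidents_cutdown_df, "DPD Incidents Since 2018", ["Type of Incident"])
# my_report.show_html("incidents_report.html")  # hypothetical output filename
```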

I hope this helps! :slight_smile:

Francois

Hello again! I’m not sure whether you are asking this after getting the feature configuration to work and making this feature categorical, but the report should automatically show you only the top categories (I think there is room for 20 or so, so those are the ones you would see in the detail view).

The associations graph will also show how “Type of Incident” correlates to the other variables. However, if you’re asking about the correlation of each individual type of incident, that is not possible… yet. :wink:

I hope this answered your question, don’t hesitate to follow-up if I didn’t quite answer everything.

Francois

Hi Francois! Thanks so much for your help. The code you provided works perfectly and I don’t get any errors now - “Type of Incident” is now categorical, and it’s easy to see which incidents occur the most frequently in the details tab. At one point I remember seeing “target_feat” somewhere in the error code, so I should have tried that first!

-Ryan

The second part of my question was kind of me thinking out loud after typing the first part and realizing that even if I got “Type of Incident” recognized as categorical, which I have now, Sweetviz still won’t show correlations between each individual type of incident and the values in the other columns. That’s what I really need, as the most interesting information in the dataset comes from the categories within various columns, i.e. “Type of Incident”, “Type Location”, “Victim Race”, etc. Right now it’s still interesting to see that “Type of Incident” does seem to be predictive of “Victim Age / Race”, but it would be far more interesting to get a breakdown for each individual category within “Type of Incident.”
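Outside Sweetviz, one rough way to get a per-category breakdown is a row-normalized cross-tabulation in pandas. This is a sketch on made-up data, not a Sweetviz feature: the column names come from this thread, but the rows and category values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample rows; the real dataframe has many more columns and categories.
df = pd.DataFrame({
    "Type of Incident": ["ASSAULT", "ASSAULT", "THEFT", "THEFT", "THEFT", "ASSAULT"],
    "Type Location": ["Residence", "Residence", "Parking Lot", "Street", "Parking Lot", "Street"],
})

# normalize="index" turns each row into shares that sum to 1, so every incident
# type gets its own distribution over locations.
breakdown = pd.crosstab(df["Type of Incident"], df["Type Location"], normalize="index")
print(breakdown.round(2))
```

Each row then answers "for this type of incident, where does it tend to happen?", which is the per-category view described above. With hundreds of incident types it is probably worth filtering to the most frequent ones first.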

It’s great to hear that you have something planned for that type of deeper analysis, as right now it’s hard for me to wrap my head around how to do that with the tools & skills I currently have in Python. I have a background in Political Science but haven’t worked with statistics in almost 20 years - just now coming back to it with Python, so I figure you’ll end up getting a lot of beginners like me interested in your program. Thanks again for your help!

Hello Ryan,

I’m glad this fixed at least the first issue! And yes, it’s really the goal of the library to help out and simplify work which (as you can see) can/should be automated in the first place, so you can focus on actual analysis, not getting the code set up. :slight_smile:

Thanks again, and I hope the library will be useful to you for this project and in the future (hopefully with all these features we discussed!).

Francois

Thanks again Francois, I really appreciate your quick replies! I ended up creating new dataframe views in Pandas so I can have df’s for each major type of crime (assault_df, theft_df, vandalism_df, etc.) - then I can run those through Sweetviz individually, and with all of the tabs open in a browser, I can quickly compare the visuals for each Association chart, and even that is quite helpful. I can quickly see that Assault is predictive of location, whereas vehicle theft is not. I’m not sure if this is the best way to get at what I mentioned above, but it’s a great start at the very least.
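For anyone trying the same thing, the per-crime dataframes can be built in one pass instead of one by one. This is a minimal sketch under assumptions: the sample rows and the keyword mapping in `major_types` are hypothetical, and matching by substring is just one possible way to group incident types.

```python
import pandas as pd

# Hypothetical sample; real "Type of Incident" values will differ.
incidents_df = pd.DataFrame({
    "Type of Incident": ["ASSAULT -AGG", "THEFT OF VEHICLE", "VANDALISM", "ASSAULT", "THEFT FROM PERSON"],
    "Division": ["Central", "North", "South", "Central", "North"],
})

# Hypothetical mapping from a short name to the substring that identifies that crime type.
major_types = {"assault": "ASSAULT", "theft": "THEFT", "vandalism": "VANDALISM"}

# One filtered dataframe per major type; na=False treats missing values as non-matches.
frames_by_type = {
    name: incidents_df[
        incidents_df["Type of Incident"].str.contains(keyword, case=False, regex=False, na=False)
    ]
    for name, keyword in major_types.items()
}

# Each frame can then go through Sweetviz as earlier in the thread (needs sweetviz
# imported and the feature_config defined above):
# for name, frame in frames_by_type.items():
#     sweetviz.analyze([frame, name], None, feature_config).show_html(f"{name}_report.html")
```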
