I need to find a numeric client ID. I created a custom info type:
custom_info_types = [ { "info_type": {"name": "CLIENTID"}, "regex": {"pattern": r'\d{7,8}'}, } ]
As expected, the job returned a lot of findings, all with a VERY_LIKELY likelihood.
To reduce the findings, I’d like to use hotwords in “reverse” mode: if the column name does not contain the string “cli”, reduce the likelihood.
The documentation has examples of how to do the opposite, but since every finding already has a VERY_LIKELY likelihood, that does not help.
hotword_rule = { "hotword_regex": {"pattern": "(?i)(.*cli.*)(?-i)"}, "likelihood_adjustment": { "fixed_likelihood": dlp_v2.Likelihood.VERY_LIKELY }, "proximity": {"window_before": 1}, }
Is there a way to do what I want?
Thanks for your help!
Answer
To accomplish this, set the default likelihood for your custom_info_type to VERY_UNLIKELY and keep your hotword rule as-is. This way, a match is flagged as VERY_UNLIKELY unless the header/context contains your match for “cli”, in which case it is boosted to VERY_LIKELY.
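To make the intended behaviour concrete, here is a small local illustration in plain Python. It does not call Cloud DLP and is not how the service is implemented; the function name `simulated_likelihood` and the sample column names are hypothetical, and the hotword check is simplified to a case-insensitive search for "cli" in the column name.

```python
import re

# Same digit pattern as the custom info type (7 or 8 consecutive digits).
ID_PATTERN = re.compile(r"\d{7,8}")
# Simplified stand-in for the hotword regex: "cli", case-insensitive.
HOTWORD = re.compile(r"cli", re.IGNORECASE)


def simulated_likelihood(column_name, value):
    """Return the likelihood a finding would end up with, or None if no match.

    Hypothetical helper: mimics "low base likelihood, boosted by a hotword".
    """
    if not ID_PATTERN.search(value):
        return None  # the value does not look like a client id at all
    if HOTWORD.search(column_name):
        return "VERY_LIKELY"  # hotword present: likelihood is raised
    return "VERY_UNLIKELY"  # no hotword: the low base likelihood stays


print(simulated_likelihood("client_id", "1234567"))
print(simulated_likelihood("order_no", "1234567"))
```

The point of the sketch is only the ordering of the logic: the base value is low, and the hotword is what raises it, which gives the “reverse” effect asked for in the question.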
Something like:
custom_info_types = [ { "info_type": {"name": "CLIENTID"}, "regex": {"pattern": r'\d{7,8}'}, "likelihood": "VERY_UNLIKELY" } ]
When you leave the likelihood blank in the custom_info_type definition, it defaults to VERY_LIKELY.
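For completeness, here is a sketch of how the two pieces might be wired together in one inspect configuration. It is built from plain dicts so it runs without the client library. Two things are my assumptions rather than part of the original post: writing the enum values as strings (as the "likelihood" field above already does) instead of `dlp_v2.Likelihood.VERY_LIKELY`, and the optional `min_likelihood` threshold.

```python
# Custom info type with a low base likelihood.
custom_info_types = [
    {
        "info_type": {"name": "CLIENTID"},
        "regex": {"pattern": r"\d{7,8}"},
        "likelihood": "VERY_UNLIKELY",
    }
]

# Hotword rule from the question, unchanged apart from the string enum value.
hotword_rule = {
    "hotword_regex": {"pattern": "(?i)(.*cli.*)(?-i)"},
    "likelihood_adjustment": {"fixed_likelihood": "VERY_LIKELY"},
    "proximity": {"window_before": 1},
}

inspect_config = {
    "custom_info_types": custom_info_types,
    # The rule set ties the hotword rule to the CLIENTID info type.
    "rule_set": [
        {
            "info_types": [{"name": "CLIENTID"}],
            "rules": [{"hotword_rule": hotword_rule}],
        }
    ],
    # Assumption: only keep findings that the hotword rule boosted.
    "min_likelihood": "LIKELY",
}
```

If this matches your setup, `inspect_config` would be passed in the same place your current inspect configuration goes in the job or inspect request. With the threshold in place, findings that stay at VERY_UNLIKELY should be filtered out of the results.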
Let me know if this works.