Function: Data processing and preparation (0e21609b-98b9-5f58-9be2-b7e627353c51)
Data processing and preparation includes transformation, processing, normalization, and validation of a set of data. Sources of cybersecurity data need to be validated for accuracy often due to a high number of false positives. The relevant data also typically comes in different formats, and new data needs to be combined with historical data before a complete analysis can be performed. Some types of data (such as news articles) may need to be analyzed or processed as part of the preparation process. One example would be extracting relevant security information from a news article (e.g., names, dates, places, technical information, weaknesses, system names) and comparing it with internal data for potential impacts. Some analysis methods require data to be stored in the same format, or for files to have the same number of records. There are multiple processing steps that may be involved to prepare the data. Data augmentation (also called enrichment) is performed by including other available information related to a given piece of data from other internal and external sources. For example, teams may collect information related to internet protocol addresses (IP addresses) such as autonomous system identifiers, country codes, or geo-location data. For internal asset information, teams may enrich their asset inventory data with the name of the asset owner, their role, their permissions on other assets, their physical working location over time, and more.