Where
Text was used as a data source in the very early stages of AI applications. Most ML algorithms took text data as input, and CSV files were also widely used in that period. Today text is also used in language translation.
Image is used as a data source in computer vision applications. Things became very interesting after a Convolutional Neural Network (CNN) was used in the ImageNet challenge in 2012. Since then, most DL (Deep Learning) algorithms have used image data as input.
Video is used as a data source for creating datasets to train deep learning models.
Voice is used as a data source for creating datasets to train deep learning models.
Audio is used as a data source for creating datasets to train deep learning models.
Radar is used as a data source for creating datasets to train deep learning models, mostly for DL applications in the automotive driverless-car segment and in Industry 4.0.
Healthcare data is used as a data source for creating datasets to train deep learning models.
Industry 4.0 data is used as a data source for creating datasets to train deep learning models.
etc.
Sampling is one of the challenges when collecting data for dataset design. Gibbs sampling is recommended for text and other data sources. Shannon sampling (the Nyquist-Shannon sampling theorem) is used for signals such as voice and audio. The key challenge, however, is how many data samples are required to train a given deep learning model so that it can be used for enterprise-quality inference.
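To make the Shannon sampling point concrete, here is a minimal sketch, assuming a pure sine tone as the signal. The theorem requires a sampling rate greater than twice the highest frequency in the signal. The helper names `sample_tone` and `satisfies_nyquist` are hypothetical and chosen only for illustration.

```python
import numpy as np

def sample_tone(freq_hz, sample_rate_hz, n_samples):
    """Sample a unit-amplitude sine tone at the given sampling rate."""
    t = np.arange(n_samples) / sample_rate_hz
    return np.sin(2 * np.pi * freq_hz * t)

def satisfies_nyquist(max_freq_hz, sample_rate_hz):
    """True when the rate exceeds twice the highest signal frequency."""
    return sample_rate_hz > 2 * max_freq_hz

# A 3 Hz tone sampled at only 4 Hz violates the criterion ...
undersampled = sample_tone(3.0, 4.0, 16)

# ... and its samples are indistinguishable from a sign-flipped 1 Hz tone.
# This effect is called aliasing.
alias = -sample_tone(1.0, 4.0, 16)
is_aliased = np.allclose(undersampled, alias, atol=1e-9)
```

Once the samples are taken, the original 3 Hz tone cannot be recovered from them, which is why the sampling rate has to be fixed before audio or voice data is collected.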
Quantisation is a long-standing problem in signal processing, and it is likewise a critical issue when collecting data for dataset design.
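As an illustration of why quantisation matters, the following sketch applies uniform quantisation to a signal in the range [-1, 1]. It is an assumed, simplified scheme, not one prescribed by this text, and `quantize_uniform` is a hypothetical helper name. The rounding error of each sample is bounded by half a quantisation step.

```python
import numpy as np

def quantize_uniform(signal, n_bits):
    """Uniformly quantize values in [-1, 1] to 2**n_bits levels."""
    levels = 2 ** n_bits
    step = 2.0 / (levels - 1)              # spacing between adjacent levels
    clipped = np.clip(signal, -1.0, 1.0)
    codes = np.round((clipped + 1.0) / step).astype(int)  # 0 .. levels-1
    return codes * step - 1.0, step        # reconstructed values, step size

# Illustrative 5 Hz sine wave, 100 samples over one second.
t = np.linspace(0.0, 1.0, 100, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t)

quantized, step = quantize_uniform(signal, n_bits=3)
max_error = np.max(np.abs(signal - quantized))  # at most step / 2
```

Fewer bits give a larger step and therefore a larger worst-case error, so the bit depth chosen at collection time limits the fidelity of the resulting dataset.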
The following lists each data source and its associated potential model.
Pixel values are often unsigned integers in the range 0 to 255. Although these values can be presented directly to neural network models in their raw format, doing so can cause problems during modeling, such as slower than expected training. Instead, there can be a great benefit in preparing the pixel values before modeling, ranging from simply scaling them to the range 0-1 to centering and even standardizing them. Scaling to the range 0-1 is called normalization and can be performed directly on a loaded image, for instance one loaded with the PIL library (maintained today as Pillow, the de facto standard image handling library in Python).
How to normalize pixel values to a range between zero and one.
How to center pixel values both globally across channels and locally per channel.
How to standardize pixel values and how to shift standardized pixel values to the positive domain.
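The three steps listed above can be sketched with Pillow and NumPy as follows. This is a minimal illustration under stated assumptions: a small synthetic RGB image stands in for a real file (which would normally be loaded with `Image.open`), and the shift to the positive domain uses one possible recipe, clipping standardized values to [-1, 1] and rescaling them to [0, 1].

```python
import numpy as np
from PIL import Image

# Synthetic 4x4 RGB image so the sketch runs without a file on disk.
# With a real file one would use Image.open(path) instead.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
image = Image.fromarray(raw)

# Convert to float so arithmetic does not wrap around at 0 or 255.
pixels = np.asarray(image).astype("float32")

# 1. Normalize to the range 0-1.
normalized = pixels / 255.0

# 2a. Global centering: subtract one mean computed over all channels.
centered_global = pixels - pixels.mean()

# 2b. Local centering: subtract a separate mean for each channel.
channel_means = pixels.mean(axis=(0, 1))
centered_local = pixels - channel_means

# 3. Standardize globally to zero mean and unit variance ...
standardized = (pixels - pixels.mean()) / pixels.std()

# ... then shift to the positive domain (assumed recipe: clip to [-1, 1],
# then rescale to [0, 1]).
positive = (np.clip(standardized, -1.0, 1.0) + 1.0) / 2.0
```

Per-channel standardization follows the same pattern as step 2b, using `pixels.std(axis=(0, 1))` alongside the per-channel means. Note that clipping discards information from pixels more than one standard deviation from the mean, so this particular shift is a trade-off rather than a lossless transform.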