Package deepnetts.data
Class DataSets
java.lang.Object
deepnetts.data.DataSets
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
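Modifier and Type, Method, Description
static TabularDataSet createBatchedDataset(TabularDataSet<?> dataSet, int batchSize)
static CsvFormat detectCsvFormat(String fileName)
static float[] oneHotEncode(String hotLabel, String[] allLabels) - Returns one hot encoded vector for the given label.
static TabularDataSet readCsv(File csvFile, int numInputs, int numOutputs, boolean hasColumnNames, String delimiter) - Creates and returns data set from specified CSV file.
static javax.visrec.ml.data.DataSet readCsv(String fileName, int numInputs, int numOutputs) - Creates a data set from a CSV file, using comma (,) as the default delimiter and no header (column names) in the first row.
static TabularDataSet readCsv(String fileName, int numInputs, int numOutputs, boolean hasColumnNames)
static TabularDataSet readCsv(String fileName, int numInputs, int numOutputs, boolean hasColumnNames, String delimiter)
static TabularDataSet readCsv(String fileName, int numInputs, int numOutputs, String delimiter)
static MaxScaler scaleToMax(javax.visrec.ml.data.DataSet dataSet)
static MinMaxScaler scaleToMinMax(javax.visrec.ml.data.DataSet dataSet)
static TrainTestSplit trainTestSplit(javax.visrec.ml.data.DataSet<?> dataSet, double split)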
-
Field Details
-
DELIMITER_SPACE
-
DELIMITER_COMMA
-
DELIMITER_SEMICOLON
-
DELIMITER_TAB
-
Constructor Details
-
DataSets
public DataSets()
-
-
Method Details
-
readCsv
public static TabularDataSet readCsv(File csvFile, int numInputs, int numOutputs, boolean hasColumnNames, String delimiter) throws FileNotFoundException, IOException
Creates and returns a data set from the specified CSV file. Empty lines are skipped.
Parameters:
csvFile - CSV file
numInputs - number of input values in a row
numOutputs - number of output values in a row
hasColumnNames - true if the first row contains column names
delimiter - delimiter character used to separate values in a row
Returns:
instance of data set with values loaded from file
Throws:
FileNotFoundException - if the file was not found
IOException - if there was an error reading the file
TODO: Detect whether the first line contains labels; if it does not, set class1, class2, class3 in classifier evaluation. Also detect the type of attributes. Consider moving this method to a factory class, or making it a default method in the data set.
TODO: Autodetect delimiter and column type.
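The parameters above describe a simple row layout: the first numInputs values of each row are inputs and the next numOutputs values are outputs. The following is a minimal, self-contained sketch of that parsing logic under stated assumptions; it is not the Deep Netts implementation. The class `CsvRowsSketch`, the `Row` holder, and the method `parseRows` are hypothetical names introduced for illustration, and the choice to reject rows with too few values is an assumption the documentation does not confirm. Lines are passed in as a list (they could come from `java.nio.file.Files.readAllLines`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class CsvRowsSketch {

    // Hypothetical holder for one parsed row; not a Deep Netts type.
    static final class Row {
        final float[] inputs;
        final float[] outputs;

        Row(float[] inputs, float[] outputs) {
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    // Splits each non-empty line into numInputs input values followed by numOutputs output values.
    static List<Row> parseRows(List<String> lines, int numInputs, int numOutputs,
                               boolean hasColumnNames, String delimiter) {
        String splitRegex = Pattern.quote(delimiter); // treat the delimiter literally, not as a regex
        List<Row> rows = new ArrayList<>();
        boolean headerPending = hasColumnNames;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // empty lines are skipped
            }
            if (headerPending) {
                headerPending = false; // first non-empty line holds column names
                continue;
            }
            String[] values = line.split(splitRegex);
            if (values.length < numInputs + numOutputs) {
                // Assumption: a short row is an error; the documentation does not say.
                throw new IllegalArgumentException("Too few values in line: " + line);
            }
            float[] inputs = new float[numInputs];
            float[] outputs = new float[numOutputs];
            for (int i = 0; i < numInputs; i++) {
                inputs[i] = Float.parseFloat(values[i].trim());
            }
            for (int j = 0; j < numOutputs; j++) {
                outputs[j] = Float.parseFloat(values[numInputs + j].trim());
            }
            rows.add(new Row(inputs, outputs));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("x1,x2,y", "", "0.1,0.2,1", "0.3,0.4,0");
        List<Row> rows = parseRows(lines, 2, 1, true, ",");
        System.out.println(rows.size() + " rows, first output = " + rows.get(0).outputs[0]);
        // prints "2 rows, first output = 1.0"
    }
}
```

Quoting the delimiter with `Pattern.quote` keeps the sketch safe for delimiters that would otherwise be read as regular-expression syntax.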
-
readCsv
public static TabularDataSet readCsv(String fileName, int numInputs, int numOutputs, boolean hasColumnNames, String delimiter) throws IOException
Throws:
IOException
-
readCsv
public static TabularDataSet readCsv(String fileName, int numInputs, int numOutputs, boolean hasColumnNames) throws IOException
Throws:
IOException
-
readCsv
public static TabularDataSet readCsv(String fileName, int numInputs, int numOutputs, String delimiter) throws IOException
Throws:
IOException
-
readCsv
public static javax.visrec.ml.data.DataSet readCsv(String fileName, int numInputs, int numOutputs) throws IOException
Creates a data set from a CSV file, using comma (,) as the default delimiter and assuming no header (column names) in the first row.
Parameters:
fileName - name of the CSV file
numInputs - number of input columns
numOutputs - number of output columns
Throws:
IOException
-
detectCsvFormat
public static CsvFormat detectCsvFormat(String fileName) throws FileNotFoundException, IOException
Throws:
FileNotFoundException
IOException
-
scaleToMax
public static MaxScaler scaleToMax(javax.visrec.ml.data.DataSet dataSet)
-
scaleToMinMax
public static MinMaxScaler scaleToMinMax(javax.visrec.ml.data.DataSet dataSet)
-
oneHotEncode
public static float[] oneHotEncode(String hotLabel, String[] allLabels)
Returns a one-hot encoded vector for the given label. A one-hot encoded vector is a binary array in which each position corresponds to one label; all elements are zero except the one corresponding to hotLabel, which has the value one. The index of hotLabel in the allLabels array determines which position in the vector is set to one. The vector size equals the number of labels in the allLabels array.
Parameters:
hotLabel - the label to encode
allLabels - all labels (used to determine the size and hot position of the encoded vector)
Returns:
one-hot encoded vector for the given label
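The encoding described above can be sketched in a few lines of plain Java. This is an illustrative stand-in, not the library's source: the class name `OneHotSketch` is hypothetical, and throwing an exception for a label that is missing from allLabels is an assumption, since the documentation does not say how that case is handled.

```java
import java.util.Arrays;

class OneHotSketch {

    // Hypothetical stand-in for the documented method; not the Deep Netts code.
    static float[] oneHotEncode(String hotLabel, String[] allLabels) {
        float[] vector = new float[allLabels.length]; // Java initializes all elements to 0
        for (int i = 0; i < allLabels.length; i++) {
            if (allLabels[i].equals(hotLabel)) {
                vector[i] = 1f; // index of hotLabel in allLabels is the hot position
                return vector;
            }
        }
        // Assumption: an unknown label is treated as an error.
        throw new IllegalArgumentException("Unknown label: " + hotLabel);
    }

    public static void main(String[] args) {
        String[] labels = {"setosa", "versicolor", "virginica"};
        System.out.println(Arrays.toString(oneHotEncode("versicolor", labels)));
        // prints "[0.0, 1.0, 0.0]"
    }
}
```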
-
trainTestSplit
public static TrainTestSplit trainTestSplit(javax.visrec.ml.data.DataSet<?> dataSet, double split)
-
createBatchedDataset
public static TabularDataSet createBatchedDataset(TabularDataSet<?> dataSet, int batchSize)
-