Class CsvReader


  • public class CsvReader
    extends Object
    Provides a stream based parser for parsing delimited text data from a file or a stream.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int ESCAPE_MODE_BACKSLASH
      Use a backslash character before the text qualifier to represent an occurrence of the text qualifier.
      static int ESCAPE_MODE_DOUBLED
      Double up the text qualifier to represent an occurrence of the text qualifier.
    • Constructor Summary

      Constructors 
      Constructor Description
      CsvReader​(InputStream inputStream, char delimiter, Charset charset)
      Create a CsvReader object using an InputStream object as the data source.
      CsvReader​(InputStream inputStream, Charset charset)
      Create a CsvReader object using an InputStream object as the data source. Uses a comma as the column delimiter.
      CsvReader​(Reader inputStream)
      Create a CsvReader object using a Reader object as the data source. Uses a comma as the column delimiter.
      CsvReader​(Reader inputStream, char delimiter)
      Create a CsvReader object using a Reader object as the data source.
      CsvReader​(String fileName)
      Create a CsvReader object using a file as the data source. Uses a comma as the column delimiter and ISO-8859-1 as the Charset.
      CsvReader​(String fileName, char delimiter)
      Create a CsvReader object using a file as the data source. Uses ISO-8859-1 as the Charset.
      CsvReader​(String fileName, char delimiter, Charset charset)
      Create a CsvReader object using a file as the data source.
    • Method Summary

      Methods 
      Modifier and Type Method Description
      void close()
      Close the reader and release all related resources.
      protected void finalize()  
      String get​(int columnIndex)
      Return the current column value for a given column index.
      String get​(String headerName)
      Return the current column value for a given column header name.
      boolean getCaptureRawRecord()
      Return the "Capture Raw Record" setting.
      int getColumnCount()
      Return the number of columns found in this record.
      char getComment()
      Return the character being used as a comment signal.
      long getCurrentRecord()
      Return the index of the current record.
      char getDelimiter()
      Return the character being used as the column delimiter.
      int getEscapeMode()
      Return the current way to escape an occurrence of the text qualifier inside qualified data.
      String getHeader​(int columnIndex)
      Return the column header value for a given column index.
      int getHeaderCount()
      Return the number of headers read in by a previous call to readHeaders().
      String[] getHeaders()
      Return the header values as a string array.
      int getIndex​(String headerName)
      Return the corresponding column index for a given column header name.
      String getRawRecord()
      Return the raw record containing the current line read from the stream.
      char getRecordDelimiter()
      Return the character to use as the record delimiter.
      boolean getSafetySwitch()
      Return the value of a safety switch to prevent the parser from using large amounts of memory in the case where parsing settings like file encodings don't end up matching the actual format of a file.
      boolean getSkipEmptyRecords()
      Return a flag to indicate whether empty records shall be skipped by the parser.
      char getTextQualifier()
      Return the character to use as a text qualifier in the data.
      boolean getTrimWhitespace()
      Return whether leading and trailing whitespace characters are being trimmed from non-textqualified column data.
      boolean getUseComments()
      Return whether comments (lines starting with the comment character) will be skipped while parsing or not.
      boolean getUseTextQualifier()
      Return whether text qualifiers will be used while parsing or not.
      String[] getValues()
      Return the list of column values.
      boolean isQualified​(int columnIndex)
      Return whether the entry in the given column was qualified, i.e. started with a qualifier character.
      static CsvReader parse​(String data)
      Create a CsvReader object using a string of data as the source. Uses ISO-8859-1 as the Charset.
      boolean readHeaders()
      Read the first record of data as column headers.
      boolean readRecord()
      Read the next record.
      void setCaptureRawRecord​(boolean captureRawRecord)
      Set the "Capture Raw Record" setting.
      void setComment​(char comment)
      Set the character being used as a comment signal.
      void setDelimiter​(char delimiter)
      Set the character to use as the column delimiter.
      void setEscapeMode​(int escapeMode)
      Set the current way to escape an occurrence of the text qualifier inside qualified data.
      void setHeaders​(String[] headers)
      Set the header values.
      void setRecordDelimiter​(char recordDelimiter)
      Set the character to use as the record delimiter.
      void setSafetySwitch​(boolean safetySwitch)
      Set the value of a safety switch to prevent the parser from using large amounts of memory in the case where parsing settings like file encodings don't end up matching the actual format of a file.
      void setSkipEmptyRecords​(boolean skipEmptyRecords)
      Set a flag to indicate whether empty records shall be skipped by the parser.
      void setTextQualifier​(char textQualifier)
      Set the character to use as a text qualifier in the data.
      void setTrimWhitespace​(boolean trimWhitespace)
      Set whether leading and trailing whitespace characters should be trimmed from non-textqualified column data or not.
      void setUseComments​(boolean useComments)
      Set whether comments (lines starting with the comment character) will be skipped while parsing or not.
      void setUseTextQualifier​(boolean useTextQualifier)
      Set whether text qualifiers will be used while parsing or not.
      boolean skipLine()
      Skip the next line of data using the standard end-of-line characters, without doing any column-delimited parsing.
      boolean skipRecord()
      Skip the next record of data by parsing each column. Does not increment getCurrentRecord().
    • Field Detail

      • ESCAPE_MODE_DOUBLED

        public static final int ESCAPE_MODE_DOUBLED
        Double up the text qualifier to represent an occurrence of the text qualifier.
        See Also:
        Constant Field Values
      • ESCAPE_MODE_BACKSLASH

        public static final int ESCAPE_MODE_BACKSLASH
        Use a backslash character before the text qualifier to represent an occurrence of the text qualifier.
        See Also:
        Constant Field Values
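To make the difference between the two escape modes concrete, here is a minimal, standard-library-only sketch. It is not CsvReader's implementation: the class EscapeModeSketch and both method names are hypothetical, each method receives only the inner text of an already-qualified field, and the backslash variant handles only an escaped qualifier, not any other escape sequence.

```java
public class EscapeModeSketch {
    // Doubled convention: two consecutive qualifiers stand for one literal qualifier.
    public static String unescapeDoubled(String inner, char qualifier) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < inner.length(); i++) {
            char c = inner.charAt(i);
            if (c == qualifier && i + 1 < inner.length() && inner.charAt(i + 1) == qualifier) {
                i++; // skip the second qualifier of the pair
            }
            out.append(c);
        }
        return out.toString();
    }

    // Backslash convention: a backslash followed by the qualifier stands for one literal qualifier.
    public static String unescapeBackslash(String inner, char qualifier) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < inner.length(); i++) {
            char c = inner.charAt(i);
            if (c == '\\' && i + 1 < inner.length() && inner.charAt(i + 1) == qualifier) {
                out.append(qualifier);
                i++; // consume the escaped qualifier
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Both inputs encode the same field text: say "hi"
        System.out.println(unescapeDoubled("say \"\"hi\"\"", '"'));
        System.out.println(unescapeBackslash("say \\\"hi\\\"", '"'));
    }
}
```

Both calls in main decode to the same text, which shows that the escape mode changes only how a literal qualifier is written in the file, not what value the parser hands back.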
    • Constructor Detail

      • CsvReader

        public CsvReader​(String fileName,
                         char delimiter,
                         Charset charset)
                  throws FileNotFoundException
        Create a CsvReader object using a file as the data source.
        Parameters:
        fileName - the path to the file to use as the data source
        delimiter - the character to use as the column delimiter
        charset - the Charset to use while parsing the data
        Throws:
        FileNotFoundException - if the file does not exist
      • CsvReader

        public CsvReader​(String fileName,
                         char delimiter)
                  throws FileNotFoundException
        Create a CsvReader object using a file as the data source. Uses ISO-8859-1 as the Charset.
        Parameters:
        fileName - the path to the file to use as the data source
        delimiter - the character to use as the column delimiter
        Throws:
        FileNotFoundException - if the file does not exist
      • CsvReader

        public CsvReader​(String fileName)
                  throws FileNotFoundException
        Create a CsvReader object using a file as the data source. Uses a comma as the column delimiter and ISO-8859-1 as the Charset.
        Parameters:
        fileName - the path to the file to use as the data source
        Throws:
        FileNotFoundException - if the file does not exist
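Because the file-name constructors without a Charset argument fall back to ISO-8859-1, a file saved as UTF-8 may come back with garbled non-ASCII characters. The following sketch uses only the Java standard library to show the effect of decoding the same bytes with the two charsets; the class CharsetSketch and its decode method are hypothetical and do not use CsvReader itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSketch {
    // Decode raw bytes the way a reader configured with the given charset would see them.
    public static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "caf\u00e9,1" is the text "café,1"; the escape keeps this source file ASCII-only.
        byte[] utf8Bytes = "caf\u00e9,1".getBytes(StandardCharsets.UTF_8);

        // The two UTF-8 bytes of 'é' are read as two separate ISO-8859-1 characters.
        String misread = decode(utf8Bytes, StandardCharsets.ISO_8859_1);
        // Decoding with the charset the data was written in restores the original text.
        String correct = decode(utf8Bytes, StandardCharsets.UTF_8);

        System.out.println(misread.length()); // 7 characters
        System.out.println(correct.length()); // 6 characters
    }
}
```

When the encoding of the input is known and is not ISO-8859-1, the three-argument constructor that takes a Charset is the one to prefer.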
      • CsvReader

        public CsvReader​(Reader inputStream,
                         char delimiter)
        Create a CsvReader object using a Reader object as the data source.
        Parameters:
        inputStream - the stream to use as the data source
        delimiter - the character to use as the column delimiter
      • CsvReader

        public CsvReader​(Reader inputStream)
        Create a CsvReader object using a Reader object as the data source. Uses a comma as the column delimiter.
        Parameters:
        inputStream - the stream to use as the data source
      • CsvReader

        public CsvReader​(InputStream inputStream,
                         char delimiter,
                         Charset charset)
        Create a CsvReader object using an InputStream object as the data source.
        Parameters:
        inputStream - the stream to use as the data source
        delimiter - the character to use as the column delimiter
        charset - the Charset to use while parsing the data
      • CsvReader

        public CsvReader​(InputStream inputStream,
                         Charset charset)
        Create a CsvReader object using an InputStream object as the data source. Uses a comma as the column delimiter.
        Parameters:
        inputStream - the stream to use as the data source
        charset - the Charset to use while parsing the data
    • Method Detail

      • getCaptureRawRecord

        public boolean getCaptureRawRecord()
        Return the "Capture Raw Record" setting.
        Returns:
        the current value of the "Capture Raw Record" setting
      • setCaptureRawRecord

        public void setCaptureRawRecord​(boolean captureRawRecord)
        Set the "Capture Raw Record" setting.
        Parameters:
        captureRawRecord - the new value for the "Capture Raw Record" setting
      • getRawRecord

        public String getRawRecord()
        Return the raw record containing the current line read from the stream.
        Returns:
        the raw record
      • getTrimWhitespace

        public boolean getTrimWhitespace()
        Return whether leading and trailing whitespace characters are being trimmed from non-textqualified column data. Default is true.
        Returns:
        whether leading and trailing whitespace characters are being trimmed from non-textqualified column data.
      • setTrimWhitespace

        public void setTrimWhitespace​(boolean trimWhitespace)
        Set whether leading and trailing whitespace characters should be trimmed from non-textqualified column data or not. Default is true.
        Parameters:
        trimWhitespace - whether leading and trailing whitespace characters should be trimmed from non-textqualified column data or not.
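The trimming rule described above applies only to column data that was not wrapped in text qualifiers. A minimal sketch of that rule, assuming a hypothetical helper that already knows whether the column was qualified (TrimSketch and normalize are illustrative names, not part of CsvReader):

```java
public class TrimSketch {
    // Trim leading and trailing whitespace only when trimming is enabled
    // and the column was not text-qualified; otherwise return the value unchanged.
    public static String normalize(String value, boolean qualified, boolean trimWhitespace) {
        if (trimWhitespace && !qualified) {
            return value.trim();
        }
        return value;
    }

    public static void main(String[] args) {
        // Unqualified column: surrounding spaces are removed.
        System.out.println("[" + normalize("  a b  ", false, true) + "]");
        // Qualified column: surrounding spaces are part of the data and are kept.
        System.out.println("[" + normalize("  a b  ", true, true) + "]");
    }
}
```

In this sketch, whitespace that has to survive parsing is preserved by qualifying the column in the source data rather than by turning trimming off for the whole file.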
      • getDelimiter

        public char getDelimiter()
        Return the character being used as the column delimiter. Default is comma, ','.
        Returns:
        the character being used as the column delimiter.
      • setDelimiter

        public void setDelimiter​(char delimiter)
        Set the character to use as the column delimiter. Default is comma, ','.
        Parameters:
        delimiter - the character to use as the column delimiter.
      • getRecordDelimiter

        public char getRecordDelimiter()
        Return the character to use as the record delimiter.
        Returns:
        the character to use as the record delimiter. The default is a combination of standard end of line characters for Windows, Unix, and Mac.
      • setRecordDelimiter

        public void setRecordDelimiter​(char recordDelimiter)
        Set the character to use as the record delimiter.
        Parameters:
        recordDelimiter - the character to use as the record delimiter. The default is a combination of standard end of line characters for Windows, Unix, and Mac.
      • getTextQualifier

        public char getTextQualifier()
        Return the character to use as a text qualifier in the data.
        Returns:
        the character to use as a text qualifier in the data.
      • setTextQualifier

        public void setTextQualifier​(char textQualifier)
        Set the character to use as a text qualifier in the data.
        Parameters:
        textQualifier - the character to use as a text qualifier in the data.
      • getUseTextQualifier

        public boolean getUseTextQualifier()
        Return whether text qualifiers will be used while parsing or not.
        Returns:
        whether text qualifiers will be used while parsing
      • setUseTextQualifier

        public void setUseTextQualifier​(boolean useTextQualifier)
        Set whether text qualifiers will be used while parsing or not.
        Parameters:
        useTextQualifier - whether to use a text qualifier while parsing or not
      • getComment

        public char getComment()
        Return the character being used as a comment signal. The default comment character is the pound character ('#'). Lines starting with this character will be ignored if useComments is set.
        Returns:
        the character being used as a comment signal.
      • setComment

        public void setComment​(char comment)
        Set the character being used as a comment signal. The default comment character is the pound character ('#'). Lines starting with this character will be ignored if useComments is set.
        Parameters:
        comment - the character to use as a comment signal
      • getUseComments

        public boolean getUseComments()
        Return whether comments (lines starting with the comment character) will be skipped while parsing or not.
        Returns:
        whether comments are being looked for while parsing
      • setUseComments

        public void setUseComments​(boolean useComments)
        Set whether comments (lines starting with the comment character) will be skipped while parsing or not.
        Parameters:
        useComments - whether comments are being looked for while parsing
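As a rough illustration of the comment setting, the sketch below drops lines whose first character is the comment character, and only when comment handling is switched on. It works on a list of already-split lines using the standard library; CommentSketch and dropComments are hypothetical names, and the real parser works on a character stream rather than a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CommentSketch {
    // Keep every line unless comment handling is on and the line starts with the comment character.
    public static List<String> dropComments(List<String> lines, char comment, boolean useComments) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            boolean isComment = useComments && !line.isEmpty() && line.charAt(0) == comment;
            if (!isComment) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("# exported 2020", "id,name", "1,#tag");
        // Only the first line starts with '#'; a '#' later in a line is ordinary data.
        System.out.println(dropComments(lines, '#', true));
    }
}
```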
      • getEscapeMode

        public int getEscapeMode()
        Return the current way to escape an occurrence of the text qualifier inside qualified data.
        Returns:
        the current way to escape an occurrence of the text qualifier inside qualified data.
      • setEscapeMode

        public void setEscapeMode​(int escapeMode)
                           throws IllegalArgumentException
        Set the current way to escape an occurrence of the text qualifier inside qualified data.
        Parameters:
        escapeMode - the way to escape an occurrence of the text qualifier inside qualified data
        Throws:
        IllegalArgumentException - When an illegal value is specified for escapeMode
      • getSkipEmptyRecords

        public boolean getSkipEmptyRecords()
        Return a flag to indicate whether empty records shall be skipped by the parser.
        Returns:
        whether empty records will be skipped
      • setSkipEmptyRecords

        public void setSkipEmptyRecords​(boolean skipEmptyRecords)
        Set a flag to indicate whether empty records shall be skipped by the parser.
        Parameters:
        skipEmptyRecords - whether empty records will be skipped
      • getSafetySwitch

        public boolean getSafetySwitch()
        Return the value of a safety switch to prevent the parser from using large amounts of memory in the case where parsing settings like file encodings don't end up matching the actual format of a file. This switch can be turned off if the file format is known and tested. With the switch off, the max column lengths and max column count per record supported by the parser will greatly increase. Default is true.
        Returns:
        the current setting of the safety switch.
      • setSafetySwitch

        public void setSafetySwitch​(boolean safetySwitch)
        Set the value of a safety switch to prevent the parser from using large amounts of memory in the case where parsing settings like file encodings don't end up matching the actual format of a file. This switch can be turned off if the file format is known and tested. With the switch off, the max column lengths and max column count per record supported by the parser will greatly increase. Default is true.
        Parameters:
        safetySwitch - the new setting of the safety switch
      • getColumnCount

        public int getColumnCount()
        Return the number of columns found in this record.
        Returns:
        The column count
      • getCurrentRecord

        public long getCurrentRecord()
        Return the index of the current record.
        Returns:
        The index of the current record
      • getHeaderCount

        public int getHeaderCount()
        Return the number of headers read in by a previous call to readHeaders().
        Returns:
        the number of headers read in by a previous call to readHeaders().
      • getHeaders

        public String[] getHeaders()
                            throws IOException
        Return the header values as a string array.
        Returns:
        the header values as a String array
        Throws:
        IOException - if this object has already been closed.
      • setHeaders

        public void setHeaders​(String[] headers)
        Set the header values.
        Parameters:
        headers - the new header values
      • getValues

        public String[] getValues()
                           throws IOException
        Return the list of column values.
        Returns:
        the list of column values
        Throws:
        IOException - if this object has already been closed
      • get

        public String get​(int columnIndex)
                   throws IOException
        Return the current column value for a given column index.
        Parameters:
        columnIndex - the index of the column
        Returns:
        the current column value
        Throws:
        IOException - if this object has already been closed
      • get

        public String get​(String headerName)
                   throws IOException
        Return the current column value for a given column header name.
        Parameters:
        headerName - the header name of the column
        Returns:
        the current column value
        Throws:
        IOException - if this object has already been closed
      • parse

        public static CsvReader parse​(String data)
        Create a CsvReader object using a string of data as the source. Uses ISO-8859-1 as the Charset.
        Parameters:
        data - the non-null data String object to use as the source
        Returns:
        a CsvReader object using the String of data as the source
      • readRecord

        public boolean readRecord()
                           throws IOException
        Read the next record.
        Returns:
        whether another record was successfully read
        Throws:
        IOException - if an error occurred while reading data from the source stream
      • readHeaders

        public boolean readHeaders()
                            throws IOException
        Read the first record of data as column headers.
        Returns:
        whether the header record was successfully read
        Throws:
        IOException - if an error occurred while reading data from the source stream
      • getHeader

        public String getHeader​(int columnIndex)
                         throws IOException
        Return the column header value for a given column index.
        Parameters:
        columnIndex - the index of the header column being requested
        Returns:
        the value of the column header at the given column index
        Throws:
        IOException - if this object has already been closed
      • isQualified

        public boolean isQualified​(int columnIndex)
                            throws IOException
        Return whether the entry in the given column was qualified, i.e. started with a qualifier character.
        Parameters:
        columnIndex - the index of the column whose entry should be investigated
        Returns:
        whether the value is qualified
        Throws:
        IOException - if this object has already been closed
      • getIndex

        public int getIndex​(String headerName)
                     throws IOException
        Return the corresponding column index for a given column header name.
        Parameters:
        headerName - the header name of the column.
        Returns:
        The column index for the given column header name. Returns -1 if not found.
        Throws:
        IOException - if this object has already been closed.
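The "-1 if not found" contract can be pictured as a lookup table built from the header array. The sketch below is one possible way to do this with the standard library, not CsvReader's actual code; HeaderIndexSketch is a hypothetical class, and keeping the first index for a duplicated header name is an arbitrary choice of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class HeaderIndexSketch {
    private final Map<String, Integer> indexByName = new HashMap<>();

    // Record the position of each header name (assumption: first occurrence wins on duplicates).
    public HeaderIndexSketch(String[] headers) {
        for (int i = 0; i < headers.length; i++) {
            indexByName.putIfAbsent(headers[i], i);
        }
    }

    // Return the column index for the header name, or -1 when the name is unknown.
    public int getIndex(String headerName) {
        Integer index = indexByName.get(headerName);
        return index == null ? -1 : index;
    }

    public static void main(String[] args) {
        HeaderIndexSketch headers = new HeaderIndexSketch(new String[] {"id", "name"});
        System.out.println(headers.getIndex("name"));
        System.out.println(headers.getIndex("email"));
    }
}
```

Callers that look up columns by name should check for -1 before using the result as an array or column index.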
      • skipRecord

        public boolean skipRecord()
                           throws IOException
        Skip the next record of data by parsing each column. Does not increment getCurrentRecord().
        Returns:
        whether another record was successfully skipped
        Throws:
        IOException - if an error occurred while reading data from the source stream.
      • skipLine

        public boolean skipLine()
                         throws IOException
        Skip the next line of data using the standard end-of-line characters, without doing any column-delimited parsing.
        Returns:
        whether a line was successfully skipped
        Throws:
        IOException - if an error occurred while reading data from the source stream
      • close

        public void close()
        Close the reader and release all related resources.
      • finalize

        protected void finalize()
        Overrides:
        finalize in class Object