API Reference
Reading Data from SPSS
to_dataframe
-
to_dataframe
(data: Union[bytes, _io.BytesIO, os.PathLike[Any]], limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)[source]¶ Reads SPSS data and returns a
tuple with a Pandas DataFrame object and relevant Metadata.
- Parameters
data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
limit (int or None) – The number of records to read from the data. If None, will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) – If True, will convert labeled or formatted values to Pandas categories. Defaults to True.
Caution: This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise, will return missing values using the missing value representation configured in the underlying SPSS data. Defaults to False.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Pythonic datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) – If True, will return any date values as Pandas datetime64 types. Defaults to False.
Caution: This parameter is only applied if convert_datetimes is set to True.
- Returns
A DataFrame representation of the SPSS data (or None) and a Metadata representation of the data’s meta-data (values and labels / data map).
- Return type
pandas.DataFrame / None and Metadata
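The effect of convert_datetimes can be pictured with a small sketch. It assumes the conventional SPSS encoding, in which a date/time value counts the seconds elapsed since midnight on 14 October 1582. The helper name spss_seconds_to_datetime is hypothetical and is not part of the library; the library's own conversion may differ in detail.

```python
from datetime import datetime, timedelta

# Assumption: SPSS date/time values count seconds since 1582-10-14 00:00:00.
SPSS_EPOCH = datetime(1582, 10, 14)


def spss_seconds_to_datetime(value):
    """Convert a raw SPSS date/time number to a Python datetime (hypothetical helper)."""
    return SPSS_EPOCH + timedelta(seconds=value)


# 12,219,379,200 seconds after the SPSS epoch is the Unix epoch.
print(spss_seconds_to_datetime(12_219_379_200))  # 1970-01-01 00:00:00
```

With convert_datetimes set to False, a caller would see the raw number (the argument above) rather than the converted value.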
to_csv
-
to_csv
(data: Union[os.PathLike[Any], _io.BytesIO, bytes], target: Optional[Union[os.PathLike[Any], _io.StringIO]] = None, include_header: bool = True, delimter: str = '|', null_text: str = 'NaN', wrapper_character: str = "'", escape_character: str = '\\', line_terminator: str = '\r\n', decimal: str = '.', limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)[source]¶ Convert the SPSS
data into a CSV string where each row represents a record of SPSS data.
- Parameters
data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
target (Path-like / StringIO / str / None) – The destination where the CSV representation should be stored. Accepts either a filename, file-pointer or a StringIO, or None. If None, will return a str object stored in-memory. Defaults to None.
include_header (bool) – If True, will include a header row with column labels. If False, will not include a header row. Defaults to True.
delimiter (str) – The delimiter used between columns. Defaults to '|'.
null_text (str) – The text value to use in place of empty values. Only applies if wrap_empty_values is True. Defaults to 'NaN'.
wrapper_character (str) – The string used to wrap string values when wrapping is necessary. Defaults to ' (a single quote).
escape_character (str) – The character to use when escaping nested wrapper characters. Defaults to \ (a backslash).
line_terminator (str) – The character used to mark the end of a line. Defaults to \r\n.
decimal (str) – The character used to indicate a decimal place in a numerical value. Defaults to '.'.
limit (int or None) – The number of records to read from the data. If None, will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) – If True, will convert labeled or formatted values to Pandas categories. Defaults to True.
Caution: This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise, will return missing values using the missing value representation configured in the underlying SPSS data. Defaults to False.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Pythonic datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) – If True, will return any date values as Pandas datetime64 types. Defaults to False.
Caution: This parameter is only applied if convert_datetimes is set to True.
- Returns
None if target was not None, otherwise a str representation of the CSV file.
- Return type
str or None
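To make the formatting defaults concrete, here is a minimal sketch that reproduces them with Python's standard csv module. It is not the library's writer: rows_to_csv is a hypothetical helper, and the mapping of wrapper_character and escape_character onto quotechar and escapechar is an assumption about how those options behave.

```python
import csv
import io


def rows_to_csv(header, rows, null_text="NaN"):
    """Serialize rows using the documented to_csv defaults (stdlib sketch)."""
    buffer = io.StringIO(newline="")  # keep "\r\n" exactly as written
    writer = csv.writer(
        buffer,
        delimiter="|",          # delimiter between columns
        quotechar="'",          # wrapper_character
        escapechar="\\",        # escape_character
        lineterminator="\r\n",  # line_terminator
        doublequote=False,      # escape nested wrappers instead of doubling them
    )
    writer.writerow(header)
    for row in rows:
        # Replace empty values with the null_text placeholder.
        writer.writerow([null_text if value is None else value for value in row])
    return buffer.getvalue()


text = rows_to_csv(["id", "note"], [[1, "a|b"], [2, None]])
print(repr(text))  # "id|note\r\n1|'a|b'\r\n2|NaN\r\n"
```

A value is wrapped only when it needs to be, here because it contains the delimiter.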
to_excel
-
to_excel
(data: Union[os.PathLike[Any], _io.BytesIO, bytes], target: Optional[Union[os.PathLike[Any], _io.BytesIO, pandas.io.excel._base.ExcelWriter]] = None, sheet_name: str = 'Sheet1', start_row: int = 0, start_column: int = 0, null_text: str = 'NaN', include_header: bool = True, limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)[source]¶ Convert the SPSS
data into an Excel file where each row represents a record of SPSS data.
- Parameters
data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
target (Path-like / BytesIO / ExcelWriter) – The destination where the Excel file should be stored. Accepts either a filename, file-pointer or a BytesIO, or an ExcelWriter instance.
sheet_name (str) – The worksheet on which the SPSS data should be written. Defaults to 'Sheet1'.
start_row (int) – The row number (starting at 0) where the SPSS data should begin. Defaults to 0.
start_column (int) – The column number (starting at 0) where the SPSS data should begin. Defaults to 0.
null_text (str) – The way that missing values should be represented in the Excel file. Defaults to 'NaN'.
include_header (bool) – If True, will include a header row with column labels. If False, will not include a header row. Defaults to True.
limit (int or None) – The number of records to read from the data. If None, will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) – If True, will convert labeled or formatted values to Pandas categories. Defaults to True.
Caution: This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise, will return missing values using the missing value representation configured in the underlying SPSS data. Defaults to False.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Pythonic datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) – If True, will return any date values as Pandas datetime64 types. Defaults to False.
Caution: This parameter is only applied if convert_datetimes is set to True.
- Returns
None if target was not None, otherwise a BytesIO representation of the Excel file.
- Return type
BytesIO or None
to_json
-
to_json
(data: Union[os.PathLike[Any], _io.BytesIO, bytes], target: Optional[Union[os.PathLike[Any], _io.StringIO]] = None, layout: str = 'records', double_precision: int = 10, limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)[source]¶ Convert the SPSS
data into a JSON string.
- Parameters
data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
target (Path-like / StringIO / str / None) – The destination where the JSON representation should be stored. Accepts either a filename, file-pointer or StringIO, or None. If None, will return a str object stored in-memory. Defaults to None.
layout (str) – Indicates the layout schema to use for the JSON representation of the data. Accepts:
records, where the resulting JSON object represents an array of objects where each object corresponds to a single record, with key/value pairs for each column and that record’s corresponding value
table, where the resulting JSON object contains metadata (a data map) describing the data schema along with the resulting collection of record objects
Defaults to records.
double_precision (int) – Indicates the precision (places beyond the decimal point) to apply for floating point values. Defaults to 10.
limit (int or None) – The number of records to read from the data. If None, will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) – If True, will convert labeled or formatted values to Pandas categories. Defaults to True.
Caution: This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise, will return missing values using the missing value representation configured in the underlying SPSS data. Defaults to False.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Pythonic datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) – If True, will return any date values as Pandas datetime64 types. Defaults to False.
Caution: This parameter is only applied if convert_datetimes is set to True.
- Returns
None if target was not None, otherwise a str representation of the JSON output.
- Return type
str or None
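The difference between the two layout values, and the effect of double_precision, can be sketched with the standard json module. This is an illustration only: the functions are hypothetical, and the "schema", "fields" and "data" keys used for the table layout are assumptions, since the exact keys the library emits are not given here.

```python
import json

records = [
    {"age": 34, "score": 0.123456789012345},
    {"age": 51, "score": 2.5},
]


def round_floats(rows, double_precision=10):
    """Round every float to the requested number of decimal places."""
    return [
        {key: round(value, double_precision) if isinstance(value, float) else value
         for key, value in row.items()}
        for row in rows
    ]


def to_records_json(rows, double_precision=10):
    """'records' layout: a JSON array with one object per record."""
    return json.dumps(round_floats(rows, double_precision))


def to_table_json(rows, double_precision=10):
    """'table' layout sketch: a schema plus the records (key names are hypothetical)."""
    schema = {"fields": [{"name": name} for name in rows[0]]}
    return json.dumps({"schema": schema, "data": round_floats(rows, double_precision)})


print(to_records_json(records))
print(to_table_json(records))
```

The records layout is the simpler choice when the consumer already knows the schema; the table layout carries the data map along with the data.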
to_yaml
-
to_yaml
(data: Union[os.PathLike[Any], _io.BytesIO, bytes], target: Optional[Union[os.PathLike[Any], _io.StringIO]] = None, layout: str = 'records', double_precision: int = 10, limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)[source]¶ Convert the SPSS
data into a YAML string.
- Parameters
data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
target (Path-like / StringIO / str / None) – The destination where the YAML representation should be stored. Accepts either a filename, file-pointer or StringIO, or None. If None, will return a str object stored in-memory. Defaults to None.
layout (str) – Indicates the layout schema to use for the YAML representation of the data. Accepts:
records, where the resulting YAML object represents an array of objects where each object corresponds to a single record, with key/value pairs for each column and that record’s corresponding value
table, where the resulting YAML object contains metadata (a data map) describing the data schema along with the resulting collection of record objects
Defaults to records.
double_precision (int) – Indicates the precision (places beyond the decimal point) to apply for floating point values. Defaults to 10.
limit (int or None) – The number of records to read from the data. If None, will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) – If True, will convert labeled or formatted values to Pandas categories. Defaults to True.
Caution: This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise, will return missing values using the missing value representation configured in the underlying SPSS data. Defaults to False.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Pythonic datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) – If True, will return any date values as Pandas datetime64 types. Defaults to False.
Caution: This parameter is only applied if convert_datetimes is set to True.
- Returns
None if target was not None, otherwise a str representation of the YAML output.
- Return type
str or None
to_dict
-
to_dict
(data: Union[os.PathLike[Any], _io.BytesIO, bytes], layout: str = 'records', double_precision: int = 10, limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)[source]¶ Convert the SPSS
data into a Python dict.
- Parameters
data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
layout (str) – Indicates the layout schema to use for the dict representation of the data. Accepts:
records, where the result is a list of dict objects where each dict corresponds to a single record, with key/value pairs for each column and that record’s corresponding value
table, where the result is a dict that contains metadata (a data map) describing the data schema along with the resulting collection of record objects
Defaults to records.
double_precision (int) – Indicates the precision (places beyond the decimal point) to apply for floating point values. Defaults to 10.
limit (int or None) – The number of records to read from the data. If None, will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) – If True, will convert labeled or formatted values to Pandas categories. Defaults to True.
Caution: This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise, will return missing values using the missing value representation configured in the underlying SPSS data. Defaults to False.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Pythonic datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) – If True, will return any date values as Pandas datetime64 types. Defaults to False.
Caution: This parameter is only applied if convert_datetimes is set to True.
- Returns
A list of dict if layout is records, or a dict if layout is table.
- Return type
list of dict, or dict
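The limit and offset parameters lend themselves to reading a large file one page at a time. The following is a rough sketch rather than a recommended pattern: page_window and read_page are hypothetical helpers, and the sketch assumes that to_dict can be called from the top-level spss_converter package.

```python
def page_window(page, page_size):
    """Translate a zero-based page number into limit/offset keyword arguments."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {"limit": page_size, "offset": page * page_size}


def read_page(path, page, page_size=100):
    """Read one page of records as a list of dicts (assumes spss_converter is installed)."""
    import spss_converter  # imported lazily so page_window works without it

    return spss_converter.to_dict(path, layout="records", **page_window(page, page_size))


print(page_window(2, 100))  # {'limit': 100, 'offset': 200}
```

Because offset 0 is the first record, page 2 with a page size of 100 starts at the 201st record.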
get_metadata
-
get_metadata
(data)[source]¶ Retrieve the metadata that describes the coded representation of the data, corresponding formatting information, and their related human-readable labels.
- Parameters
data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
- Returns
The metadata that describes the raw data and its corresponding labels.
- Return type
Metadata
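Every reading function accepts the same three kinds of data input. As a minimal sketch of what that means for a caller, the hypothetical helper below normalizes all three to a BytesIO. It is not taken from the library and makes no claim about how the library handles its inputs internally.

```python
import io
import os


def as_stream(data):
    """Return a BytesIO for bytes, an existing BytesIO, or a path-like filename."""
    if isinstance(data, io.BytesIO):
        return data
    if isinstance(data, (bytes, bytearray)):
        return io.BytesIO(bytes(data))
    # Anything else is treated as a filename and read from disk.
    with open(os.fspath(data), "rb") as handle:
        return io.BytesIO(handle.read())
```

A caller holding an uploaded file in memory can therefore pass the raw bytes directly instead of writing them to disk first.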
Writing Data to SPSS
from_dataframe
-
from_dataframe
(df: pandas.core.frame.DataFrame, target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, metadata: Optional[spss_converter.Metadata.Metadata] = None, compress: bool = False)[source]¶ Create an SPSS dataset from a Pandas
DataFrame.
- Parameters
df (pandas.DataFrame) – The DataFrame to serialize to an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None, will return a BytesIO object containing the SPSS dataset. Defaults to None.
metadata (Metadata / None) – The Metadata associated with the dataset. If None, will attempt to derive it from df. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.
- Returns
A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None.
- Return type
BytesIO or None
- Raises
ValueError – if df is not a pandas.DataFrame
ValueError – if metadata is not a Metadata
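The return rule shared by the writing functions (a BytesIO when target is None or not a filename, otherwise None) can be sketched as follows. Both helpers are hypothetical, the payload stands in for already-serialized SPSS bytes, and the file extensions are an assumption based on the format names.

```python
import io
import os


def deliver(payload, target=None):
    """Write payload to target following the documented return rule (sketch)."""
    if target is None:
        return io.BytesIO(payload)   # no target: hand back a new stream
    if isinstance(target, io.BytesIO):
        target.write(payload)        # not a filename: write, then return the stream
        return target
    with open(os.fspath(target), "wb") as handle:
        handle.write(payload)        # filename: the data lands on disk
    return None


def default_extension(compress=False):
    """Pick a file extension matching the compress flag (assumed extensions)."""
    return ".zsav" if compress else ".sav"
```

Checking the return value for None is thus a quick way to tell whether the dataset was written to disk or returned in memory.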
from_csv
-
from_csv
(as_csv: Union[str, PathLike[Any], _io.BytesIO], target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, delimiter='|', **kwargs)[source]¶ Convert a CSV file into an SPSS dataset.
Tip: If you pass any additional keyword arguments, those keyword arguments will be passed on to the pandas.read_csv() function.
- Parameters
as_csv (str / File-location / BytesIO) – The CSV data that you wish to convert into an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None, will return a BytesIO object containing the SPSS dataset. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.
delimiter (str) – The delimiter used between columns. Defaults to '|'.
kwargs (dict) – Additional keyword arguments which will be passed on to the pandas.read_csv() function.
- Returns
A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None.
- Return type
BytesIO or None
from_excel
-
from_excel
(as_excel, target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, **kwargs)[source]¶ Convert Excel data into an SPSS dataset.
Tip: If you pass any additional keyword arguments, those keyword arguments will be passed on to the pandas.read_excel() function.
- Parameters
as_excel (str / File-location / BytesIO / bytes / ExcelFile) – The Excel data that you wish to convert into an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None, will return a BytesIO object containing the SPSS dataset. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.
kwargs (dict) – Additional keyword arguments which will be passed on to the pandas.read_excel() function.
- Returns
A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None.
- Return type
BytesIO or None
from_json
-
from_json
(as_json: Union[str, PathLike[Any], _io.BytesIO], target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, **kwargs)[source]¶ Convert JSON data into an SPSS dataset.
Tip: If you pass any additional keyword arguments, those keyword arguments will be passed on to the pandas.read_json() function.
- Parameters
as_json (str / File-location / BytesIO) – The JSON data that you wish to convert into an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None, will return a BytesIO object containing the SPSS dataset. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.
kwargs (dict) – Additional keyword arguments which will be passed on to the pandas.read_json() function.
- Returns
A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None.
- Return type
BytesIO or None
from_yaml
-
from_yaml
(as_yaml: Union[str, PathLike[Any], _io.BytesIO], target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, **kwargs)[source]¶ Convert YAML data into an SPSS dataset.
Tip: If you pass any additional keyword arguments, those keyword arguments will be passed on to the DataFrame.from_dict() method.
- Parameters
as_yaml (str / File-location / BytesIO) – The YAML data that you wish to convert into an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None, will return a BytesIO object containing the SPSS dataset. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.
kwargs (dict) – Additional keyword arguments which will be passed on to the DataFrame.from_dict() method.
- Returns
A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None.
- Return type
BytesIO or None
from_dict
-
from_dict
(as_dict: dict, target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, **kwargs)[source]¶ Convert a
dict object into an SPSS dataset.
Tip: If you pass any additional keyword arguments, those keyword arguments will be passed on to the DataFrame.from_dict() method.
- Parameters
as_dict (dict) – The dict data that you wish to convert into an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None, will return a BytesIO object containing the SPSS dataset. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.
kwargs (dict) – Additional keyword arguments which will be passed on to the DataFrame.from_dict() method.
- Returns
A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None.
- Return type
BytesIO or None
apply_metadata
-
apply_metadata
(df: pandas.core.frame.DataFrame, metadata: Union[spss_converter.Metadata.Metadata, dict, pyreadstat._readstat_parser.metadata_container], as_category: bool = True)[source]¶ Updates the
DataFrame df based on the metadata.
- Parameters
df (pandas.DataFrame) – The DataFrame to update.
metadata (Metadata, pyreadstat.metadata_container, or compatible dict) – The Metadata to apply to df.
as_category (bool) – If True, variables with formats will be transformed into categories in the DataFrame. Defaults to True.
- Returns
A copy of df updated to reflect metadata.
- Return type
pandas.DataFrame
Utility Classes
Metadata
-
class
Metadata
(**kwargs)[source]¶ Object representation of metadata retrieved from an SPSS file.
-
classmethod
from_pyreadstat
(as_metadata)[source]¶ Create a
Metadata instance from a Pyreadstat metadata object.
- Parameters
as_metadata (Pyreadstat.metadata_container) – The Pyreadstat metadata object from which the Metadata instance should be created.
- Returns
The Metadata instance.
- Return type
Metadata
-
to_pyreadstat
()[source]¶ Create a Pyreadstat metadata representation of the
Metadata instance.
- Returns
The Pyreadstat metadata.
- Return type
pyreadstat.metadata_container
-
property
column_metadata
¶ Collection of metadata that describes each column or variable within the dataset.
- Returns
A dict where the key is the name of the column/variable and the value is a ColumnMetadata object or compatible dict.
- Return type
dict
-
property
file_label
¶ The file label.
Note
This property is irrelevant for SPSS, but is relevant for SAS data.
-
classmethod
ColumnMetadata
-
class
ColumnMetadata
(**kwargs)[source]¶ Object representation of the metadata that describes a column or variable from an SPSS file.
-
add_to_pyreadstat
(pyreadstat)[source]¶ Update
pyreadstat to include the metadata for this column/variable.
- Parameters
pyreadstat (pyreadstat.metadata_container) – The Pyreadstat metadata object where the ColumnMetadata data should be updated.
- Returns
The Pyreadstat metadata.
- Return type
pyreadstat.metadata_container
-
classmethod
from_dict
(as_dict: dict)[source]¶ Create a new
ColumnMetadata instance from a dict representation.
- Parameters
as_dict (dict) – The dict representation of the ColumnMetadata.
- Returns
The ColumnMetadata instance.
- Return type
ColumnMetadata
-
classmethod
from_pyreadstat_metadata
(name: str, as_metadata)[source]¶ Create a new
ColumnMetadata instance from a Pyreadstat metadata object.
- Parameters
name (str) – The name of the variable for which a ColumnMetadata instance should be created.
as_metadata (Pyreadstat.metadata_container) – The Pyreadstat metadata object from which the column’s metadata should be extracted.
- Returns
The ColumnMetadata instance.
- Return type
ColumnMetadata
-
property
alignment
¶ The alignment to apply to values from this column/variable when displaying data. Defaults to
'unknown'. Accepts either 'unknown', 'left', 'center', or 'right' as either a case-insensitive str or a VariableAlignmentEnum.
- Return type
VariableAlignmentEnum
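Accepting either a case-insensitive str or an enum member is a common pattern, sketched below. AlignmentSketch is a hypothetical stand-in for VariableAlignmentEnum, whose actual member names are not given here, and coerce_alignment is likewise illustrative.

```python
from enum import Enum


class AlignmentSketch(Enum):
    """Hypothetical stand-in for VariableAlignmentEnum."""
    UNKNOWN = "unknown"
    LEFT = "left"
    CENTER = "center"
    RIGHT = "right"


def coerce_alignment(value="unknown"):
    """Accept an enum member or a case-insensitive string."""
    if isinstance(value, AlignmentSketch):
        return value
    # Raises ValueError for anything outside the four accepted values.
    return AlignmentSketch(str(value).strip().lower())


print(coerce_alignment("Center"))  # AlignmentSketch.CENTER
```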
-
property
display_width
¶ The maximum width at which the value is displayed. Defaults to 0.
- Return type
-
property
measure
¶ A classification of the type of measure (or value type) represented by the variable. Defaults to
'unknown'. Accepts either 'unknown', 'nominal', 'ordinal', or 'scale'.
- Return type
VariableMeasureEnum
-
property
missing_range_metadata
¶ Collection of metadata that defines the numerical ranges that are to be considered missing in the underlying data.
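As an illustration of how missing ranges are typically used, the sketch below tests a value against a list of inclusive ranges. The 'lo'/'hi' dict shape mirrors the way Pyreadstat reports missing ranges; whether this property exposes exactly the same shape is an assumption, and is_missing is a hypothetical helper.

```python
def is_missing(value, missing_ranges):
    """Return True if value falls inside any inclusive {'lo', 'hi'} range."""
    if value is None:
        return False
    return any(r["lo"] <= value <= r["hi"] for r in missing_ranges)


# A range of codes (97 to 99) and a single discrete code (-1).
ranges = [{"lo": 97, "hi": 99}, {"lo": -1, "hi": -1}]
print(is_missing(98, ranges))  # True
print(is_missing(42, ranges))  # False
```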
-
property
missing_value_metadata
¶ Value used to represent missing values in the raw data. Defaults to None.
Note: This is not actually relevant for SPSS data, but is an artifact for SAS and Stata data.
-
property
storage_width
¶ The width of data to store in the data file for the value. Defaults to 0.
- Return type
-
property
value_metadata
¶ Collection of values possible for the column/variable, with corresponding labels for each value.
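A value-to-label collection of this kind is what lets coded values be replaced with human-readable labels. The sketch below uses a plain dict and list; the variable names and the label set are hypothetical, and the library itself operates on DataFrame columns rather than lists.

```python
# Hypothetical value labels for one variable: raw code -> label.
value_labels = {1: "Yes", 2: "No", 9: "Refused"}


def apply_value_labels(values, labels):
    """Replace each coded value with its label, leaving unlabeled codes as-is."""
    return [labels.get(value, value) for value in values]


print(apply_value_labels([1, 2, 2, 5], value_labels))  # ['Yes', 'No', 'No', 5]
```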
-