baloo.core.indexes package

baloo.core.indexes.base module

class baloo.core.indexes.base.Index(data, dtype=None, name=None)[source]

Bases: baloo.weld.lazy_result.LazyArrayResult, baloo.core.generic.BinaryOps, baloo.core.generic.BitOps, baloo.core.generic.IndexCommon, baloo.core.generic.BalooCommon

Weld-ed Pandas Index.

Examples

>>> import baloo as bl
>>> import numpy as np
>>> ind = bl.Index(np.array(['a', 'b', 'c'], dtype=np.dtype(np.bytes_)))
>>> ind  # repr
Index(name=None, dtype=|S1)
>>> print(ind)  # str
[b'a' b'b' b'c']
>>> ind.values
array([b'a', b'b', b'c'], dtype='|S1')
>>> len(ind)  # eager
3
Attributes:
dtype
name

Name of the Index.

__getitem__(item)[source]

Select from the Index. Currently used internally through DataFrame and Series.

The supported selection functionality is exemplified below.

Examples

>>> ind = bl.Index(np.arange(3))
>>> print(ind[ind < 2].evaluate())
[0 1]
>>> print(ind[1:2].evaluate())
[1]
__init__(data, dtype=None, name=None)[source]

Initialize an Index object.

Parameters:
data : np.ndarray or WeldObject or list

Raw data or Weld expression.

dtype : np.dtype, optional

Numpy dtype of the elements. Inferred from data by default.

name : str, optional

Name of the Index.

dropna()[source]

Returns Index without null values according to Baloo’s convention.

Returns:
Index

Index with no null values.

evaluate(verbose=False, decode=True, passes=None, num_threads=1, apply_experimental=True)[source]

Evaluates by creating an Index containing evaluated data.

See LazyResult

Returns:
Index

Index with evaluated data.
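
The record-now, compute-on-evaluate pattern that this method completes can be illustrated without Weld. The sketch below is not Baloo's implementation: `LazyValues`, `map`, and `double` are hypothetical names chosen for illustration, and plain Python lists stand in for Weld expressions.

```python
class LazyValues:
    """Hypothetical stand-in for a lazy index: queues work, runs it on evaluate()."""

    def __init__(self, data, ops=()):
        self._data = list(data)
        self._ops = tuple(ops)  # pending element-wise functions, applied in order

    def map(self, func):
        # Nothing is computed here; the function is only queued.
        return LazyValues(self._data, self._ops + (func,))

    def evaluate(self):
        # Apply every queued function and return a new object holding raw results.
        result = self._data
        for func in self._ops:
            result = [func(x) for x in result]
        return LazyValues(result)

    @property
    def values(self):
        if self._ops:
            raise ValueError("pending operations; call evaluate() first")
        return self._data


calls = []


def double(x):
    calls.append(x)  # record when the work actually happens
    return x * 2


lazy = LazyValues([0, 1, 2]).map(double)
pending_calls = len(calls)   # no call has been made at this point
evaluated = lazy.evaluate()  # the queued function runs here
```

As in the examples above, the result of `evaluate()` is a new object of the same kind that holds the computed data.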

fillna(value)[source]

Returns Index with missing values replaced with value.

Parameters:
value : {int, float, bytes, bool}

Scalar value to replace missing values with.

Returns:
Index

With missing values replaced.

classmethod from_pandas(index)[source]

Create baloo Index from pandas Index.

Parameters:
index : pandas.Index
Returns:
Index
head(n=5)[source]

Return Index with first n values.

Parameters:
n : int

Number of values.

Returns:
Index

Index containing the first n values.

Examples

>>> ind = bl.Index(np.arange(3, dtype=np.float64))
>>> print(ind.head(2).evaluate())
[0. 1.]
name

Name of the Index.

Returns:
str

name

tail(n=5)[source]

Return Index with the last n values.

Parameters:
n : int

Number of values.

Returns:
Index

Index containing the last n values.

Examples

>>> ind = bl.Index(np.arange(3, dtype=np.float64))
>>> print(ind.tail(2).evaluate())
[1. 2.]
to_pandas()[source]

Convert to pandas Index.

Returns:
pandas.Index

baloo.core.indexes.range module

class baloo.core.indexes.range.RangeIndex(start=None, stop=None, step=None, name=None)[source]

Bases: baloo.core.indexes.base.Index

Weld-ed Pandas RangeIndex.

Examples

>>> import baloo as bl
>>> import numpy as np
>>> ind = bl.RangeIndex(3)
>>> ind  # repr
RangeIndex(start=0, stop=3, step=1)
>>> weld_code = str(ind)  # weld_code
>>> ind.evaluate()
Index(name=None, dtype=int64)
>>> print(ind.evaluate())
[0 1 2]
>>> len(ind)  # eager
3
>>> (ind * 2).evaluate().values
array([0, 2, 4])
>>> (ind - bl.Series(np.arange(1, 4))).evaluate().values
array([-1, -1, -1])
Attributes:
start
stop
step
dtype
__init__(start=None, stop=None, step=None, name=None)[source]

Initialize a RangeIndex object.

If only one value is passed (as start), it is treated as the stop value. This single value may also be a WeldObject, for cases such as creating a Series without an index argument.

Parameters:
start : int or WeldObject
stop : int or WeldObject, optional
step : int, optional
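
The single-value rule described above can be sketched as a small argument-normalizing function. This is an illustration only, not Baloo's code: `normalize_range_args` is a hypothetical helper, it handles plain integers rather than WeldObjects, and the defaults of 0 for start and 1 for step are taken from the `RangeIndex(start=0, stop=3, step=1)` repr shown in the class examples.

```python
def normalize_range_args(start=None, stop=None, step=None):
    """Hypothetical helper: resolve (start, stop, step) as the text describes."""
    if start is None:
        # Choice made for this sketch: require at least one value.
        raise ValueError("at least one value is required")
    if stop is None:
        # A lone value is treated as the stop; counting starts at 0.
        start, stop = 0, start
    if step is None:
        step = 1
    return start, stop, step


single = normalize_range_args(3)      # one value: it becomes the stop
full = normalize_range_args(1, 7, 2)  # all three values given explicitly
```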
empty

Check whether the data structure is empty.

Returns:
bool
evaluate(verbose=False, decode=True, passes=None, num_threads=1, apply_experimental=True)[source]

Evaluates by creating an Index containing evaluated data.

See LazyResult

Returns:
Index

Index with evaluated data.

baloo.core.indexes.multi module

class baloo.core.indexes.multi.MultiIndex(data, names=None)[source]

Bases: baloo.core.generic.IndexCommon, baloo.core.generic.BalooCommon

Weld-ed MultiIndex, though designed completely differently from the Pandas MultiIndex.

This version merely groups a few columns together to act as an index and hence does not follow the labels/levels approach of Pandas.

Examples

>>> import baloo as bl
>>> import numpy as np
>>> ind = bl.MultiIndex([[1, 2, 3], np.array([4, 5, 6], dtype=np.float64)], names=['i1', 'i2'])
>>> ind  # repr
MultiIndex(names=['i1', 'i2'], dtypes=[dtype('int64'), dtype('float64')])
>>> print(ind)  # str
  i1    i2
----  ----
   1     4
   2     5
   3     6
>>> ind.values
[Index(name=i1, dtype=int64), Index(name=i2, dtype=float64)]
>>> len(ind)  # eager
3
Attributes:
names
dtypes
__getitem__(item)[source]

Select from the MultiIndex.

Supported functionality exemplified below.

Examples

>>> mi = bl.MultiIndex([np.array([1, 2, 3]), np.array([4., 5., 6.])], names=['i1', 'i2'])
>>> print(mi.values[0])
[1 2 3]
>>> print(mi[:2].evaluate())
  i1    i2
----  ----
   1     4
   2     5
>>> print(mi[mi.values[0] != 2].evaluate())
  i1    i2
----  ----
   1     4
   3     6
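
Because this MultiIndex is a group of columns, selection amounts to applying the same row selection to every column. The pure-Python sketch below illustrates that idea under stated assumptions: `ColumnGroup` and `rows` are hypothetical names, lists stand in for lazy Weld data, and nothing here reflects how Baloo implements `__getitem__` internally.

```python
class ColumnGroup:
    """Hypothetical column-grouped index: one list per column, equal lengths."""

    def __init__(self, columns, names):
        self.columns = [list(col) for col in columns]
        self.names = list(names)

    def __getitem__(self, item):
        if isinstance(item, slice):
            # A slice selects the same row range from every column.
            picked = [col[item] for col in self.columns]
        else:
            # Otherwise treat item as a boolean mask with one flag per row.
            mask = list(item)
            picked = [[v for v, keep in zip(col, mask) if keep]
                      for col in self.columns]
        return ColumnGroup(picked, self.names)

    def rows(self):
        # Present the columns row by row, for inspection.
        return list(zip(*self.columns))


mi = ColumnGroup([[1, 2, 3], [4.0, 5.0, 6.0]], names=["i1", "i2"])
first_two = mi[:2].rows()
not_two = mi[[v != 2 for v in mi.columns[0]]].rows()
```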
__init__(data, names=None)[source]

Initialize a MultiIndex object.

Parameters:
data : list of (numpy.ndarray or Index or list)

The internal data.

names : list of str, optional

The names of the data.

__len__()[source]

Eagerly get the length of the MultiIndex.

Note that if the length is unknown (such as for WeldObjects), it will be eagerly computed.

Returns:
int

Length of the MultiIndex.
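
The eager behaviour noted above can be sketched as follows: when the length is already known it is returned directly, otherwise the data is computed on the spot. `LazyColumn`, `producer`, and `computed` are hypothetical names for this illustration and are not part of Baloo.

```python
class LazyColumn:
    """Hypothetical column whose length may be unknown until it is computed."""

    def __init__(self, producer, known_length=None):
        self._producer = producer          # zero-argument callable returning the data
        self._known_length = known_length  # None means "not known yet"
        self.computed = False

    def __len__(self):
        if self._known_length is None:
            # Length unknown: force the computation now and cache the answer.
            self._known_length = len(self._producer())
            self.computed = True
        return self._known_length


raw = LazyColumn(lambda: [1, 2, 3], known_length=3)
lazy = LazyColumn(lambda: [x for x in range(10) if x % 2 == 0])
raw_len, lazy_len = len(raw), len(lazy)
```

The practical consequence is that calling `len` on a lazy object can be as expensive as evaluating it.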

dropna()[source]

Returns MultiIndex without any rows containing null values according to Baloo’s convention.

Returns:
MultiIndex

MultiIndex with no null values.
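
Row-wise dropping means that a row is removed when any of its columns holds a missing value. The sketch below illustrates this on plain lists. It makes an assumption that this page does not confirm: `None` and float NaN are treated as missing here, whereas Baloo's actual null convention is not specified in this section. `is_missing` and `drop_rows_with_missing` are hypothetical helpers.

```python
import math


def is_missing(value):
    # Assumption for this sketch only: None and float NaN count as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_rows_with_missing(columns):
    """Hypothetical helper: keep only rows where no column holds a missing value."""
    keep = [not any(is_missing(v) for v in row) for row in zip(*columns)]
    return [[v for v, k in zip(col, keep) if k] for col in columns]


# The second row is dropped from both columns because its second value is NaN.
cleaned = drop_rows_with_missing([[1, 2, 3], [4.0, float("nan"), 6.0]])
```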

empty

Check whether the data structure is empty.

Returns:
bool
evaluate(verbose=False, decode=True, passes=None, num_threads=1, apply_experimental=True)[source]

Evaluates by creating a MultiIndex containing evaluated data and index.

See LazyResult

Returns:
MultiIndex

MultiIndex with evaluated data.

classmethod from_pandas(index)[source]

Create baloo MultiIndex from pandas MultiIndex.

Parameters:
index : pandas.MultiIndex
Returns:
MultiIndex
name

Name of the Index.

Returns:
str

name

tail(n=5)[source]

Return MultiIndex with the last n values in each column.

Parameters:
n : int

Number of values.

Returns:
MultiIndex

MultiIndex containing the last n values in each column.

to_pandas()[source]

Convert to pandas MultiIndex.

Returns:
pandas.MultiIndex
values

Retrieve internal data.

Returns:
list

The internal list data representation.