-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: do not cast ints to floats if inputs of crosstab are not aligned #17011
New issue
Have a question about this project? No Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “No Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? No Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,6 +6,7 @@ | |
from pandas import Series, DataFrame, MultiIndex, Index | ||
from pandas.core.groupby import Grouper | ||
from pandas.core.reshape.util import cartesian_product | ||
from pandas.core.index import _get_combined_index | ||
from pandas.compat import range, lrange, zip | ||
from pandas import compat | ||
import pandas.core.common as com | ||
|
@@ -493,6 +494,13 @@ def crosstab(index, columns, values=None, rownames=None, colnames=None, | |
rownames = _get_names(index, rownames, prefix='row') | ||
colnames = _get_names(columns, colnames, prefix='col') | ||
|
||
obs_idxes = [obj.index for objs in (index, columns) for obj in objs | ||
if hasattr(obj, 'index')] | ||
if obs_idxes: | ||
common_idx = _get_combined_index(obs_idxes, intersect=True) | ||
else: | ||
common_idx = None | ||
|
||
data = {} | ||
data.update(zip(rownames, index)) | ||
data.update(zip(colnames, columns)) | ||
|
@@ -503,20 +511,21 @@ def crosstab(index, columns, values=None, rownames=None, colnames=None, | |
if values is not None and aggfunc is None: | ||
raise ValueError("values cannot be used without an aggfunc.") | ||
|
||
df = DataFrame(data, index=common_idx) | ||
if values is None: | ||
df = DataFrame(data) | ||
df['__dummy__'] = 0 | ||
table = df.pivot_table('__dummy__', index=rownames, columns=colnames, | ||
aggfunc=len, margins=margins, | ||
margins_name=margins_name, dropna=dropna) | ||
table = table.fillna(0).astype(np.int64) | ||
|
||
kwargs = {'aggfunc': len, 'fill_value': 0} | ||
else: | ||
data['__dummy__'] = values | ||
df = DataFrame(data) | ||
table = df.pivot_table('__dummy__', index=rownames, columns=colnames, | ||
aggfunc=aggfunc, margins=margins, | ||
margins_name=margins_name, dropna=dropna) | ||
df['__dummy__'] = values | ||
kwargs = {'aggfunc': aggfunc} | ||
|
||
table = df.pivot_table('__dummy__', index=rownames, columns=colnames, | ||
margins=margins, margins_name=margins_name, | ||
dropna=dropna, **kwargs) | ||
|
||
# GH 17013: | ||
if values is None and margins: | ||
table = table.fillna(0).astype(np.int64) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you shouldn't need the astype, fillna already does inference. or does something break? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this needed when not margins as well? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Not following you... if
Now that I added the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok |
||
|
||
# Post-process | ||
if normalize is not False: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think might be nicer to fix
_get_combined_index
to ignoreNone
in the arrayThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of this solution (the list comprehension would look only slightly better). What about adding a new method
_get_objs_combined_index
which gets a list of objects (with or without index) and does the above? It would also be used in theDataFrame
constructor.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's see what that looks like / simplifies as a separate PR