PyTorch Notes: Data Preprocessing with pandas (work in progress)
- Open-source code
- 2025-07-22 04:51:02

Test code

    import pandas as pd
    import torch

    # train_data and test_data are pandas DataFrames loaded earlier (the loading
    # code is not part of these notes); the first output line shows their shapes.

    print(train_data.iloc[0:4, [0, 1, 2, 3, -3, -2, -1]])

    # (※1) test_data is sliced with 1: rather than 1:-1 because it has no
    # price column at the end to drop.
    all_features = pd.concat((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:]))
    print('-----------------------------------------------')
    print(all_features.iloc[0:4, [0, 1, 2, 3, -3, -2, -1]])

    # (※2) Get the index of the numeric columns (those whose dtype is not 'object').
    numeric_features = all_features.dtypes[all_features.dtypes != 'object'].index
    # print('++++++++++++++++++++++++')

    # (※3) print(all_features[numeric_features].iloc[0:3, [0,1,2,3,-3,-2,-1]])
    # print('----------------------')
    # Standardize each numeric column to zero mean and unit standard deviation.
    all_features[numeric_features] = all_features[numeric_features].apply(lambda x: (x - x.mean()) / (x.std()))
    # print(all_features[numeric_features].iloc[0:3, [0,1,2,3,-3,-2,-1]])
    # input()

    # (※4) After standardization every numeric feature has mean 0, so filling
    # missing values with 0 is the same as filling them with the mean.
    all_features[numeric_features] = all_features[numeric_features].fillna(0)

    # (※5) get_dummies, then DataFrame to tensor
    print('++++++++++ demo test dummies +++++++++++')
    test = pd.DataFrame({'“x”':[1,2,3,4,5, 6], "seasion":['here', 'over', '', 'next', '', 'here']})
    print(test)
    print('-------------------------------')
    # '' is an ordinary string, not a missing value, so it gets its own
    # seasion_ column and seasion_nan stays all zero.
    test = pd.get_dummies(test, dummy_na=True)
    print(test)
    # Multiplying by 1 turns the boolean indicator columns into 0/1 integers.
    test = test*1
    print(test)

    print('++++++++++ test trans to tensor +++++++++++')
    # test1 = torch.tensor(test)
    # Convert the whole DataFrame.
    test1 = torch.tensor(test.values, dtype=torch.float32)
    print(test1.shape)
    print(test1)
    print('-------------------------------')
    # Without iloc, plain slicing only selects rows.
    test2 = torch.tensor(test[:3].values, dtype=torch.float32)
    print(test2.shape)
    print(test2)
    print('-------------------------------')
    # Converting specific rows and columns requires fluent use of iloc.
    test3 = torch.tensor(test.iloc[:2, :-1].values, dtype=torch.float32)
    print(test3.shape)
    print(test3)
    input()  # pause so the output stays on screen

output-begin:

    (1460, 81) (1459, 80)
       Id  MSSubClass MSZoning  LotFrontage SaleType SaleCondition  SalePrice
    0   1          60       RL         65.0       WD        Normal     208500
    1   2          20       RL         80.0       WD        Normal     181500
    2   3          60       RL         68.0       WD        Normal     223500
    3   4          70       RL         60.0       WD       Abnorml     140000
    -----------------------------------------------
       MSSubClass MSZoning  LotFrontage  LotArea  YrSold SaleType SaleCondition
    0          60       RL         65.0     8450    2008       WD        Normal
    1          20       RL         80.0     9600    2007       WD        Normal
    2          60       RL         68.0    11250    2008       WD        Normal
    3          70       RL         60.0     9550    2006       WD       Abnorml
    ++++++++++ demo test dummies +++++++++++
       “x” seasion
    0    1    here
    1    2    over
    2    3
    3    4    next
    4    5
    5    6    here
    -------------------------------
       “x”  seasion_  seasion_here  seasion_next  seasion_over  seasion_nan
    0    1     False          True         False         False        False
    1    2     False         False         False          True        False
    2    3      True         False         False         False        False
    3    4     False         False          True         False        False
    4    5      True         False         False         False        False
    5    6     False          True         False         False        False
       “x”  seasion_  seasion_here  seasion_next  seasion_over  seasion_nan
    0    1         0             1             0             0            0
    1    2         0             0             0             1            0
    2    3         1             0             0             0            0
    3    4         0             0             1             0            0
    4    5         1             0             0             0            0
    5    6         0             1             0             0            0
    ++++++++++ test trans to tensor +++++++++++
    torch.Size([6, 6])
    tensor([[1., 0., 1., 0., 0., 0.],
            [2., 0., 0., 0., 1., 0.],
            [3., 1., 0., 0., 0., 0.],
            [4., 0., 0., 1., 0., 0.],
            [5., 1., 0., 0., 0., 0.],
            [6., 0., 1., 0., 0., 0.]])
    -------------------------------
    torch.Size([3, 6])
    tensor([[1., 0., 1., 0., 0., 0.],
            [2., 0., 0., 0., 1., 0.],
            [3., 1., 0., 0., 0., 0.]])
    -------------------------------
    torch.Size([2, 5])
    tensor([[1., 0., 1., 0., 0.],
            [2., 0., 0., 0., 1.]])

output-end

Related operators
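- concat: merges DataFrames (here it stacks the train and test feature rows).
- iloc: selects rows and columns by position.
- apply: applies a function to each column.
- fillna: fills in missing values.
- get_dummies: one-hot encoding (run it and print the result to see the effect).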
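The operators listed above can be tried end to end on a tiny made-up table. The following is only a minimal sketch. The frames `train` and `test`, the column names `Id`, `area`, `zone` and `price`, and the variable names are all hypothetical stand-ins for the real dataset. Two details differ from the notes' code and are assumptions of this sketch: `select_dtypes(include="number")` is swapped in for the `dtypes != 'object'` filter, because it does not depend on how a given pandas version types string columns, and `ignore_index=True` is added to `concat` so the combined frame gets a fresh row index.

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical toy data standing in for train_data / test_data.
train = pd.DataFrame({
    "Id": [1, 2, 3],
    "area": [50.0, np.nan, 70.0],
    "zone": ["RL", "RM", None],
    "price": [100, 200, 300],   # label column, present only in train
})
test = pd.DataFrame({
    "Id": [4, 5],
    "area": [60.0, 80.0],
    "zone": ["RL", None],
})

# concat + iloc: drop Id from both frames and the label from train only.
features = pd.concat((train.iloc[:, 1:-1], test.iloc[:, 1:]), ignore_index=True)

# Pick the numeric columns (swap-in for the dtypes != 'object' filter).
numeric = features.select_dtypes(include="number").columns

# apply: standardize each numeric column; afterwards the column mean is 0,
# so fillna(0) fills missing entries with the mean.
features[numeric] = features[numeric].apply(lambda x: (x - x.mean()) / x.std())
features[numeric] = features[numeric].fillna(0)

# get_dummies: one-hot encode the text column; dummy_na=True adds an
# indicator column for genuinely missing values (None / NaN).
features = pd.get_dummies(features, dummy_na=True)

# Cast everything (including boolean indicators) to float32,
# which plays the role of `test*1` in the notes.
features = features.astype("float32")

# Split back into train / test rows and convert to tensors.
n_train = train.shape[0]
train_x = torch.tensor(features.iloc[:n_train].values, dtype=torch.float32)
test_x = torch.tensor(features.iloc[n_train:].values, dtype=torch.float32)
train_y = torch.tensor(train["price"].values.reshape(-1, 1), dtype=torch.float32)

print(features)
print(train_x.shape, test_x.shape, train_y.shape)
```

In the demo from the notes, the empty strings `''` are ordinary values rather than missing ones, which is why they end up in a `seasion_` column while `seasion_nan` stays all zero. The sketch uses `None` instead, so the missing-value indicator column actually fires.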