当前位置: 澳门新濠3559 > 数据库 > 正文

1、查找全部重复记录,info的主键uid为自增的id后

时间:2019-11-09 19:40来源:数据库
 面试碰到一个MySQl的有趣的题目,如何从student表中删除重复名字的行,并保留最小id的记录? 刚开始,根据我的想法,这个很简单嘛,上sql语句 mysql中数据去重和优化 前言 很遗憾当时

 面试碰到一个MySQl的有趣的题目,如何从student表中删除重复名字的行,并保留最小id的记录?

刚开始,根据我的想法,这个很简单嘛,上sql语句

mysql中数据去重和优化

前言

很遗憾当时没有做出来,回家搜索了一番,发现利用子查询的可以很快解决。

delete from zqzrdp where tel in (select min(dpxx_id) from zqzrdp group by tel having count(tel)>1);

 

本文主要给大家介绍了关于MySQL中查询、删除重复记录的方法,分享出来供大家参考学习,下面来看看详细的介绍:

1、删除表中多余的重复记录,重复记录是username判断,只留有id最小的记录

 

delete from studentwhere
username in ( select username from studentgroup by username having count(username)>1)
and id not in (select min(id) as id from studentgroup by username having count(username)>1 )

(上面这条语句在mysql中执行会报错:

执行报错:1093 - You can't specify target table 'student' for update in FROM clause

原因是:更新数据时使用了查询,而查询的数据又做了更新的条件,mysql不支持这种方式。oracel和msserver都支持这种方式。

怎么规避这个问题?

再加一层封装,

delete from student where
username in (select username from ( select username from student group by username having count(username)>1) a)
and id not in ( select id from (select min(id) as id from student group by username having count(username)>1 ) b)

 注意select min(id) 后面要有as id.

其实还有更简单的办法(针对单个字段):

delete from student where
id not in (select id from (select min(id) as id from student group by username) b);

 

拓展:

执行,报错!!~!~

更改表user_info的主键uid为自增的id后,忘了设置原来主键uid属性为unique,结果导致产生uid重复的记录。为此需要清理后来插入的重复记录。

查找所有重复标题的记录:

2、删除表中多余的重复记录(多个字段),只留有id最小的记录

delete from student a
where (a.username,a.seq) in (select username,seq from (select username,seq from a group by username,seq having count(*) > 1)  t1)
and id not in ( select id from (select min(id) from vitae group by username,seq having count(*)>1) t2)

参考文章:

图片 1

 

select title,count(*) as count from user_table group by title having count>1; 


SELECT * FROM t_info a WHERE ((SELECT COUNT(*) FROM t_info WHERE Title = a.Title) > 1) ORDER BY Title DESC

异常意为:你不能指定目标表的更新在FROM子句。傻了,MySQL 这样写,不行,让人郁闷。

基本方法可以参考后面的附上的资料,但是由于mysql不支持同时对一个表进行操作,即子查询和要进行的操作不能是同一个表,因此需要通过零时表中转一下。

一、查找重复记录

难倒只能分步操作,蛋疼

 

1、查找全部重复记录

以下是网友写的,同样是坑爹的代码,我机器上运行不了。

写在前面:数据量大时,一定要多涉及的关键字段创建索引!!!否则很慢很慢很慢,慢到想死的心都有了

SELECT * FROM t_info a WHERE ((SELECT COUNT(*) FROM t_info WHERE Title = a.Title) > 1) ORDER BY Title DESC
  1. 查询需要删除的记录,会保留一条记录。

 

2、过滤重复记录(只显示一条)

select a.id,a.subject,a.RECEIVER from test1 a left join (select c.subject,c.RECEIVER ,max(c.id) as bid from test1 c where status=0 GROUP BY RECEIVER,SUBJECT having count(1) >1) b on a.id< b.bid where a.subject=b.subject and a.RECEIVER = b.RECEIVER and a.id < b.bid

1 单字段重复

Select * From HZT Where ID In (Select Max(ID) From HZT Group By Title)
  1. 删除重复记录,只保留一条记录。注意,subject,RECEIVER 要索引,否则会很慢的。

 

注:此处显示ID最大一条记录

delete a from test1 a, (select c.subject,c.RECEIVER ,max(c.id) as bid from test1 c where status=0 GROUP BY RECEIVER,SUBJECT having count(1) >1) b where a.subject=b.subject and a.RECEIVER = b.RECEIVER and a.id < b.bid;

生成零时表,其中uid是需要去重的字段

二、删除重复记录

  1. 查找表中多余的重复记录,重复记录是根据单个字段(peopleId)来判断

 

1、删除全部重复记录(慎用

select * from people where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)

create table tmpuid as (select uid from userinfo group by uid having count(uid))

Delete 表 Where 重复字段 In (Select 重复字段 From 表 Group By 重复字段 Having Count(*)>1)

4. 删除表中多余的重复记录,重复记录是根据单个字段(peopleId)来判断,只留有rowid最小的记录

 

2、保留一条(这个应该是大多数人所需要的 ^_^)

delete from people where peopleId in (select peopleId from people group by peopleId  having count(peopleId) > 1) and rowid not in (select min(rowid) from people group by peopleId having count(peopleId )>1)

create table tmpid as (select min(id) from userinfo group by uid having count(uid))

Delete HZT Where ID Not In (Select Max(ID) From HZT Group By Title)

5.删除表中多余的重复记录(多个字段),只留有rowid最小的记录

 

1、查找全部重复记录,info的主键uid为自增的id后。注:此处保留ID最大一条记录

delete from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)

数据量大时一定要为uid创建索引

三、举例

看来想偷懒使用一句命令完成这个事好像不太显示,还是老老实实的分步处理吧,思路先建立复制一个临时表,然后对比临时表内的数据,删除主表里的数据

 

1、查找表中多余的重复记录,重复记录是根据单个字段(peopleId)来判断

alter table tableName add autoID int auto_increment not null; 

create table tmp select min(autoID) as autoID from tableName group by Name,Address; 

create table tmp2 select tableName.* from tableName,tmp where tableName.autoID = tmp.autoID; 

drop table tableName; 

rename table tmp2 to tableName; 

create index indexuid on tmpuid

select * from people where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)

您可能感兴趣的文章:

  • mysql查找删除重复数据并只保留一条实例详解
  • Mysql删除重复的数据 Mysql数据去重复
  • MySQL中删除重复数据的简单方法
  • 删除MySQL重复数据的方法
  • MYSQL删除重复数据的简单方法
  • MySQL数据库中删除重复记录的方法总结[推荐]
  • 删除mysql数据库中的重复数据记录
  • Mysql删除重复数据保留最小的id 的解决方法

 

2、删除表中多余的重复记录,重复记录是根据单个字段(peopleId)来判断,只留有rowid最小的记录

create index indexid on tmpid

delete from people where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1) and rowid not in (select min(rowid) from people group by peopleId having count(peopleId )>1)

 

3、查找表中多余的重复记录(多个字段)

删除多余的重复记录,保留重复项中id最小的

select * from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)

 

4、删除表中多余的重复记录(多个字段),只留有rowid最小的记录

delete from user_info where id not in (select id from tmp_id) and uid in (select uid from tmp_uid)

delete from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)

 

5、查找表中多余的重复记录(多个字段),不包含rowid最小的记录

2.多字段重复

select * from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)

 

四、补充

由uid的重复间接的导致了relationship中的记录重复,故继续去重。先介绍正常处理流程,在介绍本人根据自身数据特点实践的更加有效的方法!

有两个以上的重复记录,一是完全重复的记录,也即所有字段均重复的记录,二是部分关键字段重复的记录,比如Name字段重复,而其他字段不一定重复或都重复可以忽略。

 

1、对于第一种重复,比较容易解决,使用

2.1一般方法

select distinct * from tableName

 

就可以得到无重复记录的结果集。

基本的同上面:

如果该表需要删除重复的记录(重复记录保留1条),可以按以下方法删除

 

select distinct * into #Tmp from tableName
drop table tableName
select * into tableName from #Tmp
drop table #Tmp

生成零时表

发生这种重复的原因是表设计不周产生的,增加唯一索引列即可解决。

 

2、这类重复问题通常要求保留重复记录中的第一条记录,操作方法如下

create table tmp_relation as (select source,target from relationship group by source,target having count(*)>1)

假设有重复的字段为Name,Address,要求得到这两个字段唯一的结果集

 

select identity(int,1,1) as autoID, * into #Tmp from tableName
select min(autoID) as autoID into #Tmp2 from #Tmp group by Name,autoID
select * from #Tmp where autoID in(select autoID from #tmp2)

create table tmprelationshipid as (select min(id) as id from relationship group by source,target having count(*)>1)

总结

 

以上就是这篇文章的全部内容了,希望本文的内容对大家的学习或者工作能带来一定的帮助,如果有疑问大家可以留言交流,谢谢大家对脚本之家的支持。

创建索引

您可能感兴趣的文章:

  • MySQL 查询某个字段不重复的所有记录
  • MySQL大表中重复字段的高效率查询方法
  • MySQL中distinct语句去查询重复记录及相关的性能讨论
  • mysql 查询重复的数据的SQL优化方案
  • mysql查询表里的重复数据方法
  • Mysql一些复杂的sql语句(查询与删除重复的行)
  • mysql删除重复记录语句的方法
  • mysql 数据表中查找重复记录
  • MySql避免重复插入记录的几种方法
  • MySQL 删除数据库中重复数据方法小结

 

create index indexid on tmprelationship_id

 

删除

 

delete from relationship where id not in (select id from tmprelationshipid) and (source,target) in (select source,target from relationship)

 

2.2 实践出真知

 

实践中发现上面的删除字段重复的方法,由于没有办法为多字段重建索引,导致数据量大时效率极低,低到无法忍受。最后,受不了等了半天没反应的状况,本人决定,另辟蹊径。

 

考虑到,估计同一记录的重复次数比较低。一般为2,或3,重复次数比较集中。所以可以尝试直接删除重复项中最大的,直到删除到不重复,这时其id自然也是当时重复的里边最小的。

 

大致流程如下:

 

1)选择每个重复项中id最大的一个记录

 

create table tmprelationid2 as (select max(id) from relationship group by source,target having count(*)>1)

 

2)创建索引(仅需在第一次时执行)

 

create index indexid on tmprelation_id2

 

3)删除 重复项中id最大的记录

 

delete from relationship where id in (select id from tmprelationid2)

 

4)删除临时表

 

drop table tmprelationid2

 

重复上述步骤1),2),3),4),直到创建的临时表中不存在记录就结束(对于重复次数的数据,比较高效)

 

查询及删除重复记录的方法

 

(一) 1、查找表中多余的重复记录,重复记录是根据单个字段(peopleId)来判断 select * from people where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)

 

2、删除表中多余的重复记录,重复记录是根据单个字段(peopleId)来判断,只留有rowid最小的记录 delete from people where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1) and rowid not in (select min(rowid) from people group by peopleId having count(peopleId )>1)

 

3、查找表中多余的重复记录(多个字段) select * from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)

 

4、删除表中多余的重复记录(多个字段),只留有rowid最小的记录 delete from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count() > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count()>1)

 

更改表user_info的主键uid为自增的id后,忘了设置原来主键uid属性为unique,结果导致产生uid重复的记录。为此需要清理...

编辑:数据库 本文来源:1、查找全部重复记录,info的主键uid为自增的id后

关键词:

  • 上一篇:没有了
  • 下一篇:没有了