Compare files


Yanosh

  • Member
  • ***
    • Posts: 106
    • Karma: +4/-0
on: July 24, 2020, 03:21:20 PM
How can I compare two directories or devices so I can see which files are missing on the other device and if the same file is copied elsewhere and buried in a different directory?

Sorry for my bad English.



magorium

  • Legendary Member
  • *****
    • Posts: 632
    • Karma: +62/-0
  • Convicted non contributor
Reply #1 on: July 24, 2020, 04:38:07 PM
Quote
How can I compare two directories or devices so I can see which files are missing on the other device and if the same file is copied elsewhere and buried in a different directory?

If you are looking for a single command that can do all of that for you, then the answer is that there isn't one (unless someone programs that functionality explicitly into a single command/script).

Usually, that would be a two-step process:
1. Compare the drawers and their contents.
2. If a file is missing, search for that file elsewhere.

I'm perhaps not the smartest of people when it comes to using shell commands, but I can at least try to help you retrieve a list of missing/different files. The simplest way to do that, imho, is to use the command diff from the development package.

You can invoke diff to compare two drawers:
diff drawer1 drawer2

To make diff report only which files are missing or different, rather than showing the differences themselves, you can use the option -q
diff -q drawer1 drawer2

But that will not compare the sub-drawers. For that we need recursion, with the option -r
diff -qr drawer1 drawer2
That should be able to get you a list of all files that differ or are missing, comparing the provided drawers and their sub-drawers.

More information about the diff command: https://www.man7.org/linux/man-pages/man1/diff.1.html
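
To give an idea of what that last command prints, here is a small sketch that builds two throw-away drawers and compares them. The helper name compare_drawers and all drawer and file names are made up for the example, and it assumes a POSIX-like shell with a GNU-style diff, so treat it as an illustration rather than something tested on AROS.

```shell
#!/bin/sh
# Hypothetical demo: the helper name and every drawer/file name are invented.

# Print one line per file that is missing or different.
compare_drawers() {
    # -q: only say THAT files differ, not how; -r: descend into sub-drawers.
    # diff exits with status 1 when it finds differences; that is not an
    # error here, so only a status above 1 (real trouble) is passed on.
    LC_ALL=C diff -qr "$1" "$2" || [ $? -eq 1 ]
}

# Build two small drawers in a temporary location.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/drawer1/sub" "$demo_dir/drawer2/sub"
echo "same"      > "$demo_dir/drawer1/same.txt"
echo "same"      > "$demo_dir/drawer2/same.txt"
echo "version 1" > "$demo_dir/drawer1/sub/changed.txt"
echo "version 2" > "$demo_dir/drawer2/sub/changed.txt"
echo "only here" > "$demo_dir/drawer1/sub/missing.txt"

# Compare them, then clean up.
(cd "$demo_dir" && compare_drawers drawer1 drawer2)
rm -rf "$demo_dir"
```

With a GNU-style diff this should report that sub/changed.txt differs and that missing.txt exists only in drawer1/sub. same.txt is not mentioned because both copies are identical.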


For the second part (looking for that file elsewhere), you could make use of the command find (also part of the development package).

find search_location -type f -name "filename_to_search_for"

More information about the find command: https://man7.org/linux/man-pages/man1/find.1.html


Depending on the results from step 1, they could perhaps be piped to the find command used in step 2.

If step 1 only gives you a small list of results, it would perhaps be easier to search for these files manually, using the find command for each individual missing/different file.

Of course it is always possible to create some batch script as well.
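
As a rough idea of what such a script could look like, the sketch below feeds the "Only in ..." lines from diff -qr to find. The function name find_missing_elsewhere and its three arguments are hypothetical, it assumes a POSIX-like shell with GNU-style diff messages, and it will get confused by file names that contain ": " or wildcard characters.

```shell
#!/bin/sh
# Sketch only: find_missing_elsewhere is a made-up helper, not an existing tool.
# Arguments: source drawer, backup drawer, location to search for strays.
# Pass the drawers without a trailing slash.
find_missing_elsewhere() {
    src=$1
    dst=$2
    where=$3
    # Step 1: list what differs. diff exits with 1 when the drawers differ,
    # which is expected here, so only a higher status counts as a failure.
    { LC_ALL=C diff -qr "$src" "$dst" || [ $? -eq 1 ]; } |
    while IFS= read -r line; do
        case $line in
            # Keep only entries that exist in the source but not in the backup.
            "Only in $src: "* | "Only in $src/"*": "*)
                name=${line##*: }    # text after the last ": " is the name
                echo "Missing from $dst: $name"
                # Step 2: look for a file with the same name elsewhere.
                find "$where" -type f -name "$name" | sed 's/^/    found at: /'
                ;;
        esac
    done
}
```

It would be called as find_missing_elsewhere drawer1 drawer2 search_location. It only matches on the file name, so a hit still has to be checked by hand before deciding that it really is the same file.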

At how many locations do you need to start searching for the different/missing files ?


Yanosh

Reply #2 on: July 27, 2020, 02:15:56 PM
Quote
If you are looking for a single command that can do all of that for you, then the answer is that there isn't one (unless someone programs that functionality explicitly into a single command/script).

I need to back up two 40 GB hard disks. It's hard and boring to do it manually.

Quote
You can invoke diff to compare two drawers:
diff drawer1 drawer2

To make diff report only which files are missing or different, rather than showing the differences themselves, you can use the option -q
diff -q drawer1 drawer2

But that will not compare the sub-drawers. For that we need recursion, with the option -r
diff -qr drawer1 drawer2
That should be able to get you a list of all files that differ or are missing, comparing the provided drawers and their sub-drawers.

Thanks. :-) This will help me a lot.

Quote
Of course it is always possible to create some batch script as well.

I don't know how to do it.

Quote
At how many locations do you need to start searching for the different/missing files ?

I don't understand what you mean.



magorium

Reply #3 on: July 30, 2020, 02:33:41 AM
Quote
I need to back up two 40 GB hard disks. It's hard and boring to do it manually.
Pardon me, but wouldn't it be better to use an actual backup program in that case ?

I seem to remember user Nigel Tromans made such a tool some years ago.


edit:
I have no experience with the tool myself. Found it here http://archives.aros-exec.org/index.php?function=showfile&file=utility/shell/backupcopy_v1.0.i386-aros.tar.gz

Quote
I don't know how to do it.
Well, atm me neither because....

Quote
I don't understand what you mean.
I meant the "elsewhere" part in your sentence (see below).

Quote
... and if the same file is copied elsewhere and buried in a different directory
How many "elsewheres" are there ? How should all these elsewheres be determined ? Should it be an option given by the user ? Are these elsewheres all standard locations ? Does this elsewhere contain any saints, doctors or other healthcare workers ?

And what should such a tool do in case it does detect the same file elsewhere ?

Mind that depending on your answer it could literally take days before such a tool is able to provide the correct answer, especially in the case elsewhere actually means anywhere for you.

Even backup programs do not offer such a feature, other than keeping track of already backed-up files in a database that is searched before an actual backup takes place.
« Last Edit: July 30, 2020, 03:34:57 AM by magorium »



Yanosh

Reply #4 on: August 01, 2020, 02:00:09 PM
Quote
I have no experience with the tool myself. Found it here http://archives.aros-exec.org/index.php?function=showfile&file=utility/shell/backupcopy_v1.0.i386-aros.tar.gz

I was not aware of its existence. And it's on Icaros too... :/ I'm a bit scared by programs like that. I can compare the source and the destination with "brik" (do you remember it? :) ), but this only works well with a small number of files... or with a full backup. When it comes to updating an old backup, things are more complicated, because I may have an old version of a file that I want to keep safe while copying the new one under a new name. If the backup program copies and deletes files on its own, something I care about can be deleted, and I won't know until I need that file again.

Quote
Quote
... and if the same file is copied elsewhere and buried in a different directory
How many "elsewheres" are there ? How should all these elsewheres be determined ? Should it be an option given by the user ? Are these elsewheres all standard locations ? Does this elsewhere contain any saints, doctors or other healthcare workers ?

Does this mean that I used this word in the wrong way? If so, I apologize. My English is pretty basic and sometimes I make mistakes. I'm really sorry for this. :(

Quote
And what should such a tool do in case it does detect the same file elsewhere ?

Maybe something like "A copy of the XYZ file was found in the ZYX folder. Do you want to copy, delete or move it?" would be nice. :)

Quote
Mind that depending on your answer it could literally take days before such a tool is able to provide the correct answer, especially in the case elsewhere actually means anywhere for you.

Checking files with brik takes a long time anyway, so this is not a problem for me. I can have my computer work on it all night if needed.



magorium

Reply #5 on: August 02, 2020, 06:42:11 AM
Quote
I'm a bit scared by programs like that.
I can understand that. It is always difficult to let some program handle your files. You are putting a lot of trust in software that needs to handle them with care, especially if you are not familiar with it.

Quote
I can compare the source and the destination with "brik" (do you remember it? :) )
Yeah, I remembered :-)

Quote
Does this mean that I used this word in the wrong way? If so, I apologize. My English is pretty basic and sometimes I make mistakes. I'm really sorry for this. :(
No, no, no. I am sorry. Please don't let me confuse you. If it is used the wrong way, then it was my doing.

I made a reference (as a bad joke) to some television show about a hospital that aired in the States in the (I believe) seventies/eighties which was named St. Elsewhere.

Quote
Maybe something like "A copy of the XYZ file was found in the ZYX folder. Do you want to copy, delete or move it?" would be nice. :)
Note that this is exactly the kind of thing that can cause trust issues with a piece of software.

Because when you write "Do you want to copy, delete or move it?", which file should be copied, deleted or moved ? The one from the source location ?

And in case it is the one from the destination location, where should it be copied to, or from where should it be moved or deleted ?

These are all nitty-gritty details that are hard for a back-up program to get right, even if they seem pretty easy to you as a human.

We humans have a much better understanding of these kinds of things that we think are logical. But even as we speak to each other about it, there can be confusion about what exactly was meant by certain words.

Quote
Checking files with brik takes a long time anyway, so this is not a problem for me. I can have my computer work on it all night if needed.
True, but if you take into account the interaction that was talked about in the paragraph above, then this can become quite a burden, as you would need to watch the process in order to be able to react to questions from such software.

These kinds of things that you talked about are exactly the reason why it is not that easy to implement a good (in what a specific user considers to be good) back-up program.

Because a good program would have to take the wishes of every individual user into account, and at the same time that makes such a program complicated to use (or at the least ambiguous when it comes to certain settings/preferences and/or questions).

Not saying it is impossible to realise, but just not as easy to implement as you perhaps might think.

In case it makes you feel better, I still use a system of backup that is only partially automated. Most things I still do by manually copying directories.

For situations where I need to keep things in sync, I use rsync. But that only works for me as long as I as a user do not interfere, meaning there are most definitely multiple copies of the same file scattered around (simply because I make mistakes).
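
For anyone wary of letting a program touch their files, rsync can also be asked to only report what it would do. The sketch below is a minimal illustration on a *nix system where rsync is installed; the helper name preview_sync is made up, and nothing here has been tried on AROS.

```shell
#!/bin/sh
# Sketch only: preview_sync is a hypothetical helper name.
# It asks rsync what it WOULD copy from source to destination, without copying.
preview_sync() {
    # -r  recurse into sub-directories
    # -c  compare files by checksum instead of size and time
    # -n  dry run: report only, change nothing
    # -i  itemize: print one line per item that would be changed
    # The trailing slashes mean "the contents of" each directory.
    rsync -rcni "$1/" "$2/"
}
```

Calling preview_sync source_drawer backup_drawer prints the files that are new or different in the source and leaves both drawers untouched. Dropping the -n would turn the preview into a real copy, so it is worth reading the list first.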


Yanosh

Reply #6 on: August 04, 2020, 01:10:55 PM
Quote
I made a reference (as a bad joke) to some television show about a hospital that aired in the States in the (I believe) seventies/eighties which was named St. Elsewhere.

Ah, now I understand it. :D I think you are talking about this:

https://en.wikipedia.org/wiki/St._Elsewhere
https://it.wikipedia.org/wiki/A_cuore_aperto

I don't remember it, maybe I've never seen it. Here in Italy it was called "A cuore aperto".

Quote
And in case it is the one from the destination location, where should it be copied to, or from where should it be moved or deleted ?

...

True, but if you take into account the interaction that was talked about in the paragraph above, then this can become quite a burden, as you would need to watch the process in order to be able to react to questions from such software.


I'd like to have a modified version of Brik that checks all the files and writes a file to RAM: with a list of the duplicate files and their locations. Then I can check the duplicate files manually and choose the right action to perform later.

Quote
These kinds of things that you talked about are exactly the reason why it is not that easy to implement a good (in what a specific user considers to be good) back-up program.

I'm happy with a simple program that checks on the whole source/destination for duplicated and missing files. :)

Quote
In case it makes you feel better, I still use a system of backup that is only partially automated. Most things I still do by manually copying directories.

I like to copy files manually too... :) but it's a pain when the source has updated and moved files.

Quote
For situations where I need to keep things in sync, I use rsync. But that only works for me as long as I as a user do not interfere, meaning there are most definitely multiple copies of the same file scattered around (simply because I make mistakes).

Is this program available on AROS?



magorium

Reply #7 on: August 06, 2020, 08:27:50 AM
Quote
Ah, now I understand it. :D I think you are talking about this:
Yeah, that was the tv-show indeed  ;D

Quote
I don't remember it, maybe I've never seen it. Here in Italy it was called "A cuore aperto".
I do not think you missed out on anything important there. It was nice to watch whilst eating dinner  ::)

Point being that Elsewhere can mean many places/locations  :D

Again, I'm sorry for the mix-up/bad joke.

Quote
I'd like to have a modified version of Brik that checks all the files and writes a file to RAM: with a list of the duplicate files and their locations. Then I can check the duplicate files manually and choose the right action to perform later.
Oh, wait... what ?

Could you help me refresh my memory ?

Brik did not actually copy files on its own, now did it ? It only calculated the checksums if I remember correctly and then generated a list of those checksums ?

I'm actually a bit unsure what functionality you think is missing in brik to make it work the way you would like.

Could you perhaps explain that (again) for me the way you would speak to an 8-year-old (as I would like to avoid misunderstandings) ?

Quote
I'm happy with a simple program that checks on the whole source/destination for duplicated and missing files. :)
Ok, ok. So indeed not actually copying/syncing files but only generating a list ?

Quote
Is this program available on AROS?
Not by default. Originally it is a *nix program, but it seems there is a version located on aminet for Amiga.

As far as I am able to tell there are three issues with that version located at aminet:
1. I have no idea if it is compilable for AROS
2. I have no idea how well it works on AROS
3. The aminet version is fairly outdated.

It is a tool that actually copies (or rather, syncs) files, based on a whole lot of options that the user can provide to help decide what the program should do when it encounters a certain situation.

But if all you require is just a list of different/missing files, based on two locations, additionally searching for such a missing file in either of those two locations, then that should be doable with a simple program and/or script.

Unfortunately, atm I'm quite busy in a way that I am not really able to concentrate on programming (this covid crap is giving us headaches here) :-/ Are you in a hurry ?

In case you are in a hurry then you could perhaps try to have a look at some of the other back-up programs/tools that are present on aminet. Perhaps there is a title amongst them that might be able to help you out.

Also, Chris Handley made some sort of backup tool with his version of the E language.


Yanosh

Reply #8 on: August 08, 2020, 02:50:39 PM
Quote
Point being that Elsewhere can mean many places/locations  :D

Again, I'm sorry for the mix-up/bad joke.

Don't worry about it. :) It's something that often happens to me. I read some comics online and I miss a lot of references, so I understand only half of the jokes made there. The other half is lost on me because I don't have the same background as the author of the comic.

Quote
Brik did not actually copy files on its own, now did it ? It only calculated the checksums if I remember correctly and then generated a list of those checksums ?

You are right. Brik doesn't copy files. :)

Quote
I'm actually a bit unsure what functionality you think is missing in brik to make it work the way you would like.

Could you perhaps explain that (again) for me the way you would speak to an 8-year-old (as I would like to avoid misunderstandings) ?

This is what I'd like to add to Brik:
1) A file on ram: with the list of missing files.
2) A file on ram: with the list of files with a bad checksum.
3) A file on ram: with the duplicate files found, including files with different names, and the full path to them, so I can decide later what to do with them.
4) Something that shows me it's still working. If I check my hd I can see the hd led blinking, but if I check a usb stick without a led, I can't see whether it's still working or has crashed without warning. I also had a usb stick that kept blinking even though the program had crashed due to a file transfer error. :(
5) An option to check only for duplicate files, even with different names.

With >ram:"output.txt" I can only get the bad checksum list, and all the other info is lost because the shell buffer overflows very quickly.
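
Items 3 to 5 could perhaps be approximated with standard tools instead of a modified brik. The following is only a sketch under several assumptions: a POSIX-like shell with find, cksum and awk is available, the function name list_duplicates and the report layout are invented, and file names contain no tabs or line breaks. cksum is a plain CRC, so two files listed together should still be confirmed (for example with cmp) before anything is deleted.

```shell
#!/bin/sh
# Sketch only: list_duplicates is a hypothetical helper, not a feature of brik.
# It writes groups of files with identical checksum and size, whatever their
# names, to a report file.
list_duplicates() {
    root=$1
    report=$2
    find "$root" -type f |
    while IFS= read -r file; do
        printf '.' >&2    # one dot per file on stderr, to show it is still working
        # With input on stdin, cksum prints "<checksum> <size>" without a name.
        cksum < "$file" | {
            read -r sum size
            printf '%s-%s\t%s\n' "$sum" "$size" "$file"
        }
    done |
    awk -F '\t' '
        # Collect every path under its "checksum-size" key.
        { count[$1]++; paths[$1] = paths[$1] "    " $2 "\n" }
        END {
            # Report only keys that were seen more than once.
            for (key in count)
                if (count[key] > 1)
                    printf "Same content (%s):\n%s", key, paths[key]
        }
    ' > "$report"
    echo >&2    # end the line of progress dots
}
```

A call such as list_duplicates /path/to/backup /tmp/duplicates.txt would scan the first location and write the groups to the second. The report file should live outside the scanned location, so that it is not read while it is being written.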

Quote
Ok, ok. So indeed not actually copying/syncing files but only generating a list ?

I'm doing the backup manually. I only need to be sure that all the files are ok.

Quote
As far as I am able to tell there are three issues with that version located at aminet:
1. I have no idea if it is compilable for AROS
2. I have no idea how well it works on AROS
3. The aminet version is fairly outdated.

Better to do the backup manually, so if something goes wrong I can only blame myself. :)

Quote
But if all you require is just a list of different/missing files, based on two locations, additionally searching for such a missing file in either of those two locations, then that should be doable with a simple program and/or script.

I think a coder could modify brik the way I need with just a little work. Sadly I'm not a coder. :(

Quote
Unfortunately, atm I'm quite busy in a way that I am not really able to concentrate on programming (this covid crap is giving us headaches here) :-/

Hope this will end soon. Too many people have already died because of it.

Quote
Are you in a hurry ?

Not anymore. :) I had some issues with my hd, so I had to sync the old backup. I did it the old way with brik.
« Last Edit: August 08, 2020, 02:55:56 PM by Yanosh »



magorium

Reply #9 on: August 14, 2020, 01:27:28 AM
@yanosh,


Thank you for the answers. I'll see what I can do the moment I have some time to spare and a clear head (and this darn heatwave has come to an end here).


Yanosh

Reply #10 on: August 17, 2020, 12:17:45 PM
Thank you for your help. I'm not in a hurry, not now at least. :) I can wait.