Bug 113170 - Better estimation of download times
Summary: Better estimation of download times
Status: RESOLVED FIXED
Alias: None
Product: ktorrent
Classification: Applications
Component: general (show other bugs)
Version: unspecified
Platform: Compiled Sources Linux
: NOR wishlist
Target Milestone: ---
Assignee: Ivan Vasic
URL:
Keywords:
: 134994 (view as bug list)
Depends on:
Blocks:
 
Reported: 2005-09-23 23:15 UTC by Frank Osterfeld
Modified: 2006-10-03 16:21 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Sample moving-average program (4.68 KB, text/plain)
2005-09-28 04:19 UTC, Thiago Macieira
Details
Custom source for testing purposes (904.54 KB, application/x-gzip)
2005-10-04 17:20 UTC, Ivan Vasic
Details
Patch file that will enable data logging (3.12 KB, patch)
2005-10-06 02:14 UTC, Ivan Vasic
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Frank Osterfeld 2005-09-23 23:15:01 UTC
Something that always annoys me with most if not all P2P clients and apps that 
manage downloads in general, is the stupidity of the algorithm used for 
download time estimation. They all seem to use "bytes remaining / current 
speed" which is totally useless unless you got a nearly constant download 
speed (which is never the case for me when using p2p). Mostly, download speed 
goes up and down heavily, so using the current speed for estimation is of no 
use. I wonder if we could use better algorithms for estimation. A simple one 
should be  
 
(bytes downloaded/time elapsed)*(bytes left) 
 
This needs a bit more work though as it needs the app to record the time 
elapsed (as you can't just use the age of the file, because of shutdowns/not 
running ktorrent in between, connection could be down...), maybe 
connecting/disconnecting from the tracker could start/stop a time counter or 
something.  
One could think of even more complex algos, that take trends into account 
(e.g. the dl speed could increase over time, when more peers get available) 
 
When I find the time I will log some torrent downloads and analyze them, and 
have a look which estimation algorithm could be used to minimize the 
prediction error...
Comment 1 Joris Guisson 2005-09-24 11:59:45 UTC
You're proposing average speed * bytes left, we allready keep track of the average speed so this is possible. But I don't think this is a good algorithm :
- Say the first part of a download runs at 10 KB/s on average
- Suddenly we connect to a big seeder who starts sending us stuff at 100 KB/s
- The average speed will only increase slowly because of the long 10 KB/s part, while at the same time we're getting stuff much faster, so the prediction isn't accurate.
Comment 2 Frank Osterfeld 2005-09-27 18:45:03 UTC
Well, I don't say that is a particularily good algorithm. But it is better than using the current speed. It depends how long the 10KB/s phase is, for me it's mostly slow in the beginning, and then goes up and down something like a sine curve (well, at least up and down). A really good algorithm would take the slow start into account and detect trends. I would like to play with different algos (in the next weeks when I find time), for stats logging it would be useful to have "real data" from different people... 
Writing something like (peers, peers connected, bytes downloaded) every x seconds to a text file should do I guess.

Or am I thinking too complicated and there are already clients with good estimation methods ktorrent could copy? ;-)
Comment 3 Joris Guisson 2005-09-27 19:58:18 UTC
The average speed in the last few seconds, might be a better algorithm. It will be less jumpy then the current speed, and should adapt to a trend change pretty quickly (depends on the size of the chosen interval).
Comment 4 Thiago Macieira 2005-09-28 04:19:04 UTC
Created attachment 12742 [details]
Sample moving-average program

I suggest a moving-average algorithm, for instance, of 10 minutes. If you
sample the download speed every 30 seconds (using a QTimer and storing the
current speed ktorrent already calculates), you need 20 samples only to make
this moving-average algorithm in a circular buffer.

I am attaching a sample program I made several years ago that does just that.
It uses a 8192-point array to calculate moving averages of 10, 30, 60, 180 and
300 seconds. It also calculates the instantaneous (250 ms) and overall transfer
speed.

Note the program stores file sizes, not the speed, so the math is much simpler.
Also note the program will not work for downloads running longer than 12.5 days
(the "current" variable will overflow).
Comment 5 Joris Guisson 2005-09-28 20:06:02 UTC
After some googling I think I understand this moving-average thing :

You have got a bunch of samples (speed every X seconds) :
s0 .. s19 

And then take the average of these samples to calculate the time_left.

But what role does the current speed play, do we take this into account when we calculate the average of the samples ?

Or am I totally not getting it ? 
Comment 6 Frank Osterfeld 2005-09-28 20:36:13 UTC
I played a bit in oocalc with some simple (artificial) examples. As they are not real data, and I only tried 2-3 simple downloads, I wouldn't give too much on these observations though.

Current Speed Algo (let's call it CSA) sucks, the prediction error goes up to astronomical heights(well, no wonder, the error approaches infinity when the speed goes down to 0.0). It performs relatively well at the end of the download though.
(My current speed isn't a "real" current speed though, it's the average speed in ten seconds)

Global Average Speed Algorithm (=: GASA, my first simple proposal) seems to be not too bad, but it gets worse at the end of the download: The relative error (abs(actual_time_left-ETA)/time_left_actual) increases, so it can happen it predicts 4 times the time actually needed. Using the current speed as done now seems to perform better here.

I tested Thiago's approach using the average over the last 6 speed values (6*10 secs = a minute in my example), and it was somewhere in between: The peaks were higher than with GASA, but it was better in the end phase. I guess it makes more sense to use a window of minutes instead of seconds.

I also hacked up a little logger, logging state changes (activated, deactivated, finished) and, when active, every 10 seconds (bytes downloaded, bytes left, seeders total/connected, leechers total/connected) to a per-torrent logfile. So I will continue playing with the algos if I have some nice real-world examples.
Comment 7 Thiago Macieira 2005-09-29 03:23:18 UTC
> But what role does the current speed play, do we take this into account when
> we calculate the average of the samples ?

Well, the current speed is the latest sample.

Just remember two important details:
1) the number of samples is the delay in iterations it takes for the algo to work. So, if you're taking a 10-minute average, the full list of samples won't be filled until 10 minutes have elapsed, so you have to account that into the start sequence (i.e., use only the samples that were read)

2) no one wants to see the speed update once every 30 seconds. This means you must take one sample every second or more often (mine did 4 per second), or find a way to update the speed every second. One trick could be to average the 10-minute moving average with the current speed.
Comment 8 Frank Osterfeld 2005-10-02 18:34:24 UTC
Here some charts, generated from the stats of a 175MB torrent. There were about 800 seeders and 400 leechers available, the download took 3h40m though  (btw, azureus is still much faster, for whatever reason).

The speed (calculated from samples taken every 10 seconds):

http://www-user.rhrk.uni-kl.de/~f_osterf/temp/estimation1-speed.png

The relative error (abs(estimation-time_left)/time_left) of three algorithms:

http://www-user.rhrk.uni-kl.de/~f_osterf/temp/estimation1.png

CSA: uses the average speed between the last two samples, that is 10 seconds 
GASA: the average speed from the beginning on, calculated from all samples
WIN5MIN: uses the average speed of the last 5 minutes, calculated from the current sample and the one which was taken 5 minutes back.

Ignore the first 15 pixels of WIN5MIN with relative error of 15, this is wrong, I just was too lazy to adjust the formula for the first 5 minutes.

In this example, GASA is the best in the beginning of the download, but  WIN5MIN is better in the last 50%. GASA just sucks here. 
CSA is inacceptable all the time, except at the very end.

It would be interesting to to find the optimal window size for the moving average algo. Well, I certainly won't continue evaluation with openoffice (1.1.3), as the handling drives me crazy and it gets damn slow if you have let's say 1300 samples rendered. ;-)
Maybe some ruby scripts and gnuplot or make it more convenient.
Comment 9 Ivan Vasic 2005-10-02 19:11:27 UTC
Maybe combination of WIN5MIN and moving average will do the trick? 
Let's say I take samples every second and calculate average speed after X samples (seconds). Until this point I can use CSA just to avoid 0 for time_left. After X samples I have the first average speed which I can use to calculate time_left. After this point I calculate average speed every X samples by (last_average+current_average)/2 and this value becomes last_average. The next calculation will do the same thing etc... We can test X value for optimal results (or even make it relative to torrent filesize?). 
I'm worried about some download stages, though:
- 0 dl speeds in some download stages
- End stages

Note that I may have misunderstood moving average algorithm...
Comment 10 Frank Osterfeld 2005-10-02 19:33:49 UTC
It could also use different algorithms for different phases, like 

return (bytes_downloaded/bytes_total < 0.5) ? gasa.estimate() : win5min.estimate();

For the first 50% use the total average, after that moving average (or whatever turns out to be a good choice).

Unless you do this 50 times a second, I don't think a slightly more complex algorithm would do any harm to performance. (And I don't think the rate has to be updated all the time. E.g., if your estimation is 01h23m12s, the seconds are totally random and useless noise anyway and could just be omitted. (something the ms windows copy dialog gets right IMHO))

And estimation code could go to a special Estimator class, so you don't have to mess with it in the rest of the code.
Comment 11 Ivan Vasic 2005-10-03 20:13:50 UTC
Frank, I'm going to make a 'custom' build of KTorrent with Estimator class that will have at least CSA, GASA and WIN5MIN algorithms (maybe I'll add one more). KT will log the values from all algorithms and we can easily extract this values from KT log file and import comma-separated values into some plot application. I'll try doing this myself but since I have veeery slow connection maybe it would be nice that you test it and attach results here, if it's not a problem?
Comment 12 Frank Osterfeld 2005-10-03 22:43:46 UTC
Is it necessary to log the algorithm results? While downloading, we just need to get the data to feed the algorithms with later. The actual algorithms can work on that logged data. So we can collect a bunch of logs and try any algorithms on them. When we just log the data as we would pass them to the estimator, there is absolutely no difference between "real" estimation and simulation.
About your slow connection, I think it would be interesting to see whether that makes a difference for estimation. The more variation in the test cases the better.
Comment 13 Ivan Vasic 2005-10-04 01:10:52 UTC
You're right. I was thinking of making things easier but that was not really necessary.
KT update interval (speed and estimated time left) is 1 second. I don't think we're going to change this (or use another timer) because of the performance.
I can make KT log speed (or bytes_downloaded) every second and we can use this data for testing.
Comment 14 Ivan Vasic 2005-10-04 01:30:12 UTC
Could we, please, continue this discussion on KT forum: http://ktorrent.pwsp.net/forum/viewtopic.php?p=116#116 ?
Comment 15 Ivan Vasic 2005-10-04 17:20:19 UTC
Created attachment 12844 [details]
Custom source for testing purposes

Here's the source as explained here:
http://ktorrent.pwsp.net/forum/viewtopic.php?p=119#119
Comment 16 Ivan Vasic 2005-10-06 02:14:11 UTC
Created attachment 12878 [details]
Patch file that will enable data logging
Comment 17 Frank Osterfeld 2005-10-06 09:47:30 UTC
I created a page where I'll put all scripts, logs, analysis results related to this:

http://www-user.rhrk.uni-kl.de/~f_osterf/projekte/estimation/

First content is a script that reads the log file and splits it into logs for every torrent.
Comment 18 Ivan Vasic 2006-10-02 14:43:31 UTC
*** Bug 134994 has been marked as a duplicate of this bug. ***
Comment 19 Frank Osterfeld 2006-10-02 17:07:51 UTC
> Or the opposite, current speed is much faster than average, it will say 2
> hours remaining, then finish in under 10 minuter.

It's probably more likely that the speed will go down again and oscillate around the average.

> The calculation should be remaining size of download divided by current d/l
> speed. 

...which calculates a completely wrong ETA if the download rate is unsteady. Imagine a download where you have 0, 0, 0, 20, 20, 20, 0, 0, 0, 20, 20, 20, 0... KB/s. So the average is 10 KB/s. The ETA will jump between "infinite" and the half of the real time needed. If the rate is really flaky, the ETA will change constantly and be of no use at all. Average will work much better in this case.
I took some example logs a year back and used some different estimation algorithms (current speed, average speed, sliding average (using the average from the last n minutes). IIRC using average speed was better than current speed, sliding average was best for certain values of n.
I am not sure what ktorrent uses these days though. It seems to be average or some kind of sliding average, at least the estimation is more stable than it was a year back.


 
Comment 20 Ivan Vasic 2006-10-02 17:14:05 UTC
I somehow forgot about this, but I'm going to fix it these days.

KTorrent currently uses combination of global average speed algorithm and, as we called it, CSA algorithm.

I created an estimator class and so far implemented CSA, GASA and WINX algorithms. It will be capable to use different algorithm for different download phase and as soon as I finish moving average algorithm I'll check out those logs and tests and I'll see what combination will suite best.
Comment 21 Ivan Vasic 2006-10-03 16:21:15 UTC
SVN commit 591970 by ivasic:

Added ETA calculator class.

BUG:113170

 M  +8 -18     apps/ktorrent/ktorrentviewitem.cpp  
 M  +7 -0      libktorrent/interfaces/torrentinterface.h  
 M  +20 -20    libktorrent/torrent/Makefile.am  
 A             libktorrent/torrent/timeestimator.cpp   [License: GPL (v2+)]
 A             libktorrent/torrent/timeestimator.h   [License: GPL (v2+)]
 M  +9 -0      libktorrent/torrent/torrentcontrol.cpp  
 M  +9 -0      libktorrent/torrent/torrentcontrol.h  


--- trunk/extragear/network/ktorrent/apps/ktorrent/ktorrentviewitem.cpp #591969:591970
@@ -26,7 +26,6 @@
 #include <interfaces/functions.h>
 #include "ktorrentviewitem.h"
 
-
 using namespace bt;
 using namespace kt;
 /*
@@ -128,27 +127,18 @@
 	}
 	else if (s.running) 
 	{
-		float bytes_downloaded = (float)s.bytes_downloaded;
-		if( bytes_downloaded < 1 ) //if we just started download use old algorithm
+		Uint32 secs = tc->getETA();
+		if(secs == -1)
 		{
-			if (s.download_rate == 0)
-			{
-				setText(7,i18n("infinity"));
-				eta = -2;
-			}
-			else
-			{
-				Uint32 secs = (int)floor( (float)s.bytes_left_to_download / (float)s.download_rate);
-				eta = secs;
-				setText(7,DurationToString(secs));
-			}
+			setText(7,i18n("infinity"));
+			eta = -2;
 		}
-		else 
+		else
 		{
-			double avg_speed = (double)bytes_downloaded / (double)tc->getRunningTimeDL();
-			eta = (Int64)floor(s.bytes_left_to_download / avg_speed);
-			setText(7,DurationToString((int)floor(s.bytes_left_to_download / avg_speed)));
+			eta = secs;
+			setText(7,DurationToString(secs));
 		}
+			
 	}
 	else
 	{
--- trunk/extragear/network/ktorrent/libktorrent/interfaces/torrentinterface.h #591969:591970
@@ -279,7 +279,14 @@
 		///Is manual announce allowed?
 		virtual bool announceAllowed() = 0;
 		
+		
 		/**
+		 * Returns estimated time left for finishing download. Returned value is in seconds.
+		 * Uses TimeEstimator class to calculate this value.
+		 */
+		virtual Uint32 getETA() = 0;
+		
+		/**
 		 * Verify the correctness of all data.
 		 * @param lst The listener
 		 * @param auto_import Wether or not this is an initial import
--- trunk/extragear/network/ktorrent/libktorrent/torrent/Makefile.am #591969:591970
@@ -5,27 +5,27 @@
 noinst_LTLIBRARIES = libtorrent.la
 libtorrent_la_LDFLAGS = $(all_libraries)
 noinst_HEADERS = bencoder.h value.h bdecoder.h bnode.h torrent.h chunkmanager.h \
-								chunk.h peer.h globals.h peermanager.h tracker.h downloader.h authenticate.h \
-								uploader.h torrentcontrol.h choker.h chunkdownload.h speedestimater.h \
-								peerdownloader.h request.h piece.h peerid.h announcelist.h peeruploader.h uploadcap.h \
-								packetreader.h packetwriter.h packet.h httptracker.h udptracker.h cache.h \
-								singlefilecache.h multifilecache.h torrentcreator.h downloadcap.h torrentfile.h \
-								chunkselector.h server.h serverauthenticate.h authenticatebase.h udptrackersocket.h \
-								upspeedestimater.h ipblocklist.h queuemanager.h chunkcounter.h oldchokealgorithm.h \
-								newchokealgorithm.h statsfile.h preallocationthread.h cap.h advancedchokealgorithm.h \
-								cachefile.h dndfile.h authenticationmonitor.h peersourcemanager.h
+									chunk.h peer.h globals.h peermanager.h tracker.h downloader.h authenticate.h \
+									uploader.h torrentcontrol.h choker.h chunkdownload.h speedestimater.h \
+									peerdownloader.h request.h piece.h peerid.h announcelist.h peeruploader.h uploadcap.h \
+									packetreader.h packetwriter.h packet.h httptracker.h udptracker.h cache.h \
+									singlefilecache.h multifilecache.h torrentcreator.h downloadcap.h torrentfile.h \
+									chunkselector.h server.h serverauthenticate.h authenticatebase.h udptrackersocket.h \
+									upspeedestimater.h ipblocklist.h queuemanager.h chunkcounter.h oldchokealgorithm.h \
+									newchokealgorithm.h statsfile.h preallocationthread.h cap.h advancedchokealgorithm.h \
+									cachefile.h dndfile.h authenticationmonitor.h peersourcemanager.h timeestimator.h
 					
 libtorrent_la_SOURCES = bencoder.cpp value.cpp bdecoder.cpp bnode.cpp \
-		torrent.cpp chunkmanager.cpp chunk.cpp peer.cpp globals.cpp peermanager.cpp \
-		tracker.cpp downloader.cpp authenticate.cpp uploader.cpp torrentcontrol.cpp \
-		choker.cpp chunkdownload.cpp speedestimater.cpp peerdownloader.cpp request.cpp \
-		piece.cpp peerid.cpp announcelist.cpp peeruploader.cpp uploadcap.cpp \
-		packetreader.cpp packetwriter.cpp packet.cpp httptracker.cpp udptracker.cpp cache.cpp \
-		singlefilecache.cpp multifilecache.cpp torrentcreator.cpp downloadcap.cpp torrentfile.cpp \
-		chunkselector.cpp server.cpp serverauthenticate.cpp authenticatebase.cpp \
-		udptrackersocket.cpp upspeedestimater.cpp ipblocklist.cpp queuemanager.cpp chunkcounter.cpp \
-		newchokealgorithm.cpp statsfile.cpp preallocationthread.cpp cap.cpp \
-		advancedchokealgorithm.cpp cachefile.cpp dndfile.cpp authenticationmonitor.cpp \
-	peersourcemanager.cpp
+			torrent.cpp chunkmanager.cpp chunk.cpp peer.cpp globals.cpp peermanager.cpp \
+			tracker.cpp downloader.cpp authenticate.cpp uploader.cpp torrentcontrol.cpp \
+			choker.cpp chunkdownload.cpp speedestimater.cpp peerdownloader.cpp request.cpp \
+			piece.cpp peerid.cpp announcelist.cpp peeruploader.cpp uploadcap.cpp \
+			packetreader.cpp packetwriter.cpp packet.cpp httptracker.cpp udptracker.cpp cache.cpp \
+			singlefilecache.cpp multifilecache.cpp torrentcreator.cpp downloadcap.cpp torrentfile.cpp \
+			chunkselector.cpp server.cpp serverauthenticate.cpp authenticatebase.cpp \
+			udptrackersocket.cpp upspeedestimater.cpp ipblocklist.cpp queuemanager.cpp chunkcounter.cpp \
+			newchokealgorithm.cpp statsfile.cpp preallocationthread.cpp cap.cpp \
+			advancedchokealgorithm.cpp cachefile.cpp dndfile.cpp authenticationmonitor.cpp \
+		peersourcemanager.cpp timeestimator.cpp
 
 
--- trunk/extragear/network/ktorrent/libktorrent/torrent/torrentcontrol.cpp #591969:591970
@@ -61,6 +61,7 @@
 #include "statsfile.h"
 #include "announcelist.h"
 #include "preallocationthread.h"
+#include "timeestimator.h"
 
 using namespace kt;
 
@@ -98,6 +99,8 @@
 		custom_output_name = false;
 		updateStats();
 		prealoc_thread = 0;
+		
+		m_eta = new TimeEstimator(this);
 	}
 
 
@@ -117,6 +120,7 @@
 		delete pman;
 		delete psman;
 		delete tor;
+		delete m_eta;
 	}
 
 	void TorrentControl::update()
@@ -1298,6 +1302,11 @@
 		// emit signal to show a systray message
 		corruptedDataFound(this);
 	}
+	
+	Uint32 TorrentControl::getETA()
+	{
+		return m_eta->estimate();
+	}
 }
 
 #include "torrentcontrol.moc"
--- trunk/extragear/network/ktorrent/libktorrent/torrent/torrentcontrol.h #591969:591970
@@ -32,6 +32,7 @@
 #include <interfaces/trackerslist.h>
 
 class QStringList;
+class QString;
 
 
 namespace bt
@@ -47,6 +48,7 @@
 	class BitSet;
 	class QueueManager;
 	class PreallocationThread;
+	class TimeEstimator;
 	
 	/**
 	 * @author Joris Guisson
@@ -216,6 +218,12 @@
 		 */
 		void resetTrackerStats();
 		
+		/**
+		 * Returns estimated time left for finishing download. Returned value is in seconds.
+		 * Uses TimeEstimator class to calculate this value.
+		 */
+		Uint32 getETA();
+		
 	public slots:
 		/**
 		 * Update the object, should be called periodically.
@@ -285,6 +293,7 @@
 		Downloader* down;
 		Uploader* up;
 		Choker* choke;
+		TimeEstimator* m_eta;
 		
 		Timer choker_update_timer,stats_save_timer,stalled_timer;