Age | Commit message | Author |
|
This change makes it mandatory for programs to call segflush() on
code that is not in the text segment if they want to execute it.
As a side effect, this means that everything but the text segment
will be non-executable by default, even without the SG_NOEXEC
attribute. Segments with the SG_NOEXEC attribute never become
executable, even when segflush() is called on them.
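for illustration, a minimal userspace sketch of what a program that
generates code at runtime now has to do (runjit() and its buffer are
made up; segflush(2) is the real system call):

	#include <u.h>
	#include <libc.h>

	/* sketch: buf lives in the data/bss segment, which is no longer
	 * executable by default, so the freshly written code must be
	 * flushed before it can be jumped into. */
	void
	runjit(uchar *buf, ulong n)
	{
		void (*f)(void);

		if(segflush(buf, n) < 0)
			sysfatal("segflush: %r");
		f = (void(*)(void))buf;
		f();
	}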
|
|
do not touch s->map on SG_PHYSICAL type segments as they do
not have a pte map (s->mapsize == 0 && s->map == nil).
also remove the SG_PHYSICAL case from the switch in freepte(), as it
is never reached.
|
|
fault() now has an additional pc argument that is
used to detect a fault on a non-executable segment.
that is, on a read fault we check whether the segment
has the SG_NOEXEC attribute and the program counter
is within the faulting page.
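roughly, the added check amounts to the following sketch (variable
names approximate, not the literal fault.c code):

	/* sketch: a read fault that is really an instruction fetch from a
	 * SG_NOEXEC segment is refused like an unmapped address. */
	if(read && (s->type & SG_NOEXEC)
	&& (pc & ~(BY2PG-1)) == (addr & ~(BY2PG-1)))
		return -1;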
|
|
add PTECACHED bits
a portable SG_NOEXEC segment attribute was added to allow
non-executable (physical) segments, which will set the
PTENOEXEC bits for putmmu().
in the future, this can be used to make non-executable
stack / bss segments.
the SG_DEVICE attribute was added to distinguish between
mmio regions and uncached memory. this only matters on arm64.
on arm, there's the issue that PTEUNCACHED would have
no bits set when using the hardware bit definitions.
this is the reason the bcm, kw, teg2 and omap kernels use
artificial PTE constants. on zynq, the XN bit was used
as a hack to give PTEUNCACHED a non-zero value, and when
that bit was clear, cache attributes were added to
the pte.
to fix this, a PTECACHED constant was added.
the portable mmu code in fault.c will now explicitly set
PTECACHED bits for cached memory and PTEUNCACHED for
uncached memory. that way the hardware bit definitions
can be used everywhere.
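the effect on the portable side can be sketched as follows
(simplified; "uncached" and "noexec" stand for the respective
segment attribute tests):

	/* sketch: state the cache attribute explicitly instead of letting
	 * "no bits set" mean cached memory, so the hardware PTE bit
	 * definitions work unchanged on all ports. */
	mmuphys = PPN(pg->pa) | PTEVALID;
	mmuphys |= uncached ? PTEUNCACHED : PTECACHED;
	if(noexec)
		mmuphys |= PTENOEXEC;
	putmmu(addr, mmuphys, pg);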
|
|
this driver makes regions of physical memory accessible as a disk.
to use it, ramdiskinit() has to be called before confinit(), so
that conf.mem[] banks can be reserved. currently, only the pc and pc64
kernels use it, but otherwise the implementation is portable.
ramdisks are not zeroed when allocated, so that the contents are
preserved across warm reboots.
to not waste memory, physical segments do not allocate Page structures
or populate the segment ptes anymore. there's also a new SG_CACHED
attribute.
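the required ordering in the pc/pc64 main() looks roughly like this
(only the two relevant calls shown, everything else omitted):

	void
	main(void)
	{
		/* ... early machine setup ... */
		ramdiskinit();	/* reserve conf.mem[] banks for ramdisks */
		confinit();	/* divides the remaining banks between kernel and user */
		/* ... rest of boot ... */
	}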
|
|
the problem is that segio doesn't check segment attributes,
and it can't really in the case of SG_FAULT, which can be
inherited from the pseg and toggled at any time.
so instead of returning -1 from fault() into the fault$cputype
handler, which then panics when the fault happened in kernel mode,
we jump into segio's waserror() block just like in the
demand load i/o error case (faulterror()).
|
|
access to the axi segment hangs the machine when the fpga
is not programmed yet. to prevent access, we introduce a
new SG_FAULT flag that, when set on Segment.type or
Physseg.attr, causes the fault handler to immediately
return with an error (as if the segment was not mapped).
during programming, we temporarily set the SG_FAULT flag
on the axi physseg, flush the tlbs of all processes that have
the segment mapped, and when programming is done, we clear
the flag again.
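the programming sequence then looks roughly like this (sketch; the
flush and programming calls are placeholders, not actual function
names):

	/* sketch: fence off the axi physseg while the fpga is programmed */
	axi->attr |= SG_FAULT;		/* faults on this segment now error out */
	flushaxitlbs();			/* placeholder: flush tlbs of procs mapping it */
	programbitstream();		/* placeholder: load the fpga */
	axi->attr &= ~SG_FAULT;		/* normal access works again */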
|
|
instead of checking addr+len >= addr, check len >= -addr so
that addr == 0 is never valid for len > 0, even if we decide
to have memory at the zero page, so there's never any chance
the user can pass in "nil" pointers.
put up some signs where we fall through the switch cases in
fixfault().
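in unsigned arithmetic the new check behaves as described; a small
sketch with addr and len as uintptr (not the exact kernel code):

	/* -addr is the number of bytes from addr to the top of the address
	 * space; for addr == 0 it is 0, so any len > 0 fails the check and
	 * a nil pointer never validates, even if page 0 were ever mapped. */
	if(len >= -addr)
		return 0;	/* reject: starts at 0 or runs past the top */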
|
|
the intent of posting a note to the faulting process is to
interrupt the syscall to give the note handler a chance
to handle it. kernel processes, however, have no note handlers,
and all postnote() does is set up->notepending, which will
make the next attempt to sleep raise an Eintr[] error. this
is harmless, but usually not what we want.
|
|
devproc's procctlmemio() did not handle physical segment
types correctly, as it assumed it could just kmap() the page
in question and write to it. physical segments, however,
need to be mapped uncached, but kmap() will always map
cached as it assumes normal memory. on some machines,
aliasing memory with different cache attributes
leads to undefined behaviour!
we borrow the code from devsegment and provide a generic
segio() function to read and write user segments which
handles all the cases without using kmap(), by just spawning
a kproc that attaches the segment that needs to be read
from or written to. fault() will set up the right mmu
attributes for us. it will also properly flush pages for
segments that maintain the instruction cache when written.
however, tlbs have to be flushed separately.
segio() is used for devsegment and devproc now, which
also allows for simplification of fixfault(), as there is no
special error handling case anymore: fixfault() is now
called from the faulting process *only*.
reads from /proc/$pid/mem can now span multiple pages.
|
|
kernel
|
|
fixed segments are contiguous in physical memory but
allocated in user pages. unlike shared segments, they
are not allocated on demand; the pages are allocated
at creation time (devsegment). fixed segments are
never swapped out, segfreed or resized, and can only be
destroyed as a whole.
the physical base address can be discovered by userspace
by reading the ctl file in devsegment.
|
|
mcountseg(), mfreeseg():
use Pte.first/last pointers when possible and avoid constructs
like s->map[i]->pages[j].
freepte():
do not zero entries in freepte(); the segment is going away and
there is no point in zeroing page pointers. hoist common code to
the top, avoiding duplication.
segpage(), fixfault():
avoid load after store for Pte** pointer.
fixfault():
return -1 in the default case to avoid the "used but not set" warning
for mmuphys and get rid of the useless initialization.
syssegflush():
due to len being unsigned, the pe = PGROUND(pe) can make "chunk"
bigger than len, causing an overflow (see the sketch after this
message). rewrite the function and deal with page alignment and
errors at the beginning.
syssegflush(), segpage(), fixfault(), putseg(), relocateseg(),
mcountseg(), mfreeseg():
keep naming consistent.
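to illustrate the syssegflush() overflow mentioned above, with
made-up numbers and 4K pages (not the actual code):

	ulong addr = 0x1010, len = 0x20;	/* flush 32 bytes */
	ulong pe = PGROUND(addr+len);		/* rounds up to 0x2000 */
	ulong chunk = pe - addr;		/* 0xff0: bigger than len */
	len -= chunk;				/* unsigned: wraps to a huge value,
						 * so the flush loop keeps going */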
|
|
ignore physical segments in mcountseg() and mfreeseg(). physical
segments are not backed by user pages, and doing putpage() on
physical segment pages in mfreeseg() is an error.
do not allow physical segments to be resized. the segment size
is only checked in segattach() to be within the physical segment!
ignore physical segments in portcountpagerefs(), as pagenumber()
does not work on the malloced page structures of a physical segment.
get rid of the Physseg.pgalloc() and Physseg.pgfree() indirection, as
this was never used, and if there's a need to do more efficient
allocation, it should be done in a portable way.
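the guard in mfreeseg() amounts to something like this (sketch;
mcountseg() does the same but returns 0):

	/* physical segments have no Page structures behind them,
	 * so there is nothing to count or putpage() here. */
	if((s->type & SG_TYPE) == SG_PHYSICAL)
		return;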
|
|
there are no kernels currently that do page coloring,
so the only use of cachectl[] is flushing the icache
(on arm and ppc).
on pc64, cachectl consumes 32 bytes in each page, resulting
in over 200 megabytes of overhead for 32gb of ram with 4K
pages.
this change removes cachectl[] and adds a txtflush ulong
that is set to ~0 by pio() to instruct putmmu() to flush
the icache.
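the flush then happens lazily in the port's putmmu(), along these
lines (sketch; icacheflushpage() is a placeholder for the
port-specific flush):

	/* pio() sets pg->txtflush to ~0 after reading in text pages;
	 * putmmu() flushes the icache for the page and clears it. */
	if(pg->txtflush){
		icacheflushpage(pg);
		pg->txtflush = 0;
	}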
|
|
the purpose of checkpages() is to verify consistency of the hardware mmu state,
not to notify on the console that a program faulted. a program could also
continue after handling the note. (this seems to be the case in go programs)
|
|
procctl() is always called with up and it would not
work correctly if passed a different process, so
remove the Proc* argument and use up directly.
|
|
don't kill the calling process when demand load fails if fixfault()
is called from devproc. this happens when you delete the binary
of a running process and try to debug the process, accessing uncached
pages through the /proc/$pid/mem file.
fixes to procctlmemio():
- fix a missed unlock, as txt2data() can error
- make sure the segment isn't freed by taking a reference (under p->seglock)
- access the page with the segment locked (see comment)
- get rid of the segment stealer lock
other stuff:
- move txt2data() and data2txt() to segment.c
- add procpagecount() function
- change the return type of mcountseg() to ulong
|
|
Lock
make the Page structure less than half its original size by getting rid of
the Lock and the lru.
The Lock was required to coordinate the unchaining of pages that were
both cached and on the lru freelist.
now pages have a single next pointer that is used either for the palloc.head
freelist or for the page cache hash chains in Image.pghash[], never both.
cached pages are not on the freelist anymore, but will be reclaimed
from images by the pager when the freelist runs out of pages.
each Image has its own 512 hash chains for cached page lookup. That is
2MB worth of pages and there should be no collisions for most text images.
page reclaiming can be done without holding palloc.lock as the Image is
the owner of the page hash chains protected by the Image's lock.
reclaiming Image structures can be done quickly by only reclaiming pages from
inactive images, that is images which are not currently in use by segments.
the Ref structure has no Lock anymore. Only a single long that is atomically
incremented or decremented using cmpswap().
there are various other changes as a consequence. and lots of pikeshedding,
sorry.
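the lockless Ref then boils down to a compare-and-swap loop, roughly
like this sketch of incref():

	long
	incref(Ref *r)
	{
		long old;

		do
			old = r->ref;
		while(!cmpswap(&r->ref, old, old+1));
		return old+1;
	}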
|
|
lock(): use uintptr instead of long for pc values.
change Proc.nlocks from Ref to int and just use normal increment and decrement,
as done in erik quanstro's 9atom.
It is not clear why we used atomic increment in the first place, as even if we
get preempted by an interrupt and scheduled before we write back the incremented
value, it shouldn't be a problem and we'll just continue where we left off, as
our process is the only one that can write to it.
Yoann Padioleau found that the Mach pointer Lock.m wasn't maintained
consistently for lock() vs canlock() and ilock(). Fixed.
Use uintptr instead of ulong for the maxlockpc, maxilockpc and ilockpc debug variables.
|
|
on amd64, the text segment is aligned and padded to
2MB, but segment granularity is 4K which can give
us page faults that are beyond the highest file
offset. this is perfectly valid, but was not handled
correctly in pio().
|
|
simplifying paging code by getting rid of duppage(). instead,
fixfault() now always makes a copy of the shared/cached page
and leaves the cache alone. newpage() uncaches pages as
necessary.
thanks charles forsyth for the suggestion.
from http://9fans.net/archive/2014/03/26:
> It isn't needed at all. When a cached page is written, it's trying hard to
> replace the page in the cache by a new copy,
> to return the previously cached page. Instead, I copy the cached page and
> return the copy, which is what it already
> does in another instance. ...
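the write-fault path in fixfault() then reduces to roughly the
following (simplified sketch; newpage(), copypage() and putpage() are
the existing functions, old and new are local Page pointers):

	/* sketch: never hand out the cached page for writing; give the
	 * process a private copy and leave the cache entry alone. */
	new = newpage(0, &s, addr);	/* may itself uncache a page when memory is short */
	copypage(old, new);
	putpage(old);
	*pg = new;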
|
|
access to user memory can pagefault, and newfd2() holds
the fgrp spinlock while writing to it. make a temporary copy
on the stack in syspipe().
|
|
this change is in preparation for amd64. the systab calling
convention was also changed to return uintptr (as segattach
returns a pointer) and the arguments are now passed as a
va_list, which handles amd64 arguments properly (all arguments
are passed as 64-bit quantities on the stack, though the upper
part will not be initialized when the element is smaller
than 8 bytes).
this is partial. xalloc needs to be converted in the future.
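under the new convention a syscall handler looks roughly like this
(sketch; syssomething() and its arguments are made up for
illustration):

	static uintptr
	syssomething(va_list list)
	{
		int fd;
		void *buf;
		ulong n;

		/* on amd64 every argument occupies a 64-bit stack slot; the
		 * upper half of slots narrower than 8 bytes is uninitialized. */
		fd = va_arg(list, int);
		buf = va_arg(list, void*);
		n = va_arg(list, ulong);
		return dosomething(fd, buf, n);	/* made-up worker function */
	}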
|
|
pprint() might block or even (maliciously) call into
devproc write, which will corrupt the qlock chain on an attempt
to qlock up->debug again.
|
|
the lock order of page.Lock -> palloc.hashlock was
violated in cachedel(), which is called from the
pager. change the code to do it in the right order
to prevent deadlock.
change lookpage to retry on a false hit. i assume that
a false hit means:
a) we're low on memory -> the cached page got uncached/reused
b) duppage() got called on the page, meaning there's another
cached copy in the image now.
paging in is expensive compared to the hashtable lookup, so
i think retrying is better.
clean up fixfault, adding comments.
|
|
attachimage() instead of temporarily resetting in pio().
|
|